I've written an interpreter for a simple assembly-like language and it's performing slower than I would like.
It's split into 3 files: the Parser that converts the source to a vector of ints, the VM that actually runs the bytecode, and Tests that has a bubble sort written in the language.
It sorts 100 numbers in about 6 seconds in GHCi. The profiler doesn't tell me much except that the most time is spent inside the step function as it's expected.
The Parser file isn't that needed because it's only run once so it doesn't affect performance.
Another thing to note is that it takes around 250 000 ticks (instructions executed) to do it so I'm pretty sure it could be much faster than 6 seconds.
Is there anything obvious that I could improve?
Parser
module Parser where import Data.Vector (Vector, fromList) import Data.Char (toUpper) import Data.List (sort) type ByteCode = [Int] data OpCode = Push | Pop | Add | Sub | Mult | Div | Store | Load | Jmp | Cmp | Not | Br | Dup | Inc | Dec | Swp deriving (Enum, Read, Show, Ord, Eq) arity :: Vector Int arity = (fromList . map snd . sort) $ zip [Push, Store, Load] [1, 1..] ++ zip [Pop, Add, Sub, Mult, Div] [0, 0..] charIsNumeric :: Char -> Bool charIsNumeric c = '0' <= c && '9' >= c stringIsNumeric :: String -> Bool stringIsNumeric ('-' : s) = all charIsNumeric s stringIsNumeric s = all charIsNumeric s capitalize :: String -> String capitalize [] = [] capitalize (x : xs) = toUpper x : xs wordToByteCode :: String -> Int wordToByteCode str = if stringIsNumeric str then read str else fromEnum opCodeEnum where opCodeEnum :: OpCode opCodeEnum = read $ capitalize str stringToByteCode :: String -> ByteCode stringToByteCode = map wordToByteCode . words sourceToByteCode :: String -> ByteCode sourceToByteCode = map wordToByteCode . concatMap words . lines
VM
module VM where import Parser (ByteCode, OpCode(..), arity) import qualified Data.IntMap as IM import Data.Vector (Vector, (!)) import qualified Data.Vector as Vector import Data.List (intercalate) import Utility data VM = VM { byteCode :: Vector Int, programCounter :: Int, stack :: [Int], memory :: IM.IntMap Int } deriving (Show) fromCode :: ByteCode -> VM fromCode code = VM { byteCode = Vector.fromList code, programCounter = 0, stack = [], memory = IM.empty } step :: VM -> VM step vm = next where bc = byteCode vm pc = programCounter vm st = stack vm mm = memory vm inst = toEnum $ bc ! pc pop1 = tail st pop2 = tail pop1 top1 = head st top2 = head pop1 nextPc = pc + 1 next = case inst of Pop -> vm { stack = pop1, programCounter = nextPc } Push -> vm { stack = bc ! nextPc : st, programCounter = pc + 2 } Add -> vm { stack = (top1 + top2) : pop2, programCounter = nextPc } Sub -> vm { stack = (top2 - top1) : pop2, programCounter = nextPc } Mult -> vm { stack = (top1 * top2) : pop2, programCounter = nextPc } Div -> vm { stack = (top2 `div` top1) : pop2, programCounter = nextPc } Store -> vm { stack = pop2, programCounter = nextPc, memory = IM.insert top1 top2 mm } Load -> vm { stack = mm IM.! top1 : pop1, programCounter = nextPc } Jmp -> vm { stack = pop1, programCounter = top1 } Cmp -> vm { stack = signum (top2 - top1) : pop2, programCounter = nextPc } Not -> vm { stack = (if top1 > 0 then -1 else 1) : pop1, programCounter = nextPc } Br -> vm { stack = pop2, programCounter = if top2 > 0 then top1 else nextPc } Dup -> vm { stack = top1 : st, programCounter = nextPc } Inc -> vm { stack = (top1 + 1) : pop1, programCounter = nextPc } Dec -> vm { stack = (top1 - 1) : pop1, programCounter = nextPc } Swp -> vm { stack = top2 : top1 : pop2, programCounter = nextPc } endState :: VM -> Bool endState vm = programCounter vm == Vector.length (byteCode vm) run :: VM -> VM run = until endState step runCount :: VM -> (Int, VM) runCount = untilCount endState step debug :: (VM -> String) -> VM -> (VM, [String]) debug watch vm = if endState vm then (vm, []) else (nextVm, watch vm : logs) where (nextVm, logs) = debug watch (step vm) instructionLogger :: VM -> String instructionLogger vm = show (toEnum $ byteCode vm ! programCounter vm :: OpCode) watch :: Int -> VM -> String watch n vm = case IM.lookup n (memory vm) of Nothing -> "undefined" Just a -> show a composeLoggers :: [VM -> String] -> VM -> String composeLoggers loggers vm = (intercalate " " . map ($ vm)) loggers printDebug :: (VM -> String) -> VM -> IO () printDebug f v = putStr $ unlines $ snd $ debug f v
Tests
module Tests where import qualified Parser as Parser import qualified VM as VM import qualified Data.IntMap as IM bubble = unlines [ "push 0", "push 1000", "store", "push 0", "push 1001", "store", "push 1000", "load", "load", "push 1001", "load", "load", "cmp", "push 38", "br", "push 1000", "load", "load", "push 1001", "load", "load", "push 1000", "load", "store", "push 1001", "load", "store", "push 1001", "load", "inc", "dup", "push 1001", "store", "push 100", "cmp", "not", "push 10", "br", "push 0", "push 1001", "store", "push 1000", "load", "inc", "dup", "push 1000", "store", "push 100", "cmp", "not", "push 10", "br" ] vm = VM.fromCode $ Parser.sourceToByteCode bubble vmWithData = vm { VM.memory = IM.fromList $ zip [0..100] [100, 99..0] } main = print $ VM.run $ vmWithData dbg = VM.printDebug (VM.composeLoggers [VM.instructionLogger, VM.watch 101, show . VM.programCounter]) vmWithData
Utility
module is absent.\$\endgroup\$Utility
was only used foruntilCount
, which I reimplemented. Compiled with -O2, this runs basically instantly.\$\endgroup\$