I want present to you my little Virtual Machine. It nothing really sophisticated. It is an stack only virtual machine with the exception for some 'global' variables which i would count as registers but besides that all calculation done by my VM is done on stack.
What i planning now is to show the most interesting parts of the project How i implement this VM.
Reason why I do not post the complete code directly is you can find it on GitHub where you can clone it and test out locally. I recommend if you really interested download the project, execute and take a look the code and THEN comeback and read the post.
It makes it probably easier to understand. It's not that i do have a lot of files but to keep the code readable I think that this is the best choice. Let me know if you think otherwise!
Also Unfortunately i am not good writer, not just only because English is my second language... I try my best to correct as much orthographic and grammatical mistakes as possible.
Lets go through my code.
ByteCode.h
#include <memory> #include <functional> namespace vm { class VM; enum ByteCodes : unsigned long long { NOP=0, IADD, ISUB, IMUL, ILT, //Integer Less than IEQ, //Integer EQUAL IPUSH, //Integer FADD, FSUB, FMUL, FLT, //Float Less than FEQ, //Float EQUAL FPUSH, //Floating Point PRINT, HALT, POP, GSTORE, //Stores Value Globally GLOAD, STORE, //Stores Value Locally LOAD, BRANCH, // unconditional branch BRTRUE, // conditional branches BRFALSE, CALL, //Function call RET, //Returns to the Point after the Call MAXCODE }; struct Instruction { std::string mName; int mNumericName; int mOperandCount; std::function<void()> mInstruction; Instruction() = default; Instruction(const std::string& OpCode, int Value, int operands); Instruction(const std::string& OpCode, int Value); Instruction(const std::string& OpCode, int Value, int operands, std::function<void()> lambda); Instruction(const std::string& OpCode, int Value, std::function<void()> lambda); virtual ~Instruction() { }; static void Init( VM*); }; extern std::unique_ptr<Instruction[]> InstructionCode; }
This header i believe is pretty straight forward to understand. You find three important parts in it.
- Enumeration of the Bytes which could trigger actions
- Very small Abstraction of an Instruction containing some Meta Information
- A table of Instruction where the indices correspond the values of the enumeration
What is more interesting is now how such an instruction is implemented. I will not show the whole implementation file but instead a small section of it:
ByteCode.cpp
namespace vm { std::unique_ptr<Instruction[]> InstructionCode = std::make_unique<Instruction[]>(MAXCODE);; Instruction::Instruction(const std::string& OpCode, int Value, int operands) : mName(OpCode), mNumericName(Value), mOperandCount(operands) { InstructionCode[Value] = (*this); } Instruction::Instruction(const std::string& OpCode, int Value) : mName(OpCode), mNumericName(Value), mOperandCount(0) { InstructionCode[Value] = (*this); } Instruction::Instruction(const std::string& OpCode, int Value, int operands, std::function<void()> lambda) : mName(OpCode), mNumericName(Value), mOperandCount(operands), mInstruction(lambda) { InstructionCode[Value] = (*this); } Instruction::Instruction(const std::string& OpCode, int Value, std::function<void()> lambda) : mName(OpCode), mNumericName(Value), mOperandCount(0), mInstruction(lambda) { InstructionCode[Value] = (*this); } void Instruction::Init( VM* Machine) { Instruction("NOP", 0, []() {}); Instruction("IADD", IADD, [Machine]( ) { auto rhs = Machine->stack[Machine->stackPointer--]; //Right-Hand-Side operand auto lhs = Machine->stack[Machine->stackPointer--]; //Left-Hand-Side operand Machine->stack.pop_back(); Machine->stack.pop_back(); Machine->stack.push_back(VM::Type{ VM::Type::INT,{ (lhs.mValue.mValue + rhs.mValue.mValue) } }); Machine->stackPointer++; }); }
What you see is the that each instruction if constructed is automatically be stored in the global array you have seen in the header file. This shall insure that every instruction ever instantiated is accessible via this very array just mentioned. This is important later on.
An Instruction
saves 4 kinds of information
- Name as String
- Name as Number Code
- amount of Operands
- Implementation of the Instruction
The really interesting part start now with the static function Init
. Init
is called from the VM at beginning of it's life cycle. It does constructed all instruction just by calling the constructor of the class Instruction
.
The Init
function expects just the pointer of the current VM
instance( a class shown later) which is given to an lambda who implements the behaviour of a certain instruction.
The lambda which contains the behaviour of the instruction captures Inits
s function argument, which is the pointer to the current instance of the VM
. Through this i enable the lambda to access the VM's representation of the stack where all operation happen.
You see two Instruction
s implemented here. NOP and IADD. NOP simply does nothing so the lambda is empty. IADD on the other hand adds to integer variables which are stored on the stack before this instruction is called. if nothing is there, well that would be a problem then.
This scheme just repeats for every instruction defined in the enumeration in the header file.
VM.h
The header contains the nested struct Type
. Type
represent any values possible in the VM. To make life easier anything which is not an Pointer is stored within an double regardless if less bytes are need. Efficiency was not the purpose here. In essence Type
is a tagged union. Because Every Instruction or Opcode is represented by an Byte, ergo is an Value, it can be expressed as an Type
in the "Code Section" of the VM.
namespace vm { class VM { public: struct Type { enum types { INT, POINTER, FLOAT, CHAR } mObjecttype; union Value { double mValue; size_t mPointer; Value() = default; Value(double i) : mValue(i) {}; Value(size_t i) : mPointer(i) {}; }mValue; Type() = default; Type(types t, Value val) : mObjecttype(t), mValue(val) {} Type(const Type& t) : mObjecttype(t.mObjecttype), mValue(t.mValue) {} }; public: typedef std::vector<Type> Memory; typedef std::unique_ptr<char[]> HeapMemory; private: friend Instruction; //<-- Ugly but it works, to my surprise //Memory of the Machine //HeapMemory heap; Memory stack; Memory globals; Memory code; //!<-- Copy of the Code to execute Statistics stats; //!<-- Does Statistics about the execution of the Code //Rudimentary special purpose register size_t instructionPointer = -1; size_t stackPointer = -1; size_t framePointer = -1; //Debug bool trace = true; protected: void pushOntoStack(Type T); Type popOffStack(); size_t popPointer(); size_t HeapSize() const { return Kilobytes(512); } std::string disassemble() const; std::string stackString() const; public: VM(const std::vector<Type>& code, int main); virtual ~VM(); void cpu(); void operator()() { cpu(); } }; }
VM.cpp
The Constructor of the VM is quite simple it expects an vector of Types which stores simply a copy of the code. The second argument is the "line" number or the index from the vector from which the VM shall start executing those sweet little bytes of Opcode/Instruction. You also find the call to the static function Init
from the class Instructions
earlier explained.
namespace vm { VM::VM(const std::vector<Type>& code, int main) : globals(10)//,heap(std::make_unique<char[]>(HeapSize())) { this->code = code; this->instructionPointer = main; Instruction::Init(this); } VM::~VM() { }
The member function cpu() is quite interesting and a bit difficult to read. It first checks if the stacktrace is enabled; if so it prints the first line as you can read.
Then we enter our interpreting loop; for each instruction contained in our code given to the constructor earlier. this loop runs as long as our instruction pointer is less than the code size, otherwise we would try to access code which does not exist.
The first step is to fetch now our first Opcode and increase our IP by one. We than check if the recived opcode is valid and if so proceed.
void VM::cpu() { if (trace) std::cout << std::left << std::setw(44) << "Instruction & Results:" << std::right << "Stack Memory:\n"; while (instructionPointer < code.size()) { try { auto opcode = code[instructionPointer].mValue.mValue;// fetch OpCode instructionPointer++; if (opcode < 0) throw std::runtime_error("Opcode Mismatch No Negative instructions please "); else if (opcode > MAXCODE) throw std::runtime_error("Opcode not supported: " + std::to_string(opcode) + "! at Instruction Pointer(" + std::to_string(instructionPointer - 1) + ")."); else if(opcode == 0) throw std::runtime_error("Opcode 0 Does not exist yet!\n ");
If the trace is enabled we print the current state of the stack as well as the instruction which causes the current stack.
if (trace) { std::cout << std::left << std::setw(40) << disassemble() << std::right << stackString() << "\n";
The function AddMeasurement
from the class Statistic
does what you expect. it stores some kind of Measurement(here time). It's not really well coded and if you interested please look up the code at my repository it is in the same project. The Argument given to AddMeasurement
calls the lambda of our Opcode/Instruction and Measures how long it takes to execute and stores this information for analysis later. I did this because it seems interesting to me to know just how fast is my Virtual Machine.
stats.AddMeasurement({ InstructionCode[static_cast<size_t>(opcode)].mName, (measure<std::chrono::nanoseconds>:: duration( InstructionCode[static_cast<size_t>(opcode)].mInstruction )).count() }); //Decode and Execute Opcode } else { stats.AddMeasurement({ InstructionCode[static_cast<size_t>(opcode)].mName, (measure<std::chrono::nanoseconds>:: duration( InstructionCode[static_cast<size_t>(opcode)].mInstruction )).count() });//Decode and Execute Opcode }
What i personally find awesome is I do not have an awfull crazy long switch statement which chooses what code must be executed. In an earlier version i had such an switch statement and it drove me crazy looking at it. So i came up the with global array. The Array you find the Bytecode.h file; where the bytecode corresponds to the index of the array for appropriate instruction need if accessed.
} catch (std::out_of_range& e) { std::cerr << e.what(); } catch (std::exception& e) { std::cerr << e.what(); } catch (...) { std::cerr << "Totally lost... do not know what happend but i'm dead for now!!!\n"; } } }
The Dissamble function takes the current Opcode pointed by the IP and prints the meta information stored in it including the line number within the code as well as the Name and Value of the operands.
std::string VM::disassemble() const { std::stringstream buffer; auto opcode = static_cast<size_t>(code[instructionPointer-1].mValue.mValue); if (opcode != 0 && opcode < MAXCODE) //Print Code section { buffer << std::showbase << std::setbase(10) << std::setw(4 * 2) << std::left << instructionPointer-1 << std::dec << ": \t" << std::setw(4 * 2) << std::left << InstructionCode[static_cast<size_t>(opcode)].mName; if (InstructionCode[static_cast<size_t>(opcode)].mOperandCount > 0) { auto nargs = InstructionCode[static_cast<size_t>(opcode)].mOperandCount; std::vector<std::string> operands; for (auto i = instructionPointer ; i <= instructionPointer + nargs-1; i++) { switch (code[i].mObjecttype) { case Type::POINTER: operands.push_back(std::to_string(code[i].mValue.mPointer)); break; default: operands.push_back(std::to_string(code[i].mValue.mValue)); break; } } for (auto item = 0; item < operands.size(); item++) { if (item > 0) buffer << std::left << ", "; buffer << std::setw(2) << std::left << " " << operands[item]; } } } return buffer.str(); }
Simply prints the current state of the Stack
std::string VM::stackString() const { std::stringstream buffer; buffer << "\tstack["; if (stack.size() != 0) { for (size_t i = 0; i <= stackPointer; ++i) { switch (stack[i].mObjecttype) { case Type::INT: buffer << " " << (long long)stack[i].mValue.mValue; break; case Type::FLOAT: buffer << " " << stack[i].mValue.mValue; break; } } } buffer << " ]"; return buffer.str(); } void VM::pushOntoStack(Type T) { stack.push_back(T); stackPointer++; } auto VM::popOffStack() -> VM::Type { auto tmp = stack[stackPointer--]; stack.pop_back(); return tmp; } size_t VM::popPointer() { auto tmp = popOffStack(); if (tmp.mObjecttype != Type::POINTER) throw std::runtime_error("Tried to access a pointer but received:" + std::to_string(tmp.mObjecttype) + "!"); return tmp.mValue.mPointer; } }
The main programm
Here you see the snippet containing an array of Type. This shall be the code executed by the VM. I tried to make the definition of this array more readable by using c-style macros. The project is not yet at the point that i can read from a file and execute the code stored there. Any code must be hard coded right know.
int main() { //Recursive Type factorial[] = { //Instruction // ADDRESS ///Function factorial(var N) // ARGS=1, LOCALS=0 INSTR(LOAD, -3.0,INT), // [0] Copy Given Argument N Into Current StackFrame INSTR(IPUSH, 2.0,INT), // [2] Push Constant (2) Onto Stack SINSTR(ILT), // [4] IF ((N < 2) == TRUE) Push TRUE ELSE Push FALSE Onto Stack INSTR(BRFALSE, 10ull,POINTER), // [5] Jump to Addr 10 if on Stack lays FALSE INSTR(IPUSH, 1.0,INT), // [7] Push Constant (1) Onto Stack SINSTR(RET), // [9] Return Constant (1)[7] // // RETURN N * factorial( N-1 ) INSTR(LOAD, -3.0,INT), // [10] Copy Given Argument N Into Current StackFrame INSTR(LOAD, -3.0,INT), // [12] Copy Given Argument N Into Current StackFrame INSTR(IPUSH, 1.0,INT), // [14] Push Constant (1) Onto Stack SINSTR(ISUB), // [16] Substract Constant (1) of Value (N) DINSTR(CALL, 0ull, 1.0,POINTER,INT), // [17] Call Function: factorial(var N) SINSTR(IMUL), // [20] Multiplicate Result of (factorial(var N))[17] times Value (N)[10] SINSTR(RET), // [21] Return Result[20] /// Function Main() // ARGS=0, LOCALS=0 // <-- MAIN METHOD! INSTR(IPUSH, N,INT), // [22] Push Constant (5) Onto StacK DINSTR(CALL, 0ull, 1.0,POINTER,INT), // [25] Call Function: factorial(var N) SINSTR(PRINT), // [26] Print Stack Top SINSTR(HALT) // [27] Abort }; VM::Memory CodeInstructions(std::begin(factorial), std::end(factorial)); VM Machine(CodeInstructions, 22); try { ///auto exe = &VM::cpu; auto avg = (measure<std::chrono::nanoseconds>::duration(Machine) ); //Machine.cpu(); } catch (std::exception& e) { std::cerr << e.what() << "\n"; } std::cin.get(); return 0; }
The Result of the programm you have seen above:
Instruction & Results: Stack Memory: 22 : IPUSH 5.000000 stack[ ] 24 : CALL 0, 1.000000 stack[ 5 ] 0 : LOAD -3.000000 stack[ 5 1 ] 2 : IPUSH 2.000000 stack[ 5 1 5 ] 4 : ILT stack[ 5 1 5 2 ] 5 : BRFALSE 10 stack[ 5 1 0 ] 10 : LOAD -3.000000 stack[ 5 1 ] 12 : LOAD -3.000000 stack[ 5 1 5 ] 14 : IPUSH 1.000000 stack[ 5 1 5 5 ] 16 : ISUB stack[ 5 1 5 5 1 ] 17 : CALL 0, 1.000000 stack[ 5 1 5 4 ] 0 : LOAD -3.000000 stack[ 5 1 5 4 1 ] 2 : IPUSH 2.000000 stack[ 5 1 5 4 1 4 ] 4 : ILT stack[ 5 1 5 4 1 4 2 ] 5 : BRFALSE 10 stack[ 5 1 5 4 1 0 ] 10 : LOAD -3.000000 stack[ 5 1 5 4 1 ] 12 : LOAD -3.000000 stack[ 5 1 5 4 1 4 ] 14 : IPUSH 1.000000 stack[ 5 1 5 4 1 4 4 ] 16 : ISUB stack[ 5 1 5 4 1 4 4 1 ] 17 : CALL 0, 1.000000 stack[ 5 1 5 4 1 4 3 ] 0 : LOAD -3.000000 stack[ 5 1 5 4 1 4 3 1 ] 2 : IPUSH 2.000000 stack[ 5 1 5 4 1 4 3 1 3 ] 4 : ILT stack[ 5 1 5 4 1 4 3 1 3 2 ] 5 : BRFALSE 10 stack[ 5 1 5 4 1 4 3 1 0 ] 10 : LOAD -3.000000 stack[ 5 1 5 4 1 4 3 1 ] 12 : LOAD -3.000000 stack[ 5 1 5 4 1 4 3 1 3 ] 14 : IPUSH 1.000000 stack[ 5 1 5 4 1 4 3 1 3 3 ] 16 : ISUB stack[ 5 1 5 4 1 4 3 1 3 3 1 ] 17 : CALL 0, 1.000000 stack[ 5 1 5 4 1 4 3 1 3 2 ] 0 : LOAD -3.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 ] 2 : IPUSH 2.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 ] 4 : ILT stack[ 5 1 5 4 1 4 3 1 3 2 1 2 2 ] 5 : BRFALSE 10 stack[ 5 1 5 4 1 4 3 1 3 2 1 0 ] 10 : LOAD -3.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 ] 12 : LOAD -3.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 ] 14 : IPUSH 1.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 2 ] 16 : ISUB stack[ 5 1 5 4 1 4 3 1 3 2 1 2 2 1 ] 17 : CALL 0, 1.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 ] 0 : LOAD -3.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 1 ] 2 : IPUSH 2.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 1 1 ] 4 : ILT stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 1 1 2 ] 5 : BRFALSE 10 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 1 1 ] 7 : IPUSH 1.000000 stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 1 ] 9 : RET stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 1 1 ] 20 : IMUL stack[ 5 1 5 4 1 4 3 1 3 2 1 2 1 ] 21 : RET stack[ 5 1 5 4 1 4 3 1 3 2 1 2 ] 20 : IMUL stack[ 5 1 5 4 1 4 3 1 3 2 ] 21 : RET stack[ 5 1 5 4 1 4 3 1 6 ] 20 : IMUL stack[ 5 1 5 4 1 4 6 ] 21 : RET stack[ 5 1 5 4 1 24 ] 20 : IMUL stack[ 5 1 5 24 ] 21 : RET stack[ 5 1 120 ] 27 : Print stack[ 120 ] 120 28 : HALT stack[ ] Name ---> | Min | Max | Average | Sum | count(ct) | avrg/ct | ================================================================================================================================== ISUB ---> | 3.9e+03 ns | 4.3e+03 ns | 4.1e+03 ns | 1.7e+04 ns | 4 | 1036.13 ns/ct | IMUL ---> | 3.6e+03 ns | 4.3e+03 ns | 3.9e+03 ns | 1.6e+04 ns | 4 | 986.875 ns/ct | ILT ---> | 3.9e+03 ns | 4.7e+03 ns | 4.2e+03 ns | 2.1e+04 ns | 5 | 836.8 ns/ct | IPUSH ---> | 2e+03 ns | 6.7e+03 ns | 2.9e+03 ns | 3.2e+04 ns | 11 | 264.248 ns/ct | Print ---> | 1.5e+04 ns | 1.5e+04 ns | 1.5e+04 ns | 1.5e+04 ns | 1 | 15395 ns/ct | LOAD ---> | 2.4e+03 ns | 8.3e+03 ns | 3.6e+03 ns | 4.7e+04 ns | 13 | 280.278 ns/ct | BRFALSE ---> | 2e+03 ns | 2.8e+03 ns | 2.3e+03 ns | 1.1e+04 ns | 5 | 457.92 ns/ct | CALL ---> | 4.3e+03 ns | 1.5e+04 ns | 7.8e+03 ns | 3.9e+04 ns | 5 | 1563.08 ns/ct | ----------------------------------------------------------------------------------------------------------------------------------- Sum | 37498 ns | 61973 ns | 44326.34 ns | 198549 ns | 48 | 20820.33 | Average | 781.2083 ns/ct | 1291.104 ns/ct | 923.4655 ns/ct | 4136.438ns/ct | | |
enum ByteCodes : unsigned long long
, but with a type that is way too large for the few constants defined? Aunsigned char
would be plenty large.\$\endgroup\$