
> That code has a lot of branching. The switch statement has to jump to the corresponding case, the break statement branches to the bottom, and then there is third branch to get back to the top of the while loop. Three branches just to hit one instruction.

That's a bit unfair. Not all branches are equal. Only the indirect branch that dispatches on the fetched VM instruction is likely to be mispredicted often. Correctly predicted branches, like that while loop's, aren't that expensive; mispredicted branches cost 10-20x more.

Of course, fewer branches and less code in general are better.

One big issue with writing interpreters in C/C++ is that the compiler's register allocator usually can't follow the data flow, and keeps unnecessarily loading and storing the same common variables from/to memory.

Interpreters also need to be careful not to exceed the typical 32 kB L1 instruction cache.

All this means that to write a truly efficient interpreter, you'll need to do it in assembly.

The step after that is to write a simple JIT that does away with data-dependent (= VM instruction dispatch) branches altogether.

Then you'll notice you don't need to update some VM registers on every instruction, but can coalesce, for example, program counter updates to certain points.

Eventually you'll find you have a full fledged JIT compiler doing instruction scheduling and register allocation, etc.

Been down that rabbit hole, except for the last step. That's where it becomes a true challenge.

The LuaJIT (http://luajit.org/) project followed all the way through, and studying it is a great resource for anyone interested in the topic. Kudos to Mike Pall.



I’m writing a C port of the LuaJIT VM at the moment. I’m hoping that will help people to understand the overall design and make it easier to interpret the asm code (amongst other things). Link: https://github.com/raptorjit/raptorjit/pull/199


> Predicted branches, like that while loop, aren't that expensive.

The dispatch loop's indirect branch used to be quite expensive. Since Haswell it has become much less expensive, to the point that threaded dispatch was no longer worth the trouble. But now, with Spectre mitigations, that cost may rise again.

Also, it depends on how tight the VM loop is. The prediction buffers are only so big, so if your ops are doing a lot of work you can end up back at pre-Haswell performance.




