So... can I run Erlang code on it efficiently? (This question also disqualifies LLVM, the JVM, the CLR, etc. When most people say "universal VM" they really mean "universal VM for running mostly-serial code that might at most have 5 or 10 threads, and where killing and restarting the entire VM (to, e.g., spin up a newer version) isn't a big problem for the clients talking to it." This is not future-proof.)
I can't tell from the precis whether it supports hot code loading (the JVM doesn't, so it won't without heavy magic); and it definitely doesn't work in a soft-realtime manner ("In Erjang, every node runs on a single heap, and so global GC will sometimes happen.") Without these two things, you're not "running Erlang code efficiently"; you're just running a pretty-useless-for-its-intended-purpose subset of the language.
It's a bit like those benchmarks that show that a language that's a subset of Ruby, performs much faster than actual Ruby. Well, yes, that's nice--but we use Ruby precisely for all the inherently-really-slow parts. Once you add all the slow stuff, and have to check whether to branch to it at every call, your speed advantage stops being nearly so impressive.
Erlang is a painful language to code in; you don't do it because it's pretty. You do it for what the BEAM VM and the OTP libraries give you: the ability to spawn 100k processes representing, say, people's phone calls, or their MMO game sessions, and not drop any connections for something piffling like a garbage collection pass, or a bugfix. I would expect no less of any "universal" VM. (Though I might, of course, expect more. It should probably be able to do CPU-bound stuff like large matrix multiplications without dragging the rest of the VM down, for example.)
> the JVM doesn't, so it won't without heavy magic
The JVM does support hot code loading. In statically typed languages like Java it's not so easy to use, but there's no reason not to use it for Erlang.
> and it definitely doesn't work in a soft-realtime manner
You're misinterpreting. While the JVM, unlike Beam, has a single heap (which brings significant performance advantages over Erlang's copying), its GC is so much more advanced than Beam's that, in spite of this limitation, there's no reason it shouldn't behave as well as, or better than, Beam.
Also, this is a solved problem. If you want to absolutely guarantee no pauses at all, use a commercial JVM that guarantees that.
> you're just running a pretty-useless-for-its-intended-purpose subset of the language.
This is a huge exaggeration. Beam has its issues as well, and if you can deliver better performance overall a lot of people would gladly take it in exchange for the occasional pause (or pay money to eliminate the pause).
> Erlang is a painful language to code in; you don't do it because it's pretty. You do it for what the BEAM VM and the OTP libraries give you: the ability to spawn 100k processes representing, say, people's phone calls, or their MMO game sessions, and not drop any connections for something piffling like a garbage collection pass, or a bugfix. I would expect no less of any "universal" VM. (Though I might, of course, expect more. It should probably be able to do CPU-bound stuff like large matrix multiplications without dragging the rest of the VM down, for example.)
The JVM can deliver all that, with awesome performance to boot.
I have the utmost respect for Beam, but the JVM can give Erlang a different set of tradeoffs that is very attractive in many circumstances (I would argue in most). No one can take Erlang's valuable concepts away from it. That doesn't mean running it on a different VM implementation is not beneficial for both Erlang and JVM communities.
I'm confused. At some points you talk about "The" JVM as if there is only one, yet in other places you suggest there are multiple ones that offer different tradeoffs.
When someone says "The JVM does that", are they talking about all JVM implementations? Just a subset of them? How do you tell them apart?
Most JVM implementations are similar in their capabilities (with the exceptions being some hard real-time JVMs). So when people say "the JVM" they either mean all of them (the well-known ones, at least), or the HotSpot or OpenJDK VMs, which are almost identical. Usually it doesn't make a difference.
There exists at least one commercial JVM (that I know of) that guarantees no GC pauses over a few milliseconds. It is based on HotSpot, so it offers HotSpot's advantages as well. Another implementation, based on IBM's J9 JVM, also guarantees no pauses but may require configuration (that one also supports hard real-time uses IIRC).
Perhaps when it comes to nomenclature it's sort of like Lisp. There are many Lisps: Scheme, EmacsLisp, etc. But there is only one Lisp, and that is Common Lisp.
Likewise there are many JVMs. But there is only one "the JVM", and that is HotSpot.
> It's a bit like those benchmarks that show that a language that's a subset of Ruby, performs much faster than actual Ruby.
I think this is unfair to my work. It's true that it is an incomplete implementation of Ruby, but to a greater or lesser extent so are Topaz and JRuby. Charles Nutter independently identified the tricky parts of Ruby that you are talking about, and said that people should implement these before speaking about performance numbers. We went through that list and implemented all of them to at least the same standard as JRuby, with the exception of C extensions; we also still have a GIL, and we can't yet run Ruby on Rails.
So for example you can redefine Fixnum#+, or do any other crazy monkey patching, and we will run at exactly the same speed. This is actually unlike JRuby, which has a special fast path for call sites calling these methods; if you monkey patch them, that fast path gets turned off.
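For readers unfamiliar with what that kind of monkey patching looks like, here's a minimal sketch in plain Ruby (using `Integer#+`, since `Fixnum` is deprecated in modern Rubies; the counter global is purely illustrative):

```ruby
# Redefining a core operator: VMs with inline-cache fast paths for
# arithmetic (as described for JRuby above) must detect this and
# fall back to full method dispatch.
class Integer
  alias_method :orig_plus, :+   # keep the original implementation
  def +(other)
    # use succ, not +, to avoid infinitely recursing into ourselves
    $plus_calls = ($plus_calls || 0).succ
    orig_plus(other)            # delegate to the real addition
  end
end

puts 1 + 2          # still 3, but routed through the redefinition
puts $plus_calls    # the patched operator really was invoked
```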
This is all covered if you watch the full video. If that's still unclear, please join us on the Graal mailing list and we can chat. http://openjdk.java.net/projects/graal/
I actually wasn't referring to Graal at all (I didn't realize it had anything to do with Ruby.) I was thinking more of IronRuby (I think?) back when that project first started, before they had much implemented, when they came out with performance numbers. Whichever project it was, it was the one that spurred the controversy Charles Nutter was reacting to.
Not knowing much at all about erlang (other than its apparent software uptime), could you briefly tell me how erlang achieves such amazing performance and reliability? Is it due to the way erlang code is written (which is pretty functional looking to me) more than the vm's inherent structure, or a bit of both?
I really like Erlang, but the last thing you can say about it is that it has amazing performance. It's actually quite slow (that's why much of the libraries' functionality is implemented in C), but good enough for many purposes.
Erlang gets its much deserved praise for reliability because of one crucial thing: an almost perfect isolation of processes (actors). What one process does cannot (unless it's consuming a shared resource) affect another.
Erlang isn't really performant in the way I think you're using the word. It has okay speed, but you don't want to write your game's physics engine in it. On the other hand, its concurrent performance is great--you can load up a VM with tons and tons of processes, and as long as none of them individually are pegging the CPU, the VM will efficiently get them all evaluated.
Basically, the key to this is a combination of Erlang's use of tail-calls as the only means of looping, and the VM's use of what I might call (I'm not actually sure of the name for it) "hybrid budget-based cooperative/pre-emptive scheduling."
Since all loops are implemented as tail-calls, every Erlang function is effectively a straight O(N)-time-for-N-instructions shot that ends in either another call (which either adds, or reuses, a stack frame) or a return. This gives the VM the opportunity to act like a pre-emptive machine, while gaining the advantages of cooperative multitasking.
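To illustrate that shape (in Ruby rather than Erlang, and trampolining explicitly since Ruby doesn't guarantee tail-call elimination): each step is a straight-line run that ends by either returning a value or returning the next call, so the driver loop, standing in for the VM, regains control at every call boundary.

```ruby
# Each "function body" is a straight-line shot that ends by either
# returning a value or returning the next call (wrapped in a lambda).
def countdown(n)
  return :done if n.zero?
  -> { countdown(n - 1) }     # the "tail call": hand back the next step
end

step = countdown(100_000)
step = step.call while step.is_a?(Proc)  # constant stack depth
puts step                                # => done
```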
In cooperatively multitasked VMs, where coroutines have to explicitly "yield" to pass the baton, you get a huge advantage: since the instruction-set architecture can be designed to ensure that memory is in a well-defined state whenever you yield, you don't have to do the expensive context-switch thing that pre-emptive architectures do: stashing and unstashing registers, switching out memory descriptors, etc. It can literally just be a jmp instruction.
BEAM is basically a cooperatively-multitasked VM, except that every "call" instruction is also an implicit "yield". (More specifically, it's a "yield if this process has executed >= 2000 call instructions since it received control.") So you get everything nice about cooperative multitasking (you never decode only 0.8 of an audio frame before being interrupted), and everything nice about pre-emptive multitasking (nothing can hog the processor forever[1]), together; thus, soft real-time.
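The scheme described above can be sketched in a few lines. This is a toy model in Ruby (Fibers standing in for Erlang processes, a budget of 3 "calls" standing in for BEAM's ~2000 reductions), not BEAM's actual algorithm:

```ruby
BUDGET = 3  # stand-in for BEAM's ~2000-reduction budget

# Each Fiber.yield plays the role of the implicit "yield" that BEAM
# attaches to every call instruction.
processes = [
  Fiber.new { 5.times { |i| puts "A#{i}"; Fiber.yield } },
  Fiber.new { 5.times { |i| puts "B#{i}"; Fiber.yield } },
]

run_queue = processes.dup
until run_queue.empty?
  process = run_queue.shift
  BUDGET.times { process.resume if process.alive? }  # spend the budget
  run_queue << process if process.alive?             # requeue unfinished work
end
# Output interleaves in budget-sized bursts: A0 A1 A2, B0 B1 B2, ...
```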
Another place where this design comes into play is in hot code loading. The design of the loader itself is pretty simple: modules are kept in a hash-table, keyed by name. A module is referenced by address on the stack, and in loaded (threaded) bytecode; and by key in unloaded (abstract) bytecode. When you upgrade a module, you just replace the value of the key in the module dictionary. Functions that call other functions in their own module as they're running stay in the previous version of a module; other functions that try to jump into the module--or functions from the module that make a "remote" (fully-qualified) call back into the module--jump into the new version.
This would be a lot harder and more complex, if we didn't have that guarantee that every looping construct in Erlang is implemented in terms of tail-calls. Since they are, we don't have to worry about "interrupting" a process to upgrade it when it still has some dirty state; we can just wait around, and every process will yield after a few microseconds, and we can upgrade it then, when its state is well-defined.
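Here's a toy model of that module-table scheme in Ruby (names like `MODULES` and `remote_call` are mine, not BEAM's): a "remote" call looks the module up by key at call time, so swapping the table entry upgrades every subsequent remote call.

```ruby
MODULES = {}  # the module table, keyed by name

MODULES[:greeter] = ->(name) { "hello, #{name}" }   # version 1

# A "remote" (fully-qualified) call goes through the table by key,
# so it always sees the current version of the module.
remote_call = ->(mod, arg) { MODULES[mod].call(arg) }

puts remote_call.call(:greeter, "world")  # => hello, world

MODULES[:greeter] = ->(name) { "hi, #{name}!" }     # hot-load version 2

puts remote_call.call(:greeter, "world")  # => hi, world!
```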
---
[1] --unless you call into C and that C code does something that takes a million years. This is why people tell you to not do CPU-bound things in Erlang. The semantics of BEAM's instruction-set architecture (and therefore its non-optimal speed) is essential to how it multitasks; you have to insert explicit process-yielding checks in your C code if you want it to be "non-blocking." Or you can write your C code as a "port", which means letting the OS manage it as a separate process--which, if you're writing one global matrix-transformer process or one per-MMO-zone physics-engine process, isn't that bad; but if you're writing a per-user speech-to-text analyzer, you've handed your OS the job of managing 100k real native processes. Better to just write it in plain Erlang, let the VM do its concurrency thing, take the speed hit, and scale horizontally a bit sooner than you would have had to with C. (Erlang is great at scaling horizontally.)
Curiously enough, all of the features you've mentioned are doable on the JVM, even without TCO support (I know this because I've worked on implementing them in https://github.com/puniverse/quasar – well, except for the "budget-based yield" but that's a minor change). The general idea is that instead of providing the implementation baked into the VM, you inject it into the compiled code using bytecode instrumentation (at load or compile time).
What truly separates BEAM from the JVM is the almost total process isolation, particularly, as you've mentioned, in the case of memory allocation and reclamation. This difference, however, entails tradeoffs – sometimes you'd want the one and sometimes the other.
Not so curious; this is the true meaning of Turing-completeness at work (not the pop meaning programmers use in reference to programming languages.) You can always write a virtual machine with semantics Y, that both executes on top of another Turing-complete abstract machine with semantics X, and reads X machine-code--and thus get semantics Y on machine X.
What you've effectively done is to just skip the naive VM-emulator step, and move to the optimization of dynamic recompilation, where you move the state-transitions and additional semantics from the X interpreter, into the chunks of X machine-code it would operate on. You've still implemented a Y VM; it's just distributed throughout the code output by the compiler.
[For the same reason, I'm planning to port BEAM to asm.js. Why? Because pre-emptive concurrency is just an abstract-machine semantic, and you can get it from a non-pre-emptively concurrent platform using the exact logic above. No more callbacks! (If everything that uses asm.js has Web Worker support, though, they could be used as run queues, as in BEAM's SMP support, leaving the UI thread a lot less stressed.)]
Well, yes, but the result still benefits from the JVM's awesome performance with regard to optimizations and memory management, and its terrific monitoring and management tools. So, true, you lose some of BEAM's process isolation, but you gain performance and tooling.
So I don't know if the JVM is universal in the sense that it can provide an efficient implementation for all known or future languages, but it can certainly serve as a very good Erlang VM.
Actually, JavaScript is a bigger challenge than Erlang for the JVM. In Erlang there's no dynamic dispatch, while in JavaScript you have nothing but. So a different kind of JIT might be better suited for JavaScript.
Sure, I didn't mean "what the JVM can do in extremis"; just "what the JVM's designers imagined the average-case use of the VM to be--the one worth optimizing and creating fast-paths for."
Most sequential programs that rely on native threads, whether in C or Java or whatever, just use 5-10 "big" threads, where each thread is still multiplexing work on its own large corral of data. In a game, for example, you'll see one "graphics thread", one "network thread", one "AI thread", etc. Not "one graphics thread per voxel"; not "one network thread per peer connection"; not "one AI thread per NPC". These designs are bad, in a traditional sequentially-oriented VM, precisely because they will blow up on you in the worst case (e.g. when you look to the horizon on a Minecraft server.)
> can I run Erlang code on it efficiently? (This question also disqualifies LLVM
Are you saying LLVM can't be used to efficiently implement Erlang-style processes/tasks? I thought Rust was aiming to be able to handle a large number of lightweight tasks, and it's based on LLVM?
Erlang actually uses LLVM as a JIT, but it does so by doing the same layering-on of semantics that I talk about here (https://news.ycombinator.com/item?id=6359716). By itself, LLVM IR has neither a "send" instruction, nor a "call" instruction with yield-semantics, and LLVM's tail-call elimination is only guaranteed under specific calling conventions. The JIT has to do a lot of pre-munging (dynamic recompilation) to produce code that LLVM can treat "blindly" as sequential machine-code. (LLVM also has a specific calling-convention implemented just for Erlang code, similar to the one it uses for Haskell code, since both generally avoid mutable state, which confuses LLVM's regular register-allocator.)
Rust is a lot more like Go, with its multitasking primitives being based on channels (shared-memory atomic data structures) rather than message-passing-by-copying, and each actor/thread of execution allowing mutable state within itself, as long as its mutations don't leak across actor boundaries. In other words, it doesn't really change the semantics overmuch from that of the non-NUMA multicore Von Neumann machine (e.g. x86) it's running on; it just does "green threads" the traditional way, with some elegant builtins for getting data from one to another. Rust's and Go's semantics are both isomorphic to x86/LLVM/JVM/CLR semantics, as far as I can tell; you can translate directly from one to the other without any overhead from maintaining an implicit VM state machine (that is, a monad) across the compiled code chunks.
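As a rough Ruby analogue of that channel style (a `Thread` communicating over the built-in thread-safe `Queue`, which is a shared-memory concurrent data structure rather than copy-on-send message passing):

```ruby
channel = Queue.new  # thread-safe, shared-memory "channel"

producer = Thread.new do
  3.times { |i| channel << i * 2 }  # send values down the channel
  channel << :eof                   # sentinel: no more data coming
end

results = []
while (msg = channel.pop) != :eof   # blocking receive
  results << msg
end
producer.join

p results  # => [0, 2, 4]
```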
One clarification: Rust's concurrency primitives aren't based on shared-memory data structures, because that leaves you vulnerable to data races and can require either a stop-the-world or concurrent GC. Instead it makes use of linear types to keep track of ownership between tasks, which is more restrictive but also means that sending data between any two tasks is memory-safe and never more expensive than the cost of copying a pointer.
This got me excited, yeah, but the main issue is always the ETA. I do love Ruby, wouldn't use any other language really, but damn would a little speed boost be nice.
And of course I want it for free. Once all the goodness from JRuby, Rubinius and that Oracle stuff gets shaken out for long enough, I guess we should see the benefits in our everyday coding.
I read "Oracle" and "VM" in the title and immediately thought that VirtualBox was going to be OpenOffice/MySQL-ized. I don't know how VirtualBox fits into Oracle's strategy, but I'm glad that they're not messing with it.
Always good to see such well reasoned technical discussion here on HN. You make a particularly well-reasoned point about the text in the title and how it affects the technical aspects of the VM.
Isn't this what Parrot[1] is supposed to be for? Granted, the Perl 6 stuff under Parrot is practically a laughingstock amongst much of the tech community, but they are trying to make something awesome.
Why would people "panic" if Oracle builds them a better VM or the tools to build a better Ruby/Python/Java VM?
It's quite different from the CLR. The CLR lets you implement any language you like, as long as it can basically be mapped to C# constructs; in that respect it's not that different from the JVM.
What Snoracle is working on is more generic with much higher performance.
The CLR supports things that C# the language doesn't. C# isn't the common denominator here, it's just one of the languages running on the CLR (and probably? the most well-known). VB.Net and F# are equals, not to mention all the non-MS languages that exist.
C# isn't special and doesn't represent the CLR's potential. It certainly might drive the evolution of the CLR to a point.
Actually, Oracle is a company famous for its relational database. Also, it's notable for its hostility towards open-source and free software.
My bet is that their website must be accessible to the corporate desktops of companies that buy Oracle stuff. That means it has to work with IE6, most likely.
He's talking about their website... And Oracle isn't just a database company, the last I checked. I mean, they make SPARC servers. And VirtualBox. And Java.
If he was talking about the company, it would be an even stranger thing to say. As we are not talking about a technology, the only way it could be interpreted is that Oracle (the company) does not 'tolerate' HTML5, as if they had something against it. If he was talking about another of their technologies, he should have said so.