Your mission seems to match what we are trying to do with Graal and Truffle. The Truffle API aims to be stable as well, and we also provide basic building blocks like an object model. I am curious how you plan to support speculative optimizations that need to deoptimize and reconstruct interpreter stack frames? In my experience that's essential for building high-performance dynamic language implementations.
I wonder whether a "bridge-across-the-void" approach would work. The idea is that a speculative optimization would not discard its original form, so that deoptimization is as cheap as a rollback. Then, when speculative optimizations are performed, they form a kind of bridging branching structure which reaches across the void of invalid low-level IRs, trying to find a safe point at which the optimization can complete.
I know that Truffle is based on partial evaluation; how close are you to Futamura #2? That'd relieve you of the burden of having to care about what an "interpreter stack frame" even looks like. OTOH I can confirm that the same kind of speculation in optimization has to occur during self-application.
The interpreter I'm building right now will essentially be a JIT, but it will JIT to an internal IR instead of machine code. My plan is to eventually have it operate on compact and optimized stack frames as native code would.
> essential for high performance dynamic language implementations
Is it? Objective-C is a dynamic language (AOT-compiled) that doesn't have these features, and it is possible to write very high performance code with it.
I don't have any numbers, but I think it's generally possible to write very high performance code in Objective C... by not using the dynamic language features of it such as message sends.
Objective C I believe does a globally cached method lookup for every message send (!) and so can't inline through message sends (!), and since inlining is the mother of all optimisations I would imagine this would severely limit performance if you tried to use a lot of message sends in your inner loops. We should actually think about doing that experiment to see what the cost would be.
In a language like Ruby almost all operators, even basic arithmetic, are dynamic method calls, so you can't avoid using message sends anywhere. I think if you tried to do that in Objective C things might grind to a halt.
Objective C also lacks many dynamic language features which are the ones solved through speculative optimisation, such as integer overflow, access to frames as objects, and so on.
I'm not an expert on Objective-C though, so happy to be corrected.
> but I think it's generally possible to write very high performance code in Objective C... by not using the dynamic language features of it such as message sends.
Yes, that's a common misconception...with a grain of truth. In my experience, you get the best performance by judiciously mixing dynamic and static features. And yes, that means eschewing some dynamic features in some inner loops (The 97:3 rule applies). However, you can also often gain significant performance by hiding behind a polymorphic dispatch.
For example, I reimplemented Apple's binary plist parsers+generators in Objective-C (from C) for a significant speed boost: the polymorphic implementation allowed me to put in override points for things such as lazy loading, and interface-based (de-)serialization removes the need for a generic intermediate representation. Compared to those advantages, the cost of message-sends is negligible (and optimizable if it becomes a problem).
> Objective C I believe does a globally cached method lookup for every message send (!)
Yes. It's quite fast and despite what people fret about rarely a problem.
> and so can't inline through message sends (!),
If it does become a problem (measure, measure, measure!), there are techniques to avoid the lookup: IMP-cache, convert to C function call, convert to inline function, convert to Macro.
> since inlining is the mother of all optimisations
Hmm...the mother of all optimizations is measuring and removing unnecessary code. Then comes eliminating/reducing and "sequentializing" memory access.
How do you think Objective C would perform if every operator was a dynamic method call as it is in Ruby? Surely then you'd start to get frustrated with the overhead? That's why languages like Ruby need the speculative optimisations.
> How do you think Objective C would perform if every operator was a dynamic method call as it is in Ruby?
Depends very much on what you mean by "every": the 97:3 rule applies, and is almost certainly even more highly skewed today[1]. So for the vast majority of code, it wouldn't matter. Correction: doesn't matter. For example, Apple's Swift language produces code that is incredibly slow when non-optimized (loops and the like can easily be 1000x, a thousand times, slower than optimized), and yet Xcode's debug builds default to non-optimized, and people don't report that their debug builds are unusable.
Another example: I implemented the central re-pagination loop in my BookLightning imposition app[2] in my Objective-Smalltalk language[3], which currently has just about the slowest implementation imaginable (an AST walker inefficiently implemented in Objective-C), at least an order of magnitude slower than Ruby. Despite that, BookLightning is at least an order of magnitude faster than the OS X print system, which is largely written in C. Why? It computes by page (which is sufficient for this task) rather than by individual PDF graphical element. That difference is so great that the speed of the steering code controlling the computation just doesn't matter.
> Surely then you'd start to get frustrated with the overhead?
As long as Objective-C were still a hybrid language: probably not, because I could always eliminate the overhead in the (very) few places that mattered, and could do so reliably/predictably [4]. In fact, for Objective-Smalltalk I am very much leaning towards that approach (Smalltalk-ish by default, optimizations optional), and so far things are looking good.
> That's why languages like Ruby need the speculative optimisations.
Or C libraries, which is what I believe high performance Ruby code does.
p.s.: I think Truffle and Graal are awesome, and as a researcher I wish I'd come up with them. When doing actual practical performance work, I prefer simpler and more predictable tech.
But going back to the original argument that was being made - you say that you don't need speculative optimisations to make something like Objective C fast. But you say to do that you don't use the dynamic features where you need performance - use macros and C functions instead.
So yes you don't need speculative optimisations... as long as you apply similar optimisations manually in the source code yourself. I'm not convinced therefore :)
> you don't need speculative optimisations to make something like Objective C fast.
There is a difference between "make X fast" and "write fast code in X", a distinction that is hugely important in practice.
So no, I don't need speculative optimization to write fast code in Objective-C, but I would need speculative optimization to make Objective-C code fast [without touching the Objective-C source code].
> But you say to do that you don't use the dynamic features where you need performance
No, I did not say that at all. I said I mix dynamic features and non-dynamic features where necessary, and that I need both to make things fast. And (more importantly), that the most important optimizations have nothing to do with either (which counters the assertion that inlining is "the mother of all optimizations").
> So yes you don't need speculative optimisations... as long as you apply similar optimisations manually in the source code yourself.
The point being that (a) those optimizations are such a small part of the overall optimization process, which in turn is applied to such a tiny part of the overall code-base, that automation is not needed. Which invalidates the assertion that these optimizations are "necessary". Nice to have? Yes. Necessary? No.
The other point (b) is that optimizations by the compiler/JIT aren't as good as those applied manually, for many reasons, one being that the compiler/JIT has to make the transformations semantically indistinguishable at a fairly low level, whereas the author has a higher-level overview and can adjust the semantics to fit. So doing it manually is also worth it.
The third point is that I cannot rely on the compiler/JIT making those optimizations; there is no guarantee that they will be applied. And with today's performance landscape being what it is, reliability is paramount: a slightly worse bound that is guaranteed matters more than a better bound that is only met some of the time, or even on average.
> I'm not convinced therefore :)
As long as I am capable of applying those optimizations manually, I have made my original point, which is that having these optimizations done automatically is not necessary for performance, but at best "nice to have". Which doesn't mean that more compiler support wouldn't be nice; the process in Objective-C is too ad-hoc.
I'm wondering how different languages implemented on top of this VM could be, especially at the semantic level.
Having different syntaxes on top of the same VM is nice and all the JVM languages show that there is room for a good variety and for different paradigms.
On the other hand, the choice of a VM sets some constraints while providing important features. Think, for instance, of the differences between the languages running on the JVM and those running on the BEAM (Erlang's VM).
I think there is a lot of untapped potential for innovative ideas at the VM-level but for some reason most of the effort goes into proposing new syntaxes that reuse the same concepts all the other languages are using.
Not familiar with Neko, but superficially, it seems that VM is designed for a specific programming language. It uses a bytecode IR whereas Zeta uses a textual one. Neko is also farther along, more mature and feature complete than Zeta.
I intend to take a more experimental direction with Zeta. It won't have an FFI, for instance. It will only provide a small set of minimalist APIs. The VM will be intentionally designed to avoid code rot and breaking changes.
Another thing I would like to experiment with is transparent compilation of pixel shaders to run on GPUs, the ability to run code in any language running on Zeta (given some restrictions) on both the CPU and GPU. I believe I have found a way to make this work, based on my type-specialization research.
Hmm, I haven't heard this idea before that the FFI or surface area of libraries in a language causes bitrot. Could you elaborate on the connection? I can take version 1 of the C sources of Vim from back in '92 and compile them without trouble. I'm not aware of dynamic languages like Javascript or Python 2 having any bitrot issues either. Backwards compatibility seems like a pretty big constraint for everyone.
Edit 16 minutes later: I just tried your benchmark with my VM-like language (https://github.com/akkartik/mu), and the time taken was almost identical. Interesting exercise! Here's the Mu and Plush/0 programs side by side:
  $ cat fib.mu
  def fib n:num -> result:num [
    local-scope
    load-ingredients
    base-case?:bool <- lesser-or-equal n, 1
    return-if base-case?, n
    n <- subtract n, 1
    fib-n-1:num <- fib n
    n <- subtract n, 1
    fib-n-2:num <- fib n
    result <- add fib-n-1, fib-n-2
  ]
  def main [
    local-scope
    x:num <- fib 29
    $print x, 10/newline
  ]

  $ cat benchmarks/fib29.pls
  #language "lang/plush/0"
  var fib = function (n)
  {
      if (n < 2)
          return n;
      return fib(n-1) + fib(n-2);
  };
  var r = fib(29);
  print(r);
C has been fairly stable across time, but you have to admit, a C program from 1992 still compiling as-is is the exception rather than the rule. The reason Vim from 1992 might still compile is actually that it doesn't have many dependencies apart from standard C and POSIX APIs.
JavaScript has massive bitrot issues. The HTML DOM is huge and constantly changing. I have had my own web apps break multiple times over the years. As for Python and C, if you stick to the core language and minimize dependencies, you might be OK. The problem is that the more dependencies you have, the more likely one of them breaks and renders your program broken... And if you're not there to fix it, your program remains broken forever.
My argument for reducing API surface and keeping APIs low-level and minimalist is that the smaller an API, the more difficult it is to implement it wrong. It's easier for two implementations to implement a smaller API and have the same behavior. It's also easier to test small APIs for conformance, etc.
I think I see what you mean. However, all these previous languages would claim that the parts they control have not suffered from bitrot. The dependencies you're thinking of are not considered part of the language in each case. Is that right? How would you keep people from creating new libraries in your language for unanticipated use cases? Say a self-driving car library, or a command-and-control module for all the IoT devices in a house from 2029?
It seems to me that bitrot is fundamentally a result of change in the outside world. The only way to opt out of bitrot when the world changes rapidly seems to be to disengage from the world and become irrelevant. I'm fairly certain Mu will not suffer from bitrot in 30 years -- but it'll only be because nobody ever built anything with it :)
> How would you keep people from creating new libraries in your language for unanticipated use cases? Say a self-driving car library, or a command-and-control module for all the IoT devices in a house from 2029?
I can't prevent people from doing that. I can only put together the conditions to make it easy to write software that doesn't break, as much as possible.
> It seems to me that bitrot is fundamentally a result of change in the outside world.
Yes and no. On the one hand, some breakage is inevitable with changing conditions. On the other hand, the computers we have now provide the same basic facilities as computers did in 1969. We ought to be able to provide some stable building blocks, and hope that people will use them.
Currently, it seems to me that software breaks all the time, at a rate that is unacceptably high.
I have come to the same conclusion as tachyonbeam: it is interaction with the outside world that causes bitrot. Rather than APIs, we need communication protocols with simple semantics. The VM should interact over a constrained, well-defined protocol; that protocol could implement a POSIX IO model, but it would be up to the client to offer that abstraction over messages.
The surface area of POSIX is too large to build systems that can run for tens or hundreds of years. Look at the design of Lua versus Python: the assumptions Lua makes about the underlying platform are much, much cleaner, so it is more portable and platform issues are easier to debug.
Interesting. Can you show an example of how Lua's interface to the underlying platform improves on Python? Doesn't replacing APIs with protocols merely shift the breakage to higher-level logical errors rather than lower-level mechanical ones?
Lua is deliberately under-coupled with the base system. For the longest time it wouldn't even support dynamically loaded modules, because dynamic loading wasn't portable: it isn't part of C89.
So when one uses Lua, it is up to the user to supply IO. This threading-the-needle, or hourglass, design allows the user (embedder) to define how the scripts interact with the base system. This makes them more resilient over time: the interactions are better specified and highly mediated.
Somewhat analogous to the library vs framework dichotomy. Libraries survive better than frameworks do.
having fairly recently redeployed a MUD I worked on in the early 90's (telnet mud.legitimatesounding.com 4000 - https://github.com/JerrySievert/SillyMUD), I can't agree more.
I spent many hours fixing compile issues, memory problems that should have taken it down in the early 90's but didn't seem to matter at the time, and moving it to a modern socket API. I was quite surprised how much things had changed in 25 years.
The IR, once past an initial prototyping state, will not change. Things will be added to it later, but not removed. The IR and semantics will be designed, as much as possible, to eliminate corner cases and undefined behaviors. Zeta will favor robustness over tiny performance gains. I realize that completely eliminating undefined behaviors isn't necessarily possible, but we will do our best.
The outside world will be accessible through a set of APIs that are small and have limited surface area. For instance, sound output will be done by writing raw samples. Graphics will be done with pixel shaders (functions which, given coordinates, return a pixel color). New APIs will be added over time, new versions of APIs will be added, but the older APIs will remain functional.
There will eventually be a package manager, and it will have immutable package versioning. That is, you can publish v5 of libtern, and I can depend on it. You can later publish v6 of libtern, but you can't change v5, so you can't break my code which depends on v5 when you publish new versions of your lib.
Obvious caveat: Zeta will have no FFI. You won't be able to immediately use all the libraries you could use in another system. I still think that building a huge amount of code that doesn't break could turn out to be a huge advantage. We won't try to satisfy every use case out there... But I think there are a lot of interesting and useful programs which don't need much more than basic keyboard and mouse input, audio and video output, network and file access (think photoshop, audio editing software, IDEs, etc). We will probably end up adding APIs for other things like MIDI, etc later on.
In the audio space, it is vital to support FFI for a plugin architecture because I can guarantee you your language isn't the-best-thing-since-sliced-bread for somebody who has stuff you really want available to you. Audio software lives and dies on the VST specification. (Audio Units are substitutable for VSTs--but the same problem remains.)
I admire your efforts, but perhaps you should understand the use cases of the things you want to use as examples before you try to employ them. It doesn't instill confidence.
You can make useful audio software for a very tiny slice of a very picky market without a plugin architecture. Even crap like Audacity has a plugin architecture for a reason: because they know they don't support what people need and want.
And yes, you can build your own plugins. You cannot replace the thousands upon thousands of VSTs out there. You aren't replacing, for example, RealStrat, which is a supersampled virtual instrument, without re-recording the entire thing (do you know how? Is anyone going to bother doing it when they can use existing, conventional software?) and simultaneously reverse-engineering the algorithms used internal to it. You aren't replacing even something comparatively simple like an Amplitube without heavy-hitting, lots-of-money algorithmic research in the first place. (And, on top of that, I sure wouldn't want to deal with the patent encumbrances!)
And pardon my skepticism, but emulating latency-sensitive, CPU-intensive stuff like audio processing strikes me as a pretty foolish idea. When milliseconds count, your emulation layer just gets in the way. What benefit are you bringing to make additional failure cases and additional latency worth a user's time?
You aren't reinventing the entire audio world; as with many other fields, the lingua franca is C and not you and no matter how much you yell at the mountain the mountain's not gonna move. If you want to play, you have to play with (good metaphor for music in general, now that I think about it). Maybe you should rethink some of your decisions, or better constrain your ambitions. Because right now--and this may sound harsh but I'm saying it because I like seeing people succeed--it sounds like you're doing the software engineer That Guy thing and handwaving away difficult things because they aren't conducive to the postulates you're starting with.
I'm OK with not being able to replace every audio program ever written. That's an impossible goal. I also know that I can never satisfy every possible use case.
I'm also quite sure that an audio program with a plugin architecture will be feasible in Zeta, once it's more mature. The system is built for scripting, multi-language support and dynamically loading packages. It will be an excellent platform for plugins. You'll be able to define your own custom plugin language(s) easily, and then you'll be able to dynamically browse and load plugins from the package manager.
Let's put it this way: WHEN (and IF) Zeta works, people can discuss its suitability for audio work, and how it would cater to the VST crowds. Until then, this is beyond premature discussion...
I think people fail to understand, too, that I'm OK with having modest goals for Zeta. This system is going to be an experiment. If it doesn't become hugely popular, it's not the end of the world. If some things don't work out, we can try something different. At the end of the day, I think some things really have to be tried out before we can decide if they're viable/useful or not.
I suspect that some of that confusion comes from the fact that you directly compare yourself to LLVM, and that you haven't mentioned much about ParrotVM or NekoVM, which are both efforts in this space. It's also worth saying that the CLR and DLR are both efforts here as well.
When you compare yourself to LLVM, you phrase the comparison as only an advantage (Zeta will be easier to use than LLVM), rather than a tradeoff (Zeta will aim for ease of use before performance, as opposed to LLVM).
I don't suggest that you start hedging your writing, but am just pointing out that if you make big comparisons, people will assume great ambition on your part.
@eropple Replying to myself since HN doesn't let me reply to your comment.
As previously stated, my goal with Zeta is not to replace all existing VMs and to satisfy every use case. It's an experiment, it's an opinionated system. I'm going to do things differently from other implementations and try to demonstrate interesting things along the way. Even if Zeta never becomes popular, I know it will make people talk, and I hope it inspires other systems.
elm is a good example here - there is no FFI for somewhat similar reasons (specifically, that it can make safety guarantees about not crashing at run-time if you don't allow arbitrary JavaScript code to be called), and that has both helped write reliable code and hindered its usefulness in larger applications that have to play with the existing world of JS libraries and components
I'm not telling you not to have opinions, to be clear. I'm suggesting maybe you shouldn't adjust those pince-nez and pronounce so hard. Many systems exist the way they do for reasons that Software Developer That Guy cannot divine from outside; you just happened to hit on one that I know a little bit about.
(Photoshop is another example; professional-level tools require professional-level expansion. Because everybody's definition of "professional" is different.)
What about WebAssembly? It's hit 1.0 now, and it's designed so that 1.0 wasm will never become deprecated.
My own thoughts have been circling around a wasm VM in kernel space. No need to ever exit kernel space; the wasm VM provides sufficient sandboxing, and wasm binaries could run on any architecture. Hopefully browsers could then run wasm directly in the kernel's sandbox rather than doing wasm-in-wasm. Syscalls could be exposed through wasm's import/export system. Kind of like the Metal idea in "The Birth & Death of JavaScript".
WASM may remain stable, but it provides no APIs to the outside world, AFAIK. The DOM keeps changing, and if the APIs are implementation-dependent, then WASM doesn't guarantee non-breakage at all.
Another advantage of Zeta is that it will come with useful building blocks for dynamic languages. With WASM, you have to implement your own dynamic typing, GC, JIT, etc., and current WASM implementations don't yet support JITting.
Author here: sometimes you want to get the tag of a value, and operate on that tag as a value, but I guess this may not be necessary in this system, given that the get_tag instruction will produce an immutable string.