I know about all this — I actually began implementing my own JVM language a few days ago. I know Android uses Dalvik btw. But I guess a lot of people can use this info; infodump is always good. I do that.
btw I actually have messed around with libgcc-jit and I think at least on x86, it makes zero difference. I once did a test:
– Find /e/ with MAWK -> 0.9s
– Find /e/ with JAWK -> 50s.
No shit! It’s seriously slow.
Now compare this with go-awk: 19s.
Go has reference counting and heap etc, basically a ‘compiled VM’. I think if you want fast code, ditch runtime.
Actually, Android doesn’t really use Dalvik anymore. They still use the bytecode format, but built a new runtime. The architecture of that runtime is detailed on the page I linked. IIRC, Dalvik didn’t cache JIT compilation results and had to redo it every time the application was run.
FWIW, I’ve heard libgcc-jit doesn’t generate particularly high quality code. If the AOT compiled code was compiled with aggressive optimizations and a specific CPU in mind, of course it’ll be faster. JIT compiled code can meet or exceed native performance, but it depends on a lot of variables.
As for mawk vs JAWK vs go-awk, a JIT is not going to fix bad code. If it were a true apples to apples comparison, I’d expect a difference of maybe 30-50%, not ~2 orders of magnitude. A performance gap that wide suggests fundamental differences between the different implementations, maybe bad cache locality or inefficient use of syscalls in the latter two.
On top of that, you’re not really comparing the languages or runtimes so much as their regular expression engines. Java’s isn’t particularly fast, and neither is Go’s. Compare that to Javascript and Perl, both languages with heavyweight runtimes, but which perform extraordinarily well on this benchmark thanks to their heavily optimized regex engines.
It looks like mawk uses its own bespoke regex engine, which is honestly quite impressive in that it performs that well. However, it only supports POSIX regular expressions, and doesn’t even implement braces, at least in the latest release listed on the site: https://github.com/ThomasDickey/mawk-20140914
(The author creates a new Github repo to mirror each release, which shows just how much they refuse to learn to use Git. That’s a respectable level of contempt right there.)
Meanwhile, Java’s regex engine is a lot more complex with more features, such as lookahead/behind and backreferences, but that complexity comes at a cost. Similarly, if go-awk is using Go’s regexp package, it’s using a much more complex regex engine than is strictly necessary. And Golang admits in their own FAQ that it’s not nearly as optimized as other engines like PCRE.
Thus, it’s really not an apples to apples comparison. I suspect that’s where most of the performance difference arises.
Go has reference counting and heap etc, basically a ‘compiled VM’.
This statement is completely wrong. Like, to a baffling degree. It kinda makes me wonder if you’re trolling.
Go doesn’t use any kind of VM, and has never used reference counting for memory management as far as I can tell. It compiles directly to native machine code which is executed directly by the processor, but the binary comes with a runtime baked in. This runtime includes a tracing garbage collector and manages the execution of goroutines and related things like non-blocking sockets.
Additionally, heap management is a core function of any program compiled for a modern operating system. Programs written in C and C++ use heap allocations constantly unless they’re specifically written to avoid them. And depending on what you’re doing and what you need, a C or C++ program could end up with a more heavyweight collective of runtime dependencies than the JVM itself.
At the end of the day, trying to write the fastest code possible isn’t usually the most productive approach. When you have a job to do, you’re going to welcome any tool that makes that job easier.
This statement is completely wrong. Like, to a baffling degree. It kinda makes me wonder if you’re trolling.
No I just struggle at getting my meaning across + these stuff are new to me. What I meant was ‘Go does memory management LIKE a VM does’. Like ‘baking in the GC’. Does that make sense? Or am I still wrong?
I know about all this — I actually began implementing my own JVM language a few days ago. I know Android uses Dalvik btw. But I guess a lot of people can use this info; infodump is always good. I do that.
btw I actually have messed around with libgcc-jit and I think at least on x86, it makes zero difference. I once did a test:
– Find /e/ with MAWK -> 0.9s – Find /e/ with JAWK -> 50s.
No shit! It’s seriously slow.
Now compare this with go-awk: 19s.
Go has reference counting and heap etc, basically a ‘compiled VM’. I think if you want fast code, ditch runtime.
Actually, Android doesn’t really use Dalvik anymore. They still use the bytecode format, but built a new runtime. The architecture of that runtime is detailed on the page I linked. IIRC, Dalvik didn’t cache JIT compilation results and had to redo it every time the application was run.
FWIW, I’ve heard libgcc-jit doesn’t generate particularly high quality code. If the AOT compiled code was compiled with aggressive optimizations and a specific CPU in mind, of course it’ll be faster. JIT compiled code can meet or exceed native performance, but it depends on a lot of variables.
As for mawk vs JAWK vs go-awk, a JIT is not going to fix bad code. If it were a true apples to apples comparison, I’d expect a difference of maybe 30-50%, not ~2 orders of magnitude. A performance gap that wide suggests fundamental differences between the different implementations, maybe bad cache locality or inefficient use of syscalls in the latter two.
On top of that, you’re not really comparing the languages or runtimes so much as their regular expression engines. Java’s isn’t particularly fast, and neither is Go’s. Compare that to Javascript and Perl, both languages with heavyweight runtimes, but which perform extraordinarily well on this benchmark thanks to their heavily optimized regex engines.
It looks like mawk uses its own bespoke regex engine, which is honestly quite impressive in that it performs that well. However, it only supports POSIX regular expressions, and doesn’t even implement braces, at least in the latest release listed on the site: https://github.com/ThomasDickey/mawk-20140914
(The author creates a new Github repo to mirror each release, which shows just how much they refuse to learn to use Git. That’s a respectable level of contempt right there.)
Meanwhile, Java’s regex engine is a lot more complex with more features, such as lookahead/behind and backreferences, but that complexity comes at a cost. Similarly, if go-awk is using Go’s
regexp
package, it’s using a much more complex regex engine than is strictly necessary. And Golang admits in their own FAQ that it’s not nearly as optimized as other engines like PCRE.Thus, it’s really not an apples to apples comparison. I suspect that’s where most of the performance difference arises.
This statement is completely wrong. Like, to a baffling degree. It kinda makes me wonder if you’re trolling.
Go doesn’t use any kind of VM, and has never used reference counting for memory management as far as I can tell. It compiles directly to native machine code which is executed directly by the processor, but the binary comes with a runtime baked in. This runtime includes a tracing garbage collector and manages the execution of goroutines and related things like non-blocking sockets.
Additionally, heap management is a core function of any program compiled for a modern operating system. Programs written in C and C++ use heap allocations constantly unless they’re specifically written to avoid them. And depending on what you’re doing and what you need, a C or C++ program could end up with a more heavyweight collective of runtime dependencies than the JVM itself.
At the end of the day, trying to write the fastest code possible isn’t usually the most productive approach. When you have a job to do, you’re going to welcome any tool that makes that job easier.
No I just struggle at getting my meaning across + these stuff are new to me. What I meant was ‘Go does memory management LIKE a VM does’. Like ‘baking in the GC’. Does that make sense? Or am I still wrong?