Hacker News

I recently revisited a language comparison project, a specific benchmark tallying the cycle decompositions in parallel of the 3,715,891,200 signed permutations on 10 letters. I kept a dozen languages as finalists, different philosophies but all choices I could imagine making for my research programming. Rather than "ur" I was looking for best modern realizations of various paradigms. And while I measured performance I also considered ease of AI help, and my willingness to review and think in the code. I worked hard to optimize each language, a form of tourism made possible by AI.
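As a quick sanity check on that count (a hypothetical one-liner, not from the repo): a signed permutation on 10 letters is an ordering of the letters (10! choices) with an independent sign on each letter (2^10 choices).

```rust
// Signed permutations on 10 letters: 10! orderings times 2^10 sign choices.
fn main() {
    let orderings: u64 = (1..=10).product(); // 10! = 3,628,800
    let signed = orderings * (1u64 << 10);   // times 2^10 = 1,024
    println!("{}", signed);                  // 3,715,891,200
}
```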

The results surprised me:

             F#  100    19.17s  ±0.04s
            C++   96    19.92s  ±0.13s
           Rust   95    20.20s  ±0.38s
         Kotlin   89    21.51s  ±0.04s
          Scala   88    21.68s  ±0.04s
  Kotlin-native   81    23.69s  ±0.11s
   Scala-native   77    24.72s  ±0.03s
            Nim   69    27.92s  ±0.04s
          Julia   63    30.54s  ±0.08s
          Swift   52    36.86s  ±0.03s
          Ocaml   47    41.10s  ±0.10s
        Haskell   40    47.94s  ±0.06s
           Chez   39    49.46s  ±0.04s
           Lean   10   198.63s  ±1.02s
https://github.com/Syzygies/Compare



Naively this is quite surprising, but the devil is in the details. With the exception of Lean I'd point out they're all fairly close: Chez being 2.5x slower than C++ is not ignorable, but it's also quite good for a dynamically-typed JITted language[1]. And I'm not surprised that F# does so well at this particular task. Without looking into it more closely, this seems to be a story about F# on .NET Core having the most mature and painless out-of-the-box parallelism of these languages. I assume this is elapsed time; it would be interesting to see a breakdown of CPU time.

I don't think these results are quite comparable because of slightly differing parallelism strategies; I'd expect the F# implementation of just spinning off threads to be a little more performant than a Rayon parallel iterator, which presumably has some overhead. But that really just shows how hard it is to do a cross-language comparison; Rust and C++ can certainly be made faster than the F# code by carefully manipulating a ton of low-level OS concurrency primitives. This would arguably also be a little misleading. Likewise Chez and Haskell have good C FFI; does that count? It's a tricky and highly qualitative analysis.
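To make the contrast concrete, here's a minimal std-only Rust sketch of the "just spin off threads" strategy: one OS thread per contiguous chunk of ranks, joined at the end, with no scheduler in between. The `work` function is a hypothetical stand-in for the benchmark's per-range tallying, not the repo's code.

```rust
use std::thread;

// Hypothetical stand-in for per-range work (e.g. tallying cycle
// decompositions over a block of permutation ranks).
fn work(range: std::ops::Range<u64>) -> u64 {
    range.map(|r| r % 7).sum()
}

fn main() {
    let total: u64 = 1_000_000;
    let n_threads: u64 = 4;
    let chunk = total / n_threads;
    // "Just spinning off threads": one thread per contiguous chunk.
    let handles: Vec<_> = (0..n_threads)
        .map(|i| {
            let lo = i * chunk;
            let hi = if i == n_threads - 1 { total } else { lo + chunk };
            thread::spawn(move || work(lo..hi))
        })
        .collect();
    let parallel: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("{}", parallel);
}
```

A Rayon parallel iterator would replace all of this bookkeeping with one adapter call, at the cost of going through its work-stealing scheduler.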

[1] FYI, one possible performance improvement with the Chez code is keeping the permutations in fxvectors and replacing math operations with the fixnum-specific equivalents - this tells the compiler/interpreter that the data are guaranteed to be machine integers rather than bigints, so they aren't boxed/unboxed. I am not sure without running it myself, but there seem to be avoidable allocations in the Chez implementation. https://cisco.github.io/ChezScheme/csug/objects.html#./objec...


Thank you. I will try your Chez idea. I love Chez, even if coding in Scheme can feel like rubbing sticks together to start a fire on an island, when e.g. Scala has induction ranges. And I didn't try Idris or Racket as they compile to Chez, but perhaps they do so better than I did.

As for parallelism this is a primary concern of mine, and I tried multiple approaches for every language where there was a choice. I used my own work-stealing code only when it beat standard libraries. AI warned me I was in over my head, that writing such a library takes years of experience, but my use case (and my expected use cases in my research) is so uniform that simple can win, minimally touching the required bases such as permuting tasks to avoid false sharing.
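A sketch of what I mean by "simple can win," assuming just two ingredients: a shared atomic task counter (the simplest possible queue) and per-thread accumulators padded out to a cache line so neighbors never contend. This is illustrative Rust, not my actual library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Each worker gets its own cache-line-sized slot, so two workers never
// write to the same line (false sharing). 64 bytes is a typical x86
// cache-line size; purely illustrative.
#[repr(align(64))]
struct Padded(u64);

fn main() {
    const TASKS: usize = 1_000;
    const THREADS: usize = 4;
    let next = AtomicUsize::new(0); // shared counter as the task queue
    let mut totals: Vec<Padded> = (0..THREADS).map(|_| Padded(0)).collect();

    thread::scope(|s| {
        for slot in totals.iter_mut() {
            let next = &next;
            s.spawn(move || loop {
                // Claim the next unclaimed task id; fast workers simply
                // claim more tasks, which balances uneven work.
                let t = next.fetch_add(1, Ordering::Relaxed);
                if t >= TASKS {
                    break;
                }
                slot.0 += t as u64; // stand-in for real per-task tallying
            });
        }
    });

    let sum: u64 = totals.iter().map(|p| p.0).sum();
    println!("{}", sum); // deterministic: 0 + 1 + ... + 999 = 499500
}
```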

I don't believe that the JIT languages (F# on top) do so well because of better parallelism. This is branch optimization. For this use case an AOT compiler with ample benchmark data to influence output should do better. That exists as profile-guided optimization, but it's rarely reached for, and the argument seems to be that few use cases stay consistent enough to profile. A JIT can adapt.


Yes, Chez improved a bit, at the expense of readability.

Yeah :/ For a larger program you can pay the readability toll once, via a syntactic form that expands the general vector/arithmetic operations to the fixnum versions, e.g. something like

  (define (heap-permute! perm j callback)
    (with-context 'fixnum ;; same trick works with 'flonum for 64-bit floats
      (let ([n (length perm)]) ;; actually fxvector-length
        (let generate ([k (- n 1)]) ;; actually fx-
          (if (< k j) ;; fx<
              (callback perm)
              (begin
                (generate (- k 1)) ;; fx-
                (do ([i j (+ i 1)]) ;; fx+
                    ((>= i k)) ;; fx>=
                  (if (even? (- j k)) ;; fxeven?, fx-
                      (swap perm j k)
                      (swap perm i k))
                  (generate (- k 1))))))))))) ;; fx-
Sorry if I borked the indentation. I have been working on stuff like this, and more general macros around dependency injection and inversion of control (e.g. you could write this macro to take the type as a parameter and generate code optimized for 'bigint or 'rational). Maybe check back after the summer :)
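For comparison, here's the textbook recursive Heap's algorithm that the snippet generalizes (standard algorithm, nothing repo-specific), written in Rust:

```rust
// Textbook Heap's algorithm: `j` is the index of the last element of the
// prefix being permuted; `visit` fires once per permutation of perm[0..=j].
fn heap_permute<T, F: FnMut(&[T])>(perm: &mut [T], j: usize, visit: &mut F) {
    if j == 0 {
        visit(perm);
        return;
    }
    for i in 0..j {
        heap_permute(perm, j - 1, visit);
        // Parity of the prefix length (j + 1) decides which swap to make.
        if j % 2 == 1 {
            perm.swap(i, j); // even-length prefix: swap i-th with last
        } else {
            perm.swap(0, j); // odd-length prefix: swap first with last
        }
    }
    heap_permute(perm, j - 1, visit);
}

fn main() {
    let mut v = [1, 2, 3, 4];
    let mut count = 0u32;
    heap_permute(&mut v, 3, &mut |_| count += 1);
    println!("{}", count); // 4! = 24
}
```

The Scheme version threads `j` through as a lower bound so only the tail of the vector gets permuted; the parity test on `(- j k)` plays the role of the prefix-length parity here.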

And BTW I misspoke earlier: of course Chez is AOT rather than JIT. From one angle it's sort of a hybrid: really fast on-the-fly AOT kinda looks like JIT; tongue-in-cheek you could say "NoT compilation" (nick-of-time). But proper JIT of course has huge advantages. If you reeaaaallly wanted to sabotage readability, Chez makes it easy to invoke the compiler at runtime, so along with the C FFI I think you could hack together some sort of JIT. But wow, what a mess that would be! You'd better be getting a PhD thesis out of it :) And if the performance is that critical, you'd be much better off with F#.


I haven't looked into the code, but Lean being so slow may be misleading depending on how you benchmarked it. IMO the fairest test is how "Lean code" (or Rocq code, etc.) is actually run, which is as native C code following extraction.

Given the sane C defaults applied by code-extraction pipelines, the delta really shouldn't be so great. But it's a common pitfall to torture one's own code in order to get it proven, and I'm also not sure how good the support for parallelism is.


None of the other comments are about program benchmarks.

The common theme is diverse notions of a language. When I order from a menu, I don't order based on price, but I prefer to see the prices.

Lean 4 is the most interesting language on my list. I didn't reject it on price.



