Hacker News: shinypenguin's comments

Benchmark link gives me 404, but I found this link that seems to show the proper benchmarks:

https://fory.apache.org/docs/docs/introduction/benchmark


Is the dataset somewhere accessible? Does anyone know more about the "1T challenge", or is it just the 1B challenge moved up a notch?

Would be interesting to see if it would be possible to handle such data on one node, since the servers they are using are quite beefy.


Hi shinypenguin - the dataset and challenge are detailed here: https://github.com/coiled/1trc

The data is in a publicly accessible bucket, but the requester is responsible for any egress fees...


I suggest linking to that from the article; it's a useful clarification.


Good point - I'll update it...


Hi, thank you for the link and quick response! :)

Do you know if anyone attempted to run this on the least amount of hardware possible with reasonable processing times?


Yes - I also had GizmoSQL (a single-node DuckDB database engine) take the challenge - with very good performance (2 minutes for $0.10 in cloud compute cost): https://gizmodata.com/blog/gizmosql-one-trillion-row-challen...


The One Trillion Row Challenge was proposed by Coiled in 2024. https://docs.coiled.io/blog/1trc.html


Definitely not; I've always been in strongly technical roles. Any pointers on where to start with marketing? :)


Thank you for the link. Sadly, I don't have enough experience with graph databases, so it's outside my skill set.


My niche is basically this: I build distributed systems with minimal external dependencies that are fast and run reliably on a minimal amount of hardware and complexity. I focus mainly on data processing and gathering. The result is that my clients don't need many servers or a big devops team to manage the service, and it stays reliable and scalable.

For example, I built an event-gathering distributed system in Elixir (without external systems) that handled 930m events per day (33k req/s at peak) on 2 dedicated servers, and even that many only because minimal HA was required. It processed and aggregated a few billion rows per day in almost real time (a few seconds behind). It's still running a few years later; the only outages have been OS updates and Elixir/Erlang upgrades to the app.

I love learning and understanding things - do you know of any niche that would fit mine and where I could go deeper with my knowledge and experience?


I'm deeply engaged in rewriting my own data processing software from Elixir to C. I've already reduced the number of dedicated servers from 3 to 0.1 while scaling traffic and handling larger amounts of data. My goal is to optimize it for Raspberry Pi, just for fun... and it's also more ecologically friendly this way :)

By the way, I'd appreciate a programming partner with whom I can discuss security issues in C code. I would gladly exchange code review sessions. Is anyone interested here?

