Hacker News: shinypenguin's comments

Benchmark link gives me 404, but I found this link that seems to show the proper benchmarks:

https://fory.apache.org/docs/docs/introduction/benchmark


Is the dataset somewhere accessible? Does anyone know more about the "1T challenge", or is it just the 1B challenge moved up a notch?

Would be interesting to see if it would be possible to handle such data on one node, since the servers they are using are quite beefy.


Hi shinypenguin - the dataset and challenge are detailed here: https://github.com/coiled/1trc

The data is in a publicly accessible bucket, but the requester is responsible for any egress fees...


I suggest linking to that from the article; it's a useful clarification.


Good point - I'll update it...


Hi, thank you for the link and quick response! :)

Do you know if anyone attempted to run this on the least amount of hardware possible with reasonable processing times?


Yes - I also had GizmoSQL (a single-node DuckDB database engine) take the challenge - with very good performance (2 minutes for $0.10 in cloud compute cost): https://gizmodata.com/blog/gizmosql-one-trillion-row-challen...


The One Trillion Row Challenge was proposed by Coiled in 2024. https://docs.coiled.io/blog/1trc.html


Definitely not; I've always been in strongly technical roles. Any pointers on where to start with marketing? :)


Thank you for the link. Sadly, I don't have enough experience with graph databases, so it's outside my skill set.


My niche is basically this: I build distributed systems with minimal external dependencies that are fast and run reliably on a minimal amount of hardware and complexity. I focus mainly on data processing and gathering. The result is that my clients don't need many servers or a big devops team to manage the service, and it stays reliable and scalable.

For example, I built an event-gathering distributed system in Elixir (without external systems) that handled 930m events per day (33k req/s at peak) on 2 dedicated servers, and even that many only because minimal HA was required. It processed and aggregated a few billion rows per day in almost real time (a few seconds behind). It's still running a few years later; the only outages have been OS updates and Elixir/Erlang upgrades to the app.

I love learning and understanding things - do you know of any niche that would fit mine and where I could go deeper with my knowledge and experience?


I'm deeply engaged in rewriting my own data processing software from Elixir to C. I've already reduced the number of dedicated servers from 3 to 0.1 while scaling traffic and handling larger amounts of data. My goal is to optimize it for Raspberry Pi, just for fun... and it's also more ecologically friendly this way :)

By the way, I'd appreciate a programming partner with whom I can discuss security issues in C code. I would gladly exchange code review sessions. Is anyone interested here?

