
I used to work for Splunk. Querying Splunk with SQL is completely plausible, and Splunk has made a number of attempts at it over the years.

The problem isn't SQL. It's that Splunk's query engine is tied up internally with a "grammar" that is a direct port of a shell pipeline into C++ with no intermediate representation or anything a compiler guy would recognize as a grammar. There was no design, no mathematical underpinning to it.

Splunk's unstructured log capabilities are really domain knowledge about making logs semistructured as fast as possible: token indexing, a lot of effort on recognizing character encodings and timestamps intelligently, looking for key=value pairs, and letting people write regexes to extract fields themselves. The query language isn't somehow designed for a different data model.
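
To make the key=value part concrete, here's a rough sketch in Python. This is not Splunk's code; the regex, the function name `extract_fields`, and the sample log line are all made up for illustration. It just shows how little it takes to turn a raw line into something with named fields.

```python
import re

# Hypothetical pattern: key=value where the value is either
# double-quoted (may contain spaces) or a run of non-whitespace.
KV_PATTERN = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def extract_fields(line):
    """Return a dict of every key=value pair found in one log line."""
    fields = {}
    for match in KV_PATTERN.finditer(line):
        key = match.group(1)
        quoted = match.group(3)  # None when the value was not quoted
        fields[key] = quoted if quoted is not None else match.group(2)
    return fields


# Illustrative log line, not taken from any real system.
line = '2024-05-01T12:00:00Z level=error user="alice smith" status=500 request failed'
print(extract_fields(line))
```

Everything outside a key=value pair (the timestamp, the trailing message) is simply ignored here; a real system would handle those with separate heuristics.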

In EWD1123, Dijkstra showed that the relational calculus and the regularity calculus (which governs regexes) are basically the same thing. My takeaway from that is that the relational model can be reinterpreted as a model over anything you want to match and manipulate with regexes by just changing the field selectors.
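
A toy version of what I mean, in Python. This is my own illustration, not Dijkstra's construction and not anything from Splunk; the helper names (`selector`, `to_relation`, `where`, `project`) and the log lines are invented. Each "column" is just a regex with one capture group, and the usual relational select and project run on whatever those regexes pull out.

```python
import re


def selector(pattern):
    """Build a field selector: a regex whose first group is the field value."""
    compiled = re.compile(pattern)

    def select(line):
        match = compiled.search(line)
        return match.group(1) if match else None

    return select


def to_relation(lines, selectors):
    """Apply every selector to every line, giving one row (dict) per line."""
    return [{name: sel(line) for name, sel in selectors.items()} for line in lines]


def where(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Relational projection: keep only the named columns."""
    return [{col: row[col] for col in columns} for row in rows]


# Illustrative raw lines; the format is an assumption.
LOG_LINES = [
    "GET /index.html status=200 bytes=512",
    "GET /missing status=404 bytes=0",
    "POST /login status=500 bytes=87",
]

# Changing these selectors changes the "schema" without touching the operators.
SELECTORS = {
    "method": selector(r"^(\w+) "),
    "path": selector(r"^\w+ (\S+)"),
    "status": selector(r"status=(\d+)"),
}

rows = to_relation(LOG_LINES, SELECTORS)
errors = project(where(rows, lambda r: int(r["status"]) >= 400), ["path", "status"])
print(errors)
```

The operators never look at the raw text; only the selectors do, which is the sense in which swapping field selectors is enough.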


