I’ve been thinking about writing a code analysis and refactoring tool that would expose facts like AST node info, expression typings, etc. as queryable collections. To make this fast and interactive, you’d compute only the facts needed for a query and, ideally, incrementally maintain those as files change.
Rust used a Datalog library inside the borrow checker for this; I’d like to bring a similar model to more languages so that some kinds of refactors or lints written as queries could potentially be shared between languages - for example, a lint rule that prohibits optional arguments to functions.
I wonder if GlueSQL would work naturally as this kind of “OSQuery for source code”.
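To make the "lint as a query" idea concrete, here is a minimal sketch in plain Rust. Everything in it is hypothetical: the `ParamFact` row shape, the `sample_facts` data, and the `optional_param_violations` function are made up for illustration and are not part of GlueSQL or any existing tool. It only shows that once per-language extractors emit the same kind of rows, one query can serve as the lint for all of them.

```rust
/// One row per function parameter, as a hypothetical per-language
/// extractor might emit it. The shape is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ParamFact {
    pub language: &'static str,
    pub function: &'static str,
    pub param: &'static str,
    pub optional: bool,
}

/// The lint expressed as a query over the fact collection. Roughly:
///   SELECT language, function, param FROM params WHERE optional
/// It never looks at language-specific syntax, only at the shared facts.
pub fn optional_param_violations(
    facts: &[ParamFact],
) -> Vec<(&'static str, &'static str, &'static str)> {
    facts
        .iter()
        .filter(|f| f.optional)
        .map(|f| (f.language, f.function, f.param))
        .collect()
}

/// Made-up facts standing in for the output of two different extractors.
pub fn sample_facts() -> Vec<ParamFact> {
    vec![
        ParamFact { language: "python", function: "greet", param: "name", optional: false },
        ParamFact { language: "python", function: "greet", param: "title", optional: true },
        ParamFact { language: "typescript", function: "connect", param: "host", optional: false },
        ParamFact { language: "typescript", function: "connect", param: "timeout", optional: true },
    ]
}
```

Calling `optional_param_violations(&sample_facts())` flags `greet.title` and `connect.timeout`, one from each language. A real tool would replace the slice with a table that is computed on demand and kept up to date as files change.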
CodeQL is exactly the query side of this - it builds a database of facts about your code, which you can query in their own language (which compiles to Datalog) in order to write lints, sometimes in a language-agnostic manner. It doesn't have anything for refactoring, however.
Also check out (if you haven’t already) the timely and differential dataflow crates, and what Readyset/Noria are doing with incremental compute. Lots of super cool stuff happening in the space.
There are SQLite (OLTP), DuckDB (OLAP), and some engine-based projects like the already-mentioned Apache Arrow (https://arrow.apache.org/) (OLAP). Apache Arrow has many language implementations; some do not include the query engine in their own repo (for example, the Rust implementation, which depends on DataFusion for SQL-like analytics), but others do (for example, C++).
There is a comprehensive OLAP benchmark by ClickHouse that also includes several embedded engines: https://benchmark.clickhouse.com/
What's more interesting is that there is, in fact, no embedded HTAP engine yet. One of my database products already implements 3/4 HTAP at the engine layer, but unfortunately it's still just free software, not an open-source implementation.
I don't want to dump on someone's project, because we need new projects, and who knows what this will turn into, but...
I too am wondering "why would I use this and not SQLite?" or Firebird.
There may well be reasons, but I couldn't find them on the project front page. For all new projects, in situations like this, breaking into a very mature market, I would recommend that some sort of reason is posted on the front page.
"Because it's written in Rust" is a pretty good argument, but you won't be using it instead of SQLite because SQLite is incredibly functional and no new kid on the block can be functional fast enough to overcome that mind share for some time. In the end it will have to have a) close to parity with SQLite in functionality, and b) a better architecture. (b) seems unlikely, and (a) seems unlikely. Plus functionality is not enough -- a fantastic test suite is also needed, and here it's really hard to dethrone SQLite because its best test suite is proprietary. Perhaps provable correctness would be the thing.
> I don't want to dump on someone's project, ...
A lot of these projects are school projects, or personal projects. No need to dump on them. I think it's pretty cool that someone might tackle an RDBMS library, as long as they don't think it's a SQLite killer without understanding the tremendous need for funding that replacing SQLite would require.
Same as for any other embedded database, though I'm afraid my risk tolerance isn't high enough to use one anywhere close to this new for anything I care about. I do think it would benefit significantly from embedded SQL syntax, though (maybe with a procedural macro?).
Note that persistent storage appears to rely on 'sled', and according to sled's GitHub readme, sled's persistent storage format is going to change in the future, so you would need to manually migrate any databases.
Seeing the examples, I start to wonder if it would be possible to support real SQL syntax from within a Rust macro, one that maybe even has access to local variables, like LINQ?
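As a rough sketch of the "access to local variables" part, here is a small declarative macro. The `sql!` macro is hypothetical and is not a GlueSQL API; it only pairs the query text with local expressions as bound parameters. Real SQL syntax with compile-time checking would most likely need a procedural macro.

```rust
/// Hypothetical `sql!` macro: takes the query as a string literal with `?`
/// placeholders, followed by any local expressions to bind, and returns
/// the query text together with the parameters rendered as strings.
/// The values stay separate from the text rather than being spliced in.
macro_rules! sql {
    ($query:literal $(, $arg:expr)* $(,)?) => {{
        // The annotation keeps the no-argument case well-typed.
        let params: Vec<String> = vec![$(($arg).to_string()),*];
        ($query, params)
    }};
}

/// Example of binding local variables at the call site.
pub fn find_user_query(user_id: u32, name: &str) -> (&'static str, Vec<String>) {
    sql!("SELECT * FROM users WHERE id = ? AND name = ?", user_id, name)
}
```

A procedural macro could go further and parse the SQL at compile time, picking up locals by name from inside the query itself, which is closer to what LINQ offers.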