GlueSQL: SQL database engine as a library

jitl · on Oct 23, 2022

I’ve been thinking about writing a code analysis and refactoring tool that would expose facts like AST node info, expression typings, etc., as queryable collections. To make this fast and interactive, you’d only compute the facts needed for a query and ideally maintain them incrementally as files change.

Rust used a Datalog library inside the borrow checker for this; I’d like to bring a similar model to more languages so that some kinds of refactors or lints written as queries could potentially be shared between languages, for example a lint rule that prohibits optional arguments to functions.

I wonder if GlueSQL would work naturally as this kind of “OSQuery for source code”.
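A minimal sketch of the facts-as-collections idea above, in plain Rust and without GlueSQL (none of its API is used here). Everything in it is hypothetical: the toy `fn name(a, b?)` source syntax, the `extract_params` helper and the `FactDb` store. Facts for a file are computed only when a query first asks for them and then cached; changing a file invalidates only that file's facts, which is a crude stand-in for true incremental maintenance. The "no optional arguments" lint is then just a filter over the parameter facts.

```rust
use std::collections::HashMap;

/// One fact: a parameter of a function and whether it is optional.
struct ParamFact {
    file: String,
    function: String,
    param: String,
    optional: bool,
}

/// Toy extractor (hypothetical, not a real parser): reads lines shaped like
/// `fn name(a, b?)`, where a trailing `?` marks an optional parameter.
fn extract_params(file: &str, text: &str) -> Vec<ParamFact> {
    let mut facts = Vec::new();
    for line in text.lines() {
        let Some(rest) = line.trim().strip_prefix("fn ") else { continue };
        let Some((name, args)) = rest.split_once('(') else { continue };
        for arg in args.trim_end_matches(')').split(',').map(str::trim) {
            if arg.is_empty() {
                continue;
            }
            facts.push(ParamFact {
                file: file.to_string(),
                function: name.trim().to_string(),
                param: arg.trim_end_matches('?').to_string(),
                optional: arg.ends_with('?'),
            });
        }
    }
    facts
}

/// Hypothetical fact store: facts are computed per file on first use and
/// cached; editing a file drops only that file's cached facts.
#[derive(Default)]
struct FactDb {
    sources: HashMap<String, String>,
    cache: HashMap<String, Vec<ParamFact>>,
    extractions: usize, // how many times any file was (re)analyzed
}

impl FactDb {
    fn set_file(&mut self, path: &str, text: &str) {
        self.sources.insert(path.to_string(), text.to_string());
        self.cache.remove(path); // invalidate only this file
    }

    fn param_facts(&mut self, path: &str) -> &[ParamFact] {
        if !self.cache.contains_key(path) {
            let text = self.sources.get(path).map(String::as_str).unwrap_or("");
            let facts = extract_params(path, text);
            self.extractions += 1;
            self.cache.insert(path.to_string(), facts);
        }
        &self.cache[path]
    }
}

/// The lint as a query over the facts: report every optional parameter.
fn no_optional_args(db: &mut FactDb, path: &str) -> Vec<String> {
    db.param_facts(path)
        .iter()
        .filter(|f| f.optional)
        .map(|f| format!("{}: {}({}?)", f.file, f.function, f.param))
        .collect()
}

fn main() {
    let mut db = FactDb::default();
    db.set_file("a.ts", "fn greet(name, greeting?)\nfn add(x, y)");
    println!("{:?}", no_optional_args(&mut db, "a.ts"));
}
```

Running the same query twice reuses the cached facts; only a `set_file` call forces that file to be re-analyzed.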

neandrake · on Oct 23, 2022

This idea sounds similar to Glean. I haven’t looked too much into it yet but tagged it for later.

https://glean.software/

chc4 · on Oct 23, 2022

CodeQL is exactly the query side of this: it builds a database of facts about your code, which you can query in their language, which compiles to Datalog, in order to write lints (sometimes in a language-agnostic manner). It doesn't have anything for refactoring, however.
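For readers new to Datalog, a rough illustration of what "querying facts with Datalog" means, hand-written in plain Rust. This is not CodeQL's engine or query language; the `calls`/`reaches` relations are hypothetical examples. A recursive rule is evaluated to a fixed point, here with semi-naive evaluation, where each round joins only the facts derived in the previous round.

```rust
use std::collections::HashSet;

/// Evaluates, to a fixed point, the hypothetical Datalog program
///   reaches(A, B) :- calls(A, B).
///   reaches(A, C) :- reaches(A, B), calls(B, C).
/// Semi-naive: each round joins only the newly derived facts (`delta`)
/// against `calls`, so old facts are never re-joined.
fn reaches<'a>(calls: &[(&'a str, &'a str)]) -> HashSet<(&'a str, &'a str)> {
    let mut all: HashSet<(&'a str, &'a str)> = calls.iter().copied().collect();
    let mut delta: Vec<(&'a str, &'a str)> = all.iter().copied().collect();
    while !delta.is_empty() {
        let mut next = Vec::new();
        for &(a, b) in &delta {
            for &(caller, callee) in calls {
                // `insert` returns false for facts we already had.
                if caller == b && all.insert((a, callee)) {
                    next.push((a, callee));
                }
            }
        }
        delta = next;
    }
    all
}

fn main() {
    let calls = [("main", "parse"), ("parse", "lex"), ("lex", "read")];
    let r = reaches(&calls);
    println!("main reaches read: {}", r.contains(&("main", "read")));
}
```

Evaluation stops once a round derives nothing new, so it terminates even when the call graph has cycles.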

infogulch · on Oct 23, 2022

Another "database toolkit" project that I've recently learned about is Apache DataFusion, also written in rust and uses Arrow memory format:

https://github.com/apache/arrow-datafusion/blob/master/READM...

jitl · on Oct 23, 2022

Ooo, this seems more explicitly supportive of my AST query idea

FridgeSeal · on Oct 23, 2022

Also check out (if you haven’t already) the timely and differential dataflow crates, and what Readyset/Noria are doing with incremental compute. Lots of super cool stuff happening in the space.
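A toy illustration of the incremental-compute idea, in plain Rust. It does not use the differential dataflow crate or anything from Readyset/Noria; `CountByKey` is hypothetical. Input changes arrive as (record, diff) pairs, a simplified form of the (data, time, diff) updates differential dataflow works with, and a `GROUP BY` count view is updated from each batch instead of being recomputed from scratch.

```rust
use std::collections::HashMap;

/// Incrementally maintained view, roughly
///   SELECT kind, COUNT(*) FROM nodes GROUP BY kind
/// Changes arrive as (key, diff) pairs: +1 for an insert, -1 for a delete.
/// Work per batch is proportional to the batch, not to the whole table.
#[derive(Default)]
struct CountByKey {
    counts: HashMap<String, i64>,
}

impl CountByKey {
    /// Folds a batch of (key, diff) updates into the current counts.
    fn apply(&mut self, updates: &[(&str, i64)]) {
        for &(key, diff) in updates {
            let count = self.counts.entry(key.to_string()).or_insert(0);
            *count += diff;
            if *count == 0 {
                // Keep the view sparse: a zero count means the group is gone.
                self.counts.remove(key);
            }
        }
    }

    fn get(&self, key: &str) -> i64 {
        self.counts.get(key).copied().unwrap_or(0)
    }
}

fn main() {
    let mut view = CountByKey::default();
    // A file with two functions and one struct is added...
    view.apply(&[("fn", 1), ("fn", 1), ("struct", 1)]);
    // ...then edited: one function and the struct are deleted.
    view.apply(&[("fn", -1), ("struct", -1)]);
    println!("fn = {}, struct = {}", view.get("fn"), view.get("struct"));
}
```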

jinmingjian · on Oct 23, 2022

Just another embedded SQL engine.

There are SQLite (OLTP), DuckDB (OLAP), and engine-based projects like the already-mentioned Apache Arrow (https://arrow.apache.org/) (OLAP). Apache Arrow has many language implementations: some do not include a query engine in their own repo (for example, the Rust implementation, which depends on DataFusion for more SQL-like analytics), while others do (for example, C++).

There is a comprehensive benchmark by ClickHouse for OLAP that also covers several embedded engines: https://benchmark.clickhouse.com/

More interesting is that we don't, in fact, have an embedded HTAP engine. One of my database products already implements 3/4 of HTAP at the engine layer, but unfortunately it's still only free-to-use software, not an open source implementation.

mhuffman · on Oct 23, 2022

>The more interesting is that, in fact, we have not an embedded HTAP engine.

I know it is not meant to be, but I have found DuckDB so fast at transactional queries that for many use cases it would work well as an HTAP engine.

0xb0565e487 · on Oct 23, 2022

What's the selling point here?

bruce511 · on Oct 23, 2022

I don't want to dump on someone's project, because we need new projects, and who knows what this will turn into, but...

I too am wondering "why would I use this and not SQLite?" or Firebird.

There may well be reasons, but I couldn't find them on the project front page. For any new project breaking into a very mature market like this one, I would recommend posting some sort of reason on the front page.

cryptonector · on Oct 23, 2022

"Because it's written in Rust" is a pretty good argument, but you won't be using it instead of SQLite: SQLite is incredibly functional, and no new kid on the block can catch up in functionality fast enough to overcome that mind share for some time. In the end it will have to have a) close to parity with SQLite in functionality, and b) a better architecture. Both (a) and (b) seem unlikely. Plus functionality is not enough; a fantastic test suite is also needed, and here it's really hard to dethrone SQLite because its best test suite is proprietary. Perhaps provable correctness would be the thing.

> I don't want to dump on someone's project, ...

A lot of these projects are school projects, or personal projects. No need to dump on them. I think it's pretty cool that someone might tackle an RDBMS library, as long as they don't think it's a SQLite killer without understanding the tremendous need for funding that replacing SQLite would require.

fooker · on Oct 23, 2022

>"Because it's written in Rust" is a pretty good argument

..no, it isn't?

That's like saying you would buy a specific car because the factory it was made in is better than other car factories.

It is an argument, yes, but not a pretty good one unless there's something specific about this project that is better because of Rust.

vaughan · on Oct 23, 2022

> factory…

I agree with your point but the analogy is bad. I would definitely buy a car based on factory. Reliability, order time, parts availability, etc.

fooker · on Oct 23, 2022

>based on factory. Reliability, order time, parts availability, etc.

Do you see the nuance here? The argument only makes sense when the details make sense.

cryptonector · on Oct 24, 2022

Don't they though?

Jweb_Guru · on Oct 23, 2022

Same as for any other embedded database, though I'm afraid my risk tolerance isn't high enough to use one anywhere close to this new for anything I care about. I do think it would benefit significantly from embedded SQL syntax, though (maybe with a procedural macro?).

kg · on Oct 23, 2022

Note that persistent storage appears to rely on 'sled', and according to sled's github readme, sled's persistent storage format is going to change in the future so you would need to manually migrate any databases.

yukIttEft · on Oct 23, 2022

Seeing the examples, I wonder whether it would be possible to support real SQL syntax inside a Rust macro, maybe even one that has access to local variables, like LINQ?
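One way the embedded-SQL idea could look with a plain `macro_rules!` macro rather than a procedural one. This is a minimal sketch under stated assumptions: the `sql!` macro and its `#name` syntax for referencing local variables are hypothetical and not part of GlueSQL. The macro copies the SQL-like tokens into a query string, replaces each `#name` with a `?` placeholder, and collects the named locals as parameters.

```rust
// Hypothetical `sql!` macro, NOT part of GlueSQL. It walks the SQL-like
// tokens one at a time ("tt munching"), replaces each `#name` with a `?`
// placeholder, and records the local variable `name` as a parameter.
macro_rules! sql {
    // `#ident`: emit a placeholder and capture the variable.
    (@acc [$($out:tt)*] [$($p:ident)*] # $var:ident $($rest:tt)*) => {
        sql!(@acc [$($out)* ?] [$($p)* $var] $($rest)*)
    };
    // Any other token is copied into the query text unchanged.
    (@acc [$($out:tt)*] [$($p:ident)*] $next:tt $($rest:tt)*) => {
        sql!(@acc [$($out)* $next] [$($p)*] $($rest)*)
    };
    // No tokens left: build the query text and the bound parameters.
    (@acc [$($out:tt)*] [$($p:ident)*]) => {{
        let params: Vec<String> = vec![$($p.to_string()),*];
        (stringify!($($out)*).to_string(), params)
    }};
    // Entry point; must come after the `@acc` rules so they match first.
    ($($tokens:tt)*) => {
        sql!(@acc [] [] $($tokens)*)
    };
}

fn main() {
    let id = 7;
    let min_age = 18;
    let (query, params) = sql!(SELECT name FROM users WHERE id = #id AND age > #min_age);
    // The exact spacing of `query` comes from `stringify!` and may vary.
    println!("{query} {params:?}");
}
```

Two limitations of this approach: `#name` inside a parenthesized group is not rewritten, because the macro does not recurse into groups, and SQL string literals such as `'alice'` are not valid Rust tokens, so any macro that takes SQL as tokens, procedural ones included, would need a workaround for them. A procedural macro could, however, parse and validate the SQL at compile time.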