I made this project several years back; it's been very fun and interesting. I've been meaning to write about some parts of it that haven't gotten much attention.
I'd be interested in hearing about it. From a quick look, it seems like there's a focus on interactively making sense of unstructured data and then cleaning it up? And doing that quickly?
That part sort of overlaps with R, i.e. the "comprehensions" part, although R is pretty weak at parsing and dealing with strings in general. And it's pretty slow for unstructured data, although for structured data it's pretty good with data.table.
Well, one way we deal with typical unstructured data is to prevent the unstructuring in the first place. The low-latency logging method (for example) uses something like a printf-style interface but stores all of the source data with exactly the types you intended to print -- you can always erase the structure later by printing, but having the structure when you need it is very useful.
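To sketch the idea (this is not the actual hobbes API, just the shape of it in C++; names like logTyped and Event are made up):

    // Hypothetical sketch (not hobbes's real interface): a printf-like call
    // that keeps the typed source data, so rendering to text is deferred.
    #include <iostream>
    #include <string>
    #include <variant>
    #include <vector>

    // One field of a log statement, kept with its exact static type.
    using Field = std::variant<int, double, std::string>;

    struct Event {
      const char* fmt;           // the printf-style template, kept for later
      std::vector<Field> fields; // typed arguments, not a rendered string
    };

    std::vector<Event> LOG; // stand-in for a lower-level log buffer

    template <typename... Ts>
    void logTyped(const char* fmt, Ts... vs) {
      LOG.push_back(Event{fmt, {Field(vs)...}}); // store structure, don't format
    }

    int main() {
      // hot path: no string formatting, just typed stores
      logTyped("filled $0 of $1 at $2", 100, 250, 17.42);

      // cold path: erase the structure by printing only when you want to
      for (const auto& e : LOG) {
        std::cout << e.fmt << " <-";
        for (const auto& f : e.fields) {
          std::visit([](const auto& v) { std::cout << ' ' << v; }, f);
        }
        std::cout << '\n';
      }
    }

The real implementation is lower-level than this, but the point is the same: keep the types, defer the formatting.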
What "structure" means and how it works can have a lot of nuance. With hobbes we basically start with algebraic data types (which map to most C-style data structures and so can be shared without conversion with C/C++ code). It's been a while since I looked at R, but IIRC it's a lot like Scheme (e.g. maybe the data sharing/translation story is more complicated?).
We do have some things that are helpful for dealing with unstructured text data, like a built-in LALR parser generator and regex matching (integrated with general pattern matching), but it's not one of the main use cases we've been focused on.
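I won't try to quote the hobbes syntax from memory, but the effect is roughly what this C++ analogue does with std::regex -- except that in hobbes the regex alternatives are just more rows in an ordinary match expression:

    // Rough C++ analogue of regex patterns as match alternatives.
    #include <iostream>
    #include <regex>
    #include <string>

    std::string classify(const std::string& s) {
      static const std::regex num("[0-9]+");
      static const std::regex word("[A-Za-z]+");
      if (std::regex_match(s, num))  return "number"; // first matching "row" wins
      if (std::regex_match(s, word)) return "word";
      return "other";                                 // the fallback row
    }

    int main() {
      for (std::string s : {"42", "hobbes", "x86_64"}) {
        std::cout << s << " -> " << classify(s) << "\n";
      }
    }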
Yeah I read a little more of the site after commenting. At first I thought it was about analytics (hence thinking of R), but it's also about embedding in an application to take action (make trades) as well.
It definitely sounds like an interesting language!
I seem to remember you had a clever story behind the name “Hobbes”, but I can’t remember exactly what it was. If you’d care to repeat it here I’d love to be reminded!
The first story is that it follows the tradition of naming programming languages after logicians/mathematicians (e.g. Haskell, Church, Russell, Pascal, ...). Because the Curry-Howard isomorphism (CH) links logic and programming languages, and ideas in programming languages often come from much older work in logic, it's pretty natural to go to logicians for names (IMHO). Thomas Hobbes was one of the first people to connect logic and computation (literally saying "by reasoning I understand computation"), and the logical arithmetic he developed inspired other people to continue that work. I think that story is a reasonable justification, and it works well for a general audience.
The second story (the real story?) is that Morgan Stanley internally has a tradition of naming things after things in popular culture (and at least one or two cartoons), so I named the project after the comic strip characters Calvin and Hobbes (CH). Actually the programming language was "Hobbes" and the compiler was "Calvin", because if you know the strip, Hobbes is really just a figment of Calvin's imagination (which works well to explain how a compiler makes a programming language real). I am a fan of the comics and was reading them frequently with my son at the time. There's a "Calvin and Hobbes isomorphism" (because both characters are different views of the same mind), so it works really well as an extended metaphor for logic and programming languages, I think. :)
Well, I've done some work to split small header-only libraries out of the project to make it useful in other places without having to link in everything: low-latency structured logging, data files, a minimal x86_64 backend (LLVM alternative), structured data compression, network/RPC, message conversion, shared memory data structures, ...
The compression method is pretty neat and has useful applications in other contexts as well. Basically, the same way you can view types as propositions, you can view them as probabilities (which gives you a scheme for representing the bias in any structured data type). This idea works in other contexts too, like "time compression" for pattern-match translation (choosing an order for tests to match the bias in the data you're deconstructing).
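To make the "types as probabilities" point concrete (toy numbers, not the actual scheme): if a variant's constructors are heavily biased, an entropy code for the tag costs far less than the fixed-width tag you'd naively store.

    // Toy illustration (made-up frequencies): the bias in a variant's
    // constructors sets an ideal code length for its tag, well under the
    // fixed 2 bits (or whole byte) a naive encoding would spend.
    #include <cmath>
    #include <cstdio>

    int main() {
      // Suppose a |quote, trade, heartbeat| variant observed with:
      const char* ctor[] = {"quote", "trade", "heartbeat"};
      const double p[]   = {0.90, 0.09, 0.01};

      double expectedBits = 0.0;
      for (int i = 0; i < 3; ++i) {
        double bits = -std::log2(p[i]); // ideal code length for this tag value
        std::printf("%-10s p=%.2f -> %5.2f bits\n", ctor[i], p[i], bits);
        expectedBits += p[i] * bits;
      }
      std::printf("expected tag cost: %.3f bits vs. 2 fixed bits\n", expectedBits);
    }

The same counting argument drives the "time compression" trick: order the tests so the common constructors are checked first.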
It's hard to find time to write about this stuff with my day job though. :T
This has been posted here a few times. Here's a link to a discussion a couple of years ago. I've found it to be an interesting project, though well outside my field of expertise.