This looks very pretty! I like it and it's very minimalist. This is pretty much entirely outside of my area of expertise, but do you have some library for leaflet that you're using from Clojure? Or just JS interop calls?
It's cool to see sleek projects like this. I made an application that needed to make heatmaps, but I just made a grid of colored squares, cropped some GeoJSON contours, and ended up generating SVGs... A bit goofy to reimplement a mapping library, but I needed to do some heavy math, so this way it was all JVM code
Using leaflet from Clojure is entirely JS interop calls. You can see an example in my other mapping project here (https://github.com/ahmed-machine/mapbh/blob/master/src/app/p...). I'll add that Leaflet 2, while still in alpha, is much nicer to use from CLJS as it replaces all factory methods with constructor calls (e.g. L.marker(latlng) -> new Marker(latlng)). I've been slowly moving my newer mapping projects over to the new version.
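For anyone curious what the interop looks like, here's a rough sketch (not code from the linked project; assumes a shadow-cljs style setup):

    ;; Leaflet 1.x: factory functions via interop
    ;; (here js/L is the global from a script tag, or use (:require ["leaflet" :as L]))
    (def m (.map js/L "map"))                      ; L.map("map")
    (.addTo (.marker js/L #js [26.22 50.58]) m)    ; L.marker([lat, lng]).addTo(m)

    ;; Leaflet 2 (alpha): plain classes, which map naturally onto CLJS constructors
    ;; (assumes something like (:require ["leaflet" :refer [Map Marker]]))
    (def m2 (Map. "map2"))
    (.addTo (Marker. #js [26.22 50.58]) m2)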
Your project sounds really cool, I'd love to read that code. The implementation in this project largely utilises Leaflet's GeoJSON layers (https://leafletjs.com/examples/geojson/), which do render out to SVGs (there's an optional canvas renderer, too). One of the trickier parts was figuring out how to layer each isochrone band so that those closest to the point (i.e. the 15-minute band) were painted on top of the bands further away (https://www.geeksforgeeks.org/dsa/painters-algorithm-in-comp...). That, and pre-computing the distances per NYC intersection across the tri-state area, which required a lot of messing around with OpenTripPlanner configuration files, GTFS data, and parallelising the code to finish in a reasonable time span (a few days).
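To make the layering concrete: you add the bands in order of decreasing travel time, so the 15-minute band is painted last and ends up on top. A rough sketch with made-up names (isochrone-bands, leaflet-map), not the actual project code:

    ;; Paint the widest band first, the 15-minute band last.
    (doseq [band (sort-by :minutes > isochrone-bands)]     ; 60, 45, 30, 15
      (-> (.geoJSON js/L (clj->js (:geojson band))
                    #js {:style #js {:fillColor (:color band) :fillOpacity 0.5}})
          (.addTo leaflet-map)))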
it's honestly nothing too crazy. I don't have any DB or any API calls or anything.
- The gridded data inputs are all "normalized" to GeoTIFF files (you can use gdal to convert netCDF files easily)
- The Java standard library can handle simple GeoTIFF images with BufferedImage
- I do some math on the BufferedImage data and then plot the results using the thi.ng/geom heatmap plot
- Just heatmaps on their own are kinda confusing. You need another layer so you can visualize "where you are". Here I plot coastlines. (you could also do elevation contours)
- Here I have contours/coastlines as GeoJSON polygons. With `factual/geo` you can read them in and crop them to your observed region using `geo.jts/intersection` (rough sketch below). You can then convert these to a series of SVG paths (using `thi.ng/geom` hiccup notation) and overlay them on the heatmap
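Roughly, the cropping step looks like this (a sketch, not the actual code; `coastline-geojson` is a hypothetical GeoJSON string and the WKT bounding box stands in for the observed region):

    (require '[geo.io :as gio]
             '[geo.jts :as jts])

    (def region                                     ; observed region as a JTS polygon
      (gio/read-wkt "POLYGON ((19 59, 32 59, 32 71, 19 71, 19 59))"))

    ;; coastline-geojson: a GeoJSON FeatureCollection string loaded elsewhere
    (def cropped-coastlines
      (->> (gio/read-geojson coastline-geojson)     ; seq of {:properties ... :geometry ...}
           (map :geometry)
           (map #(jts/intersection % region))       ; clip each geometry to the region
           (remove #(.isEmpty %))))                 ; drop geometries fully outside
    ;; ...then walk each geometry's coordinates into thi.ng/geom SVG path hiccup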
> and parallelising the code to finish in a reasonable time span (few days)
what's the advantage over just manually making an uberjar and using jlink/jpackage?
Do you have the ability to cross-compile to other architectures/OSes?
Do you have the ability to generate a plain executable? The jlink/jpackage route ends up generating an "installer" for each system, which I find hard/annoying to test, and people are reluctant to install a program you send them
In the past I've ended up distributing an uberjar because I didn't have the setup to test all the resulting bundles (especially macOS, which requires buying a separate machine). I also found JavaFX to be a bit inconsistent... though it's been a few years and maybe the situation has improved
The main pain point jbundle solves is that jpackage generates installers (.deb, .rpm, .dmg, .msi), not plain executables. jbundle produces a single self-contained binary — just a shell stub concatenated with a compressed payload. You chmod +x it, distribute it, and the user runs ./app. No installation step, no system-level changes.
It also automates the full pipeline (detect build system → build uberjar → download JDK → jdeps → jlink → pack) so you don't need a JDK installed on the build machine — it fetches the exact version from Adoptium. Plus it includes startup optimizations like AppCDS (auto-created on first run, JDK 19+), CRaC checkpoints, and profile-tuned JVM flags for CLI vs server workloads.
Cross-compilation:
Yes — jbundle build --target linux-x64 (or linux-aarch64, macos-x64, macos-aarch64). Since the JAR is platform-independent, it just downloads the appropriate JDK runtime for the target OS/arch from Adoptium and bundles it. You can build a Linux binary from macOS and vice-versa.
Plain executable (not an installer):
That's exactly what jbundle produces. The output is a single file you can scp to a server or hand to someone. On first run it extracts the runtime and jar to ~/.jbundle/cache/ (keyed by content hash), so subsequent runs are instant. No .deb, no .dmg, no "install this first" — just a binary.
For the macOS testing concern: since it's a CLI binary (not a .app bundle), it doesn't require signing/notarization to run. And with --target macos-aarch64 you can build it from a Linux CI without needing a Mac.
usually you reproduce previous research as a byproduct of doing something novel "on top" of the previous result. I don't really see the problem with the current setup.
sometimes you can just do something new and assume the previous result, but that's more the exception. you're almost always going to at least in part reproduce the previous one. and if issues come up, it's often evident.
that's why citations work as a good proxy. X number of people have done work based around this finding and nobody has seen a clear problem
there's a problem of people fabricating and fudging data and not making their raw data available ("on request", or with not enough metadata to be useful), which wastes everyone's time and almost never leads to negative consequences for the authors
It's quite common to see a citation say "BTW, we weren't able to reproduce X's numbers, but we got a fairly close number Y, so Table 1 includes that one next to an asterisk."
The difficult part is surfacing that information to readers of the original paper. The Semantic Scholar people are beginning to do some work in this area.
yeah that's a good point. the citation might actually be pointing out a problem and not be a point in favor. it's a slog to figure out... but it seems like the exact type of problem an LLM could handle
give it a published paper and it runs through the papers that have cited it and gives you an evaluation
This was a nice intro to AT (though I feel it could have been a bit shorter)
The whole thing seems a bit over-engineered, with poor separation of concerns.
It feels like it'd be smarter to flatten the design and embed everything in the Records. And then other layers can be built on top of that
Make every record include the author's public key (or signature?). Anything you need to point at, you'd either just give its hash, or hash + author public key. This way you completely eliminate this goofy filesystem hierarchy. Everything else is embedded in the Record.
Lexicons/Collections are just a field in the Record. Reverse-looking up a hash to find what it points to is also a separate problem.
Yes. SSB and ANProto do this. We can actually simply link to a hash of a pubkey+signature, which opens to a timestamped hashlink to a record. Everything is a hash lookup this way, and thus all nodes can store data.
    {:record    {:person-key **public-key**
                 :type       :twitter-post
                 :message    "My friend {:person-key **danabramov-public-key**} suggested I make this on this HN post {:link **record-of-hn-post-hash**}. Look at his latest post {:link **danabramov-newtwitter-post-hash** :person-key **danabramov-public-key**} it's very cool!"}
     :hash      **hash-of-the-record**
     :signature **signature-by-author**}
So everything is self-contained. The other features you'd build on top of this basic primitive:
- Getting the @danabramov username would be done by having some lookup service that does person-key->username. You could have several. Usernames can be changed with the service... but you can have your own map if you want, or infer it from GitHub commits :)) There are some interesting ideas about usernames out there. How this is done isn't specified by the Record
- Lexicon is also done separately. This is some validation step that's either done by a consumer app/editor of the record or by a server which distributes records (could be based on the :type or something else). Such a server can check whether the message has fewer than 300 graphemes and reject the record if it fails (rough sketch after this list). How this is done isn't specified by the Record
- Collection... This I think is just organizational? How this is done isn't specified by the Record. It's just aggregating all records of the same type from the same author, I guess?
- Hashes... They can point at anything. You can point at a webpage or an image or another record (where you can indicate the author). For dynamic content you'd need to point at a webpage that points at a static URL which has the dynamic content. You'd also need to have a hash->content mapping. How this is done isn't specified by the Record
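To make the lexicon point concrete, here's a hypothetical check a distributing server might run - none of this is specified by the Record itself, and the names are made up:

    (defn valid-twitter-post? [{:keys [record]}]
      (and (= :twitter-post (:type record))
           ;; real grapheme counting elided; plain char count for the sketch
           (<= (count (:message record)) 300)))

    (defn accept? [rec]
      (case (get-in rec [:record :type])
        :twitter-post (valid-twitter-post? rec)
        true))                                      ; unknown types pass through untouched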
This kind of setup makes the Record completely decoupled from the rest of the "stack". It becomes much more of an independent, movable "file" (in the original sense you have at the top) than the interconnected setup you end up with at the end.
- How do you rotate keys? In AT, the user updates the identity document. That doesn't break their old identity or links.
- When you have a link, how do you find its content? In AT, the URL has identity, which resolves to hosting, which you can ask for stuff.
- When aggregating, how do you find all records an application can understand? E.g. how would Bluesky keep track of "Bluesky posts". Does it validate every record just in case? Is there some convention or grouping?
Btw, you might enjoy https://nostr.com/, it seems closer to what you're describing!
1. It's an important problem, but I think this just isn't done at the Record layer. Nor can it be? You'd probably want to do that in the person-key->username service (which would have some login and a way to tie two keys to one username)
2. In a sense, that's not something you think about at the Record level either. It'd be at a different layer of the stack. I'll be honest, I haven't wrapped my head entirely around `did:plc`, but I don't see why you couldn't have essentially the same behavior, but instead of having these unique DID IDs, you'd just use public keys here. pub-key -> DID magic stuff... and then the rest you can do the same as AT. Or more simply, the server that finds the hashed content uses attached metadata (like the author) to narrow the search
Maybe there is a good reason the identity `did:plc` layer needs to be baked into the Record, but I didn't catch it from the post. I'd be curious to hear why you feel it needs to be there.
3. I'm not 100% sure I understand the challenge here. If you have a soup of records, you can filter your records based on their type. You can validate them as they arrive. You send your records to the Bluesky server and it validates them on arrival.
2. The point of the PLC is to avoid tying identity to keys, specifically to avoid the situation where, if you lose your keys, you lose your identity. In reality, nobody wants that as part of the system
3. The soup means you need to index everything. There is no Bluesky server to send things to, only your PDS. Your DID is how I know what PDS to talk to to get your records
I guess what I'm asking is: why is connecting multiple public keys to one identity something that has to be part of the Record?
Why is that not happening as a separate process?
I'll try to read up more on this topic! I do feel I'm missing some pieces here
We can have both a directory and use content addressable storage and give people the option of using their own keypairs. They are not mutually exclusive. Bluesky chooses to have a central directory and index.
at://hash, at://pub... I guess I don't know why we need at://. That seems like something we'd need in Beaker Browser.
After reading the docs at https://atproto.com/specs/at-uri-scheme and building applications that integrate with the Bluesky API, it's clear why this format is useful for interacting with their system. For developers working outside the ATMosphere, however, it may feel less familiar than more conventional REST API patterns.
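For reference, the shape is identity, then collection (a lexicon NSID), then record key - the concrete values below are made up:

    at://alice.bsky.social/app.bsky.feed.post/3k2yihcrl6a2c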
I only have limited experience with GUI widgets, from using JavaFX through `cljfx`:
- vui.el seems like the right idea - you have a widget library, and then you add state management, layout management, etc. It's sort of a blessing that widget.el is simple enough to be reused this way
- ECS and inheritance. I have personally never really hit on this limitation. It's there in the abstract.. but don't GUIs generally fit the OO paradigm pretty well? Looking at the class tree for JavaFX, I don't see any really awkward widgets that are ambiguously placed.
- State Management. This can be bolted on - like with vui.el. But to me this feels like something that should be independent of GUIs. Something like Pathom looks more appealing to me here
- "Not a full reactive framework - Emacs doesn't need that complexity" .. why not? Maybe the library internals are complex, but it makes user code much simpler, no?
On vui.el's approach - yes, the blessing is that widget.el is simple enough to build on. It does the "rendering" and some "behaviour", vui.el handles the rest.
On ECS vs OO - I'll admit I don't have enough experience to speak about UI paradigms in general. But my critique of widget.el is that inheritance hierarchies don't compose well when you need orthogonal behaviors. Composition feels more natural to me - could be just how my brain works, but it scales better in my experience.
On state management being independent - I'd be curious to hear more. Pathom is interesting for data-driven architectures. vui.el's state is intentionally minimal and Emacs-native, but you're right it could potentially be decoupled further.
On "why not full reactive" - to clarify what vui.el has: React-style hooks with explicit dependency tracking (vui-use-effect, vui-use-memo, etc.), state changes trigger re-renders, batching for multiple updates. What it doesn't have: automatic dependency inference or fine-grained reactivity where only specific widgets update. The tradeoff was debuggability - explicit deps are easier to trace than magic. But I'm open to being wrong here. What would you want from a reactive layer?
- "don't compose well when you need orthogonal behavior" ah okay, you've actually hit this case. I guess I haven't done anything gnarly enough to encounter this
> Pathom is interesting for data-driven architectures. vui.el's state is intentionally minimal and Emacs-native, but you're right it could potentially be decoupled further.
I'll be honest, I haven't yet written a Pathom-backed GUI. But I'm hoping to experiment with this in the coming weeks :)) cljfx is structured in such a way that you can either use the provided subscription system or you can roll your own.
> What it doesn't have: automatic dependency inference or fine-grained reactivity where only specific widgets update
So all the derived states are recalculated? Probably in the 95% case this is fine
In the big picture I enjoyed the cljfx subscription system so much that I'd like to use a "reactive layer" at the REPL and in general applications. You update some input and only the parts that are relevant get updated. With a subscription-style system the downside is that the code is effectively "instrumented" with subscription calls to the state. You aren't left with easily testable function calls and it's a bit uglier.
Pathom kind of solves this and introduces several awesome additional features. Now your "resolvers" can behave like dumb functions that take a map-of-input and return a map-of-output. They're nicer to play with at the REPL and are more idiomatic Clojure. On top of that, your code turns into pipelines that can be injected into at any point (so the API becomes a lot more flexible). And finally, the resolvers can be auto-parallelized, as the engine can see which parts of the dependency graph (for the derived state you're requesting) can be run in parallel.
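A rough Pathom 3-style sketch of what I mean (fetch-image is a stand-in, and the attribute names are made up):

    (require '[com.wsscode.pathom3.connect.operation :as pco]
             '[com.wsscode.pathom3.connect.indexes :as pci]
             '[com.wsscode.pathom3.interface.eql :as p.eql])

    (defn fetch-image [url] {:bytes-from url})      ; stand-in for a real HTTP fetch

    (pco/defresolver avatar-url [{:user/keys [username]}]
      {:user/avatar-url (str "https://example.com/avatars/" username ".png")})

    (pco/defresolver avatar-image [{:user/keys [avatar-url]}]
      {:user/avatar-image (fetch-image avatar-url)})

    ;; The engine works out the resolver chain for the derived state you ask for:
    (p.eql/process (pci/register [avatar-url avatar-image])
                   {:user/username "danabramov"}
                   [:user/avatar-image])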
The downsides are mostly related to caching of results. You need an "engine" that has to run all the time to figure out "given my inputs, how do I construct the derived state the user wants". In theory these can be cached, but the cache is easily invalidated. You add a key on the input, and the engine has to rerun everything (maybe this can be bypassed somehow?). You can also concoct complex scenarios where the caching of the derived states is non-trivial. Derived state values are cached by the resolvers themselves, but they have a limited view of how often and where they're needed. If two derived states use one intermediary resolver but with different inputs, you need to find a way to adjust the cache size... Unclear to me how to do this tbh
Thanks for the detailed breakdown on Pathom and cljfx subscriptions - this is exactly the kind of perspective I was hoping to hear.
The resolver model you describe (dumb functions, map-in → map-out, parallelizable) is appealing. It's similar to what I find elegant about React's model too - components as pure functions of props/state. The difference is where the "smarts" live: in the dependency graph engine vs in the reconciliation/diffing layer.
Your point about the 95% case resonates with vui.el's approach. We do have vui-use-memo for explicit memoization, so expensive computations can be cached with declared dependencies. It's the middle ground: you opt in to memoization where it matters, rather than having an engine track everything automatically.
For typical Emacs UIs (settings panels, todo lists, file browsers), re-rendering the component tree on state change is fast enough that you rarely need it. But when you do — large derived data, expensive transformations — vui-use-memo is there. The tradeoff is explicit deps vs automatic tracking: you tell it what to cache and when to invalidate, rather than the framework inferring it.
That said, I'm planning to build a more complex UI for https://github.com/d12frosted/vulpea (my note-taking library) - browsing/filtering/viewing notes with potentially large datasets. That'll be a real test of whether my performance claims hold up against reality. So if vui.el ever needs to go there, the component model doesn't preclude adding finer-grained updates later. The should-update hook already lets you short-circuit re-renders, and memoization could be added at the vnode level.
The caching/invalidation complexity you mention is what made me hesitant to start there. "Explicit deps are easier to trace than magic" was the tradeoff I consciously made. But I'm genuinely curious - if you do experiment with Pathom-backed GUI, I'd love to hear how it goes. Especially around the cache invalidation edge cases you mentioned.
I wrote my last message and then thought "oh gosh, it's so long, no one will read it". But I'm glad to find someone thinking about similar problems :))
> It's the middle ground: you opt-in to memoization where it matters, rather than having an engine track everything automatically.
What is the downside of just memoizing everything?
> you tell it what to cache and when to invalidate
I'd be curious to hear more on this. I think the conceptual problem I'm hitting is more that these derived-state systems (and this goes for a Pathom engine or a subscription model) seem to work optimally on "shallow" derived states. Ex: you have a "username", you fetch its avatar (one derived state), and then render it (maybe another derived state). That's fine.
But if you now have a list of users... each has a username and a derived avatar and rendered image... and the list of users is changing with people being added and removed - this gets messy.
With Pathom you can make username, avatar, and render resolvers for the derived states. It's nicely decoupled and you can even put it in a separate library. With cljfx subscriptions you can make subscriptions based on the state + username. It's more coupled, but you no longer need an engine. Functionally it's the same.
But when it comes to the cache, I don't actually know any system that would handle this "correctly" - clearing cache entries when users are removed. With the current solutions you only seem to have two options (rough sketch after the list):
- You just sort of guess and make a "large enough" FIFO cache. You either eat memory or thrash the cache.
- Make resolvers/subscriptions on whole lists. So now you have usernames->avatars->renderings. This makes the derived states "shallow" again. If any entry is changed, the whole list of derived states is recalculated. Memory-optimal and fast to fetch (you don't need to check the memoization cache a ton). But if your list is rapidly changing, then you're doing a ton of recalculation.
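In code the two options look roughly like this (fetch-avatar is a made-up stand-in):

    (defn fetch-avatar [username]                   ; stand-in for the real work
      (str "https://example.com/avatars/" username ".png"))

    ;; 1. Per-item memoization: fine-grained, but entries for removed users linger.
    ;;    (Plain memoize never evicts; in practice you'd reach for
    ;;    clojure.core.memoize's fifo/lru with a guessed size.)
    (def avatar* (memoize fetch-avatar))

    ;; 2. One "shallow" resolver over the whole list: memory-friendly and simple,
    ;;    but any change to the list recomputes every entry.
    (defn avatars [usernames]
      (mapv fetch-avatar usernames))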
I actually haven't worked with React directly, so I'd be curious to know how they propose dealing with this.
> if you do experiment with Pathom-backed GUI, I'd love to hear how it goes
Working in geology, I find the opposite problem. Field work is so highly valued that we're at a place where we have so much data and not enough people actually working on and analyzing it. My general impression is that in some subfields, work that's done exclusively using preexisting data is kind of looked down on. In my opinion tons and tons of money is essentially wasted collecting new data - and then it's poorly catalogued and hard to access. You typically have to email some author and hope they send you the data. People are fiercely protective of their data because it took a lot of effort to collect and they want credit and to be in on any derivative work (and not just a reference at the bottom of a paper)
I would say the main workflow is: collect some new data nobody has collected before, look at it to see if it shows anything interesting, and come up with some interesting publishable interpretation.
It feels like it'd be smarter to start by working with existing data and publish that way. If you hit on some specific missing piece, go collect that data, and work from there. But the incentive structures aren't aligned with this
The AI angle is really shoehorned in, and irrelevant to the larger problem. Sure, it allows you to annotate more data. Obviously it's more fun to go do field work than to count pollen grains under a microscope. If anything, AI makes it easier to do more fieldwork and collect even more data, because now you can in theory crunch it faster
Given the way big tech plays fast and loose with other people's data, I don't suppose the siloed nature of geological data is going to get better any time soon.
Perhaps creating secure private clouds that scientists can access, away from AI scrapers etc. and with associated counter-surveillance, is the way forward.
I'm a GIS guy working on cloud-native tech, but with a focus on privacy. I have a local-first, Mac-native product nearing beta. I'm thinking a lot about what the data-sharing options could be at the moment.
I don't see what the problem is. AI is mostly irrelevant. Okay, they scrape your data... but then what? If the data isn't officially published and doesn't have a DOI, anything built on it won't be accepted
Some people scrape charts in publications to extract data. This has been done for a while. Maybe AI could automate this step. That'd be useful
I understand that publications are the currency of academics but they're largely irrelevant in business. Geological data are valuable and if an oil exploration company finds a nice dataset they can scrape, they're not going to publish it.
From a pure business perspective, AI is largely about copyright circumvention. The laws are lagging and people are making serious money from data theft.
Aren't you describing trade secrets? I don't see how AI makes that any better or worse. If your competitor gets his hands on your proprietary dataset you're sunk regardless of AI, right?
I don't see how copyright enters into it. I doubt that "oh hey I published this very valuable and proprietary dataset online but it's copyright me so pretty please don't use it to make money" was ever going to get you anywhere to begin with.
Am I understanding it correctly? So if a company is internally using a competitor's stolen data directly, then if anyone finds out, they're in legal trouble. But if they train a model on the data and then use the model, they're in the clear?
Yes I think there's evidence for that. Looking at recent precedents, even if the data are illegally downloaded, big tech has been getting away with using copyrighted data, for example:
This is largely solved in biomedicine by funders (not journals) and regulatory bodies requiring that human subjects research data be stored with NIH.
I guess there may be a broader and less public-oriented set of funders in geology - and maybe there aren’t as many standardized data types as there are in the world of biology.
What you need is people uploading data in consistent, well-documented formats. There are all sorts of projects that do this, but there is a strong incentive to not upload things, or to sort of half upload them... in a way where anyone using them is going to have to reach out to you. I'm not suggesting bad intentions - maybe you're still working with the data and expect to publish more and don't want someone swooping in and beating you to the punch. Typically journals require data availability, but it's kind of informal and ad hoc
It's interesting to contrast with Wikipedia. I'm not deeply involved with either, so I'm talking out of my ass and would be curious to hear other people's thoughts here. But Wikipedia has gone to great lengths to keep the data side, Wikidata, and the app/website decoupled. I'm guessing iNaturalist hasn't?
The OpenStreetMap model is also interesting, where they basically only provide the data and expect others to make the apps/websites
That said, it's also interesting that there hasn't been any big hit with people building new apps on top of Wikidata (I guess the website and Android app are technically different views on the same thing)
I’m not convinced that that’s an accurate view of Wikidata. Wikidata is a basically disconnected project. There is some connection, but it’s really very minimal and only for a small subset of Wikipedia articles. Wikipedia is 99% just text articles, not data combined together.
Frankly, I think the reason people haven’t built apps on top of Wikidata is that the data there isn’t very useful.
I say this not to diss Wikimedia, as the Wikipedia project itself is great and an amazing tool and resource. But Wikidata is simply not there.
I am also frustrated with Wikidata. The one practical use I've seen is that a lot of OpenStreetMap places' multilingual names are locked to Wikidata, which makes it harder for a troll to drop in and rename something, and may encourage maintaining and reusing the data.
But I tried to do some Wikidata queries for stuff like: what are all the neighborhoods and districts of Hong Kong, or all the counties in Taiwan, and the coverage is piecemeal, the tags differ from one entity to another, and not everything in a group is linked to OSM. It's not much of an improvement over Wikipedia's Category pages.
Wikidata is a separate project, specifically for structured data in the form of semantic triples [0]. It's essentially the open-source version of Google's KnowledgeGraph; both sourced a lot of their initial data from Metaweb's Freebase [1], which Google acquired in 2010.
> But Wikipedia has gone to great lengths to make the data side, Wikidata, and the app/website, decoupled.
A big part of that is that different language editions of wikipedia are very decoupled. One of the goals of wikidata was to share data between different language wikipedias. It needed to be decoupled so it was equal to all the different languages.
I made a GUI with cljfx, which uses JavaFX, and I didn't really hit any issues (save for one bug on startup that I've had trouble ironing out). The app is snappy and feels as native as anything else
I ended up with a very modular, functional GUI program
The only thing I wasn't super happy about is that I couldn't package it as a simple .bin/.exe because the jpackage system forces you into making an installer/application (it's been a few years since, so it's possible there's a GraalVM native-image solution now)
I highly recommend cljfx. It's the opposite of clunky
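To give a flavor, this is roughly the hello-world from the cljfx README - you describe the whole scene graph as plain data and cljfx builds the JavaFX objects:

    (ns example
      (:require [cljfx.api :as fx]))

    (fx/on-fx-thread
      (fx/create-component
        {:fx/type :stage
         :showing true
         :title   "Hello cljfx"
         :width   300
         :height  100
         :scene   {:fx/type :scene
                   :root    {:fx/type :v-box
                             :alignment :center
                             :children  [{:fx/type :label
                                          :text    "Hello world"}]}}}))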
Honor, strangely enough, doesn't make any real effort to support Linux
The machine quality is pretty damn good, but Huawei machines are still better. Apple level of quality. And Huawei releases their machines with Linux preinstalled
The company to watch is Wiko. It's their French spin-off to sidestep their chip ban. They might put out some very nice laptops, but it's a bit TBD