That's a really sharp parallel. "Did behavior change" is exactly the question in both cases, and the surface-level representation lies to you in both. We normalize ASTs before hashing so reformatting or renaming a local variable doesn't register as a change. Curious what normalization looks like on the agent observability side; it feels like a harder problem when the output is natural language instead of code.
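To make the idea concrete, here's a minimal sketch of "normalize before hashing" using Python's stdlib ast module (not sem's actual implementation, and `structural_hash` here is just an illustrative name). This version only normalizes formatting and position info; being invariant to local renames, as described above, would additionally need an alpha-renaming pass over the symbol table:

```python
import ast
import hashlib

def structural_hash(source: str) -> str:
    """Hash a normalized AST so formatting-only edits hash identically."""
    tree = ast.parse(source)
    # ast.dump already ignores whitespace and comments; leaving out
    # attributes drops line/column info so layout changes don't leak in
    canonical = ast.dump(tree, annotate_fields=False, include_attributes=False)
    return hashlib.sha256(canonical.encode()).hexdigest()

a = structural_hash("def f(x):\n    return x + 1")
b = structural_hash("def f( x ):  return x + 1")   # reformatted only
c = structural_hash("def f(x):\n    return x + 2")  # behavior changed
assert a == b
assert a != c
```

Two entities hash equal exactly when their normalized trees are identical, which is the "did the developer change this entity" boundary rather than a compiler-level equivalence check.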
Pretty much, yeah. sem graph builds a cross-file entity dependency graph and sem impact does transitive traversal on it. So you can ask "if I change this function, what breaks?" and get a deterministic answer without sending anything to an LLM.
difftastic is solid. The difference is roughly: syntax-aware (difftastic) knows what changed in the tree, sem knows which entity changed and whether it actually matters. difftastic will show you that a node in the AST moved. sem will tell you "the function processOrder was modified, and 3 other functions across 2 files depend on it." difftastic is a better diff. sem is trying to be a different layer on top of git entirely.
Exactly. The only reason line-level diffs survived this long is that text is the lowest common denominator. Once you have fast enough parsers (tree-sitter parses most files in under 1ms), there's no reason to stay at the line level.
sem already does this. sem graph builds a cross-file entity dependency graph and sem impact tells you "if this function changes, these other entities across these many files are affected." It's transitive too, follows the full call chain.
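The transitive part is just reachability over the dependency graph. A toy sketch (the graph and entity names are hypothetical, not sem's data model):

```python
from collections import deque

# Hypothetical slice of a cross-file entity dependency graph:
# edges point from an entity to the entities that depend on it.
dependents = {
    "processOrder": ["chargeCard", "sendReceipt"],
    "chargeCard":   ["retryPayment"],
    "sendReceipt":  [],
    "retryPayment": [],
}

def impact(entity: str) -> set[str]:
    """Everything transitively affected if `entity` changes (BFS)."""
    seen: set[str] = set()
    queue = deque([entity])
    while queue:
        for dep in dependents.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

assert impact("processOrder") == {"chargeCard", "sendReceipt", "retryPayment"}
```

Because it's plain graph traversal, the answer is deterministic and needs no LLM in the loop.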
You're right that "semantic" is doing some heavy lifting in the name. We use it to mean "understands code structure" rather than "understands code meaning." sem knows that something like "validateToken" is a function and tracks it as an entity across versions, but it doesn't know what validation means. For the merge use case (weave - https://ataraxy-labs.github.io/weave/), that level of understanding is enough to resolve 100% of false conflicts on our benchmarks. LLM-powered semantic understanding is the next layer, and that's what our review tool (inspect) does: it uses sem's entity extraction to triage what an LLM should look at.
Different project, same author. sem has been sem since the first commit. Beagle looks interesting, storing ASTs directly in a KV store is a different approach. sem stays on top of git so there's zero migration cost, you keep your existing repo and workflows.
Fair point on AST vs semantic. sem sits somewhere in between. It doesn't go as far as checking compiled output equivalence, but it does normalize the AST before hashing (we call it structural_hash), so purely cosmetic changes like reformatting or renaming a local variable won't show as a diff. The goal isn't "would the compiler produce the same binary" but "did the developer change the behavior of this entity." For most practical cases that's the useful boundary. The YAML/JSON ordering point is interesting, we handle JSON keys as entities so reordering doesn't conflict during merges.
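The JSON key-ordering case can be illustrated with stdlib json alone (a minimal sketch, not how sem models entities internally): serialize with sorted keys so object key order can never show up as a diff, while array order is preserved because the spec makes it meaningful:

```python
import json

def canonical(doc: str) -> str:
    """Serialize JSON with sorted keys so key order never registers as a change."""
    return json.dumps(json.loads(doc), sort_keys=True, separators=(",", ":"))

# Reordered object keys normalize to the same form...
assert canonical('{"b": 1, "a": 2}') == canonical('{"a": 2, "b": 1}')
# ...but array element order is kept, since the spec says it matters.
assert canonical('[1, 2]') != canonical('[2, 1]')
```

Hashing the canonical form then gives the same "cosmetic changes don't diff" property as the structural hash on code.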
Regarding the custom normalization step, that makes sense, and I don't really have much more to add either. Looked into it a bit further since; it seems that with programming languages specifically, the topic gets gnarly pretty quickly for various language-theory reasons, so the solution you settled on is understandable. I might spend some time looking at how various semantic tools compare, I'd imagine they probably aim for something similar.
> The YAML/JSON ordering point is interesting, we handle JSON keys as entities so reordering doesn't conflict during merges.
Just to clarify, I specifically meant the ordering of elements within arrays, not the ordering of keys within an object. The order of keys in an object is relaxed as per the spec, so normalizing across that is correct behavior. What I'm doing with these other tools is technically a spec violation, but since I know that downstream tooling is explicitly order-invariant, it all still works out and helps a ton. It's pretty ironic too: I usually hammer on about not liking there being options, but in this case an option is exactly the right way to go about this; you would not want this as a default.
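That opt-in shape could look something like this (a hypothetical sketch, not any of the tools mentioned): a structural equality check where order-invariant arrays are an explicit flag, defaulting to the spec-compliant behavior:

```python
def equal(a, b, *, ignore_array_order=False):
    """Compare parsed JSON values; optionally treat arrays as multisets."""
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return False
        if not ignore_array_order:
            return all(equal(x, y) for x, y in zip(a, b))
        # Multiset match: pair each element of a with an unused match in b.
        unmatched = list(b)
        for x in a:
            for i, y in enumerate(unmatched):
                if equal(x, y, ignore_array_order=True):
                    del unmatched[i]
                    break
            else:
                return False
        return True
    if isinstance(a, dict) and isinstance(b, dict):
        # Key order is already irrelevant for dicts, per the spec.
        return a.keys() == b.keys() and all(
            equal(a[k], b[k], ignore_array_order=ignore_array_order) for k in a
        )
    return a == b

assert not equal([1, 2], [2, 1])                          # spec-compliant default
assert equal([1, 2], [2, 1], ignore_array_order=True)     # opt-in for order-invariant consumers
```

Keeping the flag off by default matches the point above: order invariance is only safe when you know the downstream consumer doesn't care.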
Ah right, array ordering not key ordering. That's a different beast. You're making a deliberate semantic choice because you know your consumers are order-invariant. We can't really do that at our level since function ordering in code is usually meaningful to the language. Your use case needs domain knowledge about the consumer, which is exactly why an option makes sense there.
If you do end up comparing semantic tools I'd love to hear what you find. The space is weirdly fragmented between syntax-aware, normalized-AST, and domain-specific (dyff/jd). Everyone calls it "semantic" but they're solving pretty different problems.