
Cool project! And thanks for mentioning "unum-cloud/USearch" among repo examples :)

Every founder probably dreams of a press release like this — complete with testimonials from the CEOs of OpenAI, Anthropic, Meta, xAI, Microsoft, CoreWeave, AWS, Google, Oracle, Dell, HPE, and Lenovo.

There aren’t many technical details about the new GPUs yet, but the notes on the Vera CPU caught my eye. NVIDIA Spatial Multithreading sounds like their take on SMT — something you don’t usually see on Arm-based designs. Native FP8 support is also notable, though it’s still unclear how it will be exposed to developers in practice.

Overall it looks like an interesting CPU, but it doesn’t feel like it’s in the same league as the rumored Apple M5 Ultra.


> though it’s still unclear how it will be exposed to developers in practice.

PTX instructions and compiler intrinsics, depending on which level of abstraction you’re targeting.


PTX is on the GPU side and is already supported on available models. On the CPU side, it must be some form of an Arm ISA extension, I believe, like NEON-FHM or SVE-AES… I'm just not sure what the scope of those extensions would be and how they will coexist with ARM’s other extensions.

I’m currently using a mix of Zed, Sublime, and VS Code.

The biggest missing piece in Zed for my workflow right now is side-by-side diffs. There’s an open discussion about it, though it hasn’t seen much activity recently: https://github.com/zed-industries/zed/discussions/26770

Stronger support for GDB/LLDB and broader C/C++ tooling would also be a big win.

It’s pretty wild how bloated most software has become. Huge thanks to the people behind Zed and Sublime for actively pushing in the opposite direction!


> The biggest missing piece in Zed for my workflow right now is side-by-side diffs.

> It’s pretty wild how bloated most software has become.

It's a bit ironic to see those two in the same message, but I'd argue that right there is an example of why software becomes bloated. There is always someone who says "but it would be great to have X" about a feature that might seem tangentially relevant in spirit, but is a whole ordeal of its own.

Diffing text, for example, requires a very different set of tools and techniques than what a plain text editor would already have. That's why there are standalone products like Meld and the very good Beyond Compare; and they tend to be much better than a jack-of-all-trades editor (at least I was never able to like the diff UI in e.g. VSCode more than Meld's UI or BC's customization features).

Same for other tangential stuff like VCS integration; VSCode has something in there, but any special purpose app is miles ahead in ease of use and features.

In the end, the creators of an editor need to spend so much time adding what amounts to supplemental and peripheral features, instead of focusing on the best possible core product. Expectations are so high that the sky is the limit. Everyone wants their own pet sub-feature ("when will it integrate a Pomodoro timer?").


This is a sharp observation, and it goes even further: BeyondCompare easily allows one to hop into an editor at a specific location, while Total Commander, with its side-by-side view of the world, is an excellent trampoline into BeyondCompare. In this kind of ecosystem (where visual tools strive to achieve some Unix-like collaboration), the superpower of editors (and IDEs) is their scripting language, and in this arena it is still hard to beat Emacs (with capabilities that were already present maybe 40 years ago).

Fully agree.

People call "bloat" the features they don't need, and "deal breakers" the lack of features they want besides good text editing.


I agree with that comment chain about the IntelliJ diff view so much.

https://github.com/zed-industries/zed/discussions/26770#disc...

I don't even need that to be built into the editor – I would pay for a fast, standalone git UI that is as good as the IntelliJ one. I use Sublime Merge right now and it's kind of OK, but definitely not on the same level.


I dunno if you are aware, but you can use the diff/merge standalone. My gift to you:

     [mergetool "intellij"]
         cmd = 'intellij-idea-ultimate-edition' merge "$LOCAL" "$REMOTE" "$BASE" "$MERGED"
         trustExitCode = true
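
With that in your gitconfig, a conflicted merge can then be opened via git mergetool --tool=intellij, or by default after git config merge.tool intellij. The binary name above is distro-specific, so adjust it to however IntelliJ is launched on your system.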

I mostly use git from the terminal, but the goodness of the IntelliJ UI for cherry-picking changes is one of the things that has me maintaining my Toolbox subscription. Also, IdeaVim is a really solid vim implementation, IMO.

If they factored out the VCS UI into a standalone non-IDE product that starts and runs a little faster than their IDEs and doesn't care about your project setup, I'd even pay a subscription for it.

> I’m currently using a mix of Zed, Sublime, and VS Code.

Can you elaborate on when you use which editor? I'd have imagined that there's value in learning and using one editor in-depth, instead of switching around based on use-case, so I'd love to learn more about your approach.


Different user, but I prefer to use different editors for:

- project work, i.e. GUI, multiple files, with LSP integration (zed)

- one-off/drive-by edits, i.e. terminal, small, fast, don't care much about features (vim)

- non-code writing, i.e. GUI, different theme (light), good markdown support (coteditor)

I don't like big complex software, so I stay away from IDEs; ideally, I'd like to drop zed for something simpler, without AI integration, but I haven't found anything that auto-formats as well.


My workflow isn't very common. I typically have 3-5 projects open on the local machine and 2 cloud instances (x86 and Arm). Each project has files in many programming languages (primarily C/C++/CUDA, Python, and Rust), and the average file is easily over 1'000 LOC, sometimes over 10'000 LOC.

VS Code glitches all the time, even when I keep most extensions disabled. A few times a day, I need to restart the program, as it just starts blinking/flickering. Diff views are also painfully slow. Zed handles my typical source files with ease, but lacks functionality. Sublime comes into play when I open huge codebases and multi-gigabyte dataset files.


In my case, I use zed for almost everything, and vscodium for three things:

- search across all files; easier to navigate the results with the list of matching lines in the sidebar, and traversing the results with cursor up/down, giving full context

- git; side-by-side diff, better handling of staging, and it doesn't automatically word-wrap commit messages (I prefer doing that myself)

- editing files which have a different type of indentation than what is configured in zed, since zed does not yet have autodetect


It looks like the most recent reply in that thread notes an early feature flag for testing that!

https://github.com/zed-industries/zed/discussions/26770#disc...



Have you looked at the size of the Zed binary lately? How is it pushing against bloat, especially compared to Sublime?

Not a fan of Windows either, but playing devil’s advocate here: Apple’s Finder has steadily gotten worse over the last ~16 years, at least in my experience. It increasingly struggles with basic functionality.

There seems to be a pattern where higher market cap correlates with worse ~~tech~~ fundamentals.


Why would a company be incentivized to improve the user experience in ways that aren't profitable? Especially after watching the number one tech company literally worsen UX to increase profitability.


Thanks a lot for the correction! I'll adjust the references in a bit.


I was just about to ask some friends about it. If I’m not mistaken, Postgres began using ICU for collation, but not string matching yet. Curious if someone here is working in that direction?


Levenshtein distance calculations are a pretty generic string operation; genomics happens to be one of the domains where they are most used... and a passion of mine :)
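
For anyone unfamiliar, the textbook version is a simple dynamic program. A minimal two-row sketch in Go (operating on bytes, which is fine for ASCII alphabets like DNA, and nothing like StringZilla's SIMD kernels):

    package main

    import "fmt"

    // levenshtein computes the edit distance between two byte strings
    // with the classic recurrence, keeping only two rows of the DP table.
    func levenshtein(a, b string) int {
        prev := make([]int, len(b)+1)
        curr := make([]int, len(b)+1)
        for j := range prev {
            prev[j] = j // distance from the empty prefix: j insertions
        }
        for i := 1; i <= len(a); i++ {
            curr[0] = i
            for j := 1; j <= len(b); j++ {
                cost := 1
                if a[i-1] == b[j-1] {
                    cost = 0
                }
                curr[j] = min(prev[j]+1, min(curr[j-1]+1, prev[j-1]+cost))
            }
            prev, curr = curr, prev
        }
        return prev[len(b)]
    }

    func min(x, y int) int {
        if x < y {
            return x
        }
        return y
    }

    func main() {
        fmt.Println(levenshtein("GATTACA", "GCATGCU")) // 4
    }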


This is a very good example! Still, “correct” needs context. You can be 100% “correct with respect to ICU”. It’s definitely not perfect, but it’s the best standard we have. And luckily for me, it also defines the locale-independent rules. I can expand to support locale-specific adjustments in the future, but I’m waiting for adoption to grow before investing even more engineering effort into this feature. Maybe worth opening a GitHub issue for that :)


Right, nothing wrong with delegating the decision to a bunch of people who have thought long and hard about the best compromise, as long as it’s understood that it’s not perfect.


This article is about the ugliest — but arguably the most important — piece of open-source software I’ve written this year. The write-up ended up long and dense, so here’s a short TL;DR:

I grouped all Unicode 17 case-folding rules and built ~3K lines of AVX-512 kernels around them to enable fully standards-compliant, case-insensitive substring search across the entire 1M+ codepoint Unicode range, operating directly on UTF-8 bytes. In practice, this is often ~50× faster than ICU, and also less wrong than most tools people rely on today, from grep-style utilities to products like Google Docs, Microsoft Excel, and VS Code.
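
To make "less wrong" concrete, here's a tiny standalone Go sketch (my illustration, unrelated to StringZilla's API): even the standard library's strings.EqualFold only implements simple, one-to-one case folding, missing the one-to-many mappings that full case folding requires.

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // Simple one-to-one folding works: the Kelvin sign U+212A folds to 'k'.
        fmt.Println(strings.EqualFold("\u212Am", "km")) // true

        // Full case folding is one-to-many: U+00DF 'ß' folds to "ss".
        // Simple folding can't express that, so this comparison fails.
        fmt.Println(strings.EqualFold("straße", "STRASSE")) // false
    }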

StringZilla v4.5 is available for C99, C++11, Python 3, Rust, Swift, Go, and JavaScript. The article covers the algorithmic tradeoffs, benchmarks across 20+ Wikipedia dumps in different languages, and quick starts for each binding.

Thanks to everyone for feature requests and bug reports. I'll do my best to port this to Arm as well — but first, I'm trying to ship one more thing before year's end.


This is exactly the kind of thankless software the world operates on. It’s unfortunate that such fundamental code hasn’t already been vectorized to the gills, but thank you for doing so! It’s excellent work.


Thank you for this, and congrats on your achievement!


> I grouped all Unicode 17 case-folding rules

But why are you using the case-folding rules and not the collation rules?


Yes, CaseFolding.txt. I'm considering using the collation rules for sorting. For now, the sorting functions only target lexicographic comparisons and seem to be 4x faster than Rust's standard quicksort implementation, but few people use them: https://github.com/ashvardanian/StringWars?tab=readme-ov-fil...
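
For context on what collation changes in practice, here is a small sketch with Go's golang.org/x/text/collate (my illustration; StringZilla's sorting doesn't use this package):

    package main

    import (
        "fmt"
        "sort"

        "golang.org/x/text/collate"
        "golang.org/x/text/language"
    )

    func main() {
        words := []string{"zebra", "Éclair", "apple"}

        // Lexicographic (byte-wise) order puts "Éclair" last, because the
        // UTF-8 lead byte of 'É' (0xC3) compares greater than 'z' (0x7A).
        byBytes := append([]string(nil), words...)
        sort.Strings(byBytes)
        fmt.Println(byBytes) // [apple zebra Éclair]

        // Collation applies language-aware rules instead: 'É' sorts near 'e'.
        collate.New(language.English).SortStrings(words)
        fmt.Println(words) // [apple Éclair zebra]
    }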


This is a truly amazing accomplishment. Reading these kernels is a joy!


Thank you

Do the Go bindings require cgo?


The Go bindings – yes, they are based on cgo. I realize it's suboptimal, but it seems like the only practical option at this point.
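
For reference, a cgo binding boils down to something like this (a generic sketch, not the actual StringZilla binding):

    package main

    /*
    #include <stdlib.h>
    #include <string.h>
    */
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        s := C.CString("hello") // copies the Go string into C-allocated memory
        defer C.free(unsafe.Pointer(s))

        // Each C.* call switches off the goroutine stack onto a system
        // stack, which is the main source of cgo's per-call overhead.
        fmt.Println(C.strlen(s)) // 5
    }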


In a normal world, the Go C FFI wouldn't have insane overhead, but what can we do: the language is perfect, and it will stay that way until morale improves.

Thanks for the work you do


There are undoubtedly still some optimizations lying around, but the biggest source of Go's FFI overhead is goroutines.

There are only two "easy" solutions I can see: switch to an N:N threading model or make the C code goroutine-aware. The former would speed up C calls at the expense of slowing down lots of ordinary Go code. Personally, I can still see some scenarios where that's beneficial, but it's pretty niche. The latter would greatly complicate the use of cgo, and defeat one of its core purposes, namely having access to large hard-to-translate C codebases without requiring extensive modifications of them.

A lot of people compare Go's FFI overhead to that of other natively compiled languages, like Zig or Rust, or to managed runtime languages like Java (JVM) or C# (.NET), but those alternatives don't use green threads (the general concept behind goroutines) as extensively. If you really want to compare apples-to-apples, you should compare against Erlang (BEAM). As far as I can tell, Erlang NIFs [1] are broadly similar to purego [2] calls, and their runtime performance [3] has more or less the same issues as CGo [4].

[1]: https://www.erlang.org/doc/system/nif.html

[2]: https://pkg.go.dev/github.com/ebitengine/purego

[3]: https://erlang.org/documentation/doc-10.1/doc/efficiency_gui...

[4]: https://www.reddit.com/r/golang/comments/12nt2le/when_dealin...


Java has green threads, and C#/.NET has logical threads.


Yes, I have cleaned up the wording a bit. Also, the common implementation of Rust's async is comparable to green threads, and I think Zig is adopting something like it too.

However, the "normal" execution model on all of them is heavyweight native threads, not green threads. As far as I can tell, FFI is either unsupported entirely or has the same kind of overhead as in Go and Erlang when used from those languages' green threads.


Genuine question: you make it seem as if this is a limitation and they're all in the same bucket, but how was Java, for example, able to scale all those enterprises while having multithreading and good FFI? Same with .NET.

My impression is that Go's FFI has big overhead because of specific choices made to not care about FFI, since that would benefit the Go code more?

My point was that there are other GC languages/environments that have good FFI and were somehow able, all these decades, to create scalable multithreaded applications.


I would suggest gaining a better understanding of the M:N threading model versus the N:N threading model. I do not know that I can do it justice here.

Both Java and Rust flirted with green threads in their early days. Java abandoned them because the hardware wasn't ready yet, and Rust abandoned them because they require a heavyweight runtime that wasn't appropriate for many applications Rust was targeting. And yet, both languages (and others besides) ended up adding something like them later anyway, albeit sitting beside, rather than replacing, the traditional N:N threading they primarily support.

Your question might just be misdirected; one could view it as operating systems, and not programming languages per se, that screwed it all up. Their threads, which were conservatively designed to be as compatible as possible with existing code, have too much overhead for many tasks. They were good enough for a while, especially as multicore systems started to enter the scene, but their limitations became apparent after e.g. nginx could handle 10x the requests of Apache httpd on the same hardware. This gap would eventually be narrowed, to some extent, but it required a significant amount of rework in Apache.

If you can answer the question of why ThreadPoolExecutor exists in Java, then you are about halfway to answering the question about why M:N threading exists. The other half is mostly ergonomics; ThreadPoolExecutor is great for fanning out pieces of a single, subdividable task, but it isn't great for handling a perpetual stream of unrelated tasks that ebb and flow over time. EDIT: See the Project Loom proposal for green threads in Java today, which also brings up the ForkJoinPool, another approach to M:N threading: https://cr.openjdk.org/~rpressler/loom/Loom-Proposal.html
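
To make the M:N model concrete, a small Go sketch: 100k goroutines (M) are routine because each starts with a few-KiB stack and gets multiplexed onto a handful of OS threads (N), whereas 100k OS threads would be prohibitively heavy.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        // N: the OS threads the runtime schedules onto (defaults to CPU count).
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

        // M: goroutines are cheap enough that spawning 100k is unremarkable;
        // the runtime multiplexes them onto the N threads above.
        var wg sync.WaitGroup
        for i := 0; i < 100_000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
            }()
        }
        wg.Wait()
    }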


In a real (not "normal") world, trade-offs exist, and Go chose a specific set of design points that are consequential.


Yes — fuzzy and phonetic matching across languages is part of the roadmap. That space is still poorly standardized, so I wanted to start with something widely understood and well-defined (ICU-style transforms) before layering on more advanced behavior.

Also, as shown in the later tables, the Armenian and Georgian fast paths still have room for improvement. Before introducing higher-level APIs, I need to tighten the existing Armenian kernel and add a dedicated one for Georgian. It’s not a true bicameral script, but some characters are fold targets for older scripts, which currently forces too many fallbacks to the serial path.


Even when transliteration is somewhat de-facto standardised, it usually depends on the target/host language. So e.g. Arabic and Russian are transliterated differently into English, French, German, Dutch, etc.

