philippemnoel's comments

(ParadeDB maintainer here). This is super cool. Congrats on the project, and I'm excited to see ParadeDB be used to power this kind of use case. If there's anything else you need to ship Omni, don't hesitate to reach out to me!

This is a good time to be offering hybrid search extensions. I just did that myself recently with pgvector for a documentation site.

Does ParadeDB work with Render? They seem to have a whitelist of extensions https://render.com/docs/postgresql-extensions


We just made a blueprint for it! https://github.com/paradedb/render-blueprint

One-click deploy with Render, and we're directly in contact with the core team to get it added to their official docs. I hear the PR is up internally :)


Sweet, nice work

Thanks Philippe! You guys have been super helpful on slack!

Anytime! We have some vector search work coming in the next few weeks/months that I expect you'll find interesting. Stay tuned :)

That's true. For this reason, most modern search engines support language-aware stemming and tokenization. Popular tokenizers for CJK languages include Lindera and Jieba.

We (ParadeDB) use a search library called Tantivy under the hood, which supports stemming in Finnish, Danish and many other languages: https://docs.paradedb.com/documentation/token-filters/stemmi...
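To make the tokenization point concrete, here is a toy Python sketch (not Tantivy, Lindera, or Jieba code) showing why whitespace splitting fails on CJK text. The function names are hypothetical, and overlapping character bigrams are swapped in as a simple stand-in: Lindera and Jieba do dictionary-based segmentation, which this does not attempt.

```python
import re

# Hiragana/Katakana and CJK Unified Ideographs; an assumed, simplified range.
CJK_RUN = r"[\u3040-\u30ff\u4e00-\u9fff]+"
TOKEN_RE = re.compile(CJK_RUN + r"|[A-Za-z0-9]+")
CJK_RE = re.compile(CJK_RUN)


def whitespace_tokens(text):
    """Naive tokenizer: lowercases and splits on whitespace only."""
    return text.lower().split()


def bigram_tokens(text):
    """Emit overlapping character bigrams for CJK runs, whole words otherwise."""
    tokens = []
    for run in TOKEN_RE.findall(text):
        if CJK_RE.fullmatch(run):
            if len(run) == 1:
                tokens.append(run)
            else:
                # "全文検索" -> "全文", "文検", "検索"
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens
```

With whitespace splitting, a query for "検索" never matches a document containing "全文検索", because the whole phrase is one token. The bigram version produces "検索" as its own token, so the match succeeds.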


ParadeDB | https://paradedb.com | SF Onsite + Remote | Full-Time | Rust Database Engineers

ParadeDB is an alternative to Elasticsearch built on Postgres. We're building a Postgres extension in Rust that offers a new index type optimized for full-text search and aggregate/analytics workloads. We solve three problems with Elasticsearch today:

- Lack of read-after-write guarantees

- Lack of JOINs

- Infrastructure complexity & cost due to syncing Postgres and Elastic.

We're open-source, and our repository is available at https://github.com/paradedb/paradedb. We're a Series A team of 8 distributed across the US and Canada. Most folks on our team have 10+ years of experience in database internals at companies like Twitter, MongoDB, Oracle, Instacart, etc.

You can find our roles and the profiles of our team members here: https://paradedb.notion.site

If you know Rust and/or have experience working on DB internals and want to work on cool systems problems, shoot us a note. We hire with conviction, have lots of room to grow, and exciting technical problems to solve. My email is phil@paradedb.com.


You don't need to. Customers usually deploy us on one or more standalone replicas in their Postgres cluster. If a query were to take it down, it would only take down the replica(s) dedicated to ParadeDB, leaving the primary and all other read replicas dedicated to OLTP safe.


Are you saying that the cluster isn't homogeneous? It sounds like you're describing an architecture that involves a cluster that has two entirely different pieces of software on it, and whose roles aren't interchangeable.


Bear with me, this will be a bit of a longer answer. Today, there are two topologies under which people deploy ParadeDB.

- <some managed Postgres service> + ParadeDB. Frequently, customers already use a managed Postgres (e.g. AWS RDS) and want ParadeDB. In that world, they maintain their managed Postgres service and deploy a Kubernetes cluster running ParadeDB on the side, with one primary instance and some number of replicas. The AWS RDS primary sends data to the ParadeDB primary via logical replication. You can see a diagram here: https://docs.paradedb.com/deploy/byoc

In this topology, the OLTP and search/OLAP workloads are fully isolated from each other. You have two clusters, but you don't need a third-party ETL service since they're both "just Postgres".

- <self-hosted Postgres> + ParadeDB. Some customers, typically larger ones, prefer to self-host Postgres and want to install our Postgres extension directly. The extension is installed in their primary Postgres, and the CREATE INDEX commands must be issued on the primary; however, they may route reads only to a subset of the read replicas in their cluster.

In this topology, all writes could be directed to the primary, all OLTP read queries could be routed to a pool of read replicas, and all search/OLAP queries could be directed to another subset of replicas.

Both are completely reasonable approaches and depend on the workload. Hope this helps :)
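For the self-hosted topology, the routing described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `QueryRouter` class, the query kinds, and the host names are all hypothetical, and in practice this job is usually done by a connection pooler or the application's data layer rather than hand-rolled code.

```python
import itertools


class QueryRouter:
    """Toy router: one primary plus two disjoint replica pools (hypothetical)."""

    def __init__(self, primary, oltp_replicas, search_replicas):
        if not oltp_replicas or not search_replicas:
            raise ValueError("each replica pool needs at least one host")
        self.primary = primary
        # Round-robin within each pool; the pools never share a host.
        self._pools = {
            "oltp_read": itertools.cycle(oltp_replicas),
            "search": itertools.cycle(search_replicas),
        }

    def route(self, kind):
        """Return the host that should serve a query of the given kind."""
        if kind == "write":
            # All writes (and CREATE INDEX) go to the primary.
            return self.primary
        if kind not in self._pools:
            raise ValueError(f"unknown query kind: {kind!r}")
        return next(self._pools[kind])


# Assumed host names, for illustration only.
router = QueryRouter(
    primary="pg-primary",
    oltp_replicas=["pg-oltp-1", "pg-oltp-2"],
    search_replicas=["pg-search-1"],
)
```

Because the two pools are disjoint, a runaway search/OLAP query can only affect the replicas in the search pool.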


Which of these two is the higher order bit?

* ParadeDB speaks postgres protocol

* These setups don't have a complex ETL pipeline

If you have an ETL pipeline specialized for PG logical replication (as opposed to generic JVM-based Debezium/Kafka setups), you get some fraction of the same benefits. I'm curious about Conduit and its postgres plugin.

That leaves: ParadeDB uses vanilla postgres + rust extension. This is a technology detail. I was looking for an articulation of the customer benefit because of this technologically appealing architecture.


The value props for customers vs. Elasticsearch are:

- ACID w/ JOINs

- Real-time indexing under UPDATE-heavy workloads. Instacart wrote about this, they had to move away from Elasticsearch during COVID because of this problem: https://tech.instacart.com/how-instacart-built-a-modern-sear...

Beyond these two, the added benefits are:

- Infrastructure simplification (no need for ETL)

- Lower costs

Speaking the wire protocol is nice, but it's not worth much.


they both sound like postgres to me, just with different extensions


Our customers typically deploy ParadeDB in a primary-replicas topology, with one primary Postgres node and 2 or more read replicas, depending on read volume. Queries are executed on a single node today, yes.

We have plans to eventually support distributed queries.


Yes, Figma!


One of the ParadeDB maintainers here -- Being PostgreSQL wire protocol compatible is very different from being built inside Postgres on top of the Postgres pages, which is what ParadeDB does. You still need the "T" in ETL, i.e. transforming data from your source into the format of the sink (in your example CrateDB). This is where ETL costs and brittleness come into play.

You can read more about it here: https://www.paradedb.com/blog/block_storage_part_one


Sounds very interesting! Unfortunately AGPL license makes it hard to bring into projects.


How so? Many popular projects are AGPL. MinIO, Grafana, etc.

We wrote about this here: https://www.paradedb.com/blog/agpl


So, I'm not versed enough in legal matters to be certain about this, so I tend to fall back on caution, but (A) customers I've worked with in the past seem to be wary of such copyleft licenses and (B) the contagious nature of such a license would make me think twice about using it in a project of my own as well.

It would be nice to have this notion challenged, but I'm not sure what would change my mind.

I would expect that most commercial companies that use Grafana would obtain a commercial license?


> Postgres is not typically considered to "scale well," but oftentimes this is a statement about its tablespaces more than anything; it has foreign data[4] API, which is how you extend Postgres as single point-of-consumption, foregoing some transactional guarantees in the process. This is how pg_analytics[5] brings DuckDB to Postgres, or how Steampipe[6] similarly exposes many Cloud and SaaS applications. Depending on where you stand on this, the so-called alternative SQL engines may seem like moving in the wrong direction. Shrug.

Maintainer of pg_analytics (now part of pg_search) here. I 100% agree that the statements against Postgres are often exaggerated. In practice, we see both the smallest and the largest companies "just use Postgres" while mid-scale companies often overthink their solution.

That said, there are indeed phenomenal "alternate" SQL engines. I've seen many users see great success on tools like ClickHouse, which ParadeDB is not yet competitive with, and sometimes (dare I say) even Elasticsearch. As for whether this one is one of them... That I couldn't say.


Hi folks, ParadeDB author here. We had benchmarks, but they were super outdated. We just made new ones, and will soon make a biiiig announcement with big new benchmarks. You can see some existing benchmarks vs Lucene here: https://www.paradedb.com/blog/case_study_alibaba

This comparison isn't super fair -- ParadeDB does not have compatibility issues with Postgres; rather, it is directly integrated into the Postgres block storage, query planner, and query executor.


That's true. We have some more ideas for DataFusion in the works, though... Stay tuned!

