Fwiw the specific case which motivated that section in the post was a set of recursive functions we used to denormalise an irregular graph structure (so not suitable for CTE) into a single blob of JSON to be sent to another data store. 99% of the time there were no issues with this but at times of heavier load and on complex subgraphs, those recursive call stacks contributed to severe replication lag on the replicas they were running on.
Moving the traversal logic into the application and just sending SQL queries to Postgres (we don't use an ORM) eliminated the lag. RTT between the application and the db was a few ms and this wasn't user-facing logic anyway, so extra latency wasn't an issue in this case.
Probably the fundamental problem here was a sub-optimal schema, but sometimes you're just working with what you've got. Plus a commenter on Reddit pointed out that if we used pure SQL functions instead of PL/pgSQL, we'd also have seen better performance then.
"Probably the fundamental problem here was a sub-optimal schema, but sometimes you're just working with what you've got. Plus a commenter on Reddit pointed out that if we used pure SQL functions instead of PL/pgSQL, we'd also have seen better performance then."
So, would the better advice not have been to use simpler SQL instead of complex recursive statements, instead of taking a drastic approach to abandon ship (move logic to a completely new layer)?
Also, if you're doing string concats manually for your Json, this might cause some overhead for larger objects. ??
> So, would the better advice not have been to use simpler SQL instead of complex recursive statements, instead of taking a drastic approach to abandon ship (move logic to a completely new layer)?
Probably, yep! But I didn't know that when I wrote it.
I didn't want to give any concrete advice at all tbh. The entire rationale for the post was that I'm not an expert and I've broken prod in some surprising ways and if I share those ways maybe it will stop other people making similar mistakes in future. But I guess I over-stepped in my discussion for this mistake, sorry about that.
> Also, if you're doing string concats manually for your Json, this might cause some overhead for larger objects. ??
Good point, I hadn't considered that part of it. It wasn't string concats, we were building it with `jsonb_set`, but I can definitely see the JSON structure in memory as being part of the problem now you mention it (although maybe that reinforces the argument for doing it in the application layer).
Moving the traversal logic into the application and just sending SQL queries to Postgres (we don't use an ORM) eliminated the lag. RTT between the application and the db was a few ms and this wasn't user-facing logic anyway, so extra latency wasn't an issue in this case.
Probably the fundamental problem here was a sub-optimal schema, but sometimes you're just working with what you've got. Plus a commenter on Reddit pointed out that if we used pure SQL functions instead of PL/pgSQL, we'd also have seen better performance then.