Those are rhetorical questions. I ask them because these are questions you need to know in order to work well with SQL. I cannot become a master of SQL by only learning SQL. I have to learn the specific implementations, I have to run EXPLAIN... etc...
A good abstraction only requires you to know the abstraction not what lies underneath. What we have with SQL is a leaky abstraction. My argument that a high level leaky abstraction placed over a critical bottleneck in the web is a design mistake.
Lol, of course you need a general understanding of how your database works on the inside. There are a million different ways you can store your data and I would argue that choosing your data storage is the most important and tricky decision we have to make as software engineers.
Back at my first big tech company, I remember reading the best document I have ever read related to software engineering. It was entirely devoted to choosing your database/storage system. The very first paragraph of the document was entirely devoted to engraining in your head that "choosing a database is all about tradeoffs". They even had a picture where it just repeated that sentence over and over to really engrain it in you.
Why? Because every database has different performance characteristics such as consistency, latency, scalability, typing, indexing, data duplication and more. You really need to think about each and every one because choosing the wrong database/not using it correctly usually cause the biggest problems/most work to solve that you will ever have to face.
>Lol, of course you need a general understanding of how your database works on the inside. There are a million different ways you can store your data and I would argue that choosing your data storage is the most important and tricky decision we have to make as software engineers.
You aren't responding to my argument, everything you said is something I already know. So lol to you. You're making a remark and extending the conversation without addressing my main point. I'm saying that the fact that you need "a general understanding of how a database works on the inside" is a design flaw. It's a leaky abstraction.
A C++ for loop has virtually the same performance across all systems/implementations; if I learn C++ I generally don't need to understand implementation details to know about performance metrics. Complexity theory applies here.
For "SELECT * FROM TABLE", I have to understand implementation. This is a highly different design decision from C++. My argument is that this high level language is a bad design choice to be placed over the most critical bottleneck of the web: the database.
The entire reason why we can use slow ass languages like php or python on the web is because the database is 10x slower. Database is the bottleneck. It would be smart to have a language api for the database to be highly optimize-able. The problem with SQL is that it is a high level leaky abstraction so optimizing SQL doesn't involve using complexity theory to write a tighter algorithm. It involves memorizing SQL hacks and gotchas and understanding implementation details. This is why SQL is a bad design choice. Please address this reasoning directly rather than regurgitating common database knowledge.
> A C++ for loop has virtually the same performance across all systems/implementations;
> For "SELECT * FROM TABLE", I have to understand implementation
This is not even remotely an apples-to-apples comparison. One is a fairly simple code construct that executes locally. The other is a call to a remote service.
It doesn't matter if the language you use to write it is XML, JSON, protocol buffers or SQL, any and all calls across an RPC boundary are going to have unknown performance characteristics if you don't understand how the remote service is implemented. If you are the implementer, and you still choose not to understand how it works, that's your choice, not the tool's. Every serious RDBMS comes with a TFM that you can R at any time. And there are quite a few well-known and product-agnostic resources out there, too, such as Use the Index Luke.
Alternatively, feel free to write your own alternative in C++ so that you can understand how it works in detail without having to read any manuals. It was quite a vogue for software vendors to sink a few person-years into such ventures back in the 90s. Some of them were used to build pretty neat products, too. Granted, they've all long since either migrated to a commodity DBMS or disappeared from the market, so perhaps we are due for a new generation to re-learn that lesson the hard way all over again.
>This is not even remotely an apples-to-apples comparison. One is a fairly simple code construct that executes locally. The other is a call to a remote service.
>It doesn't matter if the language you use to write it is XML, JSON, protocol buffers or SQL, any and all calls across an RPC boundary are going to have unknown performance characteristics if you don't understand how the remote service is implemented.
Dude, then put your database on a local machine and execute it locally or do an http RPC call to your server and have the web app run a for loop. Whether it is a remote call or not the code gets executed on a computer regardless. This is not a factor. RPC is a bottleneck but that's a different type of bottleneck that's handled on a different layer. I'm talking about the slowest part of code executing on a computer not Passing an electronic message across the country.
So whether you use XML, JSON, or SQL it matters because that is the topic of my conversation. Not RPC boundaries.
>If you are the implementer, and you still choose not to understand how it works, that's your choice, not the tool's. Every serious RDBMS comes with a TFM that you can R at any time. And there are quite a few well-known and product-agnostic resources out there, too, such as Use the Index Luke.
I choose to understand a SQL implementation it because I have no choice. Like how a front end developer has no choice but to deal with the headache that is CSS or javascript.
Do you try to understand how C++ compiles down into assembler? For virtually every other language out there in existence I almost never ever have to understand the implementation to write an efficient algorithm. SQL DBs are the only technologies that force me to do this on a Regular Basis. Heck they even devoted a keyword called 'EXPLAIN' to let you peer under the hood. Good api's and good abstractions hide implementation details from you. SQL does not fit this definition of a good API.
If that doesn't stand out like a red flag to you, then I don't know what will.
>Alternatively, feel free to write your own alternative in C++ so that you can understand how it works in detail without having to read any manuals. It was quite a vogue for software vendors to sink a few person-years into such ventures back in the 90s. Some of them were used to build pretty neat products, too. Granted, they've all long since either migrated to a commodity DBMS or disappeared from the market, so perhaps we are due for a new generation to re-learn that lesson the hard way all over again.
In the 90s? Have you heard of NOSQL? This was done after the 90s and is still being done right now. There are alternative implementations to database API's that DON'T INVOLVE SQL. The problem isn't about re-learning, the problem is about learning itself. Learn a new paradigm rather than remark to every alternative opinion with a sarcastic suggestion: "Hey you don't like Airplanes well build your own Airplane then... "
> I'm saying that the fact that you need "a general understanding of how a database works on the inside" is a design flaw. It's a leaky abstraction.
And I just said that "choosing a database is all about tradeoffs" which you need to understand (aka: the leaky abstractions).
> A C++ for loop has virtually the same performance across all systems/implementations
> For "SELECT * FROM TABLE", I have to understand implementation.
No you don't, it has the same performance: a for loop. However, by grouping all of your data onto 1 server, for loops are much more costly than the likely orders of magnitude more regular servers you have than a database. Fortunately, your SQL database supports indexes which speed up those queries. Granted, I'm no database expert, but adding the right indexes and making sure your queries utilize them have solved pretty much every scaling problem I have thrown at them.
> It would be smart to have a language api for the database to be highly optimize-able. The problem with SQL is that it is a high level leaky abstraction so optimizing SQL doesn't involve using complexity theory to write a tighter algorithm. It involves memorizing SQL hacks and gotchas and understanding implementation details.
It is optimizable and 90% of those optimizations I have made simply involve adding an index and then running a few explains/tests to make sure you are using them properly.
If you'll only answer me this though, what database would you recommend than? I'm dying to know since you think you know better and google, a company that probably has more scaling problems than anyone else, doubled down on SQL with spanner which from what I have read, requires even more actual fine tuning.
>And I just said that "choosing a database is all about tradeoffs" which you need to understand (aka: the leaky abstractions).
And I'm saying the tradeoff of using a leaky abstraction is entirely the wrong choice. A hammer vs a screwdriver each have tradeoffs but when dealing with a nail, use a hammer, when dealing with a screw use a screw driver. SQL is a hammer to a database screw.
>No you don't, it has the same performance: a for loop. However, by grouping all of your data onto 1 server, for loops are much more costly than the likely orders of magnitude more regular servers you have than a database.
See you don't even know what algorithm most SQL implementations use when doing a SELECT call. It really depends on the index but usually it uses an algorithm similar to binary search off of an index that is basically a binary search tree. It's possible to index by a hash map as well, but you don't know any of this because SQL is such a high level language. All you know is that you add an index and everything magically scales.
>Fortunately, your SQL database supports indexes which speed up those queries. Granted, I'm no database expert, but adding the right indexes and making sure your queries utilize them have solved pretty much every scaling problem I have thrown at them.
Ever deal with big data analytics? A typical SQL DB can't handle the million row multi dimensional group bys. Not even your indexes can save you here.
>It is optimizable and 90% of those optimizations I have made simply involve adding an index and then running a few explains/tests to make sure you are using them properly.
I don't have to run an EXPLAIN on any other language that I have ever used. Literally. There is no other language on the face of this planet where I had to regularly go down into the lower level abstraction to optimize it. When I do it's for a rare off case. For SQL it's a regular thing... and given that SQL exists at the bottleneck of all web development this is not just a minor flaw, but a huge flaw.
>If you'll only answer me this though, what database would you recommend than? I'm dying to know since you think you know better and google, a company that probably has more scaling problems than anyone else, doubled down on SQL with spanner which from what I have read, requires even more actual fine tuning.
I don't know if you're aware of CSS or javascript and the glaring flaws everybody complains about front end web development but it's a good analogy to what you're addressing here. Javascript and CSS are universally known to have some really stupid flaws yet both technologies are ubiquitous. No one can recommend any alternative because none exists. SQL is kind of similar. The database implementations and domain knowledge have been around so long that even alternative NO-SQL technologies have a hard time over taking SQL.
Which brings me full circle back to the front end. WASM is currently an emerging contender with javascript for the front end. Yet despite the fact that WASM has a better design then JS (not made in a week) current benchmarks against googles V8 javascript engine indicate that WASM is slower then JS. This is exactly what's going on with SQL and NO-SQL. Google hiring a crack team of genius engineers to optimize V8 for years turning a potato into a potato with a rocket booster has made the potato faster then a formula one race car (WASM)
These problems are not unique to SQL, they are issues with any datastore. And if you're building your architecture correctly, the datastore is always going to be your bottleneck.
I'm not talking about solving the bottleneck. No api can change that. I'm talking about mitigating the effects of this bottleneck. Namely, SQL is a bad design decision for this area of web dev.
A good abstraction only requires you to know the abstraction not what lies underneath. What we have with SQL is a leaky abstraction. My argument that a high level leaky abstraction placed over a critical bottleneck in the web is a design mistake.