But at least you have a fighting chance. What if that exact same data was dumped into a binary file that you did not know how to decode?
Originally, you had a problem - the data wasn't formatted in a manner that you could parse cleanly.
Now, you have a new problem - not only is the data not formatted properly, it's now in some opaque binary file.
Saying that there are poorly formatted text files isn't a hit against text files, it's a hit against poor formatting. The exact same problem exists if the file is in binary form, and not formatted properly.
> a binary file that you did not know how to decode
I guess nobody ever advocated putting stuff in a binary file with an undefined format. Databases, syslog-ng, Elasticsearch and the systemd journal all have a defined format with plenty of tools to access the data in a more structured way (e.g. treating dates as dates and matching on ranges).
I agree the issue at hand is not just binary vs. plain text, it's more "how much you want to structure your data".
The classic syslog format is very loosely defined, with every application defining its own dialect, each with its own way to separate fields and handle escaping. To fix that you could store the log data as JSON, as many online services do. But once you have JSON, grep is no longer enough to properly handle the data, even though it's still plain text. Now that you have both a quite verbose format on disk and the need for custom tools, why not store the log as binary-encoded JSON (e.g. something like JSONB in PostgreSQL)? Or make it even more efficient with a format optimized for the specific usage? Add some indexes and you get more or less what databases, Elasticsearch and the journal do.
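The "grep is no longer enough" point can be sketched in Python. This is a minimal illustration with made-up log records (the field names and values are hypothetical): a substring match hits every line that mentions a value anywhere, while parsing the JSON lets you match on the actual field.

```python
import json

# Two hypothetical JSON-per-line records: only the first belongs to "alice";
# the second merely mentions her in the message text.
logs = [
    '{"ts": "2023-01-01T10:00:00Z", "user": "alice", "msg": "login ok"}',
    '{"ts": "2023-01-01T10:05:00Z", "user": "bob", "msg": "reset password for alice"}',
]

# A grep-style substring match hits both lines...
substring_hits = [line for line in logs if "alice" in line]

# ...while parsing each record lets us match on the actual "user" field.
records = [json.loads(line) for line in logs]
field_hits = [r for r in records if r["user"] == "alice"]

print(len(substring_hits), len(field_hits))  # 2 1
```

The same gap shows up with date ranges, numeric comparisons, or any escaped value: once the data is structured, you want a tool that understands the structure, whether the bytes on disk are text or binary.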
Also keep in mind that most logs right now get rotated and compressed with gzip; I doubt that the above binary formats are less resilient to errors than a gzip stream.
That's what the grandparent was explaining though. We have near-ubiquitous tools for dealing with plaintext files. Every Linux admin knows them and uses them in many more situations than just log files. They can be scripted and piped, and an admin worth his salt could easily find the info he needs with them.
A binary file from whatever logging system, OTOH, is effectively proprietary. Even if the logging system provides you with tools to work on them, you have to 1) know that it's a log file for that logging system, and 2) be familiar enough with the tools in order to work with it.
And the specs will be gone in 40 years, while ASCII will stick around.
Why would they be gone? You realize ASCII is a 'spec' too?
If a binary format has an open specification, it's as future proof as ASCII. ASCII's durability is due to a clear and open specification that's easily implemented. Not some magic sauce that makes it instantly human readable.
That text you see? It's not what's actually in the file. That's just 1's and 0's like every other format. There's literally no difference between ASCII and any other "binary" format.
Does that really matter? Log files are often unimportant when they get over a month or two old, what is it in your log files that has to be kept for 40 years?
Longevity of log files hardly seems like a reason to pick an otherwise inferior format.
It is not about reading 40-year-old logs, but rather about reading logs generated today by a 40-year-old system.
For example, many nuclear power plants in the West were built 40 years ago. Amongst the myriad of sensors and devices in a power plant, I think most of them output ASCII logs. These are still readable today. (The same can be said about avionics, space probes, etc.)
Now imagine yourself 40 years from now, trying to fix or reverse engineer a very legacy system: you will have to recompile a journalctl from 40 years ago before being able to read anything.
There's a good chance that you'd be reading EBCDIC logs. :)
40 years from now, you will probably be able to invoke journalctl on the system and parse the dumped output as plain text. Or call gunzip on the compressed logs, $DEITY knows if we will be still using gzip by then. And if the system does not boot, you won't be able to connect the peripherals anywhere else... :)
There's no tool out there that generates log files it can't itself read. So there's not going to be any "oh gee, I have these files being generated and nothing can read them" situation.
However, there are close to zero systems out there that generate text logs they can themselves read. Text logs are write-only for most logging systems, while all binary logs I know of are read+write.
Stepping back though this entire argument is absurd. Thinking about "whatever will those people do 40 years from now with the tools of today" is fairly braindead once you understand that the quality of the tools will affect their longevity. So if the logging system becomes an actual, factual problem over time, the tools will die off by naturally-artificial selection.
I have already worked on a very basic embedded system where your only way of getting logs is connecting to the device over a serial line; after fiddling a bit with the baud rate, you can get some readable output.
In this case, you can't really do anything from the device itself.
Arguably, this is not the use case for a binary logger, but I was originally addressing the "40-year-old logs" argument, and such systems do exist in the real world.
> There's no tool out there that generates log files it can't itself read.
There are plenty of tools that don't read their logs - more precisely, computer units where you don't log in, units that you don't operate on console. Embedded devices that perform some function and also keep some log, but which cannot be used for reading that log. You will need to read that log using something else. Plain text (ASCII, and now ISO Latin and UTF-8) is a fairly stable format for everything, and will be for the next 50 years.
People usually read log files because something went wrong, like a system crash, why do you assume the OS that generated the log file will be readily available?