Ex Die in Diem

File Formats

I’m preparing for the apocalypse. I can’t put my finger on when this began, but at some point I started thinking really long term. Like, Long Now Foundation long term. At some point, Dropbox will disappear. So will Calepin, Amazon, Apple, probably even Trinity College, although I can picture it outlasting them by centuries. And it’s not just things I like that are transient. Microsoft will die one day, and Michael Gove will too.

When technology companies die, they take with them the proprietary secrets of their methods and formats. This is why standards are so important: HTML isn’t a secret, so as long as there are computers at all, we should be able to read and render the pages that are the mainstay of the web. Openness has become a trait that I value in a format. If I see the last days of electronic computers, I’d like to be able to print out everything I’ve considered worth keeping on file, so that a paper copy will outlive the patterns of magnetic domains on the hard drives we use today.

For this to be feasible in an uncertain future, I like to make sure that I use human-readable formats. If you’re currently thinking that all formats are human-readable, I invite you to make a copy of a Word file on your computer, rename it test.txt, and open it in Notepad. Can you even find your text? You can’t make sense of it because the vast majority of that file is instructions to the computer on how to display your words.

That’s also why word processor files are so big: a thousand words can be encoded in roughly six kilobytes if all the computer stores is the content. In a .doc file, those same thousand words typically take up several times as much space. So as well as being easier to read outside of the correct program, plain text files are easier to archive and store. However much space you currently use to store text documents, you could get most of it back by using plain text instead.
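The six-kilobyte estimate is easy to check. Here is a minimal sketch in Python, assuming an average of five letters per word plus one space; that average and the function name plain_text_size are my own choices for illustration, and real prose will vary a little either way.

```python
def plain_text_size(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


# A thousand five-letter words, each followed by a space (an assumed average).
sample = "words " * 1000

print(plain_text_size(sample))  # 6000 bytes, roughly six kilobytes
```

Accented letters and curly quotes cost two or three bytes each in UTF-8, so real documents come out slightly larger, but the order of magnitude holds.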

I’m not advocating a regression to unformatted text here: there are many, many markup languages available that are almost as easy to read as unformatted text. I write in three: Markdown, which I’ve mentioned before; HTML, which is well known; and LaTeX, which is invaluable for typesetting equations, automating citations, and corralling very long documents. With documents written in any of these, I can simply save the file with a .txt extension and use literally thousands of programs to fine-tune and edit them, including Word.

Then, when the last days of computers come, I can fire up a Unix-like terminal emulator, cat everything into one massive file, and print the whole damn lot off in a tiny font: the ultimate backup.
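If no Unix-like terminal is to hand, the same "cat everything" step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name concatenate, the list of extensions, and the "====" separator lines are all hypothetical choices, not part of any standard tool.

```python
from pathlib import Path


def concatenate(folder, output, suffixes=(".txt", ".md", ".tex", ".html")):
    """Write every matching file under folder into output, one after another.

    Returns the number of files that were copied in.
    """
    folder = Path(folder)
    output = Path(output)
    count = 0
    with output.open("w", encoding="utf-8") as out:
        for path in sorted(folder.rglob("*")):
            # Skip directories, other file types, and the output file itself.
            if not path.is_file() or path.suffix.lower() not in suffixes:
                continue
            if path.resolve() == output.resolve():
                continue
            # A separator line (an arbitrary choice) so each file can be found on paper.
            out.write(f"==== {path.relative_to(folder)} ====\n")
            # Undecodable bytes are replaced rather than aborting the whole run.
            out.write(path.read_text(encoding="utf-8", errors="replace"))
            out.write("\n")
            count += 1
    return count
```

Calling concatenate("notes", "everything.txt") on a hypothetical notes folder would produce one file ready to send to the printer in a tiny font.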
