This entry was written 04/21/2005

Topic: Geek stuff

I thought I'd take a look inside my OpenOffice files, having heard that the programs use an XML format for them. I was met with binary rubbish, but I quickly noticed a familiar beginning: this is a Zip file.
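If you want to check that familiar beginning yourself, here is a rough Python sketch. A non-empty Zip archive normally starts with the four bytes "PK", 0x03, 0x04. The function name looks_like_zip is my own invention for this example, not part of any library.

```python
# Signature of the local file header that opens a non-empty Zip archive.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip(path):
    """Return True if the file at `path` begins with the Zip signature.

    Hypothetical helper for illustration; it only inspects the first
    four bytes and does not validate the rest of the archive.
    """
    with open(path, "rb") as f:
        return f.read(4) == ZIP_MAGIC
```

Python's standard library also offers zipfile.is_zipfile(), which does a more thorough check by looking for the end-of-archive record instead of the first bytes.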

Try unzipping an OpenOffice file and you'll find not just one but actually several XML files. For example, a chapter of my master's thesis includes five XML files: there's content, style, meta, settings and a manifest. Meta, for example, contains Dublin Core metadata and statistics like page, paragraph, word and character counts - useful, perhaps? Content has the contents of the file, neatly placed in <p> tags.
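Here is a small Python sketch of the same exploration, using only the standard library. It assumes the archive holds members named content.xml and meta.xml, and that the statistics sit as attributes on an element called document-statistic. That matches what I'd expect from these files, but treat it as an assumption. The function names are made up for this example, and elements are matched by local name so the exact XML namespace doesn't matter.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(qualified):
    """Strip the '{namespace}' prefix ElementTree puts on tags and attributes."""
    return qualified.rsplit("}", 1)[-1]


def list_xml_members(path):
    """Return the sorted names of all .xml members in the archive."""
    with zipfile.ZipFile(path) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".xml"))


def paragraphs(path):
    """Return the text of every paragraph element in content.xml.

    Assumption: paragraphs are elements whose local name is 'p'.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(el.itertext())
            for el in root.iter() if local_name(el.tag) == "p"]


def statistics(path):
    """Return the document statistics from meta.xml as a dict of strings.

    Assumption: the counts are attributes of a 'document-statistic' element.
    Returns an empty dict if no such element is found.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    for el in root.iter():
        if local_name(el.tag) == "document-statistic":
            return {local_name(k): v for k, v in el.attrib.items()}
    return {}
```

Calling list_xml_members("chapter.sxw") on a file of your own (the file name here is just a placeholder) should give you the list of XML files described above, and statistics() the page and word counts, assuming the layout holds.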

The same format is used for presentations and spreadsheets, even for vector art! Everything is described in detail: which element goes where. If you embed something in your document, the archive gains a new object with its own copies of the same XML files. How clever!

There's a project dedicated to the OpenOffice file format. If you're interested, see XML File Format. It's certainly a very elegant and modern approach to creating a file format.

