Tech Tip: Understanding the Export XML Archive Format

By Nigel Cheshire

When I was a kid, I developed an unfortunate habit of taking things apart to try and find out how they worked. Toys, transistor radios, vacuum cleaners, pretty much anything mechanical or electrical had to come apart for inspection. I say “unfortunate” because, although I learned a lot, and could usually get them back together again, they didn’t always work quite as well after the disassembly/reassembly cycle. And sometimes, much to my parents’ chagrin, I couldn’t get them back together at all. If that sounds even slightly like you, and you’re a user of Teamstudio Export, you may be interested in this post.

If you’ve used Export to create read-only HTML and/or PDF versions of your Lotus Notes databases, then you’ll know that the first step in the process is to create an XML archive of the data. If your primary objective is to allow users to continue to access their Notes data in perpetuity, without the need for Notes clients or Domino servers, then you may not have given those XML archives much thought. But they are worth keeping around for the foreseeable future, for two reasons: they contain everything that was in the Notes database (including the design), and their format is unlikely to change much (if at all).

To illustrate the point, consider this. It has been more than three years since we shipped the first version of Export, and in that time we have made many, many improvements to the HTML export process, each of which has added new features to the HTML and PDF archives. In the same time period, we’ve made almost no changes to the format of the XML archives; they are what they are. So, you could take an archive that you created with Export 1.0 in February 2018 and use it to create a fully functional HTML archive with Export 4.1 (release date: February 2021), including all the latest bells and whistles, with no problem.

So what is hidden within those mysterious “.tse” files that Export produces and that contain the XML archive? In fact, the single .tse file that holds the entire contents of a Notes database is nothing more than a ZIP archive of many XML files. Because the XML files are all (of course) text based, and they don’t contain the view indexes that tend to bloat the database in its NSF format, they zip up pretty small. For example, the Domino Designer 9.0 Help database weighs in at 7.8 MB in NSF format; the raw XML files measure 4.1 MB, and the zipped TSE archive shrinks to 683 KB.
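Since a .tse file is an ordinary ZIP archive, Python's standard zipfile module can read it directly. The following is a minimal sketch, not part of Export itself: the function name archive_stats is made up for illustration, and the entry names in the demo are hypothetical stand-ins rather than names taken from a real archive. It reports the same kind of raw-versus-zipped comparison quoted above, along with the names found at the top level.

```python
import io
import zipfile


def archive_stats(tse_file):
    """Return (file_count, raw_bytes, zipped_bytes, top_level_names).

    tse_file may be a path or a binary file object. zipfile inspects the
    content, not the extension, so a .tse file can be passed as-is.
    """
    with zipfile.ZipFile(tse_file) as zf:
        entries = zf.infolist()
        files = [e for e in entries if not e.is_dir()]
        raw = sum(e.file_size for e in files)          # uncompressed XML size
        zipped = sum(e.compress_size for e in files)   # size inside the archive
        # The first path component is either a top-level folder or a top-level file.
        top_level = sorted({e.filename.split("/")[0] for e in entries})
    return len(files), raw, zipped, top_level


if __name__ == "__main__":
    # Build a tiny stand-in archive in memory; these entry names are hypothetical.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data/0000090A.xml", "<document/>" * 200)
        zf.writestr("acl.dxl", "<acl/>")
    buf.seek(0)
    print(archive_stats(buf))
```

To try it on a real archive, pass the path to the .tse file instead of the in-memory buffer.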

Because the TSE file is nothing more than a ZIP archive, if you want to nose around inside, all you have to do is add a .zip suffix to the filename and you can decompress it like any other ZIP file. If you take a peek inside any archive file, at the top level you’ll find four main folders (archives created with Export 3.0 or later also include a design2 folder, described in the note under Design Folder):

1. Data Folder

The data folder contains a file for every document in the database. These are encoded using the Notes/Domino standard DXL format, which is mostly pretty easy to decipher just by looking at it. A document type definition (DTD) for DXL is available if you need the formal details. The name of each file is based on the NoteID of the corresponding document.

2. Design Folder

As you might guess, the design folder holds a file for each design element, encoded in DXL and named by NoteID.

Note: Domino supports two different styles of DXL: binary and default. Both are standard text-based XML, but in binary mode, complex data values are exported as Base64-encoded versions of the raw binary data stored in the .NSF file. The binary mode offers maximum fidelity, but it is impossible to interpret without a deep knowledge of Domino internal data structures. Prior to version 3.0, Teamstudio Export used only the default mode, which converts even complex data values to human-readable XML elements. Export 3.0 and later also include binary-mode DXL for forms and views to ensure that the archive contains complete design information. The human-readable DXL is stored in the design folder and, from Export 3.0 onwards, the binary-mode DXL is stored in the design2 folder.

3. Profile Folder

The profile folder contains one file for each profile document in the database, also encoded in DXL.

4. Views Folder

The views folder contains one XML file for each view and each folder in the database. There is no standard DXL format for view data, so we have defined our own format, which is documented in the Export online documentation.
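Because the files in the data, design and profile folders are DXL, which is plain XML, any XML parser can read them. Here is a rough sketch using Python's standard ElementTree. It assumes that field values appear as item elements carrying a name attribute, as they do in standard DXL documents. The SAMPLE text is hand-written for illustration, not copied from a real archive, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a namespace prefix such as '{http://www.lotus.com/dxl}item'."""
    return tag.rsplit("}", 1)[-1]


def item_values(dxl):
    """Map each item name in a DXL document (str or bytes) to its text.

    Simplification: all text nested inside an item is concatenated, so
    multi-value items are run together rather than returned as a list.
    """
    root = ET.fromstring(dxl)
    items = {}
    for elem in root.iter():
        if local_name(elem.tag) == "item" and "name" in elem.attrib:
            # itertext() gathers text from nested value elements.
            items[elem.attrib["name"]] = "".join(elem.itertext()).strip()
    return items


# Hand-written stand-in for one file from the data folder (hypothetical content).
SAMPLE = """<document xmlns="http://www.lotus.com/dxl" form="Memo">
  <item name="Subject"><text>Hello</text></item>
  <item name="Priority"><number>2</number></item>
</document>"""

if __name__ == "__main__":
    print(item_values(SAMPLE))
```

Matching on the local element name rather than the fully qualified one keeps the sketch working whether or not the file declares the DXL namespace.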

In addition to these folders, you will find five files at the top level of the directory tree:

1. acl.dxl - a file containing the ACL information in DXL format;

2. db.dxl - this captures database level information, such as the replica id;

3. log.txt - a plain text file containing any errors or warnings that occurred during the archive process;

4. meta.xml - this is an XML format file containing metadata which is used by Export primarily to maintain the UI;

5. unidindex.txt - this is a plain text CSV file that maps NoteIDs to UniversalNoteIDs (UNIDs) and allows Export to convert between the two during the HTML export process.
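As an illustration of how unidindex.txt might be used outside Export, the sketch below loads it into a lookup table. All that is stated above is that the file is CSV and maps NoteIDs to UNIDs. The column order (NoteID first, UNID second) and the absence of a header row are assumptions made for this example, so check a real file before relying on it. The function name and the sample values are hypothetical.

```python
import csv
import io


def load_unid_index(csv_text):
    """Build a NoteID -> UNID dict from the text of unidindex.txt.

    Assumption: each row is "NoteID,UNID" with no header row.
    """
    index = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        note_id, unid = row[0].strip(), row[1].strip()
        index[note_id] = unid
    return index


if __name__ == "__main__":
    # Made-up sample rows in the assumed layout.
    sample = (
        "8FA,0123456789ABCDEF0123456789ABCDEF\n"
        "902,FEDCBA9876543210FEDCBA9876543210\n"
    )
    print(load_unid_index(sample)["8FA"])
```

With a real archive, the text could be read straight from the ZIP with zipfile.ZipFile(path).read("unidindex.txt") and decoded before being passed in; the encoding to use is also something to confirm against an actual file.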

And that’s it. If you’re curious, unzip an archive and take a look. Will you ever need to know any of this information? Possibly not, but if you’re anything like me, you want to know how something works as much as how to operate it. And in this case, you don’t even need to take anything apart.