New PND format
The current PND format has some shortcomings as listed below. This page should serve as a discussion page/white board for how the format could be improved.
Proposal for discussion, this is just some opening shots
Current situation
The current (ISO-based) PND format has the following shortcomings:
- It currently uses ISO, CramFS, and other filesystems... This means that there's no standard to follow, and libpnd needs to carry around support for a multitude of file systems.
- It (often, but not always) uses the ISO file system which is inefficient at storing data, because:
- It contains duplicate headers for each file entry (one version with big-endian integers[1], one version with little-endian integers[2]) which leads to many kilobytes of wasted space.
- Its file tables are fixed-size, so there are limits on how many folders you can have, what names you can give your files, how big your files can be, etc. For instance, file names can be at most 31 characters long, all upper-case, and limited to the ASCII character encoding, and an ISO only supports a folder depth of 8[3]. To support more, the Joliet file system extension is needed (or isofs won't recognize all file paths).
- It needs various file system extensions to behave correctly, like for example Rock Ridge Interchange Protocol and Joliet. Without them, it becomes pretty useless as demonstrated above, and with them, it becomes difficult to read by a tool (You can only use isofs! There are no other tools out there for programmers to use!).
- It's very difficult to use since you need special ISO making tools to create the image. ISO is a relatively rare format if you're not technically inclined.
- If you want to add or remove files from it, it's near impossible if you use a compact ISO file, since there's no way that you can "expand" an existing ISO file.
- The PND header data is at the end (or in the middle of the file if a screenshot is included) which makes it impossible for e.g. libmagic to recognize the PND file. It will instead recognize it as an ISO file. Having the header data at the end also makes it take a very long time to find the data, making tools and the libpnd library very inefficient.
- The PND file uses a custom XML format for its metadata. There's no reason to do this, especially since the established ".desktop" file format fills exactly the same function.
(note: ".desktop" does not contain all metadata we have wanted over time, but it's of course possible to add extensions à la "X-Pandora-Whatever=xyz")
- There's no "index" for the PND file. The whole file has to be scanned (albeit backwards) to find a PXML file, and there's a big chance for false positives etc.
- Data is just appended linearly to the file so there's no order. If the format is to be extended (to e.g. include an icon file after the screenshot file), should the data just be appended as well?
- UTF-8 is strictly the only encoding that is supported. If you save your PXML in another encoding, as some Windows editors do by default, it won't work.
Proposed revisions
Step 1: File system
The file system for PNDs should be replaced by the uncompressed ZIP archive format. ZIP adds very little overhead per file, and storing the entries uncompressed makes it possible to read data from the file without having to do any decompression.
ZIP files are mountable using various implementations of zipfs, and most of these implementations won't store files in memory when they are being read if the ZIP file is uncompressed.
Step 2: Metadata
PND files should no longer require special tools for them to be created. Therefore, since ZIP files support random access on files, it shouldn't be necessary to append/prepend metadata to the file. The user simply includes the "PXML.xml" and "preview.png" files inside of the ZIP, and any tools that need information about the PND can simply go to the central directory of the ZIP file (or use a simple ZIP library to do it for them) and get the location of the file inside of the ZIP. This should also dramatically decrease loading times for PNDs.
Note: performance testing is needed. At the time we started, zip was enormously slower than plain ISO; with driver changes that may or may not still be so, which is why we adopted the multiple-filesystem-type system. zipfs is one possible option.
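The lookup described in Step 2 can be sketched with Python's standard zipfile module, which reads the central directory when the archive is opened. This is only an illustration under assumptions: the function name read_pnd_member and the demo helper are made up for this page, and nothing here is part of libpnd.

```python
import io
import zipfile


def read_pnd_member(pnd, name):
    """Return the bytes of `name` from a ZIP-based PND, or None if absent.

    `pnd` may be a path or a binary file object. Hypothetical helper.
    """
    with zipfile.ZipFile(pnd) as archive:
        try:
            # getinfo() consults the central directory that zipfile loaded
            # when the archive was opened; no member data is scanned.
            info = archive.getinfo(name)
        except KeyError:
            return None
        return archive.read(info)


def demo_pnd():
    """Build a tiny in-memory archive so the sketch runs without a real PND."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("PXML.xml", "<PXML/>")
        archive.writestr("bin/game", b"placeholder binary")
    return buffer


if __name__ == "__main__":
    print(read_pnd_member(demo_pnd(), "PXML.xml"))
```

A missing member (for example a PND without "preview.png") simply yields None, so a tool can treat the screenshot as optional.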
Step 3: Metadata format
Current PND files include so-called "PXML.xml" files. These files have a custom XML format that has a strange structure.
These files should be replaced by ".desktop" files. A PND can contain one or more ".desktop" files in its root directory that specify how an application should be launched. PND tools simply use all ".desktop" files they can find in the PND when creating launchers for the contents of the PND.
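As a rough sketch of how a PND tool might collect the ".desktop" files from the root of the archive, the following uses Python's zipfile and configparser modules. The function name desktop_entries and the demo archive are assumptions made for illustration; configparser is only an approximation of a real ".desktop" parser and is used here because it is in the standard library.

```python
import configparser
import io
import zipfile


def desktop_entries(pnd):
    """Yield (filename, entry dict) for each root-level .desktop file in `pnd`."""
    with zipfile.ZipFile(pnd) as archive:
        for name in archive.namelist():
            # Root-level files only: no directory separator in the name.
            if "/" in name or not name.endswith(".desktop"):
                continue
            parser = configparser.ConfigParser(
                interpolation=None,   # "%f" in Exec lines must stay literal
                delimiters=("=",),
                strict=False,
            )
            parser.optionxform = str  # keys such as "Exec" are case-sensitive
            parser.read_string(archive.read(name).decode("utf-8"))
            if parser.has_section("Desktop Entry"):
                yield name, dict(parser["Desktop Entry"])


def demo_pnd():
    """Hypothetical package with one launcher in the root and one nested file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr(
            "bla.desktop",
            "[Desktop Entry]\nType=Application\nName=Bla\nExec=./bla %f\n",
        )
        archive.writestr("docs/ignored.desktop", "[Desktop Entry]\nName=Nested\n")
        archive.writestr("bla", b"placeholder binary")
    return buffer


if __name__ == "__main__":
    for filename, entry in desktop_entries(demo_pnd()):
        print(filename, entry["Name"], entry["Exec"])
```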
Advantages
- There don't have to be any special tools for reading PND files. The package can be run on any platform using any programming language that can read ZIP files.
- We can use existing facilities to manage the launching of applications. The ".desktop" files can basically be copied without modification into standard locations of the system, and all launchers will become aware of them.
- Reading PND files becomes easier (because of better tool support... sorry, but you can't link to libpnd in all programming languages) and quicker (because of the central directory for random file access).
Remaining issues
- How should libpnd tell the difference between old and new PND files? Should it depend on the "file" tool, should it run its own recognition algorithm, or what? Should the extension be changed to e.g. ".box" (which was proposed in a thread)?
- How to make sure that the ZIP files are uncompressed? Should we provide a script that "uncompresses" an ordinary ZIP file?
- Is cramfs better than ZIP?
- cramfs cannot store files uncompressed the way ZIP can
- If you compare compressed ZIP and cramfs, cramfs is more efficient.
- cramfs is harder to use than ZIP (for the developer).
- cramfs needs to have all of its opened files in memory.
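On the question of making sure the ZIP files are uncompressed, one possible shape for such a script is sketched below with Python's zipfile module: one function checks that every member uses the "stored" method, and another rewrites an ordinary ZIP so that it does. The function names are hypothetical, and the sketch only carries over names, timestamps and permission bits.

```python
import io
import zipfile


def is_uncompressed(pnd):
    """True if every member of the archive uses the 'stored' method."""
    with zipfile.ZipFile(pnd) as archive:
        return all(info.compress_type == zipfile.ZIP_STORED
                   for info in archive.infolist())


def store_uncompressed(source, target):
    """Copy every member of `source` into a new archive without compression."""
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # A fresh ZipInfo avoids reusing offsets from the old archive.
            fresh = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            fresh.external_attr = info.external_attr  # keep permission bits
            fresh.compress_type = zipfile.ZIP_STORED
            dst.writestr(fresh, src.read(info))


def demo():
    """Deflate a small archive, then convert it; returns (before, after, data)."""
    compressed = io.BytesIO()
    with zipfile.ZipFile(compressed, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr("bla", b"A" * 1000)
    converted = io.BytesIO()
    store_uncompressed(compressed, converted)
    with zipfile.ZipFile(converted) as z:
        data = z.read("bla")
    return is_uncompressed(compressed), is_uncompressed(converted), data


if __name__ == "__main__":
    print(demo()[:2])
```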
Upgrade path from the old PND format
A simple script can be written that extracts the old PND file, including the screenshot and the PXML.xml file. The PXML.xml file is then converted to one or more .desktop files, and the .desktop files, the preview.png file, and the package contents are packed into a ZIP that is then renamed to "*.pnd".
Usage scenario
Repackaging of an application from another package format
- The user grabs the package for the application
- He dumps the executable and all required libraries into a folder
- He dumps a "preview.png" file into the folder that he's made
- He copies the ".desktop" file(s) for the application from the old package, opens them, replaces "Exec=/usr/bin/bla" with "Exec=./bla" and saves them in the directory
- He uses a ZIP archiver to make a ZIP out of the folder, making sure that he sets the "uncompressed" option.
Creating PNDs as part of a build process
- The build tool creates a directory with all of the necessary information like above and invokes the "zip" utility to archive the folder without compression (Info-ZIP's zip accepts -0 for this).
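For build systems that would rather not shell out to an archiver, the packaging step can be sketched in Python as follows. The function name build_pnd, the file names in the demo and the choice to sort entries are all assumptions for illustration; the only point taken from the proposal is that every member is written with the "stored" method.

```python
import os
import tempfile
import zipfile


def build_pnd(source_dir, pnd_path):
    """Archive `source_dir` into `pnd_path` without compression.

    `pnd_path` should live outside `source_dir`, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(pnd_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # deterministic traversal order
            for filename in sorted(files):
                full_path = os.path.join(root, filename)
                # Store paths relative to the package root, with "/" separators.
                relative = os.path.relpath(full_path, source_dir)
                archive.write(full_path, relative.replace(os.sep, "/"))


def demo():
    """Build a throwaway package and return (member names, all stored?)."""
    with tempfile.TemporaryDirectory() as tmp:
        package = os.path.join(tmp, "package")
        os.makedirs(os.path.join(package, "lib"))
        for relative in ("bla", "bla.desktop", "preview.png", "lib/libfoo.so"):
            with open(os.path.join(package, *relative.split("/")), "wb") as handle:
                handle.write(b"placeholder")
        pnd_path = os.path.join(tmp, "bla.pnd")
        build_pnd(package, pnd_path)
        with zipfile.ZipFile(pnd_path) as archive:
            names = sorted(archive.namelist())
            stored = all(info.compress_type == zipfile.ZIP_STORED
                         for info in archive.infolist())
    return names, stored


if __name__ == "__main__":
    print(demo())
```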
Accessing a PND's contents for the...
...lazy programmer
- The programmer extracts the PND into a folder via libzip and accesses its contents.
...pragmatic programmer
- The programmer uses zipfs to mount the ZIP and accesses the file's contents.
...smart performance-aware programmer
- The programmer scans the ZIP file from the start until he finds the local file header signature 0x04034b50 (stored little-endian, so the bytes in the file are 50 4B 03 04).
- He jumps to 18 bytes past the start of the signature and reads the int at that location (the compressed size), storing it in "length".
- He jumps 4 bytes further (offset 22) and checks if this int (the uncompressed size) matches "length". If it doesn't, it means that the file is compressed, and an error is reported.
- He then jumps forward 4 bytes (offset 26) and stores the short at that location in a variable "nameLength".
- Another 2-byte jump (offset 28) gets the short "extensionsLength".
- He then jumps 2 bytes (offset 30) and reads "nameLength" bytes from the file.
- He then compares this string with the name of the sought-after file. The name in the ZIP is not NUL-terminated, so memcmp() with "nameLength" is safer than strcmp(). If it doesn't match, he skips "extensionsLength" + "length" bytes and continues the scan.
- If it matches, the programmer skips "extensionsLength" bytes.
- The programmer reads "length" bytes from the file, and uses this as the sought-after file data.
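The walk through the local file headers can be sketched in Python with the struct module. For brevity this sketch works on the bytes of the archive already held in memory instead of seeking in an open file, and the names find_member and demo_archive are made up for illustration. It assumes an archive without data descriptors or ZIP64 fields, and like the procedure above it trusts that the signature bytes do not occur by accident outside the headers it skips over.

```python
import io
import struct
import zipfile

LOCAL_SIGNATURE = b"PK\x03\x04"  # 0x04034b50 as it appears in the file
# signature, version, flags, method, time, date, CRC-32,
# compressed size, uncompressed size, name length, extra field length
LOCAL_HEADER = struct.Struct("<4s5H3I2H")  # 30 bytes in total


def find_member(data, wanted):
    """Return the bytes of the stored member named `wanted`, or None."""
    position = data.find(LOCAL_SIGNATURE)
    while position != -1:
        fields = LOCAL_HEADER.unpack_from(data, position)
        length, uncompressed = fields[7], fields[8]        # offsets 18 and 22
        name_length, extensions_length = fields[9], fields[10]  # offsets 26, 28
        if length != uncompressed:
            # Same test as in the text; checking the method field at
            # offset 8 (0 means stored) would be more direct.
            raise ValueError("member is compressed")
        name_start = position + LOCAL_HEADER.size          # offset 30
        name = data[name_start:name_start + name_length]
        payload_start = name_start + name_length + extensions_length
        if name == wanted:
            return data[payload_start:payload_start + length]
        # Skip this member's data and look for the next local header.
        position = data.find(LOCAL_SIGNATURE, payload_start + length)
    return None


def demo_archive():
    """Hypothetical uncompressed package, returned as raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("PXML.xml", "<PXML/>")
        archive.writestr("preview.png", b"not really a PNG")
    return buffer.getvalue()


if __name__ == "__main__":
    print(find_member(demo_archive(), b"preview.png"))
```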
...performance fascist/programmer who wants low seek times
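- The programmer scans the file backwards from the end until he finds the central directory file header signature 0x02014b50. Note that this lands on the last entry of the central directory; the end-of-central-directory record (signature 0x06054b50) at the very end of the file stores the offset of the first entry.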
- He then uses the following table to get the information he needs:
Offset | Bytes | Description |
---|---|---|
0 | 4 | Central directory file header signature = 0x02014b50 |
4 | 2 | Version made by |
6 | 2 | Version needed to extract (minimum) |
8 | 2 | General purpose bit flag |
10 | 2 | Compression method |
12 | 2 | File last modification time |
14 | 2 | File last modification date |
16 | 4 | CRC-32 |
20 | 4 | Compressed size |
24 | 4 | Uncompressed size |
28 | 2 | File name length (n) |
30 | 2 | Extra field length (m) |
32 | 2 | File comment length (k) |
34 | 2 | Disk number where file starts |
36 | 2 | Internal file attributes |
38 | 4 | External file attributes |
42 | 4 | Relative offset of local file header |
46 | n | File name |
46+n | m | Extra field |
46+n+m | k | File comment |
- The relative file offset can then be used to jump to the file in question. The local file header found there is then parsed as described in the previous section.
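A sketch of the central directory lookup in Python follows. It deliberately swaps one step: instead of scanning backwards for 0x02014b50, it first locates the end-of-central-directory record (0x06054b50), which gives the offset of the first entry and the number of entries, and then walks the entries using the table above. The function names are hypothetical, the archive is read from memory for brevity, and archive comments containing the signature, multi-disk archives and ZIP64 are not handled.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"                 # 0x06054b50
EOCD = struct.Struct("<4s4H2IH")               # 22 bytes
CENTRAL_SIGNATURE = b"PK\x01\x02"              # 0x02014b50
CENTRAL_HEADER = struct.Struct("<4s6H3I5H2I")  # 46 bytes, as in the table


def central_directory(data):
    """Map member name -> (local header offset, compressed size, method)."""
    eocd_at = data.rfind(EOCD_SIGNATURE)
    if eocd_at == -1:
        raise ValueError("not a ZIP file")
    _, _, _, _, total, _, directory_offset, _ = EOCD.unpack_from(data, eocd_at)
    entries = {}
    position = directory_offset
    for _ in range(total):
        fields = CENTRAL_HEADER.unpack_from(data, position)
        if fields[0] != CENTRAL_SIGNATURE:
            raise ValueError("corrupt central directory")
        method, size = fields[4], fields[8]         # offsets 10 and 20
        n, m, k = fields[10], fields[11], fields[12]
        local_offset = fields[16]                   # offset 42
        name = data[position + 46:position + 46 + n]
        entries[name] = (local_offset, size, method)
        position += 46 + n + m + k
    return entries


def read_member(data, wanted):
    """Return the bytes of the stored member `wanted`, or None if absent."""
    entry = central_directory(data).get(wanted)
    if entry is None:
        return None
    local_offset, size, method = entry
    if method != 0:
        raise ValueError("member is compressed")
    # The local header repeats the name and extra field lengths at offset 26.
    name_length, extra_length = struct.unpack_from("<2H", data, local_offset + 26)
    start = local_offset + 30 + name_length + extra_length
    return data[start:start + size]


def demo_archive():
    """Hypothetical uncompressed package, returned as raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("PXML.xml", "<PXML/>")
        archive.writestr("preview.png", b"not really a PNG")
    return buffer.getvalue()


if __name__ == "__main__":
    print(sorted(central_directory(demo_archive())))
```

Because every name is found in the central directory, only one jump into the body of the archive is needed per file, which is what keeps seek times low.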