New PND format

From Pandora Wiki
Revision as of 16:06, 3 March 2010 by Dflemstr (talk | contribs) (Remaining issues)

The current PND format has some shortcomings as listed below. This page should serve as a discussion page/white board for how the format could be improved.

Current situation

The current (ISO-based) PND format has the following shortcomings:

  • It uses the ISO file system which is inefficient at storing data, because:
    • It contains duplicate headers for each file entry (one version with big-endian integers, one version with little-endian integers[1]) which leads to multiple kilobytes of wasted space.
    • Its file tables are fixed-size, and there are therefore limitations on how many folders you can have, what names you can give your files, how big your files can be etc.
    • It needs various file system extensions to behave correctly, like for example Rock Ridge Interchange Protocol and Joliet. Without them, it becomes pretty useless (only DOS-like filenames supported) and with them, it becomes difficult to read by a tool.
    • It's very difficult to use since you need special ISO-making tools to create the image.
    • Adding or removing files is nearly impossible with a compact ISO file, since there's no way to "expand" an existing ISO file.
  • The PND header data is at the end (or in the middle of the file if a screenshot is included) which makes it impossible for e.g. libmagic to recognize the PND file. It will instead recognize it as an ISO file. Having the header data at the end also makes it take a very long time to find the data, making tools and the libpnd library very inefficient.
  • The PND file uses a custom XML format for its metadata. There's no reason to do this, especially since the established ".desktop" file format fills exactly the same function.

Proposed revisions

Step 1: File system

The file system for PNDs should be replaced by the uncompressed ZIP archive format. ZIP has the advantage that its bookkeeping is compact (on the order of a hundred bytes of headers per file, plus the file name), and uncompressed ZIP makes it possible to read data from the file without having to do any decompression.

ZIP files are mountable using various implementations of zipfs, and most of these implementations won't store files in memory when they are being read if the ZIP file is uncompressed.
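As a rough illustration of what "uncompressed" means here, the following sketch builds a small archive with Python's standard zipfile module and shows that a stored entry appears byte for byte inside the archive. The helper name build_stored_zip and the file "run.sh" are made up for the example and are not part of any PND tooling.

```python
import io
import zipfile

def build_stored_zip(files):
    """Return the bytes of a ZIP archive whose entries are stored, not deflated."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

# Hypothetical package content, used only for this demonstration.
payload = b"#!/bin/sh\necho hello from the package\n"
archive = build_stored_zip({"run.sh": payload})

# Because the entry is stored, its bytes appear verbatim inside the archive,
# so a reader can use them in place without running a decompressor.
offset = archive.find(payload)
print("payload found at byte offset", offset)
```

With a deflated archive the same search would normally fail, which is why the proposal insists on the stored method.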

Step 2: Metadata

PND files should no longer require special tools for them to be created. Therefore, since ZIP files support random access on files, it shouldn't be necessary to append/prepend metadata to the file. The user simply includes the "PXML.xml" and "preview.png" files inside of the ZIP, and any tools that need information about the PND can simply go to the central directory of the ZIP file (or use a simple ZIP library to do it for them) and get the location of the file inside of the ZIP. This should also dramatically decrease loading times for PNDs.
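A minimal sketch of the "use a simple ZIP library" route, assuming Python's standard zipfile module as that library. The function name read_member and the demo file names other than "PXML.xml" and "preview.png" are hypothetical.

```python
import os
import tempfile
import zipfile

def read_member(pnd_path, member):
    """Return the bytes of one archive member, or None if it is absent."""
    with zipfile.ZipFile(pnd_path) as zf:
        try:
            # zipfile resolves the name through the central directory,
            # so no other member has to be read.
            info = zf.getinfo(member)
        except KeyError:
            return None
        return zf.read(info)

# Demo on a throwaway archive; the contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pnd")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("PXML.xml", "<PXML/>")
        zf.writestr("bin/game", b"binary placeholder")
    pxml = read_member(path, "PXML.xml")
    preview = read_member(path, "preview.png")

print(pxml, preview)
```

A tool that only needs the metadata can stop after this lookup and never touch the rest of the package.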

Step 3: Metadata format

Current PND files include so-called "PXML.xml" files. These files have a custom XML format that has a strange structure.

These files should be replaced by ".desktop" files. A PND can contain one or more ".desktop" files in its root directory that specify how an application should be launched. PND tools simply use all ".desktop" files they can find in the PND when creating launchers for the contents of the PND.
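The lookup described above could be sketched as follows, assuming Python's zipfile and configparser modules are acceptable stand-ins for a real PND tool. The function name find_launchers and the sample entries are invented for the example, and only the Name and Exec keys are read.

```python
import configparser
import io
import zipfile

def find_launchers(zf):
    """Map each root-level .desktop file name to its (Name, Exec) values."""
    launchers = {}
    for name in zf.namelist():
        # Only files in the root directory count, so skip nested paths.
        if "/" in name or not name.endswith(".desktop"):
            continue
        # interpolation=None keeps field codes such as %f untouched;
        # strict=False tolerates repeated keys.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.optionxform = str  # .desktop keys are case-sensitive
        parser.read_string(zf.read(name).decode("utf-8"))
        if not parser.has_section("Desktop Entry"):
            continue
        entry = parser["Desktop Entry"]
        launchers[name] = (entry.get("Name"), entry.get("Exec"))
    return launchers

# Demo archive with one root-level launcher and one nested file to ignore.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("bla.desktop",
                "[Desktop Entry]\nType=Application\nName=Bla\nExec=./bla %f\n")
    zf.writestr("docs/other.desktop",
                "[Desktop Entry]\nName=Nested\nExec=./nested\n")
    zf.writestr("bla", b"binary placeholder")
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    launchers = find_launchers(zf)

print(launchers)
```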

Advantages

  • No special tools are needed for reading PND files. The package can be read on any platform using any programming language that can read ZIP files.
  • We can use existing facilities to manage the launching of applications. The ".desktop" files can basically be copied without modification into standard locations of the system, and all launchers will become aware of them.
  • Reading PND files becomes easier (because of better tool support... sorry, but you can't link to libpnd in all programming languages) and quicker (because of the central directory for random file access).

Remaining issues

  • How should libpnd tell the difference between old and new PND files? Should it depend on the "file" tool, should it run its own recognition algorithm, or what? Should the extension be changed to e.g. ".box" (which was proposed in a thread)?
  • How to make sure that the ZIP files are uncompressed? Should we provide a script that "uncompresses" an ordinary ZIP file?
  • Is squashfs better than ZIP?
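One possible answer to the first two questions, offered as a sketch rather than a decision: the two containers can be told apart by their magic numbers, and an ordinary ZIP can be rewritten with every member stored. This assumes an old PND begins like a plain ISO 9660 image (identifier "CD001" at byte 32769) and that a new one starts with a ZIP local file header. The names classify, is_fully_stored and restore_uncompressed are hypothetical.

```python
import io
import zipfile

ZIP_LOCAL_MAGIC = b"PK\x03\x04"   # 0x04034b50 as stored on disk (little-endian)
ISO_MAGIC = b"CD001"              # ISO 9660 volume descriptor identifier
ISO_MAGIC_OFFSET = 32769          # sector 16 * 2048 bytes, plus one type byte

def classify(data):
    """Guess the container type from magic numbers alone."""
    if data[:4] == ZIP_LOCAL_MAGIC:
        return "zip"
    if data[ISO_MAGIC_OFFSET:ISO_MAGIC_OFFSET + 5] == ISO_MAGIC:
        return "iso"
    return "unknown"

def is_fully_stored(data):
    """True if no member of the ZIP archive is compressed."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return all(i.compress_type == zipfile.ZIP_STORED for i in zf.infolist())

def restore_uncompressed(data):
    """Rewrite an archive so that every member is stored.

    Simplification: only names and contents are carried over, so timestamps
    and permission bits of the original members are lost.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            dst.writestr(info.filename, src.read(info),
                         compress_type=zipfile.ZIP_STORED)
    return out.getvalue()

# Demo: a deflated archive (requires zlib) and a fake ISO header.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.txt", "a" * 1000)
compressed = buf.getvalue()
fixed = restore_uncompressed(compressed)
fake_iso = bytes(ISO_MAGIC_OFFSET) + ISO_MAGIC

print(classify(compressed), is_fully_stored(compressed), is_fully_stored(fixed))
print(classify(fake_iso))
```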

Upgrade path from the old PND format

A simple script can be written that extracts the old PND file, including the screenshot and the PXML.xml file. The PXML.xml file is then converted to one or more .desktop files, and the .desktop files, the preview.png file, and the package contents are packed into an uncompressed ZIP that is then renamed to "*.pnd".

Usage scenario

Repackaging of an application from another package format

  • The user grabs the package for the application
  • He dumps the executable and all required libraries into a folder
  • He dumps a "preview.png" file into the folder that he's made
  • He copies the ".desktop" file(s) for the application from the old package, opens them, replaces "Exec=/usr/bin/bla" with "Exec=./bla" and saves them in the directory
  • He uses a ZIP archiver to make a ZIP out of the folder, making sure that he sets the "uncompressed" option.
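The ".desktop" edit in the steps above is mechanical enough to script. The following is a small sketch with a hypothetical helper name, relocate_exec. It assumes the command is the first space-separated word of the Exec line, so quoted paths containing spaces are not handled.

```python
import posixpath

def relocate_exec(desktop_text):
    """Rewrite Exec= commands so they point into the package folder."""
    lines = []
    for line in desktop_text.splitlines():
        if line.startswith("Exec="):
            # Keep any arguments or field codes such as %f unchanged.
            command, *args = line[len("Exec="):].split(" ")
            line = " ".join(["Exec=./" + posixpath.basename(command), *args])
        lines.append(line)
    return "\n".join(lines) + "\n"

original = "[Desktop Entry]\nName=Bla\nExec=/usr/bin/bla %f\n"
relocated = relocate_exec(original)
print(relocated)
```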

Creating PNDs as part of a build process

  • The build tool creates a directory with all of the necessary information like above and invokes the "zip" utility to archive the folder without compression (e.g. with its store-only option "-0").
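For build systems that would rather not shell out to an external archiver, the same step can be sketched with Python's zipfile module. The function name pack_pnd and the demo file names are assumptions made for the example.

```python
import os
import tempfile
import zipfile

def pack_pnd(source_dir, pnd_path):
    """Archive source_dir into pnd_path with every file stored uncompressed.

    pnd_path should live outside source_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(pnd_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for filename in sorted(files):
                full = os.path.join(root, filename)
                # Archive names are relative to the package root and use "/".
                arcname = os.path.relpath(full, source_dir).replace(os.sep, "/")
                zf.write(full, arcname)

# Demo with placeholder files in a temporary build directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "pkg")
    os.makedirs(os.path.join(pkg, "lib"))
    for rel, text in [("bla", "binary placeholder"),
                      ("bla.desktop", "[Desktop Entry]\nExec=./bla\n"),
                      ("lib/libfoo.so", "library placeholder")]:
        with open(os.path.join(pkg, *rel.split("/")), "w") as fh:
            fh.write(text)
    out = os.path.join(tmp, "bla.pnd")
    pack_pnd(pkg, out)
    with zipfile.ZipFile(out) as zf:
        names = sorted(zf.namelist())
        all_stored = all(i.compress_type == zipfile.ZIP_STORED
                         for i in zf.infolist())

print(names, all_stored)
```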

Accessing a PND's contents for the...

...lazy programmer

  • The programmer extracts the PND into a folder via libzip and accesses its contents.

...pragmatic programmer

  • The programmer uses zipfs to mount the ZIP and accesses the file's contents.

...smart performance-aware programmer

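  • The programmer scans the ZIP file until he finds the local file header signature 0x04034b50 (stored little-endian, so the bytes in the file are "PK", 0x03, 0x04).
  • He jumps 18 bytes forward from the start of the signature and reads the int at that location (the compressed size), storing it in "length".
  • He jumps 4 bytes and checks if this int (the uncompressed size) matches "length". If it doesn't, it means that the file is compressed, and an error is reported.
  • He then jumps forward 4 bytes and stores the short at that location in a variable "nameLength".
  • Another 2-byte jump gets the short "extensionsLength" (the length of the extra field).
  • He then jumps 2 bytes and reads "nameLength" bytes from the file.
  • He then checks whether this string matches the name of the sought-after file. The name in the file is not NUL-terminated, so the comparison has to be bounded by "nameLength" (e.g. memcmp()) rather than a plain strcmp(). If it doesn't match, he skips "extensionsLength" plus "length" bytes and continues the scan at the next header.
  • On a match, the programmer skips "extensionsLength" bytes.
  • The programmer reads "length" bytes from the file, and uses this as the sought-after file data.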
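The walk through the local file headers might look like this in Python, using struct in place of manual fseek() arithmetic. It is a sketch under a few assumptions: the archive starts with a local file header, entries follow each other directly, and the sizes are present in the local header (archives written in streaming mode store them elsewhere). The function name find_stored_file is made up.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"  # 0x04034b50 as it appears on disk (little-endian)

def find_stored_file(data, wanted):
    """Walk the local file headers in order and return wanted's raw bytes."""
    wanted = wanted.encode("utf-8")
    pos = 0
    while data[pos:pos + 4] == LOCAL_SIG:
        # Offsets 18, 22, 26, 28 from the signature: compressed size,
        # uncompressed size, file name length, extra field length.
        length, uncompressed, name_len, extra_len = struct.unpack_from(
            "<IIHH", data, pos + 18)
        if length != uncompressed:
            raise ValueError("entry is compressed")
        name_start = pos + 30
        data_start = name_start + name_len + extra_len
        # Length-bounded comparison: the stored name has no terminator.
        if data[name_start:name_start + name_len] == wanted:
            return data[data_start:data_start + length]
        pos = data_start + length  # jump straight to the next header
    return None  # reached the central directory without a match

# Demo archive with placeholder contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("PXML.xml", "<PXML/>")
    zf.writestr("preview.png", b"not really a PNG")
archive = buf.getvalue()
found = find_stored_file(archive, "preview.png")
missing = find_stored_file(archive, "nope")
print(found, missing)
```

Stepping from header to header avoids false hits on a signature that happens to occur inside file data, at the cost of visiting every earlier entry.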

...performance fascist/programmer who wants low seek times

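  • The programmer scans the file backwards from the end until he finds the central directory file header signature 0x02014b50 (stored little-endian, so the bytes in the file are "PK", 0x01, 0x02).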
  • He then uses the following table to get the information he needs:
ZIP central directory file header

Offset   Bytes  Description
0        4      Central directory file header signature = 0x02014b50
4        2      Version made by
6        2      Version needed to extract (minimum)
8        2      General purpose bit flag
10       2      Compression method
12       2      File last modification time
14       2      File last modification date
16       4      CRC-32
20       4      Compressed size
24       4      Uncompressed size
28       2      File name length (n)
30       2      Extra field length (m)
32       2      File comment length (k)
34       2      Disk number where file starts
36       2      Internal file attributes
38       4      External file attributes
42       4      Relative offset of local file header
46       n      File name
46+n     m      Extra field
46+n+m   k      File comment

  • The relative file offset can then be used to jump to the file in question. The file is then scanned as described in the previous section.
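A sketch of reading the central directory with the field offsets from the table. Note one substitution: instead of scanning backwards for 0x02014b50, which lands on the last entry first, this example looks up the ZIP end-of-central-directory record (signature 0x06054b50), which stores the entry count and the offset of the first central directory header. That record is not mentioned above and is introduced here as an assumption about how a reader might find the start of the table. The function name central_directory is hypothetical.

```python
import io
import struct
import zipfile

CENTRAL_SIG = b"PK\x01\x02"   # 0x02014b50 on disk
EOCD_SIG = b"PK\x05\x06"      # 0x06054b50 on disk (end of central directory)

def central_directory(data):
    """Yield (name, local_header_offset, size) per central directory entry."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("not a ZIP file")
    # EOCD offsets 10, 12, 16: total entries, directory size, directory offset.
    count, _cd_size, pos = struct.unpack_from("<HII", data, eocd + 10)
    for _ in range(count):
        if data[pos:pos + 4] != CENTRAL_SIG:
            raise ValueError("corrupt central directory")
        size, = struct.unpack_from("<I", data, pos + 24)      # uncompressed size
        n, m, k = struct.unpack_from("<HHH", data, pos + 28)  # name/extra/comment
        offset, = struct.unpack_from("<I", data, pos + 42)    # local header offset
        name = data[pos + 46:pos + 46 + n].decode("utf-8")
        yield name, offset, size
        pos += 46 + n + m + k  # the next header follows the variable-length tail

# Demo archive with placeholder contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("PXML.xml", "<PXML/>")
    zf.writestr("preview.png", b"not really a PNG")
archive = buf.getvalue()
entries = list(central_directory(archive))
for name, offset, size in entries:
    print(name, offset, size)
```

Each yielded offset points at a local file header, so the file data can then be read from there without visiting any other entry.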