Changes between Initial Version and Version 1 of Obsolete/MovedToTree/PackageManagement/FileFormat


Timestamp:
Nov 13, 2009, 1:24:56 PM
Author:
bonefish
Comment:

Initial version of the Haiku Package format specification. Incomplete yet.

  • Obsolete/MovedToTree/PackageManagement/FileFormat

    v1 v1  
     1[[PageOutline(2-3, Contents)]]
     2= Haiku Package Format =
     3
     4This document specifies the Haiku Package (HPKG) file format, which was designed for efficient use by Haiku's package file system. It is somewhat inspired by the [http://code.google.com/p/xar/ XAR format] (separate TOC and data heap), but aims for greater compactness (not XML for the TOC).
     5
     6Three stacked format layers can be identified:
     7 - A generic container format for structured data.
     8 - An archive format specifying how file system data are stored in the container.
     9 - A package format, extending the archive format with attributes for package management.
     10
     11
     12
     13== The Data Container Format ==
     14
     15An HPKG file consists of four sections:
     16 Header::
     17   Identifies the file as an HPKG file and provides access to the other sections.
     18 Heap::
     19   Contains arbitrary (mostly unstructured) data referenced by the next two
     20   sections.
     21 TOC (table of contents)::
     22   The main section, containing structured data with references to unstructured
     23   data in the Heap.
     24 Package Attributes::
     25   A section similar to the TOC. Rather than describing the data contained in
     26   the file, it specifies metadata of the package as a whole.
     27
     28All numbers in an HPKG file are stored either in big-endian format or in [http://en.wikipedia.org/wiki/LEB128 LEB128] encoding.
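
As a rough illustration of the LEB128 side of this rule, the following sketch decodes an unsigned LEB128 number from a byte buffer. The function name `read_uleb128` is made up for this example and is not part of the specification.

```python
def read_uleb128(buf, pos=0):
    """Decode one unsigned LEB128 number from buf, starting at pos.

    Returns a (value, new_pos) tuple. Each byte contributes its low 7 bits,
    least significant group first; a set high bit means more bytes follow.
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


# Fixed-width numbers are plain big endian, e.g. a uint16:
version = int.from_bytes(b"\x00\x01", "big")
```

Small values take a single byte in this encoding, which is presumably why it is used for tags, sizes, and indices.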
     29
     30
     31=== Header ===
     32
     33The header has the following structure:
     34
     35{{{
     36struct hpkg_header {
     37        uint32  magic;
     38        uint16  header_size;
     39        uint16  version;
     40        uint64  total_size;
     41
     42        // package attributes section
     43        uint32  attributes_compression;
     44        uint32  attributes_length_compressed;
     45        uint32  attributes_length_uncompressed;
     46
     47        // TOC section
     48        uint32  toc_compression;
     49        uint64  toc_length_compressed;
     50        uint64  toc_length_uncompressed;
     51
     52        uint64  toc_attribute_types_length;
     53        uint64  toc_attribute_types_count;
     54        uint64  toc_strings_length;
     55        uint64  toc_strings_count;
     56};
     57}}}
     58
     59 magic::
     60   The string 'hpkg' (B_HPKG_MAGIC).
     61 header_size::
     62   The size of the header.
     63 version::
     64   The version of the HPKG format the file conforms to. The current version is
     65   1 (B_HPKG_VERSION).
     66 total_size::
     67   The total file size.
     68
     69 attributes_compression::
     70   The compression algorithm used for the package attributes section.
     71 attributes_length_compressed::
     72   The compressed size of the package attributes section. Equals
     73   attributes_length_uncompressed, if the section is not compressed.
     74 attributes_length_uncompressed::
     75   The uncompressed size of the package attributes section.
     76
     77 toc_compression::
     78   The compression algorithm used for the TOC section.
     79 toc_length_compressed::
     80   The compressed size of the TOC section. Equals
     81   toc_length_uncompressed, if the section is not compressed.
     82 toc_length_uncompressed::
     83   The uncompressed size of the TOC section.
     84
     85 toc_attribute_types_length::
     86   The size of the attributes types subsection of the TOC section.
     87 toc_attribute_types_count::
     88   The number of entries in the attributes types subsection of the TOC section.
     89 toc_strings_length::
     90   The size of the strings subsection of the TOC section.
     91 toc_strings_count::
     92   The number of entries in the strings subsection of the TOC section.
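
The header layout above could be read along the following lines. This is only a sketch: it assumes the fields are packed in declaration order without padding (80 bytes in total) and that the magic is the four bytes "hpkg" read as a big-endian uint32. The names `HPKG_HEADER`, `HEADER_FIELDS`, and `parse_header` are invented for the example.

```python
import struct

# Big-endian layout mirroring struct hpkg_header (no padding assumed).
HPKG_HEADER = struct.Struct(">IHHQ" "III" "IQQ" "QQQQ")

HEADER_FIELDS = (
    "magic", "header_size", "version", "total_size",
    "attributes_compression", "attributes_length_compressed",
    "attributes_length_uncompressed",
    "toc_compression", "toc_length_compressed", "toc_length_uncompressed",
    "toc_attribute_types_length", "toc_attribute_types_count",
    "toc_strings_length", "toc_strings_count",
)

HPKG_MAGIC = int.from_bytes(b"hpkg", "big")


def parse_header(data):
    """Unpack the fixed header fields into a dict and check the magic."""
    header = dict(zip(HEADER_FIELDS, HPKG_HEADER.unpack_from(data, 0)))
    if header["magic"] != HPKG_MAGIC:
        raise ValueError("not an HPKG file")
    return header
```

A reader would typically also compare `header_size` and `total_size` against the actual file before trusting the other fields.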
     93
     94
     95=== TOC ===
     96
     97The TOC section contains a list of attribute trees. An attribute has a name, a data type, and a value, and can have child attributes. E.g.:
     98 - "shopping list" : string : "bakery"
     99   - "item" : string : "rye bread"
     100   - "item" : string : "bread roll"
     101     - "count" : int : 10
     102   - "item" : string : "cookie"
     103     - "count" : int : 5
     104 - "shopping list" : string : "hardware store"
     105   - "item" : string : "hammer"
     106   - "item" : string : "nail"
     107     - "size" : int : 10
     108     - "count" : int : 100
     109
     110Attributes often share the same name and data type, particularly when lists of some kind are stored. To save space, each unique pair of name and data type is stored as an attribute type in a separate subsection and is referenced by an index.
     111
     112A similar optimization exists for shared string attribute values. A string value used by more than one attribute is stored in the strings subsection and is referenced by an index as well.
     113
     114Hence the TOC section consists of three subsections:
     115 Attribute types::
     116   A table of attribute name, data type pairs.
     117 Strings::
     118   A table of commonly used strings.
     119 Main TOC::
     120   The attribute trees.
     121
     122==== Attribute Types ====
     123
     124The attribute types subsection consists of a list of attribute type entries terminated by a 0 byte. An attribute type entry is stored as:
     125 Attribute data type::
     126   A uint8 specifying the data type.
     127 Attribute name::
     128   A null-terminated UTF-8 string.
     129
     130These are the specified data type values:
     131||0||B_HPKG_ATTRIBUTE_TYPE_INVALID||invalid||
     132||1||B_HPKG_ATTRIBUTE_TYPE_INT||signed integer||
     133||2||B_HPKG_ATTRIBUTE_TYPE_UINT||unsigned integer||
     134||3||B_HPKG_ATTRIBUTE_TYPE_STRING||UTF-8 string||
     135||4||B_HPKG_ATTRIBUTE_TYPE_RAW||raw data||
     136
     137Each attribute type is implicitly assigned the (zero-based) index at which its entry appears in the list, i.e. the nth entry has the index n - 1. The attribute type is referenced by this index in the main TOC subsection.
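
A minimal sketch of reading this subsection, assuming the terminating 0 byte is read in the position of a data type byte (which coincides with B_HPKG_ATTRIBUTE_TYPE_INVALID). The function name `parse_attribute_types` is hypothetical.

```python
def parse_attribute_types(buf):
    """Return a list of (data_type, name) pairs; the list index is the type index."""
    types = []
    pos = 0
    while buf[pos] != 0:                 # a 0 byte ends the list
        data_type = buf[pos]
        end = buf.index(0, pos + 1)      # end of the null-terminated name
        types.append((data_type, buf[pos + 1:end].decode("utf-8")))
        pos = end + 1
    return types


table = parse_attribute_types(b"\x03dir:entry\x00\x02file:type\x00\x00")
```

Here `table[0]` is the string attribute type "dir:entry" and `table[1]` is the uint attribute type "file:type".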
     138
     139==== Strings ====
     140
     141The strings subsection consists of a list of null-terminated UTF-8 strings. The subsection itself is terminated by a 0 byte.
     142
     143Each string is implicitly assigned the (zero-based) index at which it appears in the list, i.e. the nth string has the index n - 1. The string is referenced by this index in the main TOC subsection.
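
The strings subsection can be read in much the same way. The sketch below uses the hypothetical name `parse_strings`; note that under this reading an empty string cannot be stored in the table, since it would look like the terminator.

```python
def parse_strings(buf):
    """Return the list of shared strings; the list index is the string index."""
    strings = []
    pos = 0
    while buf[pos] != 0:            # a 0 byte in place of a string ends the list
        end = buf.index(0, pos)     # end of the null-terminated string
        strings.append(buf[pos:end].decode("utf-8"))
        pos = end + 1
    return strings


shared = parse_strings(b"bin\x00gawk\x00\x00")
```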
     144
     145==== Main TOC ====
     146
     147The main TOC subsection consists of a list of attribute entries terminated by a 0 byte. An attribute entry is stored as:
     148 Attribute tag::
     149   An unsigned LEB128 encoded number.
     150 Attribute value::
     151   The value of the attribute encoded as described below.
     152 Attribute child list::
     153   Only if this attribute is marked to have children: A list of attribute
     154   entries terminated by a 0 byte.
     155
     156The attribute tag encodes three pieces of information:
     157  {{{(typeIndex << 3) + (encoding << 1) + hasChildren + 1}}}
     158
     159 typeIndex::
     160   The index of the attribute type.
     161 encoding::
     162   Specifies the encoding of the attribute value as described below.
     163 hasChildren::
     164   1, if the attribute has children, 0 otherwise.
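
The tag formula can be inverted by subtracting 1 and splitting the bits. The sketch below assumes the encoding always fits in two bits (the values defined in this document range from 0 to 3); `encode_tag` and `decode_tag` are names chosen for the example.

```python
def encode_tag(type_index, encoding, has_children):
    """Build an attribute tag following the formula above."""
    return (type_index << 3) + (encoding << 1) + int(has_children) + 1


def decode_tag(tag):
    """Split a tag into (type_index, encoding, has_children).

    Returns None for tag 0, which is the list terminator. The "+ 1" in the
    formula ensures that no real attribute ever produces a 0 tag.
    """
    if tag == 0:
        return None
    tag -= 1
    return tag >> 3, (tag >> 1) & 0x3, bool(tag & 0x1)
```

For instance, type index 5 with encoding 3 and children yields the tag 48, which fits in a single LEB128 byte.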
     165
     166==== Attribute Values ====
     167
     168Each data type supports several value encodings; the encoding field of the attribute tag selects which one is used:
     169
     170 - B_HPKG_ATTRIBUTE_TYPE_INT and B_HPKG_ATTRIBUTE_TYPE_UINT:
     171
     172   ||0||B_HPKG_ATTRIBUTE_ENCODING_INT_8_BIT||int8/uint8||
     173   ||1||B_HPKG_ATTRIBUTE_ENCODING_INT_16_BIT||int16/uint16||
     174   ||2||B_HPKG_ATTRIBUTE_ENCODING_INT_32_BIT||int32/uint32||
     175   ||3||B_HPKG_ATTRIBUTE_ENCODING_INT_64_BIT||int64/uint64||
     176
     177 - B_HPKG_ATTRIBUTE_TYPE_STRING:
     178
     179   ||0||B_HPKG_ATTRIBUTE_ENCODING_STRING_INLINE||null-terminated UTF-8 string||
     180   ||1||B_HPKG_ATTRIBUTE_ENCODING_STRING_TABLE||unsigned LEB128: index into string table||
     181
     182 - B_HPKG_ATTRIBUTE_TYPE_RAW:
     183
     184   ||0||B_HPKG_ATTRIBUTE_ENCODING_RAW_INLINE||unsigned LEB128: size; followed by raw bytes||
     185   ||1||B_HPKG_ATTRIBUTE_ENCODING_RAW_HEAP||unsigned LEB128: size; unsigned LEB128: offset into heap||
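
The tables above might translate into a value reader like the following sketch. It assumes fixed-width integers are big endian with two's complement for signed values, and it represents heap references as a `("heap", offset, size)` tuple rather than resolving them. All function and constant names are invented for the example.

```python
TYPE_INT, TYPE_UINT, TYPE_STRING, TYPE_RAW = 1, 2, 3, 4


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 number; returns (value, new_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def read_value(buf, pos, data_type, encoding, strings=()):
    """Decode one attribute value; returns (value, new_pos)."""
    if data_type in (TYPE_INT, TYPE_UINT):
        size = 1 << encoding                      # 0..3 -> 1, 2, 4, 8 bytes
        raw = buf[pos:pos + size]
        signed = data_type == TYPE_INT
        return int.from_bytes(raw, "big", signed=signed), pos + size
    if data_type == TYPE_STRING:
        if encoding == 0:                         # inline string
            end = buf.index(0, pos)
            return buf[pos:end].decode("utf-8"), end + 1
        index, pos = read_uleb128(buf, pos)       # index into string table
        return strings[index], pos
    if data_type == TYPE_RAW:
        size, pos = read_uleb128(buf, pos)
        if encoding == 0:                         # inline raw bytes
            return buf[pos:pos + size], pos + size
        offset, pos = read_uleb128(buf, pos)      # data lives in the heap
        return ("heap", offset, size), pos
    raise ValueError("invalid attribute data type")
```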
     186
     187
     188=== Package Attributes ===
     189
     190The package attributes section contains a list of attribute trees, just like
     191the TOC section. Since the purpose of the section is to store metadata of the package as a whole, it will be relatively small and less repetitive (no or only short item lists). Therefore this section does not have attribute types and strings subsections. It directly stores a list of self-contained attribute entries terminated by a 0 byte. An entry has the following format:
     192 Attribute data type::
     193   A uint8 specifying the data type of the attribute value.
     194 Has children::
     195   A uint8: non-zero if the attribute has children, 0 otherwise.
     196 Attribute name::
     197   A null-terminated UTF-8 string.
     198 Attribute value::
     199   The value of the attribute encoded as described in the Main TOC section.
     200 Attribute child list::
     201   Only if this attribute is marked to have children: A list of attribute
     202   entries terminated by a 0 byte.
     203
     204
     205=== Section Compression ===
     206
     207The TOC and the package attributes sections can be compressed. The compression algorithm is specified by the {{{toc_compression}}} and {{{attributes_compression}}} fields of the header, respectively. The following values are defined:
     208
     209||0||B_HPKG_COMPRESSION_NONE||no compression||
     210||1||B_HPKG_COMPRESSION_ZLIB||zlib (LZ77) compression||
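
A reader could dispatch on these values roughly as follows. The sketch assumes that a B_HPKG_COMPRESSION_ZLIB section is a single complete zlib stream, which this document does not state explicitly; `read_section` is a made-up helper name.

```python
import zlib

COMPRESSION_NONE = 0
COMPRESSION_ZLIB = 1


def read_section(raw, compression, length_uncompressed):
    """Return the uncompressed bytes of a TOC or package attributes section."""
    if compression == COMPRESSION_NONE:
        data = raw
    elif compression == COMPRESSION_ZLIB:
        data = zlib.decompress(raw)   # assumption: one whole zlib stream
    else:
        raise ValueError("unknown compression algorithm %d" % compression)
    if len(data) != length_uncompressed:
        raise ValueError("section size does not match the header")
    return data
```

The length check uses the `*_length_uncompressed` header field as a cheap sanity test.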
     211
     212
     213
     214== The Archive Format ==
     215
     216This section specifies how file system objects (files, directories, symlinks) are stored in an HPKG file. It builds on top of the container format, defining the types of attributes, their order, and allowed values.
     217
     218E.g. a "bin" directory, containing a symlink and a file:
     219{{{
     220bin           0  2009-11-13 12:12:09  drwxr-xr-x
     221  awk         0  2009-11-13 12:11:16  lrwxrwxrwx  -> gawk
     222  gawk   301699  2009-11-13 12:11:16  -rwxr-xr-x
     223}}}
     224could be represented by this attribute tree:
     225 - "dir:entry" : string : "bin"
     226  - "file:type" : uint : 1 (0x1)
     227  - "file:mtime" : uint : 1258110729 (0x4afd3f09)
     228  - "dir:entry" : string : "awk"
     229    - "file:type" : uint : 2 (0x2)
     230    - "file:mtime" : uint : 1258110676 (0x4afd3ed4)
     231    - "symlink:path" : string : "gawk"
     232  - "dir:entry" : string : "gawk"
     233    - "file:permissions" : uint : 493 (0x1ed)
     234    - "file:mtime" : uint : 1258110676 (0x4afd3ed4)
     235    - "data" : raw : size: 301699, offset: 0
     236    - "file:attribute" : string : "BEOS:APP_VERSION"
     237      - "file:attribute:type" : uint : 1095782486 (0x41505056)
     238      - "data" : raw : size: 680, offset: 301699
     239    - "file:attribute" : string : "BEOS:TYPE"
     240      - "file:attribute:type" : uint : 1296649555 (0x4d494d53)
     241      - "data" : raw : size: 35, offset: 302379
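
To relate the numeric values in this tree back to the directory listing, the short sketch below decodes a few of them. It assumes that the file attribute types are four-character ASCII codes stored as big-endian uint32 values, and that the listing was produced in a UTC+1 time zone; both are readings of the example rather than statements of the specification. The helper `fourcc` is hypothetical.

```python
import stat
from datetime import datetime, timezone


def fourcc(value):
    """Render a uint32 type code as its four ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii")


app_version_type = fourcc(0x41505056)     # 'APPV'
mime_type = fourcc(0x4D494D53)            # 'MIMS'

# "file:mtime" of "bin", as seconds since the Epoch (UTC)
bin_mtime = datetime.fromtimestamp(1258110729, tz=timezone.utc)

# "file:permissions" of "gawk": 493 is 0755 in octal
gawk_mode = stat.filemode(stat.S_IFREG | 493)   # '-rwxr-xr-x'
```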
     242
     243
     244=== Attribute Types ===
     245
     246The following attribute types are specified by the archive format. Any other attributes will be ignored.
     247
     248==== B_HPKG_ATTRIBUTE_NAME_DIRECTORY_ENTRY ("dir:entry") ====
     249 - '''Type:''' string
     250 - '''Value:''' File name of the entry.
     251 - '''Allowed Values:''' Any valid file (not path!) name, save "." and "..".
     252 - '''Child Attributes:'''
     253   - B_HPKG_ATTRIBUTE_NAME_FILE_TYPE: The file type of the entry.
     254   - B_HPKG_ATTRIBUTE_NAME_FILE_PERMISSIONS: The file permissions of the entry.
     255   - B_HPKG_ATTRIBUTE_NAME_FILE_USER: The owning user of the entry.
     256   - B_HPKG_ATTRIBUTE_NAME_FILE_GROUP: The owning group of the entry.
     257   - B_HPKG_ATTRIBUTE_NAME_FILE_ATIME[_NANOS]: The entry's file access time.
     258   - B_HPKG_ATTRIBUTE_NAME_FILE_MTIME[_NANOS]: The entry's file modification time.
     259   - B_HPKG_ATTRIBUTE_NAME_FILE_CRTIME[_NANOS]: The entry's file creation time.
     260   - B_HPKG_ATTRIBUTE_NAME_FILE_ATTRIBUTE: An extended file attribute associated with the entry.
     261   - B_HPKG_ATTRIBUTE_NAME_DATA: Only if the entry is a file: The file data.
     262   - B_HPKG_ATTRIBUTE_NAME_SYMLINK_PATH: Only if the entry is a symlink: The path the symlink points to.
     263   - B_HPKG_ATTRIBUTE_NAME_DIRECTORY_ENTRY: Only if the entry is a directory: The child entries in that directory.
     264
     265==== B_HPKG_ATTRIBUTE_NAME_FILE_TYPE ("file:type") ====
     266 - '''Type:''' uint
     267 - '''Value:''' Type of the entry.
     268 - '''Allowed Values:'''
     269
     270   ||0||B_HPKG_FILE_TYPE_FILE||file||
     271   ||1||B_HPKG_FILE_TYPE_DIRECTORY||directory||
     272   ||2||B_HPKG_FILE_TYPE_SYMLINK||symlink||
     273 - '''Default Value:''' B_HPKG_FILE_TYPE_FILE
     274 - '''Child Attributes:''' none
     275
     276==== B_HPKG_ATTRIBUTE_NAME_FILE_PERMISSIONS ("file:permissions") ====
     277 - '''Type:''' uint
     278 - '''Value:''' File permissions.
     279 - '''Allowed Values:''' Any valid permission mask.
     280 - '''Default Value:'''
     281   - For files: 0644 (octal).
     282   - For directories: 0755 (octal).
     283   - For symlinks: 0777 (octal).
     284 - '''Child Attributes:''' none
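
The default values for the file type and the permissions could be applied as in this sketch, which assumes the child attributes of an entry have already been collected into a dict keyed by attribute name. The function name and the dict representation are inventions of the example.

```python
FILE_TYPE_FILE = 0
FILE_TYPE_DIRECTORY = 1
FILE_TYPE_SYMLINK = 2

DEFAULT_PERMISSIONS = {
    FILE_TYPE_FILE: 0o644,
    FILE_TYPE_DIRECTORY: 0o755,
    FILE_TYPE_SYMLINK: 0o777,
}


def effective_type_and_permissions(attrs):
    """Return (file_type, permissions), filling in the specified defaults."""
    file_type = attrs.get("file:type", FILE_TYPE_FILE)
    permissions = attrs.get("file:permissions", DEFAULT_PERMISSIONS[file_type])
    return file_type, permissions
```

With these defaults, an ordinary file or directory with standard permissions needs neither attribute to be stored explicitly.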
     285
     286==== B_HPKG_ATTRIBUTE_NAME_FILE_USER ("file:user") ====
     287 - '''Type:''' string
     288 - '''Value:''' Name of the user owning the file.
     289 - '''Allowed Values:''' Any non-empty string.
     290 - '''Child Attributes:''' none
     291
     292==== B_HPKG_ATTRIBUTE_NAME_FILE_GROUP ("file:group") ====
     293 - '''Type:''' string
     294 - '''Value:''' Name of the group owning the file.
     295 - '''Allowed Values:''' Any non-empty string.
     296 - '''Child Attributes:''' none
     297
     298==== B_HPKG_ATTRIBUTE_NAME_FILE_ATIME ("file:atime") ====
     299 - '''Type:''' uint
     300 - '''Value:''' File access time (seconds since the Epoch).
     301 - '''Allowed Values:''' Any value.
     302 - '''Child Attributes:''' none
     303
     304==== B_HPKG_ATTRIBUTE_NAME_FILE_ATIME_NANOS ("file:atime:nanos") ====
     305 - '''Type:''' uint
     306 - '''Value:''' The nanoseconds fraction of the file access time.
     307 - '''Allowed Values:''' Any value in [0, 999999999].
     308 - '''Child Attributes:''' none
     309
     310==== B_HPKG_ATTRIBUTE_NAME_FILE_MTIME ("file:mtime") ====
     311 - '''Type:''' uint
     312 - '''Value:''' File modification time (seconds since the Epoch).
     313 - '''Allowed Values:''' Any value.
     314 - '''Child Attributes:''' none
     315
     316==== B_HPKG_ATTRIBUTE_NAME_FILE_MTIME_NANOS ("file:mtime:nanos") ====
     317 - '''Type:''' uint
     318 - '''Value:''' The nanoseconds fraction of the file modification time.
     319 - '''Allowed Values:''' Any value in [0, 999999999].
     320 - '''Child Attributes:''' none
     321
     322==== B_HPKG_ATTRIBUTE_NAME_FILE_CRTIME ("file:crtime") ====
     323 - '''Type:''' uint
     324 - '''Value:''' File creation time (seconds since the Epoch).
     325 - '''Allowed Values:''' Any value.
     326 - '''Child Attributes:''' none
     327
     328==== B_HPKG_ATTRIBUTE_NAME_FILE_CRTIME_NANOS ("file:crtime:nanos") ====
     329 - '''Type:''' uint
     330 - '''Value:''' The nanoseconds fraction of the file creation time.
     331 - '''Allowed Values:''' Any value in [0, 999999999].
     332 - '''Child Attributes:''' none
     333
     334==== B_HPKG_ATTRIBUTE_NAME_FILE_ATTRIBUTE ("file:attribute") ====
     335 - '''Type:''' string
     336 - '''Value:''' Name of the extended file attribute.
     337 - '''Allowed Values:''' Any valid attribute name.
     338 - '''Child Attributes:'''
     339   - B_HPKG_ATTRIBUTE_NAME_FILE_ATTRIBUTE_TYPE: The type of the file attribute.
     340   - B_HPKG_ATTRIBUTE_NAME_DATA: The file attribute data.
     341
     342==== B_HPKG_ATTRIBUTE_NAME_FILE_ATTRIBUTE_TYPE ("file:attribute:type") ====
     343 - '''Type:''' uint
     344 - '''Value:''' Type of the file attribute.
     345 - '''Allowed Values:''' Any value in [0, 0xffffffff].
     346 - '''Child Attributes:''' none
     347
     348==== B_HPKG_ATTRIBUTE_NAME_DATA ("data") ====
     349 - '''Type:''' raw
     350 - '''Value:''' Raw data of a file or attribute.
     351 - '''Allowed Values:''' Any value, if uncompressed, otherwise see below.
     352 - '''Child Attributes:'''
     353   - B_HPKG_ATTRIBUTE_NAME_DATA_COMPRESSION: The compression algorithm used for
     354     storing the data.
     355   - B_HPKG_ATTRIBUTE_NAME_DATA_SIZE: The size of the uncompressed data.
     356
     357==== B_HPKG_ATTRIBUTE_NAME_DATA_COMPRESSION ("data:compression") ====
     358 - '''Type:''' uint
     359 - '''Value:''' ID of the data compression algorithm.
     360 - '''Allowed Values:'''
     361
     362   ||0||B_HPKG_COMPRESSION_NONE||no compression||
     363   ||1||B_HPKG_COMPRESSION_ZLIB||zlib (LZ77) compression||
     364 - '''Default Value:''' B_HPKG_COMPRESSION_NONE
     365 - '''Child Attributes:''' none
     366
     367==== B_HPKG_ATTRIBUTE_NAME_DATA_SIZE ("data:size") ====
     368 - '''Type:''' uint
     369 - '''Value:''' Size of the uncompressed data.
     370 - '''Allowed Values:''' Any value.
     371 - '''Default Value:''' Size of the compressed data.
     372 - '''Child Attributes:''' none
     373
     374==== B_HPKG_ATTRIBUTE_NAME_DATA_CHUNK_SIZE ("data:chunk_size") ====
     375 - '''Type:''' uint
     376 - '''Value:''' Size of the chunks into which the uncompressed data are split before compression.
     377 - '''Allowed Values:''' Any value.
     378 - '''Default Value:'''
     379    - If not compressed: 0
     380    - If B_HPKG_COMPRESSION_ZLIB compressed: 64 * 1024
     381 - '''Child Attributes:''' none
     382
     383==== B_HPKG_ATTRIBUTE_NAME_SYMLINK_PATH ("symlink:path") ====
     384 - '''Type:''' string
     385 - '''Value:''' The path the symlink refers to.
     386 - '''Allowed Values:''' Any valid symlink path.
     387 - '''Default Value:''' Empty string.
     388 - '''Child Attributes:''' none
     389
     390
     391=== TOC Attributes ===
     392
     393The TOC can directly contain any number of attributes of the B_HPKG_ATTRIBUTE_NAME_DIRECTORY_ENTRY type, which in turn contain descendant attributes as specified in the previous section. Any other attributes are ignored.
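
Once the TOC has been parsed, listing the archived paths amounts to following the nested "dir:entry" attributes and skipping everything else. The sketch below assumes each parsed attribute is a `(name, value, children)` tuple, which is a representation chosen for the example only.

```python
def iter_paths(attributes, parent=""):
    """Yield the path of every directory entry in a parsed attribute list."""
    for name, value, children in attributes:
        if name != "dir:entry":
            continue                      # other attributes are not entries
        path = parent + "/" + value if parent else value
        yield path
        yield from iter_paths(children, path)


toc = [
    ("dir:entry", "bin", [
        ("file:type", 1, []),
        ("dir:entry", "awk", [("file:type", 2, []), ("symlink:path", "gawk", [])]),
        ("dir:entry", "gawk", [("file:permissions", 493, [])]),
    ]),
]
paths = list(iter_paths(toc))
```

For the "bin" example from the beginning of this section, `paths` contains "bin", "bin/awk", and "bin/gawk".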
     394
     395
     396=== Data Compression ===
     397
     398Data referred to by a B_HPKG_ATTRIBUTE_NAME_DATA attribute will be the raw data, if uncompressed. If compressed, the data have a special format that allows for fast random access.
     399
     400==== B_HPKG_COMPRESSION_ZLIB ====
     401
     402The original data are split into equally sized chunks, which are compressed individually. The compressed data chunks are stored (in order) without padding, preceded by a uint64 array specifying the relative positions of the compressed data of each chunk. The positions are relative to the first byte following the position array. Since the first chunk is always at position 0, its array element is omitted. Therefore uncompressed data split into n chunks will have n - 1 position array elements.
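
The layout described above allows a single chunk to be read without touching the others. The following sketch assumes that the position array is big endian like all other fixed-width numbers, that each chunk is a complete zlib stream, and that the last chunk extends to the end of the stored data. The names `pack_chunks` and `read_chunk` are hypothetical; `pack_chunks` exists only to produce test input.

```python
import struct
import zlib

DEFAULT_CHUNK_SIZE = 64 * 1024


def pack_chunks(data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Build the stored form: n - 1 uint64 positions, then n compressed chunks."""
    chunks = [zlib.compress(data[i:i + chunk_size])
              for i in range(0, len(data), chunk_size)]
    positions, pos = [], 0
    for chunk in chunks[:-1]:             # position of chunk 0 is omitted
        pos += len(chunk)
        positions.append(pos)
    table = struct.pack(">%dQ" % len(positions), *positions)
    return table + b"".join(chunks)


def read_chunk(blob, uncompressed_size, index, chunk_size=DEFAULT_CHUNK_SIZE):
    """Decompress only chunk number index of the stored data."""
    count = (uncompressed_size + chunk_size - 1) // chunk_size
    if not 0 <= index < count:
        raise IndexError("chunk index out of range")
    table_size = 8 * (count - 1)
    positions = (0,) + struct.unpack_from(">%dQ" % (count - 1), blob, 0)
    start = table_size + positions[index]
    end = table_size + positions[index + 1] if index + 1 < count else len(blob)
    return zlib.decompress(blob[start:end])
```

Reading the byte at uncompressed offset `o` then only requires decompressing chunk `o // chunk_size`.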
     403
     404
     405== The Package Format ==
     406
     407TODO...