About metadata in PDF documents

In a PDF file, metadata can be stored in two places:

  • In the Info dictionary of the file trailer dictionary. This dictionary contains information about the file, such as title, author, and creation date. This information is stored as PDF objects such as strings and dates, not in XML format.

    The information in this dictionary is visible to Acrobat and Adobe Reader users through the document properties. Users can set some of the properties, such as Title, Author, Subject, and Keywords. Users can also add custom properties, each with a unique name and a value.

  • In the Metadata dictionary of the document catalog. This dictionary contains metadata that is associated with the entire document. This information is represented as XMP metadata.

    Note: Individual streams in a document, such as images, may also have metadata entries that contain associated XMP metadata. However, the XMP Utilities service does not provide the ability to manipulate such component-level metadata.
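The difference between the two representations shows up clearly with dates. The Info dictionary stores a date as a PDF date string of the form D:YYYYMMDDHHmmSSOHH'mm', while XMP uses ISO 8601 text. The following Python sketch converts one to the other. The function name parse_pdf_date is hypothetical and is not part of the XMP Utilities service; the sketch assumes that missing trailing fields default to the first month, first day, and zero time, and that a string without an offset yields a datetime without time zone information.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' -- everything after the year is optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(text):
    """Convert a PDF date string (as stored in the Info dictionary) to a datetime."""
    match = _PDF_DATE.match(text.strip())
    if match is None:
        raise ValueError("not a PDF date string: %r" % text)
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()

    tz = None  # no offset given: return a naive datetime
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(-delta if sign == "-" else delta)

    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=tz,
    )


if __name__ == "__main__":
    created = parse_pdf_date("D:20240315093000-05'00'")
    # ISO 8601 text, the form used for date values in XMP
    print(created.isoformat())  # 2024-03-15T09:30:00-05:00
```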

All metadata in the Info dictionary is also represented in the Metadata dictionary in the form of XMP metadata properties. The standard properties, such as Title and Author, are represented in XMP as properties from the PDF schema.

When the XMP Utilities service reads metadata from a PDF file, it resolves inconsistencies between values in the Info dictionary and those in the XMP metadata:

  • If the Info dictionary is newer, the Info dictionary properties are used to update the XMP metadata.

  • If the XMP metadata is newer, the XMP properties are used to update the Info dictionary.

  • Properties in the Info dictionary that are not listed in “Document Information Dictionary” in the PDF Reference are mapped to the pdfx namespace (“http://ns.adobe.com/pdfx/1.3/”). The same mapping applies when properties are copied between the Info dictionary and the XMP metadata in either of the two cases above.
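The three rules above can be modeled with a short Python sketch. It is an illustration under stated assumptions, not the service's implementation: both stores are plain dictionaries, XMP property names are (namespace URI, local name) tuples, the caller supplies the timestamp used to decide which store is newer, and STANDARD_MAP is a deliberately partial, hypothetical table of standard Info keys. The names reconcile, xmp_name, and STANDARD_MAP are invented for this example.

```python
from datetime import datetime, timezone

PDF_NS = "http://ns.adobe.com/pdf/1.3/"
PDFX_NS = "http://ns.adobe.com/pdfx/1.3/"

# Hypothetical, partial table: standard Info key -> XMP property name.
STANDARD_MAP = {
    "Keywords": (PDF_NS, "Keywords"),
    "Producer": (PDF_NS, "Producer"),
}


def xmp_name(info_key, standard_map):
    """Standard keys use the table; any other key goes to the pdfx namespace."""
    return standard_map.get(info_key, (PDFX_NS, info_key))


def reconcile(info, info_modified, xmp, xmp_modified, standard_map=STANDARD_MAP):
    """Return (info, xmp) copies after letting the newer store update the older one."""
    info, xmp = dict(info), dict(xmp)
    if info_modified > xmp_modified:
        # Info dictionary is newer: push its properties into the XMP metadata.
        for key, value in info.items():
            xmp[xmp_name(key, standard_map)] = value
    elif xmp_modified > info_modified:
        # XMP metadata is newer: pull mapped properties back into the Info dictionary.
        reverse = {name: key for key, name in standard_map.items()}
        for name, value in xmp.items():
            if name in reverse:
                info[reverse[name]] = value
            elif name[0] == PDFX_NS:
                info[name[1]] = value
    return info, xmp


if __name__ == "__main__":
    older = datetime(2020, 1, 1, tzinfo=timezone.utc)
    newer = datetime(2021, 1, 1, tzinfo=timezone.utc)
    info = {"Keywords": "contract, draft", "Department": "Legal"}
    xmp = {(PDF_NS, "Keywords"): "contract"}
    _, merged = reconcile(info, newer, xmp, older)
    print(merged[(PDFX_NS, "Department")])  # Legal
```

In this model, XMP properties that have no Info dictionary counterpart are left alone when the XMP metadata is newer, and equal timestamps cause no copying in either direction. Both choices are assumptions of the sketch.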

When a PDF document is saved, some metadata properties are updated automatically: xmp:ModifyDate, xmp:MetadataDate, xapMM:InstanceID, and, if it is missing, xapMM:DocumentID. If you attempt to set these properties yourself, the values you specify are overwritten when the document is saved.
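The effect of saving on caller-supplied values can be illustrated with a small Python model. This is a sketch of the behavior described above, not code from the service: the function name apply_save_updates is hypothetical, properties are held in a plain dictionary keyed by prefixed name, and the "uuid:" identifier format and ISO 8601 timestamp format are assumptions.

```python
import uuid
from datetime import datetime, timezone


def apply_save_updates(props, now=None):
    """Model the automatic property updates that happen when a document is saved."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")

    updated = dict(props)
    # Always replaced, regardless of what the caller supplied.
    updated["xmp:ModifyDate"] = stamp
    updated["xmp:MetadataDate"] = stamp
    updated["xapMM:InstanceID"] = "uuid:" + str(uuid.uuid4())
    # Only filled in when the document has no DocumentID yet.
    updated.setdefault("xapMM:DocumentID", "uuid:" + str(uuid.uuid4()))
    return updated


if __name__ == "__main__":
    requested = {"xmp:ModifyDate": "1999-01-01T00:00:00+00:00", "dc:title": "Report"}
    saved = apply_save_updates(requested)
    # The caller's ModifyDate is gone; the unrelated property is untouched.
    print(saved["xmp:ModifyDate"] != requested["xmp:ModifyDate"])  # True
    print(saved["dc:title"])  # Report
```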
