What is metadata?
Literally, it is information about data. It describes the content of a document, whatever the medium.
There are several types of metadata:
- management metadata, to access the document;
- description metadata, to understand the content;
- and preservation metadata, to ensure that the document remains accessible and understandable over time.
In the OAIS model, all information related to the content information is metadata (see the article on the OAIS model).
What is at stake in metadata identification and preservation?
Metadata is the identity card of a document. It identifies the document and describes the origin of its creation, its purpose and its intended recipients.
Without all these elements, a document can quickly become incomprehensible and therefore unusable.
Indeed, how can one make sense of a series of numbers in a table if it is not clear what the values on the x-axis and y-axis represent, who produced the document, for whom, for what purpose, or when?
Identifying all the metadata that must accompany a digital document is therefore a crucial but difficult task. Crucial, because it directly determines the quality of the archive service and of future access to the document. Difficult, because practical experience in digital preservation is still very limited, so we can only move forward by making assumptions and accepting a few calculated risks.
The main standards for metadata
To assist archivists in the difficult task of defining a set of metadata, there are metadata dictionaries and element sets, as well as standards for packaging this metadata.
The central metadata standard is the Dublin Core element set, which is used today by a large international community of archivists.
Dublin Core metadata consists of 15 basic elements intended to describe any resource, typically one available on the Internet.
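These 15 elements are: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights.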
This set forms the core descriptive metadata of the document.
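As an illustration, a Dublin Core description can be serialized as a small XML record. The sketch below uses Python's standard `xml.etree.ElementTree` module and the Dublin Core 1.1 element namespace. The `record` wrapper element, the `build_dc_record` helper and the sample values are assumptions made for this example, not part of the standard or of any particular repository.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 Dublin Core elements (version 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Build an XML record from a dict of Dublin Core element names to values.

    The <record> wrapper is a hypothetical container chosen for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Sample values are invented for illustration.
record = build_dc_record({
    "title": "Temperature readings, station A",
    "creator": "J. Doe",
    "date": "2010-05-01",
    "format": "text/csv",
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown element names keeps the record limited to the 15 core elements. A real profile would probably also constrain the values, for example the date format.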
Beyond this first, general level of descriptive metadata, more technical metadata is needed, specific to preservation itself. Several recommendations exist in this area, such as PREMIS (Preservation Metadata: Implementation Strategies), a data dictionary that defines the key elements needed to support preservation functions. It lists the preservation metadata that a long-term archiving service must know.
There are also many other specialized metadata dictionaries, such as TEF for French electronic theses, or LOM, LOMFR and SupLOMFR for teaching and learning resources.
These dictionaries are not mutually exclusive and can be combined with standard metadata packages.
One of the best known is METS (Metadata Encoding and Transmission Standard). It is an open, non-proprietary, modular and extensible XML format that wraps several blocks of metadata describing a digital object: descriptive, administrative and structural metadata, covering both the files and the links between objects.
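To make the idea of encapsulated blocks more concrete, the following sketch assembles a very small METS skeleton with Python's standard `xml.etree.ElementTree` module. The section names (`dmdSec`, `amdSec`, `fileSec`, `structMap`) and the namespaces come from METS, XLink and Dublin Core. The `build_mets` helper, the identifiers and the sample file path are assumptions for this example, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"


def q(ns, tag):
    """Return a namespace-qualified tag name in ElementTree notation."""
    return f"{{{ns}}}{tag}"


def build_mets(file_path, title):
    """Build a minimal METS document for one file (illustrative only)."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    ET.register_namespace("dc", DC_NS)

    mets = ET.Element(q(METS_NS, "mets"))

    # Descriptive metadata: a Dublin Core title wrapped inside the package.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="dmd1")
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(xml_data, q(DC_NS, "title")).text = title

    # Administrative metadata: left empty here; preservation metadata
    # such as PREMIS would normally be placed in this section.
    ET.SubElement(mets, q(METS_NS, "amdSec"), ID="amd1")

    # File section: the inventory of files that make up the object.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    file_el = ET.SubElement(group, q(METS_NS, "file"), ID="file1")
    ET.SubElement(
        file_el,
        q(METS_NS, "FLocat"),
        {"LOCTYPE": "URL", q(XLINK_NS, "href"): file_path},
    )

    # Structural map: links the logical object to its metadata and files.
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"), DMDID="dmd1")
    ET.SubElement(div, q(METS_NS, "fptr"), FILEID="file1")
    return mets


package = build_mets("data/report.pdf", "Annual report")
print(ET.tostring(package, encoding="unicode"))
```

The structural map refers to the descriptive section and to the file by identifier, which is how the package records the links between objects and their metadata.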
Within the archival community, EAD (Encoded Archival Description) is widely used as a metadata packaging standard. It encodes archival finding aids and can therefore describe archives, manuscript collections and, more broadly, any hierarchical collection of documents or objects (photographs, microfilms, museum pieces).
In practice, identifying a set of preservation metadata is essentially a risk management exercise. Starting from one or more broad lists, each metadata element deemed unnecessary can be eliminated in light of the services that are planned, expected or foreseeable.
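This elimination step can be pictured as a simple filter. In the sketch below, each candidate element is mapped to the future services it would support, and an element is kept only if at least one of those services is expected. The element names, the service names and the `select_metadata` helper are all hypothetical, chosen only to illustrate the reasoning.

```python
# Hypothetical candidate list: each metadata element is mapped to the
# future services it would support. All names are illustrative.
CANDIDATES = {
    "title": {"search", "display"},
    "creator": {"search", "provenance"},
    "checksum": {"integrity_audit"},
    "file_format": {"format_migration"},
    "camera_model": {"technical_exhibition"},
}


def select_metadata(candidates, expected_services):
    """Split candidates into kept and dropped elements.

    An element is kept if it supports at least one expected service.
    """
    kept, dropped = [], []
    for element, services in candidates.items():
        if services & expected_services:
            kept.append(element)
        else:
            dropped.append(element)
    return kept, dropped


kept, dropped = select_metadata(
    CANDIDATES, {"search", "integrity_audit", "format_migration"}
)
print("kept:", kept)
print("dropped:", dropped)
```

Each dropped element is an accepted risk: if the service it supports turns out to be needed later, the information will no longer be available.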
See the list of metadata used for the CINES preservation repository.