An important part of the digital archiving problem concerns file formats and whether they can still be understood in the future.
To be archived, the first condition a format must meet is that it can be exploited in its entirety for an unlimited period. This requires an available specification describing all of its characteristics, and both the format and its specification must be freely available, without any time restriction.
The objective is to find criteria that guarantee this first condition.
The format specification must be open. Ideally, the specification is tied to a standard, which guarantees the quality of its description. If there is no standard, the format must be widely used, which suggests that the specification is well written. It follows from this description that a format can be proprietary and still qualify.
To be archived at CINES, a format has to meet these three criteria:
- Open specification (mandatory)
- Widely used (or soon to be)
- Standardized (if possible)
This selection is necessary for:
- Format validity check
- Migration (transformation to another format)
- Format readability and understanding
CINES's general archiving policy aims to minimize the number of collected formats:
- To simplify the management of format migration processes
- To monitor formats as effectively as possible
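Read together with the requirement that the specification be open, the criteria above can be expressed as a simple decision rule: openness is mandatory, a standard is preferred, and wide use is the fallback when there is no standard. The sketch below is one possible reading of that rule, not CINES's actual tooling; the `FormatProfile` class and `is_candidate_for_archiving` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class FormatProfile:
    """Hypothetical description of a candidate file format."""
    name: str
    open_specification: bool  # specification freely available, no time limit
    widely_used: bool         # widely used, or expected to be soon
    standardized: bool        # specification backed by a standard


def is_candidate_for_archiving(fmt: FormatProfile) -> bool:
    """Sketch of the selection rule (assumed reading of the criteria):
    an open specification is mandatory; a standard is preferred, and
    wide use is required only when there is no standard."""
    if not fmt.open_specification:
        return False
    return fmt.standardized or fmt.widely_used


# Usage with hypothetical formats:
print(is_candidate_for_archiving(FormatProfile("format-a", True, False, True)))
print(is_candidate_for_archiving(FormatProfile("format-b", True, False, False)))
```

A format passing this check would only enter the "potentially accepted" stage; acceptance itself still depends on approval by management.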
CINES format monitoring
The criteria presented above are the founding principles of CINES's thinking on selecting file formats for archiving. Format monitoring is organized around a set of lists describing the file formats “watched” by the PAC team. There are five lists:
- List of the formats under review: contains the formats suggested by users or the emerging formats detected by the monitoring.
- List of the formats that could potentially be accepted in the archiving platform: gathers the formats judged relevant according to the criteria presented above.
- List of the formats accepted by the archiving platform: contains the formats accepted for archiving and approved by management. This is the only list accessible to everyone. It must not be confused with the list of formats already archived in the platform, which does not fall under format monitoring: a format can be accepted for archiving without being archived in PAC if no user has submitted files in this format.
- List of the formats about to become obsolete: contains the formats accepted for archiving for which the monitoring has detected a threat of obsolescence. In that case, discussions begin with users to stop submissions in these formats.
- List of the obsolete formats: contains the formats that were accepted in PAC but are now considered obsolete with respect to the selection criteria. They are no longer accepted, and files in these formats must be migrated to another format to preserve the readability of their content.
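The five lists can be read as stages in a format's life cycle. The sketch below models that reading as a small state machine; the enum values, the `promote`, `accepts_new_submissions` and `needs_migration` functions, and the strictly forward path between lists are assumptions for illustration, since the text does not describe rejections or moves backwards.

```python
from enum import Enum


class FormatList(Enum):
    """The five monitoring lists (identifiers are illustrative)."""
    UNDER_REVIEW = "under review"
    POTENTIALLY_ACCEPTED = "potentially accepted"
    ACCEPTED = "accepted"
    ABOUT_TO_BECOME_OBSOLETE = "about to become obsolete"
    OBSOLETE = "obsolete"


# Assumed forward path of a format through the lists.
NEXT_LIST = {
    FormatList.UNDER_REVIEW: FormatList.POTENTIALLY_ACCEPTED,  # judged relevant
    FormatList.POTENTIALLY_ACCEPTED: FormatList.ACCEPTED,      # approved by management
    FormatList.ACCEPTED: FormatList.ABOUT_TO_BECOME_OBSOLETE,  # threat detected
    FormatList.ABOUT_TO_BECOME_OBSOLETE: FormatList.OBSOLETE,  # criteria no longer met
}


def promote(current: FormatList) -> FormatList:
    """Move a format to the next list; the obsolete list is final."""
    if current not in NEXT_LIST:
        raise ValueError(f"{current.value!r} has no next list")
    return NEXT_LIST[current]


def accepts_new_submissions(current: FormatList) -> bool:
    """Formats about to become obsolete are still formally accepted,
    even though discussions to stop submissions have begun."""
    return current in (FormatList.ACCEPTED, FormatList.ABOUT_TO_BECOME_OBSOLETE)


def needs_migration(current: FormatList) -> bool:
    """Files already archived in an obsolete format must be migrated."""
    return current is FormatList.OBSOLETE
```

Note that list membership says nothing about whether files in a format actually exist in PAC; that is tracked separately from the monitoring lists.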