In their simplest form, persistent identifiers (PIDs) are nothing more than unique labels assigned to objects in a one-to-one relationship. Such identifiers are well understood in computing systems; examples of identifiers used in a large-scale laboratory information management system (LIMS) are presented in Part II of the extended report [2]. In the context of the Internet, persistent identification is frequently coupled with actionability: the PID is persistently linked to a specific object and, when actuated (for example, clicked in a web browser), always returns the same response to the end-user, typically a specific web page or other form of digital content. In this respect PIDs differ from URLs, which are used to create hyperlinks and give the Internet address where a given object is currently stored. Because storage locations are not persistent, a “behind-the-scenes” mapping of object identifiers to object locations, known as resolution, is required. This topic is covered in more detail in Part IV of the extended report [2].
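To make the distinction between an identifier and a location concrete, the following is a minimal sketch of resolution, assuming a single in-memory lookup table. The class `Resolver`, its methods, and the `example:` identifiers are hypothetical; production schemes such as Handle or PURL run distributed, replicated resolution services rather than anything this simple.

```python
class Resolver:
    """Hypothetical in-memory resolver mapping persistent IDs to current locations."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        # A PID is bound to one object once; reusing it would break persistence.
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = url

    def relocate(self, pid: str, new_url: str) -> None:
        # The object has moved: only the stored location changes, never the PID.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        # Resolution: return the object's current location for a given PID.
        return self._locations[pid]


resolver = Resolver()
resolver.register("example:0001", "https://old-host.example.org/records/1")
resolver.relocate("example:0001", "https://new-host.example.org/archive/1")
print(resolver.resolve("example:0001"))  # https://new-host.example.org/archive/1
```

The point of the design is that `relocate` changes only the stored location; anyone holding the PID still reaches the object without updating their links.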
Persistent identifiers are a powerful enabling technology that provides an efficient way to cope with chronic problems such as broken links and, more generally, the difficulty of retrieving information reliably and reproducibly on the Internet. For example, PIDs associated with published articles allow rapid and accurate tracking of written works. PIDs are also used in the life sciences, for example the identifiers of the International Nucleotide Sequence Database Collaboration (INSDC), such as the sequence accession numbers used at GenBank, EMBL, and the DNA Data Bank of Japan [22-24]. However, such identifiers are largely institution-specific: they are either used only within the institutions for which they were created or controlled by those institutions, as with the PubMed ID issued by the National Library of Medicine.
Six PID schemes, currently used across different domains and by a number of organizations, are reviewed: Uniform Resource Name (URN) [25]; Persistent Uniform Resource Locator (PURL) [14]; Archival Resource Key (ARK) [26]; Life Science Identifiers (LSID) [27]; Handle System (Handle) [28]; and Digital Object Identifier System (DOI) [29]. The review also addresses the questions an organization needs to answer when assessing whether to incorporate a PID scheme into its data management plan.
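As a rough illustration of how these schemes differ in syntax, the sketch below classifies an identifier string by pattern matching. The regular expressions are simplified assumptions, not the normative grammars in each scheme's specification (for example, only the purl.org host is recognized as a PURL), and the example identifiers are made up. Because an LSID is a kind of URN and a DOI is a kind of Handle, the more specific pattern is tried first.

```python
import re

# Simplified patterns, checked in order. An LSID is also a URN and a DOI is
# also a Handle, so the more specific pattern must come first.
PID_PATTERNS = [
    ("LSID", re.compile(r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]+)?$", re.IGNORECASE)),
    ("URN", re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,30}[a-z0-9]:\S+$", re.IGNORECASE)),
    ("ARK", re.compile(r"^ark:/?\w+/\S+$", re.IGNORECASE)),
    ("PURL", re.compile(r"^https?://purl\.org/\S+$", re.IGNORECASE)),
    ("DOI", re.compile(r"^10\.\d+(\.\d+)*/\S+$")),
    ("Handle", re.compile(r"^\d+(\.\d+)*/\S+$")),
]


def classify_pid(identifier: str) -> str:
    """Return the first scheme whose pattern matches, or 'unknown'."""
    for scheme, pattern in PID_PATTERNS:
        if pattern.match(identifier):
            return scheme
    return "unknown"


for example in [
    "urn:lsid:example.org:taxa:12345",
    "urn:isbn:0000000000",
    "ark:/12345/x7k9",
    "https://purl.org/example/record1",
    "10.1234/abc.567",
    "20.500.12345/abc",
]:
    print(example, "->", classify_pid(example))
```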
Each of these identifiers is used in well-defined settings in which the data and metadata models of the underlying repositories were established a priori. The identifiers provide direct access to a specific record, other digital content, or the associated metadata. If an identifier is actionable, the linked object can be retrieved through the familiar interface of a web browser. Where an interactive interface is not suitable, similar results can be achieved with web services that provide automated, structured access to the content of interest (e.g., from a database, or from an application on a handheld device that uses embedded PIDs).
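A minimal sketch of the two access routes, assuming the public resolvers https://doi.org/, https://hdl.handle.net/ and https://n2t.net/ for DOIs, Handles and ARKs respectively. `actionable_url` produces a link a browser can follow; `machine_request` builds, but does not send, an HTTP request whose `Accept` header asks for a machine-readable representation. The function names and example identifiers are hypothetical, and whether a resolver honors the `Accept` header depends on the scheme and registration agency; the CSL JSON media type shown is one that DOI content negotiation supports.

```python
from urllib.parse import quote
from urllib.request import Request

# Public resolver base URLs (assumed current) for three of the schemes above.
RESOLVERS = {
    "DOI": "https://doi.org/",
    "Handle": "https://hdl.handle.net/",
    "ARK": "https://n2t.net/",
}


def actionable_url(scheme: str, identifier: str) -> str:
    """Turn a bare identifier into a URL that a web browser can follow."""
    # '/' and ':' are structural in these identifiers, so leave them unescaped.
    return RESOLVERS[scheme] + quote(identifier, safe="/:")


def machine_request(scheme: str, identifier: str,
                    accept: str = "application/json") -> Request:
    """Build (but do not send) an HTTP request for a machine-readable response."""
    return Request(actionable_url(scheme, identifier), headers={"Accept": accept})


print(actionable_url("DOI", "10.1234/abc.567"))  # https://doi.org/10.1234/abc.567
req = machine_request("DOI", "10.1234/abc.567",
                      accept="application/vnd.citationstyles.csl+json")
print(req.full_url, req.get_header("Accept"))
```

Sending the request with `urllib.request.urlopen(req)` would perform the network call; it is omitted here so the sketch runs offline.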
An effective and durable PID scheme requires ongoing maintenance and therefore ongoing resources. Although some tasks can be automated, responsibility for maintenance must be assigned to an agency, program, or office, or to a trusted third party that can guarantee reliability and near-constant uptime to meet the needs of various end-user communities. For integrating a persistent identifier scheme into the ABS process, a trusted third party with the appropriate expertise and resources is probably the best option, especially if that third party already performs such work for other purposes.
The selection of an appropriate PID for the CBD ABS and related activities will be critical to the system's broad utility and community acceptance. However, choosing a scheme does not remove the need to define precisely what the identifiers refer to and what will be returned by queries of various types. A range of PID services could be developed, for instance to provide direct links to digital and paper copies of entire documents such as PICs, MTAs, CoOs, and other relevant agreements, or to track genetic resources, or parts of genetic resources, in a future-proof manner or on the fly. It could also be possible to track the transfer of materials and the corresponding agreements to third parties in a manner consistent with the rights and obligations of all parties to the initial agreement or to subsequent agreements. Similarly, it is technically feasible to track these genetic resources into the scientific, technical, and medical (STM), general-interest, and patent literature.
Services such as these could be facilitated by a trusted third party acting as a clearinghouse that registers ABS-related events (e.g., PICs, MTAs, CoOs, and other relevant agreements) according to a set of well-understood business rules. With such a clearinghouse in place, it becomes possible to traverse a series of transactions backward and forward in time, even where some ambiguity exists. By drawing on highly interconnected information, events can be followed and accurately reconstructed when adequate documentation is available. Such a system would be useful for monitoring the use of genetic resources, especially since long periods may elapse between the execution of PICs, MTAs, and CoOs and the emergence of a commercial or non-commercial product. With an appropriate PID system, a clearinghouse of this design could support both human and machine queries and facilitate the retrieval of all relevant documents from public and private databases, including the STM literature and patent and regulatory databases. This is discussed in more depth in Part IV (CBD/ABS services) [2].
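The backward and forward traversal described above amounts to walking a graph in which each registered event points to the events it relies on. The sketch below assumes a hypothetical `Clearinghouse` class, an `abs:` identifier prefix, made-up dates, and two simple business rules (PIDs are never reused, and an event may cite only events already registered); a real clearinghouse would need richer rules, access control, and persistent storage.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    pid: str   # identifier assigned by the (hypothetical) clearinghouse
    kind: str  # e.g. "PIC", "MTA", "CoO", "patent"
    date: str  # ISO 8601 date (YYYY-MM-DD), so string order matches time order


class Clearinghouse:
    """Hypothetical registry linking ABS-related events through their PIDs."""

    def __init__(self) -> None:
        self.events: dict[str, Event] = {}
        self.parents: dict[str, set[str]] = defaultdict(set)   # backward links
        self.children: dict[str, set[str]] = defaultdict(set)  # forward links

    def register(self, event: Event, based_on: tuple[str, ...] = ()) -> None:
        # Assumed business rules: PIDs are never reused, and an event may
        # only cite events that are already registered.
        if event.pid in self.events:
            raise ValueError(f"PID already registered: {event.pid}")
        for parent in based_on:
            if parent not in self.events:
                raise KeyError(f"unknown PID: {parent}")
        self.events[event.pid] = event
        for parent in based_on:
            self.parents[event.pid].add(parent)
            self.children[parent].add(event.pid)

    def _walk(self, start: str, links: dict[str, set[str]]) -> list[Event]:
        # Breadth-first search along one direction of the links.
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in links[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return sorted((self.events[p] for p in seen), key=lambda e: e.date)

    def history(self, pid: str) -> list[Event]:
        """Earlier events this event depends on (backward in time)."""
        return self._walk(pid, self.parents)

    def downstream(self, pid: str) -> list[Event]:
        """Later events that depend on this event (forward in time)."""
        return self._walk(pid, self.children)


ch = Clearinghouse()
ch.register(Event("abs:pic-001", "PIC", "2015-03-01"))
ch.register(Event("abs:mta-001", "MTA", "2015-06-15"), based_on=("abs:pic-001",))
# Onward transfer of the material to a third party under a new MTA.
ch.register(Event("abs:mta-002", "MTA", "2018-01-10"), based_on=("abs:mta-001",))
ch.register(Event("abs:pat-001", "patent", "2024-09-30"), based_on=("abs:mta-002",))

print([e.pid for e in ch.history("abs:pat-001")])
# ['abs:pic-001', 'abs:mta-001', 'abs:mta-002']
print([e.pid for e in ch.downstream("abs:pic-001")])
# ['abs:mta-001', 'abs:mta-002', 'abs:pat-001']
```

Because every link is recorded in both directions, a patent registered years after the original agreement can still be traced back to the PIC, and the PIC can be traced forward to every later agreement and product that cites it.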