---
title: ARK Resolver Infrastructure
subtitle: Draft ARK identifier resolution service specifications
tags: identifier, ARK, resolve, PID
author: Dave Vieglais
date: 2023
---
# ARK Resolver Infrastructure
Draft ARK identifier resolution service specifications.
## Identifier Resolution Background
The practice of resolving an identifier to a resource location generally involves progressive refinement of location detail gathered through sequential inquiry to a series of information sources. When resolution is performed over HTTP, the progressive refinement occurs through a series of HTTP redirects until the resource is located or the resolution is rejected as invalid.
The different sources of information about identifiers can be broadly grouped into three levels: *Scheme Resolver*, *Group Resolver*, and *Resource Resolver*.
1. The *Scheme Resolver* directs requests to services registered for a particular scheme (e.g. `ark:` or `doi:`). For example, a *Scheme Resolver* has knowledge that identifiers using the "doi" scheme may be resolved by the service located at "https://doi.org/". Similarly, it knows that ARK identifiers may be resolved at "https://n2t.net/". Hence, a *Scheme Resolver* redirects an identifier resolve request to a known *Group Resolver*.
2. The *Group Resolver* supports grouping of identifiers with particular services. The *Group Resolver* typically examines a portion of the identifier such as a prefix to determine which services can provide more detail about the identifier. For example, ARK identifiers have a numeric prefix, the "NAAN" (Name Assigning Authority Number). Each NAAN is associated with a service responsible for managing identifiers created with that prefix. Hence, a *Group Resolver* redirects an identifier resolve request to a known *Resource Resolver*. *Note:* This could also be called a "Prefix Resolver", however it is common practice to chain together a couple of resolvers at this level (e.g. NAAN -> shoulder) and calling them both "Prefix Resolvers" gets a little confusing. Hence the label "Group Resolver", which basically means the resolver handles a group of identifiers, albeit determined by the cumulative prefix.
3. A *Resource Resolver* associates an identifier with the location of a specific resource, which will generally be a Resource Server where the actual resource is stored and made available. Hence, a *Resource Resolver* service takes an identifier as input, and returns the Resource Server location information to the client.
```plantuml
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
title Context for a generalized hierarchical identifier resolution system
Person(user, "User", "A user looking to resolve an identifier")
Person(admin, "Administrators", "Infrastructure administrators maintain registries at each resolver service")
System(SR, "Scheme Resolver", "Directs resolution requests to a registered Group Resolver")
System(GR, "Group Resolver", "Directs resolution requests to a registered Resource Resolver")
System_Boundary(resources, "Resource Application") {
System(RR, "Resource Resolver", "Directs resolution requests to a resource or another Resource Resolver.")
System(RS, "Resource Server", "Returns a resource given an identifier.")
}
Rel(SR, GR, "Redirect")
Rel(GR, RR, "Redirect")
Rel(RR, RS, "Redirect")
Rel(user, SR, "Resolve")
Rel(user, GR, "[Resolve]")
Rel(user, RR, "[Resolve]")
Rel(user, RS, "Retrieve")
Rel(admin, SR, "Manages")
Rel(admin, GR, "Manages")
Rel(admin, resources, "Manages")
@enduml
```
> **Figure 1.** Components of a hierarchical identifier resolution system. Each level of resolver service has precise knowledge of a limited extent of resources. For example, a *Scheme Resolver* understands where identifiers of different schemes may be resolved, but cannot resolve an individual identifier to a resource. *Group Resolvers* have knowledge of where different groups of identifiers may be resolved to resources. For example, a Group Resolver would direct an ARK resolution request to the location associated with a registered NAAN. The *Resource Resolver* has precise knowledge of the location of resources associated with a limited range of identifiers such as for a specific NAAN. The Resource Resolver is often closely associated with a Resource Server, which returns a resource instance given an identifier.
The three levels of resolution described here are a broad generalization. Fewer or more levels may actually be involved in the resolution of any particular identifier. However, three levels have emerged as a somewhat typical pattern for scalable resolution of globally unique identifiers and provide a useful framework for discussion.
An example of sequential location refinement for resolution of the ARK identifier `ark:/12148/bpt6k10733944` is provided in Figure 2.
```plantuml
Actor User as user
Participant "Scheme\nResolver" as sr
Participant "Group\nResolver" as gr
Participant "Resource\nResolver" as rr
Participant "Resource\nServer" as rs
user -> sr: identifiers.org/**ark:**/12148/bpt6k10733944
activate user
activate sr
sr --> user: 302 "https://n2t.net/ark:/12148/bpt6k10733944"
deactivate sr
user -> gr: n2t.net/**ark:/12148**/bpt6k10733944
activate gr
gr --> user: 302 "https://ark.bnf.fr/ark:/12148/bpt6k10733944"
deactivate gr
user -> rr: ark.bnf.fr/**ark:/12148/bpt6k10733944**
activate rr
rr --> user: 302 "https://gallica.bnf.fr/ark:/12148/bpt6k10733944"
deactivate rr
user -> rs: gallica.bnf.fr/**ark:/12148/bpt6k10733944**
activate rs
rs --> user: 200 "Aventures d'Alice au pays des merveilles / par Lewis Carroll ; illustrées par Arthur Rackham"
deactivate rs
deactivate user
```
> **Figure 2.** Generalization of identifier resolution through three levels of resolver service. The *Scheme Resolver* `identifiers.org` understands "ark:" identifiers are resolved by the *Group Resolver* `n2t.net`, which in turn understands identifiers with the prefix "12148" are resolved by the *Resource Resolver* `ark.bnf.fr`. The *Resource Resolver* in turn directs the user to the actual resource.
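The redirect chain in Figure 2 can be sketched without network access as a simple lookup table. This is a minimal illustration that assumes the hop sequence shown in the figure; the `REDIRECTS` table and the `follow` function are hypothetical names, and the live services may respond differently.

```python
# Hypothetical redirect table mirroring Figure 2: URL -> (status, redirect target).
REDIRECTS = {
    "https://identifiers.org/ark:/12148/bpt6k10733944": (302, "https://n2t.net/ark:/12148/bpt6k10733944"),
    "https://n2t.net/ark:/12148/bpt6k10733944": (302, "https://ark.bnf.fr/ark:/12148/bpt6k10733944"),
    "https://ark.bnf.fr/ark:/12148/bpt6k10733944": (302, "https://gallica.bnf.fr/ark:/12148/bpt6k10733944"),
    "https://gallica.bnf.fr/ark:/12148/bpt6k10733944": (200, None),
}


def follow(url, table=REDIRECTS, max_hops=10):
    """Follow redirects until a non-redirect status; return (status, visited URLs)."""
    visited = [url]
    for _ in range(max_hops):
        # Unknown URLs are treated as "Not Found".
        status, target = table.get(url, (404, None))
        if status not in (301, 302, 303, 307, 308):
            return status, visited
        url = target
        visited.append(url)
    raise RuntimeError("too many redirects")


status, visited = follow("https://identifiers.org/ark:/12148/bpt6k10733944")
```

Each entry in `visited` corresponds to one resolver level, ending at the Resource Server.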
### Multiple Instances
Note that there may be multiple instances of Scheme, Group, and Resource Resolvers. Where content between instances overlaps, it is critical that mechanisms are in place to ensure trustworthy consistency across instances.
Inconsistent identifier detail between resolver instances could make resolution inconsistent, reducing the value of globally unique, resolvable identifiers. For ARK identifiers, the NAAN registry provides the source of truth for prefix resolution. Hence, multiple ARK Group Resolvers may be deployed if each uses the NAAN registry as the authoritative source associating prefixes with specific Resource Resolvers. Updates to the NAAN registry would need to be propagated to Group Resolvers within a reasonable time to ensure consistent behavior of all ARK Group Resolvers.
### Trusting Resolution Services
Service trustworthiness is important to avoid potential content spoofing or malicious action where an agent deliberately causes resolution to content different from what the content owner intended (e.g. an identifier pointing to a factual news article is redirected to fake news by a malicious agent).
Since the NAAN registry is the source of truth for ARK Group Resolvers, a client could verify the integrity of a group resolution response through mechanisms such as registry revision comparison (does the Group Resolver report the same version as the NAAN registry) or by comparing results from multiple Group Resolvers. Other mechanisms such as checksum or signature comparisons could also be employed.
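Two of the checks mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the registry is modeled as a plain NAAN-to-target mapping, the digest is SHA-256 over a canonical JSON serialization, and the function names are hypothetical rather than part of any existing ARK service.

```python
import hashlib
import json


def registry_digest(registry):
    """Checksum of a NAAN -> target mapping, independent of key order."""
    canonical = json.dumps(registry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consistent_target(responses):
    """Given {resolver name: redirect target}, return the target only if all agree."""
    targets = set(responses.values())
    return targets.pop() if len(targets) == 1 else None


# Assumed data: the authoritative registry and a Group Resolver's local copy.
authoritative = {"12148": "https://ark.bnf.fr"}
local_copy = {"12148": "https://ark.bnf.fr"}
in_sync = registry_digest(authoritative) == registry_digest(local_copy)
```

A client could treat a digest mismatch, or a `None` result from `consistent_target`, as a signal to fall back to the authoritative registry.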
### Identifier as an HTTP URL
When expressing an identifier as a URL, the identifier is being directly coupled to a service, and so consideration should be given to the availability of the service and efficiency of use within the applicable context. In general, services at the finer level of detail in the resolver chain will be more ephemeral since server names and URL patterns may change as services evolve, and so URLs associated directly with those services may be considered somewhat more fragile in the long term. Hence, there is some benefit to constructing URLs using services at higher levels in the resolution chain.
There is no universal, globally accepted identifier scheme resolver service, though such capability is offered by `identifiers.org` and `n2t.net`. In the example workflow presented in Figure 2 above, any of the following URLs ultimately resolve to the same resource:
1. https://identifiers.org/ark:/12148/bpt6k10733944
2. https://n2t.net/ark:/12148/bpt6k10733944
3. https://ark.bnf.fr/ark:/12148/bpt6k10733944
4. https://gallica.bnf.fr/ark:/12148/bpt6k10733944
URL 4 is most closely associated with the resource and is more efficient for the user (no redirects), though it may be more ephemeral and impacted by changes to the collection infrastructure. URL 4 could be considered appropriate for use within the collection viewer application where any future changes to the URL may be managed by the application logic.
At the other extreme, URL 1 might be considered good for long term reference to the resource. Retrieval of the resource is less efficient since there are more redirects, but is arguably more resilient to change in underlying services since the resolver metadata may be updated as necessary by the resource managers, thus removing the need to change published URLs. For example, if for some reason it was necessary for the `ark.bnf.fr` service to change names, then all URLs like `https://ark.bnf.fr/ark:/12148/bpt6k10733944` would need to be changed. If, however, the URL `https://identifiers.org/ark:/12148/bpt6k10733944` or `https://n2t.net/ark:/12148/bpt6k10733944` were used to reference the resource, then all that is needed is for the maintainers of `n2t.net` to update the registered target for the NAAN `12148`. All existing URLs at the higher levels would continue to resolve to the expected resource.
### Content Negotiation
Resources are often available in different formats (e.g. an image may be serialized as JPEG and PNG) and may also have different representations in the same format (e.g. a JPEG serialized image may be available in multiple sizes). Content negotiation support is most likely necessary at the Resource Resolver or Resource Server level and is dependent on the type of content being served and the capabilities of the resource collection management services.
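As a rough illustration of media type negotiation, the sketch below picks a representation from an HTTP `Accept` header. It is deliberately simplified: it honors `q` values and wildcards but ignores the specificity precedence rules of full HTTP content negotiation, and the function names are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose(header, available):
    """Return the first available media type acceptable to the client, else None."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        for candidate in available:
            if media_type in ("*/*", candidate) or (
                media_type.endswith("/*") and candidate.startswith(media_type[:-1])
            ):
                return candidate
    return None
```

A Resource Server offering JPEG and PNG could call `choose(accept_header, ["image/jpeg", "image/png"])` and respond with "Not Acceptable" when the result is `None`.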
## ARK Identifier Resolution Infrastructure
The ARK NAAN registry provides the source of truth associating an ARK group (NAAN) with Resource Resolvers.
### ARK Group Resolver
The responsibilities of an ARK Group Resolver include:
1. Redirect an ARK resolution request to a registered Resource Resolver.
2. Present information about a Resource Resolver when requested.
3. Ensure group registration information is kept in sync with the master NAAN registry.
4. Perform service operations reliably.
5. Perform service operations efficiently with minimal latency.
6. Ensure operational information in Scheme Resolvers is kept up to date.
7. Return a "Not Found" error code if the requested group does not exist.
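The redirect and "Not Found" responsibilities listed above might look like the following sketch. It assumes the NAAN registry has been loaded into a dictionary; the regular expression is a loose approximation rather than the ARK grammar, and the "400" response for non-ARK input is an assumption not stated in this draft.

```python
import re

# Loose pattern (an assumption): accepts "ark:/12148/..." and "ark:12148/...".
ARK_PATTERN = re.compile(r"^ark:/?(?P<naan>[0-9a-z]+)(?P<rest>/.*)?$", re.IGNORECASE)


def group_resolve(identifier, registry):
    """Return (status, headers) for a Group Resolver request."""
    match = ARK_PATTERN.match(identifier)
    if match is None:
        return 400, {}  # not recognizable as an ARK (assumed behavior)
    target = registry.get(match.group("naan"))
    if target is None:
        return 404, {}  # the requested group (NAAN) is not registered
    # Pass the full identifier on to the registered Resource Resolver.
    return 302, {"Location": f"{target.rstrip('/')}/{identifier}"}


# Assumed registry content, taken from the Figure 2 example.
registry = {"12148": "https://ark.bnf.fr"}
response = group_resolve("ark:/12148/bpt6k10733944", registry)
```

Keeping the registry as an injected mapping makes it straightforward to swap in a freshly synchronized copy of the NAAN registry.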
```plantuml
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
title ARK Group Resolver Context
Person(user, "User", "A user looking to resolve an identifier")
Person(admin, "Administrator", "Group Resolver administrator maintains the service.")
System_Ext(SR, "Scheme Resolver", "Directs resolution requests to a registered Group Resolver")
System(NAAN, "NAAN Registry", "Source of truth associating ARK NAANs (prefixes) with Resource Resolvers")
System(PR, "Group (NAAN) Resolver", "Directs resolution requests to a registered Resource Resolver")
System_Ext(RR, "Resource Resolver", "Directs resolution requests to a resource or another Resource Resolver.")
Rel(NAAN, PR, "Informs")
Rel(SR, PR, "Redirect")
Rel(PR, RR, "Redirect")
Rel(user, PR, "Resolve")
Rel(admin, PR, "Manages")
@enduml
```
It is beneficial to have multiple ARK Group Resolvers deployed, and these may be registered with Scheme Resolvers to facilitate discovery. However, there is benefit to users if ARK identifiers can be consistently and conveniently represented in URL form, ideally using a well known and easily recognized domain name such as `arks.org`.
Multiple Group Resolver instances may share the DNS entry for `arks.org`, and clients may be directed to different instances using strategies such as random selection, round-robin, geolocation, other client properties, or a combination thereof.
The cohort of Group Resolvers participating under a single domain name would clearly require some trust and coordination of operation.
### ARK Resource Resolver
ARK Resource Resolvers handle the association of an identifier with the location of a resource for one or more ARK NAANs.
The responsibilities of an ARK Resource Resolver include:
1. Redirect an ARK resolution request to the location for retrieval of the identified resource or to another Resource Resolver.
2. Present information about a resource when requested.
3. Ensure resource location information is current.
4. Perform service operations reliably.
5. Perform service operations efficiently with minimal latency.
6. Ensure operational information in the NAAN registry is kept up to date.
7. Return a "Not Found" error code if a requested resource does not exist.
8. May support content negotiation by media type and profile if appropriate for the resource, and if so, should present hints of available representations through `Link` headers in resolution responses.
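The hint mechanism in item 8 could be sketched as below, using the general `Link` header syntax with `rel="alternate"` and a `type` attribute. The `profile` attribute and the `link_header` helper are assumptions for illustration; this draft does not specify the exact relation types or attributes a Resource Resolver should use.

```python
def link_header(url, representations):
    """Build a Link header value advertising alternate representations.

    representations is a list of (media_type, profile) pairs; profile may be None.
    """
    links = []
    for media_type, profile in representations:
        link = f'<{url}>; rel="alternate"; type="{media_type}"'
        if profile:
            # Assumed attribute for advertising a profile URI.
            link += f'; profile="{profile}"'
        links.append(link)
    return ", ".join(links)


# Hypothetical identifier and representations, for illustration only.
value = link_header(
    "https://example.org/ark:/12345/x",
    [("image/jpeg", None), ("application/ld+json", "https://schema.org")],
)
```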
```plantuml
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
title ARK Resource Resolver Context
Person(user, "User", "A user looking to resolve an identifier")
Person(admin, "Administrator", "Resource Resolver administrator maintains the service.")
System_Ext(SR, "Scheme Resolver", "Directs resolution requests to a registered Group Resolver")
System_Ext(NAAN, "NAAN Registry", "Source of truth associating ARK NAANs (groups) with Resource Resolvers")
System_Ext(PR, "Group (NAAN) Resolver", "Directs resolution requests to a registered Resource Resolver")
System(RR, "Resource Resolver", "Directs resolution requests to a resource or another Resource Resolver.")
System(RS, "Resource Server", "Returns a resource given an identifier.")
Rel(NAAN, PR, "Informs")
Rel(SR, PR, "Redirect")
Rel(PR, RR, "Redirect")
Rel(RR, RS, "Redirect")
Rel(user, SR, "Resolve")
Rel(user, RS, "Retrieve")
Rel(admin, RR, "Manages")
Rel(admin, NAAN, "Informs")
@enduml
```
A Resource Resolver may also choose to operate as a Group Resolver, and if doing so will need to adopt the corresponding responsibilities.
## Resolver Service API
### capabilities
Reports the capabilities of the service. The response is an OpenAPI description.
e.g. https://arks.org/api
### get
Access an identified resource. A Resource Server will return the resource. A Scheme, Group, or Resource Resolver will return a redirect to the known location of the resource.
e.g. https://arks.org/ark:/12148/bpt6k10733944
### get_metadata
Access metadata about a resource. A Resource Server will return metadata about an identified resource. A Scheme, Group, or Resource Resolver will return a redirect to the known location of the resource metadata.
e.g. Implementation specific to a Resource Server.
### get_pid_metadata
Access metadata about the identifier. A Scheme, Group, or Resource Resolver will return a document describing the service's knowledge about the identifier. The level of detail available at different types of resolver will likely differ. For example, a Scheme Resolver may return only metadata about the scheme of an identifier.
e.g.:
- https://arks.org/ark:/12148/bpt6k10733944??
- https://arks.org/ark:/12148/bpt6k10733944?info
- https://arks.org/.info/ark:/12148/bpt6k10733944
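A resolver needs to tell these three request forms apart from a plain `get`. The sketch below shows one way to do so with the standard library; the `classify` function is hypothetical, and it does not distinguish the `list` forms that share the `/.info/` prefix.

```python
from urllib.parse import urlsplit


def classify(url):
    """Return (operation, identifier) where operation is "get" or "get_pid_metadata"."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    if path.startswith(".info/"):
        # Path prefix form: /.info/<identifier>
        return "get_pid_metadata", path[len(".info/"):]
    if parts.query in ("?", "info"):
        # Suffix forms: "<identifier>??" yields query "?", "<identifier>?info" yields "info".
        return "get_pid_metadata", path
    return "get", path
```

For example, `classify("https://arks.org/ark:/12148/bpt6k10733944?info")` yields the `get_pid_metadata` operation with the bare identifier.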
### list
List the resources known by the service. A Scheme Resolver should return a list of schemes. A Resource Resolver should return a list of resources known to the service. In some cases, the list may be very large so subsetting mechanisms may be required (e.g. paging, grouping by some resource or identifier property).
e.g.:
- https://arks.org/.info/
- https://arks.org/.info/ark:
- https://arks.org/.info/ark:12148
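Subsetting of a large listing could be handled with a prefix filter plus offset-based paging, as in this sketch. The parameter names (`prefix`, `offset`, `limit`) and the response shape are assumptions for illustration and are not defined by this draft.

```python
def list_page(identifiers, prefix="", offset=0, limit=100):
    """Return one page of identifiers starting with prefix, in sorted order."""
    matching = sorted(i for i in identifiers if i.startswith(prefix))
    page = matching[offset:offset + limit]
    # next_offset is None once the final page has been returned.
    next_offset = offset + limit if offset + limit < len(matching) else None
    return {"total": len(matching), "items": page, "next_offset": next_offset}


# Hypothetical identifiers known to a resolver.
known = ["ark:/12148/a", "ark:/12148/b", "ark:/99999/c"]
first = list_page(known, prefix="ark:/12148", limit=1)
```

A client would repeat the request with `offset` set to the returned `next_offset` until it is `None`.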