---
title: EZID Resolver Design
subtitle: ARK and DOI identifier resolution for the EZID service
---

# EZID Resolver Design

Identifier resolution is the process of determining the location of an identified resource. For the EZID application, the resolution service is provided for ARK identifiers and is accessed using the HTTP protocol, so both the ARK and HTTP specifications apply.

```
                       PREFIX
                SCHEME    |           SUFFIX
                   /--\ /---\ /--------------------\
http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff
\________________/ \__/ \___/ \______/ \____________/
  (replaceable)     |     |      |       Qualifier
       |       ARK Label  |      |    (NMA-supported)
       |                  |      |
 Name Mapping Authority   |    Name (NAA-assigned)
  Hostport (NMAH)         |
       Name Assigning Authority Number (NAAN)
```

## Operations

There are two basic operations to be supported:

1. Resolution: redirection to the location registered with the identifier.
2. Introspection: providing information about the identifier. Called "inflection" in the ARK spec.

Operation requirements include:

* Respond to HTTP GET and HEAD requests.
* Respect HTTP protocol semantics.

### Resolution

The basic process of resolution is straightforward.
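
Both operations begin by splitting the identifier into the parts named in the anatomy diagram at the top of this document. The following is a minimal sketch of that split, not the EZID implementation: the names `ArkParts` and `split_ark` and the regular expression are assumptions made for illustration, and no normalization or DOI handling is attempted.

```python
import re
from typing import NamedTuple, Optional


class ArkParts(NamedTuple):
    nmah: Optional[str]  # Name Mapping Authority Hostport, e.g. "example.org"
    naan: str            # Name Assigning Authority Number (the PREFIX)
    name: str            # NAA-assigned Name, may be empty
    qualifier: str       # NMA-supported Qualifier, leading delimiter kept


# Hypothetical pattern: optional http(s)://host/, the "ark:" label with an
# optional "/", the NAAN, then a Name running up to the first "/" or ".".
_ARK_RE = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?"
    r"ark:/?(?P<naan>[^/]+)"
    r"(?:/(?P<name>[^/.]*)(?P<qualifier>.*))?$",
    re.IGNORECASE,
)


def split_ark(value: str) -> ArkParts:
    """Split an ARK, with or without a resolver host, into its parts."""
    m = _ARK_RE.match(value.strip())
    if m is None:
        raise ValueError(f"not an ARK: {value!r}")
    return ArkParts(m["nmah"], m["naan"], m["name"] or "", m["qualifier"] or "")


parts = split_ark("http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff")
print(parts)
```

The sketch also accepts the compact form used in the diagrams in this section (`ark:123/a`) and a bare NAAN (`ark:/65665`), for which the name and qualifier are empty strings.
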
Current situation:

```plantuml
actor User as U
actor Owner as C
participant EZID as E
participant N2T as N
participant Target as T
== minting ==
C -> E: mint ark:123/a to target/foo
activate E
note right of E
pid = ark:123/a
url = https://n2t.net/ark:123/a
loc = https://target/foo
end note
E --> C: ok
E -> N: create ark:123/a to target/foo
N --> E: ok
deactivate E
== resolution ==
U -> N: ark:123/a
activate N
N --> U: 302, target/foo
deactivate N
U -> T: /foo
```

After transition of resolution to EZID:

```plantuml
actor User as U
actor Owner as C
participant EZID as E
participant Target as T
== minting ==
C -> E: mint ark:123/a to target/foo
activate E
note right of E
pid = ark:123/a
url = https://ezid/ark:123/a
loc = https://target/foo
end note
E --> C: ok
deactivate E
== resolution ==
U -> E: ark:123/a
activate E
E --> U: 302, target/foo
deactivate E
U -> T: /foo
```

A possible workflow for identifier resolution in EZID:

```plantuml
start
->IDENTIFIER;
:split identifier;
->t=SCHEME\np=PREFIX\ns=SUFFIX;
if (scheme is ARK?) then (no)
  :response = {
  status: 302,
  location: doi.org/10.PREFIX/SUFFIX
  message: "doi.org"
  };
else (yes)
  if (EZID prefix?) then (yes)
    :find longest suffix match;
    if (match?) then (yes)
      :response = {
      status: 302,
      location: URL
      message: found
      };
    else (no)
      :response = {
      status: 404,
      message: "Not Found"
      };
    endif
  else (no)
    :response = {
    status: 404,
    message: "Not Found"
    };
  endif
endif
:return response;
stop
```

Questions:

1. Under what conditions should an existing identifier not be resolvable?
    * Withdrawn?
    * Deleted?
    * Target known to be unavailable?
    * Privacy?
    * Reserved?

    Generally no change in policy, but edge cases need to be handled correctly.
2. If the prefix is not registered with EZID, should the response be 404 or a redirect to another service (N2T)?
    * Perhaps redirect to N2T, but need to ensure redirect loops don't happen.
    * Better: provide information in the 404 response pointing to N2T as the place to go.
4.
    What endpoint should be used for resolution?
    * **Suggest `ezid.cdlib.org/{PID}`**
    * ~~ark.ezid.cdlib.org/ark:/123/lkdfjg~~
6. OK to present the alternate link to metadata for the identifier (i.e. the inflection URL) in the response? This would give the client a hint that details about the identifier can be obtained through the inflection URL. It would be in a response header like:

    ```
    Link: <https://ezid.net/info/ark:/12345/xyz>; rel="alternate"; type="application/ld+json"; profile="https://w3id.org/ark/metadata"
    ```
7. Consider supporting shoulder listing, e.g. https://n2t-stg.n2t.net/ark:/99999
8. Verify suffix passthrough is working as expected (e.g. Smithsonian).
9. Existing ARKs will be updated to resolve to the EZID resolver location instead of the target.

### Inflection

The ARK spec indicates that an inflection request can be made by including one or more question mark ("?") characters at the end of a request URL.

The basic workflow for inflection is much the same as for resolution. The main difference is the final action taken once the corresponding identifier record has been located.

```plantuml
start
->IDENTIFIER;
:split identifier;
->t=SCHEME\np=PREFIX\ns=SUFFIX;
if (EZID prefix?) then (yes)
  :find longest shoulder match;
  if (match?) then (yes)
    :Get record;
    :response = {
    status: 200,
    ID META
    };
  else (no)
    -> no;
    :Get prefix;
    :response = {
    status: 200,
    PREFIX META
    };
  endif
else (no)
  -> no;
  :response = {
  status: 404,
  message: "Not Found"
  };
endif
:return response;
stop
```

Questions:

1. Can the "?" char be reliably passed through the load balancer, Apache, Django stack?
    * No, only a double "??" can pass through, appearing as a single "?" in the query parameters.
    * May be possible to intercept at the load balancer and set a custom header forwarded to Apache / Django.
3. What metadata should be presented in the response?
    * Privacy? Only reserved identifiers have no response.
    * Different metadata for authenticated user? Owner? Nope.
5.
    Can we use "profile" recognition to implement inflection at the resolve endpoint?
    * Seems sensible to support, especially since DataCite (DOIs in general) fall short in support for content negotiation.
8. Content negotiation should support ANVL and probably JSON?
9. What is the media type of ANVL? (check with Kunze)

## Deployment

1. Implement and deploy resolution and inflection functionality for EZID. New PIDs still have `asURL` set to N2T.
2. Notify users of the impending change.
3. New PIDs start using EZID for `asURL`.
4. Update the target URLs for EZID ARKs on N2T to point to the EZID `asURL`.
5. No further updates from EZID to N2T.

## Notes on conflicts with HTTP and URI specification

These notes are general to ARKs, not specific to EZID.

> The Name and Qualifier parts are strings of visible ASCII characters and should be less than 128 bytes in length. The length restriction keeps the ARK short enough to append ordinary ARK request strings without running into transport restrictions (e.g., within HTTP GET requests). Characters may be letters, digits, or any of these six characters:

```
= # * + @ _ $
```

> The following characters may also be used, but their meanings are reserved:

```
% - . /
```

### Inflection

The ARK "inflection" is meant to "change the meaning" of the identifier, to reference the metadata associated with the ARK instead of the object identified by the ARK. The specification uses a "?" to do this. Also:

> "When the ARK is inflected by appending dual question marks ('??'), the returned metadata contains a commitment statement from the current provider."

The question mark is a reserved character in the URI specification[^uri], which creates a behavior conflict when requesting ARK inflection from an ARK resolver over HTTP by embedding the ARK in the URL. Because these "?" characters are not meant to be interpreted as URL query delimiters as defined by RFC 3986, they should be escaped as `%3F`.
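
The escaping described above can be sketched on the client side as follows. This is an illustration only: `inflection_url` is a hypothetical helper, not part of EZID or N2T, and it relies solely on `urllib.parse` from the Python standard library. The resolver base URL and ARK are taken from the N2T examples in this document.

```python
from urllib.parse import quote, urlsplit


def inflection_url(resolver: str, ark: str, marks: int = 1) -> str:
    """Build a resolver URL whose trailing '?' inflection is percent-encoded.

    Hypothetical helper for illustration; `marks` is 1 for '?' or 2 for '??'.
    """
    # Keep ':' and '/' literal; quote() turns '?' into %3F (and '#' into %23).
    encoded = quote(ark + "?" * marks, safe=":/")
    return resolver.rstrip("/") + "/" + encoded


url = inflection_url("https://n2t.net", "ark:/86084/b4057cw7z", marks=2)
print(url)  # https://n2t.net/ark:/86084/b4057cw7z%3F%3F

# The encoded inflection stays in the path, so the query component is empty.
print(urlsplit(url).query == "")  # True

# Left unescaped, the same characters are parsed as a query delimiter.
print(urlsplit("https://n2t.net/ark:/86084/b4057cw7z??").query)  # ?
```

The last line shows the conflict in practice: a standard URL parser treats the first "?" as the start of the query, so the ARK path reaching the resolver no longer carries the inflection.
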
Here are some examples of inflection with N2T:

`https://n2t.net/ark:/86084/b4057cw7z%3F`:

```
erc:
who: Tevel Gitlin. Award booklet, 1946
what: IS030_GITL_003
when: (:unav)
where: ark:/86084/b4057cw7z (currently https://blavatnikarchive.org/item/2964)
how: (:unav)
# inflections under construction
# reference https://n2t.net/e/n2t_apidoc.html
```

`https://n2t.net/ark:/86084/b4057cw7z%3F%3F`:

```
erc:
who: Tevel Gitlin. Award booklet, 1946
what: IS030_GITL_003
when: (:unav)
where: ark:/86084/b4057cw7z (currently https://blavatnikarchive.org/item/2964)
how: (:unav)
id created: 2021.08.02_09:31:42
id updated: 2021.08.02_09:31:33
persistence: (:unav)
# inflections under construction
# reference https://n2t.net/e/n2t_apidoc.html
```

`https://n2t.net/doi:10.21239/V9F61N?`:

```
erc:
who: (:unav)
what: (:unav)
when: (:unav)
where: doi:10.21239/V9F61N (currently https://siam.invemar.org.co/download-alfresco-file/241512)
how: Text
# inflections under construction
# reference https://n2t.net/e/n2t_apidoc.html
```

`https://n2t.net/doi:10.21239/V9F61N??`:

```
erc:
who: (:unav)
what: (:unav)
when: (:unav)
where: doi:10.21239/V9F61N (currently https://siam.invemar.org.co/download-alfresco-file/241512)
how: Text
datacite: <?xml version="1.0"?> <resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd"><identifier identifierType="DOI">10.21239/V9F61N</identifier><creators><creator><creatorName>Elías Alberto Blanco Mota</creatorName></creator></creators><titles><title xml:lang="spa">Informe técnico de levantamiento batimétrico Bahía de Buenaventura – Pacifico Colombiano</title></titles><publisher>INVEMAR</publisher><publicationYear>2015</publicationYear><resourceType resourceTypeGeneral="Text">Text</resourceType><subjects><subject>batimetria</subject></subjects><dates><date dateType="Created">2015</date></dates><language>spa</language><sizes><size>3 MB</size></sizes><formats><format>PDF</format></formats><version>1</version><rightsList><rights>CC BY 4.0</rights></rightsList><descriptions><description descriptionType="Abstract" xml:lang="spa">Levantamiento batimétrico de precisión (líneas cada 50m) de La Bahía de Buenaventura sector Cascajal y esteros aledaños, en el departamento del Valle del Cauca.&#13; La información fue tomada en el campo, en el periodo comprendido entre el 05 y 13 de mayo de 2015, incluyendo ademas perfiles de velocidad del sonido y sus ubicaciones geográficas en el área de estudio, así mismo la ubicación de ayudas a la navegación flotantes en la zona. abarcando un total 2509 Ha aproximadamente, desde una profundidad mínima promedio de -2.75 msnm hasta una máxima promedio de 19 msnm.</description></descriptions><geoLocations><geoLocation><geoLocationPlace>Buenaventura, Valle, Colombia</geoLocationPlace></geoLocation></geoLocations></resource>
datacite.resourcetype: Text
id created: 2018.03.07_13:12:02
id updated: 2021.05.10_12:12:36
persistence: (:unav)
# inflections under construction
# reference https://n2t.net/e/n2t_apidoc.html
```

`https://n2t.net/ark:/53355/cl010066723?`:

```
-> https://collections.louvre.fr/ark:/53355/cl010066723
```

`https://n2t.net/ark:/53355/cl010066723%3F%3F`:

```
https://collections.louvre.fr/ark:/53355/cl010066723
```

`http://n2t.net/ark:/65665??`:

```
ark:/65665:
  date: 2014.08.18
  manager: n2t
  na_policy: NP | (:unkn) unknown | 2014 |
  name: The Smithsonian Institution (=) TSI
  redirect: http://collections.nmnh.si.edu/ark:$id
  type: naan
ark:/65665/:
  date: 2015.03.31
  is_supershoulder: true
  manager: ezid
  minter:
  name: National Museum of Natural History, Smithsonian Institution - empty shoulder
  type: shoulder
ark:/65665/n6:
  date: 2014.08.18
  manager: ezid
  minter: https://n2t.net/a/ezid/m/ark/65665/n6
  name: National Museum of Natural History, Smithsonian Institution
  type: shoulder
```
`http://n2t.net/ark:/65665/300008335-8d74-4c3f-873c-a9d8b4b3d6a8??`:

```
-> http://collections.nmnh.si.edu/id/ark:/65665/3000083358d744c3f873ca9d8b4b3d6a8??
```

Given the ambiguities of the ARK specification and its conflict with HTTP URI structure, resolver support of inflection would be better provided through an alternate request and advertised via HTTP link headers[^link-headers].

Candidate [link header relations](https://www.iana.org/assignments/link-relations/link-relations.xhtml):

about
: Not applicable since the current URI is about the URI provided in the header. This needs to be inverted for ARK resolvers.

alternate
: Viable, with URI, type, profile, and optional title, lang. **This is the preferred option.**

describes, describedBy
: Encumbered by the POWDER spec, which appears to be essentially unused.

related
: Burdened by the Atom spec, not specific enough.

Recommendation:

Use the HTTP Link Header response to advertise availability of ARK metadata. For example:

```
GET https://ezid.net/resolve/ark:/12345/xyz

HTTP/1.1 302 Found
Location: https://example.net/data/12345/xyz
Link: <https://ezid.net/info/ark:/12345/xyz>; rel="alternate"; type="application/ld+json"; profile="https://w3id.org/ark/metadata"
```

This indicates that an alternate representation, serialized as JSON-LD according to the profile identified by `https://w3id.org/ark/metadata`, is available from the URL `https://ezid.net/info/ark:/12345/xyz`.

[^uri]: https://datatracker.ietf.org/doc/html/rfc3986#section-2.2

### Forward Slashes and Periods

ARKs use the forward slash character "/" as a delimiter. Slashes are not as problematic as the question mark characters specified for inflection. Basically, everything starting at the beginning of `ark:` should be handled by the resolver. Note however:

> The characters '/' and '.' are ignored if either appears as the last character of an ARK.
>
> ARK Spec § 2.6

### Hash Characters

The hash character "`#`" is allowed as a character in an ARK.
When appearing in a URI, the `#` denotes a URI fragment, and URI fragments are not transmitted as part of a URL request sent to the server by a client. This means that the portion of the ARK starting from the `#` char will never be received by the resolver service unless it is percent-encoded, i.e. `%23`.

### Hyphens

Hyphens are completely arbitrary and meaningless in ARK identifiers:

> Hyphens are considered to be insignificant and are always ignored in ARKs. A '-' (hyphen) may appear in an ARK for readability, or it may have crept in during the formatting and wrapping of text, but it must be ignored in lexical comparisons. As in a telephone number, hyphens have no meaning in an ARK. It is always safe for an NMA that receives an ARK to remove any hyphens found in it.
>
> ARK Spec § 2.6

## Implementation

The EZID resolver will be implemented as an HTTP service.

### HTTP Methods

The principal HTTP method to be supported for resolution is GET. HEAD and POST should also be supported.

#### GET

The `GET` method requests transfer of a current selected representation for the target resource.

- https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.1
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/GET

> "Request strings too long for GET may be sent using HTTP's POST command."
>
> ARK Spec § 5.2.

#### HEAD

The `HEAD` method is identical to `GET` except that the server MUST NOT send a message body in the response.

- https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.2
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD

#### POST

See GET. For ARKs, POST is considered the same as GET, just for longer request strings. It is OK for a POST response to return a 302 status and redirect information.

### Caching

Caching can significantly improve performance and reduce server load. At a minimum, the resolver should provide time stamps (e.g. `Last-Modified` response header) indicating when a resource was last modified.
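
As an illustration of the time stamp handling, the sketch below formats a record's update time as a `Last-Modified` value and checks an incoming `If-Modified-Since` value against it. The function names and the idea of an `updated` field on the identifier record are assumptions, not EZID's data model. The sketch targets 200 responses such as inflection metadata; whether conditional handling is worthwhile for 302 redirects is left open.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(updated: datetime) -> str:
    """Format a timezone-aware update time as an HTTP-date in GMT."""
    return format_datetime(updated.astimezone(timezone.utc), usegmt=True)


def not_modified(if_modified_since: Optional[str], updated: datetime) -> bool:
    """Return True when a 200 response could be replaced by 304 Not Modified."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # an unparsable date is treated as absent
    if since.tzinfo is None:
        return False  # no usable time zone, so do not compare
    # HTTP-dates have one-second resolution, so drop microseconds first.
    return updated.replace(microsecond=0) <= since


# Hypothetical record time stamp, modeled on the "id updated" values above.
updated = datetime(2021, 8, 2, 9, 31, 33, tzinfo=timezone.utc)
header = last_modified_header(updated)
print(header)
print(not_modified(header, updated))
```

A client that echoes the `Last-Modified` value back in `If-Modified-Since` gets `True` here, so the service could answer without regenerating the metadata body.
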
## ARK Normalization

ARK Spec § 2.7:

Normalization of an ARK for the purpose of octet-by-octet equality comparison with another ARK consists of four steps.

First, any upper case letters in the "ark:" label and the two characters following a '%' are converted to lower case. The case of all other letters in the ARK string must be preserved.

Second, any NMAH part is removed (everything from an initial "http://" up to the next slash) and all hyphens are removed.

Third, structural characters (slash and period) are normalized. Initial and final occurrences are removed, and two structural characters in a row (e.g., // or ./) are replaced by the first character, iterating until each occurrence has at least one non-structural character on either side. Finally, if there are any components with a period on the left and a slash on the right, either the component and the preceding period must be moved to the end of the Name part or the ARK must be thrown out as malformed.

The fourth and final step is to arrange the suffixes in ASCII collating sequence (that is, to sort them) and to remove duplicate suffixes, if any. It is also permissible to throw out ARKs for which the suffixes are not sorted.

[^uri-parts]: https://datatracker.ietf.org/doc/html/rfc3986#section-3
[^link-headers]: https://datatracker.ietf.org/doc/html/rfc8288

## References

- [ARK General Info](https://arks.org/about/)
- [ARK Specification v.18](https://www.ietf.org/archive/id/draft-kunze-ark-18.txt)
- [HTTP Specification](https://datatracker.ietf.org/doc/html/rfc7231)
- [Link Headers](https://datatracker.ietf.org/doc/html/rfc8288)