---
title: A Sour ARK
keywords: "identifier, ARK, HTTP, URL"
author: Dave
date: 2021-06-01
---
# A Sour ARK <svg height='50px' width='50px' fill="#000000" xmlns="http://www.w3.org/2000/svg" data-name="LINE BLACK" viewBox="0 0 48 48" x="0px" y="0px"><title>emoticon, emoji. emotion, sour, acid</title><path d="M24,1A23,23,0,1,0,47,24,23,23,0,0,0,24,1Zm0,44A21,21,0,1,1,45,24,21,21,0,0,1,24,45Zm3.71-16.29L25.41,31l2.3,2.29a1,1,0,0,1,0,1.42,1,1,0,0,1-1.42,0L24,32.41l-2.29,2.3a1,1,0,0,1-1.42,0,1,1,0,0,1,0-1.42L22.59,31l-2.3-2.29a1,1,0,0,1,1.42-1.42L24,29.59l2.29-2.3a1,1,0,0,1,1.42,1.42ZM20,18a1,1,0,0,1-.4.8l-4,3a1,1,0,0,1-.6.2,1,1,0,0,1-.8-.4,1,1,0,0,1,.2-1.4L17.33,18,14.4,15.8a1,1,0,0,1,1.2-1.6l4,3A1,1,0,0,1,20,18Zm13.6-2.2L30.67,18l2.93,2.2a1,1,0,0,1,.2,1.4,1,1,0,0,1-.8.4,1,1,0,0,1-.6-.2l-4-3a1,1,0,0,1,0-1.6l4-3a1,1,0,1,1,1.2,1.6Z"></path></svg>
In the article "Internet of Samples (iSamples): Toward an interdisciplinary cyberinfrastructure for material samples"[^1], an ARK identifier is cited as:
````
http://n2t.net/ark:/65665/3a63356e5–953a-4666-a25f-60270f7f1dcf
````
This is not the same character string as the original ARK, even though it looks the same. The benign-looking string actually contains the Unicode character U+2013, or "en-dash"[^2].
````
http://n2t.net/ark:/65665/3a63356e5–953a-4666-a25f-60270f7f1dcf
                                   ↑    ↑    ↑    ↑
                                   │    └────┴────┴─ U+002D HYPHEN-MINUS
                                   └─ U+2013 EN DASH
````
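A simple way to catch this kind of substitution before publication is to scan an identifier for any character outside ASCII. The Python sketch below uses only the standard `unicodedata` module. The helper name `non_ascii_chars` is hypothetical and chosen for illustration. For each offending character, it reports the index, the code point, and the Unicode name:

```python
import unicodedata


def non_ascii_chars(s):
    """Return (index, code point, Unicode name) for each non-ASCII character in s."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(s)
        if ord(ch) > 0x7F
    ]


# The identifier as published; \u2013 is the en-dash.
published = "http://n2t.net/ark:/65665/3a63356e5\u2013953a-4666-a25f-60270f7f1dcf"
print(non_ascii_chars(published))
# [(35, 'U+2013', 'EN DASH')]
```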
The name of the ARK, "3a63356e5–953a-4666-a25f-60270f7f1dcf", was altered, perhaps by a word processor, which replaced the first ASCII hyphen with the similar-looking en-dash.
In the URL form of the ARK, the en-dash is correctly percent-encoded as `%E2%80%93`[^uurl]:
````
http://n2t.net/ark:/65665/3a63356e5%E2%80%93953a-4666-a25f-60270f7f1dcf
````
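The standard library function `urllib.parse.quote` reproduces this escaping. It encodes the string as UTF-8 and then percent-escapes every byte outside the unreserved set. Because the ASCII hyphen is unreserved, it passes through unchanged. This sketch covers only the encoding step and makes no claim about how the URL in the article was actually produced:

```python
from urllib.parse import quote, unquote

name = "3a63356e5\u2013953a-4666-a25f-60270f7f1dcf"  # \u2013 is the en-dash

# The en-dash is three bytes in UTF-8; quote() escapes each byte as %XX.
print("\u2013".encode("utf-8"))  # b'\xe2\x80\x93'
url = "http://n2t.net/ark:/65665/" + quote(name)
print(url)
# http://n2t.net/ark:/65665/3a63356e5%E2%80%93953a-4666-a25f-60270f7f1dcf

# Hex digits in percent-encoding are case-insensitive, so the lower-case
# form that appears in the redirect chain decodes to the same character.
print(unquote("%e2%80%93") == "\u2013")  # True
```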
Resolving the ARK through `n2t` appears, at first glance, to succeed:
````
> GET: http://n2t.net/ark:/65665/3a63356e5%E2%80%93953a-4666-a25f-60270f7f1dcf
< 302 http://collections.nmnh.si.edu/id/ark:/65665/3a63356e5–953a4666a25f60270f7f1dcf
< 0.1473 sec
> GET: http://collections.nmnh.si.edu/id/ark:/65665/3a63356e5%E2%80%93953a4666a25f60270f7f1dcf
< 302 https://collections.nmnh.si.edu/id/ark:/65665/3a63356e5%E2%80%93953a4666a25f60270f7f1dcf
< 0.0324 sec
> GET: https://collections.nmnh.si.edu/id/ark:/65665/3a63356e5%E2%80%93953a4666a25f60270f7f1dcf
< 302 https://collections.nmnh.si.edu/search/ark/?ark=ark:/65665/3a63356e5%e2%80%93953a4666a25f60270f7f1dcf
< 0.0805 sec
> GET: https://collections.nmnh.si.edu/search/ark/?ark=ark:/65665/3a63356e5%e2%80%93953a4666a25f60270f7f1dcf
< 302 https://collections.nmnh.si.edu/search/?ark=ark:/65665/3a63356e5%E2%80%93953a4666a25f60270f7f1dcf
< 0.0295 sec
> GET: https://collections.nmnh.si.edu/search/?ark=ark:/65665/3a63356e5%E2%80%93953a4666a25f60270f7f1dcf
< 200 text/html; charset=UTF-8
< 0.6293 sec
````
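Note, however, that the encoded en-dash is carried through every redirect, whereas the ASCII hyphens have been dropped[^hyph]. The chain ends with an apparently successful `200 OK` status. However, the page does not resolve to the specimen record. When it is viewed in a browser, the only indication of failure is some text that disappears after a few seconds.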

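Dropping hyphens is a character-level operation. Only U+002D HYPHEN-MINUS is removed, so the en-dash survives because it is a different character. The minimal sketch below assumes that normalization simply deletes ASCII hyphens, as the footnote describes. It is not the code used by `n2t` or the collection service, and the name `drop_hyphens` is hypothetical:

```python
def drop_hyphens(name):
    """Remove ASCII hyphen-minus (U+002D); ARK treats these as insignificant."""
    return name.replace("-", "")


published = "3a63356e5\u2013953a-4666-a25f-60270f7f1dcf"  # contains an en-dash
intended = "3a63356e5-953a-4666-a25f-60270f7f1dcf"  # ASCII hyphens only

print(drop_hyphens(published))  # 3a63356e5–953a4666a25f60270f7f1dcf
print(drop_hyphens(intended))   # 3a63356e5953a4666a25f60270f7f1dcf
# The normalized forms still differ, so a lookup keyed on one will not find the other.
print(drop_hyphens(published) == drop_hyphens(intended))  # False
```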
So there are actually several things going wrong here:
1. The ARK is malformed in the publication.

   As with all other content in the article, the authors should verify that any identifiers used are correct. The publication workflow should not alter any characters in data elements such as identifiers.

2. `N2T` forwards the request, malformed name included, to the resolver registered for the NAAN.

   Section 2.8 of the [ARK specification](https://www.ietf.org/archive/id/draft-kunze-ark-18.txt) stipulates that "The Name and Qualifier parts are strings of visible ASCII characters and should be less than 128 bytes in length." Since the ARK is malformed, `N2T` should probably reject the request[^arkchars].

3. The final target returns a `200 OK` status when it should return `404 Not Found`.

   A service that provides a resolution endpoint should assume that people as well as programs will access it. In this case, the only indication of a failure to resolve is some text on the landing page, and that text disappears after a few seconds. The service should respond with an appropriate HTTP status code, and may continue to present the same landing page. This informs both programmatic and human users of the error.
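
Points 2 and 3 can be combined in a resolver that validates an ARK before looking it up and reports any failure through the status code. The sketch below is hypothetical. The in-memory `REGISTRY`, the `example.org` target, and the choice of `400 Bad Request` for a malformed ARK are assumptions for illustration, not the behaviour of `N2T` or the NMNH service. The pattern is the one given in the footnote, compiled with `re.ASCII` so that `\w` cannot match non-ASCII characters:

```python
import re

# Pattern from the footnote; re.ASCII restricts \w to [A-Za-z0-9_], so any
# non-ASCII character (such as an en-dash) makes the match fail.
ARK_PATTERN = re.compile(
    r"^(ark\:)/*[0-9A-Za-z]+(?:/[\w/.=*+@\$-]*)?(?:\?.*)?$", re.ASCII
)

# Hypothetical registry keyed by hyphen-free ARK; stands in for a real database.
REGISTRY = {
    "ark:/65665/3a63356e5953a4666a25f60270f7f1dcf": "https://example.org/specimen/1",
}


def resolve(ark):
    """Return (HTTP status, redirect target or message) for an ARK string."""
    if not ARK_PATTERN.match(ark):
        # Assumption: reject malformed ARKs outright instead of forwarding them.
        return 400, "Malformed ARK: " + ascii(ark)
    target = REGISTRY.get(ark.replace("-", ""))  # hyphens are insignificant
    if target is None:
        # Unknown identifier: report it in the status code, not only in page text.
        return 404, "No record found for " + ark
    return 302, target


print(resolve("ark:/65665/3a63356e5-953a-4666-a25f-60270f7f1dcf"))
# (302, 'https://example.org/specimen/1')
print(resolve("ark:/65665/3a63356e5\u2013953a-4666-a25f-60270f7f1dcf")[0])  # 400
print(resolve("ark:/65665/3a63356e5-0000")[0])  # 404
```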
[^1]: [https://doi.org/10.1093/gigascience/giab028](https://doi.org/10.1093/gigascience/giab028)
[^2]: `U+2013` [EN DASH](https://en.wikipedia.org/wiki/Dash#En_dash)
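[^uurl]: A URL (Uniform Resource Locator) may contain only ASCII characters[^url], so any other Unicode character must be percent-encoded. The en-dash, for example, becomes `%E2%80%93`, which is the escaped form of its three UTF-8 bytes. An IRI (Internationalized Resource Identifier) may include Unicode characters directly[^iri], and IRIs are supported in HTML5 and later.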
[^url]: Uniform Resource Locators (URL) § 2.2 [rfc1738](https://datatracker.ietf.org/doc/html/rfc1738#section-2.2)
[^iri]: Internationalized Resource Identifiers (IRIs) [rfc3987](https://datatracker.ietf.org/doc/html/rfc3987)
[^hyph]: Hyphens have no meaning in ARKs and may be dropped with no loss of information, see §2.6 of [The ARK Identifier Scheme](https://www.ietf.org/archive/id/draft-kunze-ark-18.txt).
[^arkchars]: ARK identifiers must match `^(ark\:)/*[0-9A-Za-z]+(?:/[\w/.=*+@\$-]*)?(?:\?.*)?$`