# For HackMD et al:
tags: Blog, PIDs, ISBN, Metadata
# When is a persistent identifier not persistent? Or an identifier?
Ever wondered what that bar code on the back of every book is?
It's an ISBN:
an International Standard Book Number.
Every modern book published has an ISBN,
which uniquely identifies that book,
and anyone publishing a book can [get an ISBN for it][get ISBN]
whether an individual or a huge publishing house.
It's a little more complex than that in practice
but generally speaking it's
1 book, 1 ISBN.
[get ISBN]: https://www.bl.uk/help/get-an-isbn-or-issn-for-your-publication
If you search an online catalogue,
such as [WorldCat][9780393073775-WC]
or [The British Library][9780393073775-BL]
for the ISBN `9780393073775`
(or the 10-digit equivalent, `0393073777`)
you'll find results appear for two completely different books:
1. Waal FD. The Bonobo and the Atheist: In Search of Humanism Among the Primates. New York: W. W. Norton & Co.; 2013. 304 p. <http://www.worldcat.org/oclc/1167414372>
2. Lodge HC. The Storm Has Many Eyes; a Personal Narrative. 1st edition. New York: New York Norton; 1973. <http://www.worldcat.org/oclc/989188234>
In fact, things are so confused
that the cover of one book gets pulled in for the other as well.
Investigate further and you'll see that it's not a glitch:
both books have been assigned the same ISBN.
Others have found the same:
> "However, if the books do not match, it's usually one of two issues. First, if it is the same book but with a different cover, then it is likely the ISBN was reused for a later/earlier reprinting. ... In the other case of duplicate ISBNs, it may be that an ISBN was reused on a completely different book. This shouldn't happen because ISBNs are supposed to be unique, but exceptions have been found."
> --- [GoodReads Librarian Manual: ISBN-10, ISBN-13 and ASINS](https://help.goodreads.com/s/article/Librarian-Manual-ISBN-10-ISBN-13-and-ASINS)
While most publishers stick to the rules about never reusing an ISBN,
it's apparently common knowledge in the book trade
that ISBNs from old books get reused for newer books,
sometimes accidentally (due to a typo),
sometimes intentionally (to save money),
and that has some tricky consequences.
I recently attended a webinar entitled
["Identifiers in Heritage Collections - how embedded are they?"][Webinar]
from the [Persistent Identifiers as IRO Infrastructure ("HeritagePIDs") project][HeritagePIDs],
part of [AHRC's Towards a National Collection programme][TaNC].
As quite often happens,
the question was raised:
**what Persistent Identifier (PID) should we use for books**
and **why can't we just use ISBNs**?
Rod Page, who gave the [demo that prompted this discussion][demo slides],
also wrote a short follow-up blog post
[about what makes PIDs work (or not)][Rod blog post]
which is worth a look before you read the rest of this.
[demo slides]: http://pid-demonstrator.herokuapp.com/demo/
[Rod blog post]: https://iphylo.blogspot.com/2020/07/persistent-identifiers-demo-and-rant.html?m=1
These are really valid questions
and worth considering in more detail,
and to do that we need to understand what makes a PID special.
We call them **persistent**,
and indeed we expect some sort of guarantee
that a PID remains valid for the long term,
so that we can use it as a link or placeholder for the referent
without worrying that the link will get broken.
But we also expect PIDs to be **actionable**:
they can be made into a valid URL by following some rules,
so that we can directly obtain the object referenced
or at least some information about it.
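
To make "following some rules" concrete,
here's a minimal sketch in Python for one kind of PID, the DOI,
where the rule is simply to append the identifier to the resolver address `https://doi.org/`.
The function name `doi_to_url` is my own, purely for illustration,
and the example DOI is the one assigned to the DOI Handbook.

```python
from urllib.parse import quote

# The public DOI resolver; any DOI appended to this should redirect to its target
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI such as '10.1000/182' into a resolvable URL.

    Illustrative helper only: the name and signature are assumptions.
    """
    # Percent-encode anything unsafe in a URL path,
    # but keep the "/" that separates the DOI prefix from its suffix
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


print(doi_to_url("10.1000/182"))  # https://doi.org/10.1000/182
```

There's no equivalent one-liner for an ISBN,
which is the point the rest of this post keeps coming back to.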
Actionability implies two further properties:
an actionable identifier must be
1. **unique**: guaranteed to have only one identifier for a given object
   (of a given type); and
2. **unambiguous**: guaranteed that a single identifier refers to only one object.
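
One way to think about these two properties is as a check
you could run over any list of (identifier, object) pairs.
The following is a rough sketch in Python, not anything a real catalogue does:
`check_identifiers` is a name I've made up,
the hardback/paperback identifiers are hypothetical placeholders,
and the only real data is the ISBN and the two titles from the example above.

```python
from collections import defaultdict


def check_identifiers(pairs):
    """Return (not_unique, ambiguous) for an iterable of (identifier, obj) pairs.

    not_unique: objects that have more than one identifier
    ambiguous:  identifiers that refer to more than one object
    """
    ids_by_object = defaultdict(set)
    objects_by_id = defaultdict(set)
    for identifier, obj in pairs:
        ids_by_object[obj].add(identifier)
        objects_by_id[identifier].add(obj)

    not_unique = {o: ids for o, ids in ids_by_object.items() if len(ids) > 1}
    ambiguous = {i: objs for i, objs in objects_by_id.items() if len(objs) > 1}
    return not_unique, ambiguous


pairs = [
    # Real case from the catalogue search above: one ISBN, two books
    ("9780393073775", "The Bonobo and the Atheist"),
    ("9780393073775", "The Storm Has Many Eyes"),
    # Hypothetical placeholders: one book, two identifiers
    ("id-hardback", "Some Book"),
    ("id-paperback", "Some Book"),
]

not_unique, ambiguous = check_identifiers(pairs)
print("Objects with several identifiers:", not_unique)
print("Identifiers with several objects:", ambiguous)
```

An identifier scheme that is both unique and unambiguous
would leave both dictionaries empty.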
Where does this leave us with ISBNs?
Well, first up, they're not actionable:
given an ISBN,
there's no canonical way to obtain information about the book referenced,
although in practice there are a number of databases that can help.
There is, in fact,
an actionable ISBN standard:
[ISBN-A] permits converting an ISBN into a DOI
with all the benefits of the underlying DOI and Handle infrastructure.
Sadly, creation of an ISBN-A isn't automatic
and publishers have to explicitly create the ISBN-A DOI
in addition to the already-created ISBN.
More than that though,
it's hard to make them actionable since
ISBNs fail on both uniqueness and unambiguity.
As seen in the example I gave above,
ISBNs do get recycled.
They're not supposed to be:
> "Once assigned to a monographic publication, an ISBN can never be reused to identify another monographic publication, even if the original ISBN is found to have been assigned in error."
> --- International ISBN Agency. ISBN Users’ Manual [Internet]. Seventh Edition. London, UK: International ISBN Agency; 2017 [cited 2020 Jul 23]. Available from: <https://www.isbn-international.org/content/isbn-users-manual>
Yet they are,
so we can't rely on their precision[^2].
[^2]: Actually, as my colleague pointed out,
even DOIs potentially have this problem,
although I feel they can mitigate it better
with metadata that allows
rich expression of relationships between DOIs.
Perhaps more problematic in day-to-day use,
a given book may have multiple ISBNs.
To an extent this is reasonable:
different editions of the same book may have different content,
or at the very least different page numbering,
so a PID should be able to distinguish these for accurate citation.
Unfortunately the same edition of the same book
will frequently have multiple ISBNs;
in particular each different format
(hardback, paperback, large print, ePub, MOBI, PDF, ...)
is expected to have a distinct ISBN.
Even if all that changes is the publisher,
a new ISBN is still required:
> "We recently encountered a case where a publisher had licensed a book to another publisher for a different geographical market. Both books used the same ISBN. If the publisher of the book changes (even if nothing else about the book has changed), the ISBN must also change."
> --- [Everything you wanted to know about the ISBN but were too afraid to ask](https://www.linkedin.com/pulse/everything-you-wanted-know-isbn-were-too-afraid-ask-leonard-fernandes/)
Again, this is reasonable
since the ISBN is primarily intended for stockkeeping by booksellers[^1],
and for them the difference between a hardback and paperback is important
because they differ in price if nothing else.
This has bitten more than one librarian
when trying to merge data from two different sources
(such as usage and pricing)
using the ISBN as the "obvious" merge key.
It makes [bibliometrics] harder too,
since you can't easily pull out a list
of all citations of a given edition in the literature,
just from a single ISBN.
[^1]: In fact, the newer ISBN-13 standard is simply an ISBN-10
encoded as an "International Article Number",
the standard barcode format for almost all retail products,
by sticking the "Bookland" country code of 978 on the front
and recalculating the check digit.
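
For the curious, the ISBN-10 to ISBN-13 conversion
described in the footnote on stockkeeping
can be sketched in a few lines of Python.
The function names here are my own;
the check digit rule (alternating weights of 1 and 3, then whatever brings the total up to a multiple of 10)
is the standard one for these 13-digit barcodes.
The example uses the ISBN from the catalogue search earlier in this post.

```python
def isbn13_check_digit(first12: str) -> str:
    """Compute the final digit for the first 12 digits of an ISBN-13."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 by prefixing 978 and recalculating the check digit."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError(f"Expected 10 characters, got {len(digits)}: {isbn10!r}")
    # Drop the old ISBN-10 check character (which may be 'X') and add the prefix
    body = "978" + digits[:9]
    return body + isbn13_check_digit(body)


print(isbn10_to_isbn13("0393073777"))  # 9780393073775
```

Note that this sketch doesn't validate the incoming ISBN-10 check character;
it simply throws it away, since a new one is calculated anyway.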
So where does this leave us?
I'm not really sure yet.
ISBNs as they are currently specified and used by the book industry
aren't really fit for purpose as a PID.
But they're there and they sort-of work
and establishing a more robust PID for books
would need commitment and co-operation
from authors, publishers and libraries.
That's not impossible:
a lot of work has been done recently
to [make the ISSN (International Standard Serial Number, for journals) more actionable][ISSN].
But perhaps there are other options.
Where publishers, booksellers and libraries
are primarily interested in IDs for stock management,
authors, researchers and scholarly communications librarians
are more interested in the scholarly record as a whole
and tracking the flow of ideas (and credit for those)
which is where PIDs come into their own.
Is there an argument for a coalition of these groups
to establish a parallel identifier system for citation & credit
that's truly persistent?
It wouldn't be the first time:
[ISNIs (International Standard Name Identifiers)][ISNI] and
[ORCIDs (Open Researcher and Contributor IDs)][ORCID]
both identify people,
but for different purposes in different roles
and with robust metadata linking the two where possible.
I'm not sure where I'm going with this train of thought
so I'll leave it there for now,
but I'm sure I'll be back.
The more I dig into this the more there is to find,
including the mysterious, long-forgotten and no-longer accessible
[Book Item & Component Identifier proposal][BICI].
In the meantime,
if you want a persistent identifier
and aren't sure which one you need
these [Guides to Choosing a Persistent Identifier][choosing]
from Project FREYA
should get you started.