Introduction and Aim
The GS1 Digital Link standard [DL] offers a means through which identifiers that exist offline can be resolved to multiple, related online resources. In the simplest example, a barcode is scanned to extract the identifier, which is then resolved to a Web page that describes the barcoded item. This simple example only scratches the surface of a much more powerful underlying mechanism. The standard was designed to serve the needs of the GS1 community (manufacturers, supply chains and retailers), but its principles do not depend on the GS1 system and can readily be transferred to other identification systems.
It has much in common with existing resolvable identifier systems such as DOI and ORCiD, and the emerging work on Decentralized Identifiers [DID], as well as existing applications such as HATEOAS. As explained elsewhere [HOI], GS1 Digital Link does not introduce any new technology, but it does use and combine existing technologies from the Web and elsewhere in a specific way.
The purpose of this document is to record the principles that underpin the GS1 Digital Link standard such that they can be adopted by others wishing to combine their identification systems with the Web in similar fashion. In particular, the use of a linkType query parameter to instruct a Web server to redirect requests to a target URL with a matching link relation type is proposed as a potentially powerful mechanism.
A brief overview of GS1 Digital Link
The standard provides a structure that allows GS1 identifiers to be encoded as HTTP URIs. The potential is that every item that carries one or more GS1 identifiers can be looked up online. Where the identifiers are in their native syntax, i.e. not as a Web URI, they can be converted using a simple algorithm. Likewise, where they are encoded as a Web URI, those identifiers can be extracted and passed on to existing systems as now. Crucially, when those Web URIs are encoded in, say, a QR code or NFC tag, they can also be scanned by a smartphone without needing a specialist app, since mobile Web browsers can access the smartphone camera and open source toolkits exist for the decoding of URIs from QR codes.
Products, shipments, medicines and other items that typically use GS1 identifiers are likely to be associated with multiple related online information resources, such as instruction manuals, patient safety leaflets, master data, recipe ideas, a variety of images and video assets, a Web page that describes the item, and so on. Whilst a GS1 Digital Link URI may point to a single resource, just like any other URL, it is most useful and most functional if it points to an intermediate resolver service.
The GS1 Digital Link standard defines a type of resolver that uses link relation types, as well as things like human language and media types, to annotate links to those resources in a way that can be accessed programmatically. Resolvers can be queried in three ways:
- A default redirect, no special interaction required. For products, it is expected that the manufacturer will, by default, direct requests to the consumer-facing product information page and this is what would be returned from a simple scan of a QR code on a pack.
- An app can request a specific type of information, that is, the request can specify a particular link relation type. If a matching link is available, the resolver’s default behavior is overridden and the request is redirected to the resource that matches the specified link type.
- An app can simply ask for all available links associated with the item.
The standard includes further features, including a sophisticated compression/decompression algorithm and detailed semantics. This means that for every identifier there is an equivalent RDF property, a defined value space, and compression/decompression parameters. The approach taken can certainly be applied elsewhere, but the details of the GS1 Digital Link standard are specific to the GS1 system and are not easily transferable to non-GS1 environments.
It should also be noted that, at the time of writing, there remain a number of features that have yet to be defined in the standard, the evolution of which is expected to continue until at least the end of 2021.
The following sections detail the principles used in the design of GS1 Digital Link that are believed to be most readily applicable elsewhere, with the potential for at least a degree of interoperability across a range of identification systems and for a broader implementation community.
Do no harm
After nearly 50 years, the GS1 system of identifiers is very mature and widely implemented. The GS1 Digital Link standard introduces a new syntax for those identifiers, matching that of a URL, but does not affect their semantics in any way. An important result is that a set of GS1 identifiers in its traditional syntax and a GS1 Digital Link HTTP URI carry exactly the same informational content.
No online lookup necessary
Even when encoded in a Web URI, it must be possible to access GS1 identifiers without making an online lookup. It cannot be the case that, for example, a point of sale scanner fails because of a problem with internet connectivity.
This principle is closely related to the next.
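As a minimal sketch of what offline access can look like, the function below pulls the identifiers and attributes out of a Digital Link style URI using string parsing only, with no network request. It assumes that the path consists solely of key/value pairs (no extra path prefix) and that the keys are the numeric GS1 Application Identifiers (01 for GTIN, 10 for batch/lot, 21 for serial number, 17 for expiry date). The domain, the function name and all values are hypothetical.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def extract_identifiers(uri):
    """Return (identifiers, attributes) from a Digital Link style URI.

    Assumption: the path holds only key/value pairs, e.g. /01/<gtin>/10/<lot>.
    Nothing is fetched; the URI is treated purely as a string.
    """
    parts = urlsplit(uri)
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) % 2 != 0:
        raise ValueError("expected key/value pairs in the URI path")
    # Path order is significant, so keep the identifiers as an ordered list.
    identifiers = list(zip(segments[0::2], segments[1::2]))
    # Attributes have no inherent order, so a dict is sufficient.
    attributes = dict(parse_qsl(parts.query))
    return identifiers, attributes


ids, attrs = extract_identifiers(
    "https://example.com/01/09506000134352/10/ABC123/21/XYZ789?17=221225"
)
```

A point of sale system could pass the extracted values on to its existing processing and ignore the domain name entirely.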
The resolver is not part of the identifier
This is where we need to be very careful, as the domain name is very much a part of an HTTP URI. It is the authority over the rest of the identifier and every domain name is sovereign. This is emphasized in URI Design and Ownership [BCP190], which makes clear that standards should not constrain URI design beyond a specific authority. However, GS1 Digital Link includes a number of features that mitigate the apparent breach of best practice.
Firstly, each resolver is sovereign. There is no requirement that every resolver redirects to the same resources. This is in contrast to, for example, DOIs where each resolver is part of a synchronized network that returns the same response. Whether you resolve 10.1103/PhysRevD.89.032002 at https://doi.org/10.1103/PhysRevD.89.032002 or https://hdl.handle.net/10.1103/PhysRevD.89.032002, you are redirected to the same item or set of resources.
The thinking at GS1 is that different resolvers can serve different communities. As an example, a resolver set up to serve the needs of people with a dietary intolerance might point to information specifically about the allergens in a particular food item. This information content will almost certainly also be available from the manufacturer’s website but might not be presented in the same way (e.g. human-readable vs machine-interpretable Linked Data) or as prominently.
Secondly, the standard does explicitly warn that just because a URL looks like a GS1 Digital Link URI it may not be and that developers should therefore make no assumptions based purely on the structure of the URL. As noted earlier, a GS1 Digital Link URI can serve not only as a locator for information but as a Linked Data identifier within RDF triples.
Thirdly, the standard mandates that all conformant resolvers MUST include a Resolver Description File at a well-known location, so it is possible to check, a priori, whether a given GS1 Digital Link URI includes the address of a conformant resolver. The structure of that Resolver Description File is tailored to the GS1 context but a more general version could be defined.
Finally, by making sure that every resolver is sovereign, and that applications may use any resolver they choose, there is no single point of failure. The resolver at id.gs1.org is cited directly in the standard and is the basis for canonical URIs, but there is no requirement that it should be used by an application. It is intended to be available as a 'resolver of last resort' especially when the appropriate domain name (e.g. of the brand owner) is initially unknown and the canonical GS1 Digital Link URI is constructed from data read from a barcode that lacks such information about domain name. Separate from the standard, but as an important business decision, the resolver at id.gs1.org will only redirect to online resources specified by the respective licensee of that particular GS1 identification key (typically the brand owner for a product).
Identifiers in the path, attributes in the query string
The example 2D barcode below carries 4 pieces of information. The first is the Global Trade Item Number (GTIN) which is the identifier of the type of item (at the time of writing, this is the number you see encoded in a linear barcode that goes beep at the checkout). Then there is the batch/lot number and the serial number of the specific item. All these contribute to the identity of the item.
The fourth data field is not an identifier in the usual sense of the word, but an attribute. That is, a characteristic of the item, in this case its expiry date. Other attributes in the GS1 system include things like measured weight, delivery locations and shipment contents.
The common distinction between identifiers on the one hand and properties on the other is not clear cut, but GS1 Digital Link and other GS1 data sharing standards need to recognize the difference between identifiers and descriptive attributes. In informational terms, the standard treats the GTIN as a class, the batch/lot as a subclass, and the serial number as an instance identifier. There is thus a hierarchy, and therefore a natural order, to those identifiers, since it is the combination of GTIN and batch/lot identifier, or of GTIN and serial number, that is globally unambiguous. For this reason, they are encoded in the URI path thus:
Since attributes like expiry dates and measured weights have no specific order (an item that expires on 2022-12-25 and weighs 500g is no different from one that weighs 500g and expires on 2022-12-25), they appear in the URI query string. Therefore, the complete GS1 Digital Link URI that carries the same information as the 2D barcode above is
(When encoding in barcodes, GS1 uses a truncated YYMMDD date format which is sufficient for its needs rather than the full ISO 8601 or xsd:date format).
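The split between path and query string can be illustrated with a short sketch that assembles a URI from its parts. It is not the conversion algorithm from the standard: it assumes the numeric GS1 Application Identifiers as keys (01 GTIN, 10 batch/lot, 21 serial number, 17 expiry date), and the function name, domain and values are hypothetical. The expiry date is written in the truncated YYMMDD form mentioned above.

```python
from datetime import date
from urllib.parse import quote, urlencode


def build_digital_link(domain, gtin, batch=None, serial=None, expiry=None):
    """Assemble a Digital Link style URI (illustrative sketch only)."""
    # Identifiers go in the path, in their fixed hierarchical order.
    path = "/01/" + quote(gtin, safe="")
    if batch is not None:
        path += "/10/" + quote(batch, safe="")
    if serial is not None:
        path += "/21/" + quote(serial, safe="")
    # Attributes go in the query string, where order carries no meaning.
    query = {}
    if expiry is not None:
        query["17"] = expiry.strftime("%y%m%d")  # truncated YYMMDD
    uri = "https://" + domain + path
    if query:
        uri += "?" + urlencode(query)
    return uri


uri = build_digital_link(
    "example.com", "09506000134352",
    batch="ABC123", serial="XYZ789", expiry=date(2022, 12, 25),
)
```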
A question that is often raised is why the type of identifier and its value are not concatenated into a single URI path without hierarchical path delimiters ( / ). There are several reasons for this.
- It makes a clear distinction between the parameter and its value, which makes it easier to write code that uses the structure.
- The hierarchical structure is intuitive from the perspective of RESTful resource collections; by removing each successive /key/value pair from the URI path, working from right to left, we obtain a broader resource or resource collection. For example, given:
it is fairly intuitive that the following may be URIs of broader resources, although this is really only a naming convention:
- This hierarchical URI structure is particularly useful when using Linked Data techniques to express machine-interpretable facts about the identified thing at different levels of granularity, e.g. master data / product specifications that apply for every object with that product GTIN vs details such as date of production that are specific to a batch/lot or warranty registration details that are specific to an individual serial number for that product GTIN.
- Only some combinations of identifiers are permitted within the GS1 system. For example, a batch/lot number makes no sense when the primary identifier is a location, and it’s perhaps inappropriate to apply a Best Before End Date to a Global Service Relation Number, typically used to identify a patient in a hospital. Keeping identifier types – parameters – separate from their values makes the validation of identifier combinations easier to manage.
- Separation makes it possible for resolvers to make statements about their handling of the identifier types themselves (using Linked Data techniques). Version 1.1 of the GS1 Digital Link standard includes a chapter about the semantic inferences that can be made automatically, including data value transformation rules where appropriate, e.g. to transform a truncated YYMMDD date value into an xsd:date format that can be more widely understood. As a result, for each GS1 Application Identifier a Linked Data property exists within the GS1 Web vocabulary [WebVoc].
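The right-to-left truncation described in the list above can be sketched as follows. This is only an illustration of the naming convention, assuming a path made up purely of /key/value pairs; the function name, domain and values are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def broader_uris(uri):
    """List the URIs obtained by dropping trailing /key/value pairs.

    The query string is discarded, on the assumption that its attributes
    describe the most specific item only.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    broader = []
    # Remove one key/value pair at a time, keeping the primary pair.
    while len(segments) > 2:
        segments = segments[:-2]
        path = "/" + "/".join(segments)
        broader.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return broader


result = broader_uris(
    "https://example.com/01/09506000134352/10/ABC123/21/XYZ789?17=221225"
)
```

Each returned URI names a wider collection: first the batch/lot, then the product type as a whole.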
The formal structure of a GS1 Digital Link URI is provided in very precise detail using ABNF grammar [RFC7405]. It includes detail of all the valid values for each of the GS1 identifiers (whether numeric only, alphanumeric, limited character sets and lengths etc.). As a result, the GS1-specific details are not readily transferable, but the overall structure is, and a generic workflow for constructing equivalent ABNF grammars can be defined for use with other identification systems, using the same design principles.
For every link, there is a minimum set of metadata
In order to be able to make sense of multiple resources associated with a given identifier, each link MUST have associated metadata. As a minimum, and most crucially, this includes the link relation type (referred to as the link type in the GS1 standard). It also requires a human readable title so that applications are able to present end users with appropriate text for a hyperlink.
Optional additional metadata includes the language of the target resource, its media type and a further ‘context’ variable, the value space for which is not defined to allow maximum flexibility. A typical use for the context variable would be the territory to which the link applies or perhaps a particular configuration of an online service that complies with a specific regulation, especially when that same service endpoint might support multiple configurations for different regulations in different jurisdictions.
The standard draws heavily on the Web Linking standard [RFC8288] to define the metadata associated with each link and takes advantage of its support for Compact URI Expressions [CURIE] as custom link relation types. GS1 defines its set of link relation types in its Web vocabulary [WebVoc] and recognizes terms from schema.org.
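One way to picture the minimum metadata set is as a small record type in which the target, link type and title are required and everything else is optional. The sketch below is an assumption about how an implementation might model this, not a structure defined by the standard; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkEntry:
    """One link from an identified item to a related resource (sketch)."""

    href: str                         # target URL
    link_type: str                    # required, e.g. the CURIE "gs1:pip"
    title: str                        # required human-readable title
    hreflang: Optional[str] = None    # language of the target resource
    media_type: Optional[str] = None  # media type of the target resource
    context: Optional[str] = None     # free-form, value space not defined

    def __post_init__(self):
        # Enforce the minimum set: target, link type and title.
        if not (self.href and self.link_type and self.title):
            raise ValueError("href, link_type and title are required")


entry = LinkEntry(
    href="https://example.com/product/info",
    link_type="gs1:pip",
    title="Product information",
    hreflang="en",
)
```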
Three ways to use a resolver
For every identified item, there is a default defined target resource
In the GS1 context, it is important that consumers are able to scan a QR code or NFC tag containing a GS1 Digital Link URI without needing any specialist software and receive ‘something useful.’ It was considered that a list of links would not meet that criterion and that therefore the default action of a resolver should be to redirect to one of the available target URLs. Thus, for every identifier (or set of identifiers), there MUST be a default redirect. Redirection to any other resource must be in response to a request for a specific link type.
Such a default is defined primarily in terms of the link type – that is, for a given identifier, redirection to a link of the defined link type is the default response, but this may vary according to the other available parameters, notably language and media type, which are transmitted in the user agent’s HTTP request.
Resolvers can be instructed to redirect to a target resource of a specific type
The GS1 Digital Link standard defines a URI query string parameter, linkType, that carries instructions to the resolver. Values for the parameter are Linked Data properties defined within the GS1 Web vocabulary [WebVoc], for example gs1:pip for a link to a 'Product Information Page.'
The value of the linkType parameter can be a URI (either in full or as a CURIE), or the word all (see next principle). If the resolver has a link available that matches that link type, then it should redirect to it (with an HTTP status code of 307). The resolver may, of course, take note of additional factors like language, media type and context. Whilst the user agent’s preferences for language and media type are transmitted via the HTTP request headers, the context parameter can only be transmitted as part of the URI query string. Support for lang as a query parameter is optional.
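The redirect decision can be sketched as a small function. This is a simplified illustration, not the resolver logic defined by the standard: it ignores language, media type and context, it assumes that an unmatched linkType falls back to the default, and it returns 404 when no default exists. The link table, the ex:manual link type and the target URLs are hypothetical.

```python
from typing import Optional

# Hypothetical link table for one identified item: link type -> target URL.
LINKS = {
    "gs1:pip": "https://example.com/product/info",
    "ex:manual": "https://example.com/product/manual.pdf",
}


def resolve(links, default_link_type, link_type: Optional[str] = None):
    """Return (status, headers) for a request to the resolver (sketch)."""
    if link_type == "all":
        # A request for the full list is not a redirect; handled elsewhere.
        raise ValueError("linkType=all must not be answered with a redirect")
    # A matching link type overrides the default; otherwise use the default.
    chosen = link_type if link_type in links else default_link_type
    target = links.get(chosen)
    if target is None:
        return 404, {}  # assumption: no default has been configured
    return 307, {"Location": target}


default_response = resolve(LINKS, "gs1:pip")
manual_response = resolve(LINKS, "gs1:pip", link_type="ex:manual")
```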
Resolvers can be instructed to return all available links to related resources
If the value of the linkType parameter is all, then the resolver MUST NOT redirect. Instead, it MUST return a list of all the available links associated with the identified item. The GS1 standard goes on to require that the list MUST be available as a JSON object, SHOULD be available as an HTML page and MAY be available in other serializations, such as JSON-LD. The choice of which format is returned is determined through content negotiation.
At the time of writing, the exact structure of the JSON object has not been defined. Such a definition is a priority for the working group. A particular implementation challenge has been how to treat language, media type and context as being of equivalent importance, i.e. with no hierarchy.
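Since the JSON structure is not yet defined, the following sketch uses a purely illustrative shape (a "links" array of objects) to show only the behavior: no redirect, and a response format chosen from the Accept header. The content negotiation is deliberately naive, and the function and field names are hypothetical.

```python
import html
import json


def list_links(links, accept="application/json"):
    """Answer a linkType=all request with 200 and a list (sketch).

    Returns (status, headers, body). JSON is the default; HTML is returned
    only when the client asks for text/html and not for JSON.
    """
    wants_html = "text/html" in accept and "application/json" not in accept
    if wants_html:
        items = "".join(
            '<li><a href="{}">{}</a></li>'.format(
                html.escape(link["href"], quote=True),
                html.escape(link["title"]),
            )
            for link in links
        )
        return 200, {"Content-Type": "text/html"}, "<ul>" + items + "</ul>"
    # Illustrative structure only; the standard had not fixed one.
    return 200, {"Content-Type": "application/json"}, json.dumps({"links": links})


example_links = [
    {"href": "https://example.com/info", "linkType": "gs1:pip",
     "title": "Product information"},
]
status, headers, body = list_links(example_links)
```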
All available links must be exposed through the HTTP Link header
In addition to the linkType=all command, GS1 conformant resolvers also expose all available links through the HTTP Link header even when redirecting. This makes it possible for applications to make a simple HTTP HEAD request, with redirection suppressed, and receive all the available information with great efficiency. Again, the Web Linking standard [RFC8288] is an important reference.
Incoming query strings are passed through
Resolvers take note of the identifiers in the path and are not required to take any notice of any attributes included in the query string. Target resources may well process query string parameters, however. For example, a target Web page might respond to the presence of an expiry date in the query string and give an appropriate warning to the user if the item has expired. Therefore, it is important that incoming query string parameters are passed on in full.
The vast majority of online resources will ignore any query string parameters they don’t understand. Therefore, passing the query string on to the target resource will not normally be a problem. However, there are instances where it does matter, that is, where a website or API does break if query string parameters are included that aren’t recognized. Therefore, a GS1 conformant resolver offers the option of not forwarding the query string.
This also covers the edge case where a target resource uses a parameter called context to mean something different from the GS1 Digital Link usage. The belief is that these cases will be very much the exception, not the norm, and that therefore, it is ‘safe’ to work on the assumption that, by default, query strings can be passed on.
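The pass-through behavior, with the option to switch it off for a given target, can be sketched as below. The function and parameter names are hypothetical, and the sketch simply appends the incoming query string unchanged to whatever query the target URL already carries.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(target_url, incoming_query, forward_query=True):
    """Build the redirect URL, forwarding the incoming query string by default."""
    if not forward_query or not incoming_query:
        # Forwarding disabled for this target, or nothing to forward.
        return target_url
    parts = urlsplit(target_url)
    # Keep any query the target already has and append the incoming one.
    if parts.query:
        merged = parts.query + "&" + incoming_query
    else:
        merged = incoming_query
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, merged, parts.fragment)
    )


forwarded = redirect_target("https://example.com/product", "17=221225")
```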
The query string is an extension point
Any well-written standard has an extension mechanism. As noted, the GS1 Digital Link standard includes a very precise definition of how to construct URIs based on the existing GS1 identification system, but it also allows arbitrary parameters and their values to be included in the URI query string. The only restriction is that those additional parameters don’t clash with the GS1 system.
The GS1 Digital Link standard contains a great deal of detail not included here. Some of that might also be appropriate in other contexts but the 11 principles cited above seem to be the most likely to be important in any similar context. GS1 would be interested in developing these ideas further in a more general purpose standard.
- [BCP190] URI Design and Ownership. Best Current Practice 190, IETF RFC7320. Mark Nottingham. July 2014.
- [CURIE] CURIE Syntax 1.0: A syntax for expressing Compact URIs. W3C Working Group Note. Mark Birbeck, Shane McCarron. 16 December 2010.
- [DID] Decentralized Identifiers (DIDs) v1.0: Core architecture, data model, and representations. W3C Working Draft. Drummond Reed, Manu Sporny, Markus Sabadello. 8 April 2020.
- [DL] GS1 Digital Link 1.1 (PDF). GS1 Standard. Mark Harrison, Phil Archer et al. Ratified February 2020.
- [HOI] A history of the ideas in GS1 Digital Link. Phil Archer. 25 November 2019.
- [RFC7405] Augmented BNF for Syntax Specifications: ABNF. IETF RFC. D. Crocker, P. Overell. January 2008.
- [RFC8288] Web Linking. IETF RFC8288. Mark Nottingham. October 2017.
- [WebVoc] GS1 Web Vocabulary. GS1. Eric Kauz, Mark Harrison. 2013.