Paul’s notes on how JSON-LD works

May 4, 2022

We all know what JSON-LD is: JSON with a @context field tacked on top, right? That’s pretty much all it is. Except sometimes you see an @id field, which, sure, that makes sense. And sometimes the @context field is multiple URLs, which seems odd — because how do they mix? Well, no worries. Just follow the documentation of whatever API you’re using, right?

That’s normally how I engage with JSON-LD, and it’s a strength of the format that I can ignore those details. However, the time has come that I need to understand more — and perhaps you need to as well! So here are my notes on JSON-LD.

The core data model

When we use JSON, it’s often being hosted by a service endpoint. It’s a representation of the internal state of that service. A JSON-LD document works that way too; it’s a representation of an underlying model. Conceptually, however, we think of JSON-LD’s model as representing “all systems” because it’s a tool for semantic cross-org exchange.

This data model underlying JSON-LD is a directed graph. The document itself is a node object. Each key in the document is an edge from the node object to other “node objects” and “value objects.”

Node object. What one might think of as an entity. Often has a URL¹ but that’s not required. (If it doesn’t have a URL, it’s called a “blank node.”)
Value object. Values like strings, numbers, lists, sets, and so on.

Let’s consider a simple example:

{
   "name": "Manu Sporny",
   "homepage": "http://manu.sporny.org/"
}

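Let’s imagine this as a graph. name and homepage are our edges, and the document and the values are nodes:

[Diagram: a single document node with two outgoing edges, name pointing to "Manu Sporny" and homepage pointing to "http://manu.sporny.org/".]

Take a moment to digest this diagram. We’ve turned a relatively simple JSON document into a 2-edge graph.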
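If it helps to see the same graph as data, here is a minimal Python sketch that writes it down as a list of edges. The to_edges helper and the "_:doc" label for the unnamed document node are names I made up for illustration; nothing here is part of a JSON-LD library.

```python
def to_edges(doc, subject="_:doc"):
    """Turn a flat JSON object into (subject, edge, value) triples."""
    return [(subject, key, value) for key, value in doc.items()]


doc = {
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
}

# One edge per key, all starting at the document node.
edges = to_edges(doc)
for edge in edges:
    print(edge)
```

Each key in the document becomes one edge that starts at the document node and ends at the value.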

It’s a little bizarre to think of it this way, but remember that we’re looking at an underlying data model. It’s like the difference between your internal database and your service’s responses: the JSON is assembled from the rows or documents in your DB.

JSON-LD is basically asserting a global graph-database. Even if you’re actually using MySQL or CouchDB we pretend everybody’s using a graph db. If we use that metaphor of a global graph-database, then all the other aspects of JSON-LD become features of that database.

Global attribute names

If we’re building a global graph-database, then we need an unambiguous way to identify our schemas. In Datomic we’d use attribute-names like movie/title, but that’s not going to fly in a global setting. We’d get conflicts straight away.

JSON-LD uses URLs¹ as attribute-names. Instead of movie/title we use something like http://example.com/movie/title.

Now in my opinion, this is a bit of a PITC². It’s very visually noisy, and I think giving every attribute a URL is overkill — we could have used a property-graph model to identify bundles of attributes under a URL. That said, let’s consider the positives:

JSON-LD does the absolute best it can to reduce the visual noise of URL keynames, and does so pretty successfully without sacrificing correctness.
The idea of a global graph database with universal schemas is still pretty cool, even if it’s been dragging around the floor since 2000-late.

So let’s look at how this works by a (slightly inaccurate³) example:

{
  "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
  "http://xmlns.com/foaf/0.1/homepage": "http://manu.sporny.org/"
}

Don’t panic. JSON-LD hasn’t even reached its final form. What we’re seeing above is akin to running a SELECT statement in your SQL console.

The context

To make JSON-LD function more like JSON, we use the @context field.

{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage"
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}

The object above is equivalent to the prior object with URL keynames. The context establishes a mapping from short keys in the object (“terms”) to full URLs. Every key that we want to expand to a URL must be in the context.
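To make the term mapping concrete, here is a rough Python sketch of what that expansion does to the keys. It assumes a flat document whose context maps terms to plain URL strings, and expand_keys is a hypothetical helper of mine, not a real JSON-LD API. The actual expansion algorithm handles far more cases.

```python
def expand_keys(doc):
    """Replace each term with the URL that the document's @context maps it to.

    Simplified: flat documents only, and every context entry is a plain string.
    """
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed, not copied into the result
        if key in context:
            expanded[context[key]] = value
        # Keys with no mapping are dropped in this sketch.
    return expanded


doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
    },
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
}

url_keyed = expand_keys(doc)
print(url_keyed)
```

The result has the same shape as the earlier object with URL keynames, which is the sense in which the two documents are equivalent.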

It’s still quite noisy to include term mappings in the context, so JSON-LD includes the ability to give the URL of a context JSON instead.

{
  "@context": "https://json-ld.org/contexts/person.jsonld",
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}

This points to an object that contains the same information you’d embed. For instance, that person.jsonld might look like this:

{
   "@context": {
     "name": "http://xmlns.com/foaf/0.1/name",
     "homepage": "http://xmlns.com/foaf/0.1/homepage"
   }
}

With URL contexts, we’re finally in a usable place! And this is likely the form you’re most familiar with as a consumer of JSON-LD, because it’s the form most projects use.

Multiple contexts may be combined using an array, which is processed in order.

{
  "@context": [
    "https://json-ld.org/contexts/person.jsonld",
    "https://json-ld.org/contexts/place.jsonld",
    {"title": "http://purl.org/dc/terms/title"}
  ],
  "homepage": "http://manu.sporny.org/",
  "title": "The Empire State Building",
  "geo": {
    "latitude": "40.75",
    "longitude": "73.98"
  }
}

The contexts merge together with last-defined-winning. In that example, "homepage" comes from person.jsonld, "geo" comes from place.jsonld, and "title" is defined inline.
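Here is a small Python sketch of that last-defined-wins merge. The person and place dictionaries are hypothetical stand-ins for whatever the two remote context files actually contain (I have not copied their real contents), and merge_contexts is an illustrative helper. A real processor also has to fetch the URLs and deal with extra rules that I am skipping.

```python
def merge_contexts(contexts):
    """Merge term mappings in array order; later definitions override earlier ones."""
    merged = {}
    for ctx in contexts:
        merged.update(ctx)
    return merged


# Hypothetical contents, assumed for this example only.
person = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}
place = {
    "name": "http://schema.org/name",
    "geo": "http://schema.org/geo",
}
inline = {"title": "http://purl.org/dc/terms/title"}

merged = merge_contexts([person, place, inline])
print(merged["homepage"])  # only person defines it
print(merged["name"])      # both define it, so place (the later one) wins
```

In this made-up setup both remote contexts define "name", so the one listed later silently replaces the earlier one.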

If you’re using multiple contexts, you might run into a key-name conflict. The last one always wins, so we need a way to resolve conflicts. Instead of being forced to use the URL keyname, you can use a compact form via prefixes:

{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "foaf:name": "Dave Longley"
}

A prefix gets spliced in directly, so foaf:name becomes http://xmlns.com/foaf/0.1/name in this case.

Alternatively the @vocab keyword lets you specify a default prefix for keynames that aren’t mapped explicitly in the context.

{
  "@context": {
    "@vocab": "http://xmlns.com/foaf/0.1/"
  },
  "name": "Brew Eats"
}

This maps the same way as the prior example, but when using the @vocab you don’t have to use the colon-prefix.
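Putting terms, prefixes, and @vocab side by side, key resolution can be sketched roughly like this in Python. expand_key is a hypothetical helper and the lookup order is my simplified reading of the behavior described above. It assumes every context value is a plain string.

```python
def expand_key(key, context):
    """Resolve a key to a full URL using terms, prefixes, then @vocab.

    Returns None when nothing in the context applies.
    """
    if key.startswith("@"):
        return key  # keywords such as @id are left alone
    if key in context:
        return context[key]  # exact term match
    prefix, sep, suffix = key.partition(":")
    if sep:
        if suffix.startswith("//"):
            return key  # looks like a full URL already
        if prefix in context:
            return context[prefix] + suffix  # splice the prefix in
        return key  # unknown scheme or prefix: leave it untouched
    if "@vocab" in context:
        return context["@vocab"] + key  # default prefix for unmapped keys
    return None


prefix_ctx = {"foaf": "http://xmlns.com/foaf/0.1/"}
vocab_ctx = {"@vocab": "http://xmlns.com/foaf/0.1/"}

print(expand_key("foaf:name", prefix_ctx))
print(expand_key("name", vocab_ctx))
```

Both calls land on the same URL, which is why the @vocab example maps the same way as the prefix example.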

Node IDs

Nodes in the graph usually have a URL¹ ID. If they don’t, they’re called a “blank node.”

We use the @id keyword to set the URL ID of a node:

{
  "@context": {
    "name": "http://schema.org/name"
  },
  "@id": "http://me.markus-lanthaler.com/",
  "name": "Markus Lanthaler"
}

Not sure there’s more to say than that.

Note: The context is not a type…

A misconception that I held for a long time was that the @context field represented the “type” of the JSON document.

It does not. A context provides definitions for the edges and nodes that your document’s key/values represent.

…The @type is a type

Nodes and values in JSON-LD can have a type which is established with the @type keyword. Types are, unsurprisingly, URLs¹.

Giving a node a type:

{
  "@context": {...},
  "@id": "http://me.markus-lanthaler.com/",
  "@type": "http://schema.org/Person",
  ...
}

Giving a value a type:

{
  "@context": {
    "modified": {
      "@id": "http://purl.org/dc/terms/modified",
      "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
    }
  },
  "modified": "2010-05-29T14:17:39+02:00"
}

You can also use terms in your context as a type:

{
  "@context": {
    "Person": "http://schema.org/Person"
  },
  "@id": "http://example.org/places#BrewEats",
  "@type": "Person"
}

Additional type information on attributes

We’ve covered how to expand our keynames to full URLs — or, more accurately, to expand the attributes of our graph edges to IRIs — and to give nodes and values a type.

JSON-LD has a few keywords for further refining the type-interpretation:

@id Indicates the value is a URL¹ reference to a node.
@list Indicates the value is an ordered list.
@set Indicates the value is an unordered set.
@nest Indicates you ignore the containing object as if its properties exist outside of it⁴.
@json Indicates the value is a regular JSON object and shouldn’t get processed as JSON-LD. The spec warns against this.

To use this type information, you put the value in an object and use the intended keyword as the key.

{
  "@context": {
    "homepage": "http://xmlns.com/foaf/0.1/homepage"
  },
  "homepage": { "@id": "http://manu.sporny.org/" }
}

That inline usage is a bit much, so you can use the context to establish the same information:

{
  "@context": {
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/homepage",
      "@type": "@id"
    }
  },
  "homepage": "http://manu.sporny.org/"
}

This varies by the keyword. For set and list, you use @container instead of @type, and for @nest you just set the term mapping to the keyword.

{
  "@context": {
    "nestedItem": "@nest",
    "listItem":  {"@id": "...", "@container": "@list"},
    "setItem":   {"@id": "...", "@container": "@set"},
    "iriItem":   {"@id": "...", "@type": "@id"},
    "jsonItem":  {"@id": "...", "@type": "@json"}
  },
  ...
}
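To show what the @type entries in a context do to a plain string value, here is a small Python sketch. coerce_value is a hypothetical helper, and it only covers the @type cases. The @container keywords (@list, @set) and @nest are left out.

```python
def coerce_value(term, value, context):
    """Wrap a plain value according to the @type declared for its term."""
    definition = context.get(term)
    if isinstance(definition, dict):
        declared = definition.get("@type")
        if declared == "@id":
            return {"@id": value}  # the string is a reference to a node
        if declared is not None:
            return {"@value": value, "@type": declared}  # a typed value
    return {"@value": value}  # no type information: a plain value


context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
        "@id": "http://xmlns.com/foaf/0.1/homepage",
        "@type": "@id",
    },
    "modified": {
        "@id": "http://purl.org/dc/terms/modified",
        "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
    },
}

print(coerce_value("homepage", "http://manu.sporny.org/", context))
print(coerce_value("modified", "2010-05-29T14:17:39+02:00", context))
print(coerce_value("name", "Manu Sporny", context))
```

The context lets the document keep simple string values while the type information lives in one place.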

How do sub-objects work?

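If a keyname’s value is an object, it’s considered an embedded node. (The exception is an object that wraps a value in a keyword such as @list or @set, as described above.)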

{
  "@context": {
    "@vocab": "http://xmlns.com/foaf/0.1/"
  },
  "@type": "Person",
  "name": "Manu Sporny",
  "knows": {
    "@id": "https://greggkellogg.net/foaf#me",
    "@type": "Person",
    "name": "Gregg Kellogg"
  }
}

In the example above we’re describing two nodes in the document, Manu and Gregg, and establishing a “knows” edge between them.
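A rough Python sketch of how that nested document unfolds into edges between two nodes. The walk helper and the "_:b0" style labels for nodes without an @id are illustrative choices of mine, and the keys are left unexpanded to keep it short.

```python
import itertools


def walk(node, edges, counter):
    """Collect (subject, edge, object) triples and return this node's ID."""
    # Nodes without an @id are blank nodes, so give them a made-up label.
    subject = node.get("@id") or f"_:b{next(counter)}"
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if isinstance(value, dict):
            # An embedded node: recurse, then link to it by its ID.
            edges.append((subject, key, walk(value, edges, counter)))
        else:
            edges.append((subject, key, value))
    return subject


doc = {
    "@context": {"@vocab": "http://xmlns.com/foaf/0.1/"},
    "@type": "Person",
    "name": "Manu Sporny",
    "knows": {
        "@id": "https://greggkellogg.net/foaf#me",
        "@type": "Person",
        "name": "Gregg Kellogg",
    },
}

edges = []
root = walk(doc, edges, itertools.count())
for edge in edges:
    print(edge)
```

Manu has no @id in the document, so he ends up as a blank node, while Gregg is identified by his URL.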

The “expanded form”

The JSON-LD we’ve been looking at is the “compact form,” which means it’s the representation that’s most convenient to read — both by humans and by consuming APIs.

Remember the internal data model, the graph db? We can convert a JSON-LD document into that internal model, which we call the “expanded form.”

Here’s a compact form:

{
   "@context": {
      "name": "http://xmlns.com/foaf/0.1/name",
      "homepage": {
        "@id": "http://xmlns.com/foaf/0.1/homepage",
        "@type": "@id"
      }
   },
   "name": "Manu Sporny",
   "homepage": "http://manu.sporny.org/"
}

And here’s its expanded form:

[
  {
    "http://xmlns.com/foaf/0.1/name": [
      { "@value": "Manu Sporny" }
    ],
    "http://xmlns.com/foaf/0.1/homepage": [
      { "@id": "http://manu.sporny.org/" }
    ]
  }
]

A few things to notice about this form:

It’s universally consistent. The keynames have been transformed to their full URLs, and the values are always objects with the keyword-based keynames.
The “context” has been dropped. All of the information has been preserved except for the shortened “term” mappings.

Because the expanded form is consistent, code that’s processing JSON-LD from many sources may want to convert to the expanded form before operating on it.

The expanded form isn’t exactly pretty. However, if you’re looking to normalize JSON-LD into a form you can work with, you can convert the input doc into the expanded form and then convert it back to the compact form using a context that you provide. This 2-phase transformation will maintain all of the information while helping to produce a predictable object-form.
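Here is a toy Python sketch of that 2-phase transformation. The expand and compact functions below are simplified stand-ins I wrote for flat documents only, not the real algorithms, and the fullName and website terms in my_context are hypothetical names chosen for the example. In practice you would hand this job to an actual JSON-LD processor.

```python
def _normalize(definition):
    """Treat a plain URL string as shorthand for {"@id": url}."""
    return {"@id": definition} if isinstance(definition, str) else definition


def _compact_item(item, definition):
    """Unwrap one expanded value when the target context allows it."""
    if "@value" in item:
        return item["@value"]
    if definition.get("@type") == "@id":
        return item["@id"]
    return item  # keep the {"@id": ...} wrapper if the term is not typed


def expand(doc):
    """Phase 1: URL keynames, wrapped values, no context."""
    context = doc.get("@context", {})
    node = {}
    for term, value in doc.items():
        if term == "@context" or term not in context:
            continue
        definition = _normalize(context[term])
        wrapper = "@id" if definition.get("@type") == "@id" else "@value"
        node.setdefault(definition["@id"], []).append({wrapper: value})
    return [node]


def compact(expanded, context):
    """Phase 2: shorten the keys again using a context we choose."""
    by_url = {}
    for term, definition in context.items():
        definition = _normalize(definition)
        by_url[definition["@id"]] = (term, definition)
    result = {"@context": context}
    for url, items in expanded[0].items():
        term, definition = by_url.get(url, (url, {}))
        values = [_compact_item(item, definition) for item in items]
        result[term] = values[0] if len(values) == 1 else values
    return result


incoming = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"},
    },
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
}

# Hypothetical context with the key names my own application prefers.
my_context = {
    "fullName": "http://xmlns.com/foaf/0.1/name",
    "website": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"},
}

normalized = compact(expand(incoming), my_context)
print(normalized)
```

Whatever short names the sender used, the result comes out with the key names from my_context, because both contexts point at the same URLs.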

Wrapping up

There are a few advanced features that I left out of these notes. This includes things like @graph and @index and localization tools. You can refer to the spec to learn more about them.

If JSON is an exchange format, then JSON-LD is a semantic exchange format. It helps applications map their data to specific global schemas. It has no opinion on the interpretation of those types — that’s up to the applications defining the schemas.

The underlying data model of JSON-LD is a graph. I imagine you could feed the “expanded form” directly into a graph database, but the more common usage is to transform it into the “compact form” that matches your application best.

Hopefully these notes will help you wrap your head around JSON-LD a bit more. They certainly helped me.

— Paul

— — —

¹ Technically JSON-LD uses IRIs, but I’m writing for people who don’t get excited about differentiating between IRIs and URLs.

² Pain In The Conciseness

³ We’re building our way to the complete JSON-LD behavior so the example here is not fully correct.

⁴ So, @nest. Remember that all properties in a JSON-LD document represent an edge, so if you wanted to put a bunch of items in a sub-object then the keyname of that sub-object is actually creating an edge. The @nest keyword basically tells JSON-LD to ignore the keyname of the sub-object and not create an edge for it. See the spec for more info.