
Introduction to Structured Data

20th June 2016

Real text on a webpage is now more important than ever before. A few years ago designers tried to overcome the limitations of the web by replacing ‘boring’ text with images. Aesthetically it was great, giving designers full control over every aspect of typography, but it made it impossible for machines to understand the content of the page. It didn’t take long for the obvious shortcomings to become clear, and today the practice seems like a distant memory. In today’s web, structured data adds an extra layer of compatibility between human and machine.

There are four widely recognised structured data formats:

  1. Microformats
  2. Microdata
  3. RDFa
  4. JSON-LD

Microformats

Microformats use class names to mark up key content. Put a class of “vevent” on an element and it becomes an event. There are, of course, specific parts you need to make something an event: at a minimum a name (the “summary”) and a start date, and usually a location as well.

Therefore, marking up an event could look like this:

<div class="vevent">
  <h1 class="fn summary">Party at my house</h1>
  <p><abbr class="dtstart" title="2016-01-12"><time>12 January 2016</time></abbr><abbr class="duration" title="P1D"></abbr></p>
  <p class="location">Belfast</p>
</div>
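To show why class names are enough for a machine to find the key content, here is a minimal Python sketch that reads the classic properties out of a single vevent using only the standard library's html.parser. VEventParser is a hypothetical, deliberately simplified helper (one event per page, no nested microformats, no value-class pattern), not a real Microformats library. The snippet uses P1D, the ISO 8601 / iCalendar form hCalendar expects for a one-day duration.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"meta", "link", "br", "img", "hr", "input"}


class VEventParser(HTMLParser):
    """Hypothetical, simplified reader for one classic hCalendar "vevent".

    Not a full Microformats parser: it ignores nested microformats,
    the value-class pattern and pages with more than one event.
    """

    PROPERTIES = {"summary", "dtstart", "duration", "location"}

    def __init__(self):
        super().__init__()
        self.event = {}
        self.in_event = False
        self.stack = []  # one entry per open element: a property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            self.in_event = True
        prop = None
        if self.in_event:
            prop = next((c for c in classes if c in self.PROPERTIES), None)
            # The abbr pattern: the machine-readable value lives in "title".
            if prop and tag == "abbr" and "title" in attrs:
                self.event[prop] = attrs["title"]
                prop = None
        self.stack.append(prop)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element that is a property.
        for prop in reversed(self.stack):
            if prop:
                self.event[prop] = (self.event.get(prop, "") + data).strip()
                break


snippet = """
<div class="vevent">
  <h1 class="fn summary">Party at my house</h1>
  <p><abbr class="dtstart" title="2016-01-12"><time>12 January 2016</time></abbr><abbr class="duration" title="P1D"></abbr></p>
  <p class="location">Belfast</p>
</div>
"""

parser = VEventParser()
parser.feed(snippet)
parser.close()
print(parser.event)
# {'summary': 'Party at my house', 'dtstart': '2016-01-12', 'duration': 'P1D', 'location': 'Belfast'}
```

Note how the human-readable “12 January 2016” is ignored: the machine takes the date from the abbr title instead.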

Microdata

Microdata was one of the initiatives that came with HTML5: a standardised, internationally recognised system that avoids the problems caused by Microformats. It doesn’t rely on class names, and it tends to be easier to construct and debug.

Microdata extends HTML and can use any vocabulary. One of the most widely recognised is Schema.org, in which virtually anything can be marked up with generally accepted terms. I say “generally” because several vocabularies have been used and suggested over the years. Microdata can use any of them, but the Schema.org vocabulary has emerged as the one now widely accepted by the major search engines.

Schema.org is an ever-growing vocabulary that aims to make sense of content and the relationships between pieces of content, even more so than Microformats.

Microdata uses four HTML attributes to define its structures:

  1. itemscope
    Marks out the limits of a particular item. Everything related to, or needed for, this structure must be contained within the element that carries this attribute.
  2. itemtype
    Defines the type of structure you’re going for, including the vocabulary used, e.g. “http://schema.org/Event” or “http://schema.org/WebPage”.
  3. itemprop
    This is a property of the structure. Its value can either be a piece of data or another nested structure.
  4. content
    If the data is not part of the visible text, it can be supplied in a content attribute on a meta element.

The best way to explain this is with an example. Let’s go back to our Event.

<div itemscope itemtype="http://schema.org/Event">
  <h1 itemprop="name">Party at my house</h1>
  <p><time itemprop="startDate" datetime="2016-01-12">12 January 2016</time><meta itemprop="duration" content="P1D" /></p>
  <p itemprop="location" itemscope itemtype="http://schema.org/Place">
    <span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
      <span itemprop="addressLocality">Belfast</span>
    </span>
  </p>
</div>
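The way itemscope, itemtype and itemprop nest can be sketched in a few lines of Python. MicrodataParser below is a hypothetical, simplified helper built on the standard library's html.parser, not a complete implementation of the Microdata specification: every itemscope opens a new item, and each itemprop inside it becomes a key whose value is either a nested item, a content or datetime attribute, or the element’s text. The snippet uses P1D, the ISO 8601 form Schema.org expects for durations.

```python
import json
from html.parser import HTMLParser

VOID_ELEMENTS = {"meta", "link", "br", "img", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Simplified sketch of how itemscope/itemtype/itemprop nest into data.

    Ignores itemid and itemref, keeps only the last value of a repeated
    property, and reads values only from content/datetime or element text.
    """

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found on the page
        self.scopes = []  # items whose itemscope element is still open
        self.stack = []   # per open element: (opened_scope, text_property)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        entry = (False, None)
        if "itemscope" in a and tag not in VOID_ELEMENTS:
            item = {"@type": a.get("itemtype")}
            if prop and self.scopes:
                self.scopes[-1][prop] = item  # a nested item is the property's value
            else:
                self.items.append(item)
            self.scopes.append(item)
            entry = (True, None)
        elif prop and self.scopes:
            if "content" in a:      # e.g. <meta itemprop="duration" content="P1D">
                self.scopes[-1][prop] = a["content"]
            elif "datetime" in a:   # e.g. <time itemprop="startDate" datetime="...">
                self.scopes[-1][prop] = a["datetime"]
            else:                   # otherwise the value is the element's text
                self.scopes[-1][prop] = ""
                entry = (False, prop)
        if tag not in VOID_ELEMENTS:
            self.stack.append(entry)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        opened_scope, text_prop = self.stack.pop()
        if opened_scope:
            self.scopes.pop()
        elif text_prop:
            self.scopes[-1][text_prop] = self.scopes[-1][text_prop].strip()

    def handle_data(self, data):
        # Text goes to the innermost text property, unless an itemscope
        # boundary is reached first.
        for opened_scope, text_prop in reversed(self.stack):
            if opened_scope:
                break
            if text_prop:
                self.scopes[-1][text_prop] += data
                break


snippet = """
<div itemscope itemtype="http://schema.org/Event">
  <h1 itemprop="name">Party at my house</h1>
  <p><time itemprop="startDate" datetime="2016-01-12">12 January 2016</time><meta itemprop="duration" content="P1D" /></p>
  <p itemprop="location" itemscope itemtype="http://schema.org/Place">
    <span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
      <span itemprop="addressLocality">Belfast</span>
    </span>
  </p>
</div>
"""

parser = MicrodataParser()
parser.feed(snippet)
parser.close()
print(json.dumps(parser.items, indent=2))
```

The printed structure mirrors the nesting of the markup: the Event contains a Place, which contains a PostalAddress. That is essentially the same shape as the JSON-LD version of this event.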

RDFa

RDFa is a W3C Recommendation that extends HTML with attributes. Over its life it has been used with several different vocabularies, including Dublin Core as well as Schema.org. It marks up HTML in a similar way to Microdata, although it has no direct equivalent of itemscope: an element with a typeof attribute starts a new structure on its own.

  1. vocab
    This defines the vocabulary the structured data will use. If your HTML document uses only one vocabulary, it can be set once on the body, or even the html element, of your site. For example:
    <html vocab="http://schema.org/">
  2. typeof
    Similar to itemtype, though it only needs the name of the type, because the vocabulary comes from vocab.
    <div typeof="Event">
  3. property
    Identical to itemprop
  4. content
    Identical to content in Microdata
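RDFa expands each term in typeof and property against the vocab in effect, and under the RDFa 1.1 Core rules the full IRI is simply the vocab value followed by the term, with no separator added. That is why the vocab value should end in a slash. The hypothetical RdfaTermLister below is a minimal Python sketch of just that expansion step, assuming plain terms (no prefixes or full IRIs) and a vocab inherited from ancestor elements; it is not a full RDFa processor.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"meta", "link", "br", "img", "hr", "input"}


class RdfaTermLister(HTMLParser):
    """Lists the full IRIs that typeof/property terms expand to."""

    def __init__(self):
        super().__init__()
        self.vocab_stack = [None]  # vocab in effect for each open element
        self.iris = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        vocab = a.get("vocab", self.vocab_stack[-1])  # inherited unless overridden
        for attr in ("typeof", "property"):
            for term in (a.get(attr) or "").split():
                if vocab is not None and ":" not in term:
                    # Plain string concatenation: no "/" is inserted for you.
                    self.iris.append(vocab + term)
        if tag not in VOID_ELEMENTS:
            self.vocab_stack.append(vocab)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.vocab_stack) > 1:
            self.vocab_stack.pop()


def list_iris(markup: str) -> list:
    parser = RdfaTermLister()
    parser.feed(markup)
    parser.close()
    return parser.iris


print(list_iris('<div vocab="http://schema.org/" typeof="Event"><h1 property="name">Party at my house</h1></div>'))
# ['http://schema.org/Event', 'http://schema.org/name']
print(list_iris('<div vocab="http://schema.org" typeof="Event"><h1 property="name">Party at my house</h1></div>'))
# ['http://schema.orgEvent', 'http://schema.orgname']
```

The second call shows what goes wrong when the trailing slash is missing: the type no longer points at Schema.org’s Event.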

Again, let’s take our example and mark it up with a Schema.org structure, but in RDFa format this time:

<div vocab="http://schema.org/" typeof="Event">
  <h1 property="name">Party at my house</h1>
  <p><time property="startDate" datetime="2016-01-12">12 January 2016</time><meta property="duration" content="P1D" /></p>
  <p property="location" typeof="Place">
    <span property="address" typeof="PostalAddress">
      <span property="addressLocality">Belfast</span>
    </span>
  </p>
</div>

JSON-LD

JSON-LD (JSON for Linked Data) separates the structured data completely from the HTML, describing it as name and value pairs in a dedicated script block. From the coder’s perspective, this is much easier to create and maintain, since it’s just a dedicated block of code, and it is now widely supported by Google and other search engines. However, it is important that the details in the JSON block match what’s on the page. If they don’t, you risk being penalised.

Let’s see how our Event looks in JSON-LD format.

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Party at my house",
  "startDate": "2016-01-12",
  "duration": "P1D",
  "location": {
    "@type": "Place",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Belfast"
    }
  }
}
</script>
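The warning that the JSON block must match the visible HTML can be checked, at least crudely, in code. The Python sketch below is an illustration under stated assumptions, not how any search engine actually does it: the hypothetical missing_from_page function parses each application/ld+json script with json.loads, collects the page’s visible text, and reports selected string values that never appear in that text. Dates are left out of the check because the visible text usually formats them differently (12 January 2016 versus 2016-01-12).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Separates JSON-LD blocks from the text a visitor would actually see."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # parsed JSON-LD objects
        self.visible = []   # text outside <script> and <style>
        self._mode = None   # None, "jsonld" or "hidden"
        self._buffer = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._mode, self._buffer = "jsonld", ""
        elif tag in ("script", "style"):
            self._mode = "hidden"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._mode == "jsonld":
                self.blocks.append(json.loads(self._buffer))
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buffer += data
        elif self._mode is None:
            self.visible.append(data)


def string_values(node, keys):
    """Yield (key, value) for string values stored under `keys`, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in keys and isinstance(value, str):
                yield key, value
            else:
                yield from string_values(value, keys)
    elif isinstance(node, list):
        for value in node:
            yield from string_values(value, keys)


def missing_from_page(page, keys=("name", "addressLocality")):
    """Return JSON-LD values (for the chosen keys) absent from the visible text."""
    parser = JsonLdExtractor()
    parser.feed(page)
    parser.close()
    visible_text = " ".join(parser.visible)  # naive substring check
    return [
        (key, value)
        for block in parser.blocks
        for key, value in string_values(block, keys)
        if value not in visible_text
    ]


page = """
<h1>Party at my house</h1>
<p>12 January 2016, Belfast</p>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Event", "name": "Party at my house",
 "startDate": "2016-01-12",
 "location": {"@type": "Place",
              "address": {"@type": "PostalAddress", "addressLocality": "Belfast"}}}
</script>
"""

print(missing_from_page(page))                                       # []
print(missing_from_page(page.replace("Belfast</p>", "Dublin</p>")))  # [('addressLocality', 'Belfast')]
```

In the second call the page text was changed to Dublin but the JSON-LD still says Belfast, which is exactly the kind of mismatch to avoid.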

It might be tempting to use this method every time, though it should probably only be used when adding structure directly to your HTML would impair its quality. Generally, though, well-written HTML should accommodate Microdata, RDFa and the like without much trouble.

So you might be wondering which one you should use. You have two choices: get behind one method and stick with it, or use a combination. A combination might be a better catch-all solution, but it inevitably results in slightly more bloated code. Even so, it’s probably better for your page to be packed with useful, readable and reusable content. Web design is communication, after all.

If you’re interested, this is what your Event would look like with all the bells and whistles: Microformats (both the classic version and the newer microformats2), Microdata and RDFa.

<div vocab="http://schema.org/" typeof="Event" itemscope itemtype="http://schema.org/Event" class="vevent h-event">
  <h1 property="name" itemprop="name" class="fn summary p-name">Party at my house</h1>
  <p><time property="startDate" itemprop="startDate" datetime="2016-01-12">12 January 2016</time><meta property="duration" itemprop="duration" content="P1D" /><abbr class="dtstart dt-start" title="2016-01-12"></abbr><abbr class="duration dt-duration" title="P1D"></abbr></p>
  <p property="location" itemprop="location" typeof="Place" itemscope itemtype="http://schema.org/Place" class="location p-location">
    <span class="vcard h-card">
      <span property="address" itemprop="address" typeof="PostalAddress" itemscope itemtype="http://schema.org/PostalAddress" class="adr h-adr">
        <span property="addressLocality" itemprop="addressLocality" class="locality p-locality">Belfast</span>
      </span>
    </span>
  </p>
</div>

What it looks like

How does this appear in Google’s results? Here is an example of an Event in search results:

Google structured data example image

Sometimes Google inserts beefier, more informative boxes into its results, which really focus the user on key information. This one is for Tin House Coffee:

Google information box for Tin House Coffee

This is not only useful for Google. Structured data can be extracted and reused: small Chrome extensions, for instance, can take content on a page and turn it into things like contact entries for your records:

Chrome contact extension
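As a rough illustration of that kind of reuse, here is a Python sketch that turns a Schema.org-style object (the shape you get from parsing JSON-LD, or from a Microdata or RDFa parser) into a vCard 4.0 contact entry. The to_vcard function and the sample business are hypothetical and not taken from any particular extension; the sketch assumes address is a PostalAddress whose fields are plain strings, and it does not fold long lines.

```python
def escape(value: str) -> str:
    """Escape characters that have a special meaning in vCard text values."""
    return (value.replace("\\", "\\\\")
                 .replace(",", "\\,")
                 .replace(";", "\\;")
                 .replace("\n", "\\n"))


def to_vcard(item: dict) -> str:
    """Turn a schema.org-style dict into a vCard 4.0 entry (CRLF line endings)."""
    lines = ["BEGIN:VCARD", "VERSION:4.0", "FN:" + escape(item["name"])]
    if "telephone" in item:
        lines.append("TEL:" + escape(item["telephone"]))
    address = item.get("address")
    if isinstance(address, dict):
        # ADR components, in order: PO box; extended address; street;
        # locality; region; postal code; country.
        parts = [
            "",
            "",
            address.get("streetAddress", ""),
            address.get("addressLocality", ""),
            address.get("addressRegion", ""),
            address.get("postalCode", ""),
            address.get("addressCountry", ""),
        ]
        lines.append("ADR:" + ";".join(escape(part) for part in parts))
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


# Hypothetical business data, shaped like the JSON-LD examples in this post.
shop = {
    "@type": "LocalBusiness",
    "name": "Example Coffee Shop",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Belfast",
        "addressCountry": "GB",
    },
}

print(to_vcard(shop))  # includes the line ADR:;;;Belfast;;;GB
```

Because the data is already labelled (name, addressLocality and so on), the conversion is a matter of mapping fields rather than guessing at what the page text means.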

Organic v Structured

Does this structured data allow too much separation from the human-readable HTML? How much does it help your site rank in search engines?

Google is a major advocate of structured data. It helps Google create useful results, bringing actual information onto its results pages before anyone even clicks the link to your site. If you rely on people being “on” your site, this might not be as good for you as it is for Google. If, however, you are selling something, or simply using your website to get information “out”, this could be a very useful weapon in your arsenal.

There are some things you need to be careful of. There might be a temptation to put different content in your structured markup than in your human-readable content. The search engines are aware of this and work hard to make sure your structured data complements your webpages. They don’t want you to game the system; they want you to help them to help you, so being honest with your content really pays off. Don’t focus all your attention on structured data whilst ignoring or underbaking your page designs. The two should go hand in hand.

Structured data is a useful tool. If you arm your webpages with it in a way that is logical, sensible and designed with the end user in mind, you should be well on your way to creating more “three-dimensional” websites.
