Understand Schema.org – What is it

Schema.org is a shared markup vocabulary that makes it easier for webmasters to mark up HTML pages in ways recognized by major search providers. Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right Web pages. The vocabulary can be expressed in different syntaxes, such as microdata, RDFa or JSON-LD.

This post is the first in a series that aims to help you understand schema.org using common scenarios. The examples of structured data markup use a Home and Construction business to keep things concrete.
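To give a feel for what the markup looks like, here is a minimal Python sketch that builds a JSON-LD description of a Home and Construction business and wraps it in the script element used to embed JSON-LD in a page. The type and property names (HomeAndConstructionBusiness, PostalAddress, name, telephone and so on) come from the schema.org vocabulary. The business details and the two helper functions are made up for illustration, so treat this as one possible approach rather than the required way to publish markup.

```python
import json


def business_jsonld(name, telephone, street, city, region, postal_code):
    """Build a schema.org description of a business as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "HomeAndConstructionBusiness",
        "name": name,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
    }


def to_script_tag(data):
    """Serialize the dict and wrap it in a JSON-LD script element."""
    # Escape "<" so a value cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical business details, for illustration only.
example = business_jsonld(
    name="Example Renovations",
    telephone="+1-555-0100",
    street="123 Main Street",
    city="Toronto",
    region="ON",
    postal_code="M5V 0A1",
)
print(to_script_tag(example))
```

The printed script element can be pasted into the page's HTML. The same facts could also be written as microdata or RDFa attributes on the visible HTML; JSON-LD is used here only because it is the easiest of the three to generate from code.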

Why use Schema.org

Easier for Publishers

While metadata has existed from the beginning of the HTML standard, schema.org is a single vocabulary and markup syntax that is supported by the major search engines. This means webmasters no longer have to make tradeoffs between syntaxes, nor do they have to implement multiple vocabularies. This saves time and makes the markup more effective.

Easier for Search Engines

The justification for schema.org is that it makes it easier for the search engines to understand and categorize your web content, which makes them better able to match your content to users’ queries. Consolidating on a single standard also means less work for them.

Rich Snippets

Rich snippets are extra lines of information shown to users in the search results. Rich snippets are created by the search engines for some types of structured data, such as Schema.org markup written in microdata or RDFa. The extra line is meant to help the user decide whether to visit that website. It could be a 5-star rating, a price or the dates of events. Google shows this extra information for approximately 10 different types of objects. Rich snippets are an informative visual cue that can increase click-through rates.
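The kinds of data mentioned above (a star rating, a price, an event date) could be marked up roughly as follows. This is a small Python sketch that builds two JSON-LD objects using schema.org's Product, AggregateRating, Offer and Event types. The product, the workshop and all of their values are hypothetical, and the helper function names are my own. Whether a search engine actually displays a rich snippet for a page is its decision, so markup like this makes a page eligible but does not guarantee anything.

```python
import json


def product_with_rating(name, rating_value, review_count, price, currency):
    """Describe a product with an average rating and a price."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }


def event_with_date(name, start_date, place_name):
    """Describe an event with a start date (ISO 8601) and a venue name."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,
        "location": {"@type": "Place", "name": place_name},
    }


# Hypothetical values, for illustration only.
rated = product_with_rating("Deck Refinishing Package", 4.8, 27, "1250.00", "CAD")
workshop = event_with_date(
    "Home Renovation Workshop", "2030-06-14T10:00", "Example Renovations Showroom"
)

print(json.dumps(rated, indent=2))
print(json.dumps(workshop, indent=2))
```

Each printed object would go into its own JSON-LD script element on the page that describes that product or event.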





User search clarity

Users get a better search experience in two ways. First, because the search engines have a better idea of the content they’re supplying to users, the matches are better. Second, rich snippets provide a quick glimpse of the content and can save users time. Together, these make Schema.org a real benefit to users.

Challenges with Schema.org

  • Websites built with AJAX are difficult for crawlers. If your content comes from a web service and is loaded as the page loads, you have to make a significant effort to create HTML snapshots. The snapshots are provided only for the search crawlers, and this is only feasible for medium to large companies that have developers who can spend a few weeks on it.
  • Publishers do the hard work of supplying the additional metadata, and its main consumers, the search engines, do not share the data they’ve collected with any outsiders.
  • Publishers risk supplying such good data that the search engines may answer user queries directly, without sending the user to the publisher’s website or attributing the answer.
  • Schema.org changes (mostly extensions, but some things become deprecated), and publishers will need to update their markup from time to time.
  • Schema.org was developed by Google, Yahoo!, Bing and Yandex and the wider community was informed after the fact. The ongoing development of the standard is still a product of these companies. However, there is a formal request process so you can influence the direction of extensions if you have broad industry support.
  • The biggest benefit today is rich snippets and only some of the entity types generate rich snippets.
  • Schema.org has some idiosyncrasies:
    • A person (alive, dead, undead, or fictional). What is undead but not fictional?
    • A review can review a Review.

Check back soon for the next post in this Schema.org series.

Mark van Berkel is Founder and President of Hunch Manifest Inc. While managing business operations he also leads the team in designing semantic technology to provide personalized online presence and reputation management services. Prior to forming the company, he was a consultant on enterprise software projects for companies including Panasonic, Shell, and General Electric and was an Architect for a world-leading human capital management software-as-a-service. Mark holds a Bachelor of Information Systems from StFX University and did his graduate studies at the University of Toronto, earning an MEng in Industrial Information Engineering and an MBA in Strategy and Innovation from the Rotman School of Management. In 2006 he published a 170-page report as a researcher at the Semantic Technologies Lab at the University of Toronto and built a semantic technology prototype for SAP Research Labs. Connect with Mark on LinkedIn or Twitter. Mark is also certified in Google Analytics.