HTML 5 Species Taxon Microdata Using Darwin Core

I just added HTML 5 species taxon microdata to worldspecies.org. This provides machine-readable data for certain properties of a species, i.e. taxonomy, common names, and synonyms. The data is embedded in the HTML and tagged as property name/value pairs.

An example of the extracted data can be seen using Google's Rich Snippets Testing Tool.

Overview

HTML 5 introduces microdata: content tagged so that it is machine readable using a specific vocabulary. The content is grouped into items that have properties. The type and properties of an item are governed by the vocabulary. Items and properties can be attached to existing HTML tags in the content, or new tags can be created. The <div> tag is the most common for an item, and the <span> tag is the most common for a property. If the property value should not be displayed to the browser user, the <meta> tag can be used with a content attribute.

Rather than invent my own vocabulary, I went searching for an existing vocabulary. The vocabularies that major search engines support can be found at schema.org. They do not currently have one that supports species taxonomy.

Further searching yielded the Darwin Core at Biodiversity Information Standards - TDWG. This is a vocabulary used for the exchange of species information. The Taxon class is perfect for representing species taxonomy information.

I added two additional properties to the Taxon class: superFamily and synonym. SuperFamily is used by the ITIS Catalog of Life in their taxonomy. This is where worldspecies.org gets its taxonomy from. Synonyms are represented in the Darwin Core as another Taxon item instead of as a property of the official species item.

Vocabulary

Item Type: http://rs.tdwg.org/dwc/terms/Taxon
Properties

  • kingdom: Kingdom
  • phylum: Phylum
  • class: Class
  • order: Order
  • http://globalspecies.org/terms/superFamily: Super family
  • family: Family
  • genus: Genus
  • specificEpithet: Species part of scientific name
  • infraspecificEpithet: Infraspecies part of scientific name
  • scientificName: Full species name or higher taxon name, e.g. Puma concolor or Chordata
  • taxonRank: One of kingdom, phylum, class, order, superfamily, family, genus, species, infraspecies
  • vernacularName: Common name. Can have multiple.
  • http://globalspecies.org/terms/synonym: Equivalent scientific name text or a http://rs.tdwg.org/dwc/terms/Taxon. Can have multiple.

Darwin Core has additional properties. The above properties are the ones used for worldspecies.org.
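
As a rough illustration, the property list above can be written down as data and used to sanity-check an extracted item. This is only a sketch in Python: the names ALLOWED_PROPERTIES, MULTI_VALUED, TAXON_RANKS and check_taxon are made up for this example, and treating every property other than vernacularName and synonym as single-valued is an assumption, not something Darwin Core states.

```python
# Property names copied from the list above; the two custom terms use full URLs.
SUPER_FAMILY = "http://globalspecies.org/terms/superFamily"
SYNONYM = "http://globalspecies.org/terms/synonym"

ALLOWED_PROPERTIES = {
    "kingdom", "phylum", "class", "order", SUPER_FAMILY, "family", "genus",
    "specificEpithet", "infraspecificEpithet", "scientificName",
    "taxonRank", "vernacularName", SYNONYM,
}

# Properties the list marks as "Can have multiple".
# Assumption: everything else appears at most once per item.
MULTI_VALUED = {"vernacularName", SYNONYM}

TAXON_RANKS = {
    "kingdom", "phylum", "class", "order", "superfamily",
    "family", "genus", "species", "infraspecies",
}


def check_taxon(properties):
    """Return a list of problems found in a {name: [values]} mapping."""
    problems = []
    for name, values in properties.items():
        if name not in ALLOWED_PROPERTIES:
            problems.append(f"unknown property: {name}")
        elif len(values) > 1 and name not in MULTI_VALUED:
            problems.append(f"multiple values for: {name}")
    for rank in properties.get("taxonRank", []):
        if rank not in TAXON_RANKS:
            problems.append(f"unknown taxonRank: {rank}")
    return problems
```

An empty list means the item only uses properties from the list and a recognised taxonRank.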

Example 1 (Species Puma concolor)

<div itemscope="1" itemtype="http://rs.tdwg.org/dwc/terms/Taxon">
<meta itemprop='kingdom' content='Animalia' />
<meta itemprop='phylum' content='Chordata' />
<meta itemprop='class' content='Mammalia' />
<meta itemprop='order' content='Carnivora' />
<meta itemprop='family' content='Felidae' />
<meta itemprop='genus' content='Puma' />
<meta itemprop='specificEpithet' content='concolor' />
<meta itemprop='taxonRank' content='species' />
<h1 itemprop='scientificName'>Puma concolor</h1>
<meta itemprop='vernacularName' content='Cougar' />
<meta itemprop='vernacularName' content='Puma' />
<meta itemprop='vernacularName' content='Mountain lion' />
<h2>Synonyms</h2>
<ul>
<li itemprop='http://globalspecies.org/terms/synonym'>Felis concolor</li>
</ul>
</div>

itemscope="1" defines the start of an item.
itemtype="http://rs.tdwg.org/dwc/terms/Taxon" defines the item type.
itemprop='scientificName' defines a property.
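
To see how a consumer might read these attributes back out, here is a minimal sketch using Python's standard html.parser module. It is not how any search engine or testing tool works; it assumes a single item per page with no nested items, and it only understands <meta> properties and properties whose value is the tag's text. TaxonExtractor, extract_properties and EXAMPLE are names invented for this example, and EXAMPLE is a trimmed copy of Example 1.

```python
from html.parser import HTMLParser


class TaxonExtractor(HTMLParser):
    """Collect itemprop name/value pairs from one non-nested item."""

    def __init__(self):
        super().__init__()
        self.properties = {}   # property name -> list of values
        self._current = None   # (tag, property name) while inside a text-valued property
        self._text = []        # text pieces seen inside the current property

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag == "meta":
            # <meta> carries its value in the content attribute.
            self.properties.setdefault(name, []).append(attrs.get("content") or "")
        else:
            # Any other tag here: the value is the text up to the closing tag.
            self._current = (tag, name)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            name = self._current[1]
            value = "".join(self._text).strip()
            self.properties.setdefault(name, []).append(value)
            self._current = None


def extract_properties(html):
    parser = TaxonExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties


EXAMPLE = """
<div itemscope="1" itemtype="http://rs.tdwg.org/dwc/terms/Taxon">
<meta itemprop='genus' content='Puma' />
<h1 itemprop='scientificName'>Puma concolor</h1>
<meta itemprop='vernacularName' content='Cougar' />
<meta itemprop='vernacularName' content='Puma' />
<ul>
<li itemprop='http://globalspecies.org/terms/synonym'>Felis concolor</li>
</ul>
</div>
"""

properties = extract_properties(EXAMPLE)
```

Repeated properties such as vernacularName end up as a list with one entry per tag, in document order.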

If you look at the Puma concolor page on worldspecies.org, you may wonder why <meta> tags are used for kingdom, vernacularName, etc. when the data is actually displayed on the page and the property could be added to the existing <a> tag or wrapped in a <span> tag.

The reason that the kingdom property was not added to the <a> tag is that the property value of an <a> tag is its href attribute, not the contents of the tag.

Ex. <a href="/ntaxa/109518">Animalia</a>

The microdata value for this property is '/ntaxa/109518' instead of 'Animalia'. Other tags that behave this way include <img>, whose value is its src attribute, and <object>, whose value is its data attribute. The vernacularName properties are not <span> tags because the PHP code that outputs them is shared by other code that does not need the microdata markup.
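
The rule can be sketched as a small lookup in Python. The table below is deliberately partial and covers only the tags mentioned in this post; property_value and VALUE_ATTRIBUTE are hypothetical names, and unlike a real microdata parser this sketch does not resolve relative URLs.

```python
# Attribute that supplies the property value for a few tags.
# Tags not listed here fall back to their text content.
VALUE_ATTRIBUTE = {
    "meta": "content",
    "a": "href",
    "img": "src",
    "object": "data",
}


def property_value(tag, attrs, text):
    """Return the microdata value for a tag with the given attributes and text."""
    attribute = VALUE_ATTRIBUTE.get(tag)
    if attribute is not None:
        return attrs.get(attribute, "")
    return text.strip()


# The link from the example above yields its href, not the visible text.
link_value = property_value("a", {"href": "/ntaxa/109518"}, "Animalia")
# A <span> around the same text yields the text itself.
span_value = property_value("span", {}, "Animalia")
```

This is why putting itemprop on the existing link would have published the URL path as the kingdom.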

Example 2 (Phylum Chordata)

<div itemscope="1" itemtype="http://rs.tdwg.org/dwc/terms/Taxon">
<meta itemprop='kingdom' content='Animalia' />
<meta itemprop='phylum' content='Chordata' />
<meta itemprop='taxonRank' content='phylum' />
<h1 itemprop='scientificName'>Chordata</h1>
Common Name: <span itemprop='vernacularName'>Vertebrates and tunicates</span>
</div>
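
Both examples follow the same pattern: hidden values go in <meta> tags and the displayed name goes in a visible tag. A hedged sketch of generating that markup from data, again in Python, might look like the following. render_taxon is a made-up helper for illustration, not the code worldspecies.org runs (which is PHP), and it only covers hidden properties plus the scientificName heading.

```python
from html import escape

TAXON_TYPE = "http://rs.tdwg.org/dwc/terms/Taxon"


def render_taxon(hidden, scientific_name):
    """Render hidden (name, value) pairs as <meta> tags and the name as an <h1>."""
    lines = [f'<div itemscope="1" itemtype="{TAXON_TYPE}">']
    for name, value in hidden:
        # escape() also escapes quotes, so values are safe inside attributes.
        lines.append(f'<meta itemprop="{escape(name)}" content="{escape(value)}" />')
    lines.append(f'<h1 itemprop="scientificName">{escape(scientific_name)}</h1>')
    lines.append("</div>")
    return "\n".join(lines)


chordata = render_taxon(
    [("kingdom", "Animalia"), ("phylum", "Chordata"), ("taxonRank", "phylum")],
    "Chordata",
)
```

Passing the pairs as a list rather than a dictionary keeps repeated properties such as vernacularName possible.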