All posts by bamyers99

Mysqldump to TSV Conversion Using Flex

First, a little background. I wanted to import an 8 GB Wikipedia table dump for a project I was working on. The table was too big to import in full on my Linode server, and I only needed certain rows and columns, so I tried parsing the dump file in PHP. This worked, but it was much too slow. I then went looking for text-processing programs, which turned up awk, grep, lex and sed. awk, grep and sed are all line-based processors, and they would not work here because the dump file has multiple database records per line. That left lex and its successor, flex.
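To make the "multiple records per line" problem concrete, here is a minimal Python sketch (not the flex scanner the post goes on to use). It walks one extended INSERT line character by character and emits each tuple as a TSV row. The function names `insert_rows` and `to_tsv` and the sample `page` table are hypothetical, and it assumes the dump escapes quotes as `\'` and never puts whitespace between values.

```python
def insert_rows(line):
    """Yield each (...) tuple of one extended INSERT line as a list of strings."""
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start < 0:
        return
    row, field = [], []
    in_row = in_quote = False
    i = start + len(" VALUES ")
    while i < len(line):
        ch = line[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(line):
                nxt = line[i + 1]
                # Unescape quotes; leave \n, \t, \\ escaped so the TSV stays one row per line.
                field.append(nxt if nxt in "'\"" else ch + nxt)
                i += 1
            elif ch == "'":
                in_quote = False
            else:
                field.append(ch)  # commas and parentheses inside quotes are data
        elif ch == "'":
            in_quote = True
        elif ch == "(" and not in_row:
            in_row = True
        elif ch == "," and in_row:
            row.append("".join(field))
            field = []
        elif ch == ")" and in_row:
            row.append("".join(field))
            yield row
            row, field, in_row = [], [], False
        elif in_row:
            field.append(ch)  # unquoted value such as a number or NULL
        i += 1


def to_tsv(line, columns):
    """Keep only the given column indexes and join them with tabs."""
    return ["\t".join(row[c] for c in columns) for row in insert_rows(line)]


if __name__ == "__main__":
    sample = "INSERT INTO `page` VALUES (1,0,'Main_Page'),(2,0,'It\\'s, (odd)'),(3,1,NULL);"
    for tsv_row in to_tsv(sample, [0, 2]):
        print(tsv_row)
```

A line-based tool would see the whole sample as a single record, whereas the state machine yields three. An interpreted loop like this is likely to hit the same speed problem as the PHP attempt on a multi-gigabyte dump, which is where a generated scanner such as flex comes in.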

HTML 5 Species Taxon Microdata Using Darwin Core

I just added HTML 5 species taxon microdata to worldspecies.org. This provides machine-readable data for certain properties of a species, namely taxonomy, common names and synonyms. The data is embedded in the HTML and tagged as property name/value pairs.

An example of the extracted data can be seen using Google's Rich Snippets Testing Tool.
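As a rough illustration of what "property name/value pairs" means for a consumer, the following Python sketch pulls `itemprop` pairs out of a small HTML fragment using the standard library's `html.parser`. The fragment is made up for this example: the `itemtype` URL is a placeholder and the markup is not copied from worldspecies.org; only the property names (`scientificName`, `genus`, `vernacularName`) are borrowed from Darwin Core terms.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real pages may be structured differently.
SAMPLE = """
<div itemscope itemtype="http://example.org/Taxon">
  <span itemprop="scientificName">Panthera leo</span>
  <span itemprop="genus">Panthera</span>
  <span itemprop="vernacularName">Lion</span>
</div>
"""


class ItempropParser(HTMLParser):
    """Collect (property, text) pairs from elements carrying an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # itemprop name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        self._current = dict(attrs).get("itemprop")

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.pairs.append((self._current, text))
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def extract_pairs(html):
    parser = ItempropParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    for name, value in extract_pairs(SAMPLE):
        print(f"{name}\t{value}")
```

This only handles the simplest case, where the value is the element's text. Full microdata extraction also has to deal with nested items and with values carried in attributes such as `href` or `content`.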