Mysqldump to TSV Conversion Using Flex

First a little background information. I wanted to import an 8 Gig Wikipedia table dump for a project that I was working on. The table was too big to import all of the records on my Linode server. I only needed certain rows and columns, so I tried parsing the dump file in PHP. This worked, but it was much too slow. So then I went looking for text processor programs. This turned up awk, grep, lex and sed. awk, grep and sed are all line based processors. These would not work because the dump file has multiple database records per line. That left lex and its successor flex.

Flex is a lexical analyzer that looks for patterns and outputs whatever you want when a pattern is matched. The output code is written in C and Flex generates a C program that you compile into an executable. I wanted to convert the dump file to a tab-separated value (TSV) file because that is easy to import in PHP. I also wanted to only read certain rows and columns, so I piped the output into awk to strip rows and columns.

See the attached “Mysl Flex” file below for the flex configuration.

Some things to note about the configuration:

  • The standard C fileio routines only support a maximum file size of 2 Gig. The line #define _FILE_OFFSET_BITS 64 enables large file support.
  • Three types of output are supported: 1) A filename based on the table name. 2) A specified filename. 3) stdout
  • The {TABLESTART} pattern section sets the state to INTABLE and preps the output file if necessary.
  • The <INTABLE> state section controls the  table row processing.
  • The<INROW> state section controls the row field processing.
  • The <INSTRING> state section controls the string value ‘ escaping.
  • The . and \n patterns cause the non-record insert data in the file to be ignored.

The command to generate the C code is: flex -Cf -8 mysql.flex

  • The -Cf optimizes for speed.
  • The -8 enables 8-bit support. This is needed for Unicode and hi-ASCII characters.

The command to compile the C code is: gcc lex.yy.c -lfl -o mysqlparse

Usage: ./mysqlparse <inputfile> <outputfile>

  • <inputfile> – Mysqldump file to convert to tsv formatted rows
  • <outputfile> – (Optional) tsv output file. If not specified, output files are written to the same directory as <inputfile> and are named using the mysql table name + .tsv; A dash (-) means write to stdout

See the attached “Template Links Awk” file below for the awk configuration.

Some things to note about the configuration:

  • The BEGIN { FS = “\t”; OFS = “\t” } line sets the input and output field delimiters to a tab character.
  • The {if ($2 == 10) print $1, $3} line outputs the first and third fields when the second field equals 10.

Example command chain: ./mysqlparse dump.sql – | awk -f templatelinks.awk – >output.tsv

Attachments

30 thoughts on “Mysqldump to TSV Conversion Using Flex”

  1. I’m truly enjoying the design and layout of your website. It’s a very easy on the eyes which makes it much more pleasant for me to come here and visit more often. Did you hire out a designer to create your theme? Exceptional work!

  2. I would like to show some appreciation to this writer just for bailing me out of this predicament. As a result of scouting throughout the the web and coming across notions which are not beneficial, I believed my entire life was gone. Being alive minus the approaches to the difficulties you’ve fixed all through your main write-up is a serious case, and the kind that would have in a negative way affected my entire career if I had not discovered the blog. The knowledge and kindness in dealing with almost everything was helpful. I am not sure what I would’ve done if I hadn’t come upon such a point like this. It’s possible to now relish my future. Thanks a lot so much for your reliable and result oriented help. I won’t think twice to refer your web sites to any person who ought to have support on this area.

  3. hey there and thank you for your information – I’ve certainly picked up anything new from right here. I did however expertise a few technical points using this site, since I experienced to reload the website many times previous to I could get it to load correctly. I had been wondering if your web hosting is OK? Not that I am complaining, but slow loading instances times will often affect your placement in google and can damage your high quality score if advertising and marketing with Adwords. Well I’m adding this RSS to my email and can look out for a lot more of your respective fascinating content. Make sure you update this again very soon..

  4. Instantly create an app for your website with the click of buttons! Appy Pie, is an unrivalled leader in the mobile app bandwagon that allows everyone to transform their app ideas into reality, without any technical knowledge.

    Simply drag and drop the features and create an advanced Android or iOS application for mobiles and smartphones, as easy as a pie.

    Start creating your app on the fly & even convert your website into an app with Appy Pie.

    http://www.tkqlhce.com/click-8095449-11578080

  5. Instantly create an app for your website with the click of buttons! Appy Pie, is an unrivalled leader in the mobile app bandwagon that allows everyone to transform their app ideas into reality, without any technical knowledge.

    Simply drag and drop the features and create an advanced Android or iOS application for mobiles and smartphones, as easy as a pie.

    Start creating your app on the fly & even convert your website into an app with Appy Pie.

    http://www.tkqlhce.com/click-8095449-11578080

  6. Good day! I know this is kinda off topic but I’d figured I’d ask. Would you be interested in trading links or maybe guest authoring a blog post or vice-versa? My website discusses a lot of the same topics as yours and I think we could greatly benefit from each other. If you happen to be interested feel free to send me an email. I look forward to hearing from you! Awesome blog by the way!

  7. I know this if off topic but I’m looking into starting my own blog and was curious what all is required to get set up? I’m assuming having a blog like yours would cost a pretty penny? I’m not very internet savvy so I’m not 100 positive. Any suggestions or advice would be greatly appreciated. Kudos

  8. I have not checked in here for some time since I thought it was getting boring, but the last several posts are great quality so I guess I will add you back to my daily bloglist. You deserve it my friend 🙂

  9. Hi there! Quick question that’s totally off topic. Do you know how to make your site mobile friendly? My website looks weird when viewing from my iphone4. I’m trying to find a template or plugin that might be able to resolve this problem. If you have any suggestions, please share. Thank you!

  10. Hello there, just changed into alert to your blog thru Google, and located that it is really informative. I am going to watch out for brussels. I will be grateful in the event you continue this in future. Numerous other people might be benefited out of your writing. Cheers!

  11. I was very pleased to find this web-site.I wanted to thanks for your time for this wonderful read!! I definitely enjoying every little bit of it and I have you bookmarked to check out new stuff you blog post.

  12. I’m really loving the theme/design of your website. Do you ever run into any internet browser compatibility issues? A few of my blog readers have complained about my website not operating correctly in Explorer but looks great in Safari. Do you have any suggestions to help fix this problem?

  13. Throughout this great design of things you actually receive a B+ for effort. Exactly where you confused us was first on the specifics. You know, as the maxim goes, details make or break the argument.. And it couldn’t be more accurate at this point. Having said that, permit me inform you just what did work. Your text is quite powerful and that is most likely the reason why I am taking the effort in order to comment. I do not really make it a regular habit of doing that. Next, even though I can see a jumps in logic you make, I am definitely not confident of how you appear to unite your details that make the actual final result. For the moment I will subscribe to your position but trust in the near future you connect the dots better.

  14. Hi, i read your blog from time to time and i own a similar one and i was just curious if you get a lot of spam remarks? If so how do you stop it, any plugin or anything you can advise? I get so much lately it’s driving me crazy so any support is very much appreciated.

Leave a Reply

Your email address will not be published. Required fields are marked *