I am a Ruby on Rails programmer, so my natural inclination is to use ActiveRecord and Ruby to solve just about every problem. Here is what I use:
- Ruby 2.0.0+ and Rails
- Linux Ubuntu 12.04
- Chrome Dev Tools
- Nokogiri gem – parses HTML into a neat, easy-to-navigate data structure
- Watir-WebDriver gem – browser scraping (drives a real browser)
- StreetAddress gem – parses street addresses using a port of an old Perl algorithm
Extra gems, with versions that worked well together as of October 2013 (Gemfile entries):
gem 'nokogiri', '1.5.6'
gem 'watir-webdriver', '0.6.2'
gem 'StreetAddress', '1.0.3', :require => "street_address"
Scraping a website comes down to two steps:
- Presentation – determine how the website presents the data (table, iframe, etc.)
- Navigation – determine how to cycle through the website to gather more pages of data
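The navigation step usually boils down to a loop over pages. This is a minimal sketch, assuming the site uses a numbered `?page=N` query string (a hypothetical URL pattern) and that an empty page means you have run out of data. Fetching is stubbed with canned data so the loop runs on its own; `fetch_rows` and `FAKE_PAGES` are hypothetical stand-ins for whatever Watir/Nokogiri code actually loads and parses each page.

```ruby
# Hypothetical URL pattern for a site that paginates with ?page=N.
BASE_URL = 'http://example.com/listings?page=%d'

# Canned data standing in for real pages (hypothetical).
FAKE_PAGES = {
  'http://example.com/listings?page=1' => ['row 1', 'row 2'],
  'http://example.com/listings?page=2' => ['row 3', 'row 4'],
  'http://example.com/listings?page=3' => ['row 5']
}

# Hypothetical fetcher: in a real scraper this would load the page
# (e.g. with Watir) and parse the rows out with Nokogiri.
def fetch_rows(url)
  FAKE_PAGES.fetch(url, [])
end

# Walk page 1, 2, 3, ... until a page comes back empty,
# with max_pages as a safety cap against looping forever.
def scrape_all_pages(max_pages = 100)
  results = []
  (1..max_pages).each do |page|
    rows = fetch_rows(format(BASE_URL, page))
    break if rows.empty?
    results.concat(rows)
  end
  results
end

all_rows = scrape_all_pages
puts "Collected #{all_rows.length} rows"
```

Sites that paginate with a "Next" link instead of page numbers would swap the counter for following that link until it disappears; the cap is worth keeping either way.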
In Part 2 I will focus on presentation and on writing some Ruby code to grab data from websites.