Web Scraping

Web scraping/data extraction using the Mechanize gem in Ruby on Rails

November 3, 2011

Preksha Patel

Rails, Web Scraping, Data Extraction, Mechanize gem

The Mechanize gem provides a simple way to extract data from web pages in Ruby on Rails.

To use this gem, install it with:

gem install mechanize

Then, in your controller:

require 'mechanize'

agent = Mechanize.new # creates the mechanize object

doc = agent.get("your_url") # pass the URL of the page you want to extract data from

web_title = agent.page.title # this gives you the title of the page at that URL

web_url = agent.page.uri.host # this gives you the website name, e.g. www.abc.com

Because uri.host returns only the host name, the same line works whether the URL starts with http:// or https://, so no separate check for https:// is needed.

html = agent.page.content # this gives you the contents of the whole page
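The host name can be read straight from a parsed URI, which behaves the same for http:// and https:// URLs. A minimal sketch with Ruby's standard URI library; the helper name site_name is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: return only the host part of a URL string,
# e.g. "www.abc.com" for "https://www.abc.com/login".
def site_name(url)
  URI.parse(url).host
end

puts site_name("http://www.abc.com/products/1")
puts site_name("https://www.abc.com/login")
```

In the controller the page's uri is already a URI object, so agent.page.uri.host gives the same result without parsing the string again.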

You can parse the whole content with Hpricot, so don't forget to add this line to your controller:

require 'hpricot'

doc2 = Hpricot.parse(html)
(doc2/"p").each do |para| # this iterates over all p tags of the page
  puts para.attributes.inspect
end

You can get the inner HTML of the p tags using:

paragraphs = doc2/"p"
puts paragraphs.inner_html
That's it!!
