The mechanize gem provides a simple way to extract data from web pages in Ruby on Rails.
To use this gem, install it with:
gem install mechanize
Then, in your controller:
agent = Mechanize.new # creates the mechanize object
doc = agent.get("your_url") # pass the URL you want to extract data from
web_title = agent.page.title # this will give you the title of the specified URL
web_url = agent.page.uri.host # this will give you the website name, e.g. www.abc.com
Sometimes the URL starts with https:// rather than http://. Because uri is a parsed URI object, host returns the same website name in both cases, so no separate check is needed.
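The website name can also be checked on its own with Ruby's standard URI library, which is the kind of object page.uri returns. The sketch below is only an illustration: the helper name site_name and the sample URLs are assumptions, not part of mechanize.

```ruby
require 'uri'

# Hypothetical helper: returns the host part of a URL string,
# or nil when the string cannot be parsed as a URI.
def site_name(url)
  URI.parse(url).host
rescue URI::InvalidURIError
  nil
end

puts site_name("http://www.abc.com/some/page")   # www.abc.com
puts site_name("https://www.abc.com/some/page")  # www.abc.com
```

Letting the URI object separate the scheme from the host avoids splitting the string by hand, so http:// and https:// URLs need no special handling.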
html = agent.page.content #this will give you contents of the whole page
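The steps so far can be gathered into one helper. This is a minimal sketch rather than anything the gem provides: the name page_summary is hypothetical, and it relies on agent.get returning the fetched page. The agent is passed in as an argument so the helper can be tried without making a network request.

```ruby
# Hypothetical helper: fetches a URL with a Mechanize-style agent and
# returns the title, website name, and raw HTML in one hash.
def page_summary(agent, url)
  page = agent.get(url) # get returns the fetched page
  {
    title: page.title,     # text of the page's title tag
    host:  page.uri.host,  # website name, e.g. www.abc.com
    html:  page.content    # contents of the whole page
  }
end

# Usage in a controller (assumes the mechanize gem is installed):
#   require 'mechanize'
#   summary = page_summary(Mechanize.new, "your_url")
#   web_title = summary[:title]
```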
You can parse the whole content with Hpricot, so don't forget to add this line to your controller: require 'hpricot'
doc2 = Hpricot.parse(html)
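(doc2/:p).each do |link| # this will go through all p tags of the specified URL
  puts link.inner_html # inner_html returns the markup inside each p tag
end
You can also get the inner_html of all the p tags at once using:
paragraphs = doc2/:p # named paragraphs because a variable called p would shadow Kernel#p
paragraphs.inner_html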