Mechanize gem
Mechanize is a Ruby gem that provides a simple way to fetch web pages and extract data from them, for example in a Ruby on Rails application.
To use this gem, install it with:
gem install mechanize
Then, in your controller:
require 'mechanize'
agent = Mechanize.new
# creates the Mechanize object
doc = agent.get("your_url")
# pass the URL you want to extract data from
web_title = agent.page.title
# this will give you the title of the page at the specified URL
web_url = agent.page.uri.host
# this will give you the website name, e.g. www.abc.com
Sometimes the URL starts with https:// rather than http://. Because uri.host reads the host from the parsed URI instead of splitting the string, the same line works in both cases and no extra check is needed.
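As a standalone illustration of reading the host from a URI object, here is a minimal sketch using Ruby's standard uri library. The URLs are made-up examples and site_name is a hypothetical helper, not part of Mechanize:

```ruby
require 'uri'

# Hypothetical helper: returns the host part of a URL string,
# regardless of whether the scheme is http or https.
def site_name(url)
  URI.parse(url).host
end

puts site_name("http://www.abc.com/some/page")   # prints www.abc.com
puts site_name("https://www.abc.com/some/page")  # prints www.abc.com
```

Letting the URI parser find the host avoids having to treat each scheme as a special case.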
html = agent.page.content
# this will give you the contents of the whole page
You can parse the whole content using Hpricot, so don't forget to add this line in your controller: require 'hpricot'
# this will print the attributes of every p tag on the page
doc2 = Hpricot.parse(html)
(doc2/:p).each do |para|
  puts para.attributes
end
You can get the inner HTML of the p tags using:
paragraphs = doc2/:p
puts paragraphs.inner_html
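Hpricot has to be installed separately, so as a dependency-light illustration of the same idea (find every p element, then read its attributes and inner markup), here is a sketch that swaps in REXML, which ships with standard Ruby installations. This is not Hpricot's API. It assumes well-formed markup, which real web pages often are not, and the sample markup and helper names are made up:

```ruby
require 'rexml/document'

# Made-up, well-formed sample markup standing in for a fetched page.
SAMPLE_HTML = '<html><body><p class="intro">Hello <b>world</b></p><p id="second">Bye</p></body></html>'

# Hypothetical helper: returns every p element found in the markup.
def paragraphs_in(markup)
  REXML::XPath.match(REXML::Document.new(markup), "//p")
end

# Hypothetical helper: serializes an element's children, i.e. its inner markup.
def inner_markup(element)
  element.children.map(&:to_s).join
end

paragraphs_in(SAMPLE_HTML).each do |para|
  # Print each attribute as name=value, then the paragraph's inner markup.
  para.attributes.each { |name, value| puts "#{name}=#{value}" }
  puts inner_markup(para)
end
```

For real pages, a forgiving HTML parser is the better fit; the sketch only shows the shape of the loop.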
That's it!!