The open-uri library will do the first part of the work for us. Once it is required, the open method becomes available (in Ruby 3.0 and later it has to be called as URI.open), and it can open both local files and URLs:
require 'open-uri'
url = 'http://maxelc.habrahabr.ru/'
page = URI.open(url)  # older Ruby: open(url)
text = page.read
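To check that the same call also reads local files without touching the network, here is a small sketch. It writes a made-up HTML fragment to a temporary file (the file and its contents are assumptions for illustration only) and reads it back with URI.open, which hands plain paths over to Kernel#open:

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical local page: a temp file stands in for the downloaded HTML
file = Tempfile.new(['page', '.html'])
file.write('<span class="mark"><span>68,25</span></span>')
file.close

# URI.open sends plain paths to Kernel#open and URLs to open-uri,
# so the same call works for both; the block form closes the file for us
text = URI.open(file.path) { |f| f.read }
puts text

file.unlink # remove the temporary file
```

For a real URL the only change is the argument, e.g. URI.open('http://maxelc.habrahabr.ru/').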
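The text variable now holds the HTML of the page. To pull the karma value out of it, we will use a regular expression with a capturing group. Let's look at the piece of HTML code we need: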
<span class="mark"><span>68,25</span><strong class="sign"></strong></span>
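The regular expression that captures the number looks like this:
%r{mark"><span>(.*?)</span><strong class="sign">}m
The %r{} syntax lets us forget about escaping forward slashes (very convenient, especially with HTML). The lazy group (.*?) stops at the first closing </span>, and the m modifier at the end makes the dot match newline characters too, so a match can span several lines (in our case this does not matter, but again, it is very useful when working with HTML). To find matches in the string, we use the scan method, which returns an array of captured groups for every match:
karma = text.scan(%r{mark"><span>(.*?)</span><strong class="sign">}m).flatten.first  # scan gives [["68,25"]]
puts "Karma = #{karma}"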
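Since the live page may have changed or be unreachable, here is an offline sketch that runs the same pattern against the HTML fragment shown above, embedded as a string. The second, line-wrapped sample is made up purely to show what the m modifier changes. It also shows the nested array returned by scan and String#[] with a capture index as a shorter alternative:

```ruby
# The fragment from the article, embedded so no network access is needed
html = '<span class="mark"><span>68,25</span><strong class="sign"></strong></span>'

# %r{...} means "</span>" needs no escaped slash; (.*?) is a lazy capture group
pattern = %r{mark"><span>(.*?)</span><strong class="sign">}m

# scan returns one array per match, each holding that match's captured groups
matches = html.scan(pattern)
karma = matches.flatten.first
puts "Karma = #{karma}"

# String#[] with a capture index returns the first match's group directly
same = html[pattern, 1]

# Made-up sample where the value is split by a newline:
# without /m the dot does not match "\n", so nothing is found
wrapped = %Q{mark"><span>68,\n25</span><strong class="sign">}
without_m = wrapped[%r{mark"><span>(.*?)</span><strong class="sign">}, 1]
with_m = wrapped[pattern, 1]
p without_m, with_m
```

String#[] is handy when only the first match matters; scan is the better fit when the page can contain several values of the same shape.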
Hpricot is a fast, easy-to-use HTML parser that works in the spirit of the jQuery library. It can also be used to parse XML.
Install it with gem install hpricot. Now let's write the code: we include hpricot in the program, load the URL and find the necessary element, wrapping everything in a class right away:
require 'rubygems'  # only needed on Ruby 1.8
require 'hpricot'
require 'open-uri'
class Karma
  def initialize(name)
    @url = "http://#{name}.habrahabr.ru/"
    @hp = Hpricot(URI.open(@url))  # older Ruby: open(@url)
  end
  # Text of the <span> nested inside <span class="mark">
  def get
    (@hp / "span.mark/span").inner_text
  end
end
karma = Karma.new('maxelc')
puts "Karma = #{karma.get}"
Passing the opened page to Hpricot(...) parses the HTML into a document object that we can search. @hp/"span.mark" is a shortcut for @hp.search("//span[@class='mark']"), meaning "find the <span class='mark'>" (search accepts either an XPath or a CSS expression as its parameter). The inner_text method returns the text content of the matched elements (in the case of HTML, what is enclosed in the tags). By extending the expression we can descend into nested tags, which is what we did with @hp/"span.mark/span".
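Hpricot is no longer maintained and may not build on current Ruby, so as a stand-in this sketch runs the XPath form of the same query, //span[@class='mark']/span, with REXML, which ships with Ruby. Note the swap: REXML is a strict XML parser and accepts only well-formed markup, so it works on the fragment from the article but would generally reject a real, messy HTML page, which is exactly the case forgiving parsers like Hpricot were built for.

```ruby
require 'rexml/document'

# Well-formed copy of the fragment from the article (REXML needs valid XML)
xml = '<div><span class="mark"><span>68,25</span><strong class="sign"/></span></div>'
doc = REXML::Document.new(xml)

# XPath equivalent of the CSS-style query "span.mark/span"
node = REXML::XPath.first(doc, "//span[@class='mark']/span")
karma = node.text
puts "Karma = #{karma}"

# XPath.match returns every matching element, similar to Hpricot's search
all = REXML::XPath.match(doc, "//span[@class='mark']/span").map(&:text)
```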
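Some information on the site, however, is shown only to logged-in users, and to get it we first have to fill in and submit the login form. This is where WWW::Mechanize comes into play.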
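Install it with gem install mechanize. Now write the code:
require 'rubygems'  # only needed on Ruby 1.8
require 'mechanize'
agent = Mechanize.new  # the class was called WWW::Mechanize before Mechanize 1.0
page = agent.get('http://habrahabr.ru/login/')
form = page.forms.first  # the login form is the first form on the page
form.login = 'MaxElc'    # field names come from the name attributes of the HTML inputs
form.password = '****'
page = agent.submit(form)  # the agent keeps the session cookies for later requests
# On the home page, now loaded as the logged-in user, find the link to the mail page
a = agent.get('http://habrahabr.ru/').search(".//a[@href='http://maxelc.habrahabr.ru/mail/']").inner_text
puts "#{a}!"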
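Under the hood, submitting a form like this one sends a POST request whose body is the URL-encoded field names and values. The standard-library sketch below builds such a request with Net::HTTP::Post#set_form_data without sending it. The field values are the ones from the example above; the target URL is an assumption (the real form's action attribute may point elsewhere). Mechanize additionally stores the cookies from the response, which is why the later agent.get is made as the logged-in user.

```ruby
require 'net/http'
require 'uri'

# Assumption: the form posts back to the login page; the real action URL may differ
uri = URI('http://habrahabr.ru/login/')

# Build (but do not send) the POST request a browser would make for this form
req = Net::HTTP::Post.new(uri)
req.set_form_data('login' => 'MaxElc', 'password' => '****')

puts req.content_type # application/x-www-form-urlencoded
puts req.body         # login=MaxElc&password=****
```

Sending it by hand would take Net::HTTP.start plus manual cookie bookkeeping, which is exactly the chore Mechanize takes off our hands.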
Source: https://habr.com/ru/post/51610/