Introduction
Many of you have probably heard of the cross-platform Qt library, and even more of you of the WebKit web page rendering engine. Not long ago the former gained a wrapper around the latter, and examples of browsers written in 50 lines are easy to find. However, little has been written about how to access individual elements of a web page from Qt code.
In this article I assume that readers have basic knowledge of PyQt (I learned it from Summerfield's book) and a rough idea of JavaScript. That is exactly how I would rate my own level, so I apologize in advance for mistakes, especially in the JavaScript parts. That said, the use of Python as the language should not get in the way of C++/Qt programmers.
The examples were tested with PyQt-4.7.3 and Python-2.6.6-r1 under GNU/Linux. You will also need a browser with a JavaScript debugger (Chrome, for example) and a PyQt IDE of your choice; I use Eric4.
Example 1. The browser we will be tormenting
# -*- coding: utf-8 -*-
# Save this file as basebrowser.py: the next two examples import it.
from PyQt4.QtCore import *
from PyQt4.QtNetwork import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class BaseBrowser(QWidget):
    def __init__(self, parent=None):
        super(BaseBrowser, self).__init__(parent)
        self.__progress = 0
        QNetworkProxyFactory.setUseSystemConfiguration(True)

        # The web view and its standard signals
        self.webView = QWebView()
        self.webView.load(QUrl("http://www.yandex.ru"))
        self.connect(self.webView, SIGNAL("loadFinished(bool)"), self.adjustLocation)
        self.connect(self.webView, SIGNAL("titleChanged(QString)"), self.adjustTitle)
        self.connect(self.webView, SIGNAL("loadProgress(int)"), self.setProgress)
        self.connect(self.webView, SIGNAL("loadFinished(bool)"), self.finishLoading)

        # Address field and "Go" button
        self.locationEdit = QLineEdit()
        self.locationEdit.setSizePolicy(QSizePolicy.Expanding, self.locationEdit.sizePolicy().verticalPolicy())
        self.connect(self.locationEdit, SIGNAL("returnPressed()"), self.changeLocation)
        self.goButton = QPushButton("Go")
        self.connect(self.goButton, SIGNAL("clicked()"), self.changeLocation)

        # Subclasses add their own widgets to this layout
        self.layout = QGridLayout(self)
        self.layout.addWidget(self.locationEdit, 0, 0)
        self.layout.addWidget(self.goButton, 0, 1)
        self.layout.addWidget(self.webView, 1, 0, 1, 2)
        self.setLayout(self.layout)

    def adjustLocation(self):
        self.locationEdit.setText(self.webView.url().toString())

    def changeLocation(self):
        url = self.locationEdit.text()
        if url[0:7] != 'http://':
            url = 'http://' + url
        self.webView.load(QUrl(url))
        self.webView.setFocus()

    def adjustTitle(self):
        # Show the load percentage in the title only while a page is loading
        if self.__progress <= 0 or self.__progress >= 100:
            self.setWindowTitle(self.webView.title())
        else:
            self.setWindowTitle(QString("%1 (%2%)").arg(self.webView.title()).arg(self.__progress))

    def setProgress(self, p):
        self.__progress = p
        self.adjustTitle()

    def finishLoading(self):
        self.__progress = 100
        self.adjustTitle()

if __name__ == "__main__":
    import sys
    app = QApplication(sys.argv)
    prog = BaseBrowser()
    prog.show()
    sys.exit(app.exec_())
Example 1 code:
pastebin.com/GVQ4dw1M
This browser is a variation on the browser from the C++/Qt and PyQt tutorial examples; the next two examples inherit from it. I realize that this is not how programs, even small ones, should be written, and that a program should not consist of a single class, but I am balancing code size, readability, and architectural correctness as best I can.
So, our browser cannot do much, but it can load and display the page you enter. The QWebView widget does this work. The widget's standard signals are connected to slots of our browser, so the program learns when the title of the current page changes, SIGNAL("titleChanged(QString)"), how loading is progressing, SIGNAL("loadProgress(int)"), and when loading ends, SIGNAL("loadFinished(bool)"). In addition, a QLineEdit field is created for entering the page address, along with a button; pressing Enter in the field or clicking the button navigates to that page.
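One detail of the address handling is worth a closer look: changeLocation prepends 'http://' whenever the text does not already start with it, which would also rewrite an https:// address. Below is a Qt-free sketch of a slightly more careful check. The helper name normalize_url and its default_scheme parameter are my own assumptions, not part of the example above.

```python
# Qt-free sketch of the address check done in changeLocation().
# normalize_url is a hypothetical helper, not part of the original example.

def normalize_url(text, default_scheme="http"):
    """Return the address with a scheme prepended if the user omitted one."""
    text = text.strip()
    if not text:
        return ""
    # Leave the address alone if it already names a scheme (http://, https://, ...).
    if "://" in text:
        return text
    return default_scheme + "://" + text
```

In the browser this could presumably be used by converting the QString from locationEdit.text() to a Python string first and passing the result to QUrl.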
We launch the browser, try it out, and marvel at the speed of "bare" WebKit. So far we have not written anything special. Our browser cannot even follow every kind of link.
Example 2. DOM trees and access to their elements from Qt
The structure of HTML pages is best read about separately; it is hard to cover in two sentences. In any case, if you are building an offline shell for some web interface, you will still have to learn JavaScript, at least the part that deals with data access. Any modern browser lets you access the contents of a web page by presenting it as a tree of nodes, where each node is an element, attribute, text, image, or other object. The nodes are linked by parent-child relationships (yes, that line is from Wikipedia). The nodes of this tree can be accessed through the JavaScript interpreter. Let's open our browser and go to the same yandex.ru (I hope the Habr effect does not take it down). How many links do you see above the search box?
Right-click the list of links and open it in the developer tools (in Chrome, this is "Inspect element" in the context menu). This shows the position of the current element in the tree. The list has a simple id="tabs" and is a table. Switch to the JavaScript console and try selecting this table:
document.getElementById("tabs")
See how many rows it has:
document.getElementById("tabs").rows.length
And how many columns:
document.getElementById("tabs").rows[0].cells.length
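To make the tree idea concrete without a browser, here is a small sketch that parses a made-up, well-formed fragment with Python's standard xml.etree module and counts rows and cells the same way. The markup and the get_element_by_id helper are assumptions for illustration; the real page is far larger and is not valid XML.

```python
import xml.etree.ElementTree as ET

# Made-up markup standing in for the real page (an assumption).
PAGE = """
<html><body>
  <table id="tabs">
    <tr><td>Web</td><td>Images</td><td>Maps</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

def get_element_by_id(node, element_id):
    """Walk the tree depth-first; a rough analogue of document.getElementById."""
    for element in node.iter():
        if element.get("id") == element_id:
            return element
    return None

table = get_element_by_id(root, "tabs")
rows = table.findall("tr")        # child nodes, like .rows
cells = rows[0].findall("td")     # like .rows[0].cells
row_count = len(rows)
cell_count = len(cells)
```

Each findall call steps one level down the parent-child chain, which is what the JavaScript expressions do with .rows and .cells.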
Now let's get the same result in our own browser.
# -*- coding: utf-8 -*-
from basebrowser import *

class SimpleJavaScript(BaseBrowser):
    def __init__(self, parent=None):
        super(SimpleJavaScript, self).__init__(parent)
        self.jsButton = QPushButton("ExecuteJS")
        self.connect(self.jsButton, SIGNAL("clicked()"), self.jsScript)
        self.jsStringEdit = QLineEdit()
        self.jsStringEdit.setSizePolicy(QSizePolicy.Expanding, self.jsStringEdit.sizePolicy().verticalPolicy())
        self.jsStringEdit.setText("document.getElementById(\"tabs\").rows[0].cells.length")
        self.connect(self.jsStringEdit, SIGNAL("returnPressed()"), self.jsScript)
        self.jsReturnText = QTextEdit()
        # Rows 0 and 1 of the grid are taken by BaseBrowser
        self.layout.addWidget(self.jsStringEdit, 2, 0, 1, 1)
        self.layout.addWidget(self.jsButton, 2, 1, 1, 1)
        self.layout.addWidget(self.jsReturnText, 3, 0, 1, 2)

    def jsScript(self):
        # Evaluate the entered script in the current frame and show the result
        jsString = self.jsStringEdit.text()
        jsReturn = self.webView.page().currentFrame().evaluateJavaScript(jsString)
        self.jsReturnText.setPlainText(jsReturn.toString())

if __name__ == "__main__":
    import sys
    app = QApplication(sys.argv)
    ui = SimpleJavaScript()
    ui.show()
    sys.exit(app.exec_())
Example 2 code:
pastebin.com/p4P1ZEtS
So the JS code is evaluated by the call webView.page().currentFrame().evaluateJavaScript(jsString).
evaluateJavaScript() takes a single argument, a QString containing JavaScript code. The code is executed in the context of the current page, and the result is returned as a QVariant. Unfortunately, you cannot get a subtree of DOM elements back this way, but any text or numeric value comes through fine.
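Since evaluateJavaScript() takes plain text, the script usually has to be assembled in Python before the call. The sketch below shows one way to do that while quoting the element id safely. The helper name js_by_id is hypothetical, and the approach relies on the assumption that a JSON string literal produced by json.dumps is also an acceptable JavaScript string literal.

```python
import json

def js_by_id(element_id, member=""):
    """Build the text 'document.getElementById("<id>")<member>'."""
    # json.dumps yields a double-quoted literal with quotes and backslashes escaped.
    return "document.getElementById(%s)%s" % (json.dumps(element_id), member)

# The expression used in the example above:
script = js_by_id("tabs", ".rows.length")
```

The resulting string would then be passed to evaluateJavaScript() in place of the hand-typed text.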
Example 3. Creating offline controls
This time the home page address was chosen because I have an ATI card and run Linux; those in the know will understand that this is not out of great love. What matters is that the page has many select controls, and we will create a Qt equivalent for one of them.
# -*- coding: utf-8 -*-
from basebrowser import *
from PyQt4.QtGui import *
from PyQt4.QtCore import *

class JSSelectList(QAbstractListModel):
    # List model whose rows are read from a select element on the page
    def __init__(self, _id, _jsFunc, parent=None):
        super(JSSelectList, self).__init__(parent)
        self.id = _id            # id of the select element
        self.jsFunc = _jsFunc    # callable that evaluates JavaScript on the page

    def data(self, index, role=Qt.DisplayRole):
        if not index.isValid():
            return QVariant()
        if role == Qt.DisplayRole:
            jsstring = QString("document.getElementById('%1').options[%2].textContent").arg(self.id).arg(index.row())
            jsreturn = self.jsFunc(jsstring)
            return jsreturn.toString().trimmed()
        # No data for any other role
        return QVariant()

    def rowCount(self, index=QModelIndex()):
        jsstring = QString("document.getElementById('%1').length").arg(self.id)
        jsreturn = self.jsFunc(jsstring)
        count, ok = jsreturn.toInt()
        return count if ok else 0

    def headerData(self, section, orientation, role=Qt.DisplayRole):
        if role != Qt.DisplayRole:
            return QVariant()
        else:
            return self.id

class JSComboBoxDemo(BaseBrowser):
    def __init__(self, parent=None):
        super(JSComboBoxDemo, self).__init__(parent)
        self.vendorComboBox = QComboBox()
        id = QString("productLine")
        self.vendorListModel = JSSelectList(id, self.webView.page().currentFrame().evaluateJavaScript)
        self.vendorComboBox.setModel(self.vendorListModel)
        self.connect(self.vendorComboBox, SIGNAL("currentIndexChanged(int)"), self.setSelectOnWebPage)
        self.connect(self.webView, SIGNAL("loadFinished(bool)"), self.initComboBox)
        self.layout.addWidget(self.vendorComboBox, 2, 0, 1, 1)
        self.webView.load(QUrl("http://www.amd.com"))

    def setSelectOnWebPage(self, new_id):
        # Push the combo box selection back to the page
        jsstring = QString("document.getElementById('productLine').selectedIndex=%1").arg(new_id)
        self.webView.page().currentFrame().evaluateJavaScript(jsstring)

    def initComboBox(self):
        self.vendorComboBox.setCurrentIndex(0)

if __name__ == "__main__":
    import sys
    app = QApplication(sys.argv)
    ui = JSComboBoxDemo()
    ui.show()
    sys.exit(app.exec_())
Example 3 code:
pastebin.com/YzA9hL3H
When creating GUI elements such as a table, list, or drop-down, Qt lets you use the convenient MVC approach. You only need to describe how your data is accessed: derive your data model from one of the built-in abstract classes and attach it to a standard control (in Summerfield's book, this seems to be chapter 14). Here QAbstractListModel is used; the only parameters passed to it are the JS evaluation function and the id of the select element on the page. All the overridden methods are the standard ones.
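The point of this design is that the model stores no data of its own: every request is forwarded to the page through the callable it was given. The Qt-free sketch below imitates that with a fake evaluator in place of evaluateJavaScript. Both class names (FakeSelectPage, SelectModel) are hypothetical, and the fake only understands the two script shapes the model sends.

```python
class FakeSelectPage(object):
    """Stand-in for a page with one select element (an assumption for the demo)."""
    def __init__(self, element_id, options):
        self.element_id = element_id
        self.options = options

    def evaluate(self, script):
        prefix = "document.getElementById('%s')" % self.element_id
        if script == prefix + ".length":
            return len(self.options)
        for i, text in enumerate(self.options):
            if script == prefix + ".options[%d].textContent" % i:
                return text
        return None  # unknown script or missing element


class SelectModel(object):
    """Plain-Python analogue of JSSelectList: holds an id and a JS callable."""
    def __init__(self, element_id, js_func):
        self.element_id = element_id
        self.js_func = js_func

    def row_count(self):
        result = self.js_func("document.getElementById('%s').length" % self.element_id)
        return result if isinstance(result, int) else 0

    def data(self, row):
        script = "document.getElementById('%s').options[%d].textContent" % (self.element_id, row)
        result = self.js_func(script)
        return result.strip() if result is not None else None


page = FakeSelectPage("productLine", [" Desktop ", "Notebook"])
model = SelectModel("productLine", page.evaluate)
```

Swapping the fake for the real evaluation function is what the Qt version does when it passes evaluateJavaScript to the model's constructor.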
The rest of the example is also fairly clear, except for two signal-slot connections that I would like to draw your attention to.
First, there is no point in trying to execute JavaScript before the page has loaded, so we rely on the fact that when loading finishes, the QWebView widget emits SIGNAL("loadFinished(bool)"), which I already mentioned in the first example.
self.connect(self.webView, SIGNAL("loadFinished(bool)"), self.initComboBox)
Otherwise, if you put the line
self.vendorComboBox.setCurrentIndex(0)
in __init__, the first value will not be initialized: evaluateJavaScript will return nothing, because the page will not have loaded yet.
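The same "wait for the page" rule can be generalized. Below is a Qt-free sketch of a small gate that queues actions until loading has finished. The class LoadGate and its method names are hypothetical; in the real program on_load_finished would presumably be connected to the loadFinished(bool) signal.

```python
class LoadGate(object):
    """Queue callables until the page reports that loading finished."""
    def __init__(self):
        self.loaded = False
        self.pending = []

    def run_when_loaded(self, func):
        # Run immediately if the page is ready, otherwise remember the action.
        if self.loaded:
            func()
        else:
            self.pending.append(func)

    def on_load_finished(self, ok):
        # Mirrors the bool argument of loadFinished(bool).
        self.loaded = bool(ok)
        if not self.loaded:
            return  # failed load: keep the actions queued
        pending, self.pending = self.pending, []
        for func in pending:
            func()


calls = []
gate = LoadGate()
gate.run_when_loaded(lambda: calls.append("init combo box"))
calls_before_load = list(calls)   # nothing has run yet
gate.on_load_finished(True)       # the queued action runs now
```

For a single action, connecting the slot directly to the signal, as the example does, is simpler; a gate like this only pays off when several actions depend on the page being ready.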
Second, we need synchronization in both directions, so a change in the combo box is pushed back to the page:
self.connect(self.vendorComboBox, SIGNAL("currentIndexChanged(int)"), self.setSelectOnWebPage)
In the same way you can synchronize almost any information on the page, click buttons, and download data.
I would be glad if this information proves useful to someone. Merry Christmas and Happy New Year to all.
References:
J. Blanchette, M. Summerfield. C++ GUI Programming with Qt 4.
M. Summerfield. Rapid GUI Programming with Python and Qt.
Other sources:
Various websites on JavaScript and PyQt; the source code of the Arora web browser.