
Capturing loaded resources in QtWebKit, or how I saddled a unicorn to dubstep


Habrahabr::Instance()->hello();


I haven't written anything for a long, long time. But last week I sweated quite a bit over the QtWebKit module in Qt 5.1 and decided it would be good to tell you what kind of gloom awaits you there, in case you want to capture an image from a page or something like that.

In fact, my task was to make a browser that saves every image from every page it visits. An elementary task at first glance: hang a handler on a separate thread that iterates over all QWebElement objects matching the "img" selector and draws their contents (QWebElement::render()) through a QPainter onto a QImage, which in turn is saved to a file.

But unfortunately it turned out not to be that simple. The path of the samurai that I took to complete the task I had set myself is described below the cut. Enjoy your meal!


Stage 1. Problem


I implemented the algorithm from the introduction on the very latest Qt 5 from Git on a Mac, compiled with 64-bit Clang. In short, the algorithm did not work: every saved image was either a black rectangle or hellish garbage. Then I remembered that Qt 5 ships with an example that implements similar functionality. I quickly built it and ran it exactly as the readme says. It does not work. The official example, oddly enough, does not work. I tested it on Linux: same result.

And what to do? Nothing; this thing does not work and it is not clear why. I had no time to dig into it, so I looked for alternative solutions. I tried (attention, this one is for connoisseurs of perversion) passing the image to the backend through JavaScript. The method is quite simple: take the picture, draw it on a canvas, and send the canvas contents to the backend as base64. There we decode it, clean it up, and convert it into a clean image.
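
The backend decoding step can be sketched roughly as follows. This is a minimal illustration, not the code I actually used: it assumes the page sends a data URL of the form data:image/png;base64,<payload>, the names payloadFromDataUrl and decodeBase64 are hypothetical, and it uses only the standard library (in Qt, QByteArray::fromBase64() does the decoding for you).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Extracts the base64 payload from a data URL such as
// "data:image/png;base64,iVBORw0...". Returns an empty string when the
// input is not a base64 data URL. (Hypothetical helper.)
std::string payloadFromDataUrl(const std::string& dataUrl) {
    const std::string marker = ";base64,";
    if (dataUrl.compare(0, 5, "data:") != 0) return {};
    const std::size_t pos = dataUrl.find(marker);
    if (pos == std::string::npos) return {};
    return dataUrl.substr(pos + marker.size());
}

// Decodes standard base64 (RFC 4648 alphabet). Stops at the first '='
// padding character and skips characters outside the alphabet, such as
// line breaks. (Hypothetical helper.)
std::vector<std::uint8_t> decodeBase64(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<std::uint8_t> out;
    std::uint32_t acc = 0;  // bit accumulator
    int bits = 0;           // number of not-yet-emitted bits in acc
    for (char c : in) {
        if (c == '=') break;
        const std::size_t idx = alphabet.find(c);
        if (idx == std::string::npos) continue;
        acc = (acc << 6) | static_cast<std::uint32_t>(idx);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((acc >> bits) & 0xFF));
        }
    }
    return out;
}
```

The decoded bytes are the raw image file and can be written to disk as they are.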

Ugh! This method gave me the same kind of images as the previous one. Something is obviously wrong here, but I was on the run with no time to look back, and another solution was born right away!


Stage 2. Solution


But what if we intercept the resources that the page loads? Why not, I thought. I quickly went off to read the QNetworkAccessManager docs, and hurray! Here is how it works. We have a QWebView, to which we assign a QWebPage with a custom QNetworkAccessManager set on it beforehand; that manager is our InterceptorManager class (inherited from QNAM).

The definition of InterceptorManager is something like this:

    class InterceptorManager : public QNetworkAccessManager
    {
        Q_OBJECT
    public:
        explicit InterceptorManager(QObject *parent = 0);

    protected:
        QNetworkReply *createRequest(Operation op, const QNetworkRequest &request, QIODevice *outgoingData)
        {
            QNetworkReply *real = QNetworkAccessManager::createRequest(op, request, outgoingData);
            if (request.url().toString().endsWith(".png")) {
                NetworkReplyProxy *proxy = new NetworkReplyProxy(this, real);
                return proxy;
            }
            return real;
        }
    };
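
Note that the endsWith(".png") check in createRequest() above is case-sensitive and misses URLs that carry a query string or fragment, such as image.png?v=2. A more tolerant check might look like the sketch below. It is written against std::string so that it stands alone; the name looksLikePng is hypothetical, and in Qt the same idea could be applied to request.url().path().

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true when the part of `url` before any "?query" or "#fragment"
// ends with ".png", ignoring letter case. (Hypothetical helper.)
bool looksLikePng(const std::string& url) {
    // Cut the URL at the first '?' or '#', whichever comes first.
    std::string path = url.substr(0, url.find_first_of("?#"));
    std::transform(path.begin(), path.end(), path.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string suffix = ".png";
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

For example, looksLikePng("http://example.com/A.PNG?v=2") is true, while a page URL that merely mentions a.png in its query string is rejected.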

We override createRequest() and, for matching requests (here, URLs ending in ".png"), return the QNetworkReply proxy we created. Why is this necessary? QNetworkReply, as a descendant of QIODevice, cannot be read a second time, and QWebPage still needs the data to render the image. With a proxy we can copy the contents as they pass through and use the copy later.
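
The idea of copying data as it passes through can be shown with a small Qt-free model. OneShotSource and TeeProxy are hypothetical stand-ins for QNetworkReply and the proxy; the sketch only illustrates why two buffers are kept, one that the consumer drains and one that survives as the captured copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for a sequential device: every byte can be read exactly once.
class OneShotSource {
public:
    explicit OneShotSource(std::string data) : m_data(std::move(data)) {}

    // Returns everything not yet consumed and marks it as consumed.
    std::string readAll() {
        std::string chunk = m_data.substr(m_pos);
        m_pos = m_data.size();
        return chunk;
    }

private:
    std::string m_data;
    std::size_t m_pos = 0;
};

// Stand-in for the proxy: forwards bytes to the consumer and keeps a copy.
class TeeProxy {
public:
    explicit TeeProxy(OneShotSource& source) : m_source(source) {}

    // Pulls newly available bytes from the source into both buffers.
    void pump() {
        const std::string chunk = m_source.readAll();
        m_copy += chunk;     // kept until the proxy goes away
        m_pending += chunk;  // drained by read()
    }

    // Hands at most maxLen pending bytes to the consumer and drops them.
    std::string read(std::size_t maxLen) {
        const std::size_t n = std::min(maxLen, m_pending.size());
        std::string out = m_pending.substr(0, n);
        m_pending.erase(0, n);
        return out;
    }

    // The full copy of everything that has passed through so far.
    const std::string& captured() const { return m_copy; }

private:
    OneShotSource& m_source;
    std::string m_copy;
    std::string m_pending;
};
```

After pump(), the consumer can read the bytes through read() as usual, and captured() still returns the complete data even though the source itself has nothing left to give.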

Since proxying QNetworkReply is not an easy task, here is an example:
networkreplyproxy.h (outdated; the freshest version is in the repository below)
    #include <cstring> // memcpy

    #include <QApplication>
    #include <QWebFrame>
    #include <QWebPage>
    #include <QWebView>
    #include <QWebSettings>
    #include <QDebug>
    #include <QDateTime>
    #include <QFile>
    #include <QTimer>
    #include <QNetworkProxy>
    #include <QNetworkReply>
    #include <QNetworkCookie>

    class NetworkReplyProxy : public QNetworkReply
    {
        Q_OBJECT
    public:
        NetworkReplyProxy(QObject* parent, QNetworkReply* reply)
            : QNetworkReply(parent)
            , m_reply(reply)
        {
            // apply attributes...
            setOperation(m_reply->operation());
            setRequest(m_reply->request());
            setUrl(m_reply->url());

            // handle these to forward
            connect(m_reply, SIGNAL(metaDataChanged()), SLOT(applyMetaData()));
            connect(m_reply, SIGNAL(readyRead()), SLOT(readInternal()));
            connect(m_reply, SIGNAL(error(QNetworkReply::NetworkError)), SLOT(errorInternal(QNetworkReply::NetworkError)));

            // forward signals
            connect(m_reply, SIGNAL(finished()), SIGNAL(finished()));
            connect(m_reply, SIGNAL(uploadProgress(qint64,qint64)), SIGNAL(uploadProgress(qint64,qint64)));
            connect(m_reply, SIGNAL(downloadProgress(qint64,qint64)), SIGNAL(downloadProgress(qint64,qint64)));

            // for the data proxy...
            setOpenMode(ReadOnly);
        }

        ~NetworkReplyProxy()
        {
            // the captured data is handed over when the proxy is destroyed
            if (m_reply->url().scheme() != "data")
                writeDataPrivate();
            delete m_reply;
        }

        // virtual methods
        void abort() { m_reply->abort(); }
        void close() { m_reply->close(); }
        bool isSequential() const { return m_reply->isSequential(); }

        // not possible...
        void setReadBufferSize(qint64 size) { QNetworkReply::setReadBufferSize(size); m_reply->setReadBufferSize(size); }

        // ssl magic is not done....
        // isFinished()/isRunning can not be done *sigh*

        // QIODevice proxy...
        virtual qint64 bytesAvailable() const { return m_buffer.size() + QIODevice::bytesAvailable(); }
        virtual qint64 bytesToWrite() const { return -1; }
        virtual bool canReadLine() const { qFatal("not implemented"); return false; }
        virtual bool waitForReadyRead(int) { qFatal("not implemented"); return false; }
        virtual bool waitForBytesWritten(int) { qFatal("not implemented"); return false; }

        virtual qint64 readData(char* data, qint64 maxlen)
        {
            // serve the consumer from m_buffer; m_data keeps the full copy
            qint64 size = qMin(maxlen, qint64(m_buffer.size()));
            memcpy(data, m_buffer.constData(), size);
            m_buffer.remove(0, size);
            return size;
        }

    signals:
        void resourceIntercepted(QByteArray);

    public Q_SLOTS:
        void ignoreSslErrors() { m_reply->ignoreSslErrors(); }

        void applyMetaData()
        {
            QList<QByteArray> headers = m_reply->rawHeaderList();
            foreach(QByteArray header, headers)
                setRawHeader(header, m_reply->rawHeader(header));

            setHeader(QNetworkRequest::ContentTypeHeader, m_reply->header(QNetworkRequest::ContentTypeHeader));
            setHeader(QNetworkRequest::ContentLengthHeader, m_reply->header(QNetworkRequest::ContentLengthHeader));
            setHeader(QNetworkRequest::LocationHeader, m_reply->header(QNetworkRequest::LocationHeader));
            setHeader(QNetworkRequest::LastModifiedHeader, m_reply->header(QNetworkRequest::LastModifiedHeader));
            setHeader(QNetworkRequest::SetCookieHeader, m_reply->header(QNetworkRequest::SetCookieHeader));

            setAttribute(QNetworkRequest::HttpStatusCodeAttribute, m_reply->attribute(QNetworkRequest::HttpStatusCodeAttribute));
            setAttribute(QNetworkRequest::HttpReasonPhraseAttribute, m_reply->attribute(QNetworkRequest::HttpReasonPhraseAttribute));
            setAttribute(QNetworkRequest::RedirectionTargetAttribute, m_reply->attribute(QNetworkRequest::RedirectionTargetAttribute));
            setAttribute(QNetworkRequest::ConnectionEncryptedAttribute, m_reply->attribute(QNetworkRequest::ConnectionEncryptedAttribute));
            setAttribute(QNetworkRequest::CacheLoadControlAttribute, m_reply->attribute(QNetworkRequest::CacheLoadControlAttribute));
            setAttribute(QNetworkRequest::CacheSaveControlAttribute, m_reply->attribute(QNetworkRequest::CacheSaveControlAttribute));
            setAttribute(QNetworkRequest::SourceIsFromCacheAttribute, m_reply->attribute(QNetworkRequest::SourceIsFromCacheAttribute));
            setAttribute(QNetworkRequest::DoNotBufferUploadDataAttribute, m_reply->attribute(QNetworkRequest::DoNotBufferUploadDataAttribute));
            emit metaDataChanged();
        }

        void errorInternal(QNetworkReply::NetworkError _error)
        {
            // take the message from the wrapped reply; the proxy has none of its own yet
            setError(_error, m_reply->errorString());
            emit error(_error);
        }

        void readInternal()
        {
            QByteArray data = m_reply->readAll();
            m_data += data;   // full copy, kept for interception
            m_buffer += data; // pending bytes for readData()
            emit readyRead();
        }

    protected:
        void writeDataPrivate()
        {
            QByteArray httpHeader;
            QList<QByteArray> headers = rawHeaderList();
            foreach(QByteArray header, headers) {
                if (header.toLower() == "content-encoding"
                    || header.toLower() == "transfer-encoding"
                    || header.toLower() == "content-length"
                    || header.toLower() == "connection")
                    continue;

                // special case for cookies.... we need to generate separate lines
                // QNetworkCookie::toRawForm is a bit broken and we have to do this
                // ourselves... some simple heuristic here..
                if (header.toLower() == "set-cookie") {
                    QList<QNetworkCookie> cookies = QNetworkCookie::parseCookies(rawHeader(header));
                    foreach (QNetworkCookie cookie, cookies) {
                        httpHeader += "set-cookie: " + cookie.toRawForm() + "\r\n";
                    }
                } else {
                    httpHeader += header + ": " + rawHeader(header) + "\r\n";
                }
            }
            httpHeader += "content-length: " + QByteArray::number(m_data.size()) + "\r\n";
            httpHeader += "\r\n";

            if(m_reply->error() != QNetworkReply::NoError) {
                qWarning() << "\tError with: " << this << url() << error();
                return;
            }

            const QByteArray origUrl = m_reply->url().toEncoded();
            const QByteArray strippedUrl = m_reply->url().toEncoded(QUrl::RemoveFragment | QUrl::RemoveQuery);
            Q_UNUSED(strippedUrl);
            interceptResource(origUrl, m_data, httpHeader, operation(), attribute(QNetworkRequest::HttpStatusCodeAttribute).toInt());
        }

        void interceptResource(const QByteArray& url, const QByteArray& data, const QByteArray& header, int operation, int response)
        {
            Q_UNUSED(header);
            Q_UNUSED(url);
            Q_UNUSED(operation);
            Q_UNUSED(response);
            emit resourceIntercepted(data);
        }

    private:
        QNetworkReply* m_reply;
        QByteArray m_data;
        QByteArray m_buffer;
    };
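
The header rebuilding in writeDataPrivate() skips four headers and appends a freshly computed content-length, presumably because the skipped headers describe the transfer rather than the bytes that were actually captured. A reduced, Qt-free sketch of that step follows; HeaderList, toLowerCopy and rebuildHeader are hypothetical names, and the set-cookie special case is deliberately left out.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Header name/value pairs in the order they were received. (Hypothetical type.)
using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Returns a lower-cased copy of `s`.
std::string toLowerCopy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Rebuilds a raw header block for a captured body of `bodySize` bytes:
// transfer-related headers are dropped and content-length is recomputed.
std::string rebuildHeader(const HeaderList& headers, std::size_t bodySize) {
    static const std::vector<std::string> skipped = {
        "content-encoding", "transfer-encoding", "content-length", "connection"};
    std::string out;
    for (const auto& header : headers) {
        const std::string name = toLowerCopy(header.first);
        if (std::find(skipped.begin(), skipped.end(), name) != skipped.end())
            continue;
        out += header.first + ": " + header.second + "\r\n";
    }
    out += "content-length: " + std::to_string(bodySize) + "\r\n";
    out += "\r\n";  // blank line terminates the header block
    return out;
}
```

In the class above the rebuilt header is passed to interceptResource() but not used there, so this matters only if you extend the interception to store headers alongside the data.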


I'll say right away that I cannot vouch for the correctness of the code above, but it works, and that is what matters most. In any case, responsibility for using this proxy safely rests solely on your shoulders.

Conclusion


I tested all of this on one commercial project, and it works. I hope this helps someone.

Thanks for your attention,
namespace

UPD: I have packaged this work as a project on GitHub: github.com/tucnak/qtwebkit-ri. Enjoy your meal.

Source: https://habr.com/ru/post/191476/

