
While looking into the topic named in the title, I found that it is poorly covered on the Russian-language web (Runet), even though a lot of time has passed since the protocol was presented. I want to fill this small gap a little by sharing my experience.
As a brief reminder, PubSubHubbub (PuSH) is a protocol proposed by Google and designed to make the delivery of data over RSS feeds from publishers to subscribers more efficient. The central role in the scheme belongs to independent hubs that act as intermediaries between the data sources and their final recipients. A hub notifies all subscribers registered with it for a feed as soon as new data appears, and delivers the new portion of data together with the notification.
Thus, if you are writing an application that processes RSS or Atom feeds, you can make your life much easier by handing the dirty work over to a hub. The specific advantages of this scheme are:
- the ability to merge several external feeds into a single data stream of a common format arriving at the application's input: the hub can take care of this;
- no need to separate new data from old: the hub delivers only new data;
- no need to poll the feed constantly for new data: the hub itself reports when there is something new;
- minimal delay between publication and the notification of your application.
In other words, you get prompt delivery of data while saving significantly on both incoming traffic and the application's CPU time. For applications on App Engine, which are limited by quotas, these points can be decisive. You also save your own time, because you only have to write a small amount of simple code.
Below are the minimum necessary Java code snippets, which I successfully tested against one of the hubs. There is not much code, and it is simple.
So, we are talking about a subscriber application that receives data from a hub. According to the protocol, the interaction between the subscriber and the hub goes as follows:
- a subscription request is sent to the hub with the feed address and the subscriber address;
- the hub checks the feed and sends the subscriber a request to confirm the subscription;
- the subscriber confirms the subscription;
- the hub notifies the subscriber and delivers new data to it as the data appears in the feed;
- after a certain time, the hub asks the subscriber to confirm the subscription again.
This scenario means that our minimal application must implement a servlet that can:
- confirm the subscription in response to a hub request;
- accept the next request carrying a portion of new data.
In addition, it may have a function that performs the subscription request itself.
Subscription request
The hubs I tried let you request a subscription manually through a web interface, so this procedure is not strictly required within the application.
When requesting a subscription, you must give the hub the values of four required parameters:
- subscriber URL (hub.callback): the address of the application servlet that the hub will interact with;
- request type (hub.mode): the desired action, either subscribing or unsubscribing (subscribe / unsubscribe);
- feed URL (hub.topic): the address of the feed whose messages you want to receive;
- verification method (hub.verify): tells the hub whether it should request confirmation of the subscription immediately (synchronously) or later (asynchronously) (sync / async).
In addition, the hub can support optional parameters, such as:
- subscription duration (hub.lease_seconds): how long, in seconds, we want to keep receiving messages from the feed;
- secret string (hub.secret): passed if the messages received by the subscriber need to be authenticated (the hub uses it to compute an HMAC of the delivered content and signs its messages with it);
- verification token (hub.verify_token): if specified, it is passed back as a parameter in the confirmation request, so that the subscriber application can check that it is confirming a subscription it actually requested.
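The required and optional parameters listed above can be assembled into a form-encoded request body roughly as follows. This is only a sketch: the class name SubscriptionForm and its methods are made up for illustration, and whether a particular hub honours the optional parameters is an assumption you should check against that hub.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of any library): builds the body of a subscription request.
final class SubscriptionForm {

    /** Encodes the parameters as an application/x-www-form-urlencoded string. */
    static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> p : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(p.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(p.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    /** Required parameters are always sent; optional ones are skipped when null. */
    static String subscribeBody(String callback, String topic, String verify,
                                Integer leaseSeconds, String secret, String verifyToken) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("hub.callback", callback);
        params.put("hub.mode", "subscribe");
        params.put("hub.topic", topic);
        params.put("hub.verify", verify);
        if (leaseSeconds != null) {
            params.put("hub.lease_seconds", leaseSeconds.toString());
        }
        if (secret != null) {
            params.put("hub.secret", secret);
        }
        if (verifyToken != null) {
            params.put("hub.verify_token", verifyToken);
        }
        return encode(params);
    }
}
```

A LinkedHashMap is used only to keep the parameters in a predictable order; the protocol itself does not depend on the order.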
If you are happy with the manual subscription mode, you can skip to the next section.
However, the application may need to subscribe on its own. Here is an example of a function that implements this procedure:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Base64; // standard encoder (Java 8+), used here instead of the App Engine repackaged class
// ..
public static void pshbSubscribe(String callback, String mode, String topic, String verify) throws IOException {
    // Turn each argument into an encoded "name=value" pair.
    callback = URLEncoder.encode("hub.callback", "UTF-8") + "=" + URLEncoder.encode(callback, "UTF-8");
    mode = URLEncoder.encode("hub.mode", "UTF-8") + "=" + URLEncoder.encode(mode, "UTF-8");
    topic = URLEncoder.encode("hub.topic", "UTF-8") + "=" + URLEncoder.encode(topic, "UTF-8");
    verify = URLEncoder.encode("hub.verify", "UTF-8") + "=" + URLEncoder.encode(verify, "UTF-8");
    String body = callback + "&" + mode + "&" + topic + "&" + verify;
    // Placeholder hub address; the http:// scheme is assumed.
    URL url = new URL("http://myhub.com/hubbub");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoOutput(true);
    connection.setRequestMethod("POST");
    connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    // HTTP Basic Authentication with the hub user's name and password.
    connection.setRequestProperty("Authorization",
        "Basic " + Base64.getEncoder().encodeToString("myname:mypwd".getBytes("UTF-8")));
    OutputStreamWriter writer = new OutputStreamWriter(connection.getOutputStream(), "UTF-8");
    writer.write(body);
    writer.close();
    // 204 means the subscription is verified; 202 means asynchronous verification is pending.
    int code = connection.getResponseCode();
    if (code != HttpURLConnection.HTTP_NO_CONTENT && code != HttpURLConnection.HTTP_ACCEPTED) {
        // error
        // ..
    }
}
According to the protocol, a subscription request is a POST request to the address provided by the hub ("myhub.com/hubbub") in the standard format used for submitting form values (the "Content-Type" is "application/x-www-form-urlencoded"). The parameters described above are passed in the request body.
The hub I tested the code against requires prior registration, and subscription requests must be authenticated (HTTP Basic Authentication). Hence the "Authorization" header with the hub user's name and password ("myname:mypwd"). As far as I understand, this is specific to that particular hub.
If the subscription succeeds, the hub should return 204 ("No Content"), or 202 ("Accepted") in the case of asynchronous verification (if hub.verify was "async").
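The status check described here can be written as a small helper. This is a sketch with a made-up class name (SubscribeResponse); treating 202 as a failure when synchronous verification was requested is my own assumption, not something the text above requires.

```java
import java.net.HttpURLConnection;

// Hypothetical helper: interprets the status code of the hub's reply to a subscription request.
final class SubscribeResponse {

    /**
     * Returns true if the hub accepted the request.
     * 204 means the subscription is already verified;
     * 202 is accepted only when asynchronous verification was requested (assumption).
     */
    static boolean isAccepted(int status, String verifyMode) {
        if (status == HttpURLConnection.HTTP_NO_CONTENT) {
            return true;
        }
        return status == HttpURLConnection.HTTP_ACCEPTED && "async".equals(verifyMode);
    }
}
```

With 202 the subscription is not active yet: the hub will still send its confirmation request to the callback later.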
A subscription call might then look like this:
pshbSubscribe("http://myapp.appspot.com/subscribe", "subscribe", "http://habrahabr.ru/rss/blogs/java", "sync");
The first parameter is the address of the application servlet. Let us now look at how this servlet works.
Subscription confirmation
After receiving the subscription request, the hub must request confirmation by sending a GET request to the callback address it was given. In our example, this is "myapp.appspot.com/subscribe". The application must implement a servlet at this address that responds to the request:
import java.io.IOException;
import javax.servlet.http.*;
// ..
@SuppressWarnings("serial")
public class SubscribeServlet extends HttpServlet {
    // ..
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.setStatus(200);
        // Confirm the request by echoing hub.challenge back in the response body.
        String challenge = req.getParameter("hub.challenge");
        if (req.getParameter("hub.mode") != null && challenge != null) {
            resp.getOutputStream().print(challenge);
            resp.getOutputStream().flush();
        }
    }
    // ..
}
In the request, the hub passes several parameters whose meaning is the same as in the subscription request:
- hub.mode: request type (subscribe / unsubscribe);
- hub.topic: URL of the subscribed feed;
- hub.verify_token: verification token (present if it was passed in the subscription request).
If the parameter values are acceptable (they match the request), then to confirm the subscription (or its cancellation) you must return a 2xx code and put the value of one more parameter, hub.challenge, in the response body.
If we do not want to confirm the request, we return 404 ("Not Found").
If the subscriber returns any other code (3xx, 4xx, 5xx), the hub will decide that we have problems and that verification failed.
If the response body differs from the hub.challenge value, the hub will also consider verification failed.
If the asynchronous method is used, then on failure (a 3xx, 4xx or 5xx response, or a response body that does not match hub.challenge) the hub should try to request confirmation again.
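The rules above can be reduced to one decision: echo hub.challenge with a 2xx code when the parameters match what we asked for, otherwise answer 404. The sketch below keeps that decision separate from the servlet API so it can be checked on its own; VerificationCheck, Reply and decide are hypothetical names, and where the expected values come from (for example, a record saved when subscribing) is left to the application.

```java
// Hypothetical helper: decides how to answer the hub's confirmation (GET) request.
final class VerificationCheck {

    /** The HTTP status and response body to send back to the hub. */
    static final class Reply {
        final int status;
        final String body;

        Reply(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /**
     * Confirms the request only if the mode and topic match what was requested and,
     * when a token was sent with the subscription request, the token matches too.
     */
    static Reply decide(String mode, String topic, String verifyToken, String challenge,
                        String expectedMode, String expectedTopic, String expectedToken) {
        boolean matches = challenge != null
                && expectedMode.equals(mode)
                && expectedTopic.equals(topic)
                && (expectedToken == null || expectedToken.equals(verifyToken));
        // 200 with the challenge confirms; 404 with an empty body refuses.
        return matches ? new Reply(200, challenge) : new Reply(404, "");
    }
}
```

In a servlet, the first four arguments would come from req.getParameter("hub.mode"), "hub.topic", "hub.verify_token" and "hub.challenge", and the reply would be written with resp.setStatus and resp.getOutputStream().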
Receive data from the hub
When the hub finds that it has new data for the subscriber, it sends a POST request to the callback address provided by the subscriber. The request body carries this data in RSS or Atom format (the "Content-Type" will be "application/rss+xml" or "application/atom+xml"). To handle the request, our servlet has the following method:
// Requires java.util.List, java.net.URL and the Rome classes
// (SyndFeedInput, SyndFeed, SyndEntry, SyndContent, XmlReader, FeedException).
public void doPost(HttpServletRequest req, HttpServletResponse resp)
        throws IOException {
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed;
    try {
        feed = input.build(new XmlReader(req.getInputStream()));
    } catch (FeedException e) {
        // build() reports a malformed feed with FeedException; pass it on as an I/O error.
        throw new IOException(e);
    }
    @SuppressWarnings("unchecked")
    List<SyndEntry> entriesList = feed.getEntries();
    for (SyndEntry entry : entriesList) {
        String title = entry.getTitle();
        String author = entry.getAuthor();
        URL url = new URL(entry.getLink());
        @SuppressWarnings("unchecked")
        List<SyndContent> contentsList = entry.getContents();
        // ..
    }
    // ..
    resp.setStatus(204);
}
This example uses classes from the Rome library for working with feeds (SyndFeedInput, SyndFeed, SyndEntry, ...) to parse the data. An example of similar code used to solve a specific problem (forwarding data received from a hub over XMPP) can be found here.
If the hub.secret parameter was set when subscribing, the request will arrive with an "X-Hub-Signature" header whose value has the form "sha1=signature", where 'signature' is the HMAC (SHA1) computed over the content of the request body. To verify the authenticity of the message, the application must compute the HMAC of the request body itself using the known hub.secret. If the result matches 'signature', the message is authentic. Read more here.
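The signature check can be sketched with the standard javax.crypto classes. The class and method names (HubSignature, hmacSha1Hex, isValid) are made up for illustration, and the sketch assumes the signature is sent as a hexadecimal string and that the secret is encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: checks the X-Hub-Signature header against the raw request body.
final class HubSignature {

    /** Computes the HMAC-SHA1 of the body with the given secret, as lowercase hex. */
    static String hmacSha1Hex(byte[] body, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] raw = mac.doFinal(body);
            StringBuilder hex = new StringBuilder(raw.length * 2);
            for (byte b : raw) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    }

    /** Returns true if the header has the form "sha1=<hex>" and the hex matches our own HMAC. */
    static boolean isValid(String header, byte[] body, String secret) {
        if (header == null || !header.startsWith("sha1=")) {
            return false;
        }
        String received = header.substring("sha1=".length()).toLowerCase();
        String expected = hmacSha1Hex(body, secret);
        // MessageDigest.isEqual avoids stopping at the first differing character.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                received.getBytes(StandardCharsets.UTF_8));
    }
}
```

The HMAC has to be computed over the request body exactly as it arrived, so in a servlet the body should be read into a byte array once and then both verified and parsed from that array.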
If the message is received successfully, you must return a 2xx code regardless of the result of the "X-Hub-Signature" check. Otherwise, the hub should retry the request for a reasonable period of time until it receives a success code.
References: