
Get RSS / Atom feeds from any page

Happy New Year! While everyone else is out celebrating, I have been working on one of the more interesting problems (or tasks, depending on how you look at it) in my project. The setting: a system similar to Google Reader, which receives an address from the user and must show (and later, let the user subscribe to) the RSS feeds available there. The task is complicated by the fact that we cannot require the user to enter the full address of a feed, or even the exact address of a site or of a particular page: the address can be typed in many different ways, in whole or in part. A page can also expose more than one feed, often in several formats at once (or none at all). We therefore need to take the latest posts from every available feed and show them to the user, so that he can pick the one feed that interests him. Let me tell you a secret: yes, this is only the beginning, and in subsequent articles we will build a slightly scaled-down version of a news aggregation and reading system together. But today we will try to solve the first task, without which our "reader" simply cannot work, no matter what technology is used further on.

The foundation will be my favorite tool, Zend Framework (using the latest trunk version). If you are familiar with its capabilities, you will immediately suggest the Zend_Feed component, which has built-in support for extracting feeds from a page. But do not rush: in practice the task is not that simple, so we will solve it step by step.


URL normalization.
The user enters some address from which we have to extract all available feeds. The first barrier is that the standard component (the same Zend_Feed) can only work with full page addresses (or a correct link to the site root). More precisely, the limitation lies not in the component as a whole but in its feed discovery mechanism. That is, if we want to use automatic feed detection, we need to give it the full address of a page and nothing else. If the link already points directly at a feed, then, oddly enough, we get nothing as a result. The same happens if we enter the site address as, for example, www.abrdev.com or abrdev.com instead of the full URL with the protocol, http://abrdev.com. So the very first step is a simple check of whether our string begins with a protocol, "http://" or "https://". The current implementation of the Zend component can only work with these protocols. There is also a limitation with feeds that require authorization. In principle, if plain HTTP authentication is used, this is entirely solvable, but if something else is required the component is powerless, so we can only work with publicly available feeds.
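The prefix check described above can be sketched in a few lines of plain PHP. The helper name `hasHttpScheme` is hypothetical and not part of Zend Framework; it only illustrates the test, assuming the protocol must appear at the very start of the string.

```php
<?php
/**
 * Hypothetical helper (not part of Zend Framework): reports whether a
 * user-supplied string already starts with "http://" or "https://".
 */
function hasHttpScheme(string $address): bool
{
    // Anchor the pattern at the start; the "i" flag ignores letter case.
    return preg_match('#^https?://#i', trim($address)) === 1;
}

var_dump(hasHttpScheme('http://abrdev.com')); // bool(true)
var_dump(hasHttpScheme('www.abrdev.com'));    // bool(false)
```

Anchoring the match at the start matters: a bare `strpos(...) !== false` would also accept an address that merely contains "http://" somewhere in its query string.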

So, we need a function that accepts an arbitrary string, presumably the address of a site or feed, and always returns either false, if the string does not look like a URL, or a complete address including the protocol. For validation we use another framework component, Zend_Uri, which provides several tools for processing and checking URIs (Uniform Resource Identifiers).

First, we trust the user and try to use the supplied string as the address right away. If that does not work and Zend_Uri refuses to recognize it as a valid address, it throws an exception (or returns false if the address is simply malformed), which we catch, and then try to bring the string into a more correct form. If the second attempt also fails, we give up and return false, meaning that what the user entered is not a valid location for a feed.

    /**
     * Validate a URI and normalize it to a full address.
     *
     * @param  string $uri
     * @return boolean|string
     */
    public static function _validURI($uri)
    {
        if (empty($uri)) {
            return false;
        }
        // Note: strtolower() also lowercases the path and query part
        $uri = trim(strtolower($uri));

        try {
            // First attempt: take the string as a URI as it is
            $_uri = Zend_Uri::factory($uri);
            $res  = $_uri->valid($uri);
            if ($res === true) {
                // Everything is fine, return the full URL
                return $_uri->getUri();
            }
            return false;
        } catch (Zend_Uri_Exception $e) {
            // Perhaps only the protocol is missing?
            try {
                if ((strpos($uri, 'http://') === false)
                    && (strpos($uri, 'https://') === false)
                ) {
                    // $defailt_rss_scheme is a static property declared elsewhere in the class
                    $uri  = self::$defailt_rss_scheme . $uri;
                    $_uri = Zend_Uri::factory($uri);
                    if ($_uri->valid($uri)) {
                        return $_uri->getUri();
                    }
                }
                // The protocol is there, yet the address is still invalid
                return false;
            } catch (Zend_Uri_Exception $e) {
                return false;
            }
        }
    }
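To experiment with the same two-attempt strategy without the framework installed, here is a minimal sketch in plain PHP. The names `normalizeAddress`, `isHttpUrl` and `DEFAULT_SCHEME` are hypothetical, and `filter_var()` is only a stand-in for Zend_Uri: the two validators do not accept exactly the same set of strings.

```php
<?php
// Assumed default, playing the role of the class's default scheme property.
const DEFAULT_SCHEME = 'http://';

/** True if the candidate is a syntactically valid http(s) URL. */
function isHttpUrl(string $candidate): bool
{
    $scheme = parse_url($candidate, PHP_URL_SCHEME);

    return filter_var($candidate, FILTER_VALIDATE_URL) !== false
        && in_array(strtolower((string) $scheme), ['http', 'https'], true);
}

/**
 * Returns the full URL, or false if the input does not look like one.
 *
 * @return string|false
 */
function normalizeAddress(string $input)
{
    $input = trim($input);
    if ($input === '') {
        return false;
    }

    // First attempt: trust the user and take the string as it is.
    if (isHttpUrl($input)) {
        return $input;
    }

    // Second attempt: prepend the default scheme, but only if none is present.
    if (strpos($input, '://') === false && isHttpUrl(DEFAULT_SCHEME . $input)) {
        return DEFAULT_SCHEME . $input;
    }

    // Give up: this is not a usable feed location.
    return false;
}

var_dump(normalizeAddress('www.abrdev.com')); // string(21) "http://www.abrdev.com"
var_dump(normalizeAddress('not a url'));      // bool(false)
```

Unlike the method above, this sketch does not lowercase the input, so the case of the path is preserved.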
  41. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
  42. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
  43. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
  44. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
  45. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
  46. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
  47. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
  48. /** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .
/** * URI * * @param String $uri * @return boolean|string * @throw Zend_Uri_Exception */ public static function _validURI($uri) { if (empty($uri)) return false ; else $uri = trim(strtolower(uri)); try { // URI $_uri = Zend_Uri::factory($uri); $res = $_uri->valid($uri); if ($res === true ) { // , URL return $_uri->getUri(); } else return false ; } catch (Zend_Uri_Exception $e) { // ? try { if ( (strpos($uri, 'http://' ) === false ) || (strpos($uri, 'https://' ) === false ) ) { $uri = self::$defailt_rss_scheme . $uri; $_uri = Zend_Uri::factory($uri); if ($_uri->valid($uri)) return $_uri->getUri(); } else // , ? return false ; } catch (Zend_Uri_Exception $e) { return false ; } } } * This source code was highlighted with Source Code Highlighter .


So the first problem is solved: we can pass in an address in any form and get back either false, which signals an error, or a string with a full URL suitable for further processing. Note that only http and https URLs are supported, and only public ones. The latter is not checked at this stage, so it is worth warning the user in the address input interface: we have already had cases where users entered addresses of feeds that are available only after authorization, and when trying to access such a resource our server simply received the default authorization page.
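The scheme check itself does not depend on Zend. As a minimal sketch, assuming the built-in filter_var() is an acceptable stand-in for Zend_Uri validation, the same idea could look like this. The function name normalizeUrl and its $defaultScheme parameter are hypothetical and not part of the original code; unlike the method above, this sketch does not lowercase the address.

```php
<?php
// Hypothetical helper: a stdlib-only stand-in for the Zend_Uri-based check.
function normalizeUrl($input, $defaultScheme = 'http://')
{
    $url = trim((string) $input);
    if ($url === '') {
        return false;
    }
    if (preg_match('#^([a-z][a-z0-9+.\-]*)://#i', $url, $m)) {
        // An explicit scheme is present: accept only http and https.
        if (!in_array(strtolower($m[1]), array('http', 'https'), true)) {
            return false;
        }
    } else {
        // No scheme at all (e.g. "ajaxian.com"): assume the default one.
        $url = $defaultScheme . $url;
    }
    // filter_var() returns false when the string is not a valid URL.
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : false;
}
```

Anchoring the pattern at the start of the string avoids treating an address that merely contains "http://" somewhere in its query string as already having a scheme.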

Getting direct links to feeds.

Now the next step: we need to get direct links to all the feeds we can find at the address the user specified. Keep in mind that there may be no feed at all, or there may be several, either different feeds or the same feed in different formats. Plenty of feed formats and specifications have accumulated over time, and unusual, very complicated feeds are not rare (so far the CNBC feed has turned out to be the most difficult, which was in fact the reason for rewriting the old feed processing system). Fortunately, the Zend developers have already made sure that the components expose format-independent interfaces, so the developer is shielded from the nuances of the specifications.

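So at this stage three outcomes are possible: the page links to one or more feeds, the address is itself a direct link to a feed, or there are no feeds at the address at all.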

Here, for example, is the most difficult feed: http://www.cnbc.com/id/19789731/device/rss/rss.xml . I have not fully understood why the Zend_Feed component cannot cope with it. I may be mistaken, but I believe it is related to the attached stylesheets, which for some reason were applied automatically during processing, so the output was not XML but a regular HTML page (this may be wrong; if anyone can figure it out, write in the comments). So I had to try a new component, Zend_Feed_Reader, which handled it easily.

My system will work with a stream of links, so feeds are likely to be duplicated. And a person may simply enter the same address again. Since finding and processing feeds is a lengthy operation that involves network access to a remote resource, I would like to take as much load off the server as possible. The caching support built into Zend_Feed_Reader helps here. At the data collection stage we do not need strictly up-to-date news anyway: if we show the client the last 10 entries to confirm a subscription and they are, say, an hour old rather than the very latest, nothing bad happens. In addition, if the server that serves the feed sends correct caching headers, our cache is checked and refreshed automatically. This significantly reduces the load in the case of mass subscriptions to a standard set of feeds (it is no secret that a new user is most likely to subscribe to popular feeds that someone has already viewed before, which means the feed will already be in the cache).

$cache = Zend_Cache::factory('Core',
    'File',
    // frontend options
    array(
        'lifetime' => 24 * 3600,
        'automatic_serialization' => true,
        'caching' => true,
        'cache_id_prefix' => 'preview_feed_',
        'write_control' => true,
        'ignore_user_abort' => true
    ),
    // backend options
    array(
        'read_control_type' => 'adler32',
        'cache_dir' => '/tmp/cache'
    ));
Zend_Feed_Reader::setCache($cache);
Zend_Feed_Reader::useHttpConditionalGet(true);
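To illustrate what a conditional GET involves at the HTTP level, here is a minimal sketch in plain PHP. It is not how Zend_Feed_Reader is implemented; the helper names and the 'etag' / 'last_modified' array keys are assumptions made for this example. The idea is that the validators saved from the previous response are sent back, and a 304 reply means the cached copy can be reused.

```php
<?php
// Hypothetical helper: builds validator headers from what was cached earlier.
function buildConditionalHeaders(array $cached)
{
    $headers = array();
    if (!empty($cached['etag'])) {
        // ETag value exactly as the server sent it, quotes included
        $headers[] = 'If-None-Match: ' . $cached['etag'];
    }
    if (!empty($cached['last_modified'])) {
        // Last-Modified value exactly as the server sent it
        $headers[] = 'If-Modified-Since: ' . $cached['last_modified'];
    }
    return $headers;
}

// Hypothetical helper: 304 Not Modified means the cached body is still current.
function canReuseCache($statusCode)
{
    return (int) $statusCode === 304;
}
```

When the server supports neither validator, the request degrades to an ordinary GET, so only the cache lifetime limits how stale the preview can be.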


Now we can proceed to the processing itself. I ran the test script on a small array of addresses, so it was convenient to collect the result in an array keyed by domain name; that is why I first checked all the links once more and extracted the domain from each (using Zend_Uri_Http). In a real system this is most likely unnecessary, since we will process one address at a time.

For example, take the following list, picked at random:

$_url = array(
    'http://www.cnbc.com/id/19789731/device/rss/rss.xml',
    'http://www.planet-php.net/',
    'ajaxian.com',
    'http://twitter.com/abrdev',
    'http://verens.com/archives/2009/12/28/multiple-file-uploads-using-html5/');


Then we pass it through the validator described above and get an array of full URLs.

// an array of links that are ready for processing (valid URIs)
$_links = array();
echo "Checking URL ... <br />";
foreach ($_url as $u)
{
    echo "Original URL: " . $u . " ... <br />";
    // use a separate variable so the source array $_url is not overwritten
    $_valid = self::_validURI($u);
    if ($_valid === false) continue;
    else
        $_links[] = $_valid;
}


Now we build the skeleton of the results array: at first it holds just the feed links for each given address, and later the latest entries from each feed are added to it as well.

$_feeds_links = array();
foreach ($_links as $fl)
{
    // try to extract the host from the given URL
    try
    {
        $_lhttp = Zend_Uri_Http::fromString($fl);
        if ($_lhttp->valid())
        {
            // check passed, get the site name
            $site = $_lhttp->getHost();
            $_feeds_links[$site] = array();
        }
        else
            // if the check failed, skip this address
            continue;
    }
    catch (Zend_Uri_Exception $e) { continue; }
    // the loop body continues in the next listing


Then we try to extract all feeds from each address in turn. Using Zend_Feed_Reader, we look for feeds on the page; they are returned as a Zend_Feed_Reader_FeedSet object, which is in effect just an array (more precisely, the object implements the necessary interfaces, so you can work with it like a regular array). If there are feeds, we iterate over all of them and extract the href property, which contains the direct link. If no feeds are found at the given address (this covers both plain pages without feeds and direct feed addresses, which are also reported as having no feeds), we assume that this may be a direct address and try to load the feed directly. If that attempt fails too, we conclude that, alas, there are no feeds at this address and move on to the next address in the list.

    try
    {
        $_ln = Zend_Feed_Reader::findFeedLinks($fl);
        if (($_ln instanceof Zend_Feed_Reader_FeedSet) && (count($_ln) > 0))
        {
            $tmp = array();
            foreach ($_ln as $cf)
            {
                // $cf holds an entry for each feed found, Zend_Feed_Reader_FeedSet;
                // it inherits from ArrayObject, and the field
                // interesting to us is 'href', the link to the feed
                $tmp[] = $cf['href'];
            }
            // the same feed may be listed more than once, so remove duplicates
            if (!empty($tmp))
            {
                $_feeds_links[$site] = array_unique($tmp);
            }
        }
        else
        {
            // it may be a direct link to a feed;
            // to find out, we have to try to download the document
            try
            {
                $_tmp_feed = Zend_Feed_Reader::import($fl);
                // we do not know the format in advance
                if ($_tmp_feed instanceof Zend_Feed_Reader_FeedAbstract)
                {
                    // yes, this is a normal feed and it is already in the cache,
                    // so just store the address that was given (it may be a proxy service).
                    // Practice has shown that getFeedLink()
                    // sometimes does not give the desired result, e.g. for the CNBC feed
                    $_feeds_links[$site][] = $fl;
                    continue;
                }
                else
                    throw new Zend_Exception('Bad feed');
            }
            catch (Zend_Exception $e)
            {
                // definitely no feeds here
                echo "<br /><b>" . $fl . "</b> == No feeds found!<br />";
                continue;
            }
        }
    }
    catch (Zend_Exception $e)
    {
        continue;
    }
} // end of foreach ($_links as $fl)


Note that when we try to load a feed directly, we do not know what its format will be, so to check the result we rely on the fact that all feed classes share a common ancestor, the abstract class Zend_Feed_Reader_FeedAbstract. This case also involves some duplicated work, since later we load the same feed again to get its latest entries. But because we use caching, the data for direct links will already be in the cache, so no second request is made.
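The page-scanning step used earlier relies on feed autodiscovery, where a page advertises its feeds through link elements with rel="alternate". The following sketch shows the general idea with PHP's DOM extension; it is an illustration under that assumption, not the actual Zend_Feed_Reader code, and the function name findFeedLinksInHtml is hypothetical.

```php
<?php
// Hypothetical helper: collects feed URLs advertised in an HTML document.
function findFeedLinksInHtml($html)
{
    if (trim((string) $html) === '') {
        return array();
    }
    $feedTypes = array(
        'application/rss+xml',
        'application/atom+xml',
        'application/rdf+xml',
    );
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = array();
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rel  = strtolower(trim($link->getAttribute('rel')));
        $type = strtolower(trim($link->getAttribute('type')));
        $href = trim($link->getAttribute('href'));
        if ($rel === 'alternate' && in_array($type, $feedTypes, true) && $href !== '') {
            $found[] = $href;
        }
    }
    // The same feed is often listed twice, so drop duplicates and reindex.
    return array_values(array_unique($found));
}
```

This also explains why a direct feed address yields nothing at the discovery step: an RSS or Atom document has no such link elements in an HTML head, so the fallback import is needed.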

Getting the latest feed entries.

To let the user choose among several feeds, or simply to show what kind of feed he will be reading after subscribing, we select the last 10 entries and show them to the user along with the subscription address. We do not need the whole entry here, so we limit ourselves to the title and the link. Initially I also wanted to fetch other information about the feed, such as the description, the list of authors or the copyright, but it turned out that in many feeds these fields are simply missing (empty), so we restrict ourselves to the title.

If we hit an error at this stage, we simply skip the feed: at best the page will have another feed in a different format, at worst we will find nothing. Once the feed is imported, we get its title and then, in a loop, the last 10 entries, taking the link, title and creation date of each (the date is always in GMT). In the test example I build a string right away; in a real system you would most likely store each component separately, bring the time to a single standard (for example, taking the user's current locale into account) and convert it to a UNIX timestamp for ease of processing.

echo '<br /><br />Retrieving last feed items...<br />';
$_feeds_items = array(); // entries of each feed
$_item_per_feed = 10;    // how many entries to pull from each feed
foreach ($_feeds_links as $_flinks)
{
    if (count($_flinks) > 0)
    {
        foreach ($_flinks as $fl)
        {
            try
            {
                $_x_feed = Zend_Feed_Reader::import($fl);
                // can be either Atom or RSS,
                // so we check against the abstract ancestor class
                if ($_x_feed instanceof Zend_Feed_Reader_FeedAbstract)
                {
                    $tmpx = array('title' => null, 'items' => array());
                    $tmpx['title'] = htmlspecialchars($_x_feed->getTitle(), ENT_QUOTES);
                    $i = 0;
                    foreach ($_x_feed as $fitm)
                    {
                        if ($i < $_item_per_feed)
                        {
                            $i++;
                            // get the title, link and date (in GMT)
                            // GUID - md5(getId());
                            // some entries carry no date, so guard against null
                            $_date = $fitm->getDateCreated();
                            $_when = ($_date !== null) ? $_date->toString() : 'unknown date';
                            $tmpx['items'][] = '<a href="' . htmlspecialchars($fitm->getLink(), ENT_QUOTES) . '" target="_blank">' . htmlspecialchars($fitm->getTitle(), ENT_QUOTES) . '</a> at ' . $_when . '<br />';
                        }
                        else break;
                    }
                    $_feeds_items[$fl] = $tmpx;
                }
            }
            catch (Zend_Exception $e) { continue; }
        }
    }
}
// see the result
var_dump($_feeds_items);
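As noted before the listing, a real system would more likely keep the parts of each entry separate and store the date as a UNIX timestamp. A minimal sketch of that idea follows; the function name buildItemRecord and the array keys are assumptions for this example, and the date is taken as a plain string rather than the date object the feed reader returns.

```php
<?php
// Hypothetical helper: keeps the parts of an entry separate instead of
// building an HTML string, and stores the date as a UNIX timestamp.
function buildItemRecord($title, $link, $dateString)
{
    // strtotime() handles the common RFC 822 and ISO 8601 date forms
    // and returns false when the string cannot be parsed.
    $timestamp = strtotime((string) $dateString);
    return array(
        'title'   => htmlspecialchars((string) $title, ENT_QUOTES, 'UTF-8'),
        'link'    => (string) $link,
        'created' => ($timestamp === false) ? null : $timestamp,
    );
}
```

Keeping the timestamp numeric leaves formatting for the user's locale and time zone to the presentation layer.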


For the time being we simply output the result to the browser through var_dump (after all, this is just a test script). In a real system all this data is packed into a JSON array and sent to the client, which displays it to the user and lets him choose one of the feeds to subscribe to. Of course, it would be possible to make the choice for the user: for example, when several feeds differ only in format, compare the entry IDs and, if they match, just take the preferred format. But that already depends on the specifics of a particular task.

That's all. Of course, the code above is just an illustration and is not intended for real use (especially by copy-paste). In the future we will continue this topic and try to write a real server-side news aggregator with a Web 2.0 AJAX interface and real-time delivery of new entries (via Comet), and also build a server platform for distributed background processing of news feeds (since there may be many feeds, each with its own polling frequency settings).
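The idea of choosing a format on the user's behalf can be sketched as follows. This is only an illustration under strong assumptions: the entry IDs of the RSS and Atom versions of a feed are assumed to match exactly, which real feeds do not guarantee, and the function name, the input layout and the default preference order are all hypothetical.

```php
<?php
// Hypothetical helper. $feeds maps a feed URL to
// array('format' => 'rss'|'atom'|..., 'ids' => array of entry IDs).
// Feeds with identical sets of entry IDs are treated as one feed
// published in several formats; one URL is kept per such group.
function pickPreferredFeeds(array $feeds, array $preference = array('atom', 'rss'))
{
    $groups = array();
    foreach ($feeds as $url => $feed) {
        $ids = $feed['ids'];
        sort($ids); // the order of entries must not affect the comparison
        $key = md5(implode("\n", $ids));
        $groups[$key][$url] = $feed['format'];
    }

    $chosen = array();
    foreach ($groups as $candidates) {
        $best = null;
        $bestRank = PHP_INT_MAX;
        foreach ($candidates as $url => $format) {
            $rank = array_search($format, $preference, true);
            // Unknown formats rank after all listed ones.
            $rank = ($rank === false) ? count($preference) : $rank;
            if ($rank < $bestRank) {
                $best = $url;
                $bestRank = $rank;
            }
        }
        $chosen[] = $best;
    }
    return $chosen;
}
```

The resulting list can then be passed to json_encode() together with the entry previews and sent to the client.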

Source: https://habr.com/ru/post/79879/

