
Parsing URLs in Zend Framework 2

Task:
  1. Write a method that parses a string containing a URL. The string may contain either an absolute or a relative URL, and both must be parsed correctly.
  2. In addition, the string may contain an absolute link in a “wrong” format, without the “http://” prefix. From here on, links in this “wrong” format are called incomplete absolute links.
  3. Implement support for "RF" (.рф) domains.


Example:  http://site.ru/page.php   /page.php   site.ru/page.php
scheme    http
host      site.ru                               site.ru
path      page.php                  page.php    page.php

Implementation of points 1 and 2 of our task


The Zend\Uri\Http class should help us with this. It has the methods we need: parse($uri), getHost(), getPath(), etc.
But! When parsing a URL like “site.ru/page.php” (without “http://”), getHost() returns an empty string, and getPath() returns “site.ru/page.php”.
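
The same behaviour can be reproduced without ZF2. The snippet below is only an illustration and uses PHP's built-in parse_url() as a stand-in for Zend\Uri\Http (an assumption made for this example, not the article's code): without a scheme or a leading "//", the whole string ends up in the path.

```php
<?php
// Stand-in for Zend\Uri\Http: PHP's built-in parse_url().
// The example URLs are the ones used in this article.
$complete   = parse_url('http://site.ru/page.php');
$incomplete = parse_url('site.ru/page.php');

// Complete absolute URL: scheme, host and path are separated.
echo $complete['scheme'], PHP_EOL;   // http
echo $complete['host'], PHP_EOL;     // site.ru
echo $complete['path'], PHP_EOL;     // /page.php

// Incomplete absolute URL: no host is detected,
// the whole string is treated as a path.
var_dump(isset($incomplete['host']));  // bool(false)
echo $incomplete['path'], PHP_EOL;     // site.ru/page.php
```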

Here is my way to achieve the desired result. An incomplete absolute link has the same format as a relative reference, so the parser cannot tell them apart on its own. You can recognize an incomplete absolute link by checking its TLD (top-level domain). If such a domain exists, the link can be considered incomplete absolute.

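    // Imports needed at the top of the file:
    //   use Application\Other\Http;   // the extended Zend\Uri\Http described below
    //   use Zend\Validator\Hostname;

    public function myParse($url)
    {
        $Http = new Http($url);

        if ($Http->isValidRelative()) {
            // The URL is relative.
            $path = (string) $Http->getPath();

            // The path does not start with "/", so this may be
            // an "incomplete" absolute link.
            if ($path !== '' && $path[0] !== '/') {
                // Turn it into an absolute URL ...
                $absoluteUrl = '//' . urldecode($Http->toString());
                $absoluteHttp = new Http($absoluteUrl);

                // (1)
                $Hostname = new Hostname(array(
                    'allow'       => Hostname::ALLOW_DNS,
                    'useTldCheck' => false,
                ));

                $decode = true;
                // ... and validate its host (2)
                if ($Hostname->isValid($absoluteHttp->getHost($decode))) {
                    // The host is valid, so the link is "incomplete" absolute.
                    $Http = $absoluteHttp;
                }
            }
        }

        return $Http;
    }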

Comments on the code

  1. We configure Zend\Validator\Hostname to check that the link's top-level domain is present in $validIdns.
  2. We pass $decode = true to the getHost() method in order to decode the host. The getHost() method of the stock Zend\Uri\Http class takes no parameters and does not decode anything! So why, and how, does this work? Read on.
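
For experimenting without ZF2, the heuristic behind myParse() can be sketched in plain PHP. This is a simplified illustration, not the implementation above: parseLooseUrl(), hasKnownTld() and the KNOWN_TLDS list are hypothetical names invented for this example, parse_url() stands in for Zend\Uri\Http, and the tiny TLD list stands in for the Hostname validator.

```php
<?php
// Illustrative list only (an assumption for this sketch);
// a real check needs a complete list of top-level domains.
const KNOWN_TLDS = ['ru', 'com', 'org', 'net'];

// Hypothetical helper: true if the host has at least two labels
// and its last label is in KNOWN_TLDS.
function hasKnownTld(string $host): bool
{
    $labels = explode('.', strtolower($host));
    return count($labels) > 1 && in_array(end($labels), KNOWN_TLDS, true);
}

// Hypothetical helper: parses complete, relative and
// "incomplete absolute" URLs into parse_url()-style arrays.
function parseLooseUrl(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        return [];
    }

    $path = $parts['path'] ?? '';
    $isRelative = !isset($parts['scheme']) && !isset($parts['host']);

    // A relative reference whose path does not start with "/"
    // may be an incomplete absolute link.
    if ($isRelative && $path !== '' && $path[0] !== '/') {
        // Prefix "//" so that the first path segment is parsed as a host.
        $candidate = parse_url('//' . $url);
        if (is_array($candidate)
            && isset($candidate['host'])
            && hasKnownTld($candidate['host'])
        ) {
            return $candidate;
        }
    }

    return $parts;
}

$incomplete = parseLooseUrl('site.ru/page.php');
$relative   = parseLooseUrl('page.php');

echo $incomplete['host'], ' ', $incomplete['path'], PHP_EOL;  // site.ru /page.php
echo $relative['path'], PHP_EOL;                              // page.php
```

Note that "page.php" stays relative only because "php" is not in the TLD list; without some TLD check, any file name containing a dot would be taken for a host.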

Implementation of point 3 of our task: "RF" (.рф) IDN domains and how to work with them


Unfortunately, ZF2 does not really work with IDNs, so we have to compensate for that ourselves. To do so, download any class you like that encodes and decodes URLs with Punycode, and extend the Zend\Uri\Http class.
    namespace Application\Other;

    use Zend\Uri\Http as ZendHttp;
    use Application\Model\IdnaConvert;

    class Http extends ZendHttp
    {
        public function setHost($host)
        {
            if ($host) {
                $idn = new IdnaConvert();
                $host = $idn->encode($host);
            }
            return parent::setHost($host);
        }

        public function getHost($decode = false)
        {
            if ($decode && $this->host) {
                $idn = new IdnaConvert();
                return $idn->decode($this->host);
            }
            return parent::getHost();
        }
    }

Accordingly, our myParse() method should use the extended Http class, which encodes RF domains to Punycode when parsing a URL. When calling getHost($decode), we get back either the Punycode representation or the decoded representation, depending on the parameter passed to the method.
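
To see what the encode/decode round trip looks like, here is a minimal sketch that assumes PHP's intl extension is installed. Its idn_to_ascii() and idn_to_utf8() functions are used as a stand-in for the IdnaConvert class; encodeHost() and decodeHost() are hypothetical helpers that mirror what setHost() and getHost(true) do above, and "сайт.рф" is just a sample domain.

```php
<?php
// Assumption: the intl extension is available.
if (!extension_loaded('intl')) {
    exit("The intl extension is required for this example\n");
}

// Hypothetical helper mirroring setHost(): Unicode host -> Punycode.
function encodeHost(string $host): string
{
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $ascii === false ? $host : $ascii;
}

// Hypothetical helper mirroring getHost(true): Punycode -> Unicode.
function decodeHost(string $host): string
{
    $unicode = idn_to_utf8($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $unicode === false ? $host : $unicode;
}

$stored = encodeHost('сайт.рф');  // ASCII form; the .рф zone becomes xn--p1ai
$shown  = decodeHost($stored);    // back to the Unicode form

echo $stored, PHP_EOL;
echo $shown, PHP_EOL;
```

Storing the ASCII form and decoding only on request keeps the URI object valid for code that expects plain DNS host names.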

P.S. I have doubts about the quality of the above, but that is one of the reasons for publishing this post: to hear the opinion of those who are more experienced with ZF2. The other reason is that I could not find a solution to this seemingly obvious problem anywhere. Perhaps you can tell me about other, simpler and more competent options.

Source: https://habr.com/ru/post/198614/

