
Sitemap.xml, or “There was nothing to do...”

The New Year's holidays of the year 666 + 666 + 666 + 6 + 6 + 6 are in full swing. I don't feel like taking on anything serious, but it is a good time for all the little things I usually never get around to. For me, that little thing was generating Sitemap.xml files.
Sitemap.xml is a file that lists, in a special format, the links to the website pages that search engines should index. Comprehensive information about the format can be found at Sitemaps.org.
I have long wanted a convenient tool for generating these files.
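For readers who have not seen the format, here is a minimal sketch of what such a file contains. The `render_sitemap` helper, the URLs, and the timestamp are made up for illustration and are not taken from the scripts discussed in this article; the element names (`urlset`, `url`, `loc`, `lastmod`, `priority`) and the namespace come from Sitemaps.org.

```php
<?php
// Hypothetical helper, not the article's code: render a complete sitemap
// from a list of [url, unix timestamp, priority] triples.
function render_sitemap(array $entries): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($entries as list($loc, $lastmod, $priority)) {
        $xml .= "  <url>\n"
              // XML-escape the URL so that characters such as "&" stay valid.
              . '    <loc>' . htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</loc>\n"
              // Date-only form of the W3C Datetime format, in UTC.
              . '    <lastmod>' . gmdate('Y-m-d', $lastmod) . "</lastmod>\n"
              // Priority is a value from 0.0 to 1.0; force "." as the decimal point.
              . '    <priority>' . number_format($priority, 1, '.', '') . "</priority>\n"
              . "  </url>\n";
    }
    return $xml . "</urlset>\n";
}

// Placeholder data: two pages last modified on 2016-01-05 (UTC).
$xml = render_sitemap(array(
    array('http://mysite.ru/', 1451952000, 1.0),
    array('http://mysite.ru/news', 1451952000, 0.8),
));
echo $xml;
```

Building the whole document as one string is fine for a handful of links, but it does not scale to large sites.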

Sitemap.xml generation

Searching the web turned up plenty of “convenient online services” for building a site map by hand, plus a few simple scripts, none of which are suitable for creating a site map with a large number of links.

What do we want?


Generating sitemap.xml for a small site takes little effort. Large resources have their own complications.
A sitemap.xml file is limited to 10 MB in size and to 50,000 links. Handling these limits automatically became my goal.
Thus, the following requirements took shape:
  1. The script should track the size of the generated files and the number of URLs added, and, when necessary, split the output into several files in accordance with the format;
  2. Do not store intermediate data in memory;
  3. Create compressed versions of the files when needed, for use with nginx;
  4. Automatically perform simple checks on the data.

No sooner said than done. The final version of the scripts can be found at the link at the end of the article.
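The first two requirements can be illustrated with a minimal sketch. It is not the code from the repository: the `RotatingSitemapWriter` class, the `sitemap_index` function, the file naming scheme, and the demo URLs are all hypothetical. Each entry is written straight to disk, so nothing accumulates in memory, and a new file is started whenever the URL or byte limit would be exceeded. The files are then listed in a sitemap index, which is the mechanism the format provides for multi-file site maps.

```php
<?php
// Hypothetical class for illustration; the article's SitemapBuilder may work differently.
final class RotatingSitemapWriter
{
    const HEADER = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    const FOOTER = "</urlset>\n";

    private $dir;
    private $maxUrls;
    private $maxBytes;
    private $handle = null; // currently open file, if any
    private $urls = 0;      // entries written to the current file
    private $bytes = 0;     // bytes written to the current file
    private $files = array();

    public function __construct(string $dir, int $maxUrls = 50000, int $maxBytes = 10485760)
    {
        $this->dir = $dir;
        $this->maxUrls = $maxUrls;
        $this->maxBytes = $maxBytes;
    }

    // $entry is one ready-made, already escaped <url>...</url> fragment.
    public function add(string $entry): void
    {
        $len = strlen($entry);
        $full = $this->handle !== null
            && ($this->urls >= $this->maxUrls
                || $this->bytes + $len + strlen(self::FOOTER) > $this->maxBytes);
        if ($full) {
            $this->closeFile();
        }
        if ($this->handle === null) {
            $this->openFile();
        }
        fwrite($this->handle, $entry); // goes to disk, not into an in-memory buffer
        $this->urls++;
        $this->bytes += $len;
    }

    // Closes the last file and returns the names of all files written.
    public function finish(): array
    {
        if ($this->handle !== null) {
            $this->closeFile();
        }
        return $this->files;
    }

    private function openFile(): void
    {
        $name = sprintf('sitemap%d.xml', count($this->files) + 1);
        $handle = fopen($this->dir . '/' . $name, 'wb');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $name for writing");
        }
        fwrite($handle, self::HEADER);
        $this->handle = $handle;
        $this->files[] = $name;
        $this->urls = 0;
        $this->bytes = strlen(self::HEADER);
    }

    private function closeFile(): void
    {
        fwrite($this->handle, self::FOOTER);
        fclose($this->handle);
        $this->handle = null;
    }
}

// Builds the sitemap index that points at the individual files.
function sitemap_index(array $files, string $baseUrl): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($files as $file) {
        $loc = htmlspecialchars($baseUrl . $file, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $xml .= "  <sitemap><loc>$loc</loc></sitemap>\n";
    }
    return $xml . "</sitemapindex>\n";
}

// Demo with a tiny limit of 2 URLs per file to force rotation.
$dir = sys_get_temp_dir() . '/sitemap_demo_' . uniqid();
mkdir($dir);
$writer = new RotatingSitemapWriter($dir, 2);
for ($i = 1; $i <= 5; $i++) {
    $writer->add("  <url><loc>http://example.com/page/$i</loc></url>\n");
}
$files = $writer->finish();
file_put_contents($dir . '/sitemap.xml', sitemap_index($files, 'http://example.com/'));
```

With five URLs and a limit of two per file, the demo writes three sitemap files plus the index. The size check counts the closing tag in advance, so a file that is about to be closed never crosses the byte limit because of its footer.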

What does the script not do?


To head off further questions: the script is not a universal solution that generates a map for an arbitrary site in one go.
It is just a tool. You have to build the list of links to add to the file yourself, possibly in several passes.
In addition, the script does not correct or encode the URLs passed to it. So make sure your links comply with the RFC 3986 standard for URIs, the RFC 3987 standard for IRIs, and the XML standard.
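Since encoding is left to the caller, here is one way to prepare a link before handing it over, as a rough sketch. The `encode_path` and `prepare_loc` helpers are hypothetical and not part of the script. They percent-encode path segments and query parameters per RFC 3986, run a basic validity check, enforce the 2,048-character limit that Sitemaps.org sets for `loc`, and escape the result for XML. Whether the last step is needed depends on whether your generator escapes values itself, so treat it as an assumption.

```php
<?php
// Hypothetical helper: percent-encode every path segment, keeping the slashes.
function encode_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// Hypothetical helper: build, check, and XML-escape a URL for a <loc> element.
function prepare_loc(string $baseUrl, string $path, array $query = array()): string
{
    $url = rtrim($baseUrl, '/') . '/' . ltrim(encode_path($path), '/');
    if ($query) {
        // PHP_QUERY_RFC3986 encodes spaces as %20 rather than "+".
        $url .= '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        throw new InvalidArgumentException("Not a valid URL: $url");
    }
    if (strlen($url) >= 2048) {
        throw new InvalidArgumentException('URL must be shorter than 2,048 characters');
    }
    // Escape &, <, >, and quotes so the value is safe inside XML.
    return htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo prepare_loc('http://mysite.ru/', 'news/hello world'), "\n";
echo prepare_loc('http://mysite.ru', 'news', array('id' => 1, 'lang' => 'en')), "\n";
```

The first call turns the space into `%20`; the second shows the `&` between query parameters being written as `&amp;`.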

Example


With this tool you can create a site map like this:
Sample script that generates a sitemap
<?php
require_once(dirname(__FILE__)."/../common.inc.php");

set_time_limit(0);
ini_set('memory_limit', '128M');

$dir = dirname(__FILE__);        // document root path
$tmp_dir = dirname(__FILE__);    // temp path
$base_url = 'http://mysite.ru/'; // URL the sitemaps are served from (http://mysite.ru/sitemap.xml)
$gzip = true;

$config = array(
    'path'       => $dir,
    'tmp_dir'    => $tmp_dir,
    'base_url'   => $base_url,
    'gzip'       => $gzip,
    'gzip_level' => 9,
);
$builder = new SitemapBuilder($config);

$time = time();
$builder->start();
$builder->addUrl($base_url, $time, 1.0);
$builder->addUrl($base_url."news", $time, 1.0);
/*
// example of adding URLs in a loop
$documents = News::find(array('criteria'=>'is_published=1'));
foreach($documents as $document)
    $builder->addUrl($document->getUrl(), $document->getUtime(), 0.8);
*/
$builder->commit();



Links


  1. Sitemaps.org
  2. Sources of scripts for generating Sitemap.xml
  3. Repository on github.com

Source: https://habr.com/ru/post/274557/

