
How to use the API of a site that does not have an API?

I often need to get data from third-party sites that do not offer a convenient API for it. In that case the only option is to parse the HTML of their pages. I used to write regular expressions; later came libraries that let you extract the content you need with CSS selectors. Even so, the task still feels harder than it should be, and I wanted to simplify it.

Today I want to tell you about my small library, which lets you describe HTTP requests in an API-like style and parse the server's response into the format you need.

Note: keep copyright in mind when you use other people's data.

Installation


The library can be installed via Composer: add the dependency "sleeping-owl/apist": "1.*" to the require section of your composer.json and run composer update.

The library does not depend on any framework, so you can use it with any framework or in a plain PHP project. It uses Guzzle for network requests and symfony/dom-crawler for manipulating the DOM tree.

Usage


After installation, you can create a new class that describes the API of the site you need. The library places no restrictions on how or where you create this class. Your class must extend SleepingOwl\Apist\Apist and specify the base URL:

    use SleepingOwl\Apist\Apist;

    class HabrApi extends Apist
    {
        protected $baseUrl = 'http://habrahabr.ru';
    }

That is all you need for a basic description. Next, add the methods you need to this class:

    public function index()
    {
        return $this->get('/', [
            'title' => Apist::filter('.page_head .title')->text()->trim(),
            'posts' => Apist::filter('.posts .post')->each([
                'title'  => Apist::filter('h1.title a')->text(),
                'link'   => Apist::filter('h1.title a')->attr('href'),
                'hubs'   => Apist::filter('.hubs a')->each(Apist::filter('*')->text()),
                'author' => [
                    'username'     => Apist::filter('.author a'),
                    'profile_link' => Apist::filter('.author a')->attr('href'),
                    'rating'       => Apist::filter('.author .rating')->text()
                ]
            ])
        ]);
    }

Here, get is the HTTP method of the request; other methods are also available (post, put, patch, delete, etc.).
The first parameter is the URL for this method; it can be either relative or absolute.
The second parameter is the reason I created this library. It describes the structure you want to get back from the method. It can be either an array or a single value. For example, here is what you get by calling the method described above:

    $api = new HabrApi;
    $result = $api->index();

Note: the result is a PHP array; it is shown in JSON format here for readability.

    {
        "title": "...",
        "posts": [
            {
                "title": "... Shellshock (... 2)",
                "link": "http:\/\/habrahabr.ru\/company\/host-tracker\/blog\/240389\/",
                "hubs": ["...", "...", "..."],
                "author": {
                    "username": "smiHT",
                    "profile_link": "http:\/\/habrahabr.ru\/users\/smiHT\/",
                    "rating": "26,9"
                }
            },
            {
                "title": "... PentestIT",
                "link": "http:\/\/habrahabr.ru\/company\/pentestit\/blog\/240995\/",
                "hubs": ["... PentestIT", "... IT", "..."],
                "author": {
                    "username": "pentestit-team",
                    "profile_link": "http:\/\/habrahabr.ru\/users\/pentestit-team\/",
                    "rating": "36,4"
                }
            },
            ...
        ]
    }
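Since the second parameter is the core idea, here is a minimal sketch of how such a blueprint could be resolved. It is not Apist's code: filters are modeled as plain PHP closures, XPath (via PHP's built-in DOM extension) stands in for CSS selectors, and resolveBlueprint, queryNodes, xpathText and xpathEach are hypothetical names. An array is resolved key by key, a single filter is evaluated directly, and the each-style helper applies a nested blueprint once per matched node, roughly the way the posts entry above yields one result per post.

```php
<?php
// Hypothetical sketch, not Apist's implementation: a "blueprint" is an array
// (or a single value) whose filters are replaced by the values they produce.
// Filters are plain closures over a DOM node; XPath stands in for CSS selectors.

// Resolve a blueprint against a context node (a document or an element).
function resolveBlueprint($blueprint, DOMNode $context)
{
    if ($blueprint instanceof Closure) {
        // A single filter: evaluate it against the current context.
        return $blueprint($context);
    }
    if (is_array($blueprint)) {
        // An array: resolve every entry recursively, keeping the keys.
        $result = [];
        foreach ($blueprint as $key => $value) {
            $result[$key] = resolveBlueprint($value, $context);
        }
        return $result;
    }
    // Any other value is returned unchanged (a choice made by this sketch).
    return $blueprint;
}

// Run an XPath query relative to the context node.
function queryNodes(string $expr, DOMNode $context): DOMNodeList
{
    $doc = $context instanceof DOMDocument ? $context : $context->ownerDocument;
    return (new DOMXPath($doc))->query($expr, $context);
}

// Hypothetical filter: trimmed text of the first matching node, or null.
function xpathText(string $expr): Closure
{
    return function (DOMNode $context) use ($expr) {
        $nodes = queryNodes($expr, $context);
        return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
    };
}

// Hypothetical counterpart of ->each(): resolve a nested blueprint once per
// matching node, using that node as the new context.
function xpathEach(string $expr, $subBlueprint): Closure
{
    return function (DOMNode $context) use ($expr, $subBlueprint) {
        $items = [];
        foreach (queryNodes($expr, $context) as $node) {
            $items[] = resolveBlueprint($subBlueprint, $node);
        }
        return $items;
    };
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="post"><h1><a href="/p/1">First</a></h1><span class="author">alice</span></div>'
    . '<div class="post"><h1><a href="/p/2">Second</a></h1><span class="author">bob</span></div>'
);

// Relative expressions (".//") are evaluated inside each matched post.
$result = resolveBlueprint([
    'posts' => xpathEach('//div[@class="post"]', [
        'title'  => xpathText('.//h1/a'),
        'author' => ['username' => xpathText('.//span[@class="author"]')],
    ]),
], $doc);

echo json_encode($result), PHP_EOL;
```

The recursion is what lets a blueprint be either an array or a single value: both go through the same function, and nesting (such as author inside each post) falls out naturally.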

The third, optional parameter holds any additional request options: GET or POST variables, uploaded files, request headers, etc. The full list can be found in the Guzzle documentation.

Creating filters


A few words about how it works: each object created via Apist::filter($cssSelector) stores not only the selector used to search for data but also the whole chain of method calls applied to it. After the page is loaded, the library finds the matching elements, applies the recorded methods to them, and replaces the object with the resulting value.
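The record-and-replay idea can be illustrated with a minimal sketch. It is hypothetical and is not the library's source: DeferredFilter uses an XPath expression instead of a CSS selector so that it needs only PHP's DOM extension (PHP 8+ for the nullsafe operator), and it understands just three methods (text, attr, trim). Calls made on the object are only recorded via __call; nothing is evaluated until apply() receives a loaded document.

```php
<?php
// Hypothetical sketch of the record-and-replay idea, not Apist's source code.
// XPath replaces CSS selectors so only PHP's DOM extension is needed (PHP 8+).

class DeferredFilter
{
    private string $xpath;
    private array $calls = []; // recorded [methodName, arguments] pairs

    public function __construct(string $xpath)
    {
        $this->xpath = $xpath;
    }

    // Called for any undefined method: record it instead of running it and
    // return $this so calls can be chained, e.g. ->text()->trim().
    public function __call(string $name, array $args): self
    {
        $this->calls[] = [$name, $args];
        return $this;
    }

    // Runs only once a document is loaded: find the first matching node,
    // then replay the recorded calls on it in order.
    public function apply(DOMDocument $doc)
    {
        $value = (new DOMXPath($doc))->query($this->xpath)->item(0);
        foreach ($this->calls as [$name, $args]) {
            switch ($name) {
                case 'text':
                    $value = $value?->textContent;
                    break;
                case 'attr':
                    $value = $value?->getAttribute($args[0]);
                    break;
                case 'trim':
                    $value = $value === null ? null : trim($value);
                    break;
                default:
                    throw new BadMethodCallException("Unsupported method: $name");
            }
        }
        return $value;
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<h1 class="title"><a href="/p/42">  Hello  </a></h1>');

// Building the chains evaluates nothing yet.
$title = (new DeferredFilter('//h1[@class="title"]/a'))->text()->trim();
$link  = (new DeferredFilter('//h1[@class="title"]/a'))->attr('href');

// The recorded chains are replayed against the loaded document here.
var_dump($title->apply($doc)); // string(5) "Hello"
var_dump($link->apply($doc));  // string(5) "/p/42"
```

Because every call simply appends to the recorded list and returns the same object, chains of any length can be built before any HTML exists; in this sketch an unsupported method name fails only at replay time.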

Here are some of the methods that can be applied (you can chain them in whatever order you need):

The source code of the HabrApi.php demo class used in the examples on the project website can be found here.

Sources on GitHub | Documentation and examples

Upd: version 1.2.0 added the ability to initialize the API from a YAML file; see the documentation for details.

Source: https://habr.com/ru/post/241335/

