Semantic Web, ARC2 and PHP

While the semantic web is still taking shape, and before provincial web studios start inflating their prices for the words "semantic" and "semantics", let's look at the tools for working with this beast in PHP.

It should be noted that outside the English-speaking world most of the terms have not settled yet, and neither has the understanding of the concept and the technology as a whole. For some, the semantic web is a synonym for HTML5 with tasty extras; for some it is the widespread use of RDF and the development of storage and devices; and for others it is a guarantee of getting decent photos for the query "wet pussy at night" (not safe for work).

Support for semantics is claimed and/or implemented in their products by drupal.org, ontowiki.net, semantic-mediawiki.org and talis.com. There is even a specialized WYSIWYG editor, loomp.org.

At the moment (spring 2012), two solutions claim the title of the go-to public library: RDF API for PHP (RAP) and ARC2.
We will set RAP aside until better times: its latest version dates back to 2008. It supports PHP 4, which is a very dubious advantage, but maybe someone will need it. And after this page, my interest in RAP disappeared completely.

Theory

A short theoretical introduction, which you can safely skip and go straight to the sandbox.

Glossary:

triple, triplet - a statement in the form subject-predicate-object
triplestore - a storage for triples
RDF - a metadata description model whose main idea is to use triples to store the relations, properties and states of an information resource
RDF/XML, RDF/JSON, RDFa - descriptions of the model in XML format, in JSON format and in attributes of markup elements
SPARQL - a query language for data described using RDF

As an example, consider the phrase "Masha has a red car" and the corresponding set of triples:

Masha owns the car
Masha is a person
the car has a color
the color is red

On the one hand, one phrase has turned into four, and there could be even more if we went on describing the person, the car and the color. On the other hand, these statements are much simpler for a computer to process.
RDF is used for a unified description of an information resource, that is, a single concept lies at the heart of an arbitrarily complex model. This greatly simplifies the implementation of the storage and makes it possible to standardize the query language for it (for example, "show the owners of red cars").
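To make the idea more tangible, here is a minimal sketch in plain PHP that has nothing to do with ARC2 itself. The predicate names and the matchTriples() helper are assumptions made up for this illustration, and the last two statements from the list above are collapsed into one triple ("car has color red") to keep it short. It only shows that, once everything is a triple, one generic matching function is enough to answer "who owns something red".

```php
<?php
// Hypothetical in-memory triple list for the phrase "Masha has a red car".
// Each triple is array(subject, predicate, object); the names are illustrative only.
$triples = array(
    array('Masha', 'owns',      'car'),
    array('Masha', 'is a',      'person'),
    array('car',   'has color', 'red'),
);

// Return all triples matching a pattern; null means "any value".
// matchTriples() is a made-up helper, not an ARC2 function.
function matchTriples(array $triples, $s = null, $p = null, $o = null)
{
    $found = array();
    foreach ($triples as $t) {
        if (($s === null || $t[0] === $s)
            && ($p === null || $t[1] === $p)
            && ($o === null || $t[2] === $o)) {
            $found[] = $t;
        }
    }
    return $found;
}

// "Show the owners of red things": find what is red, then who owns it.
$owners = array();
foreach (matchTriples($triples, null, 'has color', 'red') as $red) {
    foreach (matchTriples($triples, null, 'owns', $red[0]) as $own) {
        $owners[] = $own[0];
    }
}
print_r($owners);
```

A real triplestore does the same kind of pattern matching, only over indexed storage and with SPARQL instead of nested loops.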

Setting up the sandbox

The ARC2 library will serve as the box, and the starter pack as the sand.
The server configuration requires nothing extraordinary: a standard LAMP server from Ubuntu is enough for the sandbox.
If you build the server yourself, you will need at least the libxml and sockets modules for PHP 5.3.* and MySQL as the storage.
Installation is indecently simple: unpack both packages and, in the file tuukka-arc2-starter-pack/config.php, set the path to the ARC2 library:

include_once(dirname(__FILE__).'/../arc2/ARC2.php');

Here the library folder has been renamed from semsol-arc2-xxxxxxx to simply arc2, and tuukka-arc2-starter-pack-xxxxxxx to tuukka-arc2-starter-pack.
Besides the path, config.php also needs the database settings; the following section is responsible for them:

 $arc_config = array(
   /* MySQL database settings */
   'db_host' => 'localhost', // database host
   'db_user' => 'root',      // database user
   'db_pwd'  => '',          // password
   'db_name' => 'arc2test',  // database name
   //...

If the starter pack is installed in the web server directory, a query form will appear at localhost/tuukka-arc2-starter-pack/endpoint.php.
Queries can also be sent from the console; for this, make the file tuukka-arc2-starter-pack/cli.php executable (chmod +x cli.php).
There is no big difference between running from the console and from the web, except for the format of the result:
the web sandbox returns XML.

Our lab rat will be Eric Miller's page.
In the source code of the page, we are interested in the link to the downloadable data:

  <link rel="meta" type="application/rdf+xml" title="Contact" href="contact" />

The first thing to start with is to add data to our repository using a query:

 LOAD <http://www.w3.org/People/EM/contact#me>

To run it from the console, execute this command in the tuukka-arc2-starter-pack folder:

 ./cli.php "LOAD <http://www.w3.org/People/EM/contact#me>"

The answer should be:

 Loaded 32 triples

Via the web interface:

 <?xml version="1.0"?>
 <sparql xmlns="http://www.w3.org/2005/sparql-results#">
   <head>
     <!-- query time: 0.3139 sec -->
   </head>
   <inserted>32</inserted>
 </sparql>

The data has been loaded successfully. Now let's request all the names in our repository:

 PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name WHERE { ?x foaf:name ?name }

Answer:

 name
 Eric Miller

Technically, the query language is somewhat similar to SQL, but there is one important difference:
SQL is a query language for tables of data, while SPARQL is a query language for a graph.

Let's request all the names, nicknames and contact mailboxes in our repository:

 PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?nick ?mbox WHERE { ?x foaf:name ?name . ?x foaf:nick ?nick . ?x foaf:mbox ?mbox }

Answer:

 name         nick  mbox
 Eric Miller  em    mailto:em@w3.org

Get the list of all acquaintances (foaf:knows):

 PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?knows WHERE { ?x foaf:name ?name . ?x foaf:knows ?knows }

Answer:

 name         knows
 Eric Miller  http://www.w3.org/People/Berners-Lee/card#i
 Eric Miller  http://www.w3.org/People/Connolly/#me
 Eric Miller  http://www.w3.org/People/djweitzner/public/foaf.rdf#DJW

Load the data of the last contact, Daniel Weitzner:

 LOAD <http://www.w3.org/People/djweitzner/public/foaf.rdf#DJW>

Answer:

 Loaded 113 triples

Check the list of all acquaintances again:

 PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?knows WHERE { ?x foaf:name ?name . ?x foaf:knows ?knows }

Answer:

 name             knows
 Eric Miller      http://www.w3.org/People/Berners-Lee/card#i
 Eric Miller      http://www.w3.org/People/Connolly/#me
 Eric Miller      http://www.w3.org/People/djweitzner/public/foaf.rdf#DJW
 Daniel Weitzner  _:b279264211_arc8d9fb5
 Daniel Weitzner  _:b2309778025_arc8d9fb6
 Daniel Weitzner  _:b420706296_arc8d9fb9
 Daniel Weitzner  _:b2549551164_Tim
 Daniel Weitzner  _:b2407349360_arc8d9fb10
 Daniel Weitzner  _:b1728586196_DanC
 Daniel Weitzner  _:b4168748262_arc8d9fb11
 Daniel Weitzner  _:b1634950492_arc8d9fb12

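Daniel knows some strange characters: the identifiers starting with _: are blank nodes, that is, resources that have no URI of their own.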

Let's find out Daniel's nickname:

 PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?nick WHERE { ?x foaf:name 'Daniel Weitzner' . ?x foaf:nick ?nick }

Answer:

 nick
 Danny

Usually the task of collecting information from a web source is solved in several steps:

find the data source
grab the data (using regular expressions or queries against the document tree) or get it from the site owner (RSS, API)
save it to a convenient storage
use the data: retrieve and process it

This approach has one big disadvantage: data acquisition has to be implemented separately for each source.
As a consequence, the number of implementations (grabbers, repositories, queries) can become very large.
An API greatly simplifies the process, but you cannot count on unification and standardization (besides, an API is not always about data).

The semantic web concept assumes that sources such as Eric Miller's contact page are available and that their content is convenient to work with.
This is what sets it apart from the web of today and yesterday.

Into battle

Let's take a closer look at the ARC2 library.
This code fetches the data from Eric's page and prints it in raw form:

 include_once("arc2/ARC2.php");
 // create an RDF parser and feed it the page URL
 $parser = ARC2::getRDFParser();
 $parser->parse('http://www.w3.org/People/EM/contact#me');
 // flat list of triples
 print_r($parser->getTriples());
 //print_r($parser->getSimpleIndex());

Output:

 Array
 (
     [0] => Array
         (
             [s] => http://www.w3.org/People/EM/contact#me
             [p] => http://www.w3.org/1999/02/22-rdf-syntax-ns#type
             [o] => http://xmlns.com/foaf/0.1/Person
             [s_type] => uri
             [o_type] => uri
             [o_datatype] =>
             [o_lang] =>
         )
     [1] => Array
         (
             [s] => http://www.w3.org/People/EM/contact#me
             [p] => http://www.w3.org/1999/02/22-rdf-syntax-ns#value
             [o] => Eric Miller, em@w3.org
             [s_type] => uri
             [o_type] => literal
             [o_datatype] =>
             [o_lang] =>
         )
     //....
 )

Each element of the array of triples has the following structure:

s - subject value (URI, bnode ID or variable)
p - URI of the property, or a variable
o - object value (URI, bnode ID, literal or variable)
s_type - "uri", "bnode" or "var"
o_type - "uri", "bnode", "literal" or "var"
o_datatype - datatype URI
o_lang - language identifier, for example "en-us"
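As a small illustration of how these keys can be used, the sketch below keeps only the triples whose object is a literal. The two sample triples are copied from the parser output above; literalObjects() is a hypothetical helper written for this example and is not part of ARC2.

```php
<?php
// Two sample triples in the structure described above
// (copied from the parser output for Eric Miller's page).
$triples = array(
    array(
        's' => 'http://www.w3.org/People/EM/contact#me',
        'p' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
        'o' => 'http://xmlns.com/foaf/0.1/Person',
        's_type' => 'uri', 'o_type' => 'uri',
        'o_datatype' => '', 'o_lang' => '',
    ),
    array(
        's' => 'http://www.w3.org/People/EM/contact#me',
        'p' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#value',
        'o' => 'Eric Miller, em@w3.org',
        's_type' => 'uri', 'o_type' => 'literal',
        'o_datatype' => '', 'o_lang' => '',
    ),
);

// Hypothetical helper (not an ARC2 function): collect (predicate, object)
// pairs for every triple whose object is a literal.
function literalObjects(array $triples)
{
    $pairs = array();
    foreach ($triples as $t) {
        if ($t['o_type'] === 'literal') {
            $pairs[] = array($t['p'], $t['o']);
        }
    }
    return $pairs;
}

$literals = literalObjects($triples);
foreach ($literals as $pair) {
    echo $pair[0], ' => ', $pair[1], "\n";
}
```

With real data, $triples would simply be the return value of $parser->getTriples().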

Instead of getTriples() you can use getSimpleIndex(); the output then becomes easier to read:

 Array
 (
     [http://www.w3.org/People/EM/contact#me] => Array
         (
             [http://www.w3.org/1999/02/22-rdf-syntax-ns#type] => Array
                 (
                     [0] => http://xmlns.com/foaf/0.1/Person
                 )
             [http://www.w3.org/1999/02/22-rdf-syntax-ns#value] => Array
                 (
                     [0] => Eric Miller, em@w3.org
                 )
             [http://xmlns.com/foaf/0.1/name] => Array
                 (
                     [0] => Eric Miller
                 )
             //...
         )
 )
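The shape of this index (subject, then predicate, then a list of objects) is easy to reproduce by hand. The following is only a sketch of the data structure in plain PHP, not ARC2's actual implementation; buildIndex() is a hypothetical name, and the sample triples are reduced to the s, p and o keys.

```php
<?php
// Sample triples, reduced to the s/p/o keys for brevity.
$me = 'http://www.w3.org/People/EM/contact#me';
$triples = array(
    array('s' => $me,
          'p' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
          'o' => 'http://xmlns.com/foaf/0.1/Person'),
    array('s' => $me,
          'p' => 'http://xmlns.com/foaf/0.1/name',
          'o' => 'Eric Miller'),
);

// Hypothetical helper: group objects by subject and predicate,
// mirroring the nesting of the index printed above.
function buildIndex(array $triples)
{
    $index = array();
    foreach ($triples as $t) {
        // PHP creates the intermediate arrays on first use
        $index[$t['s']][$t['p']][] = $t['o'];
    }
    return $index;
}

$index = buildIndex($triples);

// With the index, looking up a property is a plain array access.
echo $index[$me]['http://xmlns.com/foaf/0.1/name'][0], "\n";
```

The nested form trades the type information (o_type, o_lang and so on) for direct lookups by subject and predicate, which is usually what display code wants.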

For an example with queries, let's use the sandbox store (the settings are copied from config.php):

 $config = array(
   /* db */
   'db_host' => 'localhost',
   'db_user' => 'root',
   'db_pwd' => '',
   'db_name' => 'arc2test1',
   /* store name (= table prefix) */
   'store_name' => 'sandbox',
 );
 /* instantiation */
 $store = ARC2::getStore($config);
 $q = '
   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?nick WHERE { ?x foaf:nick ?nick . }
 ';
 $rows = $store->query($q);
 print_r($rows);
 print_r($store->getErrors());

Result:

 Array
 (
     [query_type] => select
     [result] => Array
         (
             [variables] => Array
                 (
                     [0] => nick
                 )
             [rows] => Array
                 (
                     [0] => Array
                         (
                             [nick] => em
                             [nick type] => literal
                         )
                     [1] => Array
                         (
                             [nick] => Danny
                             [nick type] => literal
                         )
                 )
         )
     [query_time] => 0.060401201248169
 )

From here on, we work with the array however we need.
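For instance, collecting the nicknames into a flat list could look like the sketch below. To keep it runnable without a database, $rows is a hand-written copy of the result printed above (with query_time shortened); in real code it would come from $store->query($q).

```php
<?php
// Hand-written copy of the query result shown above.
$rows = array(
    'query_type' => 'select',
    'result' => array(
        'variables' => array('nick'),
        'rows' => array(
            array('nick' => 'em',    'nick type' => 'literal'),
            array('nick' => 'Danny', 'nick type' => 'literal'),
        ),
    ),
    'query_time' => 0.0604,
);

// Walk over the result rows and keep only the value of ?nick.
$nicks = array();
foreach ($rows['result']['rows'] as $row) {
    $nicks[] = $row['nick'];
}

echo implode(', ', $nicks), "\n"; // prints "em, Danny"
```

Note that the type of each value lives under a key with a space in it ('nick type'), exactly as in the printed result.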

The interesting part does not end here: the ARC2 library has many more interesting features, and one could write dissertations on semantics for another ten years, but all of that is beyond the scope of this article.
Still, I hope the introductions have been made.
Thanks for your attention!

Source: https://habr.com/ru/post/142159/
