While the semantic web is still evolving and provincial web studios have not yet started inflating their prices over the word "semantic", let's look at the tools for working with this beast in PHP.
It should be noted that outside the English-speaking world most of the terminology has not settled yet, and neither has the understanding of the concept and the technology as a whole. For some, the semantic web is a synonym for HTML5 with tasty extras; for others, it is the widespread use of RDF and the development of storage systems and tools; and for someone else, it is a guaranteed way to get decent photos for the query “wet pussy at night” (not safe for work).
Support for semantics has been announced and/or implemented in their products by drupal.org, ontowiki.net, semantic-mediawiki.org and talis.com. There is even a specialized WYSIWYG editor, loomp.org.
At the moment (spring 2012), there are two solutions that claim the title of a general-purpose library: RDF API for PHP (RAP) and ARC2.
Let's put RAP aside until better times: its latest version dates back to 2008. It supports PHP 4, which is a very dubious advantage, but maybe someone will need it. And after this page, my interest in RAP disappeared completely.
Theory
A short theoretical introduction, which you can safely skip and go straight to the sandbox.
Glossary:
- triple, triplet - a value of the form subject-predicate-object
- triplestore - a storage for triples
- RDF is a metadata description model whose main idea is to use triples to store the relations, properties and states of an information resource
- RDF/XML, RDF/JSON, RDFa - serializations of the model in XML, in JSON, and in attributes of markup elements
- SPARQL is a query language for data described using RDF
As an example, consider the phrase “Masha has a red car” and a set of triples:
- Masha owns the car
- Masha is a person
- the car has a color
- the color is red
On the one hand, one phrase has turned into four statements, and there could be even more if we continued describing the person, the car and the color. On the other hand, these statements are much simpler for a computer to process.
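To make "simpler for a computer to process" concrete, here is a minimal sketch (assuming PHP 7.1+) that stores the four statements as plain [subject, predicate, object] arrays and answers "what color is Masha's car?" by following the chain one lookup at a time. The short names `owns`, `hasColor` and `is` and the helper `objectsOf()` are hypothetical stand-ins for real RDF URIs and a real store.

```php
<?php
// The "Masha has a red car" phrase as four triples.
// Hypothetical short names stand in for real URIs.
$triples = [
    ['Masha', 'owns', 'car'],
    ['Masha', 'is', 'person'],
    ['car', 'hasColor', 'color'],
    ['color', 'is', 'red'],
];

// Hypothetical helper: every object o such that (s, p, o) is in the list.
function objectsOf(array $triples, string $s, string $p): array
{
    $result = [];
    foreach ($triples as [$ts, $tp, $to]) {
        if ($ts === $s && $tp === $p) {
            $result[] = $to;
        }
    }
    return $result;
}

// "What color is Masha's car?" becomes three lookups along the chain.
$car   = objectsOf($triples, 'Masha', 'owns')[0];
$color = objectsOf($triples, $car, 'hasColor')[0];
$value = objectsOf($triples, $color, 'is')[0];

echo $value, "\n"; // red
```

Every statement has the same shape, so one tiny lookup function is enough for all of them; no per-sentence parsing logic is needed.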
RDF gives a uniform way to describe an information resource: at the heart of an arbitrarily complex model lies a single concept, the triple. This greatly simplifies the implementation of storage and makes it possible to standardize the query language for it (for example, “show the owners of red cars”).
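A minimal sketch (PHP 7.1+) of the “owners of red cars” query over plain triples: an owner is kept only if owner -owns-> car -hasColor-> color -is-> red holds. The second owner, Petya, with a blue car is a hypothetical addition so that the query has something to filter out; `ownersOfCarsWithColor()` is an illustrative helper, not part of any library.

```php
<?php
$triples = [
    ['Masha', 'owns', 'car'],
    ['Masha', 'is', 'person'],
    ['car', 'hasColor', 'color'],
    ['color', 'is', 'red'],
    ['Petya', 'owns', 'car2'],      // hypothetical
    ['car2', 'hasColor', 'color2'], // hypothetical
    ['color2', 'is', 'blue'],       // hypothetical
];

// Join three triple patterns on their shared terms (car, color).
function ownersOfCarsWithColor(array $triples, string $wanted): array
{
    $owners = [];
    foreach ($triples as [$owner, $p1, $car]) {
        if ($p1 !== 'owns') {
            continue;
        }
        foreach ($triples as [$c, $p2, $color]) {
            if ($c !== $car || $p2 !== 'hasColor') {
                continue;
            }
            foreach ($triples as [$col, $p3, $value]) {
                if ($col === $color && $p3 === 'is' && $value === $wanted) {
                    $owners[] = $owner;
                }
            }
        }
    }
    // Drop duplicates and renumber the keys.
    return array_values(array_unique($owners));
}

echo implode(', ', ownersOfCarsWithColor($triples, 'red')), "\n"; // Masha
```

The nested loops are the naive way to join; a real triplestore does the same join over indexes, and SPARQL lets you state it declaratively instead of hand-coding it.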
Building the sandbox
The ARC2 library will serve as the box, and the starter pack as the sand.
The server configuration requires nothing extraordinary: a standard LAMP server on Ubuntu is enough for the sandbox.
If you build your own server, at a minimum you will need PHP 5.3.* with the libxml and sockets modules, plus MySQL as the storage.
Installation is indecently simple: unpack both packages and, in the tuukka-arc2-starter-pack/config.php file, set the path to the arc2 library:
include_once(dirname(__FILE__).'/../arc2/ARC2.php');
The folders are renamed from semsol-arc2-xxxxxxx to plain arc2, and from tuukka-arc2-starter-pack-xxxxxxx to tuukka-arc2-starter-pack.
Besides the path, config.php needs the database settings; this section is responsible for them:
$arc_config = array(
  'db_host' => 'localhost',
  'db_user' => 'root',
  'db_pwd' => '',
  'db_name' => 'arc2test1',
  'store_name' => 'sandbox',
);
If the starter pack is installed in the web server directory, a query form is available at localhost/tuukka-arc2-starter-pack/endpoint.php.
Queries can also be sent from the console; for this, make tuukka-arc2-starter-pack/cli.php executable (chmod +x cli.php).
Running from the console and from the web differs only in the result format: the web sandbox returns XML.
Our lab rat will be the page of Eric Miller. In the page source, we are interested in the link to the downloadable data:
<link rel="meta" type="application/rdf+xml" title="Contact" href="contact" />
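Finding such a link can be automated. A minimal sketch (PHP 7.1+, ext-dom) that scans an HTML page for `<link rel="meta" type="application/rdf+xml">` and returns the `href` values; the function name `findRdfLinks()` and the sample HTML around the link are hypothetical. The returned `href` ("contact") is relative and still has to be resolved against the page's own URL.

```php
<?php
// Collect the RDF/XML data links advertised in an HTML page.
function findRdfLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        if ($link->getAttribute('rel') === 'meta'
            && $link->getAttribute('type') === 'application/rdf+xml') {
            $links[] = $link->getAttribute('href');
        }
    }
    return $links;
}

// Hypothetical page head containing the link from Eric Miller's page.
$html = '<html><head><title>Eric Miller</title>'
      . '<link rel="meta" type="application/rdf+xml" title="Contact" href="contact" />'
      . '</head><body></body></html>';

print_r(findRdfLinks($html));
```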
First, let's add the data to our store with the query:
LOAD <http://www.w3.org/People/EM/contact#me>
To run it from the console, execute the following command in the tuukka-arc2-starter-pack folder:
./cli.php "LOAD <http://www.w3.org/People/EM/contact#me>"
The answer should be:
Loaded 32 triples
Via the web interface:
<?xml version="1.0"?> <sparql xmlns="http://www.w3.org/2005/sparql-results#"> <head> </head> <inserted>32</inserted> </sparql>
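If you call the web endpoint from a script rather than a browser, the count can be read from that XML. A minimal sketch (PHP 7.1+, ext-dom); note that `<inserted>` is not part of the standard SPARQL XML results format, so this is specific to the sandbox's response shown here, and `insertedTriples()` is a hypothetical helper.

```php
<?php
// Return the number in <inserted>, which lives in the SPARQL results
// namespace declared on the root element, or null if it is absent.
function insertedTriples(string $xml): ?int
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return null; // not well-formed XML
    }
    $nodes = $doc->getElementsByTagNameNS(
        'http://www.w3.org/2005/sparql-results#',
        'inserted'
    );
    return $nodes->length > 0 ? (int) $nodes->item(0)->textContent : null;
}

// The response shown above, copied verbatim.
$response = '<?xml version="1.0"?> <sparql xmlns="http://www.w3.org/2005/sparql-results#">'
          . ' <head> </head> <inserted>32</inserted> </sparql>';

echo insertedTriples($response), "\n"; // 32
```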
The data was loaded successfully; now let's request all the names in our store:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name WHERE { ?x foaf:name ?name }
Answer:
name
Eric Miller
Technically, the query language is somewhat similar to SQL, but there is one important difference:
SQL is a query language for tables of data,
SPARQL is a query language for a graph.
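To show what "a query language for a graph" means in practice, here is a toy basic-graph-pattern matcher (PHP 7.1+): each pattern is a triple in which terms starting with "?" are variables, and solutions are extended one pattern at a time, joining on variables that are already bound. This is an illustrative sketch, not how ARC2 evaluates queries; the subject URI for Eric (his contact#me) is an assumption, and URIs and literals are both plain strings here.

```php
<?php
function isVar(string $term): bool
{
    return $term !== '' && $term[0] === '?';
}

// Return every binding of the variables that satisfies all patterns.
function matchPatterns(array $triples, array $patterns): array
{
    $solutions = [[]]; // start from one empty solution
    foreach ($patterns as $pattern) {
        $next = [];
        foreach ($solutions as $binding) {
            foreach ($triples as $triple) {
                $candidate = $binding;
                $ok = true;
                foreach ($pattern as $i => $term) {
                    if (!isVar($term)) {
                        $ok = ($term === $triple[$i]);           // constant must match
                    } elseif (isset($candidate[$term])) {
                        $ok = ($candidate[$term] === $triple[$i]); // join on bound variable
                    } else {
                        $candidate[$term] = $triple[$i];          // bind a new variable
                    }
                    if (!$ok) {
                        break;
                    }
                }
                if ($ok) {
                    $next[] = $candidate;
                }
            }
        }
        $solutions = $next;
    }
    return $solutions;
}

$me   = 'http://www.w3.org/People/EM/contact#me'; // assumed subject URI
$foaf = 'http://xmlns.com/foaf/0.1/';
$triples = [
    [$me, $foaf . 'name', 'Eric Miller'],
    [$me, $foaf . 'nick', 'em'],
    [$me, $foaf . 'mbox', 'mailto:em@w3.org'],
];

// SELECT ?name ?nick ?mbox WHERE { ?x foaf:name ?name . ?x foaf:nick ?nick . ?x foaf:mbox ?mbox }
$rows = matchPatterns($triples, [
    ['?x', $foaf . 'name', '?name'],
    ['?x', $foaf . 'nick', '?nick'],
    ['?x', $foaf . 'mbox', '?mbox'],
]);

foreach ($rows as $row) {
    echo $row['?name'], ' ', $row['?nick'], ' ', $row['?mbox'], "\n";
}
// Eric Miller em mailto:em@w3.org
```

Note that a term written without the leading question mark, such as x, is treated as a constant, so a pattern like x foaf:mbox ?mbox simply matches nothing.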
Let's request all the names, nicknames and contact mailboxes in our store:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?nick ?mbox WHERE { ?x foaf:name ?name . ?x foaf:nick ?nick . ?x foaf:mbox ?mbox }
Answer:
name | nick | mbox
Eric Miller | em | mailto:em@w3.org
Get a list of all acquaintances:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?knows WHERE { ?x foaf:name ?name . ?x foaf:knows ?knows }
Answer:
name | knows
Eric Miller | http://www.w3.org/People/Berners-Lee/card#i
Eric Miller | http://www.w3.org/People/Connolly/#me
Eric Miller | http://www.w3.org/People/djweitzner/public/foaf.rdf#DJW
Load the data of the last contact, Daniel Weitzner:
LOAD <http://www.w3.org/People/djweitzner/public/foaf.rdf>
Answer:
Loaded 113 triples
Check all acquaintances again:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?knows WHERE { ?x foaf:name ?name . ?x foaf:knows ?knows }
Answer:
name | knows
Eric Miller | http://www.w3.org/People/Berners-Lee/card#i
Eric Miller | http://www.w3.org/People/Connolly/#me
Eric Miller | http://www.w3.org/People/djweitzner/public/foaf.rdf#DJW
Daniel Weitzner | _:b279264211_arc8d9fb5
Daniel Weitzner | _:b2309778025_arc8d9fb6
Daniel Weitzner | _:b420706296_arc8d9fb9
Daniel Weitzner | _:b2549551164_Tim
Daniel Weitzner | _:b2407349360_arc8d9fb10
Daniel Weitzner | _:b1728586196_DanC
Daniel Weitzner | _:b4168748262_arc8d9fb11
Daniel Weitzner | _:b1634950492_arc8d9fb12
Daniel knows some strange personalities. The _:b... values are blank nodes: people described in Daniel's file without a URI of their own, for whom ARC2 generated local identifiers.
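To keep only the acquaintances that have a real URI, the result rows can be split by value type. A minimal sketch (PHP 7.1+), assuming ARC2 rows carry a "<variable> type" key with values such as "uri" and "bnode" (the store output later in the article shows "[nick type] => literal"); as a fallback it checks for the "_:" prefix that blank node IDs carry in the answer above. The rows are a hand-copied subset of that answer, and `splitKnows()` is hypothetical.

```php
<?php
$rows = [
    ['name' => 'Eric Miller', 'knows' => 'http://www.w3.org/People/Berners-Lee/card#i', 'knows type' => 'uri'],
    ['name' => 'Eric Miller', 'knows' => 'http://www.w3.org/People/Connolly/#me', 'knows type' => 'uri'],
    ['name' => 'Daniel Weitzner', 'knows' => '_:b2549551164_Tim', 'knows type' => 'bnode'],
    ['name' => 'Daniel Weitzner', 'knows' => '_:b1728586196_DanC', 'knows type' => 'bnode'],
];

// Split ?knows values into URIs and blank nodes.
function splitKnows(array $rows): array
{
    $uris = [];
    $bnodes = [];
    foreach ($rows as $row) {
        $value = $row['knows'];
        // Prefer the type ARC2 reports; fall back to the "_:" prefix.
        $type = $row['knows type'] ?? (strpos($value, '_:') === 0 ? 'bnode' : 'uri');
        if ($type === 'bnode') {
            $bnodes[] = $value;
        } else {
            $uris[] = $value;
        }
    }
    return ['uri' => $uris, 'bnode' => $bnodes];
}

print_r(splitKnows($rows)['uri']);
```

To learn who a blank node is, you query its other properties (for example foaf:name) instead of following a URI.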
Let's find out Daniel's nickname:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?nick WHERE { ?x foaf:name 'Daniel Weitzner' . ?x foaf:nick ?nick }
Answer:
nick
Danny
The task of collecting information from a web source is usually solved in several steps:
- find a data source
- scrape the data (with regular expressions or queries to the DOM tree) or get it from the site owner (RSS, API)
- save it to a convenient store
- use the data: retrieve and process it
This approach has one big disadvantage: data acquisition has to be implemented separately for each source.
As a consequence, the number of implementations (scrapers, stores, queries) can become very large.
Having an API greatly simplifies the process, but there is no unification or standardization to speak of (besides, an API does not always expose the data).
The semantic web concept assumes the availability of sources like Eric Miller's contact page, whose content is convenient to work with programmatically.
This is what distinguishes it from the web of today and yesterday.
Into battle
Let's take a closer look at the ARC2 library.
The code that fetches the data from Eric's page and prints it in raw form:
include_once("arc2/ARC2.php");
// create an RDF parser and parse the document behind Eric's contact URI
$parser = ARC2::getRDFParser();
$parser->parse('http://www.w3.org/People/EM/contact#me');
// print the raw list of triples
print_r($parser->getTriples());
Output:
Array ( [0] => Array ( [s] => http:
Each triple in the array has the following structure:
- s subject value (URI, Bnode ID, or variable)
- p URI of a property or variable
- o object value (URI, Bnode ID, literal or variable)
- s_type "uri", "bnode" or "var"
- o_type "uri", "bnode", "literal" or "var"
- o_datatype datatype URI
- o_lang language identifier, for example, "en-us"
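As an illustration of these fields, here is a minimal sketch (PHP 7.1+) that turns triples in this structure into N-Triples lines: URIs go in angle brackets, blank nodes keep their "_:" ID, and literals are quoted, escaped and suffixed with @lang or ^^<datatype>. The helpers `ntTerm()` and `toNTriples()` are hypothetical, the sample triple is hand-written from values seen in the queries earlier, and the assumption that ARC2 bnode IDs already start with "_:" is based on the query output above.

```php
<?php
function ntTerm(string $value, string $type, string $datatype = '', string $lang = ''): string
{
    if ($type === 'uri') {
        return '<' . $value . '>';
    }
    if ($type === 'bnode') {
        // Assumption: bnode IDs look like "_:b123...", add the prefix if missing.
        return strpos($value, '_:') === 0 ? $value : '_:' . $value;
    }
    if ($type !== 'literal') {
        // "var" terms only occur in query patterns, not in N-Triples.
        throw new InvalidArgumentException("Unsupported term type: $type");
    }
    // Escape backslash first, then quotes and line breaks.
    $escaped = str_replace(['\\', '"', "\n", "\r"], ['\\\\', '\\"', '\\n', '\\r'], $value);
    $term = '"' . $escaped . '"';
    if ($lang !== '') {
        return $term . '@' . $lang;
    }
    if ($datatype !== '') {
        return $term . '^^<' . $datatype . '>';
    }
    return $term;
}

function toNTriples(array $triples): string
{
    $lines = [];
    foreach ($triples as $t) {
        $lines[] = ntTerm($t['s'], $t['s_type'])
            . ' <' . $t['p'] . '> '
            . ntTerm($t['o'], $t['o_type'], $t['o_datatype'] ?? '', $t['o_lang'] ?? '')
            . ' .';
    }
    return implode("\n", $lines) . "\n";
}

$triples = [
    [
        's' => 'http://www.w3.org/People/EM/contact#me', 's_type' => 'uri',
        'p' => 'http://xmlns.com/foaf/0.1/nick',
        'o' => 'em', 'o_type' => 'literal', 'o_datatype' => '', 'o_lang' => '',
    ],
];

echo toNTriples($triples);
// <http://www.w3.org/People/EM/contact#me> <http://xmlns.com/foaf/0.1/nick> "em" .
```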
Instead of getTriples() you can use getSimpleIndex(); the output then becomes more meaningful:
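The idea behind such an index is to group the flat list by subject and then by predicate, so that everything about one resource is a single lookup. A minimal sketch (PHP 7.1+) of that grouping; `simpleIndex()` is a hypothetical re-implementation for illustration, not ARC2's own code, and the exact shape of ARC2's output may differ. The sample triples and Eric's subject URI are assumptions based on the queries earlier in the article.

```php
<?php
// Group triples as $index[subject][predicate][] = object.
function simpleIndex(array $triples): array
{
    $index = [];
    foreach ($triples as $t) {
        $index[$t['s']][$t['p']][] = $t['o'];
    }
    return $index;
}

$me   = 'http://www.w3.org/People/EM/contact#me'; // assumed subject URI
$foaf = 'http://xmlns.com/foaf/0.1/';
$triples = [
    ['s' => $me, 's_type' => 'uri', 'p' => $foaf . 'name', 'o' => 'Eric Miller', 'o_type' => 'literal'],
    ['s' => $me, 's_type' => 'uri', 'p' => $foaf . 'nick', 'o' => 'em', 'o_type' => 'literal'],
    ['s' => $me, 's_type' => 'uri', 'p' => $foaf . 'knows', 'o' => 'http://www.w3.org/People/Berners-Lee/card#i', 'o_type' => 'uri'],
    ['s' => $me, 's_type' => 'uri', 'p' => $foaf . 'knows', 'o' => 'http://www.w3.org/People/Connolly/#me', 'o_type' => 'uri'],
];

$index = simpleIndex($triples);
// Everyone Eric knows, in one lookup:
print_r($index[$me][$foaf . 'knows']);
```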
Array ( [http:
For an example with queries, let's use the sandbox store (the settings are copied from config.php):
include_once("arc2/ARC2.php");
$config = array(
  'db_host' => 'localhost',
  'db_user' => 'root',
  'db_pwd' => '',
  'db_name' => 'arc2test1',
  'store_name' => 'sandbox',
);
$store = ARC2::getStore($config);
// create the store's tables on first use
if (!$store->isSetUp()) {
  $store->setUp();
}
$q = '
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?nick WHERE { ?x foaf:nick ?nick . }
';
$rows = $store->query($q);
print_r($rows);
print_r($store->getErrors());
Result:
Array ( [query_type] => select [result] => Array ( [variables] => Array ( [0] => nick ) [rows] => Array ( [0]=> Array ( [nick] => em [nick type] => literal ) [1]=> Array ( [nick] => Danny [nick type]=> literal ) ) ) [query_time] => 0.060401201248169 )
From here on, we work with the array however we need.
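For example, a minimal sketch (PHP 7.1+) that pulls one variable's values out of a SELECT result. The array is the output above written out by hand, so the snippet runs without ARC2 or MySQL; `selectColumn()` is a hypothetical helper that returns an empty list when there is no result array (for instance when the query failed, in which case getErrors() tells you why).

```php
<?php
$rows = [
    'query_type' => 'select',
    'result' => [
        'variables' => ['nick'],
        'rows' => [
            ['nick' => 'em', 'nick type' => 'literal'],
            ['nick' => 'Danny', 'nick type' => 'literal'],
        ],
    ],
    'query_time' => 0.060401201248169,
];

// Collect the values bound to $var across all result rows.
function selectColumn($rows, string $var): array
{
    if (!is_array($rows) || !isset($rows['result']['rows'])) {
        return [];
    }
    $values = [];
    foreach ($rows['result']['rows'] as $row) {
        if (isset($row[$var])) {
            $values[] = $row[$var];
        }
    }
    return $values;
}

echo implode(', ', selectColumn($rows, 'nick')), "\n"; // em, Danny
```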
The interesting part does not end there: the ARC2 library has many more useful functions, and one could write dissertations about semantics for another ten years, but all of that is beyond the scope of this article.
Still, I hope this served as a proper introduction.
Thanks for your attention!