
From the command line in search of knowledge


One of the most common standards for working with knowledge bases is the RDF data model together with the SPARQL query language. The database is usually accessed through a SPARQL endpoint over HTTP (Jena and Sesame can also be used as embedded databases, for example through the banana-rdf wrapper, and Virtuoso can be accessed via ODBC by prefixing the query string with 'SPARQL').
There are many open SPARQL endpoints: DBpedia (built from Wikipedia data), a large set of biological knowledge bases, geodata.
A web interface is usually attached to an endpoint, but the browser is too cumbersome, and we want to query the endpoints directly from the command line!

For serious work, you can use ready-made libraries, which exist for many languages, including those focused on data analysis (for example, R). Here we are interested in something else: quickly composing a query to get some information, or debugging the query itself.

The SPARQL query language is syntactically similar to SQL and semantically similar to Prolog. Knowledge is represented as a graph with labels on its nodes and edges. The labels are usually URLs (which are not required to resolve to anything), and nodes without outgoing edges can also be typed data. A SELECT specifies a subgraph pattern and the list of variables from that pattern that interest us.
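To make the idea of a subgraph pattern more concrete, here is a minimal sketch that matches a two-triple pattern against a tiny in-memory graph. The triples, the `ex:` names and the `Find-PersonNames` function are made up for illustration; a real SPARQL engine does this far more generally.

```powershell
# Toy in-memory graph: each triple is subject (s), predicate (p), object (o).
# All names below are invented for illustration.
$triples = @(
    [PSCustomObject]@{ s = 'ex:alice'; p = 'rdf:type';  o = 'foaf:Person' },
    [PSCustomObject]@{ s = 'ex:alice'; p = 'foaf:name'; o = 'Alice' },
    [PSCustomObject]@{ s = 'ex:bob';   p = 'rdf:type';  o = 'foaf:Person' },
    [PSCustomObject]@{ s = 'ex:bob';   p = 'foaf:name'; o = 'Bob' },
    [PSCustomObject]@{ s = 'ex:acme';  p = 'foaf:name'; o = 'Acme' }
)

# Rough equivalent of:
#   SELECT ?person ?name WHERE { ?person a foaf:Person ; foaf:name ?name }
function Find-PersonNames($graph) {
    foreach ($t in $graph) {
        # First pattern: "?person rdf:type foaf:Person" binds ?person
        if ($t.p -eq 'rdf:type' -and $t.o -eq 'foaf:Person') {
            $person = $t.s
            # Second pattern: the same ?person must also have a foaf:name
            foreach ($u in $graph) {
                if ($u.s -eq $person -and $u.p -eq 'foaf:name') {
                    [PSCustomObject]@{ person = $person; name = $u.o }
                }
            }
        }
    }
}

$found = @(Find-PersonNames $triples)
$found | Format-Table
```

Only the two nodes typed as `foaf:Person` are returned; `ex:acme` has a name but does not match the first pattern, so it is left out.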

On Unix-like operating systems (including Windows 10 with the Windows Subsystem for Linux), you can use bash, curl, and the jq utility for working with JSON:
curl -H "Accept: application/sparql-results+json" "http://data.semanticweb.org/sparql?query=PREFIX%20foaf%3A%20%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0ASELECT%20DISTINCT%20%3Fperson%20%3Fname%0AWHERE%20%7B%20%3Fperson%20a%20foaf%3APerson%3B%0Afoaf%3Aname%20%3Fname%20%7D%20LIMIT%2010" | jq .results.bindings 

PowerShell lets you do all this in a more human-friendly way.
Let's define a function that sends a query to a SPARQL server:
    function sparql_raw([String]$query, [String]$endpoint, [String]$graph="", [String]$prefix="", [String]$format="application/sparql-results+json") {
        # Optional default graph, passed as a URL-encoded GET parameter
        $dg = if ($graph -eq "") { "" } else { "default-graph-uri=$([uri]::EscapeDataString($graph))&" }
        # The query (with its prefixes) and the format go into the URL
        $req = "${endpoint}?${dg}query=$([uri]::EscapeDataString($prefix+$query))&format=$([uri]::EscapeDataString($format))"
        # The format is also sent in the Accept header
        Invoke-RestMethod -Headers @{"Accept"=$format} -Uri $req
    }
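When debugging, it can help to see the URL that would be sent without actually sending it. The sketch below repeats the URL-building step of `sparql_raw` in a hypothetical helper, `sparql_url`, which is not part of the original code and ignores the graph and prefix parameters for brevity.

```powershell
# Hypothetical helper: builds the same kind of URL as sparql_raw,
# but only returns it instead of sending the request.
function sparql_url([String]$query, [String]$endpoint, [String]$format = "application/sparql-results+json") {
    $q = [uri]::EscapeDataString($query)
    $f = [uri]::EscapeDataString($format)
    "${endpoint}?query=${q}&format=${f}"
}

$url = sparql_url 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 1' 'http://dbpedia.org/sparql'
$url
```

Spaces, question marks and braces in the query are percent-encoded, which is exactly what makes the hand-written curl URL above so hard to read.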

For convenience, you can set default parameters.
    # Prefixes prepended to every query (the same for all endpoints here)
    function prefixes([String]$key) {
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>"
    }

    # Default graph for known endpoints; an empty string means "none"
    function defaultGraph([String]$key) {
        switch -Regex ($key) {
            "http://dbpedia.org/sparql" { "http://dbpedia.org" }
            default { "" }
        }
    }

    function sparql_raw([String]$query, [String]$endpoint="http://dbpedia.org/sparql", [String]$graph=(defaultGraph $endpoint), [String]$prefix=(prefixes $endpoint), [String]$format="application/sparql-results+json") {
        $dg = if ($graph -eq "") { "" } else { "default-graph-uri=$([uri]::EscapeDataString($graph))&" }
        $req = "${endpoint}?${dg}query=$([uri]::EscapeDataString($prefix+$query))&format=$([uri]::EscapeDataString($format))"
        Invoke-RestMethod -Headers @{"Accept"=$format} -Uri $req
    }

In addition to the query string, the function takes the server URL, the default graph (analogous to the database name in a traditional DBMS), and the expected response format. The standard describes a set of acceptable formats, from HTML to CSV; I chose the simplest one that preserves meta-information.
The response looks like this:
    {
      "head": { "vars": [ "person", "name" ] },
      "results": {
        "bindings": [
          {
            "person": { "type": "uri", "value": "http:\/\/kantenwerk.org\/metadata\/foaf.rdf#me" },
            "name": { "type": "literal", "value": "Knud Möller" }
          },
          {
            "person": { "type": "uri", "value": "http:\/\/tomheath.com\/id\/me" },
            "name": { "type": "literal", "value": "Tom Heath" }
          }
        ]
      }
    }

Some servers expect the response format in the GET request parameters, and others in the Accept header. To be safe, our function sends it in both.
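Since the format is just a parameter, other result formats can be requested in the same way. As a sketch: `text/csv` is the media type of the SPARQL 1.1 CSV results format, and PowerShell can parse such text with `ConvertFrom-Csv`. Whether a particular endpoint honors this format is an assumption, so the network call is only shown as a comment, and the sample text below is hand-written from the response above.

```powershell
# Against a live endpoint one might try (not executed here):
#   sparql_raw -format 'text/csv' $query $endpoint | ConvertFrom-Csv

# Hand-written sample of what a CSV response could look like
$csv = @'
person,name
http://kantenwerk.org/metadata/foaf.rdf#me,Knud Möller
http://tomheath.com/id/me,Tom Heath
'@

# The header row becomes property names, each following row becomes one object
$people = @($csv | ConvertFrom-Csv)
$people | Format-Table
```

CSV is easy to parse, but it drops the type information (URI or literal, datatype, language) that the JSON format keeps.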
Now you can present the answer in a readable form:
    function sparql_light([String]$query, [String]$endpoint="http://dbpedia.org/sparql", [String]$graph=(defaultGraph $endpoint)) {
        $res = (sparql_raw -format 'application/sparql-results+json' $query $endpoint $graph)
        $vars = $res.head.vars
        $r = $res.results.bindings
        foreach ($i in $r) {
            # Ordered, so that columns keep the order of the SELECT variables
            $h = [ordered]@{}
            foreach ($n in $vars) {
                # Keep only the value, dropping the type information
                $h[$n] = $i.$n.value
            }
            [PSCustomObject]$h
        }
    }
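The flattening step can be tried without any network access. The sketch below applies the same loop to a shortened copy of the sample response shown earlier; `ConvertFrom-SparqlJson` is a hypothetical helper name, not part of the original code.

```powershell
# Shortened sample response in the SPARQL JSON results format
$json = @'
{ "head": { "vars": [ "person", "name" ] },
  "results": { "bindings": [
    { "person": { "type": "uri", "value": "http://tomheath.com/id/me" },
      "name":   { "type": "literal", "value": "Tom Heath" } }
  ] } }
'@

# Hypothetical helper: the flattening step of sparql_light, minus the request
function ConvertFrom-SparqlJson([String]$text) {
    $res = $text | ConvertFrom-Json
    foreach ($row in $res.results.bindings) {
        $h = [ordered]@{}
        foreach ($n in $res.head.vars) {
            # A variable missing from a binding simply yields $null here
            $h[$n] = $row.$n.value
        }
        [PSCustomObject]$h
    }
}

$rows = @(ConvertFrom-SparqlJson $json)
$rows | Format-List
```

Because the result is a stream of ordinary objects, it can be piped further into `Where-Object`, `Sort-Object`, `Format-Table` or `Export-Csv`.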


Now we can learn a lot of new things without leaving the terminal!

Find out which substances are involved in which metabolic pathways:
    sparql_light -endpoint "http://kegg.bio2rdf.org/sparql" '
    select distinct ?subst ?path where {
      ?x <http://bio2rdf.org/kegg_vocabulary:interaction> ?y.
      ?x <http://bio2rdf.org/kegg_vocabulary:pathway> ?p.
      ?p <http://purl.org/dc/terms/title> ?path.
      ?y <http://purl.org/dc/terms/title> ?subst.
    } LIMIT 100'


Or what data Wikipedia has about London:
    sparql_light '
    select distinct ?label ?type ?value where {
      ?x ?p <http://en.wikipedia.org/wiki/London>.
      ?x ?y ?value.
      BIND(DATATYPE(?value) as ?type).
      FILTER(bound(?type)).
      ?y <http://www.w3.org/2000/01/rdf-schema#label> ?label.
      FILTER (LANG(?label) = "en").
      FILTER (not exists { ?x ?y ?a. ?x ?y ?b. FILTER(?a != ?b). }).
    }'


Get a list of hackerspaces:
    sparql_light -endpoint "http://linkedgeodata.org/sparql" -graph "http://linkedgeodata.org" "
    select ?name ?addr ?home where {
      ?x a <http://linkedgeodata.org/ontology/Hackerspace>.
      ?x <http://linkedgeodata.org/ontology/addr%3Acity> ?addr.
      ?x <http://xmlns.com/foaf/0.1/homepage> ?home.
      ?x <http://www.w3.org/2000/01/rdf-schema#label> ?name.
    } limit 100"


Find out who is not affiliated with IBM Research:
    sparql_light -endpoint "http://data.semanticweb.org/sparql" '
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX swrc: <http://swrc.ontoware.org/ontology#>
    SELECT DISTINCT ?person ?affiliation WHERE {
      ?personid a foaf:Person.
      ?personid swrc:affiliation ?affiliationid.
      ?personid foaf:name ?person.
      ?affiliationid foaf:name ?affiliation.
      filter( ?affiliation != "IBM Research" && ?affiliation != "IBM Research Laboratory")
    } limit 100'


What interesting queries have you come up with?
Good luck extracting knowledge!

Source: https://habr.com/ru/post/282067/

