rdfDB Query Language

rdfDB uses a high-level, SQL-like query language. The data is modelled as a directed labelled graph (RDF). Nodes in the graph can be
  1. Resources : Every Resource is identified by a URI (e.g., foo, http://dmoz.org/#Top). Resources are written as URIs. The Resource whose URI is mailto:guha@guha.com is referred to (in the query language) as mailto:guha@guha.com.
  2. Integers. Integers are written as such (e.g., 42, 9, 18).
  3. Strings : Strings are UTF8, enclosed by single quotes ('). e.g., 'foo', 'foo bar'. To insert a single quote itself into a string, escape it with a backslash (\). e.g., 'Paul O\'Brien'.
Other datatypes such as floats and dates are coming soon.
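
As a rough illustration of the three node syntaxes above, the following Python sketch renders a value in query-language form. The Resource class and the format_node function are hypothetical helpers invented for this example and are not part of rdfDB; only the single-quote escaping rule is taken from the text above. How rdfDB treats a literal backslash inside a string is not stated, so the sketch leaves backslashes untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical wrapper that marks a value as a resource URI."""
    uri: str


def format_node(value):
    """Render a node in query syntax (illustrative helper, not an rdfDB API)."""
    if isinstance(value, Resource):
        # Resources are written as their URI, with no quoting.
        return value.uri
    if isinstance(value, bool):
        # bool is a subclass of int in Python; it is not a documented node type.
        raise TypeError("booleans are not a documented node type")
    if isinstance(value, int):
        # Integers are written as such.
        return str(value)
    if isinstance(value, str):
        # Strings are single-quoted; embedded single quotes get a backslash.
        return "'" + value.replace("'", "\\'") + "'"
    raise TypeError(f"unsupported node type: {type(value).__name__}")
```

For example, format_node("Paul O'Brien") yields 'Paul O\'Brien', and format_node(Resource("mailto:guha@guha.com")) yields mailto:guha@guha.com.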

All operations revolve around the concept of a "triple". A triple is intended to model the concept of an object with a property value. It consists of an arc label, an object and a property value.

The triple is written using the predicate logic syntax : (<arc-label> <object> <property-value>).

A collection of triples forms a database. There are no constraints on the set of triples that constitutes the database. (Some other RDF implementations refer to the concept of a database as a "model").

Database Operations are divided into the following categories:

Database Creation

Result Codes :
  1. 0 : success
  2. -10 : database could not be deleted. Most likely cause is that the file permissions were wrong. Make sure that rdfDB is allowed to write into the directory RDFDB_DIR. There is no return value.

Loading Files

rdfDB is designed to act as a cache for RDF, RSS, edge-labelled XML and other data out on the network. To facilitate this, it can load the contents of a URL (that points to an RDF, RSS ... file) into the database.

Result Codes:
  1. 0 : success
  2. -2 : syntax error
  3. -5 : database does not exist
  4. -6 : could not access the url
  5. -9 : unknown file format

Namespace Commands

RDF vocabularies may come from different namespaces. When parsing XML files, rdfDB creates URIs by concatenating the namespace URI (of an element's namespace) with the character '#' and the element name. So, if we have
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dmoz="http://dmoz.org/rdf">

<rdf:Description rdf:about="http://dmoz.org/Auto/CarSeats">
  <dmoz:relatedTo rdf:resource="http://dmoz.org/Children/Safety"/>
</rdf:Description>

</rdf:RDF>
The triple that is added to the database is
(http://dmoz.org/rdf#relatedTo http://dmoz.org/Auto/CarSeats http://dmoz.org/Children/Safety)
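
The concatenation rule described above can be sketched in a few lines of Python. The function names element_uri and property_triple are hypothetical and exist only for this illustration; how rdfDB treats a namespace URI that already ends in '#' is not stated in the text, so the sketch does not special-case it.

```python
def element_uri(namespace_uri, element_name):
    """Join a namespace URI and an element name with '#', as described above."""
    return namespace_uri + "#" + element_name


def property_triple(namespace_uri, element_name, about, resource):
    """Build the (arc-label, object, property-value) triple for one property element."""
    return (element_uri(namespace_uri, element_name), about, resource)


triple = property_triple(
    "http://dmoz.org/rdf",
    "relatedTo",
    "http://dmoz.org/Auto/CarSeats",
    "http://dmoz.org/Children/Safety",
)
# Print the triple in the predicate logic syntax used by this document.
print("(" + " ".join(triple) + ")")
```
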
In order to simplify the statement of queries, one can set a namespace prefix to correspond to a namespace URI.

Result Codes:
  1. 0 : success
  2. -2 : syntax error

Inserts & Deletes

The following two commands are used to add and remove triples.
  1. insert into {database_name} (arc1 source1 target1) [, (arc2 source2 target2) ... ]
    e.g., insert into dmoz (narrow http://dmoz.org/Top FlyingPizzas) </>

  2. delete from {database_name} (arc1 source1 target1) [ , (arc2 source2 target2) ... ]
    e.g., delete from foo (narrow http://dmoz.org/Top FlyingPizzas) </>

    Here arcs, sources and targets must be explicitly specified values. If you want to delete from the database all triples that satisfy some condition, use the next form of delete.

  3. delete from {database_name} where (arc1 source1 target1) [, (arc2 source2 target2) ... ]

    Here you can use variables in the same way as in the select queries described below. All triples that match the where (...) conditions will be deleted.

    With delete ... where ... statements you can also use the optional output [format] clause, exactly as in the select queries, to print the matching values of the variables as the triples are deleted.
Result Codes :
  1. 0 : success
  2. -2 : general syntax error
  3. -3 : malformed literal
  4. -5 : database does not exist
  5. -10 : wrong file permissions (could not open DB)
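
The insert and delete statements described in this section could be assembled from Python along the following lines. This is only a sketch: the function names are hypothetical, each triple is assumed to be a tuple of three strings that are already in query syntax, and the " </>" terminator is the one this document uses in its examples.

```python
TERMINATOR = " </>"  # statement terminator used throughout this document


def format_triples(triples):
    """Render triples as '(arc source target), (arc source target) ...'."""
    return ", ".join("(" + " ".join(triple) + ")" for triple in triples)


def build_insert(database, triples):
    """Build an 'insert into' statement for explicitly specified triples."""
    return f"insert into {database} {format_triples(triples)}{TERMINATOR}"


def build_delete(database, triples, where=False):
    """Build a 'delete from' statement; where=True selects the pattern form."""
    keyword = " where" if where else ""
    return f"delete from {database}{keyword} {format_triples(triples)}{TERMINATOR}"
```

For instance, build_delete("foo", [("narrow", "?x", "FlyingPizzas")], where=True) produces the third form of delete, in which ?x is a variable.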

Query

There is one query command which has the syntax : select variable1 [, variable2 ... ] from {database} where constraint1 [, constraint2 ...] [ output {output-format} ] </> which returns a set of variable bindings for the selected variables such that the triples in {database} satisfy constraint1, constraint2 ... under those variable substitutions.

Variables are syntactically designated by symbols starting with the character '?'. e.g., ?name, ?foo.

A constraint is of the form (arc-label source target) where any one or more of arc-label, source or target can be a variable or resource and in the case of the target, also an integer or string. The same variable can appear in multiple constraints.

e.g., select ?x, ?y from dmoz where (title ?x ?y), (createdBy ?x RichSkrenta), (type ?x Topic)
This lists the IDs and titles of all objects of type Topic created by RichSkrenta.
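
A select query following the syntax given above could be built programmatically as sketched below. The build_select function is a hypothetical helper, not part of rdfDB; it separates variables and constraints with commas as the stated syntax does, and it only checks that each selected name starts with '?'.

```python
def build_select(variables, database, constraints, output_format=None):
    """Build a select statement (illustrative helper, not an rdfDB API)."""
    for name in variables:
        if not name.startswith("?"):
            raise ValueError(f"variable names must start with '?': {name!r}")
    # Each constraint is a tuple (arc-label, source, target) of query-syntax strings.
    where = ", ".join("(" + " ".join(c) + ")" for c in constraints)
    query = f"select {', '.join(variables)} from {database} where {where}"
    if output_format is not None:
        query += f" output {output_format}"
    return query + " </>"


query = build_select(
    ["?x", "?y"],
    "dmoz",
    [("title", "?x", "?y"), ("createdBy", "?x", "RichSkrenta"), ("type", "?x", "Topic")],
)
print(query)
```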

The query can optionally specify the output format by adding output {output-format} to the query. The supported output formats are "tab-limited" and "variable-list". I hope to add "javascript" and "rdf-xml" as supported output formats. The default is "tab-limited".

Result structure: The result contains zero or more lines of answers followed by the result code line. The syntax of the answer line depends on the chosen output format. In the case of the "variable-list" format, there is one line per variable binding set which has the syntax
variable1=value1TABvariable2=value2...

In the case of "tab-limited", the first line holds the tab-separated variable names, and each following line holds one set of variable values with the syntax
value1TABvalue2...
where the values appear in the same order as the variables in the query.
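
The two answer-line layouts described above could be parsed in Python roughly as follows. Both function names are hypothetical, and the sketch assumes the answer lines have already been separated from the result code line and stripped of their line endings. Whether variable names are printed with their leading '?' is not stated, so the parsers keep names exactly as received.

```python
def parse_variable_list(line):
    """Parse one 'variable1=value1TABvariable2=value2' line into a dict."""
    bindings = {}
    for field in line.split("\t"):
        # Split on the first '=' only, so values may themselves contain '='.
        name, _, value = field.partition("=")
        bindings[name] = value
    return bindings


def parse_tab_limited(lines):
    """Parse a header line of names plus value lines into a list of dicts."""
    if not lines:
        return []
    names = lines[0].split("\t")
    return [dict(zip(names, row.split("\t"))) for row in lines[1:]]
```

Values are returned as raw strings in query syntax (for example with their surrounding single quotes); converting them to native types is left out of the sketch.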

Result Codes:

  1. 0 : success
  2. -2 : syntax error
  3. -3 : malformed literal error
  4. -4 : general error
  5. -5 : database does not exist
  6. -6 : could not access data
  7. -8 : unconstrained variable

Sample Session

This is a simple sample session with rdfDB. Queries are terminated with " </>". The query returns any answers (as applicable) and an error code. The error code 0 is returned for successful operations.
telnet 209.157.132.197 7001
Connected to govinda.guha.meer.net (209.157.132.197).
Escape character is '^]'.
create database test1 </>
0 </>
insert into test1 (type DanB Person), (name DanB 'Dan Brickley') </>
0 </>
insert into test1 (worksFor DanB W3C)  (worksFor DanC W3C) </>
0 </>
insert into test1 (name DanC 'Dan Connolly') </>
0 </>
select ?x from test1 where (worksFor ?x W3C) (name ?x ?y) output variable-list </>
?x = DanC ?y = 'Dan Connolly'
?x = DanB ?y = 'Dan Brickley'
0 </>
<quit>
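
An exchange like the session above could also be driven from a small Python client. The sketch below is not an official rdfDB client: the function names are hypothetical, and it assumes that the server reads a statement terminated by " </>" and a newline, and that it ends every reply with a line of the form "{code} </>", as the transcript suggests.

```python
import re
import socket

# A reply ends with a line such as "0 </>" or "-5 </>".
RESULT_LINE = re.compile(r"^(-?\d+) </>$")


def split_response(lines):
    """Collect answer lines up to the result code line; return (answers, code)."""
    answers = []
    for line in lines:
        line = line.rstrip("\r\n")
        match = RESULT_LINE.match(line)
        if match:
            return answers, int(match.group(1))
        answers.append(line)
    raise ConnectionError("stream ended before a result code line")


def run_query(host, port, query):
    """Send one statement (without its terminator) and read the reply."""
    with socket.create_connection((host, port)) as sock:
        # Assumption: the terminator followed by a newline completes a statement.
        sock.sendall((query + " </>\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return split_response(reader)
```

A call such as run_query("localhost", 7001, "create database test1") would then return an empty answer list and the result code, given a server listening on that (assumed) host and port.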

Here is a simple www front end browser for the dmoz hierarchy based on rdfDB DBI. Here is the code behind that.

Result Codes

Here is the complete list of result codes.
  1. 0 : success
  2. -1 : unknown query type
  3. -2 : general syntax error
  4. -3 : malformed literal error
  5. -4 : misc
  6. -5 : database does not exist
  7. -6 : could not access data
  8. -7 : unauthorized access
  9. -8 : unconstrained variable
  10. -9 : unknown file format
  11. -10 : wrong file permissions
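
For client code, the list above maps naturally onto a lookup table. The following Python sketch is one possible arrangement; the names RESULT_CODES, RdfDBError and check_result are hypothetical and chosen only for this example.

```python
# Result codes and their meanings, copied from the list above.
RESULT_CODES = {
    0: "success",
    -1: "unknown query type",
    -2: "general syntax error",
    -3: "malformed literal error",
    -4: "misc",
    -5: "database does not exist",
    -6: "could not access data",
    -7: "unauthorized access",
    -8: "unconstrained variable",
    -9: "unknown file format",
    -10: "wrong file permissions",
}


class RdfDBError(Exception):
    """Hypothetical exception carrying a non-zero result code."""

    def __init__(self, code):
        self.code = code
        self.reason = RESULT_CODES.get(code, "unknown result code")
        super().__init__(f"rdfDB result code {code}: {self.reason}")


def check_result(code):
    """Raise RdfDBError for any result code other than 0 (success)."""
    if code != 0:
        raise RdfDBError(code)
```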