RDFDB
About
I am putting up a (highly modified) win32 port of RDFDB here. I have also contacted the original author of the program.
RDFDB is a lightweight RDF server with a very simple interface: you only have to open a socket to it, send some plain-text SQL-like queries, and then receive and parse the reply. The original program can be found on these sites:
NOTE: If you would like to participate in development, please mail me (the pages above are outdated, although I hope to one day continue development on SourceForge).
This project is closely tied to MDDB, as this is the RDF server MDDB is currently based on.
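The socket interface described above can be tried out with a few lines of Python. This is only a minimal sketch under several assumptions that this page does not confirm: that a query is terminated by a newline, that the reply is complete once the server closes the connection or stays silent for a while, and that the data can be treated as UTF-8 (undecodable bytes are replaced). The function name send_query is made up for this example; the default host and port are the ones the frontends below connect to.

```python
import socket


def send_query(query, host="localhost", port=7001, timeout=2.0):
    """Send one plain-text query to an RDFDB server and return the raw reply.

    Assumptions (not confirmed by the RDFDB documentation): the query is
    newline-terminated and the reply ends when the server closes the
    connection or sends nothing for `timeout` seconds.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((query.strip() + "\n").encode("utf-8"))
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                break  # nothing more arrived within the timeout
            if not data:
                break  # the server closed the connection
            chunks.append(data)
    # The real encoding is uncertain, so never fail on odd bytes.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

With a server running locally, something like send_query("select * from test1 where (?x ?y ?z)") would return the unparsed reply text; parsing it is left out because the output format is not described on this page.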
Downloads
The latest version was uploaded on July 29, 2004.
NOTE: There is some documentation further down this page!
NOTE: RDFDB is very insecure at the moment. Running it on a machine that is connected directly to the internet is not recommended; you should at least put a firewall between the internet and the computer RDFDB runs on, and configure it so that RDFDB can't be reached from the internet.
- RDFDB 0.47 mod 13 (2004-07-29) sources (112 KB)
- Mod 13 fixes a few small bugs and adds the export query type.
Mod 12 adds some security features to RDFDB, changes the way you can specify additional commandline parameters for the Win32 service, has seen a number of speed ups and some code rewriting/restructuring (as well as a few small bugfixes).
Mod 11 extends the functionality of the insert and delete queries to allow for inserting/deleting triples from a file (similar to the load/unload queries), adds support for commandline options and config files (adding support for logging) and has seen quite a bit of internal changes (all globally stored resources should now be released when the program is exited). It also solves a problem with deleting triples caused by how the original code stored triples (this has now been changed, unfortunately this means DB's will in general be slightly bigger, it also means you may want to rebuild your DB's).
Mod 10 primarily makes it possible to run RDFDB as a service under Windows (not 95/98/ME) and adds support for unconstrained triples (see the version history below for an example).
Mod 9.1 fixes a few bugs with literals (could cause RDFDB to crash) and optional triples (not sure if it could really cause problems, but the logic to determine the number of non-null values in the current result was slightly flawed).
Mod 9 fixes two bugs, replaces null value support with support for optional triples (this also means the non-null keyword is now deprecated) and uses Raptor for parsing files (the old RDF/XML and RSS parsers have been removed from the source). Raptor adds support for N-Triples and Turtle (both similar to N3, N-Triples is used a lot for test cases by the W3C and Turtle is a lot more compact than RDF/XML). And as Curl is used for reading files RDFDB now supports reading from both HTTP and FTP (in addition to local files), in the future other protocols might be enabled if needed (Curl can handle quite a number of protocols).
Mod 8 makes the output format of RDFDB more robust, improves error reporting and adds the "non-null" option to disable null-values.
NOTE: Because both RDFDB and MDDBLib are still in an early stage, it is usually necessary to have corresponding versions of both (versions that were released together). The changelog on the MDDBLib page shows which MDDBLib version corresponds to which RDFDB version.
NOTE: To compile the above sources you need to download the following:
- RDFDB 0.47 mod 13 (2004-07-29) win32 binary (437 KB)
- A compiled version of the above sources. Comes with pthreadVC.dll and the rest of the libraries are linked statically, so you don't need anything else to run it.
- RDFDB 0.47 mod 3 (2004-03-14) sources (151 KB)
- This version shouldn't be needed anymore, only download it if you have some use for a version of RDFDB that's closer to the original than the latest version.
Mostly adds support for global namespace mappings through the environment variable RDFDB_NAMESPACES and the output format is changed slightly to make it easier to communicate with.
Previous release notes:
This adds support for more advanced SquishQL-like queries and fixes a few bugs.
- RDFDB 0.47 mod 3 (2004-03-14) win32 binary (319 KB)
- This version shouldn't be needed anymore, only download it if you have some use for a version of RDFDB that's closer to the original than the latest version.
A compiled version of the above sources. This now comes with pthreadVC.dll, so you don't need anything else to run it.
- RDFDB C# frontend (14 KB)
- Very simple C-Sharp frontend for RDFDB (just lets you enter a query, press a button and it shows the result), it connects to port 7001 on the localhost.
I am unsure as to what exact encoding this tries to use (it uses the default), but I haven't experienced any problems yet.
This frontend is slightly more sophisticated than the JAVA frontend below, and also has some extra functionality unrelated to RDFDB (it can add LUIDs to filenames). But please keep in mind that I just created this to have a tool to test some things, and it is in no way meant to be an actual implementation of anything; it's just a test program to work out some ideas.
You can start the frontend by unzipping it to a directory and starting CSFrontend.exe (MS .NET Framework required).
This currently does not work with Mono or DotGNU (last time I checked at least), but there is no specific reason why it shouldn't in the future (once their class libraries are more complete).
The frontend has been updated to be compatible with the new output format of RDFDB.
- RDFDB Java frontend (7 KB)
- Very simple JAVA frontend for RDFDB (just lets you enter a query, press a button and it shows the result), it connects to port 7001 on the localhost.
This assumes data is sent as UTF-8, but that isn't so (although it should be). Fortunately this usually doesn't give any problems.
You can start the frontend by unzipping it to a directory and starting start.bat.
The frontend has been updated to be compatible with the new output format of RDFDB.
It has now been modified to not reconnect all the time (it has connect/disconnect buttons), so you can log in.
If you have any questions, just mail me.
This modification adds support for simple constraints, optional triples, a new syntax, namespace prefixes (without having to use special commands) and some other useful stuff, see the release notes, changelog and documentation below for more information.
Running as a service
On a Linux system (or other UNIX-like system) all kinds of programs can be run in the background, and programs don't need to do a lot of special things to accomplish that. On Windows, however, programs have to be written specifically for this purpose. The main advantages of running as a service are that you don't have to keep an RDFDB window open all the time and that it is easily available to every user on the system (you can even configure it to start automatically at boot time). RDFDB has three commandline options that enable it to run as a service (under Windows):
--service=run
- This starts RDFDB as a service. Normally you shouldn't use this directly; it is only used by the service manager (part of Windows) when RDFDB is started as a service.
--service=install
- Tries to install RDFDB as a service, it uses the full path of the RDFDB executable used with this option to register it with the service manager. If the RDFDB service already exists it tries to update the current information.
--service=uninstall
- Deletes the RDFDB service, this does NOT delete any files, RDFDB just isn't known as a service anymore after this.
To specify commandline parameters to be used by the service you can add any you'd like to the commandline when you run RDFDB with --service=install.
If you want to specify an (alternative) users file for the service you can use the following commandline to install the service:
rdfdb --service=install --users=d:/users.txt
White space in a parameter passed in this way will not be handled correctly in the current version.
Currently RDFDB registers itself as a service that needs to be started manually!
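If the install step is scripted, the whitespace limitation mentioned above is easy to trip over. The following is a small sketch of a helper that builds the install commandline and refuses values containing whitespace. The helper name build_install_command is hypothetical, and setting names are passed through unchanged as --name=value.

```python
def build_install_command(executable, **settings):
    """Build the argument list for `rdfdb --service=install ...`.

    Raises ValueError for values containing whitespace, since the page
    states that such parameters are not handled correctly when passed
    to the service.
    """
    args = [executable, "--service=install"]
    for name, value in settings.items():
        value = str(value)
        if any(ch.isspace() for ch in value):
            raise ValueError(
                f"--{name}: whitespace in service parameters is not supported"
            )
        args.append(f"--{name}={value}")
    return args
```

For example, build_install_command("rdfdb", users="d:/users.txt") gives the same arguments as the commandline shown above, and the resulting list could be handed to subprocess.run on a Windows machine.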
Optional triples
Null values (as produced by optional triples) are useful because they allow you to do a query like this:
select ?x ?type from musicbrainztest where (<a> <dc:title> ?x) ?(?x <rdf:type> ?type)
Which could return something like the following:
?x | ?type |
"title1" | |
<title2> | |
<title3> | <http://musicbrainz.org/mm/mm-2.1#Artist> |
Without support for optional triples the query would have looked like this:
select ?x ?type from musicbrainztest where (<a> <dc:title> ?x) (?x <rdf:type> ?type)
And only title3 would have been returned.
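The difference between the two queries can be illustrated with a small in-memory model. This is not how RDFDB evaluates queries internally; it is only a sketch of the semantics, and the four triples are hypothetical data chosen so that the result matches the example above.

```python
# Hypothetical contents of the database, chosen to match the example result.
TRIPLES = [
    ("<a>", "<dc:title>", '"title1"'),
    ("<a>", "<dc:title>", "<title2>"),
    ("<a>", "<dc:title>", "<title3>"),
    ("<title3>", "<rdf:type>", "<http://musicbrainz.org/mm/mm-2.1#Artist>"),
]


def titles_with_types(triples, optional):
    """Return (?x, ?type) rows for (<a> <dc:title> ?x) joined with
    (?x <rdf:type> ?type). When `optional` is true, rows without a
    matching type are kept and ?type is None."""
    rows = []
    for s, p, x in triples:
        if (s, p) != ("<a>", "<dc:title>"):
            continue
        types = [o for s2, p2, o in triples if s2 == x and p2 == "<rdf:type>"]
        if types:
            rows.extend((x, t) for t in types)
        elif optional:
            rows.append((x, None))  # keep the row, leave ?type empty
    return rows
```

Calling titles_with_types(TRIPLES, optional=True) yields three rows, two of them with an empty ?type, while optional=False yields only the <title3> row.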
Global namespace mappings
You can use global namespace mappings by setting the environment variable RDFDB_NAMESPACES. Just specify a list of prefixes followed by the URIs they map to, some examples:
- dc<http://purl.org/dc/elements/1.1/>
- mm<http://musicbrainz.org/mm/mm-2.1#>
- dc<http://purl.org/dc/elements/1.1/>mm<http://musicbrainz.org/mm/mm-2.1#>rdf<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
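For client code that wants to resolve prefixes itself, the format above is simple to parse. The sketch below assumes the value is nothing more than a run of prefix<uri> pairs, as in the examples; tolerating whitespace between pairs is an extra assumption, and the function names parse_namespaces and expand are made up for this example.

```python
import re

# One mapping: a prefix (no angle brackets or whitespace) followed by <uri>.
_MAPPING = re.compile(r"\s*([^<>\s]+)<([^<>]*)>")


def parse_namespaces(value):
    """Parse 'prefix<uri>prefix<uri>...' into a dict of prefix -> URI."""
    mappings = {}
    pos = 0
    while pos < len(value):
        match = _MAPPING.match(value, pos)
        if match is None:
            if value[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise ValueError(
                f"malformed namespace list at offset {pos}: {value[pos:]!r}"
            )
        mappings[match.group(1)] = match.group(2)
        pos = match.end()
    return mappings


def expand(qname, mappings):
    """Expand a qualified name such as 'dc:title' using the parsed mappings."""
    prefix, _, local = qname.partition(":")
    return mappings[prefix] + local
```

Parsing the third example above gives a dictionary with the keys dc, mm and rdf, and expand("dc:title", ...) then returns http://purl.org/dc/elements/1.1/title.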
Query syntax
The new query syntax (for the select statement) looks like this (this is not entirely correct, but if you follow this you shouldn't get into too much trouble later on):
query = 'select' variable-list 'from' database-name 'where' triple-list ('and' constraint-list)? ('using' for-list)? 'non-null'? ('output' output-format)?
variable-list = '*' | (variable)+
database-name = identifier
triple-list = ( '(' var-or-literal var-or-literal var-or-literal ')' )+
constraint-list = (num-expression | text-expression)+
for-list = (identifier 'for' uri)+
output-format = 'rdf-xml' | 'javascript' | 'tab-delimited' | 'variable-list'
num-expression = variable ('<'|'>'|'<='|'>='|'=='|'='|'!='|'<>') number-literal
text-expression = variable ('eq'|'ne'|'=~'|'~~'|'!~') (text-literal | uri-literal)
var-or-literal = variable | number-literal | text-literal | uri-literal
variable = '?'identifier
number-literal = An integer (123 for example).
text-literal = A sequence of characters enclosed by double quotes (") or apostrophes ('), quoting any occurrences of the quoting character used with a backslash (example: 'Joe\'s band "hollywood"' , another example: "Joe's band \"hollywood\"" ).
NOTE: If a text-literal is used in a text-expression using the '=~', '~~' or '!~' operator it will be interpreted as a regular expression.
uri-literal = A sequence of characters enclosed by '<' and '>' ('<someuri>' for example).
A URI can also be prepended with a prefix that has been mapped to a URI by either a special command or the 'using' clause ('rdf:type' for example).
NOTE: If a uri-literal is used in a text-expression using the '=~', '~~' or '!~' operator it will be interpreted as a regular expression.
identifier = A sequence of characters that can't be interpreted as a delimiter (so no spaces or commas for example).
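When queries are built by a program, the text-literal rule above is the part most likely to go wrong. The helper below is a small sketch of it: it wraps a string in the chosen quote character and puts a backslash before every occurrence of that character. Doubling literal backslashes is an assumption (the grammar above only mentions the quote character), and the name quote_text_literal is made up for this example.

```python
def quote_text_literal(text, quote='"'):
    """Return `text` as a text-literal for use in a query.

    Occurrences of the quoting character are escaped with a backslash,
    as described in the grammar. Assumption: a literal backslash must
    itself be doubled so it is not read as an escape.
    """
    if quote not in ('"', "'"):
        raise ValueError("quote must be a double quote or an apostrophe")
    escaped = text.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + escaped + quote
```

Applied to the text Joe's band "hollywood", this reproduces both examples from the text-literal rule, depending on which quote character is chosen.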
Some examples of valid queries (the URIs are NOT valid BTW):
- select * from test1 where (?x <http://purl.org/dc/elements/1.1/worksFor> ?y)
- select ?x ?y from test1 where (?x <http://purl.org/dc/elements/1.1/worksFor> ?y)
- select ?x ?y from test1 where (?x <dc:worksFor> ?y) using dc for <http://purl.org/dc/elements/1.1/>
- select ?x ?y from test1 where (?x <http://purl.org/dc/elements/1.1/worksFor> ?y) and ?x like <Dan[BC]> ?y ne 'Company'
- select ?x ?y from test1 where (?x <http://purl.org/dc/elements/1.1/worksFor> ?y) and ?x like <Dan[BC]> ?y like W.* output tab-delimited
- select ?x ?y from test1 where (?x <dc:worksFor> ?y) and ?x like Dan[BC] ?y ne 'Company' using dc for <http://purl.org/dc/elements/1.1/> output tab-delimited
- select ?name from musicbrainztest
where (?x <rdf:type> mm:Artist) (?x <dc:title> ?name)
using rdf for <http://www.w3.org/1999/02/22-rdf-syntax-ns#> dc for <http://purl.org/dc/elements/1.1/> mm for <http://musicbrainz.org/mm/mm-2.1#>
- select ?name from musicbrainztest
where (?x <rdf:type> mm:Artist) (?x <dc:title> ?name)
NOTE: This would require some global mappings to be set.
- select ?name from musicbrainztest
where (?x <rdf:type> mm:Artist) (?x <dc:title> ?name)
and ?x like <http.*>
using rdf for <http://www.w3.org/1999/02/22-rdf-syntax-ns#> dc for <http://purl.org/dc/elements/1.1/> mm for <http://musicbrainz.org/mm/mm-2.1#>
- select ?name from musicbrainztest
where (?x <rdf:type> mm:Artist) (?x <dc:title> ?name)
and ?name like 'rine [Dd]re'
using rdf for <http://www.w3.org/1999/02/22-rdf-syntax-ns#> dc for <http://purl.org/dc/elements/1.1/> mm for <http://musicbrainz.org/mm/mm-2.1#>
- select ?name from musicbrainztest
where (?x <rdf:type> mm:Artist) (?x <dc:title> ?name)
and ?name like rine\ [Dd]re
using rdf for <http://www.w3.org/1999/02/22-rdf-syntax-ns#> dc for <http://purl.org/dc/elements/1.1/> mm for <http://musicbrainz.org/mm/mm-2.1#>
Settings
Settings are controlled through environment variables, commandline arguments and config files.
RDFDB looks for settings in the following order:
- Commandline switches (of the form --setting=value).
- Contents of the config file (none by default, can be overridden with the --config-file=x switch or the RDFDB_CONFIG_FILE environment variable). The config file should contain lines of the form setting=value.
- Environment variables (of the form RDFDB_SETTING=value; to set the port RDFDB listens on through an environment variable, for example, you should set the environment variable RDFDB_PORT).
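The lookup order above can be sketched as follows. This is an illustration, not RDFDB's actual code, and it rests on assumptions: that the first source providing a value wins, that hyphens in a switch name correspond to underscores in the setting name (suggested by --config-file and config_file), and that whitespace around names and values in the config file is ignored. All function names are hypothetical.

```python
def parse_switches(argv):
    """Collect --setting=value switches; hyphens in names become underscores."""
    switches = {}
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            switches[name.replace("-", "_")] = value
    return switches


def parse_config(text):
    """Collect setting=value lines from the contents of a config file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            name, value = line.split("=", 1)
            settings[name.strip()] = value.strip()
    return settings


def resolve(name, argv, config_text, environ):
    """Return the first value found: switches, config file, environment."""
    for source in (parse_switches(argv), parse_config(config_text)):
        if name in source:
            return source[name]
    # Environment variables use the RDFDB_ prefix and an upper-case name.
    return environ.get("RDFDB_" + name.upper())
```

With argv containing --port=7001, a config file containing port=7002 and RDFDB_PORT set to 7003, resolve("port", ...) returns "7001" under these assumptions.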
The following settings are available:
Variable | Description |
dir (required) | The directory where the RDFDB databases should reside. |
port (required) | The port RDFDB listens on. |
users (required) | The file containing the user information RDFDB should use.
This file should consist of lines with the following syntax: username:rights:password . Whitespace is ignored in obvious places (at boundaries between fields, at the beginning and end of a line, and between keywords in the rights field). The username field cannot be empty; the password field can. The rights field has the following syntax:
rights = 'all' | 'all except' permissions | permissions
permissions = ('connect' | 'admin' | 'read_db' | 'write_db' | 'read_url' | 'write_url')* |
ip_ranges (required) | The file containing information about IP ranges that are allowed to connect to RDFDB.
This file should consist of lines with the following syntax: IP-range:rights . Where IP-range is either a single IP address (example: 192.168.1.32 ), an IP address with a wildcard (examples: 192.168.1.* , 192.168.* , * ) or a range (two IP addresses with a dash (-) between them, example: 192.168.1.34-192.168.1.45 ). rights is as with the users setting. |
namespaces | A list of namespace definitions, see the documentation above for the syntax (under Global namespace mappings). |
verbosity_mask | A mask that determines which kinds of messages RDFDB should show (an integer in hexadecimal notation). The default depends on whether it's a debug build (all messages are shown) or a release build (700, or NOTICE, CRITICAL and INFO).
The following values can be ORed (added) for valid values (all hexadecimal): NOTICE (100), CRITICAL (200), INFO (400), FAILURE (800) and DEBUG_INFO (1000). So if you want critical messages to show, as well as any (non-critical) failures, but not any of the other messages, you should set the verbosity_mask setting to A00 (200+800 in hexadecimal).
The header RDFDB normally shows when it starts can't be disabled. |
config_file | The config file RDFDB should use (see the part about how RDFDB looks for settings above). |
log_file | A log file RDFDB should mirror all output to (the header RDFDB normally shows when it starts isn't written to the log). |
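The arithmetic behind the verbosity_mask setting can be checked with a few lines of Python. The bit values are the ones listed for verbosity_mask above; the function names make_mask and shown_levels are made up for this example, and whether RDFDB accepts lower-case hexadecimal digits is not stated, so the sketch emits upper case.

```python
# Bit values as documented for the verbosity_mask setting (hexadecimal).
LEVELS = {
    "NOTICE": 0x100,
    "CRITICAL": 0x200,
    "INFO": 0x400,
    "FAILURE": 0x800,
    "DEBUG_INFO": 0x1000,
}


def make_mask(*names):
    """OR the named message kinds together and return the hexadecimal text."""
    mask = 0
    for name in names:
        mask |= LEVELS[name]
    return format(mask, "X")


def shown_levels(mask_text):
    """List the message kinds that a hexadecimal mask enables."""
    mask = int(mask_text, 16)
    return [name for name, bit in LEVELS.items() if mask & bit]
```

Here make_mask("CRITICAL", "FAILURE") gives A00, and shown_levels("700") lists NOTICE, CRITICAL and INFO, matching the release-build default described above.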
Version history
July 29, 2004 (0.47 mod 13):
- Added the export query (export DB [raw] to FORMAT FILE_URI, FORMAT can be ntriples, turtle or csv).
- Fixed (hopefully) some issues with the database interface (it used to create faulty indexes in some cases).
- Updated string quoting support.
- Cleaned up and restructured some code (also fixed a few small memory leaks).
July 15, 2004 (0.47 mod 12):
- Rewrite of the code handling the client connections (it's now much easier to understand and I've made it clearer where certain responsibilities lie).
- Implemented some security by letting clients authenticate themselves (simple username/password) and letting the user assign rights to IP addresses (so only people from the LAN can connect for example).
- Changed the Win32 service to use the commandline parameters passed when it was installed.
- Sped up writing the output of a query by using an intermediate buffer to reduce the number of calls to send. For queries where most of the time was spent outputting data this can make a huge difference, a query that queried for all track titles in a DB with nearly 200.000 tracks saw a two-fold speed up, a slightly smaller gain was accomplished with a query that queried for all artists in a much smaller DB (about 10 artists).
- Replaced PrintMsg by FastPrintMsg in addAnswer to reduce the number of calls to PrintMsg (not the fastest function around).
- Added a typedef for SOCKET and a define for closesocket for non-win32 environments (not sure they're needed, but the old code used int and close), reducing the number of ifdef's.
- Connections should now be cleaned up properly.
- Slight improvement of the HashTable code (fixed a small bug and made it store ordered lists instead of unordered lists).
- Resolved some potentially problematic locking situations.
- Some code restructurings.
June 25, 2004 (0.47 mod 11):
- Added support for file insert and delete queries (similar to load/unload, but with a slightly different syntax and the assertions are stored as if they didn't come from a file, the file delete query also doesn't work in the same way as the unload query, as it simply deletes all triples found in the file from the db, instead of triples originally inserted from that file into the db).
- Added support for commandline settings and a config file.
- Updated to a new version of Raptor (this changes the way file URIs are interpreted: you should now use file:/c:/foo/bar.rdf or file:///c:/foo/bar.rdf instead of file://c:/foo/bar.rdf).
- It can now log its output to a file (this came with the added bonus of the verbosity mask instead of the verbosity level).
- The create database query now checks whether the database already exists.
- Fixed the generation of id's for blank nodes, it now generates id's unique to a file for blank nodes with a nodeID too.
- Restructured the initialization and deinitialization of the various parts of RDFDB, everything should now be inited and deinited (hopefully correctly).
- Fixed checking whether or not an assertion is actually deleted (it used to assume in almost all cases that an assertion was deleted).
- Made sure inserting an assertion with the same arc/source pair as an existing assertion would cause an entry to be added to the arcindex (otherwise the original would be deleted if one of the two assertions is deleted, which would cause certain lookups to fail).
- Fixed the HashTable, it used to only use two bins (the first one and the last one) due to the incorrect use of the and operator.
- Various internal changes (moved some functions, made a lot of functions and variables static, removed some dead code, etc.).
May 30, 2004 (0.47 mod 10):
May 23, 2004 (0.47 mod 9.1):
May 22, 2004 (0.47 mod 9):
- Deprecated non-null in favour of optional triples, this means the behaviour is more conformant to RDQL again (if you don't use any optional triples the behaviour will be identical) and it allows for a much finer control.
- The RDF/XML parser now supports the XMLSchema integer data type for literals.
- Raptor is now used for parsing, all old parsers have been removed from the source (the binary is now about 300KB larger, but the parser supports quite a bit more than we used to, including N-Triples, Turtle and ftp).
- The previous version still output a blank line as the result of a load query, this is now fixed.
- Unload queries now close the file they have unloaded (they used to keep it open).
- Updated documentation that comes with RDFDB.
- Moved all third party packages to a separate directory and documented what changes were made.
- Updated to Berkeley DB 4.2.52 and the latest Pthreads (2004-05-16-fixed).
May 11, 2004 (0.47 mod 8):
- Modified the output format to be more robust and to provide space for returning error messages, instead of just error codes.
- Added the non-null option to revert to the old behaviour without null values (used by MDDBLib in ML_GetMetadata).
- Made sure non-result queries indeed don't return any result lines (the load query used to return a line with the code of the file that was loaded).
- Changed the query parser to be better at reporting syntax errors (and also reporting them through the new output format), it's also a bit stricter now.
- Improved the compatibility with RDQL by allowing for both " and ' as quotes and supporting the text comparison operators RDQL uses.
- Removed a lot of old (disabled) code.
April 24, 2004 (0.47 mod 7):
- Added an option to increase the default verbosity level (through the RDFDB_VERBOSITY environment variable).
- Fixed the duplicate detection of addremoveIndex, this means that it no longer mistakenly thinks an entry already exists when an entry which is a prefix of the new entry exists.
- Fixed the literal handling of the new RDF/XML parser, it used to concatenate two literals if they appear immediately after one another.
- Fixed handling of negative integers in a query.
- Made the assertion count extern, the assertion counts displayed after a load or unload should now be correct.
- The database directory is now appended with a (back)slash if this isn't the case already.
April 15, 2004 (0.47 mod 6):
- The select * from ... syntax is now supported (simply selects all variables used in the where clause).
- The result of a select query now always begins with a header (if no errors occurred during parsing and the output format is tab delimited, which is the default).
- Qualified names (dc:title for example) should now be resolved correctly.
- The RDF/XML parser has been replaced by a new one, which supports nearly all RDF/XML features (most notable exceptions are data types, the xml:lang attribute and the rdf:parseType attribute) and does so correctly (as far as I can tell).
- Queries of the form: select ?a ?b from db where (<uri> ?a ?b) (?b <prop> val) are fixed.
- It now checks whether the client closed the connection early in addAnswer (so very long results can be aborted this way).
March 19, 2004 (0.47 mod 5):
- Loading of files should now be pretty much functional (including reloading files, which simply "refreshes" the db).
- Unloading files is fixed (the only thing it doesn't do yet is delete the .dbfl file).
- The load and unload queries now expect a quoted URI as argument (quote it with angle brackets like done in triples, this allows for spaces in filenames).
- A few regressions that existed in mod 4 are now fixed.
March 18, 2004 (0.47 mod 4):
- Added support for null values.
- Changed the syntax to be more consistent and logical.
- Fixed loading of files over HTTP, made it possible to use drive letters (file://d:/something for example) and fixed a bug concerning loading files with CRLFs in them.
- Modified the internals quite a bit, more consistent and it should be a bit faster (a lot of things were done more than once).
- Fixed the crash when exiting with Ctrl-C (which is the proper way to exit RDFDB).
- Updated expat to the latest version (1.95.7).
- RDFDB no longer generates URIs with two hashes as namespace separator (at least in normal circumstances), this involved modifying expat (the addBinding function now checks whether the uri already ends with namespaceSeparator).
March 14, 2004 (0.47 mod 3):
- Added support for global namespace mappings through the environment variable RDFDB_NAMESPACES.
- Modified the output format a bit to make it easier to parse (MDDBLib needs this).
March 10, 2004 (0.47 mod 2):
- Added support for constraints (using the 'and' keyword).
- Added support for the 'using' keyword (allows namespace mappings within a single select).
- Fixed some bugs.
February 14, 2004 (first public release, 0.47 mod 1):
- Ported RDFDB to win32.
- Made loading files work (at least RDFDB tries to load them, and it looks like something sensible is put in the database).
To contact me, please mail to: th.v.d.gronde@hccnet.nl
I hope you'll enjoy my program(s).