Future Plans for rdfDB
Here are some of the future plans for rdfDB.
- In-Memory Database (IMDB) : The goal is to support both on-disk and in-memory databases.
The in-memory database is non-volatile: when it starts up, all the files that were loaded
into the database are read in again, along with all the inserts and deletes (which are
recorded in a log). In-memory databases provide extremely high performance. Their drawbacks
are start-up time (the time it takes to read in all the contents) and memory footprint.
Still, if the database has fewer than a million tuples, an IMDB can be a very good fit.
- Thread Pool : rdfDB currently spawns a new thread for every connection. This can be
expensive for short-lived connections. Since one of the goals of rdfDB is to make database
connections as cheap as HTTP connections, it would make more sense to use a thread pool.
- On-disk storage : rdfDB currently uses Berkeley DB from Sleepycat. There are two problems
with this.
- Berkeley DB is GPLed, which makes rdfDB GPLed as well. That is not good if we want
companies to use rdfDB.
- The ideal indexing structure for triples is a cross between b-trees and hash tables.
Hash tables, with their constant-time lookups, are desirable, but we need a b-tree-like
structure for queries where the arc label is a variable (i.e., to determine which
arcs leave an object).
The solution to both problems is to implement our own on-disk storage, built on an
indexing structure that is a cross between b-trees and hash tables. More details soon.
- Support for RDF Schemas
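
To make the in-memory database item above more concrete, here is a minimal Python sketch of the start-up step: re-read the loaded files, then replay the log of inserts and deletes in order. The function name, the triple layout and the log format are all assumptions made for illustration; none of them come from rdfDB itself.

```python
from typing import Iterable, Set, Tuple

# Assumed layout for illustration: (source, arc label, target).
Triple = Tuple[str, str, str]


def rebuild(loaded: Iterable[Triple],
            log: Iterable[Tuple[str, Triple]]) -> Set[Triple]:
    """Rebuild the in-memory triple set at start-up (hypothetical helper)."""
    # Step 1: start from the contents of every file that was loaded.
    store: Set[Triple] = set(loaded)
    # Step 2: replay the logged inserts and deletes in their original order.
    for op, triple in log:
        if op == "insert":
            store.add(triple)
        elif op == "delete":
            store.discard(triple)
        else:
            raise ValueError(f"unknown log operation: {op!r}")
    return store


# Stand-ins for parsed files and for the on-disk log.
loaded_triples = [("doc1", "author", "alice"), ("doc1", "title", "Plans")]
log_entries = [
    ("insert", ("doc2", "author", "bob")),
    ("delete", ("doc1", "title", "Plans")),
]
store = rebuild(loaded_triples, log_entries)
```

The start-up cost mentioned above shows up directly here: both loops are linear in the number of loaded tuples and log entries.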
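
The thread pool item above can be illustrated with a small Python sketch that uses the standard library's ThreadPoolExecutor. A fixed set of worker threads serves many short-lived connections instead of one new thread per connection. The handler, the pool size and the thread name prefix are hypothetical, and the connections are simulated with plain integers rather than sockets.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Arbitrary illustrative value, not a recommendation.
POOL_SIZE = 4


def handle_connection(conn_id: int) -> str:
    # Stand-in for reading one query from a client and answering it.
    return f"conn {conn_id} served by {threading.current_thread().name}"


# Twenty simulated connections are served by at most POOL_SIZE threads.
with ThreadPoolExecutor(max_workers=POOL_SIZE,
                        thread_name_prefix="rdfdb-worker") as pool:
    results = list(pool.map(handle_connection, range(20)))

# Collect the distinct worker threads that actually did the work.
worker_names = {line.split(" served by ")[1] for line in results}
```

Threads are created once and reused, so the per-connection cost is a queue hand-off rather than a thread start.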
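
The indexing requirement in the on-disk storage item above (hash-style lookups, plus an ordered structure for queries where the arc label is a variable) can be illustrated in memory as follows. This is only one possible reading and is not the proposed rdfDB design, whose details are not given here. The class and method names are hypothetical, and a sorted Python list stands in for a b-tree.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class TripleIndex:
    """Hypothetical hybrid index: hash on source, sorted (arc, target) pairs."""

    def __init__(self) -> None:
        # Hash part: source -> sorted list of (arc, target) pairs.
        self._by_source: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def insert(self, source: str, arc: str, target: str) -> None:
        entries = self._by_source[source]
        pair = (arc, target)
        pos = bisect.bisect_left(entries, pair)
        # Keep the list sorted and free of duplicates.
        if pos == len(entries) or entries[pos] != pair:
            entries.insert(pos, pair)

    def targets(self, source: str, arc: str) -> List[str]:
        """Arc label known: hash lookup, then binary search to the arc."""
        entries = self._by_source.get(source, [])
        start = bisect.bisect_left(entries, (arc, ""))
        found = []
        for label, target in entries[start:]:
            if label != arc:
                break
            found.append(target)
        return found

    def arcs_leaving(self, source: str) -> List[str]:
        """Arc label is a variable: ordered scan of one source's entries."""
        arcs: List[str] = []
        for label, _ in self._by_source.get(source, []):
            if not arcs or arcs[-1] != label:
                arcs.append(label)
        return arcs


idx = TripleIndex()
idx.insert("doc1", "title", "Plans")
idx.insert("doc1", "author", "bob")
idx.insert("doc1", "author", "alice")
idx.insert("doc2", "author", "carol")
```

With this data, idx.targets("doc1", "author") returns the two authors in sorted order, and idx.arcs_leaving("doc1") lists the "author" and "title" arcs without scanning any other source.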