rdfDB Internals

This page describes some of the rdfDB internals. At the moment, it is highly incomplete, so please bear with me.

On Disk Structure

Every rdfDB database has three Sleepycat (Berkeley DB) B-Tree databases (arcindex.db, findex.db, rindex.db) associated with it.

A triple with arc A1, source S1 and target T1 is recorded on disk as three key-data pairs:

  1. Into findex.db : key = S1 + "|" + A1
    Value = Type + "\t" + Source + "\n" + T1.
    Type = 1 if T1 is a resource, 2 if it is an int, or 3 if it is a string.
    Source is an integer representing the source URL of the triple (non-zero if the triple is being loaded via a "load file from ..." command). It is not the same thing as the triple's source node S1.

  2. Into rindex.db : key = T1 + "|" + A1
    Value = Type + "\t" + Source + "\n" + S1.

  3. Into arcindex.db : key = A1
    Value = Type + "\t" + Source + "\n" + S1.
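
The three records above can be sketched in Python as plain string construction. This is only an illustration of the layout described on this page, not rdfDB's actual code: the function names (encode_value, decode_value, index_entries) are hypothetical, and it assumes that Type and Source are written as decimal text and that Source is 0 when the triple was not loaded from a file. No Berkeley DB calls are made; the result is just the key-data pair destined for each index file.

```python
# Type codes as described above.
TYPE_RESOURCE = 1
TYPE_INT = 2
TYPE_STRING = 3


def encode_value(type_code, source_id, node):
    """Build Type + "\t" + Source + "\n" + node (decimal text is an assumption)."""
    return f"{type_code}\t{source_id}\n{node}"


def decode_value(value):
    """Split a stored value back into (type_code, source_id, node)."""
    header, node = value.split("\n", 1)
    type_code, source_id = header.split("\t", 1)
    return int(type_code), int(source_id), node


def index_entries(source, arc, target, type_code, source_id=0):
    """Return the key-data pair for each of the three index files.

    source_id=0 for triples not loaded from a file is an assumption.
    """
    return {
        "findex.db": (f"{source}|{arc}", encode_value(type_code, source_id, target)),
        "rindex.db": (f"{target}|{arc}", encode_value(type_code, source_id, source)),
        "arcindex.db": (arc, encode_value(type_code, source_id, source)),
    }


entries = index_entries("S1", "A1", "T1", TYPE_RESOURCE)
for db_name, (key, value) in entries.items():
    print(db_name, repr(key), repr(value))
```

Read this way, findex.db is keyed by source and arc and yields the target, rindex.db is keyed by target and arc and yields the source, and arcindex.db is keyed by the arc alone and yields the source.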