MDDB Discussion

Below is a transcript of the discussion that took place on Friday evening (March 20, 2004). Unnecessary data (log on, log off, etc.) has been removed from the transcript, and it has been divided into a few sections for easier reading.

Start of the discussion/File identification

JaspervdG good evening
nerochiaro so, everything is fine ?
JaspervdG I was just going through the MDDB thread at HA to see if there is anything we might discuss now
nerochiaro right, find anything we were missing ?
nerochiaro by the way, i don't think many people from HA will show up, seeing that no one responded to the announcement
JaspervdG well, most of thread is about the unique identification of files, but I don't think that's such a big issue for now
JaspervdG and you are probably right about not too many people showing up here
nerochiaro wait a sec, incoming phone call
nerochiaro ah, ok. i'm back.
nerochiaro yes, file identification is not such a big issue right now.

XPath-like query syntax

nerochiaro i was reading your last email, instead
nerochiaro tell me a little bit how you plan to go on with that xpath like syntax
nerochiaro i think it can be very useful, if done right
JaspervdG if possible I'd like to be able to point to specific nodes
JaspervdG but I'm still thinking of the best syntax,
JaspervdG one of the problems is that with RDF relations aren't simply child, parent, sibling, etc. but can instead be anything
JaspervdG which includes backlinks (which seem logical to do with something like parent)
nerochiaro yes, backlinks are similar to parent, but that similarity breaks when you have multiple parents
nerochiaro at least i think
nerochiaro any ideas on how to handle that ?
JaspervdG it could simply return more than one parent, just like child can "return" more than one child
JaspervdG I also saw that XPath 2 has some nice features we could use, especially the following syntax:
JaspervdG something/(something_else|another thing)/something
JaspervdG which would match both something/something_else/something and something/another thing/something
nerochiaro like in regular expressions
JaspervdG yes, if you could do something like <dc:creator>/(.|<rdf:li>)/<dc:title>, that would be great
nerochiaro seems useful indeed
nerochiaro you are planning to write some kind of parser for xpath expressions inside MDDBlib, then, or you think to use something already made
JaspervdG I think I'll have to write something new, as the syntax will probably be slightly different (to deal with more than one parent for example)
JaspervdG I'll keep it simple for the time being though
nerochiaro ok, in this initial phase i think we should focus on something basic
JaspervdG I want to at least support backlinks and some way of specifying specific nodes (using a URI somewhere probably)
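
The path idea discussed above can be illustrated with a minimal Python sketch over an in-memory set of triples. Everything in it is an assumption made for illustration: the sample data, the function names, and in particular the "^" prefix used to mark a backlink step are not part of MDDBlib or of any syntax agreed on in the discussion. A path is given as a list of steps, and each step is a tuple of alternatives, so <dc:creator>/(.|<rdf:li>)/<dc:title> becomes three steps.

```python
# Hypothetical sketch: names, data and the "^" backlink marker are assumptions.
TRIPLES = {
    ("track1", "dc:creator", "artist1"),
    ("artist1", "dc:title", "Example Artist A"),
    ("track2", "dc:creator", "bag1"),
    ("bag1", "rdf:li", "artist2"),
    ("artist2", "dc:title", "Example Artist B"),
}


def step(nodes, prop, triples):
    """Follow one path step from every node in `nodes`.

    "."      stays on the current nodes,
    "^prop"  follows the property backwards (a backlink),
    "prop"   follows the property forwards.
    """
    if prop == ".":
        return set(nodes)
    if prop.startswith("^"):
        return {s for (s, p, o) in triples if p == prop[1:] and o in nodes}
    return {o for (s, p, o) in triples if p == prop and s in nodes}


def evaluate(start, path, triples=TRIPLES):
    """Evaluate a path; each step is a tuple of alternatives that are unioned."""
    nodes = set(start)
    for alternatives in path:
        nodes = set().union(*(step(nodes, alt, triples) for alt in alternatives))
    return nodes


# <dc:creator>/(.|<rdf:li>)/<dc:title>
CREATOR_TITLES = [("dc:creator",), (".", "rdf:li"), ("dc:title",)]
```

Because every step works on a set of nodes, a backlink step that reaches several "parents" needs no special handling: it simply returns more than one node, just as a forward step can return more than one child.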

Metadata enumeration

JaspervdG while on the subject of paths, what about something like this
JaspervdG FindFirstProperty/FindNextProperty for enumerating properties in a certain node
JaspervdG disadvantage would be that some kind of internal state needs to be kept, advantage is that it has similar properties to callback based systems without the added complexity (on the part of the client)
nerochiaro wouldn't just GetMetadata for a node return a list of available properties for that node ?
JaspervdG that's also a (good) possibility, the only trouble with that is that GetMetadata can perhaps do too much
nerochiaro you mean, return too much data ?
nerochiaro it might in fact be a problem
JaspervdG for too many nodes at least
JaspervdG if you ask (the new) GetMetadata for something like <dc:creator>/<dc:title> it could return a lot of values, possibly spread among different nodes
nerochiaro returning to your first/next approach, we can implement it in such a way that the "internal state" is in fact maintained from the outside, by forcing the caller to allocate a structure that then is used by the lib
nerochiaro and passed in on each GetNext call
JaspervdG that could be done (or the lib could allocate it and let the caller pass it on, wouldn't make much difference)
nerochiaro yes, and i don't see that as a problem.
JaspervdG should I perhaps also create a first/next variant of ML_GetMetadata and RDFDB_Query?
JaspervdG I was already planning on creating a callback version of GetMetadata.
nerochiaro well, the first/next way is certainly better than callbacks, for a non-C(++) caller
JaspervdG that's true, I think I'll have a go at it
nerochiaro that would be nice to have
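
A first/next enumeration where the state lives with the caller might look roughly like the following Python sketch. The names (PropertyCursor, find_first_property, find_next_property) and the sample node store are hypothetical; the real FindFirstProperty/FindNextProperty functions had not been written at the time of the discussion, and their signatures are not known from the transcript.

```python
from dataclasses import dataclass

# Hypothetical node store: node -> list of (property, value) pairs.
NODES = {
    "track1": [("dc:title", "Example Track"), ("dc:creator", "artist1")],
}


@dataclass
class PropertyCursor:
    """Enumeration state owned by the caller, not kept inside the library."""
    node: str
    index: int = 0


def find_next_property(cursor):
    """Return the next (property, value) pair, or None when exhausted."""
    props = NODES.get(cursor.node, [])
    if cursor.index >= len(props):
        return None
    item = props[cursor.index]
    cursor.index += 1
    return item


def find_first_property(node):
    """Start an enumeration; returns (cursor, first pair or None)."""
    cursor = PropertyCursor(node)
    return cursor, find_next_property(cursor)


def list_properties(node):
    """Typical client loop: first, then next until None comes back."""
    cursor, item = find_first_property(node)
    found = []
    while item is not None:
        found.append(item)
        item = find_next_property(cursor)
    return found
```

Since the cursor is passed in on every call, the library itself keeps no per-enumeration state, and a caller can run several enumerations at once by holding several cursors.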

Visualization of metadata structure (of an RDF graph)

JaspervdG also, do you have any idea on how to handle backlinks graphically (in the frontend for example)
nerochiaro well, i don't think we want to use graph visualization widgets. that would probably be too complicated (even if cool)
JaspervdG no, I too think that wouldn't be a good idea :)
nerochiaro i think that we can have something like two panels, one with the list of "parents" (in the xpath-ish sense we talked about before) and one with the list of "childs". and then a panel where the current node is fleshed out in detail
nerochiaro so one can use the two parent-child panels to navigate the hierarchy, and the current-node panel to view/edit the current node
nerochiaro (of course this is the raw idea, it needs to be perfected a little)
JaspervdG would the parent panel only show backlink relations (in a tree view) or also child relations of backlinks?
JaspervdG because that was one of the problems I couldn't get around
JaspervdG at least without writing my own tree class that can have nodes on both sides
nerochiaro mmh, showing that backwards tree can be nasty
nerochiaro i think we should visualize things one "step at a time"
nerochiaro list of parents->pick a parent (becomes the current node) ->list of childs
nerochiaro the latter being the original parent's childs
JaspervdG that might work, if it's fast enough and doesn't look too confusing
JaspervdG it would certainly be something to try
nerochiaro yes, i'm not sure if i will have time in the weekend, but i might be able to give it a go and experiment with it a bit
nerochiaro to see if it doesn't look too messy
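
The three panels described above need three lookups: the properties of the current node, the nodes pointing at it, and the nodes it points to. A minimal Python sketch of those lookups follows. The function names, the Literal wrapper, the ex:hasTrack property and the sample triples are all assumptions for illustration; the transcript does not define how MDDBlib exposes this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an object as a literal value rather than a node (assumption)."""
    value: str


# Hypothetical sample graph as (subject, property, object) triples.
TRIPLES = [
    ("album1", "ex:hasTrack", "track1"),
    ("track1", "dc:title", Literal("Example Track")),
    ("track1", "dc:creator", "artist1"),
    ("artist1", "dc:title", Literal("Example Artist")),
]


def properties(node):
    """Literal-valued properties of the current node (detail panel)."""
    return [(p, o.value) for (s, p, o) in TRIPLES
            if s == node and isinstance(o, Literal)]


def children(node):
    """Nodes the current node points to (child panel)."""
    return [(p, o) for (s, p, o) in TRIPLES
            if s == node and not isinstance(o, Literal)]


def parents(node):
    """Nodes that point to the current node, i.e. its backlinks (parent panel)."""
    return [(p, s) for (s, p, o) in TRIPLES if o == node]
```

Navigating "one step at a time" then means picking an entry from parents(current) or children(current), making it the current node, and calling the three functions again.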

MDDBLib interface

nerochiaro can we try to recap briefly all the API we have "defined" so far for MDDBlib
nerochiaro ?
JaspervdG did you succeed in interfacing with MDDBLib btw?
JaspervdG and if you didn't, why?
JaspervdG sure
JaspervdG I think the RDFDB connection functions are relatively complete:
JaspervdG   RDFDB_Connect
JaspervdG   RDFDB_Disconnect
JaspervdG   RDFDB_Query
JaspervdG   RDFDB_QueryCallback
JaspervdG   RDFDB_QueryFirst/Next (would still have to be written, and could perhaps replace one or more of the above)
nerochiaro i have been doing some tries with JNI, and things seem to work against a simple C library i built for the test (was the first time i used JNI, never needed that before). i will try MDDBlib later tonight, though
JaspervdG sounds good (the JNI bit)
nerochiaro ok, as far as RDFDB interface, it seems fine

Simplified interface/Mappings

nerochiaro now, for the "simplified" interface to navigate the graph, we have already laid out some function right ?
JaspervdG the MDDB "file management" functions (still somewhat shaky)
JaspervdG   ML_IdentifyFile, should try to do as little as possible, returns a list of possible identifications?
JaspervdG   ML_AddFileToDB, should try to gather information about the file and assign a (given?) LUID to it?
JaspervdG   Should we have something in between those two that tries to gather information and then let ML_AddFileToDB just handle binding the LUID to the file?
JaspervdG which simplified interface? the "really" simplified interface (using mappings)?
JaspervdG if so, I don't think we have defined any functions for that yet
JaspervdG any ideas?
nerochiaro well, if we use "pseudo xpaths" for identifying nodes, we can have something as simple as AddMapping(xpath, friendly_path) and Translate(friendly_path) (and of course the Edit/Delete functions) ... friendly_mapping is something like Author.Name or Track.Title
JaspervdG as long as they don't allow "updating" the current that would probably work
JaspervdG the only problem would be how to handle containers in that case, as the simplified interface could easily be mixed with more sophisticated interfaces
nerochiaro well, since they can contain objects of different types, that is indeed a problem
nerochiaro if they can only contain one kind of object that would be easier
JaspervdG the function handling the mappings should also be able to more or less automatically create properties that point to URIs rather than literals if applicable (artists for example)
JaspervdG which would be far from trivial
nerochiaro wait, if a property contains a literal, that would stay that way. properties are user-defined, and if one chooses to define a prop as a literal, shouldn't we leave it that way (unless asked otherwise) ?
JaspervdG surely, but a simple client probably won't know about URIs, while a user using MDDB probably does want to use them (especially if that user also has other software that can handle them)
JaspervdG it would be a major pain if one application would set an artist property to a literal while a different application would set it to a URI
JaspervdG one way of dealing with this would be to restrict the possible value types for properties by using RDF Schema, OWL or some kind of custom schema-like system
JaspervdG but I doubt that would be the preferred way of dealing with it
nerochiaro in my opinion, if we give users the possibility to create custom classes of objects, then we should let the users create properties that refer to these objects
nerochiaro it's not necessary that they "know" about URIs...
nerochiaro ...we can simply create a function that, given a node and a property on another node creates the association
nerochiaro that is, putting the URI in the prop instead of a literal
nerochiaro of course it's the user's responsibility to call that function instead of inserting the literal
nerochiaro (not sure if that explanation was clear)
JaspervdG but the whole point of a simplified interface would be that the application shouldn't care about such things
JaspervdG it could perhaps be a setting somewhere ("Automatically try to match literals to URIs")
JaspervdG perhaps we should wait with creating the simplified interface until we have more experience in using the system, perhaps that it will also be clearer how to implement it then
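
The AddMapping/Translate idea from this section can be pictured as a small lookup table from friendly paths to pseudo-XPaths. The Python sketch below is only an illustration: the class and method names are hypothetical, the mapping of Track.Title to <dc:title> is an assumed example, and the open questions raised above (containers, literals versus URIs) are deliberately left out.

```python
class MappingTable:
    """Hypothetical table from friendly paths (e.g. Track.Title) to pseudo-XPaths."""

    def __init__(self):
        self._paths = {}

    def add_mapping(self, path, friendly_path):
        """Register a new friendly path; refuse to silently overwrite one."""
        if friendly_path in self._paths:
            raise ValueError(f"mapping already exists: {friendly_path}")
        self._paths[friendly_path] = path

    def edit_mapping(self, friendly_path, path):
        """Point an existing friendly path at a different pseudo-XPath."""
        if friendly_path not in self._paths:
            raise KeyError(friendly_path)
        self._paths[friendly_path] = path

    def delete_mapping(self, friendly_path):
        """Remove a friendly path; raises KeyError if it is unknown."""
        del self._paths[friendly_path]

    def translate(self, friendly_path):
        """Return the pseudo-XPath for a friendly path; KeyError if unknown."""
        return self._paths[friendly_path]
```

A simple client would call translate("Track.Title") and hand the result to whatever function evaluates paths, without needing to know the underlying property names.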

New interface based on schemas(?)

nerochiaro think about this: to do the matching automatically, you have to know that for example Track.Artist refers to Artist.Title (if it's a literal)
nerochiaro "refers" means "we should use those two fields to attempt a matching"
nerochiaro but...
JaspervdG I'm not sure I follow that completely, why would it matter whether it's a literal or not?
nerochiaro the sentence was not complete ;) wait a sec
nerochiaro ...but if we know that, it means that the metadata can only have one fixed structure - that it has only a predefined set of possible objects. and if it has only a set of possible objects then we can build our UI in a way that forces the URI in the props in the first place
nerochiaro ok. that is it. sorry for the slowness in writing :)
JaspervdG do you mean by using some kind of system to specify what the legal values are for certain properties (or at least ought to be)
JaspervdG ?
nerochiaro mmh, no. i'll try to explain in a different way
nerochiaro let's suppose that MDDBlib is used only by one client
nerochiaro and that the client can only create three types of objects (like the "basic" client i mentioned in one email)
nerochiaro Artist/Track/Album
nerochiaro that client would, of course, do the right thing, and insert URIs where needed.
nerochiaro in that case we would not need automatic matching, right ?
nerochiaro because there would be no need to transform literals into URIs
nerochiaro now, if instead we have a situation where all kind of objects can exist. how can your idea of automatic matching work, if it does not know in advance what kind of object it will operate on ?
nerochiaro see what i mean ? (i can't seem to be able to explain this :()
JaspervdG I see (I think), I'm just thinking of a reply
nerochiaro ok, sorry (that's one of the problem of IRC chats)
JaspervdG mappings could be extended to specify what value to insert, something like this:
JaspervdG   Put(Track.Artist,'Madonna') ->Put(<dc:creator>,//<mm:Artist>[<dc:title>='Madonna'])
JaspervdG alternatively the "vocabulary" of the client could also be restricted beforehand
nerochiaro both options are worth considering, i think
JaspervdG (the //<mm:Artist> bit is not completely conforming to the XPath syntax we talked about earlier, but it is supposed to mean all nodes of type artist whose title is 'Madonna')
JaspervdG the RDFDB query would be:
JaspervdG   select ?artist where (?artist <rdf:type> <mm:Artist>) (?artist <dc:title> 'Madonna')
nerochiaro regarding the // thing. what happens if there is more than one "Madonna" ?
JaspervdG that would be a problem :)
JaspervdG one possible way of dealing with this is to report this in some way to the client application and let it ask the user
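
The lookup behind Put(Track.Artist,'Madonna') and the "more than one Madonna" case can be sketched in Python as below. This is a stand-in that filters an in-memory set of triples rather than issuing an RDFDB query, and the function names, the AmbiguousMatch exception and the sample data are all hypothetical. It shows one way of reporting the ambiguity to the client instead of guessing.

```python
# Hypothetical sample data: two artists share the same title.
TRIPLES = {
    ("artist1", "rdf:type", "mm:Artist"),
    ("artist1", "dc:title", "Madonna"),
    ("artist2", "rdf:type", "mm:Artist"),
    ("artist2", "dc:title", "Madonna"),
    ("artist3", "rdf:type", "mm:Artist"),
    ("artist3", "dc:title", "Example Artist"),
}


class AmbiguousMatch(Exception):
    """Raised when a literal matches several nodes; carries the candidates."""

    def __init__(self, candidates):
        super().__init__(f"{len(candidates)} candidates")
        self.candidates = candidates


def find_by_type_and_title(rdf_type, title, triples=TRIPLES):
    """All nodes of the given type whose dc:title equals `title`, sorted."""
    typed = {s for (s, p, o) in triples if p == "rdf:type" and o == rdf_type}
    titled = {s for (s, p, o) in triples if p == "dc:title" and o == title}
    return sorted(typed & titled)


def resolve(rdf_type, title):
    """Return the single matching node, None if there is none,
    or raise AmbiguousMatch so the client can ask the user."""
    candidates = find_by_type_and_title(rdf_type, title)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    raise AmbiguousMatch(candidates)
```

A client application could catch AmbiguousMatch and present the candidates in a selection dialog; what to do when nothing matches (store the literal, or create a new node) is left open here, as it is in the discussion.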
nerochiaro that's a valid alternative. somehow i still think that it would be better to tackle this whole problem from an UI perspective. like having an UI that...
nerochiaro ...when editing certain props would prompt a dialog that allows to select among all already defined Artists (or to create new ones)
JaspervdG that would be totally cool! it would probably require some helper functions to provide the client application with possible properties, etc.
nerochiaro yes, but IMHO is the way to go
nerochiaro it's a way to restrict a bit the user freedom, but at the same time to increase a lot the coherence of the DB
JaspervdG perhaps using RDF Schema/OWL isn't such overkill after all, some basic support would help alleviate these kinds of problems
nerochiaro well, i have an idea about this "basic support"
JaspervdG let's hear it
nerochiaro a sec, it's long
nerochiaro (it does not really involve OWL, but still) when creating props, one can choose to create them with the usual "literal" types, or to create them with an "association" type. i'll explain that: when creating these props, the user must also define a class of objects allowed in there (e.g. Artists). Then these props, when edited, will ONLY allow them to pick valid Artists (with the method outlined above) and insert the correct URI in the prop
JaspervdG seems like a reasonable idea
nerochiaro so you say we go down this route ?
JaspervdG fine by me
nerochiaro good.
JaspervdG how do you propose to store these definitions? inside MDDB, in a separate file?
JaspervdG and in what format? it seems rdfs:range would take care of restricting properties to certain types.
nerochiaro mmmh, maybe we should let them be an application specific thing.
JaspervdG why? some functions could relatively easily be made around this system that would allow any application to work with it
nerochiaro i said that because i figured out that it would be complicated to add to the library. but if you have ideas that would make it relatively simple, then i'm all for it
nerochiaro why did you say it would be relatively easy ?
JaspervdG the good thing about something like this is that it is extremely easy to do on top of RDFDB, suppose you want to specify that the property <dc:creator> can only have <mm:Artist>s as value, all you need to do is insert the following triple:
JaspervdG   insert (<dc:creator> <rdfs:range> <mm:Artist>)
JaspervdG then if you want to later retrieve all possible values for <dc:creator> you can issue the following query:
JaspervdG   select ?x where (<dc:creator> <rdfs:range> ?y) (?x <rdf:type> ?y)
nerochiaro cool.
nerochiaro i did not think it would be that easy
nerochiaro of course that query is one that is absolutely well suited for the first/next treatment, as it will return a lot of values
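
The effect of the rdfs:range triple and the select query above can be mimicked in Python with a two-step join over an in-memory set of triples. This is only a sketch of what the query computes, not of how RDFDB evaluates it; the function name and the sample data are assumptions.

```python
# Hypothetical sample data, including the range declaration from the example.
TRIPLES = {
    ("dc:creator", "rdfs:range", "mm:Artist"),
    ("artist1", "rdf:type", "mm:Artist"),
    ("artist2", "rdf:type", "mm:Artist"),
    ("track1", "rdf:type", "mm:Track"),
}


def allowed_values(prop, triples=TRIPLES):
    """Mirror of: select ?x where (prop <rdfs:range> ?y) (?x <rdf:type> ?y)."""
    # First pattern: collect every class ?y declared as a range of the property.
    ranges = {o for (s, p, o) in triples if s == prop and p == "rdfs:range"}
    # Second pattern: collect every node ?x whose type is one of those classes.
    return sorted(s for (s, p, o) in triples if p == "rdf:type" and o in ranges)
```

Note that if a property had two rdfs:range triples, this join would return nodes of either class, whereas RDF Schema as written treats multiple ranges as a requirement to belong to all of them.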
JaspervdG I still see the following problems though:
JaspervdG   - specifying that a property should have a bag as value that has specific types as li's would be slightly more complicated (I think it would need an extra type for the bag)
JaspervdG   - if you interpret the rdfs:range and rdfs:domain properties as they are defined in the standard they are a major pain, as the following would mean that <dc:creator> values should be BOTH an mm:Artist and an ex:Painter:
JaspervdG     <dc:creator> <rdfs:range> <mm:Artist>
JaspervdG     <dc:creator> <rdfs:range> <ex:Painter>
JaspervdG   - it's rather difficult to specify that the value of a <dc:creator> inside an <mm:Artist> should have a different type than the value of a <dc:creator> in an <mm:Track>, but I don't think this last one is really important
nerochiaro a sec. i'm trying to digest what you just wrote
nerochiaro ...
nerochiaro 1) i would add something to the bag definition that specifies the type of the bag (the problem would be how to query for it)
nerochiaro 2) i am not very conversant in RDF schema, but i don't think we should follow it all, just the subset we need for our goals
nerochiaro 3) again this is not a subject where i'm really prepared. but if rdf schema doesn't allow us to do this, we can create something easier that is just for use in MDDB
nerochiaro .
JaspervdG ...
JaspervdG 1) a new type of Bag could be made for such purposes (or the bag could have more than one type), the only problem being that it would not be compatible with Musicbrainz for example.
JaspervdG 2) I agree, we could always change to something more conformant later on
JaspervdG 3) I'm not sure whether it would even be needed, also OWL is able to specify something like this, so if it does become necessary we could always use that.
JaspervdG .
nerochiaro ok, i'll answer
nerochiaro oh, compatibility. that is an interesting topic we have not touched yet. with compatibility here you mean compatibility in _importing_ metadata from musicbrainz, right ? or you are talking about complete compatibility of our data format with theirs ?
nerochiaro .
nerochiaro (the other 2 points were ok)
JaspervdG at least exporting (data we generate would not be compatible with their generated data)
JaspervdG importing is more difficult, it would depend on how strict the various MDDB operations are
JaspervdG more difficult to say
JaspervdG .
nerochiaro ...
nerochiaro about exporting. mbrainz have a single specific format for their data. we seem to be aiming at having both custom objects and custom properties. a consequence of this is that one mad user can wipe away the properties used by musicbrainz, and then exporting would not be possible anymore.
nerochiaro unless we enforce a specific set of props that must always exist.
nerochiaro .
nerochiaro like mm:Artist.dc:title
JaspervdG I agree that it probably wouldn't be possible to (always) export "plain" MDDB data to Musicbrainz, so I guess it won't matter much if it needs another transformation
JaspervdG also, such a function could be made as a feature of a specific application, as not every application will need it
JaspervdG .
nerochiaro you hit the nail on the head here. exporting/importing seems to me something that should be kept application specific. at most we can write imp/exp "plugins", but nothing of that should go into MDDBlib, imho
nerochiaro Also, when writing the previous sentences about exporting, another problem popped up in my head. if we allow custom properties, the users will want to create them with friendly_names. but we should use the same friendly names in the RDF, maybe attaching to them a namespace like
nerochiaro .
nerochiaro the last was a question, forgot the "?"
nerochiaro  ;)
JaspervdG that would certainly be a good idea, although I don't know if creating such a "real" looking URI for that purpose would be right, I'd rather use a blank node or something terribly obvious (like local:someprop)
nerochiaro yeah, it was just an example, of course
nerochiaro good. so, are there any other topics to discuss you can think about ?
JaspervdG do you have any requests as to what I should implement first?
nerochiaro well, i would like to start creating the test UI we talked about before to navigate the metadata
nerochiaro so i will need a way to retrieve: props of current node, list of parent nodes, list of child nodes
JaspervdG then I'll try to implement those first
JaspervdG btw, did you succeed in loading test data into RDFDB?
nerochiaro yes. it worked. as i said in email i was doing stupid things myself, and RDFDB was working good.
nerochiaro i have imported data from musicbrainz without problems
nerochiaro again, sorry if i made you waste time looking for a nonexistent problem
JaspervdG I didn't have to look too hard, as I had spent quite some time in that area (I knew where to look)
JaspervdG alright, then let's call it a day (night/evening) and I'll mail you when I have implemented something