Below is a transcript of the discussion that took place on Friday evening (March 20, 2004). Unnecessary data has been removed from the transcript (log on, log off, etc.) and it has been divided into a few sections for easier reading.
nerochiaro |
i was reading your last email, instead |
nerochiaro |
tell me a little bit how you plan to go on with that xpath like syntax |
nerochiaro |
i think it can be very useful, if done right |
JaspervdG |
if possible I'd like to be able to point to specific nodes |
JaspervdG |
but I'm still thinking of the best syntax, |
JaspervdG |
one of the problems is that with RDF relations aren't simply child, parent, sibling, etc. but can instead be anything |
JaspervdG |
which includes backlinks (which seem logical to do with something like parent) |
nerochiaro |
yes, backlinks are similar to parent, but that similarity breaks when you have multiple parents |
nerochiaro |
at least i think |
nerochiaro |
any ideas on how to handle that ? |
JaspervdG |
it could simply return more than one parent, just like child can "return" more than one child |
JaspervdG |
I also saw that XPath 2 has some nice features we could use, especially the following syntax: |
JaspervdG |
something/(something_else|another thing)/something |
JaspervdG |
which would match both something/something_else/something and something/another thing/something |
nerochiaro |
like in regular expressions |
JaspervdG |
yes, if you could do something like <dc:creator>/(.|<rdf:li>)/<dc:title>, that would be great |
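A minimal sketch of how a path with alternation, such as <dc:creator>/(.|<rdf:li>)/<dc:title>, might be evaluated over a set of triples. It assumes the path has already been parsed into a list of steps, that each step is a list of alternative predicates, and that "." means "stay on the current node". The function names and the sample data are hypothetical and not part of MDDBlib or RDFDB.

```python
# Triples are (subject, predicate, object) tuples; all names are made up.
TRIPLES = {
    ("track1", "dc:creator", "artist1"),
    ("artist1", "dc:title", "Madonna"),
    ("track2", "dc:creator", "bag1"),
    ("bag1", "rdf:li", "artist2"),
    ("artist2", "dc:title", "Prince"),
}

# <dc:creator>/(.|<rdf:li>)/<dc:title>, already split into steps.
PATH = [["dc:creator"], [".", "rdf:li"], ["dc:title"]]


def step(nodes, alternatives, triples):
    """Follow one path step from every node in `nodes`."""
    result = set()
    for alt in alternatives:
        if alt == ".":
            # "." keeps the current nodes, so a creator that is not
            # wrapped in a bag still matches.
            result |= nodes
        else:
            result |= {o for (s, p, o) in triples if s in nodes and p == alt}
    return result


def evaluate(start, path, triples=TRIPLES):
    """Return every node or literal reachable from `start` along `path`."""
    nodes = {start}
    for alternatives in path:
        nodes = step(nodes, alternatives, triples)
    return nodes
```

With this sample data the same path reaches the title both when the creator is attached directly (track1) and when it sits inside a bag (track2).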
nerochiaro |
seems useful indeed |
nerochiaro |
you are planning to write some kind of parser for xpath expressions inside MDDBlib, then, or do you plan to use something already made ? |
JaspervdG |
I think I'll have to write something new, as the syntax will probably be slightly different (to deal with more than one parent for example) |
JaspervdG |
I'll keep it simple for the time being though |
nerochiaro |
ok, in this initial phase i think we should focus on something basic |
JaspervdG |
I want to at least support backlinks and some way of specifying specific nodes (using a URI somewhere probably) |
nerochiaro |
now, for the "simplified" interface to navigate the graph, we have already laid out some functions, right ? |
JaspervdG |
the MDDB "file management" functions (still somewhat shaky) |
JaspervdG |
ML_IdentifyFile, should try to do as little as possible, returns a list of possible identifications? |
JaspervdG |
ML_AddFileToDB, should try to gather information about the file and assign a (given?) LUID to it? |
JaspervdG |
Should we have something in between those two that tries to gather information and then let ML_AddFileToDB just handle binding the LUID to the file? |
JaspervdG |
which simplified interface? the "really" simplified interface (using mappings)? |
JaspervdG |
if so, I don't think we have defined any functions for that yet |
JaspervdG |
any ideas? |
nerochiaro |
well, if we use "pseudo xpaths" for identifying nodes, we can have something as simple as AddMapping(xpath, friendly_path) and Translate(friendly_path) (and of course the Edit/Delete functions) ... friendly_path is something like Author.Name or Track.Title |
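A minimal sketch of what such a mapping table could look like, assuming a plain in-memory dictionary keyed by the friendly path. The class name, the method names and the example paths are hypothetical; the chat only names AddMapping and Translate and does not define their behaviour.

```python
class MappingTable:
    """Maps friendly paths (e.g. 'Track.Title') to pseudo-xpath strings."""

    def __init__(self):
        self._paths = {}

    def add_mapping(self, xpath, friendly_path):
        # Same argument order as AddMapping(xpath, friendly_path) in the chat.
        self._paths[friendly_path] = xpath

    def translate(self, friendly_path):
        # Raises KeyError for an unknown friendly path.
        return self._paths[friendly_path]

    def delete_mapping(self, friendly_path):
        del self._paths[friendly_path]


mappings = MappingTable()
mappings.add_mapping("<dc:title>", "Track.Title")
mappings.add_mapping("<dc:creator>/<dc:title>", "Track.Artist")
```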
JaspervdG |
as long as they don't allow "updating" the current that would probably work |
JaspervdG |
the only problem would be how to handle containers in that case, as the simplified interface could easily be mixed with more sophisticated interfaces |
nerochiaro |
well, since they can contain objects of different types, that is indeed a problem |
nerochiaro |
if they can only contain one kind of object that would be easier |
JaspervdG |
the function handling the mappings should also be able to more or less automatically create properties that point to URIs rather than literals if applicable (artists for example) |
JaspervdG |
which would be far from trivial |
nerochiaro |
wait, if a property contains a literal, that would stay that way. properties are user-defined, and if one chooses to define a prop as a literal, shouldn't we leave it that way (unless asked otherwise) ? |
JaspervdG |
surely, but a simple client probably won't know about URIs, while a user using MDDB probably does want to use them (especially if that user also has other software that can handle them) |
JaspervdG |
it would be a major pain if one application would set an artist property to a literal while a different application would set it to a URI |
JaspervdG |
one way of dealing with this would be to restrict the possible value types for properties by using RDF Schema, OWL or some kind of custom schema-like system |
JaspervdG |
but I doubt that would be the preferred way of dealing with it |
nerochiaro |
in my opinion, if we give users the possibility to create custom classes of objects, then we should let the users create properties that refer to these objects |
nerochiaro |
it's not necessary that they "know" about URIs... |
nerochiaro |
...we can simply create a function that, given a node and a property on another node creates the association |
nerochiaro |
that is, putting the URI in the prop instead of a literal |
nerochiaro |
of course it's the user's responsibility to call that function instead of inserting the literal |
nerochiaro |
(not sure if that explanation was clear) |
JaspervdG |
but the whole point of a simplified interface would be that the application shouldn't care about such things |
JaspervdG |
it could perhaps be a setting somewhere ("Automatically try to match literals to URIs") |
JaspervdG |
perhaps we should wait with creating the simplified interface until we have more experience in using the system, perhaps it will also be clearer how to implement it then |
nerochiaro |
think about this: to do the matching automatically, you have to know that for example Track.Artist refers to Artist.Title (if it's a literal) |
nerochiaro |
"refers" means "we should use those two fields to attempt a matching" |
nerochiaro |
but... |
JaspervdG |
I'm not sure I follow that completely, why would it matter whether it's a literal or not? |
nerochiaro |
the sentence was not complete ;) wait a sec |
nerochiaro |
...but if we know that, it means that the metadata can only have one fixed structure - that it has only a predefined set of possible objects. and if it has only a set of possible objects then we can build our UI in a way that forces the URI in the props in the first place |
nerochiaro |
ok. that is it. sorry for the slowness in writing :) |
JaspervdG |
do you mean by using some kind of system to specify what the legal values are for certain properties (or at least ought to be) |
JaspervdG |
? |
nerochiaro |
mmh, no. i'll try to explain in a different way |
nerochiaro |
let's suppose that MDDBlib is used only by one client |
nerochiaro |
and that the client can only create three types of objects (like the "basic" client i mentioned in one email) |
nerochiaro |
Artist/Track/Album |
nerochiaro |
that client would, of course, do the right thing, and insert URIs where needed. |
nerochiaro |
in that case we would not need automatic matching, right ? |
nerochiaro |
because there would be no need to transform literals into URIs |
nerochiaro |
now, if instead we have a situation where all kinds of objects can exist. how can your idea of automatic matching work, if it does not know in advance what kind of object it will operate on ? |
nerochiaro |
see what i mean ? (i can't seem to be able to explain this :() |
JaspervdG |
I see (I think), I'm just thinking of a reply |
nerochiaro |
ok, sorry (that's one of the problems of IRC chats) |
JaspervdG |
mappings could be extended to specify what value to insert, something like this: |
JaspervdG |
Put(Track.Artist,'Madonna') ->Put(<dc:creator>,//<mm:Artist>[<dc:title>='Madonna']) |
JaspervdG |
alternatively the "vocabulary" of the client could also be restricted beforehand |
nerochiaro |
both options are worth considering, i think |
JaspervdG |
(the //<mm:Artist> bit is not completely conforming to the XPath syntax we talked about earlier, but it is supposed to mean all nodes of type artist whose title is 'Madonna') |
JaspervdG |
the RDFDB query would be: |
JaspervdG |
select ?artist where (?artist <rdf:type> <mm:Artist>) (?artist <dc:title> 'Madonna') |
nerochiaro |
regarding the // thing. what happens if there is more than one "Madonna" ? |
JaspervdG |
that would be a problem :) |
JaspervdG |
one possible way of dealing with this is to report this in some way to the client application and let it ask the user |
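A minimal sketch of that idea: look up nodes by type and title, in the spirit of the query above, and report an ambiguous match to the caller instead of silently picking one. The triples are plain tuples here rather than an RDFDB store, and AmbiguousMatch, find_nodes and resolve are hypothetical names used only for illustration.

```python
class AmbiguousMatch(Exception):
    """Raised when more than one node matches; carries the candidates."""

    def __init__(self, candidates):
        super().__init__(f"{len(candidates)} candidates")
        self.candidates = candidates


def find_nodes(triples, rdf_type, title):
    """All subjects that have the given rdf:type and dc:title."""
    typed = {s for (s, p, o) in triples if p == "rdf:type" and o == rdf_type}
    titled = {s for (s, p, o) in triples if p == "dc:title" and o == title}
    return sorted(typed & titled)


def resolve(triples, rdf_type, title):
    """Return the single matching node, None if there is none,
    or raise AmbiguousMatch so the client can ask the user."""
    matches = find_nodes(triples, rdf_type, title)
    if len(matches) > 1:
        raise AmbiguousMatch(matches)
    return matches[0] if matches else None


# Made-up sample data with two artists called 'Madonna'.
TRIPLES = {
    ("artist1", "rdf:type", "mm:Artist"), ("artist1", "dc:title", "Madonna"),
    ("artist2", "rdf:type", "mm:Artist"), ("artist2", "dc:title", "Madonna"),
    ("artist3", "rdf:type", "mm:Artist"), ("artist3", "dc:title", "Prince"),
}
```

A client could catch AmbiguousMatch and show the candidates list to the user.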
nerochiaro |
that's a valid alternative. somehow i still think that it would be better to tackle this whole problem from an UI perspective. like having an UI that... |
nerochiaro |
...when editing certain props would prompt a dialog that allows to select among all already defined Artists (or to create new ones) |
JaspervdG |
that would be totally cool! it would probably require some helper functions to provide the client application with possible properties, etc. |
nerochiaro |
yes, but IMHO it is the way to go |
nerochiaro |
it's a way to restrict a bit the user freedom, but at the same time to increase a lot the coherence of the DB |
JaspervdG |
perhaps using RDF Schema/OWL isn't such overkill after all, some basic support would help alleviate these kinds of problems |
nerochiaro |
well, i have an idea about this "basic support" |
JaspervdG |
let's hear it |
nerochiaro |
a sec, it's long |
nerochiaro |
(it does not really involve OWL, but still) when creating props, one can choose to create them with the usual "literal" types, or to create them with an "association" type. i'll explain that: when creating these props, the user must also define a class of objects allowed in there (e.g. Artists). Then these props, when edited, will ONLY allow them to pick valid Artists (with the method outlined above) and insert the correct URI in the prop |
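A minimal sketch of such an "association" property, assuming a simple table that records which class each association property accepts. The table, the function names and the sample nodes are hypothetical; how MDDBlib would actually store these definitions is not settled at this point in the discussion.

```python
# Hypothetical table of "association" properties: property -> required class.
# Properties that are not listed are treated as ordinary literal properties.
ASSOCIATIONS = {"dc:creator": "mm:Artist"}


def has_type(triples, node, rdf_type):
    return (node, "rdf:type", rdf_type) in triples


def set_property(triples, node, prop, value):
    """Add (node, prop, value), rejecting values of the wrong class
    when `prop` is an association property."""
    required = ASSOCIATIONS.get(prop)
    if required is not None and not has_type(triples, value, required):
        raise ValueError(f"{value!r} is not a {required}")
    triples.add((node, prop, value))


triples = {("artist1", "rdf:type", "mm:Artist")}
set_property(triples, "track1", "dc:creator", "artist1")  # accepted
set_property(triples, "track1", "dc:title", "Holiday")    # plain literal
```

Passing the literal 'Madonna' as the value of dc:creator would raise ValueError here, which is the point where a UI could offer the list of existing Artists instead.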
JaspervdG |
seems like a reasonable idea |
nerochiaro |
so you say we go down this route ? |
JaspervdG |
fine by me |
nerochiaro |
good. |
JaspervdG |
how do you propose to store these definitions? inside MDDB, in a separate file? |
JaspervdG |
and in what format? it seems rdfs:range would take care of restricting properties to certain types. |
nerochiaro |
mmmh, maybe we should let them be an application specific thing. |
JaspervdG |
why? some functions could relatively easily be made around this system that would allow any application to work with it |
nerochiaro |
i said that because i figured out that it would be complicated to add to the library. but if you have ideas that would make it relatively simple, then i'm all for it |
nerochiaro |
why did you say it would be relatively easy ? |
JaspervdG |
the good thing about something like this is that it is extremely easy to do on top of RDFDB, suppose you want to specify that the property <dc:creator> can only have <mm:Artist>s as value, all you need to do is insert the following triple: |
JaspervdG |
insert (<dc:creator> <rdfs:range> <mm:Artist>) |
JaspervdG |
then if you want to later retrieve all possible values for <dc:creator> you can issue the following query: |
JaspervdG |
select ?x where (<dc:creator> <rdfs:range> ?y) (?x <rdf:type> ?y) |
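A minimal sketch of what the insert and the select above amount to, using a plain Python set of tuples in place of RDFDB. The function name allowed_values and the sample nodes are hypothetical.

```python
def allowed_values(triples, prop):
    """Emulates: select ?x where (prop <rdfs:range> ?y) (?x <rdf:type> ?y)"""
    ranges = {o for (s, p, o) in triples if s == prop and p == "rdfs:range"}
    return {s for (s, p, o) in triples if p == "rdf:type" and o in ranges}


triples = {
    ("artist1", "rdf:type", "mm:Artist"),
    ("artist2", "rdf:type", "mm:Artist"),
    ("track1", "rdf:type", "mm:Track"),
}

# Emulates: insert (<dc:creator> <rdfs:range> <mm:Artist>)
triples.add(("dc:creator", "rdfs:range", "mm:Artist"))
```

Calling allowed_values(triples, "dc:creator") then yields the two artist nodes, which is the list a client could offer when the property is edited.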
nerochiaro |
cool. |
nerochiaro |
i did not think it would be that easy |
nerochiaro |
of course that query is one that is absolutely well suited for the first/next treatment, as it will return a lot of values |
JaspervdG |
I still see the following problems though: |
JaspervdG |
- specifying that a property should have a bag as value that has specific types as li's would be slightly more complicated (I think it would need an extra type for the bag) |
JaspervdG |
- if you interpret the rdfs:range and rdfs:domain properties as they are defined in the standard they are a major pain, as the following would mean that <dc:creator> values should be BOTH an mm:Artist and a ex:Painter: |
JaspervdG |
<dc:creator> <rdfs:range> <mm:Artist> |
JaspervdG |
<dc:creator> <rdfs:range> <ex:Painter> |
JaspervdG |
- it's rather difficult to specify something like that the value of a <dc:creator> inside an <mm:Artist> should have a value of a different type than the value of a <dc:creator> in an <mm:Track>, but I don't think this last one is really important |
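A minimal sketch of the second problem in the list above, reading rdfs:range as a check on values. Under the standard reading a value has to belong to every declared range class; a more lenient, non-standard reading would accept any one of them. The function names and sample nodes are hypothetical.

```python
def range_classes(triples, prop):
    return {o for (s, p, o) in triples if s == prop and p == "rdfs:range"}


def types_of(triples, node):
    return {o for (s, p, o) in triples if s == node and p == "rdf:type"}


def conforms_strict(triples, prop, value):
    """Standard reading: the value must have ALL declared range classes."""
    return range_classes(triples, prop) <= types_of(triples, value)


def conforms_lenient(triples, prop, value):
    """Non-standard reading: ANY one declared range class is enough."""
    ranges = range_classes(triples, prop)
    return not ranges or bool(ranges & types_of(triples, value))


TRIPLES = {
    ("dc:creator", "rdfs:range", "mm:Artist"),
    ("dc:creator", "rdfs:range", "ex:Painter"),
    ("artist1", "rdf:type", "mm:Artist"),
    ("both1", "rdf:type", "mm:Artist"),
    ("both1", "rdf:type", "ex:Painter"),
}
```

Here artist1 passes only the lenient check, while both1 passes both checks.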
nerochiaro |
a sec. i'm trying to digest what you just wrote |
nerochiaro |
... |
nerochiaro |
1) i would add something to the bag definition that specifies the type of the bag (the problem would be how to query for it) |
nerochiaro |
2) i am not very conversant in RDF schema, but i don't think we should follow it all, just the subset we need for our goals |
nerochiaro |
3) again this is not a subject where i'm really prepared. but if rdf schema doesn't allow us to do this, we can create something easier that is just for use in MDDB |
nerochiaro |
. |
JaspervdG |
... |
JaspervdG |
1) a new type of Bag could be made for such purposes (or the bag could have more than one type), the only problem being that it would not be compatible with Musicbrainz for example. |
JaspervdG |
2) I agree, we could always change to something more conformant later on |
JaspervdG |
3) I'm not sure whether it would even be needed, also OWL is able to specify something like this, so if it does become necessary we could always use that. |
JaspervdG |
. |
nerochiaro |
ok, i'll answer |
nerochiaro |
oh, compatibility. that is an interesting topic we have not touched yet. with compatibility here you mean compatibility in _importing_ metadata from musicbrainz, right ? or you are talking about complete compatibility of our data format with theirs ? |
nerochiaro |
. |
nerochiaro |
(the other 2 points were ok) |
JaspervdG |
at least exporting (data we generate would not be compatible with their generated data) |
JaspervdG |
importing is more difficult, it would depend on how strict the various MDDB operations are |
JaspervdG |
more difficult to say |
JaspervdG |
. |
nerochiaro |
... |
nerochiaro |
about exporting. mbrainz have a single specific format for their data. we seem to be aiming at having both custom objects and custom properties. a consequence of this is that one mad user can wipe away the properties used by musicbrainz, and then exporting would not be possible anymore. |
nerochiaro |
unless we enforce a specific set of props that must always exist. |
nerochiaro |
. |
nerochiaro |
like mm:Artist.dc:title |
JaspervdG |
I agree that it probably wouldn't be possible to (always) export "plain" MDDB data to Musicbrainz, so I guess it won't matter much if it needs another transformation |
JaspervdG |
also, such a function could be made as a feature of a specific application, as not every application will need it |
JaspervdG |
. |
nerochiaro |
you hit the nail on the head here. exporting/importing seems to me something that should be kept application specific. at most we can write import/export "plugins", but nothing of that should go into MDDBlib, imho |
nerochiaro |
Also, when writing the previous sentences about exporting, another problem popped up in my head. if we allow custom properties, the users will want to create them with friendly_names. but we should use the same friendly names in the RDF, maybe attaching to them a namespace like http://MDDB.org/custom-prop/ |
nerochiaro |
. |
nerochiaro |
the last was a question, forgot the "?" |
nerochiaro |
;) |
JaspervdG |
that would certainly be a good idea, although I don't know if creating such a "real" looking URI for that purpose would be right, I'd rather use a blank node or something terribly obvious (like local:someprop) |
nerochiaro |
yeah, it was just an example, of course |
nerochiaro |
good. so, are there any other topics to discuss you can think about ? |
JaspervdG |
do you have any requests as to what I should implement first? |
nerochiaro |
well, i would like to start creating the test UI we talked about before to navigate the metadata |
nerochiaro |
so i will need a way to retrieve: props of current node, list of parent nodes, list of child nodes |
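A minimal sketch of the three lookups listed here, over a plain set of triples. It assumes that "props" means literal-valued properties, that "children" are the non-literal objects a node points to, and that "parents" are the subjects pointing at it (the backlinks). The Literal class and the function names are hypothetical and are not the MDDBlib API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an object as a literal value rather than a node."""
    value: str


def props(triples, node):
    """Literal-valued properties of `node` as (predicate, value) pairs."""
    return {(p, o.value) for (s, p, o) in triples
            if s == node and isinstance(o, Literal)}


def children(triples, node):
    """Nodes that `node` points to, whatever the predicate."""
    return {o for (s, p, o) in triples
            if s == node and not isinstance(o, Literal)}


def parents(triples, node):
    """Nodes that point to `node` (backlinks); there may be several."""
    return {s for (s, p, o) in triples if o == node}


TRIPLES = {
    ("album1", "mm:track", "track1"),
    ("playlist1", "mm:track", "track1"),
    ("track1", "dc:creator", "artist1"),
    ("track1", "dc:title", Literal("Holiday")),
}
```

In this sample track1 has two parents, which illustrates why a backlink step can return more than one node.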
JaspervdG |
then I'll try to implement those first |
JaspervdG |
btw, did you succeed in loading test data into RDFDB? |
nerochiaro |
yes. it worked. as i said in email i was doing stupid things myself, and RDFDB was working good. |
nerochiaro |
i have imported data from musicbrainz without problems |
nerochiaro |
again, sorry if i made you waste time looking for a nonexistent problem |
JaspervdG |
I didn't have to look too hard, as I had spent quite some time in that area (I knew where to look) |
JaspervdG |
alright, then let's call it a day (night/evening) and I'll mail you when I have implemented something |