RDF-star for contextualizing historical assertions #12
Comments
@tla Thanks for all the information. I'm going to create a Wiki page that will hold current information about your use case. A couple of questions first, though. Do you have a pointer to a good but not too long description of CIDOC-CRM that shows the properties in your diagram? To go along with this do you have a pointer to the full CIDOC-CRM ontology that you are using? I can see two ways of going with your use case. Either you do something like labelled property graphs, where there are assertions (that conflict in some sense) and the sources are annotations on these assertions, or you treat everything as unasserted, for example as you might do if you turned everything into RDF reified statements. Do you know which way you want to go here or do you not care? |
@pfps I am not sure I have ever seen a succinct description of CIDOC-CRM, and I am fairly certain that one describing our use of it doesn't exist; the website is here and the version we are using is 7.1.2 (documentation and RDFS file). The main thing is that it is a heavily event-based model, originally for representing the holdings of museums and galleries etc. and extended to cover some limited data about people and places. We are leaning heavily on the event-based nature with our use of the Concerning your second question, I don't currently have an opinion, though I also think the implications of either approach are not yet clear to me. |
I was planning on updating that wiki page with a bit more information, but I haven't had time to get back to it. |
@tla I'm trying to provide information on CIDOC-CRM for readers of the use case. This is going fine. I'm also trying to convert your diagram into RDF. This is not going so well as I am running into a couple of problems. I'm converting the ovals in the circles to rdf:type relationships, which works out OK for most but it seems to me that the relationship between "assigned gender" and E13 should be a subclass relationship not an instance relationship. I don't understand why the P177 links go to P42 and P41. Shouldn't the P177 links for "assigned gender" entities go to a gender relationship and the links for "had subject" go to a subject relationship? |
@pfps To understand the diagram there, it is also important to know that we are modeling gender assignment as an event that happens in the life of a person, as opposed to being a static attribute of the person. This allows us to handle cases where, for example, a person was born male and (as in the diagram) may or may not have been castrated and thus become a eunuch. If that is too confusing for the wiki, though, I attach another example here, which records a dispute about the date of death of a Norman mercenary called Hervé Frangopoulos. Maybe that is more straightforward. |
@tla I would like to have both, if possible. This one looks simpler so I'll take a look at it now. |
@tla Do you have either of these written down using triples? Alternatively, is there a definition of the graphical language you are using? |
@pfps @tla The Erlangen CRM files might be helpful: http://www.erlangen-crm.org/ They are the most well known OWL implementations of CIDOC CRM. WissKI for example uses those files. Maybe these OWL ontologies can be improved by using RDF-star ? |
Here is the gender assignment one as triples. I haven't done the Frangopoulos death one yet, but will soon.
|
@tla the column "RDFS" on the wiki page is wrong. |
@VladimirAlexiev E13 and E17 are not wrong, just not the model you are expecting. Here E13 is the claim made by the author, and E17 is the event that actually happened where the person's gender was recognised by the people around them. All the statements in our database are E13 objects, with the original subject/predicate/object tied to them; in many of these cases, the subject is an event such as a gender assignment or a death. You can find out more from this presentation; note that we changed the 'source' relation to P17 from P70i. To follow the diagram:
In both these cases, the E17 event would have a different agent, e.g. the emperor (and not Michael the Syrian) is the agent responsible for the castration having happened. As for the wiki, I'm not sure who is responsible for that page - @pfps ? @afs ? |
Aren't you overcomplicating things? |
@VladimirAlexiev I do not find this quote:
|
Within the scope of the project and given its data sources and goals, no, I am not overcomplicating things. |
@VladimirAlexiev I'm the one putting the Wiki page together for now. Right now it has some of my guesses. What changes should be made there? It's fine for anyone to add information to the page or make corrections, as long as they realize that the page is supposed to be a consensus view and may be modified to serve the purposes of the working group. This issue is where discussion of the page can play out. |
@tla OK, I think I now can see what is going on, mostly. What is the relationship between Anna Komnene and Alexiad_VI_8? Is she the author of the literary object? |
@tla What would happen if the castration was only the cutting of the vas deferens, rendering Konstantinos infertile, and later reversed? Would the same Gender_Assignment_Male be used or would there be a separate type assignment? |
@tla My guess is that a simpler representation would also be consistent with CIDOC-CRM, where there are no attribute assignments to the gender assignments and instead direct links to, e.g., Konstantine and Gender_Male. If this is correct, I'll add the simpler case as a separate Wiki page. |
@pfps To answer in order:
Yes - this is clarified with more assertions, which are left out of the example.
Insofar as these days a vasectomy is common and doesn't change the gender designation of the person it happens to, the second gender assignment would never have taken place at all. In 11th-century Byzantium, on the other hand, becoming a eunuch was irreversible and did change one's role in society. The gender is about the social role rather than the biology, since in the vast majority of cases we don't have access to specific information about the biology or genetics of 11th-century people. To answer the question I think you're asking, though: since a gender assignment is an event in the CIDOC-CRM, every change in designation would be a new gender assignment.
Yes, the simpler case is entirely possible, and I can paste an example here of another famous Byzantine eunuch (which I made for a different context). However, there is then no need for RDF* to represent it...
|
Thanks. I'll work on updating the Wiki page. |
@tla I modified your RDF and put it in the wiki page. I changed the relationships between the genders and Gender, as I think it should be an instance relationship instead of a subclass. I changed the P42 links to P141 to match the diagram. After thinking about this use case for a while, I'm a bit puzzled about exactly where quoted triples would help. As far as I can tell your need is to have a way of requiring that the object of a crm:P141_assigned triple with a subject like ex:Anna_Assertion_B belongs to ex:Gender. But I don't see what role quoted triples would play. Instead what I think is needed is a new class for gender assignments with an OWL property restriction. |
I also created https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-CIDOC-CRM-events to show how quoted triples could be used with your simpler events. |
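The requirement described above (the object of a crm:P141_assigned triple on a gender claim must belong to ex:Gender) can be sketched as a closed-world check. This is only an illustration in plain Python with triples as tuples: the class name `ex:Gender_Assignment_Claim`, the node `ex:Bad_Assertion`, and the sample data are assumptions, not part of the RELEVEN model. An OWL property restriction would behave differently, since under OWL's open-world semantics it would infer that the object is a gender rather than reject the data.

```python
RDF_TYPE = "rdf:type"
P141 = "crm:P141_assigned"

# Hypothetical sample data; only ex:Anna_Assertion_B and ex:Gender come from the thread.
graph = {
    ("ex:Anna_Assertion_B", RDF_TYPE, "ex:Gender_Assignment_Claim"),
    ("ex:Anna_Assertion_B", P141, "ex:Gender_Male"),
    ("ex:Gender_Male", RDF_TYPE, "ex:Gender"),
    ("ex:Bad_Assertion", RDF_TYPE, "ex:Gender_Assignment_Claim"),
    ("ex:Bad_Assertion", P141, "ex:Constantinople"),
}


def violations(graph, claim_class, value_class):
    """Return P141 triples on claim_class instances whose object is not typed value_class."""
    claims = {s for (s, p, o) in graph if p == RDF_TYPE and o == claim_class}
    valid = {s for (s, p, o) in graph if p == RDF_TYPE and o == value_class}
    return sorted(
        t for t in graph if t[0] in claims and t[1] == P141 and t[2] not in valid
    )


bad = violations(graph, "ex:Gender_Assignment_Claim", "ex:Gender")
```

With this data, `bad` contains only the triple whose object is not declared to be an ex:Gender.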
Hi, thanks for the update. In fact, in the latest version of our data model, we do have a whole lot of subclasses of E13, one per predicate we use (where P41 and P42 are two of these predicates, and The point of quoted triples would not be to use them with this particular data model, but to change our data model so that we can make the original triples directly, e.g.
Then we would be able to ditch this layer of indirection where every triple is part of an E13 Attribute Assignment. |
(Note that |
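To make the "layer of indirection" concrete, here is a minimal sketch of collapsing one E13 Attribute Assignment into a quoted triple that carries the P14 and P17 annotations directly. It is not the example that was originally posted in this comment. Triples are plain Python tuples, a nested tuple stands in for a quoted triple, and the node names (`ex:Gender_Assignment_1`, `ex:Anna_Komnene` as the asserting agent, `ex:Alexiad_VI_8` as the source) as well as the full property IRIs are assumptions for illustration.

```python
P140 = "crm:P140_assigned_attribute_to"
P141 = "crm:P141_assigned"
P177 = "crm:P177_assigned_property_of_type"
P14 = "crm:P14_carried_out_by"
P17 = "crm:P17_was_motivated_by"

# Hypothetical E13-style record of one claim.
e13_graph = {
    ("ex:Anna_Assertion_B", P140, "ex:Gender_Assignment_1"),
    ("ex:Anna_Assertion_B", P177, "crm:P42_assigned"),
    ("ex:Anna_Assertion_B", P141, "ex:Gender_Male"),
    ("ex:Anna_Assertion_B", P14, "ex:Anna_Komnene"),
    ("ex:Anna_Assertion_B", P17, "ex:Alexiad_VI_8"),
}


def collapse(graph, assertion):
    """Turn one E13 node into (quoted_triple, annotations on that quoted triple)."""
    props = {p: o for (s, p, o) in graph if s == assertion}
    quoted = (props[P140], props[P177], props[P141])
    annotations = {(quoted, p, props[p]) for p in (P14, P17) if p in props}
    return quoted, annotations


quoted, annotations = collapse(e13_graph, "ex:Anna_Assertion_B")
```

The original subject, predicate, and object reappear as an ordinary triple shape, which is what would let ontology-based validation apply to them, while authorship and evidence stay attached.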
I think that one way to do all this would be to start with an actual domain triple as an instance of CIDOC-CRM type assignment and hang all the support stuff off that, as in:
This is the approach I put together in https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-CIDOC-CRM-events. One problem here is that the provenance information is only associated with the triple, not the other pieces of information. I suppose that it could be repeated on them. An advantage of this representation is that if the quoted triple is current then it can also be asserted. A significant problem here is how to represent two events that give rise to the same quoted triple. The other problem is that the provenance is for the subject and object (and predicate) all at once. Splitting these up requires something like what you did, perhaps like the following. (The {| |} syntax puts triples on the quoted version of an asserted triple.)
This eliminates the problem with two events that give rise to the same quoted triple but replaces it with two different supports for parts of an event. Comments? |
The {| |} syntax has a usability problem: it might be misunderstood as referring to an instance/occurrence of a triple instead of the type - a problem already present with the quoted triple syntax, but even more suggestive with the shorthand syntax. I think it would be better to investigate the modelling that the CG report suggests - introducing an occurrence identifier via an :occurrenceOf relation - first and only then explore the effects of syntactic sugar-coating. Also I wonder if the RDF reification vocabulary could be used to refer to individual terms in a quoted triple, like
EDIT: ahem, RDF reification of course does refer to an instance/occurrence, not the type. So that makes this idea directly applied to quoted triples invalid. But maybe not on occurrences, as discussed in the first paragraph, e.g.
|
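The type-versus-occurrence point can be illustrated with a small sketch. It is not the example from the comment above, and it does not use the RDF reification vocabulary. Triples are plain Python tuples, a nested tuple stands in for a quoted triple, and every identifier (`ex:has_gender`, `ex:occurrenceOf`, the authors and passages) is hypothetical. When two claims yield the same quoted triple and the annotations sit on the triple itself, it is no longer possible to tell which author relied on which source. Giving each claim its own occurrence node keeps them apart.

```python
P14 = "crm:P14_carried_out_by"
P17 = "crm:P17_was_motivated_by"

# Hypothetical domain triple shared by two independent claims.
quoted = ("ex:Konstantinos", "ex:has_gender", "ex:Gender_Eunuch")

# Style 1: annotations hang directly off the quoted triple (the type).
on_triple = {
    (quoted, P14, "ex:Author_A"), (quoted, P17, "ex:Passage_A"),
    (quoted, P14, "ex:Author_B"), (quoted, P17, "ex:Passage_B"),
}

# Style 2: each claim is a separate occurrence of the quoted triple.
on_occurrence = {
    ("ex:occ1", "ex:occurrenceOf", quoted),
    ("ex:occ1", P14, "ex:Author_A"), ("ex:occ1", P17, "ex:Passage_A"),
    ("ex:occ2", "ex:occurrenceOf", quoted),
    ("ex:occ2", P14, "ex:Author_B"), ("ex:occ2", P17, "ex:Passage_B"),
}


def author_source_pairs(graph):
    """Pair each author with every source attached to the same subject node."""
    pairs = set()
    for (s1, p1, author) in graph:
        if p1 != P14:
            continue
        for (s2, p2, source) in graph:
            if p2 == P17 and s2 == s1:
                pairs.add((author, source))
    return sorted(pairs)


ambiguous = author_source_pairs(on_triple)
exact = author_source_pairs(on_occurrence)
```

Here `ambiguous` contains all four author/source combinations, while `exact` contains only the two that were actually stated.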
@tla Please take a look at https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-contextualizing-historical-assertions, particularly the last two pieces and let me know whether you agree with my casting of your use case. |
@pfps — Can you make the UCR wiki pages editable? Or move them to git-managed documents, such that we can make PRs (and change requests on same)? Of immediate interest is a typo fix, from |
I don't seem to have the ability to change settings for the UCR repository. In any case, you should be able to create PRs for the wiki pages (unless this is also restricted). See https://docs.github.com/en/communities/documenting-your-project-with-wikis/adding-or-editing-wiki-pages for more information. |
Yes this looks right to me, thanks! Apologies for the delayed reply. |
A couple more questions. Is the form of literals important, i.e., does it matter whether a number is specified as "7"^^xsd:int or "7"^^xsd:byte? Are the actual identifiers important? For example, one source might use ex:Ioannes_68 and another ex2:Ioannes_79. If it is known that these two identifiers refer to the same person is it reasonable to merge the information onto one identifier or is it necessary to keep two identifiers and have a relationship (like known-to-be-the-same) between them? |
Only insofar as the underlying ontology is respected.
In the data I'm working with for this use case, we are assigning our own identifiers and linking them to identifiers used in other data collections. For the example I was just taking over the (more readable) identifiers from the biggest of these collections that we are using. In general they will all be disambiguated though, so all the information about Ioannes 68 would be associated with the one identifier. |
What does that mean? Does it matter for the underlying ontology? Is the ontology differentiating between forms of literals in some cases but not in others?
The last sentence seems contradictory to me. One interpretation would be that information about a person identified as Ioannes_68 would not be merged with data linked to other identifiers for the same person, although those identifiers are identifying the same person, meaning that the identifiers are used to disambiguate data coming from different collections. Another interpretation would be that all data about co-denoting identifiers is merged, because it is all about the same person. Which one is it? |
I think it would help me to answer if I knew why these questions are being asked. The example was based purely on the CIDOC-CRM ontology, but in the larger project we are incorporating other existing ontologies where we need to. I can subject you all to an exhaustive explanation of all the technical underpinnings of the whole project, including when and where the type of literals might be important and how exactly we are managing disambiguation across multiple data sets, but it's not clear to me what this example use case will gain from such long-winded documentation. |
The idea is for all the information to be merged under our own identifiers (which are UUIDs and don't look like "Ioannes 68"); some of the information under our UUID will be the information that "this person is identified in the Byzantine prosopography as <Ioannes 68>" and "this person is identified in VIAF as http://viaf.org/viaf/ <some number>". |
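The merging described here can be sketched roughly as follows. This is an assumption-laden illustration in plain Python, not the project's implementation: the placeholder UUID, the `ex:identified_as` and `ex:held_title` properties, and the sample statements are all hypothetical, and the two external identifiers are the ones used as examples earlier in this thread.

```python
# Placeholder project identifier; real identifiers would be generated UUIDs.
PERSON = "urn:uuid:00000000-0000-0000-0000-000000000001"

# External identifiers known to denote the same person.
canonical = {"ex:Ioannes_68": PERSON, "ex2:Ioannes_79": PERSON}

# Hypothetical statements arriving from two collections.
incoming = {
    ("ex:Ioannes_68", "ex:held_title", "ex:Title_A"),
    ("ex2:Ioannes_79", "ex:held_title", "ex:Title_B"),
}


def merge(triples, canonical):
    """Rewrite external identifiers to the project's own and record the links."""
    merged = {
        (canonical.get(s, s), p, canonical.get(o, o)) for (s, p, o) in triples
    }
    merged |= {(own, "ex:identified_as", ext) for ext, own in canonical.items()}
    return merged


merged = merge(incoming, canonical)
```

All information ends up under the single project identifier, and the external identifiers survive only as values of the linking property.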
Maybe that's indeed not necessary and it is sufficient to know that the answer always is: "It depends". But I'll let @pfps continue this conversation. |
There is an issue on whether the syntactic form of literals should matter in quoted triples. That is, does
mean something different from
in the semantics of RDF (when xsd:integer is a supported datatype). Note that
means the same as
in the semantics of RDF (when xsd:integer is a supported datatype). So the question is whether the exact form of literals matters in RELEVEN. There is a similar issue with IRIs, but you said that RELEVEN only has a single identifier for an object. |
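The distinction at stake can be sketched in a few lines. This is a simplified model rather than an RDF library: a literal is a (lexical form, datatype IRI) tuple, and the specific literals `"7"` and `"07"` are assumptions chosen for illustration. Compared as terms, the two literals differ; compared by value under xsd:integer, they are the same.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def integer_value(literal):
    """Map an xsd:integer literal (lexical form, datatype IRI) to its value."""
    lexical, datatype = literal
    if datatype != XSD_INTEGER or not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not a well-formed xsd:integer literal: {literal!r}")
    return int(lexical)


def same_term(a, b):
    """Syntactic comparison: lexical form and datatype must match exactly."""
    return a == b


def same_value(a, b):
    """Semantic comparison: both literals denote the same integer."""
    return integer_value(a) == integer_value(b)


seven = ("7", XSD_INTEGER)
padded_seven = ("07", XSD_INTEGER)
```

The open question in the thread is which of these two comparisons should apply to literals that occur inside quoted triples.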
I question whether this level of detail is mandatory for this (or any) use case, given that it already exists in one or more use cases. I do not believe we intend to radically rewire RDF, such that the meaning of the triples In other words, for RDF-star (or RDF 1.2) Use Cases, we should be concentrating on scenarios which are not (easily, reasonably, cost-effectively) achievable with RDF 1.1; not on reworking scenarios which underpinned RDF 1.0 or 1.1. If a use case for RDF 1.0 or 1.1 cared about the exact form of literals, we must ensure that RDF 1.2 (or RDF-star) supports such differentiation. We need not find further justification for it. |
I see - I can't think of any reason we would care about the exact form of literals as in this example. Your initial question seemed to be asking whether we care about data types; the answer is yes in most cases, but I think we will stick to a single consistent datatype for each of whole numbers, decimal numbers, strings, etc., so I can't think of a reason we would ever have both |
See https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-contextualizing-historical-assertions for the current status of this use case.
Contact information
Tara Andrews, University of Vienna
[email protected]
Brief Description of your use case:
In the RELEVEN project (https://releven.univie.ac.at/) we are developing a data model for recording contextual information about triples, so that users of our datastore can understand who has asserted the information in the triples and what they are basing these assertions on. (We actually call this the STAR model, for STructured Assertion Record, since the project proposal was submitted in February 2020 and we hadn't heard of RDF* yet...)
What you want to be able to do:
The motivation is to record information about historical figures even when the information we have is contradictory and cannot be definitively resolved one way or another (which, after all, frequently happens), and to be able to do this without causing naive validation errors in ontology-based software.
Why it is hard or impossible to do what you want to do without quoted triples:
At the moment our model is based on regular RDF, and specifically on the CIDOC-CRM entity `E13 Attribute Assignment`. The subject and object of the original triple become the objects of relationships `P140 assigned attribute to` and `P141 assigned` respectively, and the predicate is reified as an `E55 Type` to become the object of relationship `P177 assigned property type`. We can then use the predicates `P14 carried out by` to indicate who is responsible for the assertion, and `P17 was motivated by` to indicate the evidence for the assertion (e.g. a text passage, an inscription on an object, or even another assertion).
While our current approach works fine for representing the data, we don't have a good way to validate the content of the attribute assignments - that is, to make sure that the subject, reified predicate, and object conform to the specification of the ontology. The range of the predicates P140 and P141 is set intentionally broadly (to `E1 CRM Entity`) and the range of P177 can likewise be any reified predicate. For the time being we have to handle this in the application logic, taking care not to allow users to create assertions of invalid triples.
What is the role of RDF-star quoted triples in your use case:
The role of quoted triples would be to allow us to have the validation on the base triples, and still be able to attach the information about authority and context that we currently express via P14 and P17 properties.
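As a rough illustration of the current E13-based encoding, the sketch below expands one base triple into the attribute-assignment triples named in this description. It uses plain Python tuples rather than an RDF library, and the full property IRIs, the node names, and the helper `to_e13` are assumptions for illustration, not the project's actual code.

```python
def to_e13(assertion_id, triple, author, source):
    """Expand a base triple into E13 Attribute Assignment triples."""
    subject, predicate, obj = triple
    return {
        (assertion_id, "rdf:type", "crm:E13_Attribute_Assignment"),
        (assertion_id, "crm:P140_assigned_attribute_to", subject),
        (assertion_id, "crm:P177_assigned_property_of_type", predicate),
        (assertion_id, "crm:P141_assigned", obj),
        (assertion_id, "crm:P14_carried_out_by", author),
        (assertion_id, "crm:P17_was_motivated_by", source),
    }


# Hypothetical claim: a gender assignment event assigned the type "male".
base = ("ex:Gender_Assignment_1", "crm:P42_assigned", "ex:Gender_Male")
e13 = to_e13("ex:Anna_Assertion_B", base, "ex:Anna_Komnene", "ex:Alexiad_VI_8")
```

The base triple itself never appears in the result, which is why domain and range checks on its predicate cannot be applied to data stored this way.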
How you want quoted triples to behave in your use case:
I don't understand this question well enough to be able to answer.
An example RDF graph that shows part of your use case:
The attached image shows a pair of gender assignments of Konstantinos Doukas; according to Anna Komnene he was male (presumably from birth) but according to Michael the Syrian, he was castrated (thus assigned to the eunuch category) sometime during the reign of the emperor Botaneiates.