Ability to target direct class (without subclasses) : "sh:targetDirectClass" #168

Open
tfrancart opened this issue Jan 17, 2025 · 7 comments
Labels
Core For SHACL 1.2 Core spec

Comments

@tfrancart

tfrancart commented Jan 17, 2025

This is a proposal to introduce sh:targetDirectClass to express that a shape targets instances of a particular class without looking for the subclasses.

This would solve modelling deadlocks which I encountered at least 3 times while designing SHACL specs over the years.

Consider the following graph containing the classes A > B > C, each with 2 instances, and a few properties (A has P1, B has P2, C has P3):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ex: <https://data.exemple.fr/>.

ex:A a rdfs:Class.
ex:B a rdfs:Class;
	rdfs:subClassOf ex:A .
ex:C a rdfs:Class;
	rdfs:subClassOf ex:B .

ex:P1 a owl:DatatypeProperty .
ex:P2 a owl:DatatypeProperty .
ex:P3 a owl:DatatypeProperty .

ex:A1 a ex:A ;
	ex:P1 "A1 has P1" .
ex:A2 a ex:A ;
	ex:P1 "A2 has P1" .

ex:B1 a ex:B ;
	ex:P2 "B1 has P2" .
ex:B2 a ex:B ;
	ex:P2 "B2 has P2" .

ex:C1 a ex:C ;
	ex:P3 "C1 has P3" .
ex:C2 a ex:C ;
	ex:P3 "C2 has P3" .

I need to encode the structure of this graph in SHACL to ensure its consistency. I need to create shapes that target A, B and C, and I need to close the shapes. Please note that instances of B use property P2, while instances of the subclass C use property P3.

If I target class B with sh:targetClass, then C1 and C2 will be targeted too, because sh:targetClass follows rdfs:subClassOf. Thus if shape B only declares property P2 (as per the structure of the graph) and I mark shape B as sh:closed, I will get violations for C1 and C2, because they use P3, which shape B does not declare. Furthermore, if I give P2 a sh:minCount of 1 on B, I will get additional violations for C1 and C2, because they don't have P2:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix sh:   <http://www.w3.org/ns/shacl#>.
@prefix ex: <https://data.exemple.fr/>.

ex:ShapeB a sh:NodeShape ;
  sh:targetClass ex:B ;
  sh:property [
    sh:path ex:P2 ;
    # This is what I want to write, but I will get violations for C1 and C2 (no P2 value)
    sh:minCount 1 ;
  ] ;
  # This is what I want to write, but I will get violations for C1 and C2 (P3 is not declared)
  sh:closed true ; sh:ignoredProperties (rdf:type) ;
.
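To make the targeting behaviour concrete, here is a toy Python model of it. This is a minimal sketch and not a SHACL engine: the graph is a plain set of tuples, and the names GRAPH, subclasses_of, target_class and check_shape_b are made up for illustration. It only mimics how sh:targetClass picks focus nodes and how the sh:minCount and sh:closed constraints of ex:ShapeB would then react.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# Toy copy of the relevant part of the data graph (assumption: tuples, not real RDF).
GRAPH = {
    ("ex:B", SUBCLASS, "ex:A"),
    ("ex:C", SUBCLASS, "ex:B"),
    ("ex:B1", TYPE, "ex:B"), ("ex:B1", "ex:P2", "B1 has P2"),
    ("ex:B2", TYPE, "ex:B"), ("ex:B2", "ex:P2", "B2 has P2"),
    ("ex:C1", TYPE, "ex:C"), ("ex:C1", "ex:P3", "C1 has P3"),
    ("ex:C2", TYPE, "ex:C"), ("ex:C2", "ex:P3", "C2 has P3"),
}


def subclasses_of(graph, cls):
    """Return cls plus every class that reaches it through rdfs:subClassOf."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def target_class(graph, cls):
    """Mimic sh:targetClass: instances of cls or of any of its subclasses."""
    classes = subclasses_of(graph, cls)
    return {s for s, p, o in graph if p == TYPE and o in classes}


def check_shape_b(graph, node):
    """Mimic ex:ShapeB: minCount 1 on ex:P2, closed, rdf:type ignored."""
    predicates = {p for s, p, o in graph if s == node}
    problems = []
    if "ex:P2" not in predicates:
        problems.append("minCount: no ex:P2 value")
    for extra in sorted(predicates - {"ex:P2", TYPE}):
        problems.append(f"closed: unexpected {extra}")
    return problems


targets = target_class(GRAPH, "ex:B")
report = {node: check_shape_b(GRAPH, node) for node in sorted(targets)}
```

In this model, targets contains ex:C1 and ex:C2 alongside ex:B1 and ex:B2, and report lists one minCount problem and one closed problem for each of the two C instances.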

Sure, I can write a SPARQL target as a workaround, but then I lose sh:targetClass as a structured target in my shapes file, it requires SHACL-AF, and it seems unnecessarily complex:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix sh:   <http://www.w3.org/ns/shacl#>.
@prefix ex: <https://data.exemple.fr/>.

ex:ShapeB a sh:NodeShape ;
  sh:target [
    a sh:SPARQLTarget ;
    # Only nodes with an asserted rdf:type of ex:B are selected
    sh:select "SELECT ?this WHERE { ?this a <https://data.exemple.fr/B> . }"
  ] ;
  sh:property [
    sh:path ex:P2 ;
    sh:minCount 1 ;
  ] ;
  sh:closed true ; sh:ignoredProperties (rdf:type) ;
.

It seems to me that in this particular case, having a target specification that only targets B without the instances of its subclasses would solve my issue.
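As a rough illustration of the behaviour I am asking for, the following toy Python sketch selects only nodes that have an asserted rdf:type triple for the class, which is also what the SPARQL workaround does. The function name target_direct_class is hypothetical, the graph is a plain set of tuples, and no existing SHACL processor is implied.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# Toy copy of the B and C instances (assumption: tuples, not real RDF).
GRAPH = {
    ("ex:C", SUBCLASS, "ex:B"),
    ("ex:B1", TYPE, "ex:B"),
    ("ex:B2", TYPE, "ex:B"),
    ("ex:C1", TYPE, "ex:C"),
    ("ex:C2", TYPE, "ex:C"),
}


def target_direct_class(graph, cls):
    """Hypothetical sh:targetDirectClass: asserted instances only, no subclass lookup."""
    return {s for s, p, o in graph if p == TYPE and o == cls}


direct_b = target_direct_class(GRAPH, "ex:B")
direct_c = target_direct_class(GRAPH, "ex:C")
```

Here direct_b holds only ex:B1 and ex:B2, so a closed shape with sh:minCount 1 on P2 would never see the C instances.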

FYI, B is FRBRoo Work, C is FRBRoo ComplexWork, P2 is "is_realized_by" (pointing to Expression), and ComplexWorks don't have expressions.

@ajnelson-nist

First, a quick typo check - is your definition of ex:C2 spelled as you intend?

# [...]
ex:C1 a ex:C ;
	ex:P3 "C1 has P3" .
ex:C2 a ex:B .
	ex:P3 "C2 has P3" .

Second: Unfortunately, I think you're running into a natural consequence of combining sh:closed with subclasses. I think sh:closed is generally incompatible with any class that is not designed as a "leaf" class in the subclass tree, intended to never be subclassed or specialized further with new properties available only to some subclass.

Because all C's are B's, SHACL's built-in subclass traversal will always cause your ex:ShapeB to target all of your ex:C instances in the form that uses sh:targetClass rather than sh:target. With RDFS or OWL entailment, you also end up with entailed type triples assigning ex:B, so the shape form using sh:target would catch your (pre-entailment) ex:C's as well.

Have the ontologies where you've encountered this issue been reviewed for satisfiability? That is, if each B must have a P2 value, and each C is a B that is defined as never having any P2 values, then C is not instantiable (i.e., can never have an individual in the class---it can't satisfy both C and the parent class B).
If the ontology is encoded in OWL, I think this would be flagged as an unsatisfiable class by available tooling, from just review of the TBox (no owl:NamedIndividuals should be necessary).
If the ontology is encoded in just SHACL, then instantiating each class with a test individual should show you an analogous unattainable individual, as you encountered in your write-up. I don't recall that SHACL uses a specific term for this.

(Note: I am not personally reviewing the noted ontology, which I happen to not have heard of before. I described the above as a general condition any ontology might find itself in.)

I think it would be problematic to try revising sh:closed in SHACL-Core to support this use case, due to the interactions with subclassing and entailment that SHACL has been designed to use.

I think this use case would be better resolved by reviewing the subject ontology's class design; OR, removing sh:closed and instead using an sh:xone constraint, with one branch of the exactly-one being the property shape suggested, and the other being a class constraint check. E.g.:

ex:ShapeB-for-ex-P2-xor-C
  a sh:NodeShape ;
  sh:targetClass ex:B ;
  sh:xone (
    [
      a sh:NodeShape ;
      sh:property [
        sh:path ex:P2 ;
        sh:minCount 1 ;
      ] ;
    ]
    [
      a sh:NodeShape ;
      sh:class ex:C ;
    ]
  ) ;
.
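A toy Python check of how this sh:xone shape would behave, as a minimal sketch rather than a SHACL engine. The nodes ex:B3 and ex:C3 are hypothetical additions used to show the failing cases, and the helper names are made up for illustration.

```python
TYPE = "rdf:type"

# Toy graph (assumption: tuples, not real RDF). ex:B3 and ex:C3 are hypothetical.
GRAPH = {
    ("ex:B1", TYPE, "ex:B"), ("ex:B1", "ex:P2", "B1 has P2"),
    ("ex:C1", TYPE, "ex:C"), ("ex:C1", "ex:P3", "C1 has P3"),
    ("ex:B3", TYPE, "ex:B"),                                   # a B without P2
    ("ex:C3", TYPE, "ex:C"), ("ex:C3", "ex:P2", "C3 has P2"),  # a C with P2
}


def has_p2(graph, node):
    """First xone branch: at least one ex:P2 value."""
    return any(s == node and p == "ex:P2" for s, p, o in graph)


def is_c(graph, node):
    """Second xone branch: sh:class ex:C (ex:C has no subclasses in this toy graph)."""
    return (node, TYPE, "ex:C") in graph


def xone_conforms(graph, node):
    """sh:xone: exactly one of the branches must conform."""
    return (has_p2(graph, node) + is_c(graph, node)) == 1


results = {n: xone_conforms(GRAPH, n) for n in ("ex:B1", "ex:C1", "ex:B3", "ex:C3")}
```

In this model ex:B1 and ex:C1 conform, ex:B3 fails because neither branch holds, and ex:C3 fails because both branches hold. Note that the shape no longer rejects unexpected properties, since sh:closed was dropped.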

@HolgerKnublauch
Contributor

FWIW in case we decide to follow point 1 of #215 this could be expressed using

ex:ShapeB
    sh:targetExpression [
        sh:inversePath rdf:type ;
        sh:nodes <https://data.exemple.fr/B> ;
    ] ;

The interpretation is that the value of sh:targetExpression is a node expression, and (in the future) path expressions such as sh:inversePath would count as node expressions too, and accept an optional parameter sh:nodes to specify the focus node(s). A SHACL 1.2 processor would evaluate this expression as "?instance rdf:type https://data.exemple.fr/B".

The benefit here would be that we don't need hard-coded new target types for specific cases such as direct instances, but rather have a generic solution that uses a unified mechanism throughout. The downside, of course, is that the above is a little bit more verbose.

@tfrancart
Author

@ajnelson-nist thanks for your answer

> I think sh:closed is generally incompatible with any class that is not designed as a "leaf" class in the subclass tree, intended to never be subclassed or specialized further with new properties available only to some subclass.

Yes, and this is what I would like to see fixed. However, note that the problem is not limited to sh:closed; it also comes from the semantics of sh:targetClass, as in the use case I gave with sh:minCount.

> Have the ontologies where you've encountered this issue been reviewed for satisfiability? That is, if each B must have a P2 value, and each C is a B that is defined as never having any P2 values, then C is not instantiable (i.e., can never have an individual in the class---it can't satisfy both C and the parent class B).
> If the ontology is encoded in OWL, I think this would be flagged as an unsatisfiable class by available tooling, from just review of the TBox (no owl:NamedIndividuals should be necessary).

This is where the misunderstanding lies: the ontology does not state that "each B must have a P2", nor does it state that "each C is a B that is defined as never having any P2 values". It is the system implementing this ontology that works in such a way that, in its graph, every instance of B has P2, while every instance of C does not. This is what I need to capture. I am not trying to use SHACL to model the knowledge domain in an abstract way; I am using SHACL to encode the particular implementation of that ontology inside a specific system (that is, writing an application profile). And the current SHACL semantics of sh:targetClass simply prevent me from doing so (I think).

> I think this use case would be better resolved by reviewing the subject ontology's class design

Unfortunately this is not possible. The ontology is out of my hands. As I wrote, my need is to encode how the ontology is applied.

To put it differently: what I need is to design a "relational-DB-like", flat model of the knowledge graph used in that particular system. I don't care about the hierarchy of the classes in the ontology. This is why I would like to be able to target direct instances of particular classes.

> OR, removing sh:closed

My goal is precisely to encode the structure of the graph to prevent wrongly placed properties. So I need sh:closed, definitely.

> and instead using an sh:xone constraint

Thanks for the suggestion, I will give it a try. I have more than a single subclass in my real world model, so this might become complicated, but I will definitely give it a try.

@ajnelson-nist

@tfrancart Thank you for the discussion on this. I'm glad we seem to mostly align on the background points, and thank you for going through assumption-checking with me and my overexplaining-just-in-case.

Thank you for describing that the concerns seem to be at the "database" level, rather than the ontology level. I agree that this is a good general space for sh:closed. My concerns about sh:closed came from discussions I'd had previously on whether it'd be appropriate in a data model that is intended for "downstream" modeling efforts. Those concerns don't seem to apply in your reported case.

I think your need is in a spot that SHACL carves a space out for, but not with a direct name. There is support---or the start of support?---in SHACL for describing which entailment is required of an engine, in Section 1.5 of the current standard. If you follow the links through to the SPARQL 1.1 Entailment Regimes document, you find a bunch of IRIs like http://www.w3.org/ns/entailment/OWL-RDF-Based, and the SHACL spec specifies the required behavior when any of these are used with sh:entailment. I don't know what kind of support exists for the sh:entailment predicate today - my understanding is that its usage is not part of the SHACL test suite.
Please, anybody correct me on this if you know otherwise. This is coming from a grep -R 'sh:entailment' on this repository's source tree, which showed two slightly different SHACL-SHACL files with syntax-checking shapes only, here and here (current versions linked).

I think another option that would satisfy your need is a "No entailments" regime for some of your shapes. My understanding is this would be an addition to the spec under revision---but probably not an easy one. A friction point is that your case happens to want to avoid subclass inferencing, and that's quite pervasive via "SHACL Type" in the current spec's Terminology section.

I have felt your need myself. I have some pipelines that start with a graph that is manually maintained, and I want to run some "quality control" rules on that before any inferencing comes in and papers over anything that I might not want automatically handled. I also want other rules to run after the inferencing, to catch if I've knowledge-expanded into a bad data condition. The latter rules want maximal expansion, the former want none.

So, if your use case happens to have:

  • an ABox/individuals graph in a pre-inferencing state (which it sounds like it does?), AND
  • a separate TBox graph that supplies the subclass hierarchy, AND
  • a shapes graph that is separable into what you want to run pre-inferencing and post-inferencing,

then you may have a use case for sh:entailment on a specifically "Do not entail anything" regime. That IRI would go onto the pre-inferencing shapes graph.

Is this something that should carry forward for proposal?

@ajnelson-nist

> FWIW in case we decide to follow point 1 of #215 this could be expressed using
>
> ex:ShapeB
>     sh:targetExpression [
>         sh:inversePath rdf:type ;
>         sh:nodes <https://data.exemple.fr/B> ;
>     ] ;
>
> The interpretation is that the value of sh:targetExpression is a node expression, and (in the future) path expressions such as sh:inversePath would count as node expressions too, and accept an optional parameter sh:nodes to specify the focus node(s). A SHACL 1.2 processor would evaluate this expression as "?instance rdf:type https://data.exemple.fr/B".
>
> The benefit here would be that we don't need hard-coded new target types for specific cases such as direct instances, but rather have a generic solution that uses a unified mechanism throughout. The downside, of course, is that the above is a little bit more verbose.

@HolgerKnublauch : How would entailment regimes (say, either RDFS or any OWL) impact your solution-supposing-point-1?

@HolgerKnublauch
Contributor

@ajnelson-nist when RDFS or OWL entailment is activated, the data graph will contain additional rdf:type triples that make it impossible to distinguish direct from indirect instances of a class. In particular, any instance will automatically also become a direct instance of the declared superclasses of its asserted type. Algorithms such as the node expressions would evaluate differently in those modes.
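The effect can be sketched with a toy Python model. This is a minimal illustration only: it applies just the RDFS rule rdfs9 (an instance of a subclass is also an instance of the superclass) to a set of tuples, and the names ASSERTED, materialize_rdfs9 and direct_instances are made up for this sketch.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# Asserted triples before any inferencing (assumption: tuples, not real RDF).
ASSERTED = {
    ("ex:B", SUBCLASS, "ex:A"),
    ("ex:C", SUBCLASS, "ex:B"),
    ("ex:B1", TYPE, "ex:B"),
    ("ex:C1", TYPE, "ex:C"),
}


def materialize_rdfs9(graph):
    """Add (x rdf:type D) for every (x rdf:type C) and (C rdfs:subClassOf D), to a fixpoint."""
    inferred = set(graph)
    subclass_pairs = [(s, o) for s, p, o in graph if p == SUBCLASS]
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for sub, sup in subclass_pairs:
                if sub == o and (s, TYPE, sup) not in inferred:
                    inferred.add((s, TYPE, sup))
                    changed = True
    return inferred


def direct_instances(graph, cls):
    """Nodes with an rdf:type triple for cls in the given graph."""
    return {s for s, p, o in graph if p == TYPE and o == cls}


before = direct_instances(ASSERTED, "ex:B")
after = direct_instances(materialize_rdfs9(ASSERTED), "ex:B")
```

Before materialization only ex:B1 is found for ex:B; afterwards ex:C1 is found as well, so a pattern like "?instance rdf:type ex:B" can no longer tell the two apart.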

@ajnelson-nist

> @ajnelson-nist when RDFS or OWL entailment is activated, the data graph will contain additional rdf:type triples that make it impossible to distinguish direct from indirect instances of a class. In particular, any instance will automatically also become a direct instance of the declared superclasses of its asserted type. Algorithms such as the node expressions would evaluate differently in those modes.

Thank you for confirming. I feel affirmed in suggesting a "No entailments" regime for the use case in this thread.
