Representing triple origin information during Federated SPARQL querying #18
Comments
Can you provide an example of just what you want, including a description of the behaviour of the remote SPARQL system, the graphs that it uses, and the resulting quoted triples? From your description it seems to me that significant changes to SPARQL are required so that all the remote triples are passed back to the calling SPARQL system which then constructs a set of local triples. Also, wouldn't this be a useful service locally so that you could see what triples a non-federated query used to generate its results?
Indeed, significant changes would be required to SPARQL engines.
Assume we have the following endpoints with datasets:
http://example.org/endpoint1/sparql:
http://example.org/endpoint2/sparql:
Federated query across the two endpoints:
SELECT * WHERE {
  ?personA :knows ?personB.
  ?personB :name ?name.
  << ?personB :name ?name >> :federatedSource ?sourceOfName.
}
Results:
Thanks for the quick clarification. As far as I can tell all this can be done without having quoted triples in any RDF graph. The connection to quoted triples is that the SPARQL query has quoted triples, perhaps in this form:
My thought is that this could also be done by using some special SPARQL syntax, perhaps like:
SELECT * WHERE {
Adding a custom keyword for this to SPARQL could be an option indeed,
I think that the failure mode with the explicit quoted triples is just as bad. No SPARQL 1.1 engine would be able to understand either the << >> or the {| |} syntax. And if there are SPARQL-star engines that understand this syntax they would not retrieve any triples unless they understood this extra built-in predicate. That is unless you are suggesting that RDF stores include or provide source annotations for all their triples.
That could be an option, if explicit entailment would be preferred, but that's not the goal of this use case. I want to emphasize again that the above does not exist yet, it's simply a possible use case for quoted triples in future federated SPARQL engines. A keyword such as
OK, so only a SPARQL query engine that accepts requests for federated queries needs to be changed. But this can't be just something that passes the query off to a regular SPARQL query engine as it needs to have access to the underlying matches against RDF graphs.
How about an example with a federated construct query that constructs a triple annotated with a source? That seems to draw a closer connection to RDF-star and would probably be closer to the interests of working group members.
It seems that this wish must start by radically redesigning Federated SPARQL, which today works only through the Alternatively, something similar to what @pfps has proposed, with a federated
This is similar to recording an observation of a triple in another graph.
I realize now that I wasn't explicit about the fact that I was referring to federated SPARQL query execution that includes source selection. Concretely, this allows users to write queries without
@rubensworks — It seems to me that before anyone can do much meaningful work on "representing triple origin information" in that scenario, someone(s) must adequately specify the "federated SPARQL query execution that includes source selection [which] allows users to write queries without Of particular interest to me is how the "federation engine" is to "autonomously [determine] relevant sources for each part of the query". What do you envision as the clues in the user's query, that would allow the federation engine to determine that some part(s) of the SPARQL query should be run against serverA rather than serverB? The best clues I know of, VoID graphs, are absent on an embarrassingly high plurality of public datasets, and generally outdated where they do exist — and even if they were present, past efforts have shown them as far from equivalent to the schema mappings available (or constructible from some number of relatively cheap queries) on most table-relational (SQL-style) DBMS, which allow for dynamic query cost optimization when joining across multiple local and/or remote tables (which we put to substantial use in Virtuoso, in its VDBMS feature, only available in Enterprise Edition).
@TallTed This domain has been extensively studied and is well-defined within academic research.
@rubensworks Take a look at https://github.com/w3c/rdf-ucr/wiki/Capturing-triple-origin-in-SPARQL-star and see whether it captures your use case.
@rubensworks — "Extensively studied and ... well-defined within academic research" does not come close to what I meant by "adequately specify", which I would have hoped you would understand in this context to mean globally standardized, through W3C or similar; unencumbered by patents, license fees, etc.; and available for royalty-free use, interoperable implementation, and permissionless extension. The single paper of academic research you pointed me to is not free to read, and even if it were, one paper is hardly enough for anything to be considered "extensively studied" nor "well-defined". (I did find other paths through which to download no-cost PDFs of that paper (provided here for others: [1], [2], [3]), but there are multiple dates in their footers, and I'm not certain which is actually the latest version, nor which version you intended.) Similarly, FedX appears (after some but not exhaustive research, stifled in part by a lot of unintended collision with FedEx) to be a thing built into RDF4J, and not discussed much of anywhere not involving RDF4J.
Looks good to me, thanks @pfps!
One interpretation of this issue is that it concerns annotating SPARQL solutions rather than triples.
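To make that interpretation concrete, here is a rough Python sketch, not tied to any SPARQL engine, in which each solution of a two-pattern join carries the set of endpoints that contributed to it. The endpoint URLs follow the example earlier in the thread; the data and the `scan` and `knows_with_names` helpers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data; endpoint URLs follow the example in this thread.
ENDPOINTS: Dict[str, Set[Triple]] = {
    "http://example.org/endpoint1/sparql": {(":alice", ":knows", ":bob")},
    "http://example.org/endpoint2/sparql": {(":bob", ":name", "Bob")},
}

def scan(predicate: str) -> List[Tuple[str, str, str]]:
    """Return (subject, object, source) for each triple with this predicate."""
    return [
        (s, o, source)
        for source, triples in ENDPOINTS.items()
        for (s, p, o) in triples
        if p == predicate
    ]

def knows_with_names() -> List[Tuple[Dict[str, str], FrozenSet[str]]]:
    """Join `?personA :knows ?personB` with `?personB :name ?name`.

    Each solution is paired with the set of sources that contributed to it.
    """
    solutions = []
    for a, b, src1 in scan(":knows"):
        for b2, name, src2 in scan(":name"):
            if b == b2:
                binding = {"personA": a, "personB": b, "name": name}
                solutions.append((binding, frozenset({src1, src2})))
    return solutions

solutions = knows_with_names()
for binding, sources in solutions:
    print(binding, sorted(sources))
```

In this sketch the annotation is attached to the whole solution, so it does not say which individual triple came from which endpoint.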
See https://github.com/w3c/rdf-ucr/wiki/Capturing-triple-origin-in-SPARQL-star for a version of this use case.
Provide sufficient information so that a member of the working group's Use Case Task Force can contact you and enhance your description so that it can be used by the working group to guide their activities. You do not have to fill out all the information requested.
** Contact information
** Brief Description of your use case:
When executing a Federated SPARQL Query (i.e., a query across multiple SPARQL endpoints), users may want to know which sources contributed to which query results.
*** What you want to be able to do:
When executing a Federated SPARQL Query, I want to annotate triples with the source they originate from.
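A minimal Python sketch of this idea, not tied to any SPARQL engine: two in-memory "endpoints" are matched against one triple pattern, and each matching triple is paired with the endpoint it came from. The endpoint URLs follow the example in the discussion; the data and the `match_with_source` helper are hypothetical.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a variable

# Hypothetical datasets; the endpoint URLs are the ones used in the discussion.
ENDPOINTS: Dict[str, Set[Triple]] = {
    "http://example.org/endpoint1/sparql": {
        (":alice", ":knows", ":bob"),
        (":bob", ":name", "Bob"),
    },
    "http://example.org/endpoint2/sparql": {
        (":bob", ":name", "Robert"),
    },
}

def match_with_source(pattern: Pattern) -> Iterator[Tuple[Triple, str]]:
    """Yield (triple, source) for every triple that matches the pattern."""
    for source, triples in ENDPOINTS.items():
        for triple in triples:
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple, source

# All `:name` triples, each annotated with the endpoint that supplied it.
annotated = sorted(match_with_source((None, ":name", None)))
for triple, source in annotated:
    print(triple, "from", source)
```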
*** What is the role of RDF-star quoted triples in your use case:
For example, the following query could produce all triples with corresponding ?source URL.
*** Why it is hard or impossible to do what you want to do without quoted triples:
This could be achieved using named graphs, but semantics may clash with other usages of named graphs.
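One way such a clash could look, as a small Python sketch over plain tuples (the quad layout, the data and the `tag_with_source` helper are assumptions for illustration): if the graph slot is reused to record the source endpoint, a graph name that the remote dataset already assigned is overwritten.

```python
from typing import List, Optional, Tuple

# (subject, predicate, object, graph); graph None means the default graph.
Quad = Tuple[str, str, str, Optional[str]]

def tag_with_source(quads: List[Quad], source: str) -> List[Quad]:
    """Record provenance by putting the source URL in the graph slot."""
    return [(s, p, o, source) for (s, p, o, _graph) in quads]

# Hypothetical remote data: the second triple already lives in a named graph.
remote: List[Quad] = [
    (":bob", ":name", "Bob", None),
    (":bob", ":age", "42", ":census2020"),
]

tagged = tag_with_source(remote, "http://example.org/endpoint1/sparql")

# Graph names present at the source that no longer appear after tagging.
lost = {q[3] for q in remote if q[3] is not None} - {q[3] for q in tagged}
print(lost)
```

Here the original graph name is lost once the source is written into the same slot, which is the kind of conflict a triple-level annotation would avoid.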
*** How you want quoted triples to behave in your use case:
(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)
N/A
*** An example RDF graph that shows part of your use case:
N/A
Similar to the "Combination of RDF-star and graph-level metadata (named graphs)" use case, this use case has the limitation that it's not possible to annotate triples inside named graphs.
For instance, the following may be desired by users, but this is not possible given the restriction of RDF-star to only annotate triples:
If extending RDF-star to named graphs is not desired, then this limitation could be worked around as follows (alternatives may be possible):