Representation of Language Tags in the Abstract Syntax #22

gkellogg · 2023-08-10T20:29:50Z

Provide sufficient information so that a member of the working group's Use Case Task Force can contact you and enhance your description so that it can be used by the working group to guide their activities. You do not have to fill out all the information requested.

** Contact information

Your name: Gregg Kellogg
How to contact you: @gkellogg

** Brief Description of your use case:

As an aggregator of RDF information, I want to have a predictable number of triples when parsing triples where literals may vary only in the case of the language tag element. I would also like the serialized (possibly canonicalized) form to use the BCP47 formatting recommendations, so that the language tag en-us might canonically be represented as en-US.

[ISO639-1] recommends that language codes be written in lowercase ('mn' Mongolian).
[ISO15924] recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).
[ISO3166-1] recommends that country codes be capitalized ('MN' Mongolia).
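
These casing conventions can be applied mechanically. The following is a minimal sketch, not taken from any RDF specification: the helper name `bcp47_format` is hypothetical, and it follows the rule of thumb in RFC 5646 section 2.1.1 (everything lowercase, except that two-letter subtags become uppercase and four-letter subtags become titlecase when they are not the first subtag and do not follow a singleton). It assumes the tag is already well-formed and does no validation, and it reads "after a singleton" as "anywhere after the first single-character subtag", which is an assumption.

```python
def bcp47_format(tag: str) -> str:
    """Return `tag` with BCP47-style casing (hypothetical helper, no validation)."""
    subtags = tag.lower().split("-")
    formatted = []
    seen_singleton = False
    for index, subtag in enumerate(subtags):
        if len(subtag) == 1:
            # A singleton (e.g. "x" or "u") starts an extension or private-use
            # sequence; this sketch leaves everything after it in lowercase.
            seen_singleton = True
        elif index > 0 and not seen_singleton:
            if len(subtag) == 2:
                subtag = subtag.upper()       # region, e.g. "us" becomes "US"
            elif len(subtag) == 4:
                subtag = subtag.capitalize()  # script, e.g. "cyrl" becomes "Cyrl"
        formatted.append(subtag)
    return "-".join(formatted)


print(bcp47_format("en-us"))
print(bcp47_format("MN-cyrl-mn"))
```

Lower-casing first means the result does not depend on how the input was written, so `en-us`, `en-US`, and `EN-US` all map to the same string.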

When aggregating data, input can be combined from different documents in which different conventions for formatting language tags are in use, leading to the potential duplication of data.

*** What you want to be able to do:

When parsing a document that may be composed of several overlapping triples, I would like the resulting graph to have a unique abstract representation for otherwise equal language tags. As it is, the following Turtle can generate either one or two triples in the abstract representation, depending on whether the implementation chooses to normalize language tags, e.g., to lower case.

_:a rdf:value "foo"@en-us, "foo"@en-US .

Implementations that normalize language tags will produce a single triple; those that do not will produce two triples.
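
To make the difference concrete, here is a small sketch that models the graph as a Python set of triples, with each literal represented as a (lexical form, language tag) pair. It does not use any real RDF library; the function `load_objects` and its `normalize` flag are hypothetical, and are there only to contrast the two behaviours.

```python
def load_objects(objects, normalize=False):
    """Build a set of triples for _:a rdf:value <literal> (illustrative only)."""
    graph = set()
    for lexical, lang in objects:
        if normalize:
            # Normalize the language tag to lower case before it enters the graph.
            lang = lang.lower()
        graph.add(("_:a", "rdf:value", (lexical, lang)))
    return graph


objects = [("foo", "en-us"), ("foo", "en-US")]

print(len(load_objects(objects)))                  # tags compared as written
print(len(load_objects(objects, normalize=True)))  # tags lower-cased first
```

Without normalization the two literals are distinct set members, so the graph holds two triples; with lower-casing they collapse into one.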

*** What is the role of RDF-star quoted triples in your use case:

Not related to quoted triples.

*** Why it is hard or impossible to do what you want to do without quoted triples:

Not related to quoted triples.

*** How you want quoted triples to behave in your use case:
(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)

From the start, RDF should have mandated a normalized form for language tags in literals, ideally based on BCP47 formatting. It would also be acceptable if all parsers normalized language tags to lower case for the abstract representation. Concrete syntaxes which can perform canonicalization could then require a particular form for language tags without danger of potentially serializing different graphs, depending on how they were parsed on input.

*** An example RDF graph that shows part of your use case:

_:a rdf:value "foo"@en-us, "foo"@en-US .

If changed to require normalizing to lower case, this would be the same as the following:

_:a rdf:value "foo"@en-us .

N-Triples/N-Quads canonicalization could then either represent using that lower case form, or use BCP47 formatting.

pfps · 2023-09-08T17:03:57Z

This use case for RDF 1.2 places constraints on how language tags are handled. As it doesn't have implications for the RDF-star semantics, it can simply be tracked here, without creating a wiki page for it.

lisp · 2023-12-21T15:11:29Z

unique abstract reputation

is "representation" intended?

gkellogg · 2023-12-21T22:17:16Z

Thanks, fixed. This UC can probably be marked as addressed at this point.

gkellogg added the "use case" label (Issue to record discussion on a use case) Aug 10, 2023

gkellogg mentioned this issue Aug 10, 2023

formulate use case on language tag canonicalization [1] w3c/rdf-star-wg#83

Closed
