Representation of Language Tags in the Abstract Syntax #22

Open
gkellogg opened this issue Aug 10, 2023 · 3 comments
Labels
use case Issue to record discussion on a use case

Comments

@gkellogg
Member

gkellogg commented Aug 10, 2023

Provide sufficient information so that a member of the working group's Use Case Task Force can contact you and enhance your description so that it can be used by the working group to guide their activities. You do not have to fill out all the information requested.

** Contact information

  • Your name: Gregg Kellogg
  • How to contact you: @gkellogg

** Brief Description of your use case:

As an aggregator of RDF information, I want a predictable number of triples when parsing input in which literals may differ only in the case of the language tag. I would also like the serialized (possibly canonicalized) form to use the BCP47 formatting recommendations, so that the language tag en-us might canonically be represented as en-US.

  • [ISO639-1] recommends that language codes be written in lowercase ('mn' Mongolian).
  • [ISO15924] recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).
  • [ISO3166-1] recommends that country codes be capitalized ('MN' Mongolia).
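
The three conventions above can be sketched as a small case-formatting function. This is a minimal illustration, assuming the positional rules described in RFC 5646 section 2.1.1 (lowercase by default, two-letter subtags uppercase and four-letter subtags title case unless they start the tag or follow a singleton). The name `format_bcp47_case` is hypothetical and not part of any RDF library; the function does not validate tags or consult the IANA registry.

```python
def format_bcp47_case(tag: str) -> str:
    """Apply BCP47-style case conventions to a language tag (no validation)."""
    subtags = tag.split("-")
    formatted = []
    for index, subtag in enumerate(subtags):
        # A singleton is a one-character subtag such as "x" or "u";
        # whatever follows it stays lowercase.
        after_singleton = index > 0 and len(subtags[index - 1]) == 1
        if index == 0 or after_singleton:
            formatted.append(subtag.lower())
        elif len(subtag) == 2:
            # Region position, e.g. "us" -> "US"
            formatted.append(subtag.upper())
        elif len(subtag) == 4:
            # Script position, e.g. "cyrl" -> "Cyrl"
            formatted.append(subtag.capitalize())
        else:
            formatted.append(subtag.lower())
    return "-".join(formatted)


print(format_bcp47_case("en-us"))
print(format_bcp47_case("MN-cYRL-mn"))
```

Lowercasing the tag first and formatting only at serialization time would keep comparison and presentation separate, which matches the two options discussed in this use case.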

When aggregating data, input can be combined from different documents that use different conventions for formatting language tags, leading to potential duplication of data.

*** What you want to be able to do:

When parsing a document that may be composed of several overlapping triples, I would like the resulting graph to have a unique abstract representation for otherwise equal language tags. As it is, the following Turtle can generate either one or two triples in the abstract representation, depending on whether the implementation chooses to normalize language tags, e.g., to lower case.

_:a rdf:value "foo"@en-us, "foo"@en-US .

Implementations that normalize language tags will produce a single triple; those that do not will produce two.
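
The difference can be modelled without any RDF library. The sketch below is only an illustration: it treats a graph as a Python set of tuples and a language-tagged literal as a (lexical form, language tag) pair, and the helper `build_graph` is a hypothetical stand-in for a parser, not a real API.

```python
def build_graph(objects, normalize):
    """Collect triples for `_:a rdf:value <literal>` into a set.

    `objects` is a list of (lexical form, language tag) pairs. When
    `normalize` is true, the tag is lowercased before the triple is added.
    """
    graph = set()
    for lexical_form, language_tag in objects:
        if normalize:
            language_tag = language_tag.lower()
        graph.add(("_:a", "rdf:value", (lexical_form, language_tag)))
    return graph


# The two objects from the Turtle example above.
objects = [("foo", "en-us"), ("foo", "en-US")]

print(len(build_graph(objects, normalize=True)))   # 1
print(len(build_graph(objects, normalize=False)))  # 2
```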

*** What is the role of RDF-star quoted triples in your use case:

Not related to quoted triples.

*** Why it is hard or impossible to do what you want to do without quoted triples:

Not related to quoted triples.

*** How you want quoted triples to behave in your use case:
(For example, do you want the precise syntax of subjects, predicates, and objects in quoted triples to be important?)

From the start, RDF should have mandated a normalized form for language tags in literals, ideally based on BCP47 formatting. It would also be acceptable if all parsers normalized language tags to lower case for the abstract representation. Concrete syntaxes that can perform canonicalization could then require a particular form for language tags without the risk of serializing different graphs depending on how they were parsed on input.

*** An example RDF graph that shows part of your use case:

_:a rdf:value "foo"@en-us, "foo"@en-US .

If changed to require normalizing to lower case, this would be the same as the following:

_:a rdf:value "foo"@en-us .

N-Triples/N-Quads canonicalization could then either represent using that lower case form, or use BCP47 formatting.

@gkellogg added the "use case" label on Aug 10, 2023
@pfps
Contributor

pfps commented Sep 8, 2023

This use case for RDF 1.2 places constraints on how language tags are handled. As it doesn't have implications for the RDF-star semantics it can be just tracked here, without creating a wiki page for it.

@lisp
Contributor

lisp commented Dec 21, 2023

unique abstract reputation

is "representation" intended?

@gkellogg
Member Author

Thanks, fixed. This UC can probably be marked as addressed at this point.


3 participants