"Exploring true nullability" blog post #1731

1 change: 1 addition & 0 deletions .prettierignore
@@ -2,4 +2,5 @@ public/
pnpm-lock.yaml
*.mdx
!src/pages/blog/2024-04-11-announcing-new-graphql-website/index.mdx
!src/pages/blog/2024-08-14-exploring-true-nullability.mdx
*.jpg
284 changes: 284 additions & 0 deletions src/pages/blog/2024-08-14-exploring-true-nullability.mdx
@@ -0,0 +1,284 @@
---
title: "Exploring 'True' Nullability in GraphQL"
tags: ["spec"]
date: 2024-08-14
byline: Benjie Gillam
---

One of GraphQL's early decisions was to allow "partial success"; this was a
critical feature for Facebook - if one part of their backend infrastructure
became degraded, they didn't want to just render an error page; instead, they
wanted to serve the user a page with as much working data as they could.

## Null propagation

To accomplish this, if an error occurred within a resolver, the resolver's value
would be replaced with a `null`, and an error would be added to the `errors`
array in the response. However, what if that field was marked as non-null? To
solve that apparent contradiction, GraphQL introduced the "error propagation"
behavior (also known colloquially as "null bubbling") - when a `null` (from an
error or otherwise) occurs in a non-nullable position, the parent position
(either a field or a list item) is made `null` instead. This behavior would
repeat if the parent position was also non-nullable, and this could cascade (or
"bubble") all the way up to the root of the query if everything in the path is
non-nullable.
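To make the cascade concrete, here is a minimal sketch of the behavior in TypeScript. It is not the reference implementation: the `ResolvedNode` model, `completeNode` and `executeWithPropagation` are hypothetical names invented for illustration, fields are assumed to be already resolved, and lists are left out for brevity.

```typescript
// Simplified model of an executed field: a leaf holds a value or an error,
// an object holds child fields. `nonNull` mirrors a `!` in the schema.
type ResolvedNode =
  | { kind: "leaf"; nonNull: boolean; value: string | number | boolean | null; error?: string }
  | { kind: "object"; nonNull: boolean; fields: { [name: string]: ResolvedNode } };

type Json = string | number | boolean | null | { [key: string]: Json };

interface ExecutionError {
  message: string;
  path: string[];
}

// Returns the completed value, or `undefined` to signal that a null hit a
// non-nullable position and must be handled by the parent instead.
function completeNode(
  node: ResolvedNode,
  path: string[],
  errors: ExecutionError[],
): Json | undefined {
  let result: Json;
  if (node.kind === "leaf") {
    if (node.error !== undefined) {
      errors.push({ message: node.error, path: path });
      result = null; // an errored field is replaced with null
    } else {
      result = node.value;
    }
  } else {
    const completedFields: { [key: string]: Json } = {};
    result = completedFields;
    for (const name of Object.keys(node.fields)) {
      const child = completeNode(node.fields[name], path.concat(name), errors);
      if (child === undefined) {
        result = null; // a non-nullable child was null, so this object becomes null
        break;
      }
      completedFields[name] = child;
    }
  }
  // A null in a non-nullable position is passed up to the parent.
  if (result === null && node.nonNull) {
    return undefined;
  }
  return result;
}

function executeWithPropagation(root: ResolvedNode): { data: Json; errors: ExecutionError[] } {
  const errors: ExecutionError[] = [];
  const completed = completeNode(root, [], errors);
  return { data: completed === undefined ? null : completed, errors: errors };
}

// `url` errors in a non-nullable position, `avatar` is non-nullable too,
// so the null bubbles up until it reaches the nullable `me` field.
const propagated = executeWithPropagation({
  kind: "object",
  nonNull: false,
  fields: {
    me: {
      kind: "object",
      nonNull: false,
      fields: {
        name: { kind: "leaf", nonNull: true, value: "Ada" },
        avatar: {
          kind: "object",
          nonNull: true,
          fields: {
            url: { kind: "leaf", nonNull: true, value: null, error: "Image service unavailable" },
          },
        },
      },
    },
  },
});
```

In this sketch `propagated.data` ends up as `{ me: null }` with a single error whose path is `["me", "avatar", "url"]`; note that the perfectly good `name` value is discarded along the way.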

This solved the issue, and meant that GraphQL's nullability promises were still
honoured; but it wasn't without complications.

### Complication 1: partial success

We want to be resilient to systems failing; but errors that occur in
non-nullable positions cascade to surrounding parts of the query, making less
and less data available to be rendered. This seems contrary to our "partial
success" aim, but it's easy to solve - we just make sure that the positions
where we expect errors to occur are nullable so that errors don't propagate
further. Clients then need to ensure they handle any nulls that occur in these
positions; but that seems like a fair trade.

### Complication 2: nullable epidemic

Almost any field in your GraphQL schema could raise an error - errors might not
only be caused by backend services becoming unavailable or responding in
unexpected ways; they can also be caused by simple programming errors in your
business logic, data consistency errors (e.g. expecting a boolean but receiving
a float), or any other cause.

Since we don't want to "blow up" the entire response if any such issue occurs,
we've moved to strongly encourage nullable usage throughout a schema, only
adding the non-nullable `!` marker to positions where we're truly sure that
field is extremely unlikely to error. As a result, developers consuming the
GraphQL API have to handle potential nulls in more positions than they would
expect, which makes for additional work.

### Complication 3: normalized caching

Many modern GraphQL clients use a "normalized" cache, such that updates pulled
down from the API in one query can automatically update all the previously
rendered data across the application. This helps ensure consistency for users,
and is a powerful feature.

However, if an error occurs in a non-nullable position, it's
[no longer safe](https://github.com/graphql/nullability-wg/issues/20) to store
the data to the normalized cache.

## The Nullability Working Group

At first, we thought the solution to this was to give clients control over the
nullability of a response, so we set up the Client-Controlled Nullability (CCN)
Working Group. Later, we renamed the working group to the Nullability WG to show
that it encompassed all potential solutions to this problem.

### Client-controlled nullability

The first Nullability WG proposal came from a collaboration between Yelp and
Netflix, with contributions from GraphQL WG regulars Alex Reilly, Mark Larah,
and Stephen Spalding among others. They proposed we could adorn the queries we
issue to the server with sigils indicating our desired nullability overrides for
the given fields - client-controlled nullability.

A `?` would be added to fields where we don't mind if they're null, but we
definitely want errors to stop there; and a `!` would be added to fields where
we definitely don't want a null to occur (whether or not there is an error).
This would give consumers control over where errors/nulls were handled.

However, after much exploration of the topic over the years, we found numerous issues
that traded one set of concerns for another. We kept iterating whilst we looked
for a solution to these tradeoffs.

### True nullability schema

Jordan Eldredge
[proposed](https://github.com/graphql/nullability-wg/discussions/22) that making
fields nullable to handle error propagation was hiding the "true" nullability of
the data. Instead, he suggested, we should have the schema represent the true
nullability, and put the responsibility on clients to use the `?` CCN operator
to handle errors in the relevant places.

However, this would mean that clients such as Relay would want to add `?` in
every position, causing an "explosion" of question marks, because really what
Relay desired was to disable null propagation entirely.

### A new type

Getting the relevant experts together at GraphQLConf 2023 re-energized the
discussions and sparked new ideas. After seeing Stephen's "Nullability Sandwich"
talk and chatting with Jordan, Stephen and others in the corridor, Benjie Gillam
was inspired to [propose](https://github.com/graphql/graphql-spec/pull/1046) a
"null only on error" type. This type would allow us to express the "true"
nullability of a field whilst also indicating that errors may happen that should
be handled, but would not "blow up" the response.

To maintain backwards compatibility, clients would need to opt in to seeing this
new type (otherwise it would masquerade as nullable). It would be up to the
client how to handle the nullability of this position knowing that a "null only
on error" position would only contain a `null` if a matching error existed in
the `errors` list.

A
[number of alternative syntaxes](https://gist.github.com/benjie/19d784721d1658b89fd8954e7ee07034)
were suggested for this new type, but none were well liked.

### A new approach to client error handling

Also around the time of GraphQLConf 2023, the Relay team shared
[a presentation](https://docs.google.com/presentation/u/2/d/1rfWeBcyJkiNqyxPxUIKxgbExmfdjA70t/edit?pli=1#slide=id.p8)
on some of their thinking around errors. In particular, they
discussed the `@catch` directive, which would give users control over how errors
were represented in the data being rendered, allowing the client to
differentiate an error from a legitimate null. Over the following months, many
behaviors were discussed at the Nullability WG; one particularly compelling one
was that clients could throw the error when an errored field was read, and rely
on framework mechanics (such as React's
[error boundaries](https://legacy.reactjs.org/docs/error-boundaries.html)) to
handle them.

### Strict semantic nullability

GraphQL Foundation director Lee Byron
[proposed](https://github.com/graphql/graphql-wg/discussions/1410) that we
introduce a schema directive, `@strictNullability`, whereby we would change what
the syntax meant - `Int?` for nullable, `Int` for null-only-on-error, and `Int!`
for never-null. This proposal was well liked, but wasn't a clear win; it
introduced many complexities including migration costs and concerns over schema
evolution.

### A pivotal discussion

Lee and Benjie had a call where they discussed the history of GraphQL
nullability and all the relevant proposals in depth, including their two
respective solutions. It was clear that though no solution was quite there, the
fact that the solutions were converging hinted that we were getting closer to an
answer. This long, detailed, and highly technical discussion inspired
[a new proposal](https://github.com/graphql/nullability-wg/discussions/58),
which has since been iterated on further and which we describe below.

## Our latest proposal

We're now proposing a new opt-in execution mode to solve the nullability
problem. It's important to note that both the client and the server must opt-in
to this new mode for it to take effect, otherwise the traditional execution mode
will be used.

### No-error-propagation mode

The new proposal centers around the premise of allowing clients to disable the
"error propagation" behavior discussed above.

Clients that opt-in to this behavior take responsibility for interpreting the
response as a whole, correlating the `data` and `errors` properties of the
response. With error propagation disabled and the previously discussed fact that
any field could potentially throw an error, all positions in `data` can
potentially contain a `null` value. Clients in this mode must cross-check any
`null` values against `errors` to determine if it represents a true `null`, or
an error.
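The cross-check described above might look something like the following sketch. The `classifyPosition` and `readPath` helpers and the `ResponseLike` shape are hypothetical names used for illustration only; the one thing taken as given is that each entry in `errors` can carry a `path` of field names and list indices pointing at the position that errored.

```typescript
type PathSegment = string | number;

interface GraphQLErrorLike {
  message: string;
  path?: PathSegment[];
}

interface ResponseLike {
  data: unknown;
  errors?: GraphQLErrorLike[];
}

type PositionKind = "not-null" | "true-null" | "error";

// Walks `data` along `path`; returns undefined if the position does not exist.
function readPath(data: unknown, path: PathSegment[]): unknown {
  let current: unknown = data;
  for (const segment of path) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as { [key: string]: unknown })[String(segment)];
  }
  return current;
}

function samePath(a: PathSegment[], b: PathSegment[]): boolean {
  return a.length === b.length && a.every((segment, i) => segment === b[i]);
}

// Decides whether the value at `path` is a real value, a legitimate null,
// or a null that stands in for an error reported in `errors`.
function classifyPosition(response: ResponseLike, path: PathSegment[]): PositionKind {
  if (readPath(response.data, path) !== null) return "not-null";
  const errors = response.errors === undefined ? [] : response.errors;
  const hasMatchingError = errors.some(
    (error) => error.path !== undefined && samePath(error.path, path),
  );
  return hasMatchingError ? "error" : "true-null";
}

const sampleResponse: ResponseLike = {
  data: { user: { nickname: null, friends: [{ name: "Bo" }, { name: null }] } },
  errors: [{ message: "Profile service timed out", path: ["user", "friends", 1, "name"] }],
};

const nicknameKind = classifyPosition(sampleResponse, ["user", "nickname"]);
const secondFriendKind = classifyPosition(sampleResponse, ["user", "friends", 1, "name"]);
```

Here `nicknameKind` is `"true-null"` because no error points at that position, whereas `secondFriendKind` is `"error"` because the `null` is explained by an entry in `errors`.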

### "Smart" clients

The no-error-propagation mode is intended for use by "smart" clients such as
Relay, Apollo Client, URQL and others which understand GraphQL deeply and are
responsible for the storage and retrieval of fetched GraphQL data. These clients
are well positioned to handle the responsibilities outlined above.

By disabling error propagation, these clients will be able to safely update
their stores (including normalized stores) even when errors occur. They can also
re-implement traditional GraphQL error propagation on top of these new
foundations, shielding applications developers from needing to learn this new
behavior (whilst still allowing them to reap the benefits!). They can even take
on advanced behaviors, such as throwing the error when the application developer
attempts to read from an errored field, allowing the developer to handle errors
with their system's native error boundaries.
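As a rough illustration of the "throw on read" behavior, a client could index the errors by path and raise when application code touches an errored position. This is only a sketch under assumptions: `createThrowingReader` is a hypothetical helper, not the API of Relay, Apollo Client, URQL or any other client, and a real client would integrate this with its store rather than walking raw response data.

```typescript
type Segment = string | number;

interface FieldError {
  message: string;
  path: Segment[];
}

// Builds a reader over a no-error-propagation response. Reading an errored
// position (or anything beneath it) throws instead of returning null.
function createThrowingReader(data: unknown, errors: FieldError[]) {
  // Index errors by their JSON-encoded path for direct lookup.
  const errorsByPath: { [encodedPath: string]: FieldError } = {};
  for (const error of errors) {
    errorsByPath[JSON.stringify(error.path)] = error;
  }

  return function read(path: Segment[]): unknown {
    let current: unknown = data;
    for (let i = 0; i < path.length; i++) {
      if (current === null || typeof current !== "object") return undefined;
      current = (current as { [key: string]: unknown })[String(path[i])];
      const visited = path.slice(0, i + 1);
      const hit = errorsByPath[JSON.stringify(visited)];
      if (hit !== undefined) {
        throw new Error(`Field "${visited.join(".")}" errored: ${hit.message}`);
      }
    }
    return current;
  };
}

const readUser = createThrowingReader(
  { user: { name: "Ada", bio: null, avatarUrl: null } },
  [{ message: "Image service unavailable", path: ["user", "avatarUrl"] }],
);

const userName = readUser(["user", "name"]); // a regular value
const userBio = readUser(["user", "bio"]); // a legitimate null, no error recorded
// readUser(["user", "avatarUrl"]) would throw, so a surrounding error
// boundary (or try/catch) decides how to render that part of the UI.
```

The design choice worth noting is that application code reading `bio` still has to handle `null`, but code reading `avatarUrl` never sees a `null` that secretly means "something went wrong".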

### True nullability

Just like in traditional mode, for clients operating in no-error-propagation
mode, fields are either nullable or non-nullable. However, unlike in traditional
mode, no-error-propagation mode allows for errors to be represented in any
position:

- nullable (e.g. `Int`): a value, an error, or a true `null`;
- non-nullable (e.g. `Int!`): a value, **or an error**.

_(In traditional mode, non-nullable fields cannot represent an error because the
error propagates to the nearest nullable position.)_

Since this mode allows every field, whether nullable or non-nullable, to
represent an error, the schema can safely indicate to clients in this mode the
true intended nullability of a field. If the schema designer knows that a field
should never be null unless an error occurs, they can mark the field as
"non-nullable for clients in no-error-propagation mode" (see "schema developers"
below).

### Client reflection of true nullability

Smart clients can ask the schema about the "true" nullability of each field via
introspection, and can generate a derived SDL by combining that information with
their knowledge of how the client handles errors. This derived SDL, dependent on
client behavior, would look like the traditional representation of the schema,
but with more fields potentially marked as non-nullable where the true
nullability of the underlying schema has been reflected. Application developers
would issue queries and mutations in the same way they always had, but now their
generated types may not need to handle `null` in as many positions as before,
increasing developer happiness.
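A sketch of how such a derivation could work is shown below. Everything here is an assumption made for illustration: the `FieldInfo` shape stands in for whatever introspection ends up returning, and the two `ClientErrorHandling` strategies are hypothetical labels for "throw when an errored field is read" and "re-implement traditional propagation in the client".

```typescript
// The "true" nullability a schema could report for a field.
type TrueNullability = "nullable" | "semantic-non-null" | "strict-non-null";

// How the client surfaces errors to application code (hypothetical labels).
type ClientErrorHandling = "throw-on-read" | "traditional-propagation";

interface FieldInfo {
  name: string;
  namedType: string;
  nullability: TrueNullability;
}

// Decides how a field's type is presented to application developers.
function derivedType(field: FieldInfo, handling: ClientErrorHandling): string {
  if (field.nullability === "nullable") return field.namedType;
  if (field.nullability === "strict-non-null") return field.namedType + "!";
  // Semantic non-null: null only on error. If the client throws on errored
  // reads, application code never sees that null, so the position can be
  // presented as non-nullable; otherwise it stays nullable as in traditional mode.
  return handling === "throw-on-read" ? field.namedType + "!" : field.namedType;
}

function deriveTypeSDL(typeName: string, fields: FieldInfo[], handling: ClientErrorHandling): string {
  const lines = fields.map((field) => `  ${field.name}: ${derivedType(field, handling)}`);
  return [`type ${typeName} {`].concat(lines, ["}"]).join("\n");
}

const userFields: FieldInfo[] = [
  { name: "id", namedType: "ID", nullability: "strict-non-null" },
  { name: "name", namedType: "String", nullability: "semantic-non-null" },
  { name: "nickname", namedType: "String", nullability: "nullable" },
];

const throwingClientSDL = deriveTypeSDL("User", userFields, "throw-on-read");
const traditionalClientSDL = deriveTypeSDL("User", userFields, "traditional-propagation");
```

With these inputs, the throwing client presents `name: String!` to application developers while the client that re-implements traditional propagation presents `name: String`; `id: ID!` and `nickname: String` are the same in both.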

### Schema developers

Schemas that wish to add support for indicating the "true nullability" of a
field in no-error-propagation mode need to be able to discern which types show
up as non-nullable in both modes (traditional non-null types), and which types
show up as non-nullable only in no-error-propagation mode. For this latter
concern we've introduced the concept of a "semantic" non-null type:

- "strict" (traditional) non-nullable - shows up as non-nullable in both
traditional mode and no-error-propagation mode
- "semantic" non-nullable, aka "null only on error" - shows up as non-nullable
in no-error-propagation mode and masquerades as nullable in traditional mode

Only clients that opt-in to seeing the "true" nullability will see these two
different types of nullability, otherwise the nullability of the chosen mode
(traditional or no-error-propagation) will be reflected by introspection.

### Representation in SDL

Application developers will only need to deal with traditional SDL that
represents traditional nullability concerns. If these developers are using
"smart" clients then they should source this SDL from the client rather than
from the server; this allows them to see the nullability that the client
guarantees based on how it will handle the "true" nullability of the schema, how
it handles errors, and any local schema extensions that may have been added.

Client-derived SDL (see "client reflection of true nullability" above) can be
used for concerns such as code generation, which will work in the traditional
way with no need for changes (but happier developers if there are fewer nullable
positions!).

Schema developers and people working on "smart" clients may need to represent
the differences between "strict" and "semantic" non-nullable in SDL. For these
people, we're introducing the `@extendedNullability` document directive. When
this directive is present at the top of a document, the `!` symbol means that a
type will appear as non-nullable only in no-error-propagation mode, and a new
`!!` symbol will represent that a type will appear as non-nullable in both
traditional and no-error-propagation mode.

| Traditional Mode | No-error-propagation mode | Example |
| ---------------- | ------------------------- | ------- |
| Nullable | Nullable | `Int` |
| Nullable | Non-nullable | `Int!` |
| Non-nullable\* | Non-nullable | `Int!!` |

The `!!` symbol is designed to look a little scary - it should be used with
caution (like `!` in traditional schemas) because it is the symbol that means
that errors will propagate in traditional mode, "blowing up" parent selection
sets.
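The table above can be read as a small translation rule from the extended syntax to what each mode sees. The sketch below encodes that rule; `renderTypeForMode` is a hypothetical helper written for this post, not part of any GraphQL library, and it assumes its input is a type reference written under `@extendedNullability`.

```typescript
type ExecutionMode = "traditional" | "no-error-propagation";

// Rewrites a type written with the extended `!` / `!!` markers into the
// nullability that a client in the given mode would see.
function renderTypeForMode(extendedType: string, mode: ExecutionMode): string {
  return extendedType.replace(/!{1,2}/g, (marker) => {
    // `!!` is strict non-null: non-nullable in both modes.
    if (marker === "!!") return "!";
    // `!` is semantic non-null: non-nullable only when error propagation is off.
    return mode === "no-error-propagation" ? "!" : "";
  });
}

const semanticInTraditional = renderTypeForMode("Int!", "traditional");
const semanticInNoPropagation = renderTypeForMode("Int!", "no-error-propagation");
const strictInTraditional = renderTypeForMode("Int!!", "traditional");
```

Here `semanticInTraditional` is `Int`, `semanticInNoPropagation` is `Int!`, and `strictInTraditional` is `Int!`, matching the rows of the table. Because the replacement is applied to every marker, the same rule also covers list types such as `[Int!]!!`.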

## Get involved

Like all GraphQL Working Groups, the Nullability Working Group is open to all.
Whether you work on a GraphQL client or are just a GraphQL user with thoughts on
nullability, we want to hear from you - add yourself to an
[upcoming working group](https://github.com/graphql/nullability-wg/) or chat
with us in the #nullability-wg channel in
[the GraphQL Discord](https://discord.graphql.org). This solution is not yet
merged into the specification, so there's still time for iteration and
alternative ideas!