graphql · benjie · Jul 26, 2024 · Aug 15, 2024 · Aug 22, 2024
diff --git a/.prettierignore b/.prettierignore
@@ -2,4 +2,5 @@ public/
 pnpm-lock.yaml
 *.mdx
 !src/pages/blog/2024-04-11-announcing-new-graphql-website/index.mdx
+!src/pages/blog/2024-08-14-exploring-true-nullability.mdx
 *.jpg
diff --git a/src/pages/blog/2024-08-14-exploring-true-nullability.mdx b/src/pages/blog/2024-08-14-exploring-true-nullability.mdx
@@ -0,0 +1,284 @@
+---
+title: "Exploring 'True' Nullability in GraphQL"
+tags: ["spec"]
+date: 2024-08-14
+byline: Benjie Gillam
+---
+
+One of GraphQL's early decisions was to allow "partial success"; this was a
+critical feature for Facebook - if one part of their backend infrastructure
+became degraded, they wouldn't want to just render an error page; instead, they
+wanted to serve the user a page with as much working data as they could.
+
+## Null propagation
+
+To accomplish this, if an error occurred within a resolver, the resolver's value
+would be replaced with a `null`, and an error would be added to the `errors`
+array in the response. However, what if that field was marked as non-null? To
+solve that apparent contradiction, GraphQL introduced the "error propagation"
+behavior (also known colloquially as "null bubbling") - when a `null` (from an
+error or otherwise) occurs in a non-nullable position, the parent position
+(either a field or a list item) is made `null` instead. This behavior would
+repeat if the parent position was also non-nullable, and this could cascade (or
+"bubble") all the way up to the root of the query if everything in the path is
+non-nullable.
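+
+As a minimal sketch of this rule, the TypeScript below models the path from the
+root to an erroring position as a list of keys. Each key is flagged with whether
+the schema marks that position non-null. The code walks upwards to find the
+position that actually ends up `null`. `Position` and `propagateNull` are
+hypothetical names used only for illustration, and the model ignores everything
+except nullability.
+
+```typescript
+// Hypothetical, simplified model of one position in a response: a field name
+// (or list index) plus whether the schema marks that position as non-null.
+interface Position {
+  key: string | number;
+  nonNull: boolean;
+}
+
+// Given the path from the root to the position where a `null` occurred,
+// returns the path of the position that ends up `null` after error
+// propagation, or `null` if every position is non-nullable, in which case
+// the whole of `data` becomes `null`.
+function propagateNull(path: Position[]): (string | number)[] | null {
+  // Walk upwards from the errored position to the nearest nullable one.
+  for (let i = path.length - 1; i >= 0; i--) {
+    if (!path[i].nonNull) {
+      return path.slice(0, i + 1).map((position) => position.key);
+    }
+  }
+  return null;
+}
+
+// `{ viewer { friends { name } } }` with `viewer: User`,
+// `friends: [User!]!` and `name: String!`, where the first friend's name
+// errors.
+const examplePath: Position[] = [
+  { key: "viewer", nonNull: false },
+  { key: "friends", nonNull: true },
+  { key: 0, nonNull: true },
+  { key: "name", nonNull: true },
+];
+
+propagateNull(examplePath); // returns ["viewer"]
+```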
+
+This solved the issue, and meant that GraphQL's nullability promises were still
+honoured; but it wasn't without complications.
+
+### Complication 1: partial success
+
+We want to be resilient to systems failing; but errors that occur in
+non-nullable positions cascade to surrounding parts of the query, making less
+and less data available to be rendered. This seems contrary to our "partial
+success" aim, but it's easy to solve - we just make sure that the positions
+where we expect errors to occur are nullable so that errors don't propagate
+further. Clients then need to handle any nulls that occur in these positions,
+but that seems like a fair trade.
+
+### Complication 2: nullable epidemic
+
+Almost any field in your GraphQL schema could raise an error - errors aren't
+only caused by backend services becoming unavailable or responding in
+unexpected ways; they can also be caused by simple programming errors in your
+business logic, data consistency errors (e.g. expecting a boolean but receiving
+a float), or any other cause.
+
+Since we don't want to "blow up" the entire response if any such issue occurs,
+we've come to strongly encourage nullable types throughout a schema, only
+adding the non-nullable `!` marker to positions where we're truly sure that the
+field is extremely unlikely to error. As a result, developers consuming the
+GraphQL API have to handle potential nulls in more positions than they would
+expect, which makes for additional work.
+
+### Complication 3: normalized caching
+
+Many modern GraphQL clients use a "normalized" cache, such that updates pulled
+down from the API in one query can automatically update all the previously
+rendered data across the application. This helps ensure consistency for users,
+and is a powerful feature.
+
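+However, if an error occurs in a non-nullable position, it's
+[no longer safe](https://github.com/graphql/nullability-wg/issues/20) to store
+the data in the normalized cache: error propagation has replaced a parent
+position with `null`, and writing that `null` into the cache could overwrite
+valid data that other parts of the application are still rendering.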
+
+## The Nullability Working Group
+
+At first, we thought the solution to this was to give clients control over the
+nullability of a response, so we set up the Client-Controlled Nullability (CCN)
+Working Group. Later, we renamed the working group to the Nullability WG to show
+that it encompassed all potential solutions to this problem.
+
+### Client-controlled nullability
+
+The first Nullability WG proposal came from a collaboration between Yelp and
+Netflix, with contributions from GraphQL WG regulars Alex Reilly, Mark Larah,
+and Stephen Spalding among others. They proposed we could adorn the queries we
+issue to the server with sigils indicating our desired nullability overrides for
+the given fields - client-controlled nullability.
+
+We would add a `?` to fields where we don't mind if they're null but definitely
+want errors to stop there, and a `!` to fields where we definitely don't want a
+null to occur (whether or not there is an error). This would give consumers
+control over where errors/nulls were handled.
+
+However, after years of exploring the topic, we found numerous issues, and each
+attempt to address them traded one set of concerns for another. We kept
+iterating whilst we looked for a solution to these tradeoffs.
+
+### True nullability schema
+
+Jordan Eldredge
+[proposed](https://github.com/graphql/nullability-wg/discussions/22) that making
+fields nullable to handle error propagation was hiding the "true" nullability of
+the data. Instead, he suggested, we should have the schema represent the true
+nullability, and put the responsibility on clients to use the `?` CCN operator
+to handle errors in the relevant places.
+
+However, this would mean that clients such as Relay would want to add `?` in
+every position, causing an "explosion" of question marks, because really what
+Relay desired was to disable null propagation entirely.
+
+### A new type
+
+Getting the relevant experts together at GraphQLConf 2023 re-energized the
+discussions and sparked new ideas. After seeing Stephen's "Nullability Sandwich"
+talk and chatting with Jordan, Stephen and others in the corridor, Benjie Gillam
+was inspired to [propose](https://github.com/graphql/graphql-spec/pull/1046) a
+"null only on error" type. This type would allow us to express the "true"
+nullability of a field whilst also indicating that errors may occur there;
+such errors should be handled, but would not "blow up" the response.
+
+To maintain backwards compatibility, clients would need to opt in to seeing this
+new type (otherwise it would masquerade as nullable). It would be up to the
+client how to handle the nullability of this position knowing that a "null only
+on error" position would only contain a `null` if a matching error existed in
+the `errors` list.
+
+A
+[number of alternative syntaxes](https://gist.github.com/benjie/19d784721d1658b89fd8954e7ee07034)
+were suggested for this new type, but none were well liked.
+
+### A new approach to client error handling
+
+Also around the time of GraphQLConf 2023 the Relay team shared
+[a presentation](https://docs.google.com/presentation/u/2/d/1rfWeBcyJkiNqyxPxUIKxgbExmfdjA70t/edit?pli=1#slide=id.p8)
+on some of their ideas around errors. In particular, they discussed the
+`@catch` directive, which would give users control over how errors were
+represented in the data being rendered, allowing the client to differentiate
+an error from a legitimate null. Over the coming months, many behaviors were
+discussed at the Nullability WG; one particularly compelling idea was that
+clients could throw the error when an errored field was read, and rely
+on framework mechanics (such as React's
+[error boundaries](https://legacy.reactjs.org/docs/error-boundaries.html)) to
+handle them.
+
+### Strict semantic nullability
+
+GraphQL Foundation director Lee Byron
+[proposed](https://github.com/graphql/graphql-wg/discussions/1410) that we
+introduce a schema directive, `@strictNullability`, whereby we would change what
+the syntax meant - `Int?` for nullable, `Int` for null-only-on-error, and `Int!`
+for never-null. This proposal was well liked, but wasn't a clear win; it
+introduced many complexities including migration costs and concerns over schema
+evolution.
+
+### A pivotal discussion
+
+Lee and Benjie had a call where they discussed the history of GraphQL
+nullability and all the relevant proposals in depth, including their two
+respective solutions. It was clear that, although no solution was quite there,
+the way the solutions were converging hinted that we were getting closer to an
+answer. This long, detailed and highly technical discussion inspired
+[a new proposal](https://github.com/graphql/nullability-wg/discussions/58),
+which has since been iterated on further, and which we describe below.
+
+## Our latest proposal
+
+We're now proposing a new opt-in execution mode to solve the nullability
+problem. It's important to note that both the client and the server must opt in
+to this new mode for it to take effect; otherwise, the traditional execution
+mode will be used.
+
+### No-error-propagation mode
+
+The new proposal centers around the premise of allowing clients to disable the
+"error propagation" behavior discussed above.
+
+Clients that opt in to this behavior take responsibility for interpreting the
+response as a whole, correlating the `data` and `errors` properties of the
+response. With error propagation disabled, and given that any field could
+potentially throw an error, every position in `data` can potentially contain a
+`null` value. Clients in this mode must cross-check any `null` value against
+`errors` to determine whether it represents a true `null` or an error.
+
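+A minimal sketch of that cross-check in TypeScript follows. It assumes
+simplified, hypothetical `GraphQLResponse` and `GraphQLError` types (real
+clients have their own). It also relies on the fact that, with propagation
+disabled, an error's `path` points at the very position that was set to `null`.
+`readPosition` is an illustrative helper, not part of any client's API.
+
+```typescript
+// Simplified, hypothetical response shapes; real clients have richer types.
+type PathSegment = string | number;
+
+interface GraphQLError {
+  message: string;
+  path?: PathSegment[];
+}
+
+interface GraphQLResponse {
+  data?: unknown;
+  errors?: GraphQLError[];
+}
+
+// What a position in `data` turned out to contain.
+type Outcome =
+  | { kind: "value"; value: unknown }
+  | { kind: "null" }
+  | { kind: "error"; error: GraphQLError };
+
+function samePath(
+  a: readonly PathSegment[],
+  b: readonly PathSegment[],
+): boolean {
+  return a.length === b.length && a.every((segment, i) => segment === b[i]);
+}
+
+function readPosition(
+  response: GraphQLResponse,
+  path: readonly PathSegment[],
+): Outcome {
+  let current: unknown = response.data;
+  let depth = 0;
+  // Walk down `data`. If an ancestor is null (or missing), stop early and
+  // classify that ancestor instead.
+  while (current !== null && current !== undefined && depth < path.length) {
+    current = (current as Record<PathSegment, unknown>)[path[depth]];
+    depth++;
+  }
+  if (current !== null && current !== undefined) {
+    return { kind: "value", value: current };
+  }
+  // Cross-check the `null` against `errors`: only an error whose path matches
+  // this exact position means "error"; otherwise it is a true `null`.
+  const nullPath = path.slice(0, depth);
+  const error = response.errors?.find(
+    (e) => e.path !== undefined && samePath(e.path, nullPath),
+  );
+  return error ? { kind: "error", error } : { kind: "null" };
+}
+
+const response: GraphQLResponse = {
+  data: { viewer: { name: null, bio: null } },
+  errors: [{ message: "Name service unavailable", path: ["viewer", "name"] }],
+};
+
+readPosition(response, ["viewer", "name"]).kind; // "error"
+readPosition(response, ["viewer", "bio"]).kind; // "null"
+```
+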
+### "Smart" clients
+
+The no-error-propagation mode is intended for use by "smart" clients such as
+Relay, Apollo Client, URQL and others which understand GraphQL deeply and are
+responsible for the storage and retrieval of fetched GraphQL data. These clients
+are well positioned to handle the responsibilities outlined above.
+
+By disabling error propagation, these clients will be able to safely update
+their stores (including normalized stores) even when errors occur. They can also
+re-implement traditional GraphQL error propagation on top of these new
+foundations, shielding application developers from needing to learn this new
+behavior (whilst still allowing them to reap the benefits!). They can even take
+on advanced behaviors, such as throwing the error when the application developer
+attempts to read from an errored field, allowing the developer to handle errors
+with their system's native error boundaries.
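+
+The "throw on read" behavior could be sketched with a JavaScript `Proxy`, as
+below. This is an illustration under assumptions, not how Relay, Apollo Client
+or URQL implement it. `throwOnErrorProxy` is a hypothetical helper. Errors are
+assumed to carry a `path` as in the GraphQL response format. Catching the
+thrown error is left to something like a React error boundary.
+
+```typescript
+// Simplified, hypothetical error shape; real clients have richer types.
+type PathSegment = string | number;
+
+interface GraphQLError {
+  message: string;
+  path?: PathSegment[];
+}
+
+function samePath(
+  a: readonly PathSegment[],
+  b: readonly PathSegment[],
+): boolean {
+  return a.length === b.length && a.every((segment, i) => segment === b[i]);
+}
+
+// Hypothetical helper: wraps `value` so that reading a `null` with a matching
+// error throws that error, while true `null`s are returned unchanged.
+function throwOnErrorProxy<T extends object>(
+  value: T,
+  errors: readonly GraphQLError[],
+  path: readonly PathSegment[] = [],
+): T {
+  return new Proxy(value, {
+    get(target, prop, receiver) {
+      const child: unknown = Reflect.get(target, prop, receiver);
+      if (typeof prop === "symbol") return child;
+      // Array indices arrive as strings, but error paths use numbers.
+      const segment: PathSegment =
+        Array.isArray(target) && /^\d+$/.test(prop) ? Number(prop) : prop;
+      const childPath = [...path, segment];
+      if (child === null) {
+        const error = errors.find(
+          (e) => e.path !== undefined && samePath(e.path, childPath),
+        );
+        if (error) throw new Error(error.message);
+        return null; // no matching error: a true null
+      }
+      // Wrap nested objects and lists so that deeper reads are checked too.
+      if (typeof child === "object" && child !== null) {
+        return throwOnErrorProxy(child, errors, childPath);
+      }
+      return child;
+    },
+  });
+}
+
+const data = throwOnErrorProxy(
+  { viewer: { name: null, bio: null } },
+  [{ message: "Name service unavailable", path: ["viewer", "name"] }],
+);
+
+data.viewer.bio; // null: no matching error, so this is a true null
+// Reading `data.viewer.name` here would throw "Name service unavailable".
+```
+
+Because nested objects are wrapped only as they are read, an error is thrown
+only when application code actually touches the errored field.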
+
+### True nullability
+
+Just like in traditional mode, for clients operating in no-error-propagation
+mode, fields are either nullable or non-nullable. However, unlike in
+traditional mode, no-error-propagation mode allows for errors to be represented
+in any position:
+
+- nullable (e.g. `Int`): a value, an error, or a true `null`;
+- non-nullable (e.g. `Int!`): a value, **or an error**.
+
+_(In traditional mode, non-nullable fields cannot represent an error because the
+error propagates to the nearest nullable position.)_
+
+Since this mode allows every field, whether nullable or non-nullable, to
+represent an error, the schema can safely indicate to clients in this mode the
+true intended nullability of a field. If the schema designer knows that a field
+should never be null unless an error occurs, they can mark the field as
+"non-nullable for clients in no-error-propagation mode" (see "schema developers"
+below).
+
+### Client reflection of true nullability
+
+Smart clients can ask the schema about the "true" nullability of each field via
+introspection, and can generate a derived SDL by combining that information with
+their knowledge of how the client handles errors. This derived SDL, dependent on
+client behavior, would look like the traditional representation of the schema,
+but with more fields potentially marked as non-nullable where the true
+nullability of the underlying schema has been reflected. Application developers
+would issue queries and mutations in the same way they always had, but now their
+generated types may not need to handle `null` in as many positions as before,
+increasing developer happiness.
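+
+As a rough sketch of that derivation, the TypeScript below assumes a
+hypothetical three-way classification of each field's "true" nullability, as a
+smart client might learn it from introspection. It also assumes two
+hypothetical error-handling strategies: throwing on read, or re-implementing
+traditional error propagation. Under these assumptions, only the former lets a
+"null only on error" field appear as non-nullable in the derived SDL.
+
+```typescript
+// Hypothetical classification of a field's "true" nullability, as a smart
+// client might learn it from introspection.
+type TrueNullability = "nullable" | "semanticNonNull" | "strictNonNull";
+
+// Hypothetical strategies a client might use to surface errors to app code.
+type ErrorHandling = "throwOnRead" | "traditionalPropagation";
+
+// Should the client-derived SDL mark this field as non-nullable?
+function derivedNonNull(
+  field: TrueNullability,
+  handling: ErrorHandling,
+): boolean {
+  // Application code never sees `null` here: an error either throws on read
+  // or propagates to a nullable parent.
+  if (field === "strictNonNull") return true;
+  // Null only on error: if errors throw on read, application code can never
+  // observe that null; if the client re-implements traditional propagation,
+  // the field masquerades as nullable and the error's null stops there.
+  if (field === "semanticNonNull") return handling === "throwOnRead";
+  // Genuinely nullable fields stay nullable.
+  return false;
+}
+
+// Renders the field's type for the client-derived SDL.
+function derivedTypeRef(
+  namedType: string,
+  field: TrueNullability,
+  handling: ErrorHandling,
+): string {
+  return derivedNonNull(field, handling) ? `${namedType}!` : namedType;
+}
+
+derivedTypeRef("Int", "semanticNonNull", "throwOnRead"); // "Int!"
+derivedTypeRef("Int", "semanticNonNull", "traditionalPropagation"); // "Int"
+```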
+
+### Schema developers
+
+Schemas that wish to add support for indicating the "true nullability" of a
+field in no-error-propagation mode need to be able to discern which types show
+up as non-nullable in both modes (traditional non-null types), and which types
+show up as non-nullable only in no-error-propagation mode. For this latter
+concern, we've introduced the concept of a "semantic" non-null type:
+
+- "strict" (traditional) non-nullable - shows up as non-nullable in both
+  traditional mode and no-error-propagation mode
+- "semantic" non-nullable, aka "null only on error" - shows up as non-nullable
+  in no-error-propagation mode and masquerades as nullable in traditional mode
+
+Only clients that opt in to seeing the "true" nullability will see these two
+different types of nullability, otherwise the nullability of the chosen mode
+(traditional or no-error-propagation) will be reflected by introspection.
+
+### Representation in SDL
+
+Application developers will only need to deal with traditional SDL that
+represents traditional nullability concerns. If these developers are using
+"smart" clients, then they should source this SDL from the client rather than
+from the server; this allows them to see the nullability that the client
+guarantees based on how it will handle the "true" nullability of the schema,
+how it handles errors, and any local schema extensions that may have been
+added.
+
+Client-derived SDL (see "client reflection of true nullability" above) can be
+used for concerns such as code generation, which will work in the traditional
+way with no need for changes (but happier developers if there are fewer nullable
+positions!).
+
+Schema developers and people working on "smart" clients may need to represent
+the differences between "strict" and "semantic" non-nullable in SDL. For these
+people, we're introducing the `@extendedNullability` document directive. When
+this directive is present at the top of a document, the `!` symbol means that a
+type will appear as non-nullable only in no-error-propagation mode, and a new
+`!!` symbol will represent that a type will appear as non-nullable in both
+traditional and no-error-propagation mode.
+
+| Traditional mode | No-error-propagation mode | Example |
+| ---------------- | ------------------------- | ------- |
+| Nullable         | Nullable                  | `Int`   |
+| Nullable         | Non-nullable              | `Int!`  |
+| Non-nullable\*   | Non-nullable              | `Int!!` |
+
+The `!!` symbol is designed to look a little scary - it should be used with
+caution (like `!` in traditional schemas) because it is the symbol that means
+that errors will propagate in traditional mode, "blowing up" parent selection
+sets.
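+
+The table above can be read as a rewrite from the proposed
+`@extendedNullability` notation to the traditional notation seen in each mode.
+A minimal TypeScript sketch of that rewrite follows. `typeRefForMode` is a
+hypothetical helper rather than part of any GraphQL tooling. It assumes a
+well-formed type reference, and it also assumes that the same rule applies at
+each level of list nesting.
+
+```typescript
+type Mode = "traditional" | "noErrorPropagation";
+
+// Hypothetical helper: converts a type reference written under
+// `@extendedNullability` into the traditional notation seen in `mode`.
+function typeRefForMode(extendedTypeRef: string, mode: Mode): string {
+  // Match `!!` before `!` so each marker is handled as a single unit.
+  return extendedTypeRef.replace(/!!?/g, (marker) => {
+    // `!!` (strict non-null) is non-nullable in both modes.
+    if (marker === "!!") return "!";
+    // `!` (semantic non-null) is non-nullable only without error propagation
+    // and masquerades as nullable in traditional mode.
+    return mode === "noErrorPropagation" ? "!" : "";
+  });
+}
+
+typeRefForMode("[Int!]!!", "traditional"); // "[Int]!"
+typeRefForMode("[Int!]!!", "noErrorPropagation"); // "[Int!]!"
+```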
+
+## Get involved
+
+Like all GraphQL Working Groups, the Nullability Working Group is open to all.
+Whether you work on a GraphQL client or are just a GraphQL user with thoughts on
+nullability, we want to hear from you - add yourself to an
+[upcoming working group](https://github.com/graphql/nullability-wg/) or chat
+with us in the #nullability-wg channel in
+[the GraphQL Discord](https://discord.graphql.org). This solution is not yet
+merged into the specification, so there's still time for iteration and
+alternative ideas!