p4682: Array forwards to the prelude
We propose to add `Core.Array(T, N)` as a library type in the `prelude`
library of the `Core` package. Since arrays are a very frequent type,
we propose to privilege use of this type by providing a builtin
`Array(T, N)` type in the global scope that resolves to the
`Core.Array(T, N)` type. Users can model this as an implicit import of
the `Core.Array(T, N)` type into the global scope, much like the
implicit import of the `prelude` library of the `Core` package.
danakj committed Jan 8, 2025
1 parent f2c8c1f commit 7729072
Showing 1 changed file with 331 additions and 26 deletions.
357 changes: 331 additions & 26 deletions proposals/p4682.md
@@ -15,56 +15,361 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Abstract](#abstract)
- [Problem](#problem)
- [Background](#background)
- [Rust](#rust)
- [Swift](#swift)
- [Safe C++](#safe-c)
- [Goals](#goals)
- [Privileging the most common type names](#privileging-the-most-common-type-names)
- [Absence of syntax should make clear defaults](#absence-of-syntax-should-make-clear-defaults)
- [Avoiding confusion with other languages](#avoiding-confusion-with-other-languages)
- [Proposal](#proposal)
- [Details](#details)
- [Rationale](#rationale)
- [Alternatives considered](#alternatives-considered)

<!-- tocstop -->

## Abstract

We propose to add `Core.Array(T, N)` as a library type in the `prelude` library
of the `Core` package. Since arrays are a very frequent type, we propose to
privilege use of this type by providing a builtin `Array(T, N)` type in the
global scope that resolves to the `Core.Array(T, N)` type. Users can model this
as an implicit import of the `Core.Array(T, N)` type into the global scope, much
like the implicit import of the `prelude` library of the `Core` package.

## Problem

Carbon's current syntax for a fixed-size, direct storage array (hereafter called
"array") is the provisional `[T; N]` and there is no syntax yet for a
dynamically-sized indirect storage buffer (hereafter called "heap-buffer").

Arrays and heap-buffers are some of the most commonly used types, after
primitive types. The syntax, whatever it is, will be incredibly frequent in
Carbon source code.

## Background

We have developed a matrix for enumerating and describing the vocabulary of
owning array and buffer types. Direct refers to an in-place storage buffer, as
with arrays. Indirect refers to heap allocation, where the type itself holds
storage of a pointer to the buffer, as with heap-buffers.

To provide familiarity, here is the table for the C++ language as a baseline:

| Owning type | Runtime Sized | Compile-time Sized |
| ------------------------ | ---------------------- | ----------------------------------- |
| Direct, Immutable Size | - | `std::array<T, N>` / `T[N]` |
| Indirect, Immutable Size | `std::unique_ptr<T[]>` | `std::unique_ptr<std::array<T, N>>` |
| Indirect, Mutable Size | `std::vector<T>` | - |
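
As a rough illustration of the C++ baseline, the sketch below instantiates each
populated cell of the table. The names `DirectArray`, `MakeRuntimeSized`,
`MakeBoxedArray`, `MakeGrowable`, and `Example` are invented for this example
and are not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Direct, immutable size, compile-time sized: elements live inside the object.
using DirectArray = std::array<int, 4>;
static_assert(sizeof(DirectArray) >= 4 * sizeof(int),
              "the elements are stored in place");

// Indirect, immutable size, runtime sized: the object holds only a pointer.
// std::unique_ptr<T[]> does not record the length; the caller must track it.
inline std::unique_ptr<int[]> MakeRuntimeSized(std::size_t n) {
  return std::unique_ptr<int[]>(new int[n]());  // zero-initialized
}

// Indirect, immutable size, compile-time sized.
inline std::unique_ptr<DirectArray> MakeBoxedArray() {
  return std::unique_ptr<DirectArray>(new DirectArray());  // zero-initialized
}

// Indirect, mutable size: the only row whose element count can change.
inline std::vector<int> MakeGrowable(std::size_t n) {
  std::vector<int> v(n);  // n zeros
  v.push_back(7);         // grows to n + 1 elements
  return v;
}

// Exercises each helper once.
inline void Example() {
  assert(MakeRuntimeSized(5)[4] == 0);
  assert(MakeBoxedArray()->size() == 4);
  assert(MakeGrowable(3).size() == 4);
}
```

Only the first form keeps its elements inside the object itself, which is the
property this proposal refers to as "direct storage".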

### Rust

The Rust vocabulary is as follows:

| Owning type | Runtime Sized | Compile-time Sized |
| ------------------------ | ------------- | ------------------ |
| Direct, Immutable Size | - | `[T; N]` |
| Indirect, Immutable Size | `Box<[T]>` | `Box<[T; N]>` |
| Indirect, Mutable Size | `Vec<T>` | - |

There are a few things of note when comparing to C++:

- The Rust `Box` and `Vec` types are part of `std` but are imported into the
current scope automatically, so they do not need any prefix.
- The `[T]` type represents a fixed-runtime-size buffer. The type itself is
not instantiable since its size is not known at compile time. A `Box<[T]>`
stores the runtime size alongside its pointer.
- The array type syntax matches the Carbon provisional syntax.
- The heap-buffer type name matches the C++ `vector` type, but it is
privileged with a shorter name. The `Vec` type name is at most the same
length as an array type name (for the same `T`).

### Swift

The Swift vocabulary is significantly smaller, to support automatic refcounting:

| Owning type | Runtime Sized | Compile-time Sized |
| ------------------------ | ------------------ | ------------------ |
| Direct, Immutable Size | - | - |
| Indirect, Immutable Size | - | - |
| Indirect, Mutable Size | `Array<T>` / `[T]` | - |

Because there is no direct storage option, only one name is needed, and "Array"
is used to refer to a heap-buffer.

### Safe C++

The [Safe C++ proposal](https://safecpp.org/draft.html#tuples-arrays-and-slices)
introduces array syntax very similar to Rust:

| Owning type | Runtime Sized | Compile-time Sized |
| ------------------------ | --------------------- | ------------------- |
| Direct, Immutable Size | - | `[T; N]` |
| Indirect, Immutable Size | `std2::box<[T; dyn]>` | `std2::box<[T; N]>` |
| Indirect, Mutable Size | `std2::vector<T>` | - |

There are a few things of note:

- While Rust omits a size to indicate the size is known only at runtime, Safe
C++ uses a `dyn` keyword to indicate the same.
- The heap-buffer type name is unchanged from C++, sticking with `vector`.

### Goals

It will help to establish some goals against which to weigh alternatives.
These goals are based on the
[open discussion from 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?usp=sharing&resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA#heading=h.h0tg34pzq5yz),
where we discussed the
[Pointers, Arrays, Slices](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?usp=sharing)
document.

The goals here are largely informed by and trying to achieve the top level goal
of
["Code that is easy to read, understand, and write"](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
We define some more specific targets here as they relate to the specifics of
the array syntax.

#### Privileging the most common type names

- "Explicitness must be balanced against conciseness, as verbosity and
ceremony add cognitive overhead for the reader, while explicitness reduces
the amount of outside context the reader must have or assume."

The more common it will be for a type to be used, the shorter we would like the
name to be. This follows from the presumption that we weigh conciseness as
increasingly valuable for types that will appear more frequently in Carbon code.

We expect the ordering of frequency in Carbon code to be:

- primitives ≈ tuples >> heap-buffers > arrays >> everything else[^1].

Where primitives are: machine-sized integers (8 bit, 16 bit, etc.),
machine-sized floating points, and pointers including slices[^2]. Function
parameters/arguments are an example of tuples.

From this, we derive that we want:

- Primitives and tuples to have the most concise names.
    - We can lean on special syntax/keywords as needed to make them concise
      but descriptive.
- Heap-buffers to have a concise name, even more so than arrays.
    - We could use special syntax if needed to achieve conciseness.
- Arrays to have a concise name, but they do not need to be comparably concise
  to primitives and tuples.
    - We should try to avoid special syntax.
- Everything else should be written as idiomatic types with descriptive names.

[^1]:
"[chandlerc] Prioritize: slices first, then relocatable, then compile-time
sized, then everything else is vastly less common. Between those three, the
difference in frequency between the first two is the biggest." from
[open discussion on 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA&tab=t.0)

[^2]:
Slices are included with primitives for simplicity, since they will take the
place of many pointers in C++, giving them similar frequency to pointers,
and can be logically thought of as a bounded pointer.

#### Absence of syntax should make clear defaults

One way to write arrays and compile-time-sized slices is like we see in Rust:
`[T; N]` and `&[T; N]`. This suggests that an array is a kind of slice, and the
default form of one. But they are very different types, rather than
modifications of a single type, and this can be confusing[^3] for developers
learning the language.

[^3]: https://fire.asta.lgbt/notes/a1iay7r3e7or0a59 (content-warning: swearing)

We want to avoid the situation where
[absence of syntax](https://www.youtube.com/watch?v=-Hb-9TUyjoo), such as a
missing pointer indicator, changes the entire meaning of the remaining syntax or
is otherwise confusing.

#### Avoiding confusion with other languages

The most general meaning of "array" is a range of consecutive values in memory.

However, in many languages it is used, either formally or informally, to refer
to a direct-storage, immutably-sized memory range:

- C,
[colloquial](https://en.wikibooks.org/wiki/C_Programming/Arrays_and_strings)
- C++, colloquial (from C) and
[`std::array`](https://en.cppreference.com/w/cpp/container/array)
- Go, [colloquial](https://go.dev/tour/moretypes/6)
- Rust, [colloquial](https://doc.rust-lang.org/std/primitive.array.html)[^4]

In particular, this is the usage in the languages with which Carbon will most
frequently interoperate, and/or from which code will be migrated to Carbon and
thus comments and variable names would use these terms in this way.

[^4]:
Maybe this is more formal than colloquial, but the name is not part of the
typename/syntax.

Languages which require shared ownership _don't have direct-storage arrays_, so
the same term gets used for indirect storage:

- Swift, [`Array`](https://developer.apple.com/documentation/swift/array)
- JavaScript,
[`Array`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array)
- Java and Kotlin,
[`ArrayList`](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html)

And some languages use array to refer to both direct and indirect storage types.

- Dlang has direct-storage arrays
[colloquially](https://dlang.org/spec/arrays.html) and the indirect-storage
[`Array`](https://dlang.org/phobos/std_container_array.html) type.
- Pascal uses the presence or absence of a size to determine if
[`Array`](https://www.freepascal.org/docs-html/ref/refsu14.html) uses direct
(immutably-sized) or indirect (mutably-sized) storage.

TODO: Any other languages we'd like to include here?

In sum, languages which have direct-storage immutably-sized arrays use the term
"array" to refer to those, and most then use a separate name for the
indirect-storage type.

## Proposal

The
[All APIs are library APIs principle](/docs/project/principles/library_apis_only.md)
states:

> In Carbon, every public function is declared in some Carbon API file.

As such, we propose a `Core` library type for a direct-storage immutably-sized
array, and then a shorthand for referring to that library type.

In line with other languages surveyed above, given the presence of a
direct-storage immutably-sized array in Carbon, we will reserve the unqualified
name "array" for this type. In full, its name is `Core.Array(T, N)`.

## Details

Because arrays will be very common in Carbon code, we want to privilege their
usage. We stated above that we want "arrays to have a concise name, but they do
not need to be comparably concise to primitives and tuples". As such, we will
allow the use of `Array` without naming `Core` through the use of a builtin
shortcut that simply forwards `Array(T, N)` to `Core.Array(T, N)`. Notably this
leaves room for supporting multi-dimensional arrays by adding further optional
size parameters, either in the `Array` type or in a similar sibling type.

Since the `Core.Array` type has a builtin shorthand, and the type it resolves to
should always be available, `Core.Array` will be placed in the `prelude` library
of `Core`. The `Core.Array` library type will need to interact with the compiler
through builtins in order to define its direct storage, but its methods can
largely be written directly in the `Core` package.
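
The forwarding can be pictured with a C++ analogy. The sketch below is not
Carbon and not the proposed implementation: the `Core` namespace, the `Array`
struct, and its `Size` method are hypothetical stand-ins, and a real
`Core.Array` would obtain its storage through compiler builtins. It only shows
that the unqualified name denotes the very same type as the qualified one.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the `prelude` library of the `Core` package.
namespace Core {
template <typename T, std::size_t N>
struct Array {
  T elements[N];  // direct, immutably-sized storage
  constexpr std::size_t Size() const { return N; }
};
}  // namespace Core

// Models the builtin shorthand: the unqualified `Array` simply forwards to
// `Core::Array`, much like an implicit import into the global scope.
template <typename T, std::size_t N>
using Array = Core::Array<T, N>;

// Both spellings name one type, not two distinct or convertible types.
static_assert(std::is_same<Array<int, 3>, Core::Array<int, 3>>::value,
              "the shorthand and the library type are identical");
```

Because the alias introduces no new type, code written against `Core::Array`
and code written against `Array` interoperate with no conversion, which is the
property the proposal wants from the builtin shorthand.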

## Rationale

As this proposal is addressing the question of introducing a new builtin and
library type, it is mostly focused on the goal
[Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).

This proposal aims to make code easy to understand by having the builtin
forwarding name match exactly with the type in the `Core` library. This choice
makes the builtin conceptually equivalent to an implicit `import` of the type
into the global scope, much like the implicit `import` of the `prelude` library
of the `Core` package, something developers will already need to model in their
minds when reading Carbon code. That `Array` will be provided as a compiler
builtin is an implementation detail that the Carbon developer need not worry
about. It could as easily be implemented as an implicit `import`.

By having the builtin shorthand use the same name as the library type, we make
the builtin the least magical it could possibly be, while still maintaining its
primary benefit of conciseness.

We introduced some more specific sub-goals above:

1. Privileging the most common type names

This proposal privileges `Array` in line with the frequency it will appear in
code: The written name of arrays is shortened to avoid writing `Core.`, because
of the expected high frequency of arrays in Carbon code. But it avoids
introducing additional syntax (such as with `[T; N]` or `(1, 2)`) or breaking
naming rules (such as with a lowercase type name) because the frequency of use
of arrays will be much lower than that of primitives and tuples.

1. Absence of syntax should make clear defaults

We introduce a type name rather than making arrays look like slices without the
pointer. This avoids the confusion that arises when removing syntax changes the
meaning significantly, especially in ways that go beyond choosing
defaults/options for a single language concept.

1. Avoiding confusion with other languages

We propose using the `Array` type name in line with how other languages use the
same term. When a direct-storage array type is part of the language, it's
consistently referred to as an "array" without qualifications.

Most importantly, the name is consistent with the meaning in C++ and its
standard library (`std::array<T, N>`) as well as with Rust, the languages which
we expect Carbon code to interact with the most.

## Alternatives considered

1. `[T; N]`

This is the current syntax used by the toolchain, however it had the following
problems raised:

- It's very similar to the syntax for slices, which is `[T]`, but very
different in nature, being storage instead of a reference to storage.
- Given `[T]` is a slice, `[T; N]` would better suit a compile-time-sized
slice.

The syntax for a slice may also be changed; we discussed
[adding a pointer annotation](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?tab=t.0#heading=h.fahgww8db6f0)
to it, such as `[T]*` and `[T; N]*`. Some downsides remained:

- The `[T; N]*` syntax would be a fixed-size slice, rather than a pointer to
an array. This leaves no room for writing a pointer to an array, which can
indicate a different intent, that it always includes the full memory range
of the array.
- Removing the pointer annotation would change the meaning of the type
expression more than we'd like, since it would change from a slice into an
array, rather than pointer-to-an-array into an array.
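
To make the distinction between a pointer to an array and a fixed-size slice
concrete, here is a small C++ sketch. `FixedSlice` and `Example` are
hypothetical names invented for this illustration; they are neither Carbon
syntax nor a standard library type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size slice: a non-owning view of exactly N consecutive
// elements, which may be any N-element window of a larger buffer.
template <typename T, std::size_t N>
struct FixedSlice {
  T* first;
  T& operator[](std::size_t i) const { return first[i]; }
};

inline void Example() {
  std::array<int, 4> storage = {10, 20, 30, 40};

  // Pointer to an array: always refers to the whole array object.
  std::array<int, 4>* whole = &storage;
  assert((*whole)[0] == 10 && whole->size() == 4);

  // Fixed-size slice of 2 elements: a window into part of the storage.
  FixedSlice<int, 2> window{storage.data() + 1};
  assert(window[0] == 20 && window[1] == 30);

  // Writes through the slice are visible in the underlying array.
  window[0] = 21;
  assert(storage[1] == 21);
}
```

The pointer's type guarantees that it covers the full memory range of the
array, while the slice's type only guarantees a length. That difference in
intent is what a single `[T; N]*` spelling could not express.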

1. `array [T; N]`

This introduces a keyword as a modifier of a fixed-size slice, rather than a
builtin forwarding type. While arrays will be very common, it's not clear that
they rise to the level of requiring breaking the language's naming rules (using a
lowercase name) in order to provide a shorthand. And the shorthand is longer in
the end than the `Array(T, N)` being proposed here. So this uses a larger
weirdness budget for privileging the type while achieving less conciseness.

This has an issue similar to that of `[T; N]`, but in reverse. Removing the
`array` modifier keyword changes the meaning of the type expression in ways that
are larger than a default/modifier relationship. Fixed-size slices are not the
more-default array.

The use of a lowercase keyword also costs us by preventing users from using the
word `array` in variables, a name which is quite common.

1. `Core.Array(T, N)`

Providing just the library type is possible, but arrays will be one of the most
common types in Carbon code, as described earlier. Privileging them with a
shorthand that avoids `Core.` will help make Carbon code significantly more
concise, due to the frequency, without hurting understandability. This makes it
worth the tradeoff of putting a name into the global scope (by way of a builtin
type).

1. `array(T, N)`

This is very similar to the current proposal, just using a lowercase name for
the type name. This would break the language rules without making the result any
more concise for developers. It could highlight that it is a builtin type, but
we argued earlier that this is an implementation detail. Since developers have
to model the implicit import of `Core`'s `prelude` library already, modelling
the implicit import of `Core.Array` as `Array` will be at least as
straightforward. Using a name consistent with the language naming rules (with a
leading capital letter) is preferable in the absence of any strong benefit to
breaking the rules, which we don't see here. And since the frequency of use will
be lower than that of primitives and tuples, the amount of rule-breaking budget
for privileging the type is lower.
