p4682: Array forwards to the prelude

We propose to add `Core.Array(T, N)` as a library type in the `prelude` library of the `Core` package. Since arrays are a very frequent type, we propose to privilege use of this type by providing a builtin `Array(T, N)` type in the global scope that resolves to the `Core.Array(T, N)` type. Users can model this as an implicit import of the `Core.Array(T, N)` type into the global scope, much like the implicit import of the `prelude` library of the `Core` package.
carbon-language · Jan 8, 2025 · 7729072
1 parent f2c8c1f
commit 7729072
Showing 1 changed file with 331 additions and 26 deletions.
diff --git a/proposals/p4682.md b/proposals/p4682.md
@@ -15,56 +15,361 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -   [Abstract](#abstract)
 -   [Problem](#problem)
 -   [Background](#background)
+    -   [Rust](#rust)
+    -   [Swift](#swift)
+    -   [Safe C++](#safe-c)
+    -   [Goals](#goals)
+        -   [Privileging the most common type names](#privileging-the-most-common-type-names)
+        -   [Absence of syntax should make clear defaults](#absence-of-syntax-should-make-clear-defaults)
+        -   [Avoiding confusion with other languages](#avoiding-confusion-with-other-languages)
 -   [Proposal](#proposal)
--   [Details](#details)
 -   [Rationale](#rationale)
 -   [Alternatives considered](#alternatives-considered)
 
 <!-- tocstop -->
 
 ## Abstract
 
-TODO: Describe, in a succinct paragraph, the gist of this document. This
-paragraph should be reproduced verbatim in the PR summary.
+We propose to add `Core.Array(T, N)` as a library type in the `prelude` library
+of the `Core` package. Since arrays are a very frequent type, we propose to
+privilege use of this type by providing a builtin `Array(T, N)` type in the
+global scope that resolves to the `Core.Array(T, N)` type. Users can model this
+as an implicit import of the `Core.Array(T, N)` type into the global scope, much
+like the implicit import of the `prelude` library of the `Core` package.
 
 ## Problem
 
+Carbon's current syntax for a fixed-size, direct storage array (hereafter called
+"array") is the provisional `[T; N]` and there is no syntax yet for a
+dynamically-sized indirect storage buffer (hereafter called "heap-buffer").
+
+Arrays and heap-buffers are some of the most commonly used types, after
+primitive types. The syntax, whatever it is, will be incredibly frequent in
+Carbon source code.
+
 TODO: What problem are you trying to solve? How important is that problem? Who
 is impacted by it?
 
 ## Background
 
-TODO: Is there any background that readers should consider to fully understand
-this problem and your approach to solving it?
+We have developed a matrix for enumerating and describing the vocabulary of
+owning array and buffer types. Direct refers to an in-place storage buffer, as
+with arrays. Indirect refers to heap allocation, where the type itself holds
+storage of a pointer to the buffer, as with heap-buffers.
+
+To provide familiarity, here is the table for the C++ language as a baseline:
+
+| Owning type              | Runtime Sized          | Compile-time Sized                  |
+| ------------------------ | ---------------------- | ----------------------------------- |
+| Direct, Immutable Size   | -                      | `std::array<T, N>` / `T[N]`         |
+| Indirect, Immutable Size | `std::unique_ptr<T[]>` | `std::unique_ptr<std::array<T, N>>` |
+| Indirect, Mutable Size   | `std::vector<T>`       | -                                   |
+
+### Rust
+
+The Rust vocabulary is as follows:
+
+| Owning type              | Runtime Sized | Compile-time Sized |
+| ------------------------ | ------------- | ------------------ |
+| Direct, Immutable Size   | -             | `[T; N]`           |
+| Indirect, Immutable Size | `Box<[T]>`    | `Box<[T; N]>`      |
+| Indirect, Mutable Size   | `Vec<T>`      | -                  |
+
+There are a few things of note when comparing to C++:
+
+-   The Rust `Box` and `Vec` types are part of `std` but are imported into the
+    current scope automatically, so they do not need any prefix.
+-   The `[T]` type represents a fixed-runtime-size buffer. The type itself is
+    not instantiable since its size is not known at compile time. `Box` is
+    specialized for the type to store a runtime size in its own type.
+-   The array type syntax matches the Carbon provisional syntax.
+-   The heap-buffer type name matches the C++ `vector` type, but it is
+    privileged with a shorter name. The `Vec` type name is at most the same
+    length as an array type name (for the same `T`).
+
+### Swift
+
+The Swift vocabulary is significantly smaller, to support automatic refcounting:
+
+| Owning type              | Runtime Sized      | Compile-time Sized |
+| ------------------------ | ------------------ | ------------------ |
+| Direct, Immutable Size   | -                  | -                  |
+| Indirect, Immutable Size | -                  | -                  |
+| Indirect, Mutable Size   | `Array<T>` / `[T]` | -                  |
+
+Because there is no direct storage option, only one name is needed, and "Array"
+is used to refer to a heap-buffer.
+
+### Safe C++
+
+The [Safe C++ proposal](https://safecpp.org/draft.html#tuples-arrays-and-slices)
+introduces array syntax very similar to Rust:
+
+| Owning type              | Runtime Sized         | Compile-time Sized  |
+| ------------------------ | --------------------- | ------------------- |
+| Direct, Immutable Size   | -                     | `[T; N]`            |
+| Indirect, Immutable Size | `std2::box<[T; dyn]>` | `std2::box<[T; N]>` |
+| Indirect, Mutable Size   | `std2::vector<T>`     | -                   |
+
+There are a few things of note:
+
+-   While Rust omits a size to indicate the size is known only at runtime, Safe
+    C++ uses a `dyn` keyword to indicate the same.
+-   The heap-buffer type name is unchanged from C++, sticking with `vector`.
+
+### Goals
+
+It will help to establish some goals against which to weigh alternatives.
+These goals are based on the
+[open discussion from 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?usp=sharing&resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA#heading=h.h0tg34pzq5yz),
+where we discussed the
+[Pointers, Arrays, Slices](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?usp=sharing)
+document.
+
+The goals here are largely informed by and trying to achieve the top level goal
+of
+["Code that is easy to read, understand, and write"](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
+We define some more specific targets here as they relate to the specifics of
+the array syntax.
+
+#### Privileging the most common type names
+
+-   "Explicitness must be balanced against conciseness, as verbosity and
+    ceremony add cognitive overhead for the reader, while explicitness reduces
+    the amount of outside context the reader must have or assume."
+
+The more common it will be for a type to be used, the shorter we would like the
+name to be. This follows from the presumption that we weigh conciseness as
+increasingly valuable for types that will appear more frequently in Carbon code.
+
+We expect the ordering of frequency in Carbon code to be:
+
+-   primitives ≈ tuples >> heap-buffers > arrays >> everything else[^1].
+
+Where primitives are: machine-sized integers (8 bit, 16 bit, etc.),
+machine-sized floating points, and pointers including slices[^2]. Function
+parameters/arguments are an example of tuples.
+
+From this, we derive that we want:
+
+-   Primitives and tuples to have the most concise names.
+    -   We can lean on special syntax/keywords as needed to make them concise
+        but descriptive.
+-   Heap-buffers to have a concise name, even more so than arrays.
+    -   We could use special syntax if needed to achieve conciseness.
+-   Arrays to have a concise name, but they do not need to be comparably concise
+    to primitives and tuples.
+    -   We should try to avoid special syntax.
+-   Everything else should be written as idiomatic types with descriptive names.
+
+[^1]:
+    "[chandlerc] Prioritize: slices first, then relocatable, then compile-time
+    sized, then everything else is vastly less common. Between those three, the
+    difference in frequency between the first two is the biggest." from
+    [open discussion on 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA&tab=t.0)
+
+[^2]:
+    Slices are included with primitives for simplicity, since they will take the
+    place of many pointers in C++, giving them similar frequency to pointers,
+    and can be logically thought of as a bounded pointer.
+
+#### Absence of syntax should make clear defaults
+
+One way to write arrays and compile-time-sized slices is as we see in Rust:
+`[T; N]` and `&[T; N]`. This suggests a relationship where an array is like a
+slice, and is the default form. But they are very different types, rather than a
+modification of a single type, and this can be confusing[^3] for developers
+learning the language.
+
+[^3]: https://fire.asta.lgbt/notes/a1iay7r3e7or0a59 (content-warning: swearing)
+
+We want to avoid the situation where
+[absence of syntax](https://www.youtube.com/watch?v=-Hb-9TUyjoo), such as a
+missing pointer indicator, changes the entire meaning of the remaining syntax or
+is otherwise confusing.
+
+#### Avoiding confusion with other languages
+
+The most general meaning of "array" is a range of consecutive values in memory.
+
+However, in many languages it is used, either formally or informally, to refer
+to a direct-storage, immutably-sized memory range:
+
+-   C,
+    [colloquial](https://en.wikibooks.org/wiki/C_Programming/Arrays_and_strings)
+-   C++, colloquial (from C) and
+    [`std::array`](https://en.cppreference.com/w/cpp/container/array)
+-   Go, [colloquial](https://go.dev/tour/moretypes/6)
+-   Rust, [colloquial](https://doc.rust-lang.org/std/primitive.array.html)[^4]
+
+In particular, this is the usage in the languages with which Carbon will most
+frequently interoperate, and/or from which code will be migrated to Carbon and
+thus comments and variable names would use these terms in this way.
+
+[^4]:
+    Maybe this is more formal than colloquial, but the name is not part of the
+    typename/syntax.
+
+Languages which require shared ownership _don't have direct-storage arrays_, so
+the same term gets used for indirect storage:
+
+-   Swift, [`Array`](https://developer.apple.com/documentation/swift/array)
+-   JavaScript,
+    [`Array`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array)
+-   Java and Kotlin,
+    [`ArrayList`](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html)
+
+And some languages use array to refer to both direct and indirect storage types.
+
+-   Dlang has direct-storage arrays
+    [colloquially](https://dlang.org/spec/arrays.html) and the indirect-storage
+    [`Array`](https://dlang.org/phobos/std_container_array.html) type.
+-   Pascal uses the presence or absence of a size to determine if
+    [`Array`](https://www.freepascal.org/docs-html/ref/refsu14.html) uses direct
+    (immutably-sized) or indirect (mutably-sized) storage.
+
+TODO: Any other languages we'd like to include here?
+
+In sum, languages which have direct-storage immutably-sized arrays use the term
+"array" to refer to those, and most then use a separate name for the
+indirect-storage type.
 
 ## Proposal
 
-TODO: Briefly and at a high level, how do you propose to solve the problem? Why
-will that in fact solve it?
+The
+[All APIs are library APIs principle](/docs/project/principles/library_apis_only.md)
+states:
+
+> In Carbon, every public function is declared in some Carbon API file.
+
+As such, we propose a `Core` library type for a direct-storage immutably-sized
+array, and then a shorthand for referring to that library type.
+
+In line with other languages surveyed above, given the presence of a
+direct-storage immutably-sized array in Carbon, we will reserve the unqualified
+name "array" for this type. In full, its name is `Core.Array(T, N)`.
 
-## Details
+Because arrays will be very common in Carbon code, we want to privilege their
+usage. We stated above that we want "arrays to have a concise name, but they do
+not need to be comparably concise to primitives and tuples". As such, we will
+allow the use of `Array` without naming `Core` through the use of a builtin
+shortcut that simply forwards `Array(T, N)` to `Core.Array(T, N)`. Notably this
+leaves room for supporting multi-dimensional arrays by adding further optional
+size parameters, either in the `Array` type or in a similar sibling type.
 
-TODO: Fully explain the details of the proposed solution.
+Since the `Core.Array` type has a builtin shorthand, and the type it resolves to
+should always be available, `Core.Array` will be placed in the `prelude` library
+of `Core`. The `Core.Array` library type will need to interact with the compiler
+through builtins in order to define its direct storage, but its methods can
+largely be written directly in the `Core` package.
 
 ## Rationale
 
-TODO: How does this proposal effectively advance Carbon's goals? Rather than
-re-stating the full motivation, this should connect that motivation back to
-Carbon's stated goals and principles. This may evolve during review. Use links
-to appropriate sections of [`/docs/project/goals.md`](/docs/project/goals.md),
-and/or to documents in [`/docs/project/principles`](/docs/project/principles).
-For example:
-
--   [Community and culture](/docs/project/goals.md#community-and-culture)
--   [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
--   [Performance-critical software](/docs/project/goals.md#performance-critical-software)
--   [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
--   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
--   [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms)
--   [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development)
--   [Modern OS platforms, hardware architectures, and environments](/docs/project/goals.md#modern-os-platforms-hardware-architectures-and-environments)
--   [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
+As this proposal is addressing the question of introducing a new builtin and
+library type, it is mostly focused on the goal
+[Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
+
+This proposal aims to make code easy to understand by having the builtin
+forwarding name match exactly with the type in the `Core` library. This choice
+makes the builtin conceptually equivalent to an implicit `import` of the type
+into the global scope, much like the implicit `import` of the `prelude` library
+of the `Core` package, something developers will already need to model in their
+minds when reading Carbon code. That `Array` will be provided as a compiler
+builtin is an implementation detail that the Carbon developer need not worry
+about. It could as easily be implemented as an implicit `import`.
+
+By having the builtin shorthand use the same name as the library type, we make
+the builtin the least magical it could possibly be, while still maintaining its
+primary benefit of conciseness.
+
+We introduced some more specific sub-goals above:
+
+1. Privileging the most common type names
+
+This proposal privileges `Array` in line with the frequency it will appear in
+code: The written name of arrays is shortened to avoid writing `Core.`, because
+of the expected high frequency of arrays in Carbon code. But it avoids
+introducing additional syntax (such as with `[T; N]` or `(1, 2)`) or breaking
+naming rules (such as with a lowercase type name) because the frequency of use
+of arrays will be much lower than that of primitives and tuples.
+
+1. Absence of syntax should make clear defaults
+
+We introduce a type name rather than making arrays look more like slices but
+without being a pointer, in order to avoid the confusion raised when removing
+syntax changes the meaning significantly, and especially in ways that differ
+from defaults/options for a single language concept.
+
+1. Avoiding confusion with other languages
+
+We propose using the `Array` type name in line with how other languages use the
+same term. When a direct-storage array type is part of the language, it's
+consistently referred to as an "array" without qualifications.
+
+Most importantly, the name is consistent with the meaning in C++ and its
+standard library (`std::array<T, N>`) as well as with Rust, the languages which
+we expect Carbon code to interact with the most.
 
 ## Alternatives considered
 
-TODO: What alternative solutions have you considered?
+1. `[T; N]`
+
+This is the current syntax used by the toolchain; however, the following
+problems were raised:
+
+-   It's very similar to the syntax for slices, which is `[T]`, but very
+    different in nature, being storage instead of a reference to storage.
+-   Given `[T]` is a slice, `[T; N]` would better suit a compile-time-sized
+    slice.
+
+The syntax for a slice may also be changed; we discussed
+[adding a pointer annotation](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?tab=t.0#heading=h.fahgww8db6f0)
+to it, such as `[T]*` and `[T; N]*`. Some downsides remained:
+
+-   The `[T; N]*` syntax would be a fixed-size slice, rather than a pointer to
+    an array. This leaves no room for writing a pointer to an array, which can
+    indicate a different intent, that it always includes the full memory range
+    of the array.
+-   Removing the pointer annotation would change the meaning of the type
+    expression more than we'd like, since it would change from a slice into an
+    array, rather than pointer-to-an-array into an array.
+
+1. `array [T; N]`
+
+This introduces a keyword as a modifier of a fixed-size slice, rather than a
+builtin forwarding type. While arrays will be very common, it's not clear that
+they rise to the level of requiring breaking the language's naming rules (using a
+lowercase name) in order to provide a shorthand. And the shorthand is longer in
+the end than the `Array(T, N)` being proposed here. So this uses a larger
+weirdness budget for privileging the type while achieving less conciseness.
+
+This has an issue similar to that of `[T; N]`, but in reverse. Removing the
+`array` modifier keyword changes the meaning of the type expression in ways that
+are larger than a default/modifier relationship. Fixed-size slices are not the
+more-default array.
+
+The use of a lowercase keyword also costs us by preventing users from using the
+word `array` in variables, a name which is quite common.
+
+1. `Core.Array(T, N)`
+
+Providing just the library type is possible, but arrays will be one of the most
+common types in Carbon code, as described earlier. Privileging them with a
+shorthand that avoids `Core.` will help make Carbon code significantly more
+concise, due to the frequency, without hurting understandability. This makes it
+worth the tradeoff of putting a name into the global scope (by way of a builtin
+type).
+
+1. `array(T, N)`
+
+This is very similar to the current proposal, just using a lowercase name for
+the type name. This would break the language rules without making the result any
+more concise for developers. It could highlight that it is a builtin type, but
+we argued earlier that this is an implementation detail. Since developers have
+to model the implicit import of `Core`'s `prelude` library already, modelling
+the implicit import of `Core.Array` as `Array` will be at least as
+straightforward. Using a name consistent with the language naming rules (with a
+leading capital letter) is preferable in the absence of any strong benefit to
+breaking the rules, which we don't see here. And since the frequency of use will
+be lower than that of primitives and tuples, the amount of rule-breaking budget
+for privileging the type is lower.
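
As a companion to the Rust vocabulary table in the Background section, the following is a minimal sketch of the four owning types and the direct versus indirect storage distinction the proposal relies on. It is written in Rust rather than Carbon, since the Carbon `Array(T, N)` spelling is what is being proposed here. The function names (`element_counts`, `array_is_direct_storage`, `boxed_array_is_one_pointer`) and the element type `u32` are illustrative assumptions, not part of the proposal.

```rust
use std::mem::size_of;

/// Builds one value of each owning type from the Rust table and returns
/// how many elements each one holds. (Hypothetical helper for illustration.)
fn element_counts() -> (usize, usize, usize, usize) {
    // Direct, compile-time sized: the four elements live in place.
    let direct: [u32; 4] = [1, 2, 3, 4];
    // Indirect, compile-time sized: a heap allocation holding a `[u32; 4]`.
    // `[u32; 4]` is `Copy`, so `direct` remains usable afterwards.
    let boxed_fixed: Box<[u32; 4]> = Box::new(direct);
    // Indirect, runtime sized, immutable size: the length is fixed once built.
    let boxed_dyn: Box<[u32]> = vec![1, 2, 3].into_boxed_slice();
    // Indirect, mutable size: the buffer can grow after construction.
    let mut growable: Vec<u32> = Vec::new();
    growable.push(7);
    growable.push(8);
    (direct.len(), boxed_fixed.len(), boxed_dyn.len(), growable.len())
}

/// True when the array type occupies exactly N * size_of::<T>() bytes,
/// that is, when its elements are stored directly in the value.
fn array_is_direct_storage() -> bool {
    size_of::<[u32; 4]>() == 4 * size_of::<u32>()
}

/// True when a boxed compile-time-sized array is a single thin pointer,
/// that is, when the elements are stored indirectly on the heap.
fn boxed_array_is_one_pointer() -> bool {
    size_of::<Box<[u32; 4]>>() == size_of::<*const u32>()
}

fn main() {
    println!("element counts: {:?}", element_counts());
    println!("array is direct storage: {}", array_is_direct_storage());
    println!("boxed array is one pointer: {}", boxed_array_is_one_pointer());
}
```

The array is the only one of the four whose size depends on `N`, which is the sense in which this proposal reserves the name "array" for the direct-storage, immutably-sized type.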