Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Core.Array type for direct storage immutably-sized buffers #4682

Open
wants to merge 14 commits into
base: trunk
Choose a base branch
from

Conversation

danakj
Copy link
Contributor

@danakj danakj commented Dec 13, 2024

We propose to add Core.Array(T, N) as a library type in the Core package. Since arrays are a very frequent type, we propose to privilege use of this type by including it in the prelude library of the package.

We would like to see a shorthand where Core.Array is automatically imported into the file scope, and this proposal includes future work to this effect.

@danakj danakj added proposal A proposal proposal draft Proposal in draft, not ready for review labels Dec 13, 2024
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from e5ce9a2 to 1eac5a0 Compare December 13, 2024 15:02
@danakj danakj force-pushed the proposal-array-forwards-to-th branch 4 times, most recently from 7729072 to 41cdb12 Compare January 8, 2025 19:35
@danakj danakj marked this pull request as ready for review January 8, 2025 19:36
@github-actions github-actions bot added proposal rfc Proposal with request-for-comment sent out and removed proposal draft Proposal in draft, not ready for review labels Jan 8, 2025
@danakj danakj requested review from zygoloid and chandlerc January 8, 2025 19:36
@github-actions github-actions bot requested a review from KateGregory January 8, 2025 19:36
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 41cdb12 to fca98cc Compare January 8, 2025 19:39
@clavin
Copy link
Contributor

clavin commented Jan 13, 2025

Wonderful proposal! 😄 The rationale behind this design is very impressive.

I wanted to highlight this information for your consideration: Swift is currently working on a new standard array type whose size is fixed at compile-time. In the proposal it is called Vector, along with a section discussing the decision behind this name; however, the associated implementation PR that just landed renamed the type to Slab.

One part of that proposal stuck out to me and reminded me of this proposal. They establish that they cannot use the name Swift.Array for this new type as it is already taken. To justify the name Swift.Vector instead, they argue that the naming of std::vector may have been mistake in hindsight:

A. Stepanov mentions in his book, "From Mathematics to Generic Programming", that using the name std::vector for their dynamically allocated growable array type was perhaps a mistake for this same reason: [...]

However, they clearly acknowledge that this decision is a potential source of confusion, especially for developers coming from C++:

We fully acknowledge that the Swift types, Swift.Array and Swift.Vector, are complete opposites of the C++ ones, std::vector and std::array.

It seems they ultimately walked back this decision, given the rename to Slab, for what reason I can only guess is to avoid this confusion. This was the overwhelming topic of discussion/bikeshedding in the proposal's second review (based on a quick skim), but I couldn't find an official decision anywhere.

This work in Swift may be too recent and volatile to derive any design decisions in Carbon right now, but it could add beneficial discussion to this proposal.

Consider noting this ongoing design work in the Background section for Swift. Also, consider discussing the argument that std::vector and std::array are misnomers in reference to the mathematical terms, strengthening the decision to stick with the term "array" that is familiar to C++ and Rust developers.

proposals/p4682.md Show resolved Hide resolved
proposals/p4682.md Outdated Show resolved Hide resolved
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 3976e27 to 8d90337 Compare January 20, 2025 16:25
@danakj danakj changed the title Array forwards to the prelude The Core.Array type for direct storage immutably-sized buffers Jan 20, 2025
danakj and others added 12 commits January 20, 2025 13:47
We propose to add `Core.Array(T, N)` as a library type in the `Core`
package. Since arrays are a very frequent type, we propose to privilege
use of this type by including it in the `prelude` library of the
package.

We would like to see a shorthand where `Core.Array` is automatically
imported into the file scope, and this proposal includes future work to
this effect.
We propose to add `Core.Array(T, N)` as a library type in the `prelude`
library of the `Core` package. Since arrays are a very frequent type,
we propose to privilege use of this type by providing a builtin
`Array(T, N)` type that resolves to the `Core.Array(T, N)` type. Users
can model this as an implicit import of the `Core.Array(T, N)` type
into the global scope, much like the implicit import of the `prelude`
library of the `Core` package.
imports are typically in the file scope, they don't become part of the package
@danakj danakj force-pushed the proposal-array-forwards-to-th branch from 09c2e46 to 7d390dd Compare January 20, 2025 18:49
Copy link
Contributor Author

@danakj danakj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, PTAL

proposals/p4682.md Show resolved Hide resolved
proposals/p4682.md Outdated Show resolved Hide resolved
@danakj danakj requested a review from zygoloid January 20, 2025 18:54
@danakj
Copy link
Contributor Author

danakj commented Jan 20, 2025

@clavin Thanks for all the links and context regarding the Swift Slab type, I've also added this context into the proposal's background. And the confusion re: vector as a name into the rationale discussion.

Comment on lines +67 to +68
with arrays. Indirect refers to heap allocation, where the type itself holds
storage of a pointer to the buffer, as with heap-buffers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it really influences this proposal in either direction, but I'm a bit uncomfortable with establishing a dichotomy between indirect vs. direct here. I'm not sure that's the focal point we want to rely on here.

For example, I worry that would lead towards designs that preclude small-size optimization or other techniques that don't use the heap necessarily...

I lean a bit towards static vs. dynamic allocation, static vs. dynamic immutable vs. dynamic mutable size as the classification basis...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely did not mean to weigh in on SSO, but also I would say SSO is an optimization of indirect storage in a type when it's value is small, rather than being a direct-storage type.

I don't know that static vs dynamic quite captures the intent here, it feels a bit worse to me. You can have static allocations on the heap and that's not what an array is, so we lose some of the array vs heap-array difference. Would it be enough to just explicitly discuss that I am talking about direct vs indirect storage as the type category and SSO as an optimization doesn't change the type from an indirect-storage type?


| Owning type | Runtime Sized | Compile-time Sized |
| ------------------------ | ---------------------- | ----------------------------------- |
| Direct, Immutable Size | - | `std::array<T, N>` / `T[N]` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would list T[N] as primary here -- I don't think std::array has nearly as much uptake as most of the other alternatives. Even with unique_ptr, I suspect it is very common for code to not actually use std::array even when it could (and possibly should in that case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done. (I was trying to center the C++ idiom rather than the C one.)

- Everything else should be written as idiomatic types with descriptive names.

[^1]:
"[chandlerc] Prioritize: slices first, then relocatable, then compile-time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this got minuted as "relocatable", pretty sure I was saying resizable... (which matches what you're suggesting above I think).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are saying the same thing - when you resize the pointer moves (is relocated) and external references are invalidated, but you're probably thinking of a different context of relocatable. I will reword to say resizeable storage (to not be ambiguous with a fixed-size storage but resizable container within it, where resizing does not change the pointer).

Comment on lines +426 to +428
- Given such a feature, we could consider moving `i8`, `bool`, etc to be
aliases in the `prelude` that are in the set of names automatically imported
into the file scope.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems hard as they specifically use syntax outside of what we would use for names in the prelude, and different syntactic conventions than you're suggesting for Array.

I think this is an important point that isn't fully clear to me from the proposal. I can see a few different possible interpretations of the goal of this proposal:

  • a) Array is different from a keyword like bool or self in an important way, and instead is a library type name that happens to be imported into the file scope automatically. But bool and i8 and self remain keywords or literal syntaxes and are not intended to behave as library type names imported automatically.
  • b) Array is a keyword, just like bool and self and such, but is designed to follow the conventions of library types in order to be similar to the name it aliases: Core.Array.
  • c) Array is not intended to be a keyword, but an automatically imported name from the prelude, but it is also not intended to be different from bool or i8, and all of the things currently designed as keywords referencing types in the prelude should eventually become automatically imported names from the prelude.

I think most of the proposal reads like (a), but this paragraph reads a like (c). And I think it's worth having an explicit discussion and decision around whether we want (b).

FWIW, in the original discussion around how to name things like bool and such, it was brought up that these auto-imported names are very close to keywords, and it isn't clear there is any real advantage to having them not be keywords. We even have a good model for how to workaround name collisions with keywords. And I think it's somewhat important that if we're now going away from that model, why, and what the tradeoff is we're making.

Copy link
Contributor Author

@danakj danakj Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that Array is different from i8, bool etc because of
a) its frequency of use
b) it's not a machine primitive, it's an aggregate (but that largely impacts its frequency of use)

There's no intention to have similar/different behaviour, as types are just types. It sounds like you're saying keywords behave differently than library types for the Carbon developer, and that isn't clear at all to me. That distinction seems like an implementation detail of the language that does not change how a developer would use the types at all in Carbon code.

For instance, I am not sure what you mean by i8 etc "remain keywords or literal syntaxes and are not intended to behave as library type names". I don't think I could tell the difference in behaviour between them being a keyword vs them being a type alias in Core that is imported into file scope. What would change about their behaviour?

As such c) seems like a reasonable eventual outcome for me. But if there's a difference in user-visible behaviour, then that would be a reason to (separately from the decision to push a name into the file scope or not) implement it as a keyword instead of a library import. Is there?

Self OTOH is necessarily different from a library type, as its meaning is context-dependent. So it needs to be a keyword, and self similarly can't be a global value in the library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I could tell the difference in behaviour between them being a keyword vs them being a type alias in Core that is imported into file scope. What would change about their behaviour?

Concretely:

namespace N;

// Error, bool is a keyword.
fn bool();
// Error, bool is a keyword.
fn N.bool();
// OK, this function's name is `bool`.
fn r#bool();

alias Array = Core.Array;
// Error, already have an `Array` in this scope.
fn Array();
// OK, but uses of this might be rejected due to shadowing.
fn N.Array();
  // OK
  fn Use1() { N.Array(); }
  // Possibly invalid due to shadowing ambiguity.
  fn N.Use2() { Array(); }
// Error, already have an `Array` in this scope.
fn r#Array();

mutably-sized indirect storage buffer (hereafter called "heap-buffer").

Arrays and heap-buffers are some of the most commonly used types, after
primitive types. The syntax, whatever it is, will be incredibly frequent in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the design considers all types in the prelude to be primitive types, which would include arrays. We even hypothesize that the string type is a primitive type. That seems to not align well with how this proposal uses the term, as you seem to consistently draw a contrast between primitive types and arrays.

In general, I find it hard to argue in Carbon that bool, i8, and T* are categorically distinct from the array type you are proposing...

Sadly, I don't have a great suggested alternative. This is maybe a good topic for discussion to try and pin down how we want to talk about these things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't see an array as a primitive, but as an aggregate of objects (primitives or non-primitive types).

In this proposal I separated them because I needed a way to talk about the different categories of frequency of use, and primitives (as I called them) as a group occur with much greater frequency than arrays do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want a term here, C++ calls the primitive types that are not defined in terms of any other type (bool, iN, but not T*) "fundamental types", which I think sort of fits for us too.

@danakj danakj requested a review from chandlerc January 21, 2025 18:01
@danakj
Copy link
Contributor Author

danakj commented Feb 4, 2025

We discussed this in the toolchain sync a few weeks ago, and at the time it sounded like the leads were all landing on the idea that:

  • Use a lowercase name to indicate it's a thing you have to memorize at the top level, as a higher priority than matching the name exactly that it forwards to: array(T, N)
  • Use a builtin specifically, which (as mentioned in comments above, and which I didn't understand while writing the original proposal text) is different than an implicit import from Core, as it disallows using the name array in scopes other than the file scope.

Recording these here for now, but waiting to see if the leads really converge on this outcome or decide to alter it a bit.

@chandlerc
Copy link
Contributor

We discussed this in the toolchain sync a few weeks ago, and at the time it sounded like the leads were all landing on the idea that:

  • Use a lowercase name to indicate it's a thing you have to memorize at the top level, as a higher priority than matching the name exactly that it forwards to: array(T, N)
  • Use a builtin specifically, which (as mentioned in comments above, and which I didn't understand while writing the original proposal text) is different than an implicit import from Core, as it disallows using the name array in scopes other than the file scope.

Recording these here for now, but waiting to see if the leads really converge on this outcome or decide to alter it a bit.

So, if you (or anyone else) would like to record a more formal decision, I think the process we use to get there is to file a leads question with roughly the content above and a link here for context if needed. I'm also happy to do that if useful.

We don't always do this -- if folks reach a happy consensus, including the author of a proposal, can just update the proposal and ask the leads to take another look at it. It's basically an optional process for factoring out a smaller decision from a larger "yes" or "no" on a proposal as a whole.

Sorry if that process wasn't clear!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal rfc Proposal with request-for-comment sent out proposal A proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants