Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: permit conversion of []A to []B if A and B have same underlying type modulo struct tags #71183

Open
adonovan opened this issue Jan 8, 2025 · 23 comments
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented Jan 8, 2025

Background: It is not uncommon to need to convert a value of slice type []A to type []B, where A and B have the same representation (e.g. string). Unfortunately, one must allocate a copy.

Proposal: We propose to relax the restrictions so that, given var ( a A; aa []A; b B; bb []B), the conversion ([]B)(aa) is legal so long as both B(a) and A(b) are legal A and B have the same underlying type, ignoring struct tags. The result would be a slice of the same len, cap, and pointer, but a different type. In other words, the operation creates an alias, it does not allocate an array.

The requirement for the mutual assignability same underlying type is that if aa aliases bb, then assigning to aa[0] and reading from bb[0] effects a conversion from A to B, and vice versa. Therefore both conversions had better be legal. Consequently, the two types must have the same memory layout (among other requirements).

[Edited: I mistakenly started with "mutually convertible" but this is not sufficient for e.g. A=float32 B=float64.]

Prior art:

Loosely related:

@ianlancetaylor @griesemer

@gopherbot gopherbot added this to the Proposal milestone Jan 8, 2025
@ianlancetaylor ianlancetaylor added LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee labels Jan 8, 2025
@ianlancetaylor

This comment was marked as resolved.

@adonovan

This comment was marked as resolved.

@adonovan adonovan moved this to Incoming in Proposals Jan 8, 2025
@Jorropo
Copy link
Member

Jorropo commented Jan 8, 2025

float32float64 is valid, so is float64float32
Yet I do not see any reasonable (aka without introducing dynamic tying to everything) way to implement the following code in constant time using aliases:

func f(x []float32) []float64 { return []float64(x) }

same for differently sized integers, floats ←→ integers, ...

what am I missing ?

Edit:
This part tries address this

Consequently, the two types must have the same memory layout (among other requirements).

But I think it's unclear for []float64 ←→ []uint64. It seems to me like we need a formal « same underlying bit representation » and only {,u}int{,64,32,16,8} would end up working ?
And everything based on the same underlying type.

@ianlancetaylor

This comment was marked as resolved.

@ianlancetaylor
Copy link
Member

Agreed, I think mutual convertibility is not enough. I think we need to say that the underlying type is the same. (We could perhaps add some special language about channels, if we care.)

@jimmyfrasche
Copy link
Member

Another case where one would want something similar is, for example, converting a []fmt.Stringer to an []any. That's not mutually convertible so only one direction makes sense but, when it does make sense, the source and target do have the same representation so it could be aliased without a copy.

@jimmyfrasche
Copy link
Member

Since this came up in #69264, I'll use my example from there. In that I asked whether

func Mean(x []float64) float64

should be

func Mean[Slice ~[]E, E ~float64](x Slice) E

so that if you had a slice of type []Temp where type Temp float64 you could use it directly.

This proposal allows the API without generics to avoid a copy by allowing:

var temps []Temp = f()
tbar := Mean([]float64(temps))

But that doesn't do quite the same thing as the generic version which returns a value of type Temp so you'd need to write

Temp(Mean([]float64(temps)))

Further the generics version would accept a value with a type like type Temps []Temp and return a Temp. Would you be able to use that last line to do that or would it require another conversion first like:

Temp(Mean([]float64([]Temp(temps))))

?

If it were instead some function F that takes and returns a slice would it be

Temps([]Temp(F([]float64([]Temp(temps)))))
// or
Temps(F([]float64(temps)))

?

This would be a useful change but I don't know if it makes API design as nice and simple as one would hope so much as push it down the road.

@ianlancetaylor
Copy link
Member

@jimmyfrasche Actually, fmt.Stringer and any don't have the same representation. The empty interface any looks like https://go.googlesource.com/go/+/refs/heads/master/src/runtime/runtime2.go#184. A non-empty interface looks like https://go.googlesource.com/go/+/refs/heads/master/src/runtime/runtime2.go#179.

And in fact different non-empty interface types don't have the same representation, because they have a different itab layout.

@adonovan
Copy link
Member Author

adonovan commented Jan 8, 2025

Agreed, I think mutual convertibility is not enough. I think we need to say that the underlying type is the same.

Ah, thanks for the examples, I agree. I was trying to avoid having to say "same underlying type modulo struct tags", but that's what I thought I was getting at.

Another case where one would want something similar is, for example, converting a []fmt.Stringer to an []any. That's not mutually convertible so only one direction makes sense but, when it does make sense, the source and target do have the same representation so it could be aliased without a copy.

The aliased 1-element slice is a portal through which values can be pushed in both directions, hence the requirement that the constraint on A and B be symmetric. That means conversion from one interface type to another (or from concrete to interface) is not allowed.

@adonovan adonovan changed the title proposal: spec: permit conversion of []A to []B if A and B are mutually convertible proposal: spec: permit conversion of []A to []B if A and B have same underlying type module struct tags Jan 8, 2025
@adonovan adonovan changed the title proposal: spec: permit conversion of []A to []B if A and B have same underlying type module struct tags proposal: spec: permit conversion of []A to []B if A and B have same underlying type modulo struct tags Jan 8, 2025
@aclements
Copy link
Member

@adonovan and I were accidentally working on this proposal concurrently, so I'll add my details here.

Proposal details

We propose allowing explicit conversion from []T to []U if T and U have the same underlying type.

Specifically, update the Conversions section of the Go language spec with the following:

A non-constant value x can be converted to type T in any of these cases:

  • ignoring struct tags (see below), x's type and T are slice types that are not named types, and their slice element types are not type parameters but have identical underlying types.

This language exactly parallels the language on converting between pointer types.

Rationale

This comes up with some regularity on the golang-nuts@ mailing list (e.g., 1, 2, 3, 4, 5) and StackOverflow (e.g., 1, 2, 3, 4). In our experience, it often comes up in data analysis code, when the code wants to represent data using []T where T has the same underlying type as float64 (or int, etc) but call functions—often mathematical functions—that take []float64 ([]int, etc). Using []T confers additional type safety (it can’t accidentally be mixed up with other float-typed data), and the opportunity to add type-dependent methods like a custom String() method for printing.

Today, this problem can be partially solved using generics. A function like Sum(x []float64) float64 could instead be written func Sum[Slice []Elem, Elem ~float64](x Slice) Elem. However, we do not want to encourage this style. Furthermore, it doesn’t apply to the huge number of existing slice APIs.

This conversion is always safe and can be done in O(1) time. Without explicit support, this conversion requires either O(n) time or the use of unsafe. Similar arguments applied to slice-to-array-pointer conversion, which was added in Go 1.17..

This exact conversion we’re proposing to add is mentioned in the language FAQ, but the justification given for not supporting this conversion is primarily that “Go requires you to be explicit about type conversions.” This issue is proposing an explicit type conversion. It’s true that, unlike *T to *U, slice conversion does not change the method set of a value, but it does help change the method set of the contents of a slice. We consider that a feature of this proposal.

This conversion was briefly discussed in #29864, but in that issue, it was treated as a bug that the language doesn’t allow this. In contrast, the issue is a language change proposal that the language should allow this.

On #29864, there’s some concern that some slice conversions would require allocation and copying. We only propose conversions if the element types have the same underlying type, which will never require copying. Because Go already allows an explicit conversion from *T1 to *T2 if T1 and T2 have the same underlying type, it must be possible to do the equivalent slice conversion without copying.

Alternatives

Should we allow “deeper” conversions, like []*T1 to []*T2 where T1 and T2 have the same underlying type? We argue that we should not. While there are use cases for this, stopping at “one level deep” is a much simpler rule, almost always sufficient, and consistent with Go’s other explicit type conversion rules.

Should we allow similar conversions for the component types of map, chan, or func? We argue no simply because this doesn’t seem to come up nearly as often as slice conversions, making it unjustified complexity. Similarly, we could allow conversions from [N]T to [N]U. This likewise doesn’t seem to carry its weight, and with the addition of slice conversions, this can be expressed as (*[N]U)(([]U)(x[:])), which is not pretty, but will run in O(1) time.

@adonovan
Copy link
Member Author

adonovan commented Jan 8, 2025

@ianlancetaylor points out that there's no fundamental reason the same approach couldn't be generalized to other composite types such as arrays, maps, functions, structs, and channels, so long as the corresponding pairs of key/value, param/result, field, or element types have identical underlying types. So, for example, conversions betweenmap[int]string and map[http.ConnState]http.Dir would both preserve type safety.

The earlier proposal #19778 is even more general, relying on the notion of deep structural equality and "memory layout", which is not really a concept in the spec; also it is plainly unsafe w.r.t. interfaces. (An io.Reader and an io.Writer both have the same memory layout---an interface--but being able to freely mutually convert one to another would lead to the wrong methods being called, or crashes.)

[Update: the map and chan data structures might retain the type used in the call to make, which is a principled reason not to extend this proposal to those types.]

@josharian
Copy link
Contributor

josharian commented Jan 8, 2025

Another case where one would want something similar is, for example, converting a []fmt.Stringer to an []any

Actually, fmt.Stringer and any don't have the same representation.

And even if they did, we wouldn't want to prevent future alterations to the underlying representations.

I can't think offhand of cases in which we'd want different memory layouts for values with the same underlying type, because of the ability to take the address of any given element. But worth double-checking to make sure this doesn't create any future implementation restrictions.

@josharian
Copy link
Contributor

josharian commented Jan 8, 2025

So, for example, conversions betweenmap[int]string and map[http.ConnState]http.Dir would both preserve type safety.

I can't think offhand of cases in which we'd want different memory layouts for values with the same underlying type, because of the ability to take the address of any given element.

hmm. map elements aren't addressable. so one could store e.g. slices in a compressed format (combined len/cap) until that didn't work any more for that particular map, at which point we'd set a flag in the map header and change the representation throughout. we considered doing something not dissimilar for small readonly maps, so this isn't hypothetical.

ditto for values in a channel.

so at least in theory, doing this with maps and channels could end up posing an implementation restriction.

@adonovan
Copy link
Member Author

adonovan commented Jan 8, 2025

hmm. map elements aren't addressable. so one could store e.g. slices in a compressed format (combined len/cap) until that didn't work any more for that particular map, at which point we'd set a flag in the map header and change the representation throughout. we considered doing something not dissimilar for small readonly maps, so this isn't hypothetical.

I didn't quite follow that. Do you mean that the representation of maps might globally vary based on the rtypes of its keys and values?

Your comment made me wonder whether the type descriptor used to make a map is recorded within the map. It doesn't seem to be, but perhaps that may change in future. For a channel, the element type used to make it is recorded within it. That could be an obstacle.

@josharian
Copy link
Contributor

josharian commented Jan 8, 2025

Do you mean that the representation of maps might globally vary based on the rtypes of its keys and values?

Yeah, seems implausible.

Your comment made me wonder whether the type descriptor used to make a map is recorded within the map. It doesn't seem to be,

There are generated algs recorded there (eq, hash), and last I was aware we don't fully canonicalize them in the compiler. (I tried, long ago, and got stuck.) So there are at least artifacts of the original types in the headers.

@jimmyfrasche
Copy link
Member

This would also mean you could not change the underlying type of A or B except in unison or such conversions would fail to compile

@adonovan
Copy link
Member Author

adonovan commented Jan 9, 2025

This would also mean you could not change the underlying type of A or B except in unison or such conversions would fail to compile

That's true, but it's equally true of many other pairs of types related by conversions, such as concrete types and interfaces.

@Merovius
Copy link
Contributor

Merovius commented Jan 9, 2025

@jimmyfrasche

This would also mean you could not change the underlying type of A or B except in unison or such conversions would fail to compile

Changing the underlying type is always a breaking change. Consider that if A has underlying type X, you could have func F[T ~X]() and a call F[A]().

@robpike
Copy link
Contributor

robpike commented Jan 9, 2025

I think this is a bad idea because of the door it opens. It doesn't go far enough for all the questions it suggests, and going any further could lead to extremely complicated problems. The decision to not do this goes right back to the beginning of the language and our desire to keep things simple to explain. Please do not make this change.

@jimmyfrasche
Copy link
Member

I think using generics provides a nicer API than this language change but I don't think that solves all the problem presented here either.

For general slices, there have been some calls (forgive me not tracking down the old issues) to let copy do a copy and conversion which covers more than this, albeit with a copy. Perhaps that should be reconsidered?

There are probably some times when you'd like to do this in performance sensitive code without a copy. Maybe this conversion should go in unsafe? Something like unsafe.AliasSliceAs[T](slice)?

@AndrewHarrisSPU
Copy link

AndrewHarrisSPU commented Jan 9, 2025

There are probably some times when you'd like to do this in performance sensitive code without a copy. Maybe this conversion should go in unsafe? Something like unsafe.AliasSliceAs[T](slice)?

I'm a bit sympathetic to the cases (and less so to other cases) where this zero or fixed cost conversion is desired for slices of floats and ints going into standard / third-party math packages. But would unsafe be required by callers of these packages to accomplish the efficient conversion, and is that worth it? Is there something for the numerical types/math set of cases that could be done by callee packages (like math)?

@adonovan
Copy link
Member Author

@jimmyfrasche: There are probably some times when you'd like to do this in performance-sensitive code without a copy. Maybe this conversion should go in unsafe? Something like unsafe.AliasSliceAsT?

The proposed operation preserves type safety whereas unsafe.AliasSliceAs does not. Let's keep unsafe ideas out-of-scope here.

@robpike: I think this is a bad idea because of the door it opens. It doesn't go far enough for all the questions it suggests, and going any further could lead to extremely complicated problems. The decision to not do this goes right back to the beginning of the language and our desire to keep things simple to explain.

Could you elaborate on what door it opens? Are you concerned that similar conversions for other data types (such as maps and channels) will follow, despite the qualitative difference that those data structures may internally retain knowledge of the type used to make them? What kind of problem do you anticipate?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal
Projects
None yet
Development

No branches or pull requests