add Placement New #17057

Open
wants to merge 1 commit into
base: master

Conversation

WalterBright
Member

This implements Placement New, described in https://github.com/WalterBright/documents/blob/master/placementnew.md

@WalterBright WalterBright added Severity:Enhancement Review:WIP Work In Progress - not ready for review or pulling labels Nov 10, 2024
@dlang-bot
Contributor

Thanks for your pull request, @WalterBright!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#17057"

@WalterBright WalterBright force-pushed the placementNew branch 3 times, most recently from 29a5064 to 4db86a4 Compare November 10, 2024 07:46
@thewilsonator thewilsonator added Review:Needs Changelog A changelog entry needs to be added to /changelog Review:Needs Spec PR A PR updating the language specification needs to be submitted to dlang.org Review:Needs Tests Severity:New Language Feature labels Nov 10, 2024
@WalterBright WalterBright force-pushed the placementNew branch 6 times, most recently from 311f2cb to fd23a02 Compare November 12, 2024 08:55
@WalterBright
Member Author

I got it to work for some basic cases. Next comes extending it to more complex ones.

@WalterBright WalterBright force-pushed the placementNew branch 3 times, most recently from 6456c9a to c1d3813 Compare November 13, 2024 09:23
@WalterBright
Member Author

Placement new for structs seems to be working now!

@WalterBright WalterBright force-pushed the placementNew branch 4 times, most recently from 5f00e37 to 49f4a4c Compare November 14, 2024 08:17
@Connor-GH

Connor-GH commented Nov 15, 2024

If one desires to use classes without the GC, such as in BetterC, it's just awkward to use emplace

If I'm reading this right, this will allow classes in betterC, using a kind of "allocator argument" like in $OTHER_LANGUAGES? Once upon a time I made my own internal fork of dmd that forced betterC on all D code, but of course that didn't last long because it couldn't compile Phobos.

(I should clarify: my use case is making a kernel with betterC, and so far I have had to use a hacky mixin and alias this in order to use inheritance. Also, will this unlock the door for interfaces?)

@WalterBright WalterBright force-pushed the placementNew branch 2 times, most recently from 16cdbd6 to 448abed Compare November 15, 2024 04:51
@TurkeyMan
Contributor

return malloc(T.sizeof)[0 .. T.sizeof] should be sufficient; you don't need that weird thunk.

It doesn't work because it returns a void[], and its size cannot be checked at compile time. The static array works, I tried it and looked at the assembler.

No, your code you wrote:

ref void[T.sizeof] mallocate(T)() {
    return *(cast(void[T.sizeof]*) malloc(T.sizeof)); // <-- this is gross and blind cast/thunk is unsafe
}

You don't need to write that. Just write what I suggested instead; you don't need the thunk:

ref void[T.sizeof] mallocate(T)() {
    return malloc(T.sizeof)[0 .. T.sizeof]; // <-- this is not gross and it is safe
}

we return void[] from allocation functions

Not a problem, it's still castable to a pointer to a static array, since the length is known in advance.

Don't cast, just slice it to the proper length.

The whole point is that in idiomatic and safe code, typed variables will never be handled at any time when they are in an invalid state.

Placement new is never going to be safe.

I don't see why you think this... it's definitely safe. The example code you wrote above is already safe, ie:

struct S {    int i = 1, j = 4, k = 9; }

ref void[T.sizeof] mallocate(T)() {
    return malloc(T.sizeof)[0 .. T.sizeof]; // no casts, nothing nasty
}

void main() {
    S* ps = new(mallocate!S()) S;
    assert(ps.i == 1);
    assert(ps.j == 4);
    assert(ps.k == 9);
}

What's unsafe about this? It's completely fine by my reading.

Prefixing a destroy() means that void-initialized objects will also be destroyed, and that's UB for sure.

I don't know what prefixing a destroy means, but you're scaring me... nothing weird please! I'm satisfied at this point that we're going the right direction here!


void test6()
{
    S6* ps = new(mallocate!S6()) S6;
Contributor

This is the case we recently discussed... but what are all the other cases?
I thought we finally agreed on raw buffer initialisation using sized void buffers?

Member Author

Whether void array buffers or a variable is used is up to the user. I left it in because it is the most convenient form when dealing with unions / optional types / sum types. The implementation doesn't care what type it is, only that it is an lvalue of sufficient size.

I didn't add tests for other cases, as they would be redundant. A tutorial should include them, however.

void test0()
{
    int i;
    int* pi = new (i) int;
Contributor

I thought we ejected this pattern into space? Please, just no.

Member Author

I responded to this in the other comment.

@Connor-GH

This is code straight ripped from my kernel:

module memory;
import traits;
import kalloc : kmalloc, kfree;

T *d_new(T, Args...)(auto ref Args args) {
	import core.lifetime : emplace, forward;
	T *mem = cast(T *)kmalloc(T.sizeof);
	return mem.emplace(forward!args);
}
void d_delete(T)(T *data) {
	if (data is null)
		return;
	kfree(data);
}

I believe placement new would replace at least the d_new, but what about the d_delete? delete is deprecated, and in a betterC environment, I don't have access to classes and interfaces to even make an interface to the GC. In other words, I have no way of being able to use keyword new and keyword delete.
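One possible shape for the matching `d_delete`, as a hedged sketch rather than anything this PR provides: run the destructor explicitly with `destroy`, then release the memory. The names `makeRaw`, `disposeRaw`, and `Tracked` are hypothetical, and `malloc`/`free` from `core.stdc.stdlib` stand in for `kmalloc`/`kfree`. Whether `destroy` is usable in a given betterC kernel setup is an assumption that would need checking.

```d
import core.lifetime : emplace, forward;
import core.stdc.stdlib : malloc, free;

// Hypothetical type that counts live instances so the pairing can be observed.
struct Tracked
{
    static int live;
    int value;
    this(int v) { value = v; ++live; }
    ~this() { --live; }
}

// Allocate raw memory, then construct a T in it (stand-in for d_new).
T* makeRaw(T, Args...)(auto ref Args args)
{
    void* mem = malloc(T.sizeof);
    if (mem is null)
        return null;
    return emplace(cast(T*) mem, forward!args);
}

// Run the destructor first, then hand the memory back (stand-in for d_delete).
void disposeRaw(T)(T* p)
{
    if (p is null)
        return;
    destroy(*p);
    free(p);
}
```

The point of the sketch is only the ordering: destruction is separate from deallocation, so a replacement for `delete` has to do both.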

@WalterBright
Member Author

ref void[T.sizeof] mallocate(T)() {
    return malloc(T.sizeof)[0 .. T.sizeof]; // <-- this is not gross and it is safe
}

you're right, that is better. Thanks!

What's unsafe about this?

It's equivalent to this:

struct T { int* a,b,c; }

void foo(T);

@safe void test(void[24] a) {
    foo(cast(T)a); // Error: cast from `void[24]` to `T` not allowed in safe code
}

I.e. the compiler cannot tell what it's stepping on, as it does not know what the provenance of the void[] is.

@TurkeyMan
Contributor

Okay, so, what I'm suggesting by new accepting void[] (and only that) is that it DOES know; it can assume (written in the spec even) that the argument it receives is raw memory, and any argument that is not raw memory is an invalid argument. It can be confident that, given raw memory, it has agency to construct a new value, and it is safe to do so.

This assumption and assertion only works well when the function is specified to accept void[].
This assumption gets awkward and tenuous if new accepts a ref T, because there's nothing stopping the user from passing a live or valid T, or distinguishing it from an uninitialised T; and new can't know. But if new only accepts void[], then it knows it received raw memory, and the only way a user could pass something risky is with an explicit cast in user code, at which point the user has accepted an unsafe operation in support of their optimisation.

If the user has responsibility for the unsafe cast, then the operation itself is safe... this is quite specifically why I reject the idea that new would receive a T so vehemently; it fundamentally undermines the opportunity to make this whole thing safe, and wrecks lifetime analysis.

@WalterBright
Member Author

D allows conversion of a type to void* in @safe code, but not the other way around. Passing a void[] through new to convert it to an object is not checkably safe.

@system code is not necessarily unsafe; it just means it's up to the user to ensure its safety, as the compiler cannot do it. Memory allocation and initialization is always going to be uncheckable unless the language controls the whole process, like what gc new does and like what stack variables do.

If the user has responsibility for the unsafe cast, then the operation itself is safe

Sure, but it cannot be marked @safe.

Not accepting T as an argument means the user will have to cast it to void*, which will look ugly. Casting to void* is a safe operation, so it becomes an ugly construct with no added value.

Taking a T argument is useful for unions, option types, and sum types. Although you don't use them, they are popular constructs (and I use them in D and would likely use them even more with placement new!).

Note there's an inconsistency in the compiler - in @safe code, a T cannot be cast to a void[T.sizeof], but an &T can be cast to void*. https://issues.dlang.org/show_bug.cgi?id=24866

The ref T semantics was chosen because:

  1. under the hood a simple pointer is passed - this is the most efficient code
  2. the size of the object pointed to by the pointer is known at compile time and can be checked at compile time

Such efficiency matters a great deal with unions, option types, and sum types. We have to be competitive.

@TurkeyMan
Contributor

TurkeyMan commented Nov 19, 2024

I don't see how you get from ref T to void[] (or void[N]) though without unsafe casts, even considering the implicit cast from T* to void*. You need to write something like (&lvalue)[0..1]; taking the pointer is safe, but slicing a pointer to an array is not (pointer doesn't know the length); so the implicit cast to void array is not where it gets caught; in this expression, it's the blind slice on the T*.

Not accepting T as an argument means the user will have to cast it to void*, which will look ugly. Casting to void* is a safe operation, so it becomes an ugly construct with no added value.

I don't think it looks ugly, it looks appropriate. It's visible to the reader that you're taking the buffer from under a value; which is what you are doing. I think the clarity of that operation holds huge value.

And again, you're arguing from niche-cases; the default case that people will type doesn't encounter any of this; the 2 cases you've demonstrated that do are your union (unsafe by definition; special handling is completely appropriate), and a void-init stack variable; which is fairly uncommon; in almost all circumstances you just let the declaration init the value... but a void-init stack variable is also unsafe, and requires handling a typed value before it's alive. Maybe the programmers in the safe-future will see a shift to placing buffers on the stack rather than uninitialised lvalues.

I think it's fundamentally bad for the language at the lowest level to promote or require handling invalid-but-already-typed objects. If the variable hasn't been initialised yet, it's memory, so it's void[].

Talking about option types and sum types is a red-herring; I use option types and sum types too, but they are a tool in the toolbox. You don't write placement new ever while using those objects; they have strong API associated with initialisation and assignment. Everything's internal, nobody will handle placement new expressions when interacting with those objects.

Optimise for what people actually do in their code, which is allocate memory and then initialise it.

Note there's an inconsistency in the compiler - in @safe code, a T cannot be cast to a void[T.sizeof], but an &T can be cast to void*. https://issues.dlang.org/show_bug.cgi?id=24866

Is that an inconsistency? That's specifically what I'm taking advantage of.
It's never occurred to me to close that gap that way... I quite like that how it is.

The ref T semantics was chosen because:

They weren't "chosen"; you just trivially decided it with no warning or discussion, against the strongest advice I can possibly muster.

  1. under the hood a simple pointer is passed - this is the most efficient code
  2. the size of the object pointed to by the pointer is known at compile time and can be checked at compile time

void[N] can be passed by ref, and the size is known at compile time.

Such efficiency matters a great deal with unions, option types, and sum types. We have to be competitive.

Complete red herring; I'm not suggesting anything that would change the codegen in any way whatsoever; I'm interested in getting the expression right; clear, self-explanatory, and relatively foolproof. Nothing I'm discussing changes a single thing about the codegen.

I have proposed the OPTION to accept void[] in ADDITION to void[N], which would cause a runtime bounds check to be emitted, but if that doesn't land, I'm happy with void[N] only; the user will have to slice to N length, which puts the runtime bounds check at the user's slice expression rather than at the new expression.
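The `void[]` variant with a runtime length check can be approximated today in library code. The following is a minimal sketch under stated assumptions: it uses `core.lifetime.emplace` rather than the placement `new` syntax from this PR, and `constructIn` is a hypothetical helper name, not part of any proposal.

```d
import core.lifetime : emplace, forward;
import core.stdc.stdlib : malloc, free;

struct S { int i = 1, j = 4, k = 9; }

// Hypothetical helper: the caller hands over raw memory typed as void[],
// and the length is checked at run time before a T is constructed in it.
T* constructIn(T, Args...)(void[] mem, auto ref Args args)
{
    assert(mem.length >= T.sizeof, "buffer too small for T");
    return emplace!T(mem[0 .. T.sizeof], forward!args);
}
```

A caller would slice the allocation to the right length, for example `constructIn!S(malloc(S.sizeof)[0 .. S.sizeof])`, so no typed `S` is visible before construction.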

@WalterBright
Member Author

I have proposed the OPTION to accept void[] in ADDITION to void[N]

I avoided that because if T is a void[] then the compiler will have to introduce a special case to have T not be the target but what it refers to being the target.

Talking about option types and sum types is a red-herring

I use emplace 206 times in the compiler front end. I use a more primitive method extensively in the optimizer and back end. It's also used extensively in Phobos.

We're just going in circles now repeating ourselves.

The good news is that it works just fine for your use case. You can get rid of emplace now! I'm a bit less lucky, as we're still using an ancient version of D for the bootstrap compiler, grr grr.

you just trivially decided it with no warning or discussion

This is a bit unfair. If Bob writes a proposal, he makes numerous decisions. People then ask for a reference implementation. Bob implements it, discovering that more decisions have to be made. Then the discussion happens. That's normally how things work.

@TurkeyMan
Contributor

TurkeyMan commented Nov 20, 2024

I have proposed the OPTION to accept void[] in ADDITION to void[N]

I avoided that because if T is a void[] then the compiler will have to introduce a special case to have T not be the target but what it refers to being the target.

I don't follow. I don't see how there's any room for special cases here?

Talking about option types and sum types is a red-herring

I use emplace 206 times in the compiler front end. I use a more primitive method extensively in the optimizer and back end. It's also used extensively in Phobos.

How is it used? If you're doing low-level/unsafe hack-ey stuff (like void init, and your unions), then just wrap a principled new in a function you like. The language should present the clearest and most fool-proof spec possible.

The principle is as simple and clear as day: before the new there is no T, after the new, then T has come to exist. new has no business accepting any typed variable as input; confusion and user-error are guaranteed to follow. The input is "memory", which is most appropriately typed as void.

If you want to do some hacky stuff, then just wrap it in a function that does your preferred casts or type puns in whatever way you like; the world is your oyster. Don't wreck the spec for your personal convenience.

If I'm wrong; the spec can be relaxed, but it can never be tightened...

We're just going in circles now repeating ourselves.

It's because I consistently feel casually dismissed, and I always seem to feel that the only way I can be heard eventually is to repeat myself over and over. I'm having flashbacks of that time we needed to mangle C++ symbols with a namespace; that really did my head in!

I don't know how to say this without sounding arrogant, but I know about this stuff better than most, and I am completely confident my spec on rvalue semantics, and placement new/delete (which are both related tools), will lead this aspect of the language to a really great place that people will be excited about. I wish you'd trust me enough to try my proposals verbatim before perverting them. Let them show their merit before casually dismissing them or butchering them into something I'm no longer excited about... you've spent a few days thinking about this stuff, but I've been baking for years or decades, and I also know the landscape; all these semantics are things I've been using for a very long time. I've worked on a broad and varied range of projects and written millions of lines of code in projects where these tools and patterns are present. I know what they're for, where they lead, and how they scale.

We must have gone round in circles on the rvalue conversation no less than 10 times (possibly more!), but we got there eventually... so I'm not sure going around in circles isn't eventually productive.

The good news is that it works just fine for your use case. You can get rid of emplace now! I'm a bit less lucky, as we're still using an ancient version of D for the bootstrap compiler, grr grr.

This is a different matter. 'Fine' is not the benchmark we should be trying to achieve when finally correcting some of the most fundamental aspects of the D spec. This is a once-in-a-lifetime opportunity to get this stuff really right... and from my point of view, there is nothing in the language that needs work more important than these 2 things.

you just trivially decided it with no warning or discussion

This is a bit unfair. If Bob writes a proposal, he makes numerous decisions. People then ask for a reference implementation. Bob implements it, discovering that more decisions have to be made. Then the discussion happens. That's normally how things work.

I don't understand. I see that the other way around. From my point of view; this is my proposal... I'm glad you liked it and ran with it, but there was a lot more to it that I couldn't easily share in the initial conversation we had.
Should I have perceived that it was forked from my initial proposal and I was cut out of it? I'll admit that I was somewhat offended to note my name's nowhere to be seen near the DIP on this. That feels a bit unfair... but perhaps that's the signal where I should have recognised that it's no longer my design.

@TurkeyMan
Contributor

I have proposed the OPTION to accept void[] in ADDITION to void[N]

I avoided that because if T is a void[] then the compiler will have to introduce a special case to have T not be the target but what it refers to being the target.

I don't follow. I don't see how there's any room for special cases here?

I realised what you meant here... this situation naturally occurs when you want to new an array though, so the case still has to be handled?
Why do you see this case as a problem?

@WalterBright
Member Author

I am catching up after 4 days without internet or power. Seattle is a technologically advanced city with a primitive electric grid.

I consistently feel casually dismissed

I'm sorry you got that impression. I've provided a rationale for each decision.

Should I have perceived that it was forked from my initial proposal and I was cut out of it? I'll admit that I was somewhat offended to note my name's nowhere to be seen near the DIP on this. That feels a bit unfair... but perhaps that's the signal where I should have recognised that it's no longer my design.

You did indeed propose new(ptr) T(...) in email along with a couple examples.

The first line of the first post of the DIP in the n.g. on Oct 30: "Based on a suggestion by Manu Evans:"
https://www.digitalmars.com/d/archives/digitalmars/dip/development/First_Draft_Placement_New_Expression_421.html#N421
I updated the DIP Nov 18: https://github.com/WalterBright/documents/blob/master/placementnew.md

The DIP is about passing an lvalue rather than a pointer.

perverting ... butchering

It's a good thing we're friends, Manu!

If you want things verbatim, write the DIP the way you want it. I wrote it the way I thought would work best. I've never seen an idea that survived into a specification without modification, like the replacement of void* with void[]. I've also never seen a specification that survived into an implementation without modification. (I rewrote the DIP after this implementation.)

You've wanted 3 things, and I want them too:

  1. __rvalue
  2. placement new
  3. extended alias

The first two now have implementations and will work for your use cases. The other aspect of placement new works for my use cases. We should be both happy about this.

but there was a lot more to it that I couldn't easily share in the initial conversation we had.

I didn't see the more, other than what I mentioned earlier. I know placement new from C++, having implemented it. I also implemented that in an earlier version of D: https://dlang.org/deprecate.html#Class%20allocators%20and%20deallocators and the implementation code for it is still present (and I made some use of it!).

From my point of view; this is my proposal...

Ok, but you've also made it clear it is not what you proposed :-)

The hinge of our disagreement is placement new cannot be made @safe and trying to make it safe with an unsafe cast doesn't fix it. The implementation won't allow its use in @safe code.

@TurkeyMan
Contributor

The hinge of our disagreement is placement new cannot be made @safe and trying to make it safe with an unsafe cast doesn't fix it. The implementation won't allow its use in @safe code.

Not at all, the hinge of our disagreement is that accepting a T as input is absolutely unacceptable.
Accepting a T by ref is palpably worse, because it further hides the appearance that you're handling memory at all (no pointer in sight) and looks exactly like passing a value.

By writing:

T x;
new(x) T;

There's nothing you could do to make it look more like a value of T is the input to the function, which couldn't be further from the truth; the ONLY valid input is raw uninitialised memory. The only language we have to handle that concept is void.

I think if you were going to design this API, it's not possible to make the API worse than that; it communicates exactly the wrong thing, and I don't think you could inspire misunderstanding and user error more confidently if you tried.

Before new, there IS NO T... I'll die on that hill.
The language should not have any spec that encourages handling invalid values, and I reckon it will also complicate future lifetime analysis.

Yes, I know you can write T x = void and then you want to hand that to placement new; I absolutely want to see the coercion from T to void[N] right in my face, so that there can be no mistake. Anyone who encounters that line of code will stop and ask themselves whether the value they're coercing has definitely been destroyed beforehand, or is still uninitialised.

@nordlow
Contributor

nordlow commented Nov 26, 2024

Could the current semantics of C++'s "Placement new", described in https://en.cppreference.com/w/cpp/language/new, give insights on which type(s) should be allowed as the argument to new()?

@TurkeyMan
Contributor

C++ accepts void*, and it does not perform any bounds checking.

@TurkeyMan
Contributor

The trouble with the function apparently accepting T values, is that people will supply T values! They will get their new object out the other side like they expect, and (depending on the implementation of the constructor/destructor) there's a high chance their code will appear to just work. There's nothing in sight to suggest they might be making a mistake, and chances are they are just silently leaking memory or something like that.
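The hazard can be illustrated with today's library `emplace`, which also accepts a pointer to a typed value. This is a hedged sketch, not code from the PR: `Res` and `openAfterOverwrite` are hypothetical names, and the static counter merely stands in for a real resource such as memory or a file handle.

```d
import core.lifetime : emplace;

// Hypothetical resource wrapper: `open` counts handles that were
// constructed but not yet destroyed.
struct Res
{
    static int open;
    int id;
    this(int id) { this.id = id; ++open; }
    ~this() { --open; }
}

int openAfterOverwrite()
{
    {
        Res r = Res(1);   // first resource acquired
        emplace(&r, 2);   // constructs over a live value; the destructor
                          // for the first resource is never run
    }                     // only one destructor call happens at scope exit
    return Res.open;      // a nonzero result means an acquisition leaked
}
```

Nothing in the call site signals that `r` was already alive, which is the kind of silent mistake described above.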

@TurkeyMan
Contributor

TurkeyMan commented Dec 11, 2024

Oh no!

void f(int* p) @nogc
{
    new(*p) int;
}
error : cannot use `new` in `@nogc` function `main.f`

It seems that placement new is not @nogc, probably a hangover from new?
The expression needs to infer the attributes of the constructor it calls...
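For comparison, the library route already gets this inference, because `core.lifetime.emplace` is a template and picks up its attributes from the constructor it calls. A minimal sketch, assuming a `@nogc` constructor; `P` and `makeNoGC` are hypothetical names, and `malloc` stands in for whatever allocator is in use.

```d
import core.lifetime : emplace;
import core.stdc.stdlib : malloc, free;

struct P
{
    int x;
    this(int x) @nogc nothrow { this.x = x; }
}

// Compiles as @nogc because both malloc and the constructor reached
// through emplace are @nogc.
P* makeNoGC(int v) @nogc
{
    void* mem = malloc(P.sizeof);
    if (mem is null)
        return null;
    return emplace(cast(P*) mem, v);
}
```

If the constructor of `P` were not `@nogc`, `makeNoGC` would be rejected, which is the behaviour one would expect from the placement expression as well.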

@TurkeyMan
Contributor

    override void visit(NewExp e)
    {
+++     if (e.placement)
+++         return; // placement new
        if (e.member && !e.member.isNogc() && f.setGC(e.loc, null))
        {
            // @nogc-ness is already checked in NewExp::semantic
            return;
        }
        if (e.onstack)
            return;
        if (global.params.ehnogc && e.thrownew)
            return;                     // separate allocator is called for this, not the GC

        if (setGC(e, "cannot use `new` in `@nogc` %s `%s`"))
            return;
        f.printGCUsage(e.loc, "`new` causes a GC allocation");
    }

This worked for me locally...

@WalterBright
Member Author

@TurkeyMan Added your fix

@TurkeyMan
Contributor

Is it correct though?

@WalterBright
Member Author

yes

@TurkeyMan
Contributor

shall we merge this too?

@TurkeyMan
Contributor

TurkeyMan commented Dec 16, 2024

Actually, I still vehemently object to the API... can we get some 3rd parties to add their opinions?
@atilaneves ?

@atilaneves
Contributor

Let me talk to @WalterBright first.

Labels
Review:Needs Changelog A changelog entry needs to be added to /changelog Review:Needs Spec PR A PR updating the language specification needs to be submitted to dlang.org Review:Needs Tests Review:WIP Work In Progress - not ready for review or pulling Severity:Enhancement Severity:New Language Feature
8 participants