All pointers stored in the arena are weak #14
In case it helps - I've got what I've been calling a 'poor person's arena' in my AVRO codec, which gets around this problem by allocating data per-type (essentially per-type slabs) and using type information when allocating by calling low-level reflection functions directly.
https://github.com/philpearl/avro/blob/d2f7ca4351f4676a66b6ccb901df2b81643c748b/buffer.go#L134
https://github.com/philpearl/avro/blob/master/unsafetricks.go
yeah, that's what we have to do to make arenas safe in general; ensuring that the right type metadata is attached to every slab's allocation is the only way to make the pointers within them "real". i think it's probably the only way to make this arena usable except in extremely niche cases where all the types are pointer-free, which we can't detect or enforce.

this really calls into question how much faster the arena really is, given that it may end up doing relatively little to reduce the workload on the GC, and how resetting the arena without discarding its slabs affects that. (a major advantage of typing the slabs is that resetting the arena becomes relatively safe, because the memory from the arena will never be doled out aliased with another type; it's just a regular slot in a regular slice, and any still-living pointers into the slab will at worst just see an erased (or replaced) struct value instead of causing UB).

the main advantage: fewer small, real allocations by the allocator that need to be freed. this can mean reduced memory growth per request, exactly equal to how much memory is explicitly allocated from the arena instead of globally.

unfortunately it's difficult to make this include a lot of the allocations that are actually involved. ancillary allocations are hard to move into the arena; string data in particular will live outside unless a mechanism is added to deliberately copy it into the arena somewhere.

regardless, if we're able to explicitly put many, many small allocations into the arena that we would otherwise have put into the global allocator where the memory has to wait for a GC pass to become available again... that can potentially help!
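The typed-slab idea described in this comment can be sketched with generics. This is a minimal illustration, not code from nuke or from the AVRO codec linked above; `TypedSlab`, `NewTypedSlab`, `Alloc`, `Reset`, and `node` are hypothetical names. Because the backing store is an ordinary `[]T`, the garbage collector can trace the pointers stored in each slot, and a reset only ever re-issues a slot as the same type.

```go
package main

import (
	"fmt"
	"runtime"
)

// TypedSlab is a hypothetical per-type slab. Its backing store is a regular
// []T, so the runtime knows where the pointers inside each slot are.
type TypedSlab[T any] struct {
	items []T
}

// NewTypedSlab allocates a slab with room for capacity values of T.
func NewTypedSlab[T any](capacity int) *TypedSlab[T] {
	return &TypedSlab[T]{items: make([]T, 0, capacity)}
}

// Alloc hands out the next free slot, or nil when the slab is full.
// It never reallocates the slice, so earlier pointers stay valid.
func (s *TypedSlab[T]) Alloc() *T {
	if len(s.items) == cap(s.items) {
		return nil
	}
	s.items = s.items[:len(s.items)+1]
	return &s.items[len(s.items)-1]
}

// Reset zeroes the used slots and makes them available again. A stale
// pointer then sees a zeroed or replaced T, never memory of another type.
func (s *TypedSlab[T]) Reset() {
	var zero T
	for i := range s.items {
		s.items[i] = zero
	}
	s.items = s.items[:0]
}

// node holds a pointer to data that lives outside the slab.
type node struct {
	name *string
}

func main() {
	slab := NewTypedSlab[node](1000)
	nodes := make([]*node, 0, 1000)
	for i := 0; i < 1000; i++ {
		s := fmt.Sprintf("node %d", i)
		n := slab.Alloc()
		n.name = &s // the only reference to s lives inside the slab
		nodes = append(nodes, n)
	}
	runtime.GC()
	fmt.Println(*nodes[999].name) // prints "node 999"
}
```

The trade-off is that a full arena needs one such slab (or group of slabs) per type, which is the dispatch problem discussed later in the thread.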
@mumbleskates I'm not sure exactly what you want to say; maybe your example is a little misleading. Below is an updated version of the example above:

    package main

    import (
        "fmt"
        "runtime"
        "unsafe"

        "github.com/ortuman/nuke"
    )

    func main() {
        arena := nuke.NewMonotonicArena(1024*1024, 1024)
        var homes []*string
        for i := 0; i < 1000; i++ {
            s := fmt.Sprintf("this string contains the number %d in the middle", i)
            home := nuke.MakeSlice[byte](arena, len(s), len(s))
            copy(home, s)
            homes = append(homes, (*string)(unsafe.Pointer(&home)))
            if i%10 == 0 {
                runtime.GC()
            }
        }
        // All pointers stored in the arena are weak, and do not keep the data they
        // refer to alive; any pointed-to data may be garbage collected if no "real"
        // pointers to it exist.
        for i, home := range homes {
            fmt.Printf("%d: %q\n", i, *home)
        }
    }

At least at the moment, this prints the correct output in a terminal.
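For comparison, here is one way to copy string bytes into pointer-free memory and build the string value with `unsafe.String` (Go 1.20+) instead of reinterpreting a pointer to a slice header. This is a sketch that does not use nuke; `byteSlab` and `copyString` are hypothetical names, and a plain `[]byte` stands in for an arena slab.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// byteSlab is a hypothetical bump allocator over a single []byte.
type byteSlab struct {
	buf []byte
	off int
}

// copyString copies s into the slab and returns a string that views the
// copy. ok is false when the slab has no room left.
func (b *byteSlab) copyString(s string) (out string, ok bool) {
	if len(s) > len(b.buf)-b.off {
		return "", false
	}
	if len(s) == 0 {
		return "", true
	}
	dst := b.buf[b.off : b.off+len(s)]
	copy(dst, s)
	b.off += len(s)
	// The returned string aliases slab memory: it must not be used after
	// the slab is reset and its bytes are overwritten.
	return unsafe.String(&dst[0], len(s)), true
}

func main() {
	slab := &byteSlab{buf: make([]byte, 1<<16)}
	var homes []string
	for i := 0; i < 1000; i++ {
		s, ok := slab.copyString(fmt.Sprintf("this string contains the number %d in the middle", i))
		if !ok {
			panic("slab full")
		}
		homes = append(homes, s)
		if i%10 == 0 {
			runtime.GC()
		}
	}
	fmt.Printf("%q\n", homes[999])
}
```

The string headers here live in a normal, typed `[]string`, so the collector sees real pointers into the slab and keeps it alive. The bytes themselves contain no pointers, which is the case where an untyped slab is safe.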
In general, if you have such questions in Go, then you should be aware of the Go memory model and GC mechanics, and that unsafe is unsafe. Go's GC is not smart enough to guess what magic is going on when you touch "unsafe". Also, Go's arenas are similar to "placement new" in C++. To use them you should know a little more than a regular Go user, because arenas and custom allocators are not so simple.
@mumbleskates I have more rudeness for you. You do not put strings into memory managed by the arena. If you want to have a string managed by the arena, you must allocate memory for that string inside the arena, so you should use the "make slice" method. Go's memory model does not have pointer arithmetic like in C. A lot of related features from C do not work in Go. The arena is not aware of what Go's GC does. Go's GC is not aware of what you do in the arena. Your original example does something like the following:
in result you have:
so what do you want to get to according to above? That's why I've provided "correct" example of your code. Before using Go's
Go's And a bit about the next line: this is what Go's if you are aware of Go's memory, long story short: the |
This is entirely incorrect. If you read it closely you can see what it does, rather than guessing:
at no point is the string data produced by Once again, I am not using the
yes, this is what makes using this arena deeply unsafe and hazardous except in extremely limited use cases. Writing an arena for only raw byte and integer data is one thing, but there are no amenities to make that easier here, and nothing at all in any of the documentation to warn of this pointer-weakening hazard. if the slabs for potentially pointer-bearing types are deliberately typed upon allocation instead, this hazard disappears without removing the basic utility or advantages of the arena as it is today. If your response to all this is along the same lines you've already expressed, that you believe weaklings and losers who can't write exactly correct code and know precisely what will and won't be safe at all times should not use the library, I'm left asking who appointed you as the monarch of this repository that you don't own. |
I think @AnyCPU has a point - if you ensure all allocations in the arena only point to other allocations in the arena then you're fine. This should be a reasonable constraint for something like a serialisation decoder, so I think there are reasonable use-cases (and indeed if I'd twigged that for my AVRO decoder I might have made my poor arena simpler). It should also be fine for something like a JSON decoder - everything the JSON decoder allocates can come from the arena.

In terms of whether an arena approach is useful or performant if you need to do the things I did - yes, it can be very helpful. I built the AVRO decoder to handle generating ML features from data with very wide rows with lots of strings. Each row would generate a lot of garbage that could all be discarded once the row was processed. Before adding an arena approach it was spending > 95% of CPU in garbage collection. After adding the arenas garbage collection was very low, and a process that IIRC took ~20 hours completed in more like 20 minutes.

But yes, arena approaches can be very dangerous and require extremely careful programming. The best you can hope is that when you discard the arena that accidentally referenced memory becomes invalid quickly and causes exceptions quickly. I'd hoped the Go arena was going to be an improvement on my efforts in this regard. In the meantime I'm having reasonable success just being extremely careful!
Oh, I should say the key to making things performant is re-using arenas, and having arenas that gradually right-size to the task in hand.
And also, to be clear, I no longer think that this is a bug - at best it's a documentation issue.
i think if they'd bothered to say it the way you did at any point, instead of saying a lot of immediately disprovable and contradictory things about the demonstrating example and what it does and then producing a hazardous and confusing example of their own, then they would have had a point.
there's definitely an argument for this. I believe it's pretty clear something needs to happen either way: it's very, very involved in the inner workings of the (current) runtime to even realize that this happens.

in my opinion, either there needs to be clear and up front documentation that this is a double ended chainsaw footgun in a language that does nothing to help make it safe; or there could be handling for this (like a switch case that creates typed slabs for anything that isn't a non-pointer primitive, a slice of such, or the user explicitly specifies the type is free of external pointers).

i would lobby for typed slabs at least as a longer-term feature goal, as an allocation saved is still an allocation saved even if it points to another allocation that isn't. the pointed-to memory is already on the heap; the best we could do by copying it into the arena is prevent the allocation from graduating to an older generation.

i don't think finding the typed slabs this way would be terrible as allocating memory within the arena already has a cost approximately linear to the number of slabs, so there are already other fixes to be made.
@mumbleskates i see you want to get another one portion of the rudeness, ok.
this is true
who said that? did you read links above?
who said that? it should not. GC does not think, it knows.
what do you mean when you say the string struct? a string in Go is synthetic sugar.
I bet you just do not understand what is going under the hood of the arena and what is going on in that case when you do
if you do not care, that is your trouble. the arena and GC simply do their job; if you misuse it, if you do not read the provided docs, if you do not read what memory arenas are, nobody cares.
once again I can show Go's source code from official repo without any praying.
it is mentioned a lot of times, the arenas and unsafe are not for everyone.
you have just appointed me as the monarch, this is a paradox. you have a wall, a wall has a socket, you also have two fingers, you are able to put your finger into your socket.
@philpearl you are right. of course it would be nice to add more examples of how to use this library, and more docs too. now i understand why exactly Go's team does not want to release their arena officially.
@philpearl maybe it is interesting topic for you: memory ballasts. |
i said that. so does the script i submitted in the original issue. that is factually what it does. if i am to give you the benefit of the doubt, i can only assume that either you did not read it at all or you suffer from a basic misunderstanding of the data types involved.
i said that. it is furthermore a known fact of the basic mechanism of go's garbage collector. heap allocations (and i believe stack frames) have associated metadata about the types contained within them, allowing pointers to be traversed by the garbage collector.
strings in golang are not merely sugar, syntactic or otherwise. you can learn more about this by reading the documentation for string types in general, noticing that at this point it is clear that you have a very fuzzy idea of what is going on in any program that is even slightly tricky with its memory management. I strongly recommend you stick to safe golang for the time being. |
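The point about string values can be checked directly. In the current gc toolchain a string is a two-word header (data pointer plus length), and assigning a string copies only that header. The sketch below uses `unsafe.StringData` (Go 1.20+); `sharesData` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sharesData reports whether a and b have the same length and view the
// same underlying bytes.
func sharesData(a, b string) bool {
	return len(a) == len(b) && unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	a := fmt.Sprintf("number %d", 42) // built at run time
	b := a                            // copies the header, not the bytes

	// A string value occupies two machine words regardless of its length.
	fmt.Println(unsafe.Sizeof(a) == 2*unsafe.Sizeof(uintptr(0)))
	fmt.Println(sharesData(a, b))
}
```

This is why storing a string value in untyped arena memory only stores the header: the bytes it points to stay wherever they were allocated, and nothing in the arena keeps them alive.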
no, it does not. yes, on this line:
you effectively reserve 16 bytes in the arena. on this line:
you try to copy the whole string, which is larger than the reserved space. at this point the arena knows nothing about what you are trying to do. Go's GC also knows nothing about this. after that you stress Go's GC. the exact behavior at this point is not covered by the Go spec. in real life the underlying memory will contain trash. because you do not keep the whole string in memory managed by the arena, you will always have troubles, and it does not matter what arena you use. yes, actually you can keep pointers in the arena, but they will kept as on
no, it does not work this way, but i would recommend you to update a mirror. (c) sincerely emperor. |
@mumbleskates as always, you can just not write.
apologies, i didn't realize until now that blocking could prevent further off-topic comments. that's my bad |
@philpearl i've been toying with this general idea; were you able to quantify the improvements from using the runtime type functions directly? how's this compare to |
TBH I can't fully remember. The reflect package has a tendency to allocate unnecessarily, but I've a feeling some of those issues may have been fixed recently. I think if I started fresh I'd perhaps see what could be done directly with the reflect package or even generics, and only switch to the deeply unsafe stuff if the performance wasn't fabulous. |
@philpearl That's what I'm leaning towards yeah. from what i can see the current reflect type mechanism just cobbles together an interface object pointing to the static type info struct that those super unsafe functions give you. The larger challenge certainly seems to be in maintaining usable safety:
all this is doable, it's just a lot of iterating trying to find something that actually works! |
i have a branch with:
overall, the POD path is maybe 15% slower than the monotonic arena (not a significant cost really), with the advantages of automatically growing (with exponential growth, so amortized O(1) cost to allocate) and skipping over slabs that are already full (which may degenerate in the monotonic arena implementation) and automatically shrinking over time if most of the arena is unused (shrinking incrementally if the slab group was less than 1/4 used in that cycle).

i was not able to get any kind of lite introspection to operate any better than just using

Shaving nanoseconds that nobody is likely to notice aside, the implementation still completely saves all the outside allocations, as was the original goal of an arena. (notably, here, there was actually an aborted attempt to make userland arenas recently.)

Several choices remain for this, including making individual typechecked allocations maybe 50% slower with the extra logic but making the type dispatch even more automatic, memoizing the full inspection of each type seen into something like a global

Major improvements other than that would look like a
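The growth and shrink policy described in this comment can be sketched for the pointer-free (POD) case as follows. This is not the code from the branch mentioned above, which is not shown in the thread; `slabGroup`, `firstSlabSize`, and the exact thresholds are assumptions chosen to match the description: each new slab doubles in size, full slabs are never revisited within a cycle, and a reset drops the largest slab when less than a quarter of the capacity was used.

```go
package main

import "fmt"

// firstSlabSize is an arbitrary starting size for this sketch.
const firstSlabSize = 64

// slabGroup is a hypothetical growable arena for pointer-free data.
type slabGroup struct {
	slabs [][]byte // each slab is at least twice the size of the previous
	cur   int      // index of the slab currently being filled
	off   int      // bytes used in slabs[cur]
	used  int      // bytes handed out since the last Reset
}

// capacity returns the total size of all slabs.
func (g *slabGroup) capacity() int {
	total := 0
	for _, s := range g.slabs {
		total += len(s)
	}
	return total
}

// Alloc returns n zeroed bytes, growing the group when needed.
// n must be non-negative.
func (g *slabGroup) Alloc(n int) []byte {
	for {
		if g.cur < len(g.slabs) {
			slab := g.slabs[g.cur]
			if n <= len(slab)-g.off {
				p := slab[g.off : g.off+n : g.off+n]
				for i := range p {
					p[i] = 0 // slabs are reused, so clear stale contents
				}
				g.off += n
				g.used += n
				return p
			}
			// Treat this slab as full and never revisit it this cycle.
			g.cur++
			g.off = 0
			continue
		}
		// Exponential growth keeps the number of slabs logarithmic.
		size := firstSlabSize
		if len(g.slabs) > 0 {
			size = 2 * len(g.slabs[len(g.slabs)-1])
		}
		for size < n {
			size *= 2
		}
		g.slabs = append(g.slabs, make([]byte, size))
	}
}

// Reset makes all slabs available again. If less than a quarter of the
// capacity was used this cycle, the largest slab is released.
func (g *slabGroup) Reset() {
	if len(g.slabs) > 1 && g.used < g.capacity()/4 {
		g.slabs[len(g.slabs)-1] = nil
		g.slabs = g.slabs[:len(g.slabs)-1]
	}
	g.cur, g.off, g.used = 0, 0, 0
}

func main() {
	g := &slabGroup{}
	g.Alloc(10)
	g.Alloc(100) // does not fit in the first slab, so a 128-byte slab is added
	fmt.Println(len(g.slabs), g.capacity())
	g.Reset()
}
```

Skipping a slab wastes whatever space was left in it, which is the price of keeping allocation cost independent of the number of slabs. Memory returned by `Alloc` is only valid until the next `Reset`.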
If any type containing a pointer is stored in the arena, the data it points to may be garbage collected if no other pointers exist because the GC cannot know about those pointers as the slabs are allocated without type information. This leads to UB when those pointers are dereferenced.
https://go.dev/play/p/r4YNthSmv-4