Arena is an experimental package in the Go standard library, proposed in #51317. It aims to give users finer-grained, manual control over memory allocation and deallocation, so that less work falls to the GC.
The arena implementation provides an Arena object from which users can allocate memory and which they can free manually. Once the package stabilizes, Go applications and libraries can adopt it where appropriate to further reduce GC overhead.
One use case for an arena: suppose there is a tree data structure in which every node is a heap object. Each node can be allocated from the Arena, and when the entire tree is no longer needed, all node objects can be released at once through the Arena's free interface (since they are allocated one after another from the same Arena, the nodes' memory is largely contiguous, which makes releasing them together cheap).
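To make this concrete, here is a minimal sketch of the tree use case (it assumes Go 1.20+ with GOEXPERIMENT=arenas; the node type and tree depth are made up for illustration):
package main

import (
	"arena"
	"fmt"
)

// node is a made-up tree node type for illustration.
type node struct {
	val         int
	left, right *node
}

// build allocates every node of a small complete tree out of the arena.
func build(a *arena.Arena, depth int) *node {
	if depth == 0 {
		return nil
	}
	n := arena.New[node](a)
	n.val = depth
	n.left = build(a, depth-1)
	n.right = build(a, depth-1)
	return n
}

func main() {
	a := arena.NewArena()
	root := build(a, 10) // all nodes live in the arena's chunks
	fmt.Println(root.val, root.left.val)
	// Once the whole tree is no longer needed, release every node at once.
	// No node may be accessed after this point.
	a.Free()
}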
I am neither fond of drawing diagrams nor good at lengthy textual descriptions, so below I analyze the implementation with commented code, in a somewhat dry style.
Arena Usage Example
The minimum Go version for using arena is 1.20, and the arenas GOEXPERIMENT must be enabled:
$ go env -w GOEXPERIMENT='arenas'
Let's write a test to see how to use the arena package and some precautions.
func Test_arena(t *testing.T) {
	ar := arena.NewArena() // Create an arena object

	// Use arena.MakeSlice to create a slice with the given len and cap,
	// similar to the native make([]T, len, cap).
	s1 := arena.MakeSlice[int](ar, 0, 8)
	s1 = append(s1, 1, 2, 3, 4, 5)
	t.Log(s1) // [1 2 3 4 5]

	// Use arena.Clone to shallow-copy a value; the copy is no longer bound
	// to any arena it may have been allocated from.
	s2 := []string{"🐱", "🐭"}
	s3 := arena.Clone(s2)
	t.Log(s3) // [🐱 🐭]

	// Use arena.New to create an object, similar to the native new, returning *T.
	s4 := arena.New[map[int]int](ar)
	t.Log((*s4)[0]) // 0

	// If this append were removed, accessing s1 after Free would crash.
	// The append exceeds cap, so growslice kicks in, and s1 now points at
	// memory allocated by the regular mallocgc, no longer managed by the arena.
	s1 = append(s1, 6, 7, 8, 9)

	// Use Free to release all memory allocated from the arena.
	ar.Free()

	// Because growslice moved s1 off the arena, accessing it does not crash.
	t.Log(s1)

	// ar has been freed and its memory region is marked inaccessible
	// (see the implementation source code later), so this crashes:
	// accessed data from freed user arena 0xc0047fffc0
	// fatal error: fault
	// [signal SIGSEGV: segmentation violation code=0x2 addr=0xc004000000 pc=0x1133494]
	t.Log((*s4)[0])
}
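A practical consequence of that last crash: anything that must outlive Free has to be copied off the arena first, which is exactly what arena.Clone is for. A small sketch (the variable names are mine):
package main

import (
	"arena"
	"fmt"
)

func main() {
	ar := arena.NewArena()
	s := arena.MakeSlice[int](ar, 3, 3)
	s[0], s[1], s[2] = 1, 2, 3

	// Clone makes a shallow copy that is no longer bound to the arena,
	// so the copy stays valid after Free.
	kept := arena.Clone(s)

	ar.Free()
	fmt.Println(kept) // [1 2 3]; accessing s here instead would fault
}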
Arena Implementation Source Code Analysis
When reading the code, I tend to first envision a usage scenario and then read the corresponding implementation according to the flow:
- First create an arena
- The arena first requests space (each time requesting a segment of space)
- The arena allocates object 1, allocates object 2 ... until space runs out
- Request space again to allocate new objects
- Release the arena (in most cases, there should be caching here)
- Create another arena or request space again (may take some things from the cache)
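To keep this flow in mind while reading the runtime code, here is a small driver that exercises each step (a sketch assuming GOEXPERIMENT=arenas; the 64 KiB block size and the loop count are arbitrary, chosen only so the allocations overflow the first chunk):
package main

import (
	"arena"
	"fmt"
)

// block is an arbitrary 64 KiB payload, sized so that a couple hundred of
// them overflow one arenaChunk and force the arena to request more space.
type block [64 << 10]byte

func main() {
	ar := arena.NewArena() // create the arena; the first chunk is requested right away
	for i := 0; i < 200; i++ {
		_ = arena.New[block](ar) // allocate until the active chunk runs out, then a new chunk is requested
	}
	ar.Free() // release everything at once; chunks may be cached for reuse

	ar2 := arena.NewArena() // a new arena (or a later refill) may pick a chunk out of that cache
	fmt.Println(*arena.New[int](ar2))
	ar2.Free()
}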
The source code analysis below is closely tied to Go's memory management; it assumes a basic understanding of Go's memory allocator (mspan, mheap, etc.) and of the GC process.
From Standard Library to Runtime
The standard library arena package is only a thin wrapper: its functions are linked via go:linkname to the implementation in src/runtime/arena.go, so we can read arena.go directly.
func newUserArena() *userArena // Create arena object
func (a *userArena) new(typ *_type) unsafe.Pointer
func (a *userArena) slice(sl any, cap int)
func (a *userArena) free()
// ...
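Roughly speaking, each call on the public API ends up in one of these runtime entry points. The mapping below is my reading of the wrapper, shown as an ordinary program rather than quoted from the wrapper's source:
package main

import "arena"

func main() {
	ar := arena.NewArena()                // -> newUserArena()
	p := arena.New[int](ar)               // -> (*userArena).new(typ)
	s := arena.MakeSlice[byte](ar, 0, 16) // -> (*userArena).slice(sl, cap)
	*p = cap(s)
	println(*p) // 16
	ar.Free()                             // -> (*userArena).free()
}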
Runtime Structure of Arena
An arenaChunk is a segment of heap memory allocated from mheap and managed by an mspan (with spanClass = 0).
type userArena struct {
	fullList *mspan // Linked list of chunks (mspans) that are already full
	active *mspan // The chunk (mspan) currently being allocated from
	refs []unsafe.Pointer // Base addresses of all arenaChunks (mspans)
	// The last entry always refers to the active mspan; the rest are on fullList
	defunct atomic.Bool // Marks whether this userArena has been freed
}
Creating Arena
func newUserArena() *userArena {
a := new(userArena)
SetFinalizer(a, func(a *userArena) {
// If the arena handle is dropped without an explicit free, the finalizer frees it, so its chunks are eventually reclaimed
a.free()
})
a.refill() // Request a segment of space when created
return a
}
Filling Arena Space
func (a *userArena) refill() *mspan {
// The first refill will definitely have s empty
s := a.active
var x unsafe.Pointer
// Not the first refill
if s != nil {
// If the old active is not empty, put it into fullList
// Then place the newly requested mspan into active
s.next = a.fullList
a.fullList = s
a.active = nil
s = nil
}
// Get arenaChunk from the global reuse list, directly take the last one
// This list is populated when the arena is freed later
lock(&userArenaState.lock)
if len(userArenaState.reuse) > 0 {
// Pick off the last arena chunk from the list.
n := len(userArenaState.reuse) - 1
x = userArenaState.reuse[n].x
s = userArenaState.reuse[n].mspan
userArenaState.reuse[n].x = nil
userArenaState.reuse[n].mspan = nil
userArenaState.reuse = userArenaState.reuse[:n]
}
unlock(&userArenaState.lock)
if s == nil {
// Request a new arenaChunk, actually managed by mspan
x, s = newUserArenaChunk()
}
// Save the base address of the new arenaChunk (mspan) in refs
a.refs = append(a.refs, x)
// Place the most recently requested mspan into active
// But do not add it to fullList
a.active = s
return s
}
// Create a new arenaChunk, still using mspan to manage memory
func newUserArenaChunk() (unsafe.Pointer, *mspan) {
// userArena also needs to add credit to assist GC
deductAssistCredit(userArenaChunkBytes)
var span *mspan
systemstack(func() {
// Use mheap to obtain a userArena
span = mheap_.allocUserArenaChunk()
})
// The returned x is the base address of a segment of heap space managed by mspan (address of the first byte)
x := unsafe.Pointer(span.base())
// If requested during GC, directly mark the object as black (since it's all empty, no need to scan)
if gcphase != _GCoff {
gcmarknewobject(span, span.base(), span.elemsize)
}
// Memory sampling related...
// The allocation grows the heap, so test whether a GC cycle should be started
if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
gcStart(t)
}
return x, span
}
// Use mheap to request arenaChunk
func (h *mheap) allocUserArenaChunk() *mspan {
var s *mspan
var base uintptr
lock(&h.lock)
if !h.userArena.readyList.isEmpty() {
// First check the free list readyList doubly linked list
s = h.userArena.readyList.first
h.userArena.readyList.remove(s)
base = s.base()
} else {
// Request a new arena
hintList := &h.userArena.arenaHints
// This part is more complex, aligning sizes, requesting space from the OS
// and recording arena's metadata on mheap, etc.
v, size := h.sysAlloc(userArenaChunkBytes, hintList, false)
// If the obtained size is larger than requested
// then split the remaining part into the readyList for future use
// The returned size is still userArenaChunkBytes
if size > userArenaChunkBytes {
for i := uintptr(userArenaChunkBytes); i < size; i += userArenaChunkBytes {
s := h.allocMSpanLocked()
s.init(uintptr(v)+i, userArenaChunkPages)
h.userArena.readyList.insertBack(s)
}
size = userArenaChunkBytes
}
base = uintptr(v)
}
unlock(&h.lock)
// sysAlloc returns address space in the Reserved state; sysMap transitions it to Prepared (actually mapped) so it can be used
sysMap(unsafe.Pointer(base), userArenaChunkBytes, &gcController.heapReleased)
// Create an mspan to manage the allocated space, here spanclass is 0
spc := makeSpanClass(0, false)
h.initSpan(s, spanAllocHeap, spc, base, userArenaChunkPages)
s.isUserArenaChunk = true
// GC and heap profile data statistics...
// Place mspan into mcentral's full spanSet
// Mark as having no available space (this space can only be used by the userArena owner)
h.central[spc].mcentral.fullSwept(h.sweepgen).push(s)
s.limit = s.base() + userArenaChunkBytes
s.freeindex = 1
s.allocCount = 1
// Clear mspan's bitmap on mheap
s.initHeapBits(true)
// Zero the whole region up front. A side effect: writing through the region sequentially
// faults in every page, which makes it more likely that Linux backs it with huge pages
memclrNoHeapPointers(unsafe.Pointer(s.base()), s.elemsize)
s.needzero = 0
s.freeIndexForScan = 1
return s
}
Creating Objects
Taking MakeSlice to create a slice as an example.
func (a *userArena) slice(sl any, cap int) {
// Get the type of the slice itself
i := efaceOf(&sl)
typ := i._type
// Get the type of slice elements
typ = (*slicetype)(unsafe.Pointer(typ)).Elem
// Some typekind checks are omitted
// Use alloc to get the memory, then write a slice header {data, len, cap} conforming to the slice ABI directly through the interface's data pointer
*((*slice)(i.data)) = slice{a.alloc(typ, cap), cap, cap}
}
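Before moving on to alloc, the header trick above, writing a slice header straight through the interface's data pointer, can be mimicked in user code. A toy illustration; sliceHeader here is my own local copy of the {data, len, cap} layout, not a runtime type:
package main

import (
	"fmt"
	"unsafe"
)

// sliceHeader mirrors the runtime's slice layout {data, len, cap}, for illustration only.
type sliceHeader struct {
	data unsafe.Pointer
	len  int
	cap  int
}

func main() {
	var s []int
	// Pretend buf is memory handed out by an arena chunk.
	buf := make([]int, 4)
	// Overwrite the slice header in place, just like *(*slice)(i.data) = slice{...} above.
	*(*sliceHeader)(unsafe.Pointer(&s)) = sliceHeader{
		data: unsafe.Pointer(&buf[0]),
		len:  4,
		cap:  4,
	}
	s[0] = 42
	fmt.Println(s, len(s), cap(s)) // [42 0 0 0] 4 4
}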
func (a *userArena) alloc(typ *_type, cap int) unsafe.Pointer {
s := a.active
var x unsafe.Pointer
for {
// Start trying to allocate space from the current active mspan
x = s.userArenaNextFree(typ, cap)
if x != nil {
break
}
// If mspan is insufficient for allocation, request a new arenaChunk
// If the requested size exceeds one arenaChunk, it will fallback to mallocgc
// A new arenaChunk space will definitely be sufficient for allocation
// During the next refill, the previous active will be placed into fullList
// Then the latest mspan will be placed into active
s = a.refill()
}
return x
}
func (s *mspan) userArenaNextFree(typ *_type, cap int) unsafe.Pointer {
// ...
// Calculate the required size based on typesize and cap
// If size exceeds userArenaChunkMaxAllocBytes, it can only be created via mallocgc
// Because separate arenaChunks are not contiguous with one another, an object larger than this limit cannot span multiple chunks
// Currently userArenaChunkMaxAllocBytes is a quarter of the 8 MB chunk size, which bounds worst-case fragmentation of a chunk
if size > userArenaChunkMaxAllocBytes {
if cap >= 0 {
return newarray(typ, cap)
}
return newobject(typ)
}
// typ.PtrBytes indicates how many bytes of this type may contain pointers
// The calculation uses the offset of the last pointer field plus the size of the pointer
// https://github.com/golang/go/blob/master/src/reflect/type.go#L2618
// This value will be used when GC scans the structure; if PtrBytes is zero, it indicates this type does not contain pointers
// takeFromBack means allocating space from the tail end of the arenaChunk
// takeFromFront means allocating space from the head of the arenaChunk
// For types without pointers, allocate from back to front; otherwise, from front to back
// This keeps all pointer-bearing objects together at the front, so GC scanning only has to cover the front portion of the chunk and can skip the pointer-free tail entirely
var ptr unsafe.Pointer
if typ.PtrBytes == 0 {
// Allocate pointer-less objects from the tail end of the chunk.
v, ok := s.userArenaChunkFree.takeFromBack(size, typ.Align_)
if ok {
ptr = unsafe.Pointer(v)
}
} else {
v, ok := s.userArenaChunkFree.takeFromFront(size, typ.Align_)
if ok {
ptr = unsafe.Pointer(v)
}
}
if ptr == nil {
// The current active arenaChunk (mspan) space is insufficient, return nil to let the upper layer create a new arenaChunk
return nil
}
// For types with pointers, need to mark on the corresponding mheap's bitmap
// If it's a slice type (cap >= 0), need to mark each element
if typ.PtrBytes != 0 {
if cap >= 0 {
userArenaHeapBitsSetSliceType(typ, cap, ptr, s.base())
} else {
userArenaHeapBitsSetType(typ, ptr, s.base())
}
c := getMCache(mp) // mp is the current M; how it is obtained is elided in this excerpt
if cap > 0 {
// If it's a slice type, only the last element's last segment of pointer-less fields does not need to be scanned
// [{*int, int}, {*int, int}, {*int, int}]
// In this case, only the last int does not need to be scanned
c.scanAlloc += size - (typ.Size_ - typ.PtrBytes)
} else {
// If it's a single type, only PtrBytes needs to be counted
// {int, *int, int, int}
// In this case, the last two ints do not need to be scanned
c.scanAlloc += typ.PtrBytes
}
}
// Ensure the object is initialized and the heap bitmap is set before being observed by GC
// On some weakly ordered machines, there may be inconsistent behavior due to reordering
// So a store/store barrier is added here
publicationBarrier()
return ptr
}
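The front/back split is essentially a bump allocator working from both ends of a single chunk. To make the idea concrete, here is a standalone toy version; it is not the runtime's addrRange-based code, the alignment handling is simplified, and offsets stand in for real addresses:
package main

import "fmt"

// chunk is a toy two-ended bump allocator over one contiguous region.
// Pointer-bearing objects are carved off the front, pointer-free ones off
// the back, so the part that needs GC scanning stays contiguous at the front.
type chunk struct {
	buf         []byte
	front, back int // front grows up, back grows down; free space is buf[front:back]
}

func alignUp(n, a int) int { return (n + a - 1) &^ (a - 1) }

func (c *chunk) takeFromFront(size, align int) (int, bool) {
	start := alignUp(c.front, align)
	if start+size > c.back {
		return 0, false // not enough room; the caller must refill with a new chunk
	}
	c.front = start + size
	return start, true
}

func (c *chunk) takeFromBack(size, align int) (int, bool) {
	start := (c.back - size) &^ (align - 1) // align downward
	if start < c.front {
		return 0, false
	}
	c.back = start
	return start, true
}

func main() {
	c := &chunk{buf: make([]byte, 1<<10), back: 1 << 10}
	p1, _ := c.takeFromFront(24, 8) // e.g. a struct containing pointers
	p2, _ := c.takeFromBack(64, 8)  // e.g. a [64]byte with no pointers
	fmt.Println(p1, p2, c.front, c.back) // 0 960 24 960
}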
Releasing Arena
func (a *userArena) free() {
// The spans on fullList correspond (in reverse order) to the addresses in refs,
// excluding refs[len(refs)-1], which corresponds to the active chunk
s := a.fullList
i := len(a.refs) - 2
for s != nil {
a.fullList = s.next
s.next = nil
freeUserArenaChunk(s, a.refs[i])
s = a.fullList
i--
}
// Place the active object into the reuse list
s = a.active
if s != nil {
lock(&userArenaState.lock)
userArenaState.reuse = append(userArenaState.reuse, liveUserArenaChunk{s, a.refs[len(a.refs)-1]})
unlock(&userArenaState.lock)
}
a.active = nil
a.refs = nil
}
func freeUserArenaChunk(s *mspan, x unsafe.Pointer) {
if gcphase == _GCoff {
// Outside of GC, set this chunk to fault, along with any chunks queued on the global fault list
lock(&userArenaState.lock)
faultList := userArenaState.fault
userArenaState.fault = nil
unlock(&userArenaState.lock)
// The setUserArenaChunkToFault function mainly sets the mmap region of the arenaChunk to be inaccessible
// This way, if access is attempted after free, it will crash
// And it will also place mspan into quarantine for GC worker sweep
s.setUserArenaChunkToFault()
for _, lc := range faultList {
lc.mspan.setUserArenaChunkToFault()
}
// Keep the chunk pointers alive until this point so the spans are not reclaimed before they have been set to fault
// Only after GC sweeps them are the corresponding mspans moved to the readyList for reuse
KeepAlive(x)
KeepAlive(faultList)
} else {
// During GC, directly place the corresponding arenaChunk mspan into the fault list
// Wait for the next freeUserArenaChunk and _GCoff to clean it up
lock(&userArenaState.lock)
userArenaState.fault = append(userArenaState.fault, liveUserArenaChunk{s, x})
unlock(&userArenaState.lock)
}
}
Some Thoughts
- The current arena implementation is still quite rough: shared state such as userArenaState lives in global variables guarded by global locks.
- The arenaChunk size is also worth noting; it is currently 8 MB. Since an arena's free does not immediately return memory to the allocator or the OS, the chunks still have to wait for a GC sweep. If arenas are created carelessly, each one holds at least one 8 MB arenaChunk, which can easily blow up the heap.
- Space waste: allocations close to the per-object cap (userArenaChunkMaxAllocBytes, a quarter of the chunk size) can leave the tail of a chunk unused; the cap exists precisely to bound this worst-case fragmentation. Whether it matters depends on the usage scenario, and it is uncertain whether the arenaChunk size will become configurable in the future.
- Performance: in my testing the improvement was not as large as expected. That may well be a problem with my benchmark code, so I won't post it here; measuring your own concrete scenario will be more meaningful.