小编:本文由 WebInfra 团队姚忠孝、杨文明、张地编写。意在通过深入剖析常用的内存分配器的关键实现,以理解虚拟机动态内存管理的设计哲学,并为实现虚拟机高效的内存管理提供指引。
在现代计算机体系结构中,内存是系统核心资源之一。虚拟机( VM )作为运行程序的抽象"计算机",内存管理是其不可或缺的能力,其中主要包括如内存分配、垃圾回收等,而其中内存分配器又是决定"计算机"内存模型,以及高效内存分配和回收的关键组件。

小编:本文由 WebInfra 团队姚忠孝、杨文明、张地编写。意在通过深入剖析常用的内存分配器的关键实现,以理解虚拟机动态内存管理的设计哲学,并为实现虚拟机高效的内存管理提供指引。
struct AndroidConfig { using SizeClassMap = AndroidSizeClassMap; #if SCUDO_CAN_USE_PRIMARY64 // 256MB regions typedef SizeClassAllocator64<SizeClassMap, 28U, 1000, 1000, /*MaySupportMemoryTagging=*/true> Primary; #else // 256KB regions typedef SizeClassAllocator32<SizeClassMap, 18U, 1000, 1000> Primary; #endif // Cache blocks up to 2MB typedef MapAllocator<MapAllocatorCache<32U, 2UL << 20, 0, 1000>> Secondary; template <class A> using TSDRegistryT = TSDRegistrySharedT<A, 2U>; // Shared, max 2 TSDs. };
struct UnpackedHeader { uptr ClassId : 8; u8 State : 2; u8 Origin : 2; uptr SizeOrUnusedBytes : 20; uptr Offset : 16; uptr Checksum : 16; };
struct AndroidSizeClassConfig { static const uptr NumBits = 7; static const uptr MinSizeLog = 4; static const uptr MidSizeLog = 6; static const uptr MaxSizeLog = 16; static const u32 MaxNumCachedHint = 14; static const uptr MaxBytesCachedLog = 13; static constexpr u32 Classes[] = { 0x00020, 0x00030, 0x00040, 0x00050, 0x00060, 0x00070, 0x00090, 0x000b0, 0x000c0, 0x000e0, 0x00120, 0x00160, 0x001c0, 0x00250, 0x00320, 0x00450, 0x00670, 0x00830, 0x00a10, 0x00c30, 0x01010, 0x01210, 0x01bd0, 0x02210, 0x02d90, 0x03790, 0x04010, 0x04810, 0x05a10, 0x07310, 0x08210, 0x10010, }; static const uptr SizeDelta = 16; };
NOINLINE void *allocate(uptr Size, Chunk::Origin Origin, uptr Alignment = MinAlignment, bool ZeroContents = false) { initThreadMaybe(); // 初始化TSD Array 和 Primary SizeClass地址空间 // skip trivials....... // If the requested size happens to be 0 (more common than you might think), // allocate MinAlignment bytes on top of the header. Then add the extra // bytes required to fulfill the alignment requirements: we allocate enough // to be sure that there will be an address in the block that will satisfy // the alignment. const uptr NeededSize = roundUpTo(Size, MinAlignment) + ((Alignment > MinAlignment) ? Alignment : Chunk::getHeaderSize()); // skip trivials....... void *Block = nullptr; uptr ClassId = 0; uptr SecondaryBlockEnd; if (LIKELY(PrimaryT::canAllocate(NeededSize))) { // 从 Primary Allocator分配small object ClassId = SizeClassMap::getClassIdBySize(NeededSize); DCHECK_NE(ClassId, 0U); bool UnlockRequired; auto *TSD = TSDRegistry.getTSDAndLock(&UnlockRequired); Block = TSD->Cache.allocate(ClassId); // If the allocation failed, the most likely reason with a 32-bit primary // is the region being full. In that event, retry in each successively // larger class until it fits. If it fails to fit in the largest class, // fallback to the Secondary. if (UNLIKELY(!Block)) { while (ClassId < SizeClassMap::LargestClassId) { Block = TSD->Cache.allocate(++ClassId); if (LIKELY(Block)) { break; } } if (UNLIKELY(!Block)) { ClassId = 0; } } if (UnlockRequired) TSD->unlock(); } // 如果分配的是大内存,或者Primary 无法分配小内存, // 则直接在Secondary Allocator进行分配 if (UNLIKELY(ClassId == 0)) Block = Secondary.allocate(NeededSize, Alignment, &SecondaryBlockEnd, ZeroContents); // skip trivials....... const uptr BlockUptr = reinterpret_cast<uptr>(Block); const uptr UnalignedUserPtr = BlockUptr + Chunk::getHeaderSize(); const uptr UserPtr = roundUpTo(UnalignedUserPtr, Alignment); void *Ptr = reinterpret_cast<void *>(UserPtr); void *TaggedPtr = Ptr; // skip trivials....... // 根据返回内存地址,设置chunk-header对象数据 Chunk::UnpackedHeader Header = {}; if (UNLIKELY(UnalignedUserPtr != UserPtr)) { const uptr Offset = UserPtr - UnalignedUserPtr; DCHECK_GE(Offset, 2 * sizeof(u32)); // The BlockMarker has no security purpose, but is specifically meant for // the chunk iteration function that can be used in debugging situations. // It is the only situation where we have to locate the start of a chunk // based on its block address. reinterpret_cast<u32 *>(Block)[0] = BlockMarker; reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset); Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask; } Header.ClassId = ClassId & Chunk::ClassIdMask; Header.State = Chunk::State::Allocated; Header.Origin = Origin & Chunk::OriginMask; Header.SizeOrUnusedBytes = (ClassId ? Size : SecondaryBlockEnd - (UserPtr + Size)) & Chunk::SizeOrUnusedBytesMask; // 设置chunk-header,CheckSum用于完整性校验 Chunk::storeHeader(Cookie, Ptr, &Header); // skip trivials....... return TaggedPtr; }
为了缓解多线程内存分配的数据竞争问题,每个线程的数据缓存区( TLS )保存了少量用于常用中小内存分配的 Slots 。因为这些 Slots 是按线程存储的,所以可以在没有锁的情况下分配它们,并且只需要更快的线程本地存储查找,从而提高进程中的缓存局部性。每个线程的缓存已经过定制以满足大多数请求,通过批量分配和释放第二层的内存,分期锁获取,并在不捕获过多内存的情况下进一步改善局部性。
内存分配器
|
说明
|
---|---|
dlmalloc
|
https://github.com/ARMmbed/dlmalloc· 非线程安全,多线程分配效率低· 内存使用率较低(内存碎片多)· 分配效率低· 安全性低
|
jemlloc
|
https://github.com/jemalloc/jemalloc/blob/dev/INSTALL.md· 代码体积较大,元数据占据内存大· 内存分配效率高(基于run(桶) + region)· 内存使用率高· 安全性中
|
scudo
|
https://android.googlesource.com/platform/external/scudo/· 更注重安全性,内存使用效率和性能上存在一定的损失· 内存分配效率偏高(sizeclass + region, 安全性+checksum校验)· 内存使用率低(元数据,安全区,地址随机化内存占用多)· 安全性高( 安全性+checksum校验,地址随机化,保护区)
|
partition-alloc
|
https://source.chromium.org/chromium/chromium/src/+/main:base/allocator/partition_allocator/PartitionAlloc.md· 代码体积小· 内存使用率偏高(相比于jemalloc多保护区,基于range的分配)· 安全性偏高(相比sudo少安全性,完整性校验,地址随机化)· 内存分配效率高(基于bucket + slot,分区特定优化)· 支持"全"平台 PartitionAlloc Everywhere (BlinkOn 14) :https://www.youtube.com/watch?v=QfY-WMFjjKA
|
Allocator
|
QPS (higher is better)
|
Max RSS (lower is better)
|
---|---|---|
tcmalloc (internal)
|
410K
|
357MB
|
jemalloc
|
356K
|
1359MB
|
dlmalloc (glibc)
|
295K
|
333MB
|
mesh
|
142K
|
710MB
|
portableumem
|
24K
|
393MB
|
hardened_malloc*
|
18K
|
458MB
|
guarder
|
FATALERROR**
|
|
freeguard
|
SIGSEGV***
|
|
scudo (standalone)
|
400K
|
318MB
|
评论已关闭。