一个无竞争的缓存

技术分享 3周前 (04-30) 0 999+

关注

一个无竞争的缓存

otter是一个无竞争的缓存，在相关的性能测试中表项突出。otter的原理基于如下论文：

Cache定义

Cache的定义如下，其主要的组件包括：

hashmap：保存全部缓存数据
policy(s3-FIFO)：这是一个驱逐策略。当在hashmap中添加一个数据时，会同时将该数据添加到s3-FIFO中，若此时s3-FIFO驱逐出了老的数据，则需要同时删除hashmap中的对应数据。因此hashmap中的数据内容受限于s3-FIFO，hashmap和s3-FIFO中的数据是以最终一致的方式呈现的。
readBuffers：是一个缓存之上的缓存，其数据空间是较小且固定。用于找出热点数据，并增加热点数据的使用频率(freq)，以辅助实现s3-FIFO驱逐策略。
expiryPolicy：数据的缓存策略，支持固定TTL、可变TTL以及无过期方式。通过一个名为的cleanup 的goroutine来定期清理过期数据。
writeBuffer：这是一个事件队列，haspmap的增删改操作会将数据变更事件push到writeBuffer中，再由单独的goroutine异步处理这些事件，以保证hashmap、s3-FIFO和expiryPolicy的数据一致性。

otter将大部分存储的大小都设置为2的幂，这样实现的好处有两点：

在进行存储大小调整时，方便通过移位操作进行扩缩容

通过位与操作可以方便找到ring buffer中的数据位置:

func RoundUpPowerOf2(v uint32) uint32 { 	if v == 0 { 		return 1 	} 	v-- 	v |= v >> 1 	v |= v >> 2 	v |= v >> 4 	v |= v >> 8 	v |= v >> 16 	v++ 	return v } func main() {   var capacity uint32 = 5 //定义buffer容量 	var bufferHead uint32 	t := RoundUpPowerOf2(capacity) //将buffer容量转换为向上取2的幂 	mask := t - 1 //获取掩码 	buffer := make([]int, t)  	head := atomic.LoadUint32(&bufferHead) 	buffer[head&mask] = 100 //获取下一个数据位置，并保存数据 	atomic.AddUint32(&bufferHead, 1) //下一个数据位置+1 }

在Cache中有一个锁evictionMutex，并发访问竞争中，仅用于变更从readBuffers中返回的热点数据的freq，因此对并发访问竞争的影响很小。

type Cache[K comparable, V any] struct {    nodeManager      *node.Manager[K, V]    hashmap          *hashtable.Map[K, V] //hashmap    policy           *s3fifo.Policy[K, V] //s3-FIFO    expiryPolicy     expiryPolicy[K, V] //expiryPolicy    stats            *stats.Stats    readBuffers      []*lossy.Buffer[K, V] //readBuffers    writeBuffer      *queue.Growable[task[K, V]] //writeBuffer    evictionMutex    sync.Mutex    closeOnce        sync.Once    doneClear        chan struct{}    costFunc         func(key K, value V) uint32    deletionListener func(key K, value V, cause DeletionCause)    capacity         int    mask             uint32    ttl              uint32    withExpiration   bool    isClosed         bool }

数据节点的创建

Otter中的数据单位为node，一个node表示一个[k,v]。使用Manager来创建node，根据使用的过期策略和Cost，可以创建bec、bc、be、b四种类型的节点：

b -->Base：基本类型
e -->Expiration：使用过期策略
c -->Cost：大部分场景下的node的cost设置为1即可，但在如某个node的数据较大的情况下，可以通过cost来限制s3-FIFO中的数据量，以此来控制缓存占用的内存大小。

type Manager[K comparable, V any] struct { 	create      func(key K, value V, expiration, cost uint32) Node[K, V] 	fromPointer func(ptr unsafe.Pointer) Node[K, V] }

NewManager可以根据配置创建不同类型的node：

func NewManager[K comparable, V any](c Config) *Manager[K, V] { 	var sb strings.Builder 	sb.WriteString("b") 	if c.WithExpiration { 		sb.WriteString("e") 	} 	if c.WithCost { 		sb.WriteString("c") 	} 	nodeType := sb.String() 	m := &Manager[K, V]{}  	switch nodeType { 	case "bec": 		m.create = NewBEC[K, V] 		m.fromPointer = CastPointerToBEC[K, V] 	case "bc": 		m.create = NewBC[K, V] 		m.fromPointer = CastPointerToBC[K, V] 	case "be": 		m.create = NewBE[K, V] 		m.fromPointer = CastPointerToBE[K, V] 	case "b": 		m.create = NewB[K, V] 		m.fromPointer = CastPointerToB[K, V] 	default: 		panic("not valid nodeType") 	} 	return m }

需要注意的是NewBEC、NewBC、NewBE、NewB返回的都是node指针，后续可能会将该指针保存到hashmap、s3-FIFO、readBuffers等组件中，因此在可以保证各组件操作的是同一个node，但同时也需要注意node指针的回收，防止内存泄露。

hashmap

hashmap是一个支持并发访问的数据结构，它保存了所有缓存数据。这里参考了puzpuzpuz/xsync的mapof实现。

一个table包含一个bucket数组，每个bucket为一个链表，每个链表节点包含一个长度为3的node数组:

type Map[K comparable, V any] struct { 	table unsafe.Pointer //指向一个table结构体，用于保存缓存数据  	nodeManager *node.Manager[K, V] //用于管理node 	// only used along with resizeCond 	resizeMutex sync.Mutex 	// used to wake up resize waiters (concurrent modifications) 	resizeCond sync.Cond 	// resize in progress flag; updated atomically 	resizing atomic.Int64 //用于表示该map正处于resizing阶段，resizing可能会生成新的table，导致set失效，该值作为一个条件判断使用 }

type table[K comparable] struct { 	buckets []paddedBucket //其长度为2的幂 	// sharded counter for number of table entries; 	// used to determine if a table shrinking is needed 	// occupies min(buckets_memory/1024, 64KB) of memory 	size   []paddedCounter//用于统计table中的node个数，使用多个counter分散统计的目的是为了降低访问冲突   mask   uint64 //为len(buckets)-1, 用于和node的哈希值作位于运算，计算node所在的bucket位置 	hasher maphash.Hasher[K] //哈希方法，计算node的哈希值 }

bucket是一个单向链表：

type bucket struct {    hashes [bucketSize]uint64 //保存node的哈希值，bucketSize为3    nodes  [bucketSize]unsafe.Pointer //保存node指针，node指针和node的哈希值所在的索引位置相同    next   unsafe.Pointer//指向下一个bucket    mutex  sync.Mutex //用于操作本bucket的锁 }

table的结构如下

下面是map的初始化方法，为了增加检索效率并降低链表长度，table中的buckets数目(size)不宜过小：

func newMap[K comparable, V any](nodeManager *node.Manager[K, V], size int) *Map[K, V] { 	m := &Map[K, V]{ 		nodeManager: nodeManager, 	} 	m.resizeCond = *sync.NewCond(&m.resizeMutex) 	var t *table[K] 	if size <= minNodeCount { 		t = newTable(minBucketCount, maphash.NewHasher[K]()) //minBucketCount=32 	} else { 		bucketCount := xmath.RoundUpPowerOf2(uint32(size / bucketSize)) 		t = newTable(int(bucketCount), maphash.NewHasher[K]()) 	} 	atomic.StorePointer(&m.table, unsafe.Pointer(t)) 	return m }

下面是向map添加数据的方式，注意它支持并行添加数据。set操作的是一个table中的某个bucket。如果table中的元素大于某个阈值，就会触发hashmap扩容(resize)，此时会创建一个新的table，并将老的table中的数据拷贝到新建的table中。

set和resize都会变更相同的table，为了防止冲突，下面使用了bucket锁以及一些判断来防止此类情况：

每个bucket都有一个锁，resize在调整table大小时会新建一个table，然后调用copyBuckets将原table的buckets中的数据拷贝到新的table的buckets中。通过bucket锁可以保证resize和set不会同时操作相同的bucket
由于resize会创建新的table，有可能导致set和resize操作不同的table，进而导致set到无效的table中。
- 如果resize发生在set之前，则通过if m.resizeInProgress() 来保证二则操作不同的table
- 如果同时发生resize和set，则可以通过bucket锁+if m.newerTableExists(t)来保证操作的是最新的table。
  
  由于copyBuckets时也会用到bucket锁，如果此时正在执行set，则copyBuckets会等待set操作完成后再将数据拷贝到新的table中。copyBuckets之后会将新的table保存到hashmap中，因此需要保证bucket和table的一致性，在set时获取到bucket锁之后需要进一步验证table是否一致。

func (m *Map[K, V]) set(n node.Node[K, V], onlyIfAbsent bool) node.Node[K, V] {    for {    RETRY:       var (          emptyBucket *paddedBucket          emptyIdx    int       )       //获取map的table       t := (*table[K])(atomic.LoadPointer(&m.table))       tableLen := len(t.buckets)       hash := t.calcShiftHash(n.Key())//获取node的哈希值       bucketIdx := hash & t.mask //获取node在table中的bucket位置       //获取node所在的bucket位置       rootBucket := &t.buckets[bucketIdx]       //获取所操作的bucket锁，在resize时，会创建一个新的table，然后将原table中的数据拷贝到新创建的table中。       //resize的copyBuckets是以bucket为单位进行拷贝的，且在拷贝时，也会对bucket加锁。这样就保证了，如果同时发生set和resize，       //resize的copyBuckets也会等操作相同bucket的set结束之后才会进行拷贝。       rootBucket.mutex.Lock()       // the following two checks must go in reverse to what's       // in the resize method.       //如果正在调整map大小，则可能会生成一个新的table，为了防止出现无效操作，此时不允许继续添加数据       if m.resizeInProgress() {          // resize is in progress. wait, then go for another attempt.          rootBucket.mutex.Unlock()          m.waitForResize()          goto RETRY       }       //如果当前操作的是一个新的table，需要重新选择table       if m.newerTableExists(t) {          // someone resized the table, go for another attempt.          rootBucket.mutex.Unlock()          goto RETRY       }       b := rootBucket       //set node的逻辑是首先在bucket链表中搜索是否已经存在该node，如果存在则直接更新，如果不存在再找一个空位将其set进去       for {          //本循环用于在单个bucket中查找是否已经存在需要set的node。如果找到则根据是否设置onlyIfAbsent来选择          //是否原地更新。如果没有在当前bucket中找到所需的node，则需要继续查找下一个bucket          for i := 0; i < bucketSize; i++ {             h := b.hashes[i]             if h == uint64(0) {                if emptyBucket == nil {                   emptyBucket = b //找到一个最近的空位，如果后续没有在bucket链表中找到已存在的node，则将node添加到该位置                   emptyIdx = i                }                continue             }             if h != hash { //查找与node哈希值相同的node                continue             }             prev := m.nodeManager.FromPointer(b.nodes[i])             if n.Key() != prev.Key() { //为了避免哈希碰撞，进一步比较node的key                continue             }             if onlyIfAbsent { //onlyIfAbsent用于表示，如果node已存在，则不会再更新                // found node, drop set                rootBucket.mutex.Unlock()                return n             }             // in-place update.             // We get a copy of the value via an interface{} on each call,             // thus the live value pointers are unique. Otherwise atomic             // snapshot won't be correct in case of multiple Store calls             // using the same value.             atomic.StorePointer(&b.nodes[i], n.AsPointer())//node原地更新，保存node指针即可             rootBucket.mutex.Unlock()             return prev          }         //b.next == nil说明已经查找到最后一个bucket，如果整个bucket链表中都没有找到所需的node，则表示这是新的node，需要将node         //添加到bucket中。如果bucket空间不足，则需要进行扩容          if b.next == nil {             //如果已有空位，直接添加node即可             if emptyBucket != nil {                // insertion into an existing bucket.                // first we update the hash, then the entry.                atomic.StoreUint64(&emptyBucket.hashes[emptyIdx], hash)                atomic.StorePointer(&emptyBucket.nodes[emptyIdx], n.AsPointer())                rootBucket.mutex.Unlock()                t.addSize(bucketIdx, 1)                return nil             }            //这里判断map中的元素总数是不是已经达到扩容阈值growThreshold，即当前元素总数大于容量的0.75倍时就执行扩容            //其实growThreshold计算的是table中的buckets链表的数目，而t.sumSize()计算的是tables中的node总数，即            //所有链表中的节点总数。这么比较的原因是为了降低计算的时间复杂度，当tables中的nodes较多时，能够及时扩容            //buckets数目，而不是一味地增加链表长度。            //参见：https://github.com/maypok86/otter/issues/79             growThreshold := float64(tableLen) * bucketSize * loadFactor             if t.sumSize() > int64(growThreshold) {                // need to grow the table then go for another attempt.                rootBucket.mutex.Unlock()               //扩容，然后重新在该bucket中查找空位。需要注意的是扩容会给map生成一个新的table，               //并将原table的数据拷贝过来，由于table变了，因此需要重新set(goto RETRY)                m.resize(t, growHint)                goto RETRY             }             // insertion into a new bucket.             // create and append the bucket.            //如果前面bucket中没有空位，且没达到扩容要求，则需要新建一个bucket，并将其添加到bucket链表中             newBucket := &paddedBucket{}             newBucket.hashes[0] = hash             newBucket.nodes[0] = n.AsPointer()             atomic.StorePointer(&b.next, unsafe.Pointer(newBucket))//保存node             rootBucket.mutex.Unlock()             t.addSize(bucketIdx, 1)             return nil          }         //如果没有在当前bucket中找到所需的node，则需要继续查找下一个bucket          b = (*paddedBucket)(b.next)       }    } }

func (m *Map[K, V]) copyBuckets(b *paddedBucket, dest *table[K]) (copied int) {    rootBucket := b    //使用bucket锁    rootBucket.mutex.Lock()    for {       for i := 0; i < bucketSize; i++ {          if b.nodes[i] == nil {             continue          }          n := m.nodeManager.FromPointer(b.nodes[i])          hash := dest.calcShiftHash(n.Key())          bucketIdx := hash & dest.mask          dest.buckets[bucketIdx].add(hash, b.nodes[i])          copied++       }       if b.next == nil {          rootBucket.mutex.Unlock()          return copied       }       b = (*paddedBucket)(b.next)    } }

Get的逻辑和set的逻辑类似，但get时无需关心是否会操作老的table，原因是如果产生了新的table，其也会复制老的数据。

s3-FIFO

s3-FIFO可以看作是hashmap的数据过滤器，使用s3-FIFO来淘汰hashmap中的数据。

Dqueue

S3-FIFO的ghost使用了Dqueue。

Dqueue就是一个ring buffer，支持PopFront/PushFront和PushBack/PopBack，其中buffer size为2的幂。其快于golang的container/list库。

由于是ring buffer，随着push和pop操作，其back和front的位置会发生变化，因此可能会出现back push的数据到了Front前面的情况。

用法如下：

package main  import (     "fmt"     "github.com/gammazero/deque" )  func main() {     var q deque.Deque[string]     q.PushBack("foo")     q.PushBack("bar")     q.PushBack("baz")      fmt.Println(q.Len())   // Prints: 3     fmt.Println(q.Front()) // Prints: foo     fmt.Println(q.Back())  // Prints: baz      q.PopFront() // remove "foo"     q.PopBack()  // remove "baz"      q.PushFront("hello")     q.PushBack("world")      // Consume deque and print elements.     for q.Len() != 0 {         fmt.Println(q.PopFront())     } }

readBuffers

在读取数据时，会将获取的数据也保存到readBuffers中，readBuffers的空间比较小，其中的数据可以看作是热点数据。当某个readBuffers[i]数组满了之后，会将readBuffers[i]中的所有nodes返回出来，并增加各个node的freq(给s3-FIFO使用)，然后清空readBuffers[i]。

readBuffers是由4倍最大goroutines并发数的lossy.Buffer构成的数组，lossy.Buffer为固定大小的ring buffer 结构，包括用于创建node的nodeManager以及存放node数组的policyBuffers，容量大小为capacity(16)。

parallelism := xruntime.Parallelism() roundedParallelism := int(xmath.RoundUpPowerOf2(parallelism))

readBuffersCount := 4 * roundedParallelism readBuffers := make([]*lossy.Buffer[K, V], 0, readBuffersCount)

使用nodeManager来初始化lossy.Buffer，

for i := 0; i < readBuffersCount; i++ {   readBuffers = append(readBuffers, lossy.New[K, V](nodeManager)) }

下面是lossy.New的实现，Buffer长度为2的幂。

type Buffer[K comparable, V any] struct { 	head                 atomic.Uint64 //指向buffer的head 	headPadding          [xruntime.CacheLineSize - unsafe.Sizeof(atomic.Uint64{})]byte 	tail                 atomic.Uint64 //指向buffer的tail 	tailPadding          [xruntime.CacheLineSize - unsafe.Sizeof(atomic.Uint64{})]byte 	nodeManager          *node.Manager[K, V] //用于管理node   returned             unsafe.Pointer //可以看做是一个条件锁，和hashmap的resizing作用类似，防止在buffer变更(add/free)的同时添加node 	returnedPadding      [xruntime.CacheLineSize - 2*8]byte   policyBuffers        unsafe.Pointer //指向一个容量为16的PolicyBuffers，用于复制读缓存(buffer)中的热点数据 	returnedSlicePadding [xruntime.CacheLineSize - 8]byte 	buffer               [capacity]unsafe.Pointer //存储读缓存的数据 }

type PolicyBuffers[K comparable, V any] struct { 	Returned []node.Node[K, V] }

func New[K comparable, V any](nodeManager *node.Manager[K, V]) *Buffer[K, V] { 	pb := &PolicyBuffers[K, V]{ 		Returned: make([]node.Node[K, V], 0, capacity), 	} 	b := &Buffer[K, V]{ 		nodeManager:   nodeManager, 		policyBuffers: unsafe.Pointer(pb), 	} 	b.returned = b.policyBuffers 	return b }

下面是向readBuffers中添加数据的方式：

// Add lazily publishes the item to the consumer. // // item may be lost due to contention. func (b *Buffer[K, V]) Add(n node.Node[K, V]) *PolicyBuffers[K, V] { 	head := b.head.Load() 	tail := b.tail.Load() 	size := tail - head   //并发访问可能会导致这种情况，buffer满了就无法再添加元素，需要由其他操作通过返回热点数据来释放buffer空间 	if size >= capacity { 		// full buffer 		return nil 	}   // 添加开始，将tail往后移一位 	if b.tail.CompareAndSwap(tail, tail+1) { 		// tail中保存的是下一个元素的位置。使用mask位与是为了获取当前ring buffer中的tail位置。 		index := int(tail & mask)     // 将node的指针保存到buffer的第index位，这样就完成了数据存储 		atomic.StorePointer(&b.buffer[index], n.AsPointer())      // buffer满了,此时需要清理缓存，即将读缓存buffer中的热点数据数据存放到policyBuffers中，后续给s3-FIFO使用 		if size == capacity-1 { 			// 这里可以看做是一个条件锁，如果有其他线程正在处理热点数据，则退出。 			if !atomic.CompareAndSwapPointer(&b.returned, b.policyBuffers, nil) { 				// somebody already get buffer 				return nil 			}        //将整个buffer中的数据保存到policyBuffers中，并清空buffer。 			pb := (*PolicyBuffers[K, V])(b.policyBuffers) 			for i := 0; i < capacity; i++ {         // 获取head的索引 				index := int(head & mask) 				v := atomic.LoadPointer(&b.buffer[index]) 				if v != nil { 					// published 					pb.Returned = append(pb.Returned, b.nodeManager.FromPointer(v)) 					// 清空buffer的数据 					atomic.StorePointer(&b.buffer[index], nil) 				} 				head++ 			}  			b.head.Store(head) 			return pb 		} 	}  	// failed 	return nil }

Otter中的Add和Free是成对使用的，只有在Free中才会重置Add中变更的Buffer.returned。因此如果没有执行Free，则对相同Buffer的其他Add操作也无法返回热点数据。

idx := c.getReadBufferIdx() pb := c.readBuffers[idx].Add(got) //获取热点数据 if pb != nil {   c.evictionMutex.Lock()   c.policy.Read(pb.Returned) //增加热点数据的freq   c.evictionMutex.Unlock()    c.readBuffers[idx].Free() //清空热点数据存放空间 }

Free方法如下：

// 在add返回热点数据，并在增加热点数据的freq之后，会调用Free方法释放热点数据的存放空间 func (b *Buffer[K, V]) Free() { 	pb := (*PolicyBuffers[K, V])(b.policyBuffers) 	for i := 0; i < len(pb.Returned); i++ { 		pb.Returned[i] = nil //清空热点数据 	} 	pb.Returned = pb.Returned[:0] 	atomic.StorePointer(&b.returned, b.policyBuffers) }

writebuffer

writebuffer队列用于保存node的增删改事件，并由另外一个goroutine异步处理这些事件。事件类型如下：

const ( 	addReason reason = iota + 1 	deleteReason 	updateReason 	clearReason //执行cache.Clear 	closeReason //执行cache.Close )

writebuffer的初始大小是最大并发goroutines数目的128倍：

queue.NewGrowable[task[K, V]](minWriteBufferCapacity, maxWriteBufferCapacity),

Growable是一个可扩展的ring buffer，从尾部push，从头部pop。在otter中作为存储node变动事件的缓存，类似kubernetes中的workqueue。

type Growable[T any] struct { 	mutex    sync.Mutex 	notEmpty sync.Cond //用于通过push来唤醒由于队列中由于没有数据而等待的Pop操作 	notFull  sync.Cond //用于通过pop来唤醒由于数据量达到上限maxCap而等待的Push操作 	buf      []T //保存事件 	head     int //指向buf中下一个可以pop数据的索引 	tail     int //指向buf中下一个可以push数据的索引 	count    int //统计buf中的数据总数 	minCap   int //定义了buf的初始容量 	maxCap   int //定义了buf的最大容量，当count数目达到该值之后就不能再对buf进行扩容，需要等待pop操作来释放空间 }

writebuffer的队列长度同样是2的幂，包括minCap和maxCap也是是2的幂：

func NewGrowable[T any](minCap, maxCap uint32) *Growable[T] { 	minCap = xmath.RoundUpPowerOf2(minCap) 	maxCap = xmath.RoundUpPowerOf2(maxCap)  	g := &Growable[T]{ 		buf:    make([]T, minCap), 		minCap: int(minCap), 		maxCap: int(maxCap), 	}  	g.notEmpty = *sync.NewCond(&g.mutex) 	g.notFull = *sync.NewCond(&g.mutex)  	return g }

下面是扩展writebuffer的方法：

func (g *Growable[T]) resize() { 	newBuf := make([]T, g.count<<1) //新的buf是原来的2倍 	if g.tail > g.head { 		copy(newBuf, g.buf[g.head:g.tail]) //将事件拷贝到新的buf 	} else { 		n := copy(newBuf, g.buf[g.head:]) //pop和push操作导致head和tail位置变动，且tail位于head之前，需要作两次copy 		copy(newBuf[n:], g.buf[:g.tail]) 	}  	g.head = 0 	g.tail = g.count 	g.buf = newBuf }

Node 过期策略

支持的过期策略有：

固定TTL：所有node的过期时间都一样。将node保存到队列中，因此最早入队列的node最有可能过期，按照FIFO的方式获取队列中的node，判断其是否过期即可。
可变过期策略：这里参考了Bucket-Based Expiration Algorithm: Improving Eviction Efficiency for In-Memory Key-Value Database，该算法的要点是将时间转换为空间位置
无过期策略：即不配置过期时间，在调用RemoveExpired获取过期的nodes时，认为所有nodes都是过期的。

可变过期策略

下面介绍可变过期策略的实现：

var ( 	buckets = []uint32{64, 64, 32, 4, 1}   //注意spans中的元素值都是2的幂，分别为1(span[0]),64(span[1]),4096(span[2]),131072(span[3]),524288(span[4])。   //上面的buckets定义也很有讲究，spans[i]表示该buckets[i]的超时单位，buckets[i][j]的过期时间为j个spans[i]，即过期时间为j*spans[i]。   //buckets之所以为{64, 64, 32, 4, 1}，是因为buckets[1]的超时单位为64s，因此如果过期时间大于64s就需要使用buckets[1]的超时单位spans[1]，   //反之则使用buckets[0]的超时单位spans[0]，因此buckets[0]长度为64(64/1=64)；   //以此类推，buckets[2]的超时单位为4096s,如果过期时间大于4096s就需要使用buckets[2]的超时单位spans[2]，反之则使用buckets[1]的超时单位spans[1]，   //因此buckets[1]长度为64(4096/64=64)；buckets[3]的超时单位为131072s,如果过期时间大于131072s就需要使用buckets[3]的超时单位spans[3]，   //反之则使用buckets[2]的超时单位spans[2]，因此buckets[2]长度为32(131072/4096=32)...   //spass[4]作为最大超时时间单位，超时时间大于该spans[4]时，都按照spans[4]计算   //buckets[i]的长度随过期时间的增加而减少，这也符合常用场景，因为大部分场景中的过期时间都较短，像1.52d这种级别的过期时间比较少见 	spans   = []uint32{ 		xmath.RoundUpPowerOf2(uint32((1 * time.Second).Seconds())),             // 1s--2^0 		xmath.RoundUpPowerOf2(uint32((1 * time.Minute).Seconds())),             // 1.07m --64s--2^6 		xmath.RoundUpPowerOf2(uint32((1 * time.Hour).Seconds())),               // 1.13h --4096s--2^12 		xmath.RoundUpPowerOf2(uint32((24 * time.Hour).Seconds())),              // 1.52d --131072s--2^17 		buckets[3] * xmath.RoundUpPowerOf2(uint32((24 * time.Hour).Seconds())), // 6.07d --524288s--2^19 		buckets[3] * xmath.RoundUpPowerOf2(uint32((24 * time.Hour).Seconds())), // 6.07d --524288s--2^19 	} 	shift = []uint32{ 		uint32(bits.TrailingZeros32(spans[0])), 		uint32(bits.TrailingZeros32(spans[1])), 		uint32(bits.TrailingZeros32(spans[2])), 		uint32(bits.TrailingZeros32(spans[3])), 		uint32(bits.TrailingZeros32(spans[4])), 	} )

下面是缓存数据使用的数据结构。

type Variable[K comparable, V any] struct { 	wheel [][]node.Node[K, V]   time  uint32 }

Variable.wheel的数据结构如下，Variable.wheel[i][]的数组长度等于buckets[i]，buckets[i]的超时单位为spans[i]，Variable.wheel[i][j]表示过期时间为j*spans[i]的数据所在的位置。

但由于超时单位跨度比较大，因此即使Variable.wheel[i][j]所在的nodes被认为是过期的，也需要进一步确认node是否真正过期。以64s的超时单位为例，过期时间为65s的node和过期时间为100s的node会放到相同的wheel[1][0]链表中，若当前时间为80s，则只有过期时间为65s的node才是真正过期的。因此需要进一步比较具体的node过期时间。
Variable.time是一个重要的成员：其表示上一次执行清理操作(移除过期数据或清除所有数据)的时间，并作为各个wheel[i]数组中的有效数据的起点。该值在执行清理操作之后会被重置，表示新的有效数据起点。要理解该成员的用法，应该将Variable.wheel[i]的数组看做是一个个时间块(而非位置点)，每个时间块表示一个超时单位。

`Variable`的初始化

Variable的初始化方式如下，主要就是初始化一个二维数组：

func NewVariable[K comparable, V any](nodeManager *node.Manager[K, V]) *Variable[K, V] { 	wheel := make([][]node.Node[K, V], len(buckets)) 	for i := 0; i < len(wheel); i++ { 		wheel[i] = make([]node.Node[K, V], buckets[i]) 		for j := 0; j < len(wheel[i]); j++ { 			var k K 			var v V 			fn := nodeManager.Create(k, v, math.MaxUint32, 1) //默认过期时间为math.MaxUint32，相当于没有过期时间 			fn.SetPrevExp(fn) 			fn.SetNextExp(fn) 			wheel[i][j] = fn 		} 	} 	return &Variable[K, V]{ 		wheel: wheel, 	} }

删除过期数据

func (v *Variable[K, V]) RemoveExpired(expired []node.Node[K, V]) []node.Node[K, V] { 	currentTime := unixtime.Now()//获取到目前为止，系统启动的秒数，以此作为当前时间 	prevTime := v.time //获取上一次执行清理的时间,在使用时会将其转换为以spans[i]为单位的数值，作为各个wheel[i]的起始清理位置 	v.time = currentTime //重置v.time，本次清理之后的有效数据的起始位置，也可以作为下一次清理时的起始位置    //在清理数据时会将时间转换以spans[i]为单位的数值。delta表示上一次清理之后到当前的时间差。   //在清理时需要遍历清理各个wheel[i]，如果delta大于buckets[i]，则认为整个wheel[i]都可能出现过期数据，   //反之，则认为wheel[i]的部分区间数据可能过期。 	for i := 0; i < len(shift); i++ {     //在prevTime和currentTime都小于shift[i]或二者非常接近的情况下delta可能为0，但delte为0时无需执行清理动作 		previousTicks := prevTime >> shift[i] 		currentTicks := currentTime >> shift[i] 		delta := currentTicks - previousTicks 		if delta == 0 {	 			break 		}  		expired = v.removeExpiredFromBucket(expired, i, previousTicks, delta) 	}  	return expired }

下面用于清理wheel[i]下的过期数据：

func (v *Variable[K, V]) removeExpiredFromBucket(expired []node.Node[K, V], index int, prevTicks, delta uint32) []node.Node[K, V] { 	mask := buckets[index] - 1   //获取buckets[index]对应的数组长度 	steps := buckets[index]   //如果delta小于buckets[index]的大小，则[start,start+delta]之间的数据可能是过期的   //如果delta大于buckets[index]的大小，则整个buckets[i]都可能是过期的 	if delta < steps { 		steps = delta 	}   //取上一次清理的时间作为起始位置，[start,end]之间的数据都认为可能是过期的 	start := prevTicks & mask 	end := start + steps 	timerWheel := v.wheel[index] 	for i := start; i < end; i++ {     //遍历wheel[index][i]中的链表 		root := timerWheel[i&mask] 		n := root.NextExp() 		root.SetPrevExp(root) 		root.SetNextExp(root)  		for !node.Equals(n, root) { 			next := n.NextExp() 			n.SetPrevExp(nil) 			n.SetNextExp(nil)       //注意此时v.time已经被重置为当前时间。进一步比较具体的node过期时间。 			if n.Expiration() <= v.time { 				expired = append(expired, n) 			} else { 				v.Add(n) 			}  			n = next 		} 	}  	return expired }

下图展示了删除过期数据的方式

v.time中保存了上一次清理的时间，进而转换为本次wheel[i]的清理起始位置

在下一次清理时，会在此读取上一次清理的时间，并作为本次wheel[i]的清理起始位置
![image-20240418154846844](/Users/charlie.liu/Library/Application Support/typora-user-images/image-20240418154846844.png)

添加数据

添加数据时首先需要找到该数据在Variable.wheel中的位置Variable.wheel[i][j]，然后添加到该位置的链表中即可。

在添加数据时需要避免将数据添加到上一次清理点之前

// findBucket determines the bucket that the timer event should be added to. func (v *Variable[K, V]) findBucket(expiration uint32) node.Node[K, V] {   //expiration是绝对时间。获取距离上一次清理过期数据(包括清理所有数据)所过去的时间，或看做是和起始有效数据的距离。   duration := expiration - v.time 	length := len(v.wheel) - 1 	for i := 0; i < length; i++ {     //找到duration的最佳超时单位spans[i] 		if duration < spans[i+1] {       //计算expiration包含多少个超时单位，并以此作为其在wheel[i]中的位置index。       //expiration >> shift[i]等价于(duration + v.time)>> shift[i]，即和起始有效数据的距离 			ticks := expiration >> shift[i] 			index := ticks & (buckets[i] - 1) 			return v.wheel[i][index] 		} 	} 	return v.wheel[length][0] //buckets[4]的长度为1，因此二维索引只有一个值0。 }

Cache的Set & Get

Set

添加node时需要同时处理node add/update事件。

func (c *Cache[K, V]) set(key K, value V, expiration uint32, onlyIfAbsent bool) bool {   //限制node的cost大小，过大会占用更多的缓存空间 	cost := c.costFunc(key, value) 	if int(cost) > c.policy.MaxAvailableCost() { 		c.stats.IncRejectedSets() 		return false 	}  	n := c.nodeManager.Create(key, value, expiration, cost)   //只添加不存在的节点 	if onlyIfAbsent {     //res == nil说明是新增的node 		res := c.hashmap.SetIfAbsent(n) 		if res == nil { 			// 将node添加事件添加到writeBuffer中 			c.writeBuffer.Push(newAddTask(n)) 			return true 		} 		c.stats.IncRejectedSets() //如果node存在，则不作任何处理，增加rejected统计 		return false 	}    //evicted != nil表示对已有node进行了更新，反之则表示新加的node 	evicted := c.hashmap.Set(n) 	if evicted != nil { 		// update，将老节点evicted设置为无效状态，并将node更新事件添加到writeBuffer中 		evicted.Die() 		c.writeBuffer.Push(newUpdateTask(n, evicted)) 	} else { 		// 将node添加事件添加到writeBuffer中 		c.writeBuffer.Push(newAddTask(n)) 	}  	return true }

Get

Get需要处理删除过期node事件。

// GetNode returns the node associated with the key in this cache. func (c *Cache[K, V]) GetNode(key K) (node.Node[K, V], bool) { 	n, ok := c.hashmap.Get(key) 	if !ok || !n.IsAlive() { //不返回非active状态的node 		c.stats.IncMisses() 		return nil, false 	}    //如果node过期，需要将node删除事件添加到writeBuffer中，后续由其他goroutine执行数据删除 	if n.HasExpired() { 		c.writeBuffer.Push(newDeleteTask(n)) 		c.stats.IncMisses() 		return nil, false 	}    //在读取node之后的动作，获取热点node，并增加s3-FIFO node的freq 	c.afterGet(n)   //增加命中统计 	c.stats.IncHits()  	return n, true }

在成功读取node之后，需要处理热点nodes：

func (c *Cache[K, V]) afterGet(got node.Node[K, V]) { 	idx := c.getReadBufferIdx()   //获取热点nodes 	pb := c.readBuffers[idx].Add(got) 	if pb != nil { 		c.evictionMutex.Lock()     //增加nodes的freq 		c.policy.Read(pb.Returned) 		c.evictionMutex.Unlock()     //已经处理完热点数据，清理存放热点数据的buffer 		c.readBuffers[idx].Free() 	} }

另外还有一种获取方法，此方法中不会触发驱逐策略，即不会用到readBuffers和s3-FIFO：

func (c *Cache[K, V]) GetNodeQuietly(key K) (node.Node[K, V], bool) { 	n, ok := c.hashmap.Get(key) 	if !ok || !n.IsAlive() || n.HasExpired() { 		return nil, false 	}  	return n, true }

事件和过期数据的处理

otter有两种途径来处理缓存中的数据，一种是通过处理writeBuffer中的事件来对缓存数据进行增删改，另一种是定期清理过期数据。

事件处理

writeBuffer中保存了缓存读写过程中的事件。

需要注意的是hashmap中的数据会按照add/delete操作实时更新，只有涉及到s3-FIFO驱逐的数据才会通过writeBuffer异步更新。

func (c *Cache[K, V]) process() { 	bufferCapacity := 64 	buffer := make([]task[K, V], 0, bufferCapacity) 	deleted := make([]node.Node[K, V], 0, bufferCapacity) 	i := 0 	for {     //从writeBuffer中获取一个事件 		t := c.writeBuffer.Pop()      //调用Cache.Clear()或Cache.Close()时会清理cache。Cache.Clear()和Cache.Close()中都会清理hashmap和readBuffers     //这里清理writebuffer和s3-FIFO 		if t.isClear() || t.isClose() { 			buffer = clearBuffer(buffer) 			c.writeBuffer.Clear()  			c.evictionMutex.Lock() 			c.policy.Clear() 			c.expiryPolicy.Clear() 			if t.isClose() { 				c.isClosed = true 			} 			c.evictionMutex.Unlock()       //清理完成 			c.doneClear <- struct{}{}       //如果是close则直接退出，否则(clear)会继续处理writeBuffer中的事件 			if t.isClose() { 				break 			} 			continue 		}      //这里使用了批量处理事件的方式 		buffer = append(buffer, t) 		i++ 		if i >= bufferCapacity { 			i -= bufferCapacity  			c.evictionMutex.Lock()  			for _, t := range buffer { 				n := t.node() 				switch { 				case t.isDelete()://删除事件，发生在直接删除数据或数据过期的情况下。删除expiryPolicy，和s3-FIFO中的数据 					c.expiryPolicy.Delete(n) 					c.policy.Delete(n) 				case t.isAdd()://添加事件，发送在新增数据的情况下，将数据添加到expiryPolicy和s3-FIFO中 					if n.IsAlive() { 						c.expiryPolicy.Add(n) 						deleted = c.policy.Add(deleted, n) //添加驱逐数据 					} 				case t.isUpdate()://更新事件，发生在添加相同key的数据的情况下，此时需删除老数据，并添加活动状态的新数据 					oldNode := t.oldNode() 					c.expiryPolicy.Delete(oldNode) 					c.policy.Delete(oldNode) 					if n.IsAlive() { 						c.expiryPolicy.Add(n) 						deleted = c.policy.Add(deleted, n) //添加驱逐数据 					} 				} 			}        //从expiryPolicy中删除s3-FIFO驱逐的数据 			for _, n := range deleted { 				c.expiryPolicy.Delete(n) 			}  			c.evictionMutex.Unlock()  			for _, t := range buffer { 				switch { 				case t.isDelete(): 					n := t.node() 					c.notifyDeletion(n.Key(), n.Value(), Explicit) 				case t.isUpdate(): 					n := t.oldNode() 					c.notifyDeletion(n.Key(), n.Value(), Replaced) 				} 			}        //从hashmap中删除s3-FIFO驱逐的数据 			for _, n := range deleted { 				c.hashmap.DeleteNode(n) 				n.Die() 				c.notifyDeletion(n.Key(), n.Value(), Size) 				c.stats.IncEvictedCount() 				c.stats.AddEvictedCost(n.Cost()) 			}  			buffer = clearBuffer(buffer) 			deleted = clearBuffer(deleted) 			if cap(deleted) > 3*bufferCapacity { 				deleted = make([]node.Node[K, V], 0, bufferCapacity) 			} 		} 	} }

清理过期数据

cleanup是一个单独的goroutine，用于定期处理Cache.hashmap中的过期数据。在调用Cache.Get时会判断并删除(通过向writeBuffer中写入deleteReason事件，由process goroutine异步删除)s3-FIFO(Cache.policy)中的过期数据。

另外无需处理readbuffers中的过期数据，因为从readbuffers读取到热点数据之后，只会增加这些数据的freq，随后会清空存放热点数据的空间，不会对其他组件的数据造成影响。

func (c *Cache[K, V]) cleanup() { 	bufferCapacity := 64 	expired := make([]node.Node[K, V], 0, bufferCapacity) 	for { 		time.Sleep(time.Second) //每秒尝试清理一次过期数据  		c.evictionMutex.Lock() 		if c.isClosed { 			return 		}      //删除expiryPolicy、policy和hashmap中的过期数据 		expired = c.expiryPolicy.RemoveExpired(expired) 		for _, n := range expired { 			c.policy.Delete(n) 		}  		c.evictionMutex.Unlock()  		for _, n := range expired { 			c.hashmap.DeleteNode(n) 			n.Die() 			c.notifyDeletion(n.Key(), n.Value(), Expired) 		}  		expired = clearBuffer(expired) 		if cap(expired) > 3*bufferCapacity { 			expired = make([]node.Node[K, V], 0, bufferCapacity) 		} 	} }