Cache

1. What is Cache
2. What is LRU
3. LRU Cache Structure in LevelDB
4. Overall Cache Flow

1. What is Cache

Cache is hardware or software that temporarily stores data in computing environments. It is a smaller, faster, and more expensive memory used to improve the performance of recently or frequently accessed data.

There are several types of caches:

CPU Cache: Memory mounted next to the CPU
Disk Cache: Memory built into hard disks for disk control and external interface
Other Caches: These are the caches we need to understand in LevelDB. Caches other than the two mentioned above are mainly managed through software, such as page cache where the operating system's main memory is copied to the hard disk.

2. What is LRU

LRU(Least Recently Used) policy is a page replacement algorithm, which means that the least recently used data will be removed. The basic assumption of this algorithm is that the data that has not been used for the longest time is unlikely to be used in the future.

LRU Cache works by removing the least recently used item when the cache is full and placing a new node. LRU Cache can be implemented using double linked list.

Below is a diagram explaining LRU Cache.

The basic assumption is that the data closer to Head (rightmost node) is recently used and the data closer to Tail (leftmost node) is old. When accessing cached data, move the data to Head to indicate that it is recently used and remove it from the priority queue.

The order of reading data is A, B, C, D, E, D, F.

The numbers in parentheses indicate sorting. The smaller the number, the closer the data is to Tail.
When E is read, the cache is full, so the oldest data A is removed.
Continue reading D and F, and you can see that the node with the least access count is removed and replaced.

LevelDB uses two different data structures for LRU Cache:

HashTable (HandleTable): Used for lookup
Double Linked List: Used for removing old data

3. LRU Cache Structure in LevelDB

To understand the LRU Cache structure in LevelDB accurately, we analyzed the source code of LevelDB through Git clone.

(source code download)

git clone --recurse-submodules https://github.com/google/leveldb.git leveldb_release

Below is a rough diagram of the overall LRU Cache structure. (For accurate code analysis, please refer to leveldb_release/build/util/cache.cc.)

(1) ShardedLRUCache

ShardedLRUCache in LevelDB consists of 16 LRU Caches internally and is defined as the external LRUCache.

Sharding

Sharding means dividing data into pieces and managing them. It is a kind of load balancing technique that distributes workload requests to multiple servers or processing devices. It means dividing a large database or network system into multiple small pieces and distributing them for storage and management.

Below is a diagram showing how to divide a table into different tables.

LevelDB uses LRU Cache through Sharding technique. ShardedLRUCache maintains an array of LRU Caches internally.

When calling LRU Caches in ShardedLRUCache, Lock is performed for each LRU Cache to prevent multiple threads from accessing the same LRU Cache object.

(2) LRUCache

Each Item in LRUCache is an LRUHandle, which will be described in (3) LRUHandle. LRUCache is managed by 1 HandleTable (HashTable) and 2 Double Linked Lists. LRUHandles are managed in both Double Linked List and HashTable HandleTable.

Below are the three elements that make up the Cache: Double Linked List and HashTable.

LRUHandle lru_; // will be described (5).
LRUHandle in_use_; // will be described (6).
HandleTable table_; // will be described (4).

(3) LRUHandle

LRUCache's object. HandleTable (HashTable) implements fast lookup by combining with Double Linked List and Double Linked List implements fast addition and deletion.

LRUHandle's components:

struct LRUHandle {
    void* value;
    void (*deleter)(const Slice&, void* value); // to free the key and value space
    LRUHandle* next_hash; // handle hash collisions and keep objects (LRUHandle) belonging to the array
    LRUHandle* next; // used to maintain LRU order in doubly linked list
    LRUHandle* prev; // points to the previous node in a doubly linked list
    size_t charge;  
    size_t key_length; // length of key value
    bool in_cache; // whether the handle is in the cache table
    uint32_t refs; // the number of times the handle was referenced    
    uint32_t hash; // hash value of key used to determine sharding and fast comparison    
    char key_data[1]; // start of key
};

(4) HandleTable

It is a very simple hash table.

Declare a double pointer list_ in the code to manage LRUHandle objects.

uint32_t length_; // length of LRUHandle* array
uint32_t elems_; // number of nodes in the table
LRUHandle** list_;

Below is a diagram showing LRUHandle** list_ as LRUHandle* list_.

Below are the LRUHandle methods for the HandleTable class.

LRUHandle** FindPointer(const Slice& key, uint32_t hash);
// use next_hash to traverse the double linked lists until find the item corresponding to the key.
// if no match is found, next_hash points to the end of list_ and the value of next_hash is nullptr.
// if a match is found, a double pointer (LRUHandle) pointed to by next_hash is returned.
// due to the above characteristics, it is used in the first line of Lookup, Insert, and Remove function.

LRUHandle* Lookup(const Slice& key, uint32_t hash);
// directly read the value pointed to by the pointer (LRUHandle) by calling FindPointer

LRUHandle* Insert(LRUHandle* h);
// call FindPointer to put LRUHandle at the location pointed to by the pointer (next_hash).

LRUHandle* Remove(const Slice& key, uint32_t hash);
// directly deletes the item pointed to by the pointer (next_hash) by calling FindPointer

void Resize();
// dynamically expand array as LRUHandle grows

(5) in_use_ and (6) lru_

Each of them is a data structure that makes up the LRU Cache: Double Linked List. They separate hot and cold data and maintain an internal HashTable. Hot data is frequently accessed, while cold data is rarely accessed.

(5) in_use_

Hot linked list, which maintains the cached object for use by the caller.

(6) lru_

Cold linked list, which maintains the popularity of cached objects.

Both of these double linked lists are managed by the refs variable. in_use_ has refs of 2 because it is referenced by external users, and lru_ has refs of 1 because it is not referenced.

Here, lru_ and in_use_ are not actual nodes; they do not store data themselves. Data is always inserted into the lru_ list first. (Important, data is always inserted into the lru_ list first.)

lru_->next (next pointer) points to the oldest data, and lru_->prev (previous pointer) points to the newest data. When a new Handle is inserted, it is inserted at the end of the double linked list, and lru_->prev (previous pointer) points to the newly inserted Handle. When a cache is accessed by external users, it goes from lru_ to next_hash to in_use_, and refs increases from 1 to 2.

4. Overall Cache Flow

LevelDB Cache Mechanism

In LevelDB, Cache is checked when reading data, and the mechanism is shown in the diagram below.

The function Get is called when reading data.

leveldb::Status s = db->Get(leveldb::ReadOptions(), key, &value);

Part of the source code of DBImpl::Get:

<1> MemTable* mem = mem_;
<2> MemTable* imm = imm_;
<3> Version* current = versions_->current();

if (mem->Get(lkey, value, &s)) { <4>
} else if (imm != nullptr && imm->Get(lkey, value, &s)) { 
    <5>
} else { 
    <6>
    s = current->Get(options, lkey, value, &stats);
    have_stat_update = true;
}

Declare memtable mem_ to find the table in memtable
If not found in memtable, declare memtable imm_ to find the table in immutable memtable
If not found in both, declare version current to get the current version to find the table in sstable
Call Memtable::Get to read the table from disk
Similarly, call Memtable::Get to read the table from immutable memtable
Call Version::Get to read the table from sstable

Version::Get calls TableCache::Get to read the table.

LevelDB Cache Usage

TableCache::Get is the part where LevelDB uses Cache.

TableCache::Get source code:

Cache::Handle* handle = nullptr;

<1> Status s = FindTable(file_number, file_size, &handle);

if (s.ok()) {
    <2> 
    Table* t = reinterpret_cast<TableAndFile*>(cache_->Value(handle))->table;

    s = t->InternalGet(options, k, arg, handle_result);

    cache_->Release(handle);
}

return s;

+LevelDB has two different caches: TableCache and BlockCache.

<1> TableCache Usage (FindTable)

Find the table corresponding to the file in the cache. If not found, open the file first and then create the table corresponding to the file and add it to the cache.

Below is the structure of TableCache.

key is the file number of the SSTable. value is divided into two sections:

RandomAccessFile: Pointer to the opened SSTable on disk
Table: Pointer to the Table structure corresponding to the SSTable in memory, which stores the cache_id of the SSTable and the block cache

<2> BlockCache Usage (InternalGet -> BlockReader)

The InternalGet function that implements the logic of finding Key in the SSTable calls BlockReader. BlockCache is used when the block is found in the BlockCache; if not found, the file is opened and read, and then the block is added to the BlockCache.

BlockReader Cache Usage Algorithm:
Try to get the block directly from the BlockCache.
If not found in the BlockCache, call ReadBlock to read from the file.
If reading is successful, add the block to the BlockCache.

Below is the structure of BlockCache.

key is composed of the offset of each SSTable file and a unique cache_id to distinguish it from other SSTable files.

value is composed of Data blocks of the opened SSTable file.

After BlockReader finishes looking up in the BlockCache and inserting into the BlockCache, the read operation finishes using the Cache.

Below is a rough diagram of how to use Cache in LevelDB.

LevelDB Cache Creation

TableCache is created using the cache_size option in ./db_bench and set using the TableCache::TableCacheSize function.

TableCache::TableCache BackTrace:

leveldb::Benchmark::Run
leveldb::Benchmark::Open 
leveldb::DB::Open
leveldb::DBImpl::DBImpl
leveldb::TableCache::TableCache

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search