SSTable - Write
SSTables are created in the following situations:
- When a
Flush (Minor Compaction)occurs from the MemTable - When
Compactionoccurs in storage
This document focuses on how SSTables are created when a Flush occurs from the MemTable.
When a Flush Occurs from the MemTable
The process of a Flush occurring from the MemTable is as follows:

CompactMemTable is called, which in turn calls WriteLevel0Table, resulting in the creation of an SSTable. At this point, BuildTable is the function that actually creates the SSTable.
The flow of BuildTable is as follows:
Overall Process

Steps
- Create an instance of
TableBuilder - Add key-value pairs from the MemTable one by one using the
Addmethod ofTableBuilder - Complete the process of creating the SSTable using the
Finishmethod ofTableBuilder - Write the contents in the
WritableFileto storage - Load the SSTable stored in storage into the cache and check if it is available
Status BuildTable(const std::string& dbname, Env* env, const Options& options,
TableCache* table_cache, Iterator* iter, FileMetaData* meta) {
Status s;
meta->file_size = 0;
// Make sure the Iterator points to the first element
iter->SeekToFirst();
std::string fname = TableFileName(dbname, meta->number);
if (iter->Valid()) {
WritableFile* file;
s = env->NewWritableFile(fname, &file);
if (!s.ok()) {
return s;
}
// 1. Create an instance of TableBuilder
TableBuilder* builder = new TableBuilder(options, file);
meta->smallest.DecodeFrom(iter->key());
Slice key;
// 2. Add key-value pairs of MemTable one by one via TableBuilder's "Add" method
for (; iter->Valid(); iter->Next()) {
key = iter->key();
builder->Add(key, iter->value());
}
if (!key.empty()) {
meta->largest.DecodeFrom(key);
}
// 3. Complete the process of creating SSTable via TableBuilder's "Finish" method
s = builder->Finish();
if (s.ok()) {
meta->file_size = builder->FileSize();
assert(meta->file_size > 0);
}
delete builder;
// 4. Write the contents in the WritableFile to storage
if (s.ok()) {
s = file->Sync();
}
if (s.ok()) {
s = file->Close();
}
delete file;
file = nullptr;
if (s.ok()) {
// 5. Put the SSTable stored in storage into cache and check if it is available
Iterator* it = table_cache->NewIterator(ReadOptions(), meta->number,
meta->file_size);
s = it->status();
delete it;
}
}
// Check for Iterator related errors
if (!iter->status().ok()) {
s = iter->status();
}
if (s.ok() && meta->file_size > 0) {
// Keep it
} else {
env->RemoveFile(fname);
}
return s;
}
TableBuilder::Add
The role of passing the key-value pair currently referenced by the Iterator to each BlockBuilder within the TableBuilder.
- If the Data Block being created by the
BlockBuilderis empty, meaning a new Data Block is being started, add a new entry to the Index Block. The entry added at this time is not for the newly started Data Block but for the Data Block that was being created just before. - If using a Bloom Filter, update the Filter Block as well.
- Add data to the Data Block.
- If the Data Block being created is full (i.e., exceeds the block size specified by the option), call
Flush.
void TableBuilder::Add(const Slice& key, const Slice& value) {
Rep* r = rep_;
// ...
// 1. If the Data Block that BlockBuilder is creating is empty,
// add a new entry to the Index Block
if (r->pending_index_entry) {
assert(r->data_block.empty());
r->options.comparator->FindShortestSeparator(&r->last_key, key);
std::string handle_encoding;
r->pending_handle.EncodeTo(&handle_encoding);
r->index_block.Add(r->last_key, Slice(handle_encoding));
r->pending_index_entry = false;
}
// 2. If using Bloom Filter, update the Filter Block as well
if (r->filter_block != nullptr) {
r->filter_block->AddKey(key);
}
r->last_key.assign(key.data(), key.size());
r->num_entries++;
// 3. Add data to the Data Block
r->data_block.Add(key, value);
const size_t estimated_block_size = r->data_block.CurrentSizeEstimate();
// 4. If the Data Block being created is full, call "Flush"
if (estimated_block_size >= r->options.block_size) {
Flush();
}
}
TableBuilder::Finish
Called when
Addis finished for all key-value pairs in the MemTable, and it finalizes the SSTable being written.
- Call
Flush. - If using a Bloom Filter, add the Filter Block to the
WritableFileusingFilterBlockBuilder. - Add the Meta Index Block to the
WritableFile. - Add the Index Block being created by
BlockBuilderto theWritableFile. - Add the Footer to the
WritableFile.
Status TableBuilder::Finish() {
Rep* r = rep_;
// 1. Call "Flush"
Flush();
assert(!r->closed);
r->closed = true;
BlockHandle filter_block_handle, metaindex_block_handle, index_block_handle;
// 2. If using Bloom Filter, add the Filter Block to the WritableFile.
if (ok() && r->filter_block != nullptr) {
WriteRawBlock(r->filter_block->Finish(), kNoCompression,
&filter_block_handle);
}
// 3. Add the Meta Index Block to the WritableFile
if (ok()) {
BlockBuilder meta_index_block(&r->options);
if (r->filter_block != nullptr) {
std::string key = "filter.";
key.append(r->options.filter_policy->Name());
std::string handle_encoding;
filter_block_handle.EncodeTo(&handle_encoding);
meta_index_block.Add(key, handle_encoding);
}
WriteBlock(&meta_index_block, &metaindex_block_handle);
}
// 4. Add the Index Block to the WritableFile.
if (ok()) {
if (r->pending_index_entry) {
r->options.comparator->FindShortSuccessor(&r->last_key);
std::string handle_encoding;
r->pending_handle.EncodeTo(&handle_encoding);
r->index_block.Add(r->last_key, Slice(handle_encoding));
r->pending_index_entry = false;
}
WriteBlock(&r->index_block, &index_block_handle);
}
// 5. Add the Footer to the WritableFile.
if (ok()) {
Footer footer;
footer.set_metaindex_handle(metaindex_block_handle);
footer.set_index_handle(index_block_handle);
std::string footer_encoding;
footer.EncodeTo(&footer_encoding);
r->status = r->file->Append(footer_encoding);
if (r->status.ok()) {
r->offset += footer_encoding.size();
}
}
return r->status;
}
TableBuilder::Flush
The role of writing the Data Block being created to storage.
- Add the contents of the Data Block being created to the
WritableFile. - Write the contents of the
WritableFileto storage. - If using a Bloom Filter, create a new Bloom Filter.
void TableBuilder::Flush() {
Rep* r = rep_;
// ...
// 1. Add the contents of the Data Block being created to the WritableFile
WriteBlock(&r->data_block, &r->pending_handle);
if (ok()) {
r->pending_index_entry = true;
// 2. Write the contents of WritableFile to storage
r->status = r->file->Flush();
}
// 3. If using Bloom Filter, create a new Bloom Filter
if (r->filter_block != nullptr) {
r->filter_block->StartBlock(r->offset);
}
}