[Team Compaction] YCSB Tuning Report

DB options for tuning


| option | default | explanation |
| --- | --- | --- |
| leveldb.write_buffer_size | 2MB | size of a single memtable |
| leveldb.max_file_size | 4MB | maximum SSTable file size |
| leveldb.compression | snappy | compression method |
| leveldb.cache_size | 4MB | block cache size |
| leveldb.filter_bits | 10 | filter block (Bloom filter) bits per key |
| leveldb.block_size | 4KB | size of a data block within an SSTable file |
| leveldb.block_restart_interval | 16 | number of keys between restart points (delta encoding) |
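
For reference, here is a minimal sketch of how these knobs correspond to fields of leveldb::Options in LevelDB's C++ API, filled in with the default values from the table; exactly how the benchmark's LevelDB binding forwards each property is an assumption here.

```cpp
#include <leveldb/cache.h>
#include <leveldb/filter_policy.h>
#include <leveldb/options.h>

// Sketch: the defaults from the table expressed through LevelDB's C++ API.
// (Assumes the benchmark's LevelDB binding maps each property to these fields.)
leveldb::Options MakeDefaultOptions() {
  leveldb::Options options;
  options.write_buffer_size = 2 * 1024 * 1024;         // leveldb.write_buffer_size = 2MB
  options.max_file_size = 4 * 1024 * 1024;             // leveldb.max_file_size = 4MB
  options.compression = leveldb::kSnappyCompression;   // leveldb.compression = snappy
  options.block_cache = leveldb::NewLRUCache(4 * 1024 * 1024);  // leveldb.cache_size = 4MB
  options.filter_policy = leveldb::NewBloomFilterPolicy(10);    // leveldb.filter_bits = 10
  options.block_size = 4 * 1024;                        // leveldb.block_size = 4KB
  options.block_restart_interval = 16;                  // leveldb.block_restart_interval = 16
  return options;
}
```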

Workload analysis


A: Read/update ratio: 50/50
B: Read/update ratio: 95/5
D: Read/update/insert ratio: 95/0/5
Since reads dominate all three workloads, we focus on options that maximize read performance, while still taking write performance into account when choosing the final configuration.

Hypothesis and experiment


Since keys are accessed at random, each configuration is run three times and the results are averaged.

Hypothesis

Based on what we studied, we expect write performance to improve as the write buffer grows, since more data is written out at once.
We also expect read performance to improve if the file size is kept at roughly 1/4 of the buffer size, so that a flushed buffer corresponds to the four files that trigger level-0 compaction, and if the cache size is increased.
Finally, reducing the block size (which puts more entries in each file's index) and increasing the number of filter block bits are expected to improve read performance.
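
As a quick arithmetic check on the 1/4 ratio in this hypothesis (using the 32MB/8MB values settled on later), the sketch below compares the assumed number of files per flush with LevelDB's level-0 compaction trigger of 4; whether a flush is actually split into max_file_size-sized files is part of the hypothesis, not something the snippet verifies.

```cpp
#include <cstddef>

// Hypothesis arithmetic: a 32MB write buffer divided into 8MB files gives
// 4 files, matching LevelDB's level-0 compaction trigger (which is 4).
constexpr std::size_t kWriteBufferSize = 32 * 1024 * 1024;  // chosen buffer size
constexpr std::size_t kMaxFileSize = 8 * 1024 * 1024;       // chosen file size
constexpr int kL0CompactionTrigger = 4;                     // LevelDB's level-0 trigger

static_assert(kWriteBufferSize / kMaxFileSize == kL0CompactionTrigger,
              "file size is 1/4 of the buffer size, matching the level-0 trigger");
```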

Default

| workload | runtime (sec) | throughput (ops/sec) |
| --- | --- | --- |
| load | 6.22135 | 16073.7 |
| A | 2.88199 | 34698.2 |
| B | 0.753103 | 132784 |
| D | 0.545697 | 183252 |

write_buffer_size 8MB

| workload | runtime (sec), default -> tuned | throughput (ops/sec), default -> tuned |
| --- | --- | --- |
| load | 6.22135 -> 4.58351 | 16073.7 -> 21817.4 |
| A | 2.88199 -> 2.30625 | 34698.2 -> 43360.4 |
| B | 0.753103 -> 0.738974 | 132784 -> 135323 |
| D | 0.545697 -> 0.50175 | 183252 -> 199302 |

As write performance improved, the load phase and workload A, which contain the most writes, sped up.

write_buffer_size 32MB, max_file_size 8MB

| workload | runtime (sec), default -> tuned | throughput (ops/sec), default -> tuned |
| --- | --- | --- |
| load | 6.22135 -> 1.33948 | 16073.7 -> 74655.9 |
| A | 2.88199 -> 1.44349 | 34698.2 -> 69276.5 |
| B | 0.753103 -> 0.708532 | 132784 -> 141137 |
| D | 0.545697 -> 0.502773 | 183252 -> 198897 |

The buffer size was further increased to 32MB (increasing it beyond 32MB made no further difference), and the file size was set to 8MB so that level 0 is filled first; write performance improved substantially.

cache_size 8MB, block_size 2KB

| workload | runtime (sec), default -> tuned | throughput (ops/sec), default -> tuned |
| --- | --- | --- |
| load | 6.22135 -> 1.33054 | 16073.7 -> 75157.5 |
| A | 2.88199 -> 1.38317 | 34698.2 -> 72297.5 |
| B | 0.753103 -> 0.690946 | 132784 -> 144729 |
| D | 0.545697 -> 0.484127 | 183252 -> 206557 |

To improve read performance, the cache size was increased and the block size reduced (a cache larger than 8MB or a block size smaller than 2KB made little further difference).
The read-heavy workloads B and D improved slightly.
Increasing the filter bits or changing block_restart_interval neither hurt performance nor made any noticeable difference.

Therefore, the best options

| option | value |
| --- | --- |
| leveldb.write_buffer_size | 32MB |
| leveldb.max_file_size | 8MB |
| leveldb.compression | snappy |
| leveldb.cache_size | 8MB |
| leveldb.filter_bits | 10 |
| leveldb.block_size | 2KB |
| leveldb.block_restart_interval | 16 |
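
As a rough, self-contained sketch (not the benchmark harness itself, and the database path is only illustrative), the tuned configuration above would be applied like this when opening LevelDB directly through its C++ API:

```cpp
#include <cassert>
#include <leveldb/cache.h>
#include <leveldb/db.h>
#include <leveldb/filter_policy.h>
#include <leveldb/options.h>

int main() {
  // The tuned values from the table above, applied to leveldb::Options.
  leveldb::Options options;
  options.create_if_missing = true;
  options.write_buffer_size = 32 * 1024 * 1024;         // leveldb.write_buffer_size = 32MB
  options.max_file_size = 8 * 1024 * 1024;              // leveldb.max_file_size = 8MB
  options.compression = leveldb::kSnappyCompression;    // leveldb.compression = snappy
  options.block_cache = leveldb::NewLRUCache(8 * 1024 * 1024);  // leveldb.cache_size = 8MB
  options.filter_policy = leveldb::NewBloomFilterPolicy(10);    // leveldb.filter_bits = 10
  options.block_size = 2 * 1024;                         // leveldb.block_size = 2KB
  options.block_restart_interval = 16;                   // leveldb.block_restart_interval = 16

  leveldb::DB* db = nullptr;
  leveldb::Status status = leveldb::DB::Open(options, "/tmp/ycsb_leveldb", &db);  // illustrative path
  assert(status.ok());

  // ... run the workload against db ...

  delete db;
  delete options.block_cache;
  delete options.filter_policy;
  return 0;
}
```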

Conclusion and discussion


We selected the best configuration by increasing the buffer size, file size, and cache size to 32MB, 8MB, and 8MB respectively, and halving the block size to 2KB.
The improvement in write performance was clear. We had expected read performance to improve as well, since a larger cache raises the hit rate and a smaller block size puts more entries in the index, but read performance did not improve significantly. Several factors may contribute to this, but we believe the main cause is that data is accessed randomly, which makes cache behavior unpredictable.

Allocating about one-third of memory to the block cache is a reasonable tradeoff, since it leaves a large amount of memory for the OS page cache, though it does require careful memory budgeting. In terms of performance, however, we expect even better results if the remaining memory is used well and the cache is made larger.