Write-Ahead Logging (WAL) Analysis
Overview of max_total_wal_size Option
Hypothesis
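- Each column family has its own memtables and SST files (SSTables), but all column families share a single WAL.
- Whenever any column family is flushed, RocksDB creates a new WAL, and all subsequent writes to every column family go to the new WAL.
- A WAL file can be deleted only after the data it holds for every column family has been flushed to SST files, so every column family needs to be flushed regularly.
- Without a limit on total WAL size, column families that are flushed infrequently keep old WAL files alive, which delays their deletion and lets WAL files accumulate.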
- The max_total_wal_size option sets that limit: when the total size of live WAL files exceeds it, RocksDB forces a flush of the column families whose data is still held in the oldest live WAL file, so that file can be deleted.
- A smaller max_total_wal_size may therefore cause more frequent flushes, which could degrade performance.
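The mechanism in the hypothesis can be illustrated with a toy Python model. This is not RocksDB code: `ToyWalModel` and `count_flushes` are hypothetical names, a column family is assumed to flush exactly when its memtable reaches `write_buffer_size` (immutable memtables and `max_write_buffer_number` are ignored), writes are spread round-robin, and `None` stands for "no limit" (in RocksDB, 0 instead selects a computed default).

```python
from typing import List, Optional


class ToyWalModel:
    """Toy model of column families (CFs) sharing one WAL. Not RocksDB code.

    Assumptions: a CF flushes when its memtable reaches write_buffer_size;
    every flush switches all CFs to a new WAL; a WAL is deleted once no CF
    still needs it; max_total_wal_size=None means "no limit".
    """

    def __init__(self, num_cfs: int, write_buffer_size: int,
                 max_total_wal_size: Optional[int]):
        self.num_cfs = num_cfs
        self.write_buffer_size = write_buffer_size
        self.max_total_wal_size = max_total_wal_size
        self.memtable_bytes = [0] * num_cfs
        # Oldest WAL id holding unflushed data for each CF (None = nothing unflushed).
        self.oldest_wal_of_cf: List[Optional[int]] = [None] * num_cfs
        self.current_wal = 0
        self.wal_sizes = {0: 0}  # live WAL id -> bytes
        self.flushes = 0         # counts one flush per CF flushed

    def _flush(self, cfs: List[int]) -> None:
        for cf in cfs:
            self.memtable_bytes[cf] = 0
            self.oldest_wal_of_cf[cf] = None
        self.flushes += len(cfs)
        # A flush switches every CF to a fresh WAL.
        self.current_wal += 1
        self.wal_sizes[self.current_wal] = 0
        # WALs older than the oldest one any CF still needs can be deleted.
        pinned = [w for w in self.oldest_wal_of_cf if w is not None]
        oldest_needed = min(pinned) if pinned else self.current_wal
        for wal in [w for w in self.wal_sizes if w < oldest_needed]:
            del self.wal_sizes[wal]

    def write(self, cf: int, nbytes: int) -> None:
        self.wal_sizes[self.current_wal] += nbytes
        self.memtable_bytes[cf] += nbytes
        if self.oldest_wal_of_cf[cf] is None:
            self.oldest_wal_of_cf[cf] = self.current_wal
        if self.memtable_bytes[cf] >= self.write_buffer_size:
            self._flush([cf])  # size-triggered flush
        # Mirror the documented rule that the limit only matters with 2+ CFs.
        if self.max_total_wal_size is None or self.num_cfs < 2:
            return
        while sum(self.wal_sizes.values()) > self.max_total_wal_size:
            oldest = min(self.wal_sizes)
            # Force-flush every CF whose unflushed data keeps the oldest WAL alive.
            victims = [c for c, w in enumerate(self.oldest_wal_of_cf)
                       if w is not None and w <= oldest]
            if not victims:
                break
            self._flush(victims)


def count_flushes(num_cfs: int, total_bytes: int, write_size: int,
                  write_buffer_size: int,
                  max_total_wal_size: Optional[int]) -> int:
    model = ToyWalModel(num_cfs, write_buffer_size, max_total_wal_size)
    for i in range(total_bytes // write_size):
        model.write(i % num_cfs, write_size)  # spread writes round-robin
    return model.flushes


if __name__ == "__main__":
    print(count_flushes(4, 100_000, 10, 1_000, None))
    print(count_flushes(4, 100_000, 10, 1_000, 2_000))
```

With four column families, a 1,000-byte write buffer, and a 2,000-byte WAL limit, the limited run performs more flushes than the unlimited one, which is the effect the hypothesis predicts. With a single column family the toy limit is ignored.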
Experimental Design
- Independent variable: --max_total_wal_size=[int value] (in bytes)
- Dependent variables: SAF (space amplification factor), WAF (write amplification factor), latency, throughput
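For reference, the two amplification factors can be computed as below. These are the common definitions (device bytes written per byte the application wrote; on-disk bytes per byte of live data). The function names are hypothetical, and how the byte counts are collected (db_bench statistics, OS I/O counters, directory sizes) is an assumption left open here.

```python
def write_amplification(device_bytes_written: float, user_bytes_written: float) -> float:
    """WAF: bytes physically written to storage per byte written by the application."""
    if user_bytes_written <= 0:
        raise ValueError("user_bytes_written must be positive")
    return device_bytes_written / user_bytes_written


def space_amplification(on_disk_bytes: float, live_data_bytes: float) -> float:
    """SAF: bytes occupied on storage per byte of live (logical) data."""
    if live_data_bytes <= 0:
        raise ValueError("live_data_bytes must be positive")
    return on_disk_bytes / live_data_bytes


def throughput_ops_per_sec(num_ops: int, elapsed_seconds: float) -> float:
    """Operations completed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_ops / elapsed_seconds


# Hypothetical byte counts, for illustration only.
print(write_amplification(2_500, 1_000))  # 2.5
print(space_amplification(1_200, 1_000))  # 1.2
```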
Command
./db_bench --benchmarks="fillseq" --max_total_wal_size=<value> --num=10000000
where <value> takes each of 0, 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000 in a separate run.
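Since each --max_total_wal_size value is a separate db_bench run, the sweep can be scripted. This hypothetical Python helper only builds and prints the command lines; it uses the flags from the command above plus --num_column_families, the db_bench option discussed under Experiment Results. Passing each argument list to `subprocess.run(argv, check=True)` would execute the runs.

```python
import shlex
from typing import List, Optional

# --max_total_wal_size values (in bytes) from the command above; each is one run.
WAL_LIMITS = [0, 1, 10, 100, 1_000, 10_000, 100_000,
              1_000_000, 10_000_000, 100_000_000]


def db_bench_argv(max_total_wal_size: int, num: int = 10_000_000,
                  num_column_families: Optional[int] = None,
                  binary: str = "./db_bench") -> List[str]:
    """Build the argument list for one fillseq run (hypothetical helper)."""
    argv = [binary, "--benchmarks=fillseq",
            f"--max_total_wal_size={max_total_wal_size}", f"--num={num}"]
    if num_column_families is not None:
        # Optional: set the number of column families for this run.
        argv.append(f"--num_column_families={num_column_families}")
    return argv


if __name__ == "__main__":
    for limit in WAL_LIMITS:
        # Print only; subprocess.run(argv, check=True) would execute the run.
        print(shlex.join(db_bench_argv(limit)))
```

Building an argument list instead of a shell string avoids quoting issues; the quotes around fillseq in the original command are removed by the shell anyway.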
Experiment Environment
- Operating System: macOS Monterey
- Processor: 2.3GHz 8-Core Intel Core i9
- SSD: APPLE SSD AP1024N 1TB
Results
- Initial experiments showed no significant differences across max_total_wal_size values in this environment.
- The default value of max_total_wal_size is 0, which does not mean a zero limit: RocksDB then computes the limit dynamically as:
  [sum of all write_buffer_size * max_write_buffer_number] * 4
- This option takes effect only when there are two or more column families.
Calculations for Column Families
- 10 column families:
  - write_buffer_size = 64 MiB (default)
  - max_write_buffer_number = 2 (default)
  - max_total_wal_size = [10 * 64 MiB * 2] * 4 = 5,120 MiB (5 GiB)
- 15 column families:
  - write_buffer_size = 64 MiB (default)
  - max_write_buffer_number = 2 (default)
  - max_total_wal_size = [15 * 64 MiB * 2] * 4 = 7,680 MiB (7.5 GiB)
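The default-limit formula can be checked with a short calculation. This sketch assumes every column family uses RocksDB's defaults (a 64 MiB write_buffer_size and a max_write_buffer_number of 2) and reads MB as MiB; `default_max_total_wal_size` is a hypothetical helper, not a RocksDB function.

```python
MiB = 1 << 20
GiB = 1 << 30


def default_max_total_wal_size(num_cfs: int, write_buffer_size: int = 64 * MiB,
                               max_write_buffer_number: int = 2) -> int:
    """[sum of write_buffer_size * max_write_buffer_number over all CFs] * 4.

    Assumes all column families share the same two settings.
    """
    per_cf = write_buffer_size * max_write_buffer_number
    return num_cfs * per_cf * 4


for n in (10, 15):
    limit = default_max_total_wal_size(n)
    print(f"{n} CFs: {limit // MiB} MiB ({limit / GiB:.2f} GiB)")
```

It prints 5120 MiB (5.00 GiB) for 10 column families and 7680 MiB (7.50 GiB) for 15. Column families with different settings would need the per-CF products summed individually.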
Experiment Results
- The initial results appear to reflect only the default column family.
- Different results were observed when the --num_column_families option was not specified, which suggests that all 10 and 15 column families were in use in those runs.
Discussion
- Performance appears to degrade as max_total_wal_size is reduced below its default value.
- Tests with max_total_wal_size fixed at 500MB and 10, 15, 20, 25, and 30 column families suggest that performance degrades as more column families share the WAL.
- The max_total_wal_size value below which performance starts to degrade rises as the number of column families increases.
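One possible reading of these observations, under a deliberately simplified model: if writes are spread evenly across N column families and no memtable has filled yet, each column family holds about max_total_wal_size / N unflushed bytes when the limit is reached. Once that share drops below write_buffer_size, flushes are forced by the WAL limit and become smaller and more frequent, so the limit at which this begins grows roughly as N * write_buffer_size. The helpers below are hypothetical, assume the 64 MiB default write_buffer_size, and read 500MB as 500 MiB.

```python
MiB = 1 << 20
WRITE_BUFFER_SIZE = 64 * MiB  # RocksDB's default write_buffer_size


def flush_size_under_wal_limit(max_total_wal_size: int, num_cfs: int) -> float:
    """Approximate bytes per column family flushed when the WAL limit is hit.

    Simplified model: writes are spread evenly and no memtable has filled yet,
    so each CF holds about max_total_wal_size / num_cfs unflushed bytes.
    A memtable never grows past write_buffer_size in this model.
    """
    return min(max_total_wal_size / num_cfs, WRITE_BUFFER_SIZE)


def approx_degradation_threshold(num_cfs: int) -> int:
    """WAL limit below which, in this model, WAL-forced flushes come first."""
    return num_cfs * WRITE_BUFFER_SIZE


for n in (10, 15, 20, 25, 30):
    per_flush_mib = flush_size_under_wal_limit(500 * MiB, n) / MiB
    threshold_mib = approx_degradation_threshold(n) // MiB
    print(f"{n} CFs: ~{per_flush_mib:.1f} MiB per flush at a 500 MiB limit, "
          f"threshold ~{threshold_mib} MiB")
```

For 10 to 30 column families the model estimates per-flush sizes falling from about 50 MiB to about 16.7 MiB at a 500 MiB limit, and thresholds rising from about 640 MiB to about 1920 MiB. Measured flush counts would be needed to confirm whether the real behavior follows this trend.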
Future Work
- Further benchmarking with multiple column families, altering their characteristics, and monitoring flush counts could provide deeper insights.