Thursday, November 30, 2017

In-memory linkbench and a fast server: MyRocks, InnoDB and TokuDB

This post explains MySQL performance for Linkbench on a fast server. It uses a low-concurrency workload to measure response time, IO efficiency and CPU efficiency. Tests were run for MyRocks, InnoDB and TokuDB. I wrote a similar report a few months ago. The difference here is that I used an updated compiler toolchain, a more recent version of MyRocks and MySQL 8.0.3. The results didn't change much from the previous blog post.
tl;dr:
  • InnoDB from MySQL 5.6 had the best throughput
  • CPU efficiency is similar for MyRocks and InnoDB, although MyRocks uses ~20% more CPU per transaction than InnoDB from MySQL 5.6.35
  • There is a CPU regression from MySQL 5.6 to 5.7 to 8.x. About 30% of throughput is lost on the load and transaction tests from 5.6.35 to 8.0.3. I assume most of this comes from code above the storage engine layer.
  • InnoDB writes more than 10X as much to storage per transaction as MyRocks. An SSD will last longer with MyRocks.
  • Uncompressed InnoDB uses ~1.6X as much space as uncompressed MyRocks

Configuration

I used my Linkbench repo and helper scripts to run linkbench with maxid1=10M, loaders=1 and requestors=1 so there will be 2 concurrent connections doing the load and 1 connection running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload and this test included that commit. The test pattern is 1) load and 2) transactions. The transactions were run in 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 256GB of RAM.
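
The sketch below is a simplified driver for that pattern: one load phase, then 12 one-hour transaction loops. The LinkBench property names (maxid1, loaders, requesters, maxtime) and the bin/linkbench entry point are assumptions about stock LinkBench, not the exact commands my helper scripts run.

# Hypothetical driver for the test pattern: load, then 12 x 1-hour transaction
# loops; only the last hour is reported. Property names are assumptions.
import subprocess

PROPS = ["-Dmaxid1=10000000", "-Dloaders=1", "-Drequesters=1"]
CONFIG = ["-c", "config/LinkConfigMysql.properties"]

def linkbench(*phase_args):
    subprocess.run(["./bin/linkbench"] + CONFIG + PROPS + list(phase_args), check=True)

linkbench("-l")                          # phase 1: load
for hour in range(12):                   # phase 2: 12 x 1-hour loops
    linkbench("-r", "-Dmaxtime=3600")    # results come from the 12th loop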

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here but the database cache was made large enough to cache the ~10GB database.
  • MyRocks was compiled on October 16 with git hash 1d0132. Compression was not used.
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058. The results for it aren't interesting here but will be interesting for IO-bound linkbench.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL still has user & table statistics for monitoring.

Graphs

The first two graphs show the load and transaction rates relative to InnoDB from upstream MySQL 5.6.35. For this test it has the best rates for load and transactions. There is a big drop in throughput for InnoDB from 5.6.35 to 8.0.3 for both the load and transaction tests.
The chart below has the KB written to storage per transaction. The rate for InnoDB is more than 10X the rate for MyRocks. An SSD will last longer with MyRocks. The rate for MyRocks is also much better than TokuDB. The rate here for TokuDB is worse than what I measured in September and I have yet to debug it.
All engines use a similar amount of space after the load, ~15GB. But MyRocks does much better after 12 hours of transactions -- InnoDB is ~1.6X larger and TokuDB is ~1.19X larger. The problem for InnoDB is B-Tree fragmentation. The advantage for MyRocks is leveled compaction which limits garbage to ~10% of the database size.
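
A rough way to see where that ~10% comes from: with leveled compaction the max level is fully compacted and holds all of the live data, while the levels above it hold roughly 1/fanout of that amount as not-yet-compacted duplicates. The arithmetic below assumes the RocksDB default fanout of 10.

# Back-of-the-envelope space amplification for leveled compaction, assuming a
# per-level size fanout of 10 and a fully compacted max level.
fanout = 10
live_gb = 15                               # database size right after the load
upper_levels_gb = live_gb / fanout         # duplicate data above the max level
space_amp = (live_gb + upper_levels_gb) / live_gb
print(f"space amplification ~ {space_amp:.2f}x")   # ~1.10x, i.e. ~10% garbage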

Load Results

All of the data is here. I adjusted iostat metrics for MyRocks because iostat currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact; a sketch of the adjustment follows the legend below. The table below has a subset of the results.
  • InnoDB 5.6 has the best insert rate but there is a regression from 5.6.35 to 5.7.17 to 8.0.3. I assume most of that is from code above the storage engine.
  • Write efficiency (wkb/i) is similar for all engines
  • CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB

ips     wkb/i   Mcpu/i  size    wMB/s   cpu     engine
 49986  0.80     98     14      40.1     4.9    MyRocks.Oct16
 62224  0.98     72     15      61.1     4.5    FbInno.Jun16
 63891  1.03     74     16      65.7     4.7    Inno.5635
 56383  1.03     85     16      58.3     4.8    Inno.5717
 55173  1.04     78     16      57.6     4.3    Inno.801
 41815  1.05    103     16      44.0     4.3    Inno.802
 43590  1.06    101     16      46.4     4.4    Inno.803
 23664  1.34    160     14      31.7     3.8    Toku.5717

legend:
* ips - inserts/second
* wkb/i - iostat KB written per insert
* Mcpu/i - normalized CPU time per insert
* wMB/s - iostat write MB/s, average
* size - database size in GB at test end
* cpu - average value of vmstat us + sy columns
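
The sketch below shows roughly how the derived columns above are produced from raw counters, including the trim adjustment. The trimmed-bytes estimate is a stand-in for whatever source is used to produce it, which is why the adjustment is not exact; the Mcpu/i normalization is omitted.

# Hypothetical post-processing of raw counters into the columns above. The
# trimmed KB estimate is approximate, so the MyRocks adjustment is not exact.
def load_metrics(inserts, seconds, iostat_kb_written, trimmed_kb_estimate=0):
    wkb = iostat_kb_written - trimmed_kb_estimate    # adjusted KB written
    return {
        "ips": round(inserts / seconds),             # inserts/second
        "wkb/i": round(wkb / inserts, 2),            # KB written per insert
        "wMB/s": round(wkb / 1024.0 / seconds, 1),   # average write MB/s
    }

# Example with made-up raw counters:
print(load_metrics(inserts=10_000_000, seconds=200, iostat_kb_written=8_200_000))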

Transaction Results

These are results from the 12th 1-hour loop of the transaction phase. All of the data is here. I adjusted iostat metrics for MyRocks because iostat currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact.
  • InnoDB 5.6 has the best transaction rate but there is a regression from 5.6.35 to 5.7.17 to 8.0.3. I assume most of that is from code above the storage engine.
  • Write efficiency (wkb/t) was better for MyRocks. InnoDB writes more than 10X as much to storage per transaction as MyRocks.
  • CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
  • Response times are similar for MyRocks and InnoDB
  • Space efficiency is better for MyRocks. InnoDB is ~1.6X larger.

tps   wkb/t  Mcpu/t  size  un   gn   ul   gl    wMB/s  engine
5753  0.44   677     16    0.3  0.1  0.5  0.5    2.5   MyRocks.Oct16
7065  5.11   624     23    0.3  0.1  0.4  0.3   36.1   FbInno.Jun16
7420  5.17   562     26    0.3  0.1  0.4  0.2   38.4   Inno.5635
6616  5.20   628     26    0.3  0.1  0.5  0.3   34.4   Inno.5717
6313  5.16   654     25    0.3  0.1  0.5  0.3   32.6   Inno.801
5978  5.38   682     25    0.3  0.1  0.6  0.3   32.2   Inno.802
6070  5.39   669     25    0.3  0.1  0.6  0.3   32.7   Inno.803
4234  2.92   814     19    0.5  0.2  1    0.6   12.4   Toku.5717

legend:
* tps - transactions/second
* wkb/t - iostat KB written per transaction
* Mcpu/t - normalized CPU time per transaction
* size - database size in GB at test end
* un, gn, ul, gl - 99th percentile response time in millisecs for
      UpdateNode, GetNode, UpdateList and GetLinkedList transactions
* wMB/s - iostat write MB/s, average
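
Using the wkb/t and tps columns above, the write-rate difference can be restated as GB written per day at each engine's measured transaction rate. This is illustrative arithmetic from the table, not an endurance estimate for a specific SSD.

# Write volume implied by the table above: KB written per transaction times
# transactions per second, converted to GB per day. Values are copied from
# the MyRocks.Oct16 and Inno.5635 rows.
engines = [("MyRocks.Oct16", 0.44, 5753), ("Inno.5635", 5.17, 7420)]

print("wkb/t ratio: %.1fx" % (5.17 / 0.44))          # ~11.8x more written per transaction
for name, wkb_t, tps in engines:
    gb_per_day = wkb_t * tps * 86400 / (1024 ** 2)
    print(f"{name}: ~{gb_per_day:,.0f} GB written per day at {tps} tps")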

Wednesday, November 29, 2017

Sysbench, IO-bound, small server: MyRocks and InnoDB

In this post I compare MyRocks and InnoDB using IO-bound sysbench and a small server. The goal is to understand where MyRocks differs from InnoDB.  I previously published more results for many versions of MyRocks and InnoDB. Here I use MyRocks from June 2017 and InnoDB from upstream 5.6.35 and 5.7.17.

tl;dr
  • There is more variance in QPS on IO-bound sysbench than on in-memory sysbench
  • Not much QPS is lost when compression is used with MyRocks
  • Two things look better in MySQL 5.7 -- InnoDB range scans and optimization of queries with large in-lists.
  • For many of the workloads InnoDB writes between 5X and 20X more to storage per transaction. An SSD will last longer with MyRocks.
  • Full-scan perf from MyRocks without compression matches InnoDB-5.7 and is much better than InnoDB-5.6 when filesystem readahead is enabled. Now we need to make that feature work for real.
  • MyRocks QPS was >= InnoDB on most of the write-heavy tests. Read-free index maintenance makes a big difference for MyRocks on some of them.
  • InnoDB QPS was >= MyRocks on most of the range-scan tests (read-only, read-write)
  • InnoDB QPS was > MyRocks on the point-query tests. The 5.7 optimizer might help here.
  • InnoDB QPS was > MyRocks on most of the inlist-query tests

Configuration

The tests used MyRocks from FB MySQL which is currently based on MySQL 5.6.35. The build is from June 16 with git hash 52e058 for FB MySQL and 7e5fac for RocksDB. Upstream 5.6.35 and 5.7.17 were used for InnoDB.
    All tests used jemalloc with mysqld. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log.

    Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is much larger than RAM.

    I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is smaller. The SSD on the i3 NUC is slower than on the i5 NUC. InnoDB is more dependent than MyRocks on IO performance and in some tests below MyRocks does much better on the i3 NUC than the i5 NUC relative to InnoDB.

    Tests are repeated for MyRocks without and with compression. The compression configuration is no compression for L0/L1/L2, LZ4 for the middle levels of the LSM tree and zstandard for the max level. In the rest of this post that is described as zstandard compression.
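
    Restated per level, the configuration maps out as in the sketch below. It assumes a 7-level LSM tree (L0 to L6) and only restates the paragraph above; it is not the option syntax used in my.cnf.

    # The compression configuration described above, restated per LSM level.
    # Assumes a 7-level tree (L0..L6); illustrative only, not my.cnf syntax.
    def codec_for_level(level, num_levels=7):
        if level <= 2:
            return "none"         # L0/L1/L2 are not compressed
        if level == num_levels - 1:
            return "zstandard"    # max level
        return "lz4"              # middle levels

    for level in range(7):
        print(f"L{level}: {codec_for_level(level)}")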

    Results

    All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The graphs have the relative QPS where that is the QPS for a configuration relative to the base case. The base case is InnoDB from upstream 5.6.35. The base case is faster when the relative QPS is less than 1. The tables that follow have the absolute and relative QPS.
    Graphs

    The graphs have the QPS relative to the InnoDB from MySQL 5.6.35. The name MyRocks.none is for MyRocks without compression and MyRocks.zstd is for MyRocks with zstandard compression.

    There are 4 types of tests and graphs for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so there are more graphs here. The tests are explained here.

    The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are the relative QPS for update-index, update-nonindex and insert. MyRocks does 3X to 10X better than InnoDB on update-index because non-unique secondary index maintenance is read-free for it.
    The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are the relative QPS for read-write with range-size=100, read-only with range-size=10,000 and then full-scan. The results for read-only and full-scan are for the test run after the write-heavy tests. The results for full-scan include two extra configurations that both enable filesystem readahead during the scan: MyRocks.none.ra, MyRocks.zstd.ra.
    The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is the relative QPS for the test run after the write-heavy tests.
    The inlist-query group includes the hot-points test and the random-points tests. The random-points result is from the test run after the write-heavy tests. The graph is the relative QPS.
    full-scan

    The sections that follow have the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for InnoDB from MySQL 5.6.35 (InnoDB-5.6). Values are provided for the i3 and i5 NUC. 

    For full-scan, results are provided for two extra configurations that both enable filesystem readahead during the scan: MyRocks.none.ra, MyRocks.zstd.ra. As seen below readahead is great for full-scan. Unfortunately that feature is not ready for production. I don't know if filesystem readahead is the right solution in this case, but it was easy to use for a benchmark. For a range scan there is an iterator open on each level of the LSM tree and RocksDB does page-at-a-time reads. For a long range scan those reads could be much larger than one page at a time. Issue 723 is open for this.
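
    The sketch below illustrates the mechanism behind that readahead with os.posix_fadvise: whether the kernel prefetches depends on the access hint given for a file. It is a simplified model (Linux-only), not the MyRocks configuration used for the benchmark.

    # Minimal illustration of filesystem readahead hints. By default RocksDB
    # advises random access on its data files, which suppresses kernel
    # readahead; a long scan benefits from the sequential hint instead.
    import os

    def open_for_scan(path):
        fd = os.open(path, os.O_RDONLY)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # prefetch ahead of the reader
        return fd

    def open_for_point_reads(path):
        fd = os.open(path, os.O_RDONLY)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)      # disable kernel readahead
        return fd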

    The full scan of the PK index is done before and after the write-heavy tests. Note that tests run before the write-heavy tests are still run immediately after the initial load. So in both cases there can be page write-back activity with InnoDB and compaction with MyRocks. But from data I haven't shared there were no writes in progress for the i5 NUC.

    Summary of full-scan throughput:

    • The best perf is from MyRocks without compression when filesystem readahead is used
    • InnoDB-5.7 is better than InnoDB-5.6 on the i5 NUC but worse than it on the i3 NUC
    Legend:
    * Mrps - scan rate in millions of rows per second
    * ratio - ratio of Mrps for this engine vs InnoDB-5.6

    before write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    1.019   1.00    1.766   1.00    InnoDB-5.6
    0.320   0.31    2.424   1.37    InnoDB-5.7
    0.879   0.86    1.194   0.68    MyRocks.none
    1.927   1.89    2.318   1.31    MyRocks.none.ra
    0.714   0.70    0.860   0.49    MyRocks.zstd
    1.006   0.99    1.280   0.72    MyRocks.zstd.ra

    after write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    0.914   1.00    1.786   1.00    InnoDB-5.6
    0.829   0.91    2.406   1.35    InnoDB-5.7
    0.610   0.67    1.126   0.63    MyRocks.none
    0.969   1.06    2.133   1.19    MyRocks.none.ra
    0.477   0.52    0.816   0.46    MyRocks.zstd
    0.963   1.05    1.212   0.68    MyRocks.zstd.ra

    Data from iostat and vmstat helps to understand the performance differences and the benefit from readahead for MyRocks. We need to make that work for real. On the i5 NUC:
    • InnoDB-5.6 and MyRocks use ~2X more CPU per row than InnoDB-5.7
    • InnoDB-5.7 gets more IO throughput: ~2.1X more vs MyRocks, ~1.4X more vs InnoDB-5.6
    • Filesystem readahead reduces the CPU overhead and increases the IO read rate for MyRocks

    Legend:
    * CPU.avg - average CPU utilization
    * CPU/row - cost per row scanned
    * rGB - GB read from storage
    * rMB/s - average read IO rate from storage

    before write-heavy
    CPU.avg CPU/row rGB     rMB/s   engine
    28.0    48.96   57      219.6   MyRocks.none
    50.5    22.73   58      437.6   MyRocks.none.ra
    45.4    53.91   59      337.4   InnoDB-5.6
    27.7    24.26   59      460.0   InnoDB-5.7

    after write-heavy
    CPU.avg CPU/row rGB     rMB/s   engine
    27.2    50.30   60      219.1   MyRocks.none
    49.4    24.06   62      427.2   MyRocks.none.ra
    46.0    54.01   59      338.8   InnoDB-5.6
    29.0    25.36   59      457.6   InnoDB-5.7

    update-inlist

    MyRocks does better than InnoDB and the difference here is larger than for update-nonindex. Both tests do updates that don't require secondary index maintenance, but this test updates 100 rows/statement versus 1/statement for update-nonindex. Relatively less time is spent above the storage engine here, so a more efficient engine has more of an impact; see the sketch after the table below. On the i5 NUC:
    • CPU/update is ~1.5X larger for InnoDB-5.6 and ~1.1X larger for InnoDB-5.7 than MyRocks
    • KB written to storage per update is ~10X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    212     1.00    307     1.00    InnoDB-5.6
    213     1.00    392     1.28    InnoDB-5.7
    445     2.10    430     1.40    MyRocks.none
    363     1.71    458     1.49    MyRocks.zstd
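
    For context, the sketch below shows the statement shapes for update-inlist versus update-nonindex: a 100-value IN list versus a single-row update. The table and column names assume the standard sysbench schema (sbtest1 with an integer PK id and a non-indexed column c), so treat the SQL as approximate.

    # Approximate statement shapes for update-inlist vs update-nonindex.
    # Neither updates a secondary-indexed column, so no secondary index
    # maintenance is needed; the difference is 100 rows vs 1 row per statement.
    import random

    ROWS = 160_000_000   # rows/table on the i5 NUC

    def update_inlist_sql(n_ids=100):
        ids = random.sample(range(1, ROWS + 1), n_ids)
        return "UPDATE sbtest1 SET c = 'x' WHERE id IN (%s)" % ", ".join(map(str, ids))

    def update_nonindex_sql():
        return "UPDATE sbtest1 SET c = 'x' WHERE id = %d" % random.randint(1, ROWS)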

    update-one

    InnoDB does better than MyRocks. This test updates the same row repeatedly. There are no reads from storage from processing the update, but MyRocks compaction might read from storage. The update statement doesn't require secondary index maintenance. On the i5 NUC:
    • CPU/update is ~1.1X larger for InnoDB-5.7 and MyRocks than InnoDB-5.6
    • KB written to storage per update is similar for MyRocks and InnoDB

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    9120    1.00    10988   1.00    InnoDB-5.6
    7839    0.86     9570   0.87    InnoDB-5.7
    7656    0.84     8508   0.77    MyRocks.none
    7774    0.85     8749   0.80    MyRocks.zstd

    update-index

    MyRocks is 3X to 10X faster than InnoDB because secondary index maintenance is read-free. On the i5 NUC:
    • CPU/update is ~4X larger for InnoDB than MyRocks
    • KB written to storage per update is ~30X larger for InnoDB than MyRocks
    • Storage read operations per update is ~2X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
     220     1.00    850    1.00    InnoDB-5.6
     312     1.42    924    1.09    InnoDB-5.7
    2477    11.26   3004    3.53    MyRocks.none
    2296    10.44   2778    3.27    MyRocks.zstd
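
    A toy model of why this is read-free: a B-Tree has to read the leaf pages that hold the old and new index entries before modifying them, which can mean storage reads on an IO-bound workload, while an LSM can blindly write a delete marker and a new entry and let compaction reconcile them later. This is a conceptual sketch, not the MyRocks implementation.

    # Toy model of secondary index maintenance when an indexed column changes
    # from old_k to new_k for the row with primary key pk.
    def btree_index_update(index_pages, old_k, new_k, pk):
        # Must read the leaf holding (old_k, pk) to remove it, and the leaf
        # for (new_k, pk) to insert it -- possible storage reads.
        index_pages.discard((old_k, pk))
        index_pages.add((new_k, pk))

    def lsm_index_update(write_batch, old_k, new_k, pk):
        # Blind writes: the index is never read. Obsolete entries are removed
        # later by compaction.
        write_batch.append(("delete", (old_k, pk)))
        write_batch.append(("put", (new_k, pk)))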

    update-nonindex

    MyRocks and InnoDB have similar performance here. Secondary index maintenance isn't done for this test, so MyRocks doesn't gain from read-free index maintenance. See the comment for update-inlist. On the i5 NUC:
    • Storage read operations per update are ~0.9 for InnoDB and MyRocks
    • CPU/update is similar for InnoDB and MyRocks
    • KB written to storage per update is ~5X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    1526    1.00    2683    1.00    InnoDB-5.6
    1556    1.02    2773    1.03    InnoDB-5.7
    1391    0.91    2888    1.08    MyRocks.none
    2237    1.47    2715    1.01    MyRocks.zstd

    delete

    MyRocks does better. On the i5 NUC:
    • CPU/delete is ~2X larger for InnoDB than MyRocks
    • KB written to storage per delete is ~13X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2729    1.00    5068    1.00    InnoDB-5.6
    2763    1.01    5115    1.01    InnoDB-5.7
    7666    2.81    8790    1.73    MyRocks.none
    7281    2.67    8268    1.63    MyRocks.zstd

    read-write with range-size=100

    MyRocks is slightly better than InnoDB. I was surprised by this and my guess is that the MyRocks advantage on insert/update/delete outweighs the InnoDB advantage on range scans. Much of the write IO here can be for rows modified by the previous tests. On the i5 NUC:
    • CPU/query is ~1.2X larger for InnoDB than MyRocks
    • KB written to storage per query is ~20X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2028    1.00    2799    1.00    InnoDB-5.6
    2208    1.09    3077    1.10    InnoDB-5.7
    2565    1.26    3126    1.12    MyRocks.none
    2517    1.24    2901    1.04    MyRocks.zstd

    read-write with range-size=10000

    InnoDB-5.7 is the best here. Unlike the previous test, it is faster than MyRocks because the range scan here is longer (10,000 vs 100 rows). Someone improved range scan performance for InnoDB in 5.7. On the i5 NUC:
    • CPU/query is ~2X larger for MyRocks and InnoDB-5.6 than InnoDB-5.7
    • KB written to storage per query is ~12X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    169     1.00    217     1.00    InnoDB-5.6
    257     1.52    335     1.54    InnoDB-5.7
    146     0.86    202     0.93    MyRocks.none
    132     0.78    163     0.75    MyRocks.zstd

    read-only with range-size=100

    InnoDB is faster here while MyRocks did better on read-write with range-size=100. I assume this is explained by MyRocks benefiting from faster insert, update, delete on the read-write test. InnoDB in 5.7 continues to benefit from improvements to range-scan performance. On the i5 NUC:
    • CPU/query was ~1.4X larger for InnoDB-5.6 and ~1.8X larger for MyRocks than InnoDB-5.7

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2885    1.00    3687    1.00    InnoDB-5.6
    3272    1.13    4954    1.34    InnoDB-5.7
    2464    0.85    3011    0.82    MyRocks.none
    2406    0.83    2779    0.75    MyRocks.zstd

    read-only.pre with range-size=10000

    InnoDB in 5.7 continues to benefit from improvements to range-scan performance. MyRocks.none did better than I expected for this test, perhaps because it is run before the write-heavy tests. On the i5 NUC:
    • CPU/query was ~1.7X larger for InnoDB-5.6 and ~1.5X larger for MyRocks than InnoDB-5.7
    • Storage read operations and read KB per query were ~1.2X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    139     1.00    173     1.00    InnoDB-5.6
    212     1.53    273     1.58    InnoDB-5.7
    137     0.99    174     1.01    MyRocks.none
    112     0.81    138     0.80    MyRocks.zstd

    read-only with range-size=10000

    InnoDB in MySQL 5.7 continues to have the best range-scan performance. The MyRocks QPS here is lower than for the same test in the previous section. The tests in the previous section are run before write-heavy tests while tests here are run after them. It costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM. On the i5 NUC:
    • CPU/query is 1.06X larger here for MyRocks.none compared to the previous test.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    140     1.00    178     1.00    InnoDB-5.6
    210     1.50    275     1.54    InnoDB-5.7
    102     0.73    166     0.93    MyRocks.none
    105     0.75    132     0.74    MyRocks.zstd

    point-query.pre

    InnoDB continues to get the best QPS but the difference between MySQL 5.6 and 5.7 is smaller than it was for range queries. On the i5 NUC:
    • CPU/query is ~1.2X larger for InnoDB-5.6 and ~1.4X larger for MyRocks than InnoDB-5.7
    • Storage read operations per query are ~1.06X larger for InnoDB than MyRocks.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    3879    1.00    5250    1.00    InnoDB-5.6
    4264    1.10    6198    1.18    InnoDB-5.7
    4177    1.08    4370    0.83    MyRocks.none
    3354    0.86    3969    0.76    MyRocks.zstd

    point-query

    InnoDB continues to get the best QPS but the difference between MySQL 5.6 and 5.7 is smaller than it was for range queries. The MyRocks QPS here is lower than for the same test in the previous section, which is expected for read-heavy tests that follow write-heavy tests. On the i5 NUC the iostat and vmstat metrics are similar to the result above for point-query.pre with two exceptions:
    • CPU/query is ~1.5X larger for MyRocks than InnoDB-5.7. This is larger than above.
    • Storage read operations/query are similar rather than 6% larger for InnoDB

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    3896    1.00    5310    1.00    InnoDB-5.6
    4332    1.11    6155    1.16    InnoDB-5.7
    2361    0.61    3966    0.75    MyRocks.none
    2742    0.70    3707    0.70    MyRocks.zstd

    random-points.pre

    InnoDB-5.7 does the best. It is odd that MyRocks does better than InnoDB-5.6 here but not above on point-query. Both do point lookups but this test does 100 lookups per SELECT while the point-query tests do 1 per SELECT. On the i5 NUC:
    • CPU/query is ~4.8X larger for InnoDB-5.6 and ~3.4X larger for MyRocks than InnoDB-5.7. I wonder if a change in the optimizer explains this because this looks like a difference between MySQL 5.6 and 5.7 rather than between engines.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    56      1.00     50     1.00    InnoDB-5.6
    69      1.23    105     2.10    InnoDB-5.7
    65      1.16     57     1.14    MyRocks.none
    50      0.89     61     1.22    MyRocks.zstd

    random-points

    Results are similar to random-points.pre: InnoDB-5.7 does the best and MySQL 5.6 uses more CPU/query than 5.7. On the i5 NUC:
    • CPU/query is ~3.5X larger for InnoDB-5.6 and ~2.6X larger for MyRocks than InnoDB-5.7. See the comment in the previous section.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    56      1.00     64     1.00    InnoDB-5.6
    51      0.91    100     1.56    InnoDB-5.7
    36      0.64     62     0.97    MyRocks.none
    41      0.73     57     0.89    MyRocks.zstd

    hot-points

    InnoDB gets at least 3X more QPS than MyRocks. This test is always in-memory and the QPS here is similar to the QPS from in-memory sysbench. On the i5 NUC:
    • CPU/query is ~3X larger for MyRocks than InnoDB.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    3609    1.00    4167    1.00    InnoDB-5.6
    3455    0.96    3771    0.90    InnoDB-5.7
    1000    0.28    1125    0.27    MyRocks.none
    1117    0.31    1088    0.26    MyRocks.zstd

    insert

    MyRocks gets a higher insert rate. On the i5 NUC:
    • Storage read operations per insert are 6X to 8X larger for InnoDB than MyRocks
    • KB written to storage per insert are ~3X larger for InnoDB than MyRocks
    • CPU/insert is ~1.1X larger for InnoDB-5.7 and MyRocks than InnoDB-5.6

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    5097    1.00    7991    1.00    InnoDB-5.6
    5730    1.12    7234    0.91    InnoDB-5.7
    7867    1.54    8652    1.08    MyRocks.none
    7828    1.54    8299    1.04    MyRocks.zstd

    Tuesday, November 28, 2017

    Marketing and the Dunning-Kruger effect

    I am wary of user reports that claim product X was lousy for them, then they moved to product Y and everything was awesome. Sometimes this means that product X was lousy -- in general or for their use case. Other times it means the team using product X did a lousy job deploying it. It is hard for the reader to figure this out. It can also be hard for some authors to figure this out thanks to the Dunning-Kruger effect so lousy reports will continue to be published. These reports are not my favorite form of marketing and some of the bad ones linger for years. We deserve better especially in the open-source database market where remarkable progress is being made.

    I have written before on benchmarketing. Other posts that mention it are here.

    Tuesday, November 21, 2017

    Sysbench, IO-bound, small server: MyRocks over time

    In this post I compare four MyRocks releases from February to October using IO-bound sysbench and a small server. The goal is to understand where we have made MyRocks faster and slower this year. I previously shared results for in-memory sysbench with MyRocks and IO-bound sysbench with InnoDB. Tests were done for builds of MyRocks from February 10, April 14, June 16, August 15 and October 16.

    tl;dr
    • There is more variance in QPS on IO-bound sysbench than on in-memory sysbench. I didn't try to determine how much of that is caused by storage devices and how much by MyRocks.
    • Not much QPS is lost when compression is used
    • A typical result is a loss of 10% of QPS from February 10 to October 16
    • Full-scan might have lost 15% of throughput from February 10 to October 16
    • Full-scan throughput is between 1.2X and 1.6X better when filesystem readahead is enabled
    • Some read-heavy tests run after write-heavy tests lose more QPS in October 16 than February 10 when compared to the same test run before write-heavy tests. This was also seen on in-memory sysbench.

    Configuration

    The tests used MyRocks from FB MySQL which is currently based on 5.6.35. Builds were done using FB MySQL as of February 10, April 14, June 16, August 15 and October 16. The git hashes for these builds are:
    • February 10 - FB MySQL f3019b, RocksDB c2ca7a
    • April 14 - FB MySQL e28823, RocksDB 9300ef
    • June 16 - FB MySQL 52e058, RocksDB 7e5fac
    • August 15 - FB MySQL 0d76ae, RocksDB 50a969
    • October 16 - FB MySQL 1d0132, RocksDB 019aa7
    All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines but there are a few new & changed options in that time. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. Compression was not used.

    Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is much larger than RAM.

    I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is smaller.

    Tests are repeated for MyRocks without compression and then with LZ4 for the middle levels of the LSM tree and zstandard for the max level.

    Results

    All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The graphs have the relative QPS where that is the QPS for a configuration relative to the base case. The base case is the QPS for the Feb10 build without compression. When the relative QPS is less than 1 then the base case is faster. The tables that follow have the absolute and relative QPS. The tests are explained here.
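
    The tables below can be read with a trivial helper like the sketch that follows: relative QPS is just QPS divided by the feb10.none result on the same server. The example values are from the i3 NUC update-nonindex results.

    # Relative QPS as used in the tables below: QPS divided by the feb10.none
    # QPS for the same server.
    def add_ratios(qps_by_build, base="feb10.none"):
        base_qps = qps_by_build[base]
        return {b: (q, round(q / base_qps, 2)) for b, q in qps_by_build.items()}

    print(add_ratios({"feb10.none": 2393, "jun16.none": 1391, "oct16.none": 1445}))
    # {'feb10.none': (2393, 1.0), 'jun16.none': (1391, 0.58), 'oct16.none': (1445, 0.6)}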

    Graphs

    The graphs have the QPS relative to the Feb10 build without compression. i3-none and i5-none are results for the i3 and i5 NUCs without compression. i3-zstd and i5-zstd are results for the i3 and i5 NUCs with zstandard compression.

    There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so I provide extra graphs here. The tests are explained here.

    The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are for update-nonindex and update-index. To keep this from getting out of hand I save the analysis for the per-test sections.

    For write-heavy most of the results have a relative QPS of ~0.9 on the Oct16 builds that don't use compression. There is more variance on the i3 NUC as seen below for i3-none.

    The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are for read-write with range-size=100 and read-only with range-size=10,000. The largest regression comes after Feb10 or Apr14. From the graphs below the QPS decrease was larger on the i3 NUC.
    The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests. The largest regression comes after Apr14. The Oct16 builds without compression have a relative QPS of ~0.9.
    The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests.
    full-scan

    This section and the ones that follow have the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for feb10.none. Values are provided for the i3 and i5 NUC.

    The full scan of the PK index is done before and after the write-heavy tests. There is a regression on full scan throughput for the i5 NUC without compression. Otherwise there is a lot of variance. 

    QPS in the Oct16 build relative to Feb10:

    • For the i3 NUC it is better for the before write-heavy test and worse for the after write-heavy test
    • For the i5 NUC it is worse for both the before and after write-heavy tests. The ~15% reduction for the after write-heavy test with oct16.none on both the i3 and i5 NUC might be worth debugging.
    I repeated the Jun16 test with an option to make filesystem readahead more likely and that increased throughput by between 1.2X and 1.6X - see jun16.none.ra and jun16.zstd.ra. This option, rocksdb_advise_random_on_open=0, isn't safe to set for general purpose workloads.

    before write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    0.796   1.00    1.454   1.00    feb10.none
    1.019   1.39    1.409   0.97    apr14.none
    0.879   1.10    1.194   0.82    jun16.none
    1.927   2.42    2.318   1.59    jun16.none.ra
    0.860   1.08    1.198   0.82    aug15.none
    0.898   1.13    1.230   0.85    oct16.none
    -
    0.714   0.90    0.916   0.63    feb10.zstd
    0.761   0.96    0.930   0.64    apr14.zstd
    0.714   0.90    0.860   0.59    jun16.zstd
    1.006   1.26    1.280   0.88    jun16.zstd.ra
    0.737   0.93    0.833   0.57    aug15.zstd
    0.747   0.94    0.876   0.60    oct16.zstd

    after write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    0.698   1.00    1.327   1.00    feb10.none
    0.758   1.09    1.280   0.96    apr14.none
    0.610   0.87    1.126   0.85    jun16.none
    0.969   1.39    2.133   1.61    jun16.none.ra
    0.620   0.89    1.081   0.81    aug15.none
    0.597   0.86    1.134   0.85    oct16.none
    -
    0.653   0.94    0.886   0.67    feb10.zstd
    0.575   0.82    0.881   0.66    apr14.zstd
    0.477   0.68    0.816   0.61    jun16.zstd
    0.963   1.38    1.212   0.91    jun16.zstd.ra
    0.522   0.75    0.804   0.61    aug15.zstd
    0.522   0.75    0.814   0.61    oct16.zstd

    update-inlist

    QPS in the Oct16 build relative to Feb10:
    • For the i3 NUC it is better
    • For the i5 NUC it is unchanged for oct16.none and better for oct16.zstd

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    375     1.00    403     1.00    feb10.none
    477     1.27    492     1.22    apr14.none
    445     1.19    430     1.07    jun16.none
    449     1.20    488     1.21    aug15.none
    455     1.21    405     1.00    oct16.none
    -
    344     0.92    443     1.10    feb10.zstd
    374     1.00    466     1.16    apr14.zstd
    363     0.97    458     1.14    jun16.zstd
    376     1.00    437     1.08    aug15.zstd
    372     0.99    463     1.15    oct16.zstd

    update-one

    QPS in the Oct16 build relative to Feb10 is worse in all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    8514    1.00    9287    1.00    feb10.none
    7854    0.92    8972    0.97    apr14.none
    7656    0.90    8508    0.92    jun16.none
    7470    0.88    8377    0.90    aug15.none
    7823    0.92    8655    0.93    oct16.none
    -
    8280    0.97    9180    0.99    feb10.zstd
    7884    0.93    9270    1.00    apr14.zstd
    7774    0.91    8749    0.94    jun16.zstd
    7596    0.89    8517    0.92    aug15.zstd
    7704    0.90    8512    0.92    oct16.zstd

    update-index

    QPS in the Oct16 build relative to Feb10 is slightly worse for oct16.none and the same or better for oct16.zstd.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2515    1.00    3057    1.00    feb10.none
    1570    0.62    3084    1.01    apr14.none
    2477    0.98    3004    0.98    jun16.none
    2460    0.98    3008    0.98    aug15.none
    2411    0.96    3038    0.99    oct16.none
    -
    2295    0.91    2704    0.88    feb10.zstd
    2279    0.91    2787    0.91    apr14.zstd
    2296    0.91    2778    0.91    jun16.zstd
    2242    0.89    2779    0.91    aug15.zstd
    2294    0.91    2799    0.92    oct16.zstd

    update-nonindex

    QPS in the Oct16 build relative to Feb10 is worse for oct16.none and better for oct16.zstd.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2393    1.00    2987    1.00    feb10.none
    2265    0.95    3115    1.04    apr14.none
    1391    0.58    2888    0.97    jun16.none
    1403    0.59    2893    0.97    aug15.none
    1445    0.60    2938    0.98    oct16.none
    -
    2257    0.94    2562    0.86    feb10.zstd
    2279    0.95    2839    0.95    apr14.zstd
    2237    0.98    2715    0.91    jun16.zstd
    2266    0.95    2680    0.90    aug15.zstd
    2265    0.95    2725    0.91    oct16.zstd

    delete

    QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i3 NUC.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    7924    1.00    9076    1.00    feb10.none
    7810    0.99    9602    1.06    apr14.none
    7666    0.97    8790    0.97    jun16.none
    7566    0.95    8806    0.97    aug15.none
    7505    0.95    8802    0.97    oct16.none
    -
    7373    0.93    8079    0.89    feb10.zstd
    7222    0.91    9002    0.99    apr14.zstd
    7281    0.92    8268    0.91    jun16.zstd
    6955    0.88    8313    0.92    aug15.zstd
    7000    0.88    8397    0.93    oct16.zstd

    read-write with range-size=100

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2992    1.00    3360    1.00    feb10.none
    2831    0.95    3316    0.99    apr14.none
    2565    0.86    3126    0.93    jun16.none
    2608    0.87    3092    0.92    aug15.none
    2595    0.87    3105    0.92    oct16.none
    -
    2543    0.85    2988    0.89    feb10.zstd
    2572    0.86    3008    0.90    apr14.zstd
    2517    0.84    2901    0.86    jun16.zstd
    2472    0.83    2780    0.83    aug15.zstd
    2514    0.84    2887    0.86    oct16.zstd

    read-write with range-size=10000

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    168     1.00    226     1.00    feb10.none
    163     0.97    223     0.99    apr14.none
    146     0.87    202     0.89    jun16.none
    147     0.88    205     0.91    aug15.none
    149     0.89    202     0.89    oct16.none
    -
    142     0.85    175     0.77    feb10.zstd
    134     0.80    170     0.75    apr14.zstd
    132     0.79    163     0.72    jun16.zstd
    132     0.79    161     0.71    aug15.zstd
    136     0.81    163     0.72    oct16.zstd

    read-only with range-size=100

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2866    1.00    3257    1.00    feb10.none
    2677    0.93    3137    0.96    apr14.none
    2464    0.86    3011    0.92    jun16.none
    2528    0.88    3069    0.94    aug15.none
    2531    0.88    3011    0.92    oct16.none
    -
    2569    0.90    3142    0.96    feb10.zstd
    2581    0.90    3003    0.92    apr14.zstd
    2406    0.84    2779    0.85    jun16.zstd
    2419    0.84    2777    0.85    aug15.zstd
    2476    0.86    2819    0.87    oct16.zstd

    read-only.pre with range-size=10000

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    150     1.00    189     1.00    feb10.none
    150     1.00    195     1.03    apr14.none
    137     0.91    174     0.92    jun16.none
    137     0.91    176     0.93    aug15.none
    136     0.91    173     0.92    oct16.none
    -
    118     0.79    145     0.77    feb10.zstd
    117     0.78    143     0.76    apr14.zstd
    112     0.75    138     0.73    jun16.zstd
    112     0.75    136     0.72    aug15.zstd
    114     0.76    139     0.74    oct16.zstd

    read-only with range-size=10000

    QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i3 NUC.

    The QPS here is lower than for the same test in the previous section. The tests in the previous section are run before write-heavy tests while tests here are run after them. It costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM.

    The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    129     1.00    184     1.00    feb10.none
    102     0.79    181     0.98    apr14.none
    102     0.79    166     0.90    jun16.none
     95     0.74    166     0.90    aug15.none
    101     0.78    164     0.89    oct16.none
    -
    101     0.78    142     0.77    feb10.zstd
    108     0.84    138     0.75    apr14.zstd
    105     0.81    132     0.72    jun16.zstd
    104     0.81    130     0.71    aug15.zstd
    107     0.83    132     0.72    oct16.zstd

    point-query.pre

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    4435    1.00    4900    1.00    feb10.none
    4596    1.04    4994    1.02    apr14.none
    4177    0.94    4370    0.89    jun16.none
    4137    0.93    4494    0.92    aug15.none
    4226    0.95    4438    0.91    oct16.none
    -
    3422    0.77    4370    0.89    feb10.zstd
    3439    0.78    4325    0.88    apr14.zstd
    3354    0.76    3969    0.81    jun16.zstd
    3293    0.74    3992    0.81    aug15.zstd
    3305    0.75    3962    0.81    oct16.zstd

    point-query

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    The QPS here is lower than for the same test in the previous section, which is expected for read-heavy tests that follow write-heavy tests. But the decrease is huge for the i3 NUC. I didn't debug that.

    The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2735    1.00    4420    1.00    feb10.none
    2858    1.04    4261    0.96    apr14.none
    2361    0.86    3966    0.90    jun16.none
    2452    0.90    3995    0.90    aug15.none
    2346    0.86    4022    0.91    oct16.none
    -
    2764    1.01    4117    0.93    feb10.zstd
    2638    0.96    3958    0.90    apr14.zstd
    2742    1.00    3707    0.84    jun16.zstd
    2667    0.98    3721    0.84    aug15.zstd
    2628    0.96    3731    0.84    oct16.zstd

    random-points.pre

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    68      1.00    70      1.00    feb10.none
    73      1.07    65      0.93    apr14.none
    65      0.96    57      0.81    jun16.none
    65      0.96    65      0.93    aug15.none
    64      0.94    54      0.77    oct16.none
    -
    52      0.76    65      0.93    feb10.zstd
    52      0.76    65      0.93    apr14.zstd
    50      0.74    61      0.87    jun16.zstd
    50      0.74    60      0.86    aug15.zstd
    50      0.74    61      0.87    oct16.zstd

    random-points

    QPS in the Oct16 build relative to Feb10 is worse for all cases. What I wrote in the point-query section is mostly true here, especially the part about QPS being worse for the test run after write-heavy tests.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    50      1.00    56      1.00    feb10.none
    44      0.88    54      0.96    apr14.none
    36      0.72    62      1.11    jun16.none
    40      0.80    63      1.13    aug15.none
    40      0.80    50      0.89    oct16.none
    -
    43      0.86    62      1.11    feb10.zstd
    44      0.88    62      1.11    apr14.zstd
    41      0.82    57      1.02    jun16.zstd
    40      0.80    55      0.98    aug15.zstd
    37      0.74    57      1.02    oct16.zstd

    hot-points

    While this is an IO-bound benchmark the hot-points test is always in-memory. But the results here have more variance than on in-memory sysbench. I didn't debug that.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    1437    1.00    1327    1.00    feb10.none
    1263    0.88    1456    1.10    apr14.none
    1000    0.70    1125    0.85    jun16.none
    1162    0.81    1307    0.98    aug15.none
    1288    0.90    1339    1.01    oct16.none
    -
    1311    0.91    1417    1.07    feb10.zstd
    1399    0.97    1450    1.09    apr14.zstd
    1117    0.78    1088    0.82    jun16.zstd
    1139    0.79    1391    1.05    aug15.zstd
    1310    0.91    1378    1.04    oct16.zstd

    insert

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    8056    1.00    8654    1.00    feb10.none
    8233    1.02    9403    1.09    apr14.none
    7867    0.98    8652    1.00    jun16.none
    7930    0.98    8864    1.02    aug15.none
    7398    0.92    8236    0.95    oct16.none
    -
    7922    0.98    8540    0.99    feb10.zstd
    8386    1.04    8981    1.04    apr14.zstd
    7828    0.97    8299    0.96    jun16.zstd
    7637    0.95    8538    0.99    aug15.zstd
    6194    0.77    8075    0.93    oct16.zstd
