Thursday, November 30, 2017

In-memory linkbench and a fast server: MyRocks, InnoDB and TokuDB

This post explains MySQL performance for Linkbench on a fast server. It uses a low-concurrency workload to measure response time, IO efficiency and CPU efficiency. Tests were run for MyRocks, InnoDB and TokuDB. I wrote a similar report a few months ago. The difference here is that I used an updated compiler toolchain, a more recent version of MyRocks and MySQL 8.0.3. The results didn't change much from the previous blog post.
tl;dr:
  • InnoDB from MySQL 5.6 had the best throughput
  • CPU efficiency is similar for MyRocks and InnoDB, although MyRocks uses ~20% more CPU per transaction than InnoDB from MySQL 5.6.35
  • There is a CPU regression from MySQL 5.6 to 5.7 to 8.x. About 30% of throughput is lost on the load and transaction tests from 5.6.35 to 8.0.3. I assume most of this comes from code above the storage engine layer.
  • InnoDB writes more than 10X as much to storage per transaction as MyRocks. An SSD will last longer with MyRocks.
  • Uncompressed InnoDB uses ~1.6X as much space as uncompressed MyRocks

Configuration

I used my Linkbench repo and helper scripts to run linkbench with maxid1=10M, loaders=1 and requestors=1 so there will be 2 concurrent connections doing the load and 1 connection running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload and this test included that commit. The test pattern is 1) load and 2) transactions. The transactions were run in 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 256GB of RAM.
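
The sketch below is a simplified driver for that pattern: one load phase, then 12 one-hour transaction loops. The LinkBench property names (maxid1, loaders, requesters, maxtime) and the bin/linkbench entry point are assumptions about stock LinkBench, not the exact commands my helper scripts run.

# Hypothetical driver for the test pattern: load, then 12 x 1-hour transaction
# loops; only the last hour is reported. Property names are assumptions.
import subprocess

PROPS = ["-Dmaxid1=10000000", "-Dloaders=1", "-Drequesters=1"]
CONFIG = ["-c", "config/LinkConfigMysql.properties"]

def linkbench(*phase_args):
    subprocess.run(["./bin/linkbench"] + CONFIG + PROPS + list(phase_args), check=True)

linkbench("-l")                          # phase 1: load
for hour in range(12):                   # phase 2: 12 x 1-hour loops
    linkbench("-r", "-Dmaxtime=3600")    # results come from the 12th loop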

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here but the database cache was made large enough to cache the ~10GB database.
  • MyRocks was compiled on October 16 with git hash 1d0132. Compression was not used.
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058. The results for it aren't interesting here but will be interesting for IO-bound linkbench.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL still has user & table statistics for monitoring.

Graphs

The first two graphs show the load and transaction rates relative to InnoDB from upstream MySQL 5.6.35. For this test it has the best rates for load and transactions. There is a big drop in throughput for InnoDB from 5.6.35 to 8.0.3 for both the load and transaction tests.
The chart below has the KB written to storage per transaction. The rate for InnoDB is more than 10X the rate for MyRocks. An SSD will last longer with MyRocks. The rate for MyRocks is also much better than TokuDB. The rate here for TokuDB is worse than what I measured in September and I have yet to debug it.
All engines use a similar amount of space after the load, ~15GB. But MyRocks does much better after 12 hours of transactions -- InnoDB is ~1.6X larger and TokuDB is ~1.19X larger. The problem for InnoDB is B-Tree fragmentation. The advantage for MyRocks is leveled compaction which limits garbage to ~10% of the database size.
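
A rough way to see where that ~10% comes from: with leveled compaction the max level is fully compacted and holds all of the live data, while the levels above it hold roughly 1/fanout of that amount as not-yet-compacted duplicates. The arithmetic below assumes the RocksDB default fanout of 10.

# Back-of-the-envelope space amplification for leveled compaction, assuming a
# per-level size fanout of 10 and a fully compacted max level.
fanout = 10
live_gb = 15                               # database size right after the load
upper_levels_gb = live_gb / fanout         # duplicate data above the max level
space_amp = (live_gb + upper_levels_gb) / live_gb
print(f"space amplification ~ {space_amp:.2f}x")   # ~1.10x, i.e. ~10% garbage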

Load Results

All of the data is here. I adjusted iostat metrics for MyRocks because iostat currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact; a sketch of the adjustment follows the legend below. The table below has a subset of the results.
  • InnoDB 5.6 has the best insert rate but there is a regression from 5.6.35 to 5.7.17 to 8.0.3. I assume most of that is from code above the storage engine.
  • Write efficiency (wkb/i) is similar for all engines
  • CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB

ips     wkb/i   Mcpu/i  size    wMB/s   cpu     engine
 49986  0.80     98     14      40.1     4.9    MyRocks.Oct16
 62224  0.98     72     15      61.1     4.5    FbInno.Jun16
 63891  1.03     74     16      65.7     4.7    Inno.5635
 56383  1.03     85     16      58.3     4.8    Inno.5717
 55173  1.04     78     16      57.6     4.3    Inno.801
 41815  1.05    103     16      44.0     4.3    Inno.802
 43590  1.06    101     16      46.4     4.4    Inno.803
 23664  1.34    160     14      31.7     3.8    Toku.5717

legend:
* ips - inserts/second
* wkb/i - iostat KB written per insert
* Mcpu/i - normalized CPU time per insert
* wMB/s - iostat write MB/s, average
* size - database size in GB at test end
* cpu - average value of vmstat us + sy columns
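
The sketch below shows roughly how the derived columns above are produced from raw counters, including the trim adjustment. The trimmed-bytes estimate is a stand-in for whatever source is used to produce it, which is why the adjustment is not exact; the Mcpu/i normalization is omitted.

# Hypothetical post-processing of raw counters into the columns above. The
# trimmed KB estimate is approximate, so the MyRocks adjustment is not exact.
def load_metrics(inserts, seconds, iostat_kb_written, trimmed_kb_estimate=0):
    wkb = iostat_kb_written - trimmed_kb_estimate    # adjusted KB written
    return {
        "ips": round(inserts / seconds),             # inserts/second
        "wkb/i": round(wkb / inserts, 2),            # KB written per insert
        "wMB/s": round(wkb / 1024.0 / seconds, 1),   # average write MB/s
    }

# Example with made-up raw counters:
print(load_metrics(inserts=10_000_000, seconds=200, iostat_kb_written=8_200_000))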

Transaction Results

These are results from the 12th 1-hour loop of the transaction phase. All of the data is here. I adjusted iostat metrics for MyRocks because iostat currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact.
  • InnoDB 5.6 has the best transaction rate but there is a regression from 5.6.35 to 5.7.17 to 8.0.3. I assume most of that is from code above the storage engine.
  • Write efficiency (wkb/t) was better for MyRocks. InnoDB writes more than 10X as much to storage per transaction as MyRocks.
  • CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
  • Response times are similar for MyRocks and InnoDB
  • Space efficiency is better for MyRocks. InnoDB is ~1.6X larger.

tps   wkb/t  Mcpu/t  size  un   gn   ul   gl    wMB/s  engine
5753  0.44   677     16    0.3  0.1  0.5  0.5    2.5   MyRocks.Oct16
7065  5.11   624     23    0.3  0.1  0.4  0.3   36.1   FbInno.Jun16
7420  5.17   562     26    0.3  0.1  0.4  0.2   38.4   Inno.5635
6616  5.20   628     26    0.3  0.1  0.5  0.3   34.4   Inno.5717
6313  5.16   654     25    0.3  0.1  0.5  0.3   32.6   Inno.801
5978  5.38   682     25    0.3  0.1  0.6  0.3   32.2   Inno.802
6070  5.39   669     25    0.3  0.1  0.6  0.3   32.7   Inno.803
4234  2.92   814     19    0.5  0.2  1    0.6   12.4   Toku.5717

legend:
* tps - transactions/second
* wkb/t - iostat KB written per transaction
* Mcpu/t - normalized CPU time per transaction
* size - database size in GB at test end
* un, gn, ul, gl - 99th percentile response time in millisecs for
      UpdateNode, GetNode, UpdateList and GetLinkedList transactions
* wMB/s - iostat write MB/s, average
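
Using the wkb/t and tps columns above, the write-rate difference can be restated as GB written per day at each engine's measured transaction rate. This is illustrative arithmetic from the table, not an endurance estimate for a specific SSD.

# Write volume implied by the table above: KB written per transaction times
# transactions per second, converted to GB per day. Values are copied from
# the MyRocks.Oct16 and Inno.5635 rows.
engines = [("MyRocks.Oct16", 0.44, 5753), ("Inno.5635", 5.17, 7420)]

print("wkb/t ratio: %.1fx" % (5.17 / 0.44))          # ~11.8x more written per transaction
for name, wkb_t, tps in engines:
    gb_per_day = wkb_t * tps * 86400 / (1024 ** 2)
    print(f"{name}: ~{gb_per_day:,.0f} GB written per day at {tps} tps")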

Wednesday, November 29, 2017

Sysbench, IO-bound, small server: MyRocks and InnoDB

In this post I compare MyRocks and InnoDB using IO-bound sysbench and a small server. The goal is to understand where MyRocks differs from InnoDB.  I previously published more results for many versions of MyRocks and InnoDB. Here I use MyRocks from June 2017 and InnoDB from upstream 5.6.35 and 5.7.17.

tl;dr
  • There is more variance in QPS on IO-bound sysbench than on in-memory sysbench
  • Not much QPS is lost when compression is used with MyRocks
  • Two things look better in MySQL 5.7 -- InnoDB range scans and optimization of queries with large in-lists.
  • For many of the workloads InnoDB writes between 5X and 20X more to storage per transaction. An SSD will last longer with MyRocks.
  • Full-scan perf from MyRocks without compression matches InnoDB-5.7 and is much better than InnoDB-5.6 when filesystem readahead is enabled. Now we need to make that feature work for real.
  • MyRocks QPS was >= InnoDB on most of the write-heavy tests. Read-free index maintenance makes a big difference for MyRocks on some of them.
  • InnoDB QPS was >= MyRocks on most of the range-scan tests (read-only, read-write)
  • InnoDB QPS was > MyRocks on the point-query tests. The 5.7 optimizer might help here.
  • InnoDB QPS was > MyRocks on most of the inlist-query tests

Configuration

The tests used MyRocks from FB MySQL which is currently based on MySQL 5.6.35. The build is from June 16 with git hash 52e058 for FB MySQL and 7e5fac for RocksDB. Upstream 5.6.35 and 5.7.17 were used for InnoDB.
    All tests used jemalloc with mysqld. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log.

    Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is much larger than RAM.

    I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is smaller. The SSD on the i3 NUC is slower than on the i5 NUC. InnoDB is more dependent than MyRocks on IO performance and in some tests below MyRocks does much better on the i3 NUC than the i5 NUC relative to InnoDB.

    Tests are repeated for MyRocks without and with compression. The compression configuration is no compression for L0/L1/L2, LZ4 for the middle levels of the LSM tree and zstandard for the max level. In the rest of this post that is described as zstandard compression.
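
    Restated per level, the configuration maps out as in the sketch below. It assumes a 7-level LSM tree (L0 to L6) and only restates the paragraph above; it is not the option syntax used in my.cnf.

    # The compression configuration described above, restated per LSM level.
    # Assumes a 7-level tree (L0..L6); illustrative only, not my.cnf syntax.
    def codec_for_level(level, num_levels=7):
        if level <= 2:
            return "none"         # L0/L1/L2 are not compressed
        if level == num_levels - 1:
            return "zstandard"    # max level
        return "lz4"              # middle levels

    for level in range(7):
        print(f"L{level}: {codec_for_level(level)}")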

    Results

    All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The graphs have the relative QPS where that is the QPS for a configuration relative to the base case. The base case is InnoDB from upstream 5.6.35. The base case is faster when the relative QPS is less than 1. The tables that follow have the absolute and relative QPS.
    Graphs

    The graphs have the QPS relative to the InnoDB from MySQL 5.6.35. The name MyRocks.none is for MyRocks without compression and MyRocks.zstd is for MyRocks with zstandard compression.

    There are 4 types of tests and graphs for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so there are more graphs here. The tests are explained here.

    The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are the relative QPS for update-index, update-nonindex and insert. MyRocks does 3X to 10X better than InnoDB on update-index because non-unique secondary index maintenance is read-free for it.
    The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are the relative QPS for read-write with range-size=100, read-only with range-size=10,000 and then full-scan. The results for read-only and full-scan are for the test run after the write-heavy tests. The results for full-scan include two extra configurations that both enable filesystem readahead during the scan: MyRocks.none.ra, MyRocks.zstd.ra.
    The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is the relative QPS for the test run after the write-heavy tests.
    The inlist-query group includes the hot-points test and the random-points tests. The random-points result is from the test run after the write-heavy tests. The graph is the relative QPS.
    full-scan

    The sections that follow have the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for InnoDB from MySQL 5.6.35 (InnoDB-5.6). Values are provided for the i3 and i5 NUC. 

    For full-scan, results are provided for two extra configurations that both enable filesystem readahead during the scan: MyRocks.none.ra, MyRocks.zstd.ra. As seen below readahead is great for full-scan. Unfortunately that feature is not ready for production. I don't know if filesystem readahead is the right solution in this case, but it was easy to use for a benchmark. For a range scan there is an iterator open on each level of the LSM tree and RocksDB does page-at-a-time reads. For a long range scan those reads could be much larger than one page at a time. Issue 723 is open for this.
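
    The sketch below illustrates the mechanism behind that readahead with os.posix_fadvise: whether the kernel prefetches depends on the access hint given for a file. It is a simplified model (Linux-only), not the MyRocks configuration used for the benchmark.

    # Minimal illustration of filesystem readahead hints. By default RocksDB
    # advises random access on its data files, which suppresses kernel
    # readahead; a long scan benefits from the sequential hint instead.
    import os

    def open_for_scan(path):
        fd = os.open(path, os.O_RDONLY)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # prefetch ahead of the reader
        return fd

    def open_for_point_reads(path):
        fd = os.open(path, os.O_RDONLY)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)      # disable kernel readahead
        return fd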

    The full scan of the PK index is done before and after the write-heavy tests. Note that tests run before the write-heavy tests are still run immediately after the initial load. So in both cases there can be page write-back activity with InnoDB and compaction with MyRocks. But from data I haven't shared there were no writes in progress for the i5 NUC.

    Summary of full-scan throughput:

    • The best perf is from MyRocks without compression when filesystem readahead is used
    • InnoDB-5.7 is better than InnoDB-5.6 on the i5 NUC but worse than it on the i3 NUC
    Legend:
    * Mrps - scan rate in millions of rows per second
    * ratio - ratio of Mrps for this engine vs InnoDB-5.6

    before write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    1.019   1.00    1.766   1.00    InnoDB-5.6
    0.320   0.31    2.424   1.37    InnoDB-5.7
    0.879   0.86    1.194   0.68    MyRocks.none
    1.927   1.89    2.318   1.31    MyRocks.none.ra
    0.714   0.70    0.860   0.49    MyRocks.zstd
    1.006   0.99    1.280   0.72    MyRocks.zstd.ra

    after write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    0.914   1.00    1.786   1.00    InnoDB-5.6
    0.829   0.91    2.406   1.35    InnoDB-5.7
    0.610   0.67    1.126   0.63    MyRocks.none
    0.969   1.06    2.133   1.19    MyRocks.none.ra
    0.477   0.52    0.816   0.46    MyRocks.zstd
    0.963   1.05    1.212   0.68    MyRocks.zstd.ra

    Data from iostat and vmstat helps to understand the performance differences and the benefit from readahead for MyRocks. We need to make that work for real. On the i5 NUC:
    • InnoDB-5.6 and MyRocks use ~2X more CPU per row than InnoDB-5.7
    • InnoDB-5.7 gets more IO throughput: ~2.1X more vs MyRocks, ~1.4X more vs InnoDB-5.6
    • Filesystem readahead reduces the CPU overhead and increases the IO read rate for MyRocks

    Legend:
    * CPU.avg - average CPU utilization
    * CPU/row - cost per row scanned
    * rGB - GB read from storage
    * rMB/s - average read IO rate from storage

    before write-heavy
    CPU.avg CPU/row rGB     rMB/s   engine
    28.0    48.96   57      219.6   MyRocks.none
    50.5    22.73   58      437.6   MyRocks.none.ra
    45.4    53.91   59      337.4   InnoDB-5.6
    27.7    24.26   59      460.0   InnoDB-5.7

    after write-heavy
    CPU.avg CPU/row rGB     rMB/s   engine
    27.2    50.30   60      219.1   MyRocks.none
    49.4    24.06   62      427.2   MyRocks.none.ra
    46.0    54.01   59      338.8   InnoDB-5.6
    29.0    25.36   59      457.6   InnoDB-5.7

    update-inlist

    MyRocks does better than InnoDB and the difference here is larger than for update-nonindex. Both tests do updates that don't require secondary index maintenance, but this test updates 100 rows/statement versus 1/statement for update-nonindex. Relatively less time is spent above the storage engine here, so a more efficient engine has more of an impact; see the sketch after the table below. On the i5 NUC:
    • CPU/update is ~1.5X larger for InnoDB-5.6 and ~1.1X larger for InnoDB-5.7 than MyRocks
    • KB written to storage per update is ~10X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    212     1.00    307     1.00    InnoDB-5.6
    213     1.00    392     1.28    InnoDB-5.7
    445     2.10    430     1.40    MyRocks.none
    363     1.71    458     1.49    MyRocks.zstd
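
    For context, the sketch below shows the statement shapes for update-inlist versus update-nonindex: a 100-value IN list versus a single-row update. The table and column names assume the standard sysbench schema (sbtest1 with an integer PK id and a non-indexed column c), so treat the SQL as approximate.

    # Approximate statement shapes for update-inlist vs update-nonindex.
    # Neither updates a secondary-indexed column, so no secondary index
    # maintenance is needed; the difference is 100 rows vs 1 row per statement.
    import random

    ROWS = 160_000_000   # rows/table on the i5 NUC

    def update_inlist_sql(n_ids=100):
        ids = random.sample(range(1, ROWS + 1), n_ids)
        return "UPDATE sbtest1 SET c = 'x' WHERE id IN (%s)" % ", ".join(map(str, ids))

    def update_nonindex_sql():
        return "UPDATE sbtest1 SET c = 'x' WHERE id = %d" % random.randint(1, ROWS)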

    update-one

    InnoDB does better than MyRocks. This test updates the same row repeatedly. There are no reads from storage from processing the update, but MyRocks compaction might read from storage. The update statement doesn't require secondary index maintenance. On the i5 NUC:
    • CPU/update is ~1.1X larger for InnoDB-5.7 and MyRocks than InnoDB-5.6
    • KB written to storage per update is similar for MyRocks and InnoDB

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    9120    1.00    10988   1.00    InnoDB-5.6
    7839    0.86     9570   0.87    InnoDB-5.7
    7656    0.84     8508   0.77    MyRocks.none
    7774    0.85     8749   0.80    MyRocks.zstd

    update-index

    MyRocks is 3X to 10X faster than InnoDB because secondary index maintenance is read-free. On the i5 NUC:
    • CPU/update is ~4X larger for InnoDB than MyRocks
    • KB written to storage per update is ~30X larger for InnoDB than MyRocks
    • Storage read operations per update is ~2X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
     220     1.00    850    1.00    InnoDB-5.6
     312     1.42    924    1.09    InnoDB-5.7
    2477    11.26   3004    3.53    MyRocks.none
    2296    10.44   2778    3.27    MyRocks.zstd
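
    A toy model of why this is read-free: a B-Tree has to read the leaf pages that hold the old and new index entries before modifying them, which can mean storage reads on an IO-bound workload, while an LSM can blindly write a delete marker and a new entry and let compaction reconcile them later. This is a conceptual sketch, not the MyRocks implementation.

    # Toy model of secondary index maintenance when an indexed column changes
    # from old_k to new_k for the row with primary key pk.
    def btree_index_update(index_pages, old_k, new_k, pk):
        # Must read the leaf holding (old_k, pk) to remove it, and the leaf
        # for (new_k, pk) to insert it -- possible storage reads.
        index_pages.discard((old_k, pk))
        index_pages.add((new_k, pk))

    def lsm_index_update(write_batch, old_k, new_k, pk):
        # Blind writes: the index is never read. Obsolete entries are removed
        # later by compaction.
        write_batch.append(("delete", (old_k, pk)))
        write_batch.append(("put", (new_k, pk)))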

    update-nonindex

    MyRocks and InnoDB have similar performance here. Secondary index maintenance isn't done for this test, so MyRocks doesn't gain from read-free index maintenance. See the comment for update-inlist. On the i5 NUC:
    • Storage read operations per update are ~0.9 for InnoDB and MyRocks
    • CPU/update is similar for InnoDB and MyRocks
    • KB written to storage per update is ~5X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    1526    1.00    2683    1.00    InnoDB-5.6
    1556    1.02    2773    1.03    InnoDB-5.7
    1391    0.91    2888    1.08    MyRocks.none
    2237    1.47    2715    1.01    MyRocks.zstd

    delete

    MyRocks does better. On the i5 NUC:
    • CPU/delete is ~2X larger for InnoDB than MyRocks
    • KB written to storage per delete is ~13X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2729    1.00    5068    1.00    InnoDB-5.6
    2763    1.01    5115    1.01    InnoDB-5.7
    7666    2.81    8790    1.73    MyRocks.none
    7281    2.67    8268    1.63    MyRocks.zstd

    read-write with range-size=100

    MyRocks is slightly better than InnoDB. I was surprised by this and my guess is that the MyRocks advantage on insert/update/delete outweighs the InnoDB advantage on range scans. Much of the write IO here can be for rows modified by the previous tests. On the i5 NUC:
    • CPU/query is ~1.2X larger for InnoDB than MyRocks
    • KB written to storage per query is ~20X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2028    1.00    2799    1.00    InnoDB-5.6
    2208    1.09    3077    1.10    InnoDB-5.7
    2565    1.26    3126    1.12    MyRocks.none
    2517    1.24    2901    1.04    MyRocks.zstd

    read-write with range-size=10000

    InnoDB-5.7 is the best here. Unlike the previous test, it is faster than MyRocks because the range scan here is longer (10,000 vs 100 rows). Someone improved range scan performance for InnoDB in 5.7. On the i5 NUC:
    • CPU/query is ~2X larger for MyRocks and InnoDB-5.6 than InnoDB-5.7
    • KB written to storage per query is ~12X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    169     1.00    217     1.00    InnoDB-5.6
    257     1.52    335     1.54    InnoDB-5.7
    146     0.86    202     0.93    MyRocks.none
    132     0.78    163     0.75    MyRocks.zstd

    read-only with range-size=100

    InnoDB is faster here while MyRocks did better on read-write with range-size=100. I assume this is explained by MyRocks benefiting from faster insert, update, delete on the read-write test. InnoDB in 5.7 continues to benefit from improvements to range-scan performance. On the i5 NUC:
    • CPU/query was ~1.4X larger for InnoDB-5.6 and ~1.8X larger for MyRocks than InnoDB-5.7

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2885    1.00    3687    1.00    InnoDB-5.6
    3272    1.13    4954    1.34    InnoDB-5.7
    2464    0.85    3011    0.82    MyRocks.none
    2406    0.83    2779    0.75    MyRocks.zstd

    read-only.pre with range-size=10000

    InnoDB in 5.7 continues to benefit from improvements to range-scan performance. MyRocks.none did better than I expected for this test, perhaps because it is run before the write-heavy tests. On the i5 NUC:
    • CPU/query was ~1.7X larger for InnoDB-5.6 and ~1.5X larger for MyRocks than InnoDB-5.7
    • Storage read operations and read KB per query were ~1.2X larger for InnoDB than MyRocks

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    139     1.00    173     1.00    InnoDB-5.6
    212     1.53    273     1.58    InnoDB-5.7
    137     0.99    174     1.01    MyRocks.none
    112     0.81    138     0.80    MyRocks.zstd

    read-only with range-size=10000

    InnoDB in MySQL 5.7 continues to have the best range-scan performance. The MyRocks QPS here is lower than for the same test in the previous section. The tests in the previous section are run before write-heavy tests while tests here are run after them. It costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM. On the i5 NUC:
    • CPU/query is 1.06X larger here for MyRocks.none compared to the previous test.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    140     1.00    178     1.00    InnoDB-5.6
    210     1.50    275     1.54    InnoDB-5.7
    102     0.73    166     0.93    MyRocks.none
    105     0.75    132     0.74    MyRocks.zstd

    point-query.pre

    InnoDB continues to get the best QPS but the difference between MySQL 5.6 and 5.7 is smaller than it was for range queries. On the i5 NUC:
    • CPU/query is ~1.2X larger for InnoDB-5.6 and ~1.4X larger for MyRocks than InnoDB-5.7
    • Storage read operations per query are ~1.06X larger for InnoDB than MyRocks.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    3879    1.00    5250    1.00    InnoDB-5.6
    4264    1.10    6198    1.18    InnoDB-5.7
    4177    1.08    4370    0.83    MyRocks.none
    3354    0.86    3969    0.76    MyRocks.zstd

    point-query

    InnoDB continues to get the best QPS but the difference between MySQL 5.6 and 5.7 is smaller than it was for range queries. The MyRocks QPS here is lower than for the same test in the previous section, which is expected for read-heavy tests that follow write-heavy tests. On the i5 NUC the iostat and vmstat metrics are similar to the result above for point-query.pre with two exceptions:
    • CPU/query is ~1.5X larger for MyRocks than InnoDB-5.7. This is larger than above.
    • Storage read operations/query are similar rather than 6% larger for InnoDB

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    3896    1.00    5310    1.00    InnoDB-5.6
    4332    1.11    6155    1.16    InnoDB-5.7
    2361    0.61    3966    0.75    MyRocks.none
    2742    0.70    3707    0.70    MyRocks.zstd

    random-points.pre

    InnoDB-5.7 does the best. It is odd that MyRocks does better than InnoDB-5.6 here but not above on point-query. Both do point lookups but this test does 100 lookups per SELECT while the point-query tests do 1 per SELECT. On the i5 NUC:
    • CPU/query is ~4.8X larger for InnoDB-5.6 and ~3.4X larger for MyRocks than InnoDB-5.7. I wonder if a change in the optimizer explains this because this looks like a difference between MySQL 5.6 and 5.7 rather than between engines.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    56      1.00     50     1.00    InnoDB-5.6
    69      1.23    105     2.10    InnoDB-5.7
    65      1.16     57     1.14    MyRocks.none
    50      0.89     61     1.22    MyRocks.zstd

    random-points

    Results are similar to random-points.pre: InnoDB-5.7 does the best and MySQL 5.6 uses more CPU/query than 5.7. On the i5 NUC:
    • CPU/query is ~3.5X larger for InnoDB-5.6 and ~2.6X larger for MyRocks than InnoDB-5.7. See the comment in the previous section.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    56      1.00     64     1.00    InnoDB-5.6
    51      0.91    100     1.56    InnoDB-5.7
    36      0.64     62     0.97    MyRocks.none
    41      0.73     57     0.89    MyRocks.zstd

    hot-points

    InnoDB gets at least 3X more QPS than MyRocks. This test is always in-memory and the QPS here is similar to the QPS from in-memory sysbench. On the i5 NUC:
    • CPU/query is ~3X larger for MyRocks than InnoDB.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    3609    1.00    4167    1.00    InnoDB-5.6
    3455    0.96    3771    0.90    InnoDB-5.7
    1000    0.28    1125    0.27    MyRocks.none
    1117    0.31    1088    0.26    MyRocks.zstd

    insert

    MyRocks gets a higher insert rate. On the i5 NUC:
    • Storage read operations per insert are 6X to 8X larger for InnoDB than MyRocks
    • KB written to storage per insert are ~3X larger for InnoDB than MyRocks
    • CPU/insert is ~1.1X larger for InnoDB-5.7 and MyRocks than InnoDB-5.6

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    5097    1.00    7991    1.00    InnoDB-5.6
    5730    1.12    7234    0.91    InnoDB-5.7
    7867    1.54    8652    1.08    MyRocks.none
    7828    1.54    8299    1.04    MyRocks.zstd

    Tuesday, November 28, 2017

    Marketing and the Dunning-Kruger effect

    I am wary of user reports that claim product X was lousy for them, then they moved to product Y and everything was awesome. Sometimes this means that product X was lousy -- in general or for their use case. Other times it means the team using product X did a lousy job deploying it. It is hard for the reader to figure this out. It can also be hard for some authors to figure this out thanks to the Dunning-Kruger effect so lousy reports will continue to be published. These reports are not my favorite form of marketing and some of the bad ones linger for years. We deserve better especially in the open-source database market where remarkable progress is being made.

    I have written before on benchmarketing. Other posts that mention it are here.

    Tuesday, November 21, 2017

    Sysbench, IO-bound, small server: MyRocks over time

    In this post I compare four MyRocks releases from February to October using IO-bound sysbench and a small server. The goal is to understand where we have made MyRocks faster and slower this year. I previously shared results for in-memory sysbench with MyRocks and IO-bound sysbench with InnoDB. Tests were done for builds of MyRocks from February 10, April 14, June 16, August 15 and October 16.

    tl;dr
    • There is more variance in QPS on IO-bound sysbench than on in-memory sysbench. I didn't try to determine how much of that is caused by storage devices and how much by MyRocks.
    • Not much QPS is lost when compression is used
    • A typical result is a loss of 10% of QPS from February 10 to October 16
    • Full-scan might have lost 15% of throughput from February 10 to October 16
    • Full-scan throughput is between 1.2X and 1.6X better when filesystem readahead is enabled
    • Some read-heavy tests run after write-heavy tests lose more QPS in October 16 than February 10 when compared to the same test run before write-heavy tests. This was also seen on in-memory sysbench.

    Configuration

    The tests used MyRocks from FB MySQL which is currently based on 5.6.35. Builds were done using FB MySQL as of February 10, April 14, June 16, August 15 and October 16. The git hashes for these builds are:
    • February 10 - FB MySQL f3019b, RocksDB c2ca7a
    • April 14 - FB MySQL e28823, RocksDB 9300ef
    • June 16 - FB MySQL 52e058, RocksDB 7e5fac
    • August 15 - FB MySQL 0d76ae, RocksDB 50a969
    • October 16 - FB MySQL 1d0132, RocksDB 019aa7
    All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines but there are a few new & changed options in that time. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. Compression was not used.

    Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is much larger than RAM.

    I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is smaller.

    Tests are repeated for MyRocks without compression and then with LZ4 for the middle levels of the LSM tree and zstandard for the max level.

    Results

    All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The graphs have the relative QPS where that is the QPS for a configuration relative to the base case. The base case is the QPS for the Feb10 build without compression. When the relative QPS is less than 1 then the base case is faster. The tables that follow have the absolute and relative QPS. The tests are explained here.
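
    The tables below can be read with a trivial helper like the sketch that follows: relative QPS is just QPS divided by the feb10.none result on the same server. The example values are from the i3 NUC update-nonindex results.

    # Relative QPS as used in the tables below: QPS divided by the feb10.none
    # QPS for the same server.
    def add_ratios(qps_by_build, base="feb10.none"):
        base_qps = qps_by_build[base]
        return {b: (q, round(q / base_qps, 2)) for b, q in qps_by_build.items()}

    print(add_ratios({"feb10.none": 2393, "jun16.none": 1391, "oct16.none": 1445}))
    # {'feb10.none': (2393, 1.0), 'jun16.none': (1391, 0.58), 'oct16.none': (1445, 0.6)}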

    Graphs

    The graphs have the QPS relative to the Feb10 build without compression. i3-none and i5-none are results for the i3 and i5 NUCs without compression. i3-zstd and i5-zstd are results for the i3 and i5 NUCs with zstandard compression.

    There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so I provide extra graphs here. The tests are explained here.

    The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are for update-nonindex and update-index. To keep this from getting out of hand I save the analysis for the per-test sections.

    For write-heavy most of the results have a relative QPS of ~0.9 on the Oct16 builds that don't use compression. There is more variance on the i3 NUC as seen below for i3-none.

    The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are for read-write with range-size=100 and read-only with range-size=10,000. The largest regression comes after Feb10 or Apr14. From the graphs below the QPS decrease was larger on the i3 NUC.
    The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests. The largest regression comes after Apr14. The Oct16 builds without compression have a relative QPS of ~0.9.
    The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests.
    full-scan

    This section and the ones that follow have the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for feb10.none. Values are provided for the i3 and i5 NUC.

    The full scan of the PK index is done before and after the write-heavy tests. There is a regression on full scan throughput for the i5 NUC without compression. Otherwise there is a lot of variance. 

    QPS in the Oct16 build relative to Feb10:

    • For the i3 NUC it is better for the before write-heavy test and worse for the after write-heavy test
    • For the i5 NUC it is worse for both the before and after write-heavy tests. The ~15% reduction for the after write-heavy test with oct16.none on both the i3 and i5 NUC might be worth debugging.
    I repeated the Jun16 test with an option to make filesystem readahead more likely and that increased throughput by between 1.2X and 1.6X - see jun16.none.ra and jun16.zstd.ra. This option, rocksdb_advise_random_on_open=0, isn't safe to set for general purpose workloads.

    before write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    0.796   1.00    1.454   1.00    feb10.none
    1.019   1.39    1.409   0.97    apr14.none
    0.879   1.10    1.194   0.82    jun16.none
    1.927   2.42    2.318   1.59    jun16.none.ra
    0.860   1.08    1.198   0.82    aug15.none
    0.898   1.13    1.230   0.85    oct16.none
    -
    0.714   0.90    0.916   0.63    feb10.zstd
    0.761   0.96    0.930   0.64    apr14.zstd
    0.714   0.90    0.860   0.59    jun16.zstd
    1.006   1.26    1.280   0.88    jun16.zstd.ra
    0.737   0.93    0.833   0.57    aug15.zstd
    0.747   0.94    0.876   0.60    oct16.zstd

    after write-heavy
    i3 NUC          i5 NUC
    Mrps    ratio   Mrps    ratio   engine
    0.698   1.00    1.327   1.00    feb10.none
    0.758   1.09    1.280   0.96    apr14.none
    0.610   0.87    1.126   0.85    jun16.none
    0.969   1.39    2.133   1.61    jun16.none.ra
    0.620   0.89    1.081   0.81    aug15.none
    0.597   0.86    1.134   0.85    oct16.none
    -
    0.653   0.94    0.886   0.67    feb10.zstd
    0.575   0.82    0.881   0.66    apr14.zstd
    0.477   0.68    0.816   0.61    jun16.zstd
    0.963   1.38    1.212   0.91    jun16.zstd.ra
    0.522   0.75    0.804   0.61    aug15.zstd
    0.522   0.75    0.814   0.61    oct16.zstd

    update-inlist

    QPS in the Oct16 build relative to Feb10:
    • For the i3 NUC it is better
    • For the i5 NUC it is unchanged for oct16.none and better for oct16.zstd

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    375     1.00    403     1.00    feb10.none
    477     1.27    492     1.22    apr14.none
    445     1.19    430     1.07    jun16.none
    449     1.20    488     1.21    aug15.none
    455     1.21    405     1.00    oct16.none
    -
    344     0.92    443     1.10    feb10.zstd
    374     1.00    466     1.16    apr14.zstd
    363     0.97    458     1.14    jun16.zstd
    376     1.00    437     1.08    aug15.zstd
    372     0.99    463     1.15    oct16.zstd

    update-one

    QPS in the Oct16 build relative to Feb10 is worse in all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    8514    1.00    9287    1.00    feb10.none
    7854    0.92    8972    0.97    apr14.none
    7656    0.90    8508    0.92    jun16.none
    7470    0.88    8377    0.90    aug15.none
    7823    0.92    8655    0.93    oct16.none
    -
    8280    0.97    9180    0.99    feb10.zstd
    7884    0.93    9270    1.00    apr14.zstd
    7774    0.91    8749    0.94    jun16.zstd
    7596    0.89    8517    0.92    aug15.zstd
    7704    0.90    8512    0.92    oct16.zstd

    update-index

    QPS in the Oct16 build relative to Feb10 is slightly worse for oct16.none and the same or better for oct16.zstd.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2515    1.00    3057    1.00    feb10.none
    1570    0.62    3084    1.01    apr14.none
    2477    0.98    3004    0.98    jun16.none
    2460    0.98    3008    0.98    aug15.none
    2411    0.96    3038    0.99    oct16.none
    -
    2295    0.91    2704    0.88    feb10.zstd
    2279    0.91    2787    0.91    apr14.zstd
    2296    0.91    2778    0.91    jun16.zstd
    2242    0.89    2779    0.91    aug15.zstd
    2294    0.91    2799    0.92    oct16.zstd

    update-nonindex

    QPS in the Oct16 build relative to Feb10 is worse for oct16.none and better for oct16.zstd.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2393    1.00    2987    1.00    feb10.none
    2265    0.95    3115    1.04    apr14.none
    1391    0.58    2888    0.97    jun16.none
    1403    0.59    2893    0.97    aug15.none
    1445    0.60    2938    0.98    oct16.none
    -
    2257    0.94    2562    0.86    feb10.zstd
    2279    0.95    2839    0.95    apr14.zstd
    2237    0.98    2715    0.91    jun16.zstd
    2266    0.95    2680    0.90    aug15.zstd
    2265    0.95    2725    0.91    oct16.zstd

    delete

    QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i3 NUC.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    7924    1.00    9076    1.00    feb10.none
    7810    0.99    9602    1.06    apr14.none
    7666    0.97    8790    0.97    jun16.none
    7566    0.95    8806    0.97    aug15.none
    7505    0.95    8802    0.97    oct16.none
    -
    7373    0.93    8079    0.89    feb10.zstd
    7222    0.91    9002    0.99    apr14.zstd
    7281    0.92    8268    0.91    jun16.zstd
    6955    0.88    8313    0.92    aug15.zstd
    7000    0.88    8397    0.93    oct16.zstd

    read-write with range-size=100

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2992    1.00    3360    1.00    feb10.none
    2831    0.95    3316    0.99    apr14.none
    2565    0.86    3126    0.93    jun16.none
    2608    0.87    3092    0.92    aug15.none
    2595    0.87    3105    0.92    oct16.none
    -
    2543    0.85    2988    0.89    feb10.zstd
    2572    0.86    3008    0.90    apr14.zstd
    2517    0.84    2901    0.86    jun16.zstd
    2472    0.83    2780    0.83    aug15.zstd
    2514    0.84    2887    0.86    oct16.zstd

    read-write with range-size=10000

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    168     1.00    226     1.00    feb10.none
    163     0.97    223     0.99    apr14.none
    146     0.87    202     0.89    jun16.none
    147     0.88    205     0.91    aug15.none
    149     0.89    202     0.89    oct16.none
    -
    142     0.85    175     0.77    feb10.zstd
    134     0.80    170     0.75    apr14.zstd
    132     0.79    163     0.72    jun16.zstd
    132     0.79    161     0.71    aug15.zstd
    136     0.81    163     0.72    oct16.zstd

    read-only with range-size=100

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2866    1.00    3257    1.00    feb10.none
    2677    0.93    3137    0.96    apr14.none
    2464    0.86    3011    0.92    jun16.none
    2528    0.88    3069    0.94    aug15.none
    2531    0.88    3011    0.92    oct16.none
    -
    2569    0.90    3142    0.96    feb10.zstd
    2581    0.90    3003    0.92    apr14.zstd
    2406    0.84    2779    0.85    jun16.zstd
    2419    0.84    2777    0.85    aug15.zstd
    2476    0.86    2819    0.87    oct16.zstd

    read-only.pre with range-size=10000

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    150     1.00    189     1.00    feb10.none
    150     1.00    195     1.03    apr14.none
    137     0.91    174     0.92    jun16.none
    137     0.91    176     0.93    aug15.none
    136     0.91    173     0.92    oct16.none
    -
    118     0.79    145     0.77    feb10.zstd
    117     0.78    143     0.76    apr14.zstd
    112     0.75    138     0.73    jun16.zstd
    112     0.75    136     0.72    aug15.zstd
    114     0.76    139     0.74    oct16.zstd

    read-only with range-size=10000

    QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i3 NUC.

    The QPS here is lower than for the same test in the previous section. The tests in the previous section are run before write-heavy tests while tests here are run after them. It costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM.

    The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    129     1.00    184     1.00    feb10.none
    102     0.79    181     0.98    apr14.none
    102     0.79    166     0.90    jun16.none
     95     0.74    166     0.90    aug15.none
    101     0.78    164     0.89    oct16.none
    -
    101     0.78    142     0.77    feb10.zstd
    108     0.84    138     0.75    apr14.zstd
    105     0.81    132     0.72    jun16.zstd
    104     0.81    130     0.71    aug15.zstd
    107     0.83    132     0.72    oct16.zstd

    point-query.pre

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    4435    1.00    4900    1.00    feb10.none
    4596    1.04    4994    1.02    apr14.none
    4177    0.94    4370    0.89    jun16.none
    4137    0.93    4494    0.92    aug15.none
    4226    0.95    4438    0.91    oct16.none
    -
    3422    0.77    4370    0.89    feb10.zstd
    3439    0.78    4325    0.88    apr14.zstd
    3354    0.76    3969    0.81    jun16.zstd
    3293    0.74    3992    0.81    aug15.zstd
    3305    0.75    3962    0.81    oct16.zstd

    point-query

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    The QPS here is lower than for the same test in the previous section, which is expected for read-heavy tests that follow write-heavy tests. But the decrease is huge for the i3 NUC. I didn't debug that.

    The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    2735    1.00    4420    1.00    feb10.none
    2858    1.04    4261    0.96    apr14.none
    2361    0.86    3966    0.90    jun16.none
    2452    0.90    3995    0.90    aug15.none
    2346    0.86    4022    0.91    oct16.none
    -
    2764    1.01    4117    0.93    feb10.zstd
    2638    0.96    3958    0.90    apr14.zstd
    2742    1.00    3707    0.84    jun16.zstd
    2667    0.98    3721    0.84    aug15.zstd
    2628    0.96    3731    0.84    oct16.zstd

    random-points.pre

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    68      1.00    70      1.00    feb10.none
    73      1.07    65      0.93    apr14.none
    65      0.96    57      0.81    jun16.none
    65      0.96    65      0.93    aug15.none
    64      0.94    54      0.77    oct16.none
    -
    52      0.76    65      0.93    feb10.zstd
    52      0.76    65      0.93    apr14.zstd
    50      0.74    61      0.87    jun16.zstd
    50      0.74    60      0.86    aug15.zstd
    50      0.74    61      0.87    oct16.zstd

    random-points

    QPS in the Oct16 build relative to Feb10 is worse for all cases. What I wrote in the point-query section is mostly true here, especially the part about QPS being worse for the test run after write-heavy tests.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    50      1.00    56      1.00    feb10.none
    44      0.88    54      0.96    apr14.none
    36      0.72    62      1.11    jun16.none
    40      0.80    63      1.13    aug15.none
    40      0.80    50      0.89    oct16.none
    -
    43      0.86    62      1.11    feb10.zstd
    44      0.88    62      1.11    apr14.zstd
    41      0.82    57      1.02    jun16.zstd
    40      0.80    55      0.98    aug15.zstd
    37      0.74    57      1.02    oct16.zstd

    hot-points

    While this is an IO-bound benchmark the hot-points test is always in-memory. But the results here have more variance than on in-memory sysbench. I didn't debug that.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    1437    1.00    1327    1.00    feb10.none
    1263    0.88    1456    1.10    apr14.none
    1000    0.70    1125    0.85    jun16.none
    1162    0.81    1307    0.98    aug15.none
    1288    0.90    1339    1.01    oct16.none
    -
    1311    0.91    1417    1.07    feb10.zstd
    1399    0.97    1450    1.09    apr14.zstd
    1117    0.78    1088    0.82    jun16.zstd
    1139    0.79    1391    1.05    aug15.zstd
    1310    0.91    1378    1.04    oct16.zstd

    insert

    QPS in the Oct16 build relative to Feb10 is worse for all cases.

    i3 NUC          i5 NUC
    QPS     ratio   QPS     ratio   engine
    8056    1.00    8654    1.00    feb10.none
    8233    1.02    9403    1.09    apr14.none
    7867    0.98    8652    1.00    jun16.none
    7930    0.98    8864    1.02    aug15.none
    7398    0.92    8236    0.95    oct16.none
    -
    7922    0.98    8540    0.99    feb10.zstd
    8386    1.04    8981    1.04    apr14.zstd
    7828    0.97    8299    0.96    jun16.zstd
    7637    0.95    8538    0.99    aug15.zstd
    6194    0.77    8075    0.93    oct16.zstd
