Cardinality estimation problem on inner join

Question

I'm struggling to understand why the row estimate is so badly wrong. Here is my case:

A simple join, using SQL Server 2016 SP2 (same issue on SP1), database compatibility level 130:

    select Amount_TransactionCurrency_id, CurrencyShareds.id
    from CurrencyShareds
    INNER JOIN annexes ON Amount_TransactionCurrency_id = CurrencyShareds.Id
    option (QUERYTRACEON 3604, QUERYTRACEON 2363);

SQL Server estimates 1 row, whereas the actual number is 107,131, and it chooses a nested loops join (link to plan). After statistics on CurrencyShareds are updated, the estimate is fine and a merge join is chosen (link to new plan). As soon as a single row is added to CurrencyShareds, the statistics become "stale" and SQL Server goes back to the wrong estimate.

I wouldn't worry that much about this simple query, but it is just one part of a larger one, and this bad estimate is the beginning of a domino effect...

Why does adding one row to a 100-row table cause so much damage? In the output of the cardinality estimation trace I see the warning *** WARNING: badly-formed histogram ***, but I couldn't find anything more on this topic.

Here is the full output from the cardinality estimation trace:

    Begin selectivity computation

    Input tree:
      LogOp_Join
          CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)
          CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)
          ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id
              ScaOp_Identifier QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id

    Plan for computation:
      CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

    Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7
    Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1
    *** WARNING: badly-formed histogram ***

    Selectivity: 4.59503e-018

    Stats collection generated:
      CStCollJoin(ID=3, CARD=1 x_jtInner)
          CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)
          CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

    End selectivity computation

    Estimating distinct count in utility function

    Input stats collection:
        CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)

    Columns to distinct on:QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id

    Plan for computation:
      CDVCPlanLeaf
          0 Multi-Column Stats, 1 Single-Column Stats, 0 Guesses

    Covering multi-col stats id: 7
    Using ambient cardinality 107131 to combine distinct counts:
      5

    Combined distinct count: 5
    Result of computation: 5

    Estimating distinct count in utility function

    Input stats collection:
        CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

    Columns to distinct on:QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id

    Plan for computation:
      CDVCPlanUniqueKey

    Result of computation: 100

And when I update the statistics on CurrencyShareds, the "badly-formed histogram" part disappears and the cardinality is calculated correctly:

    Plan for computation:
      CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

    Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7
    Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1

    Selectivity: 0.01

    Stats collection generated:
      CStCollJoin(ID=3, CARD=107131 x_jtInner)
          CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)
          CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

    End selectivity computation

Here is the statistics information for "[CurrencyShareds].Id from stats with id 1", the statistics object the histogram warning refers to. It looks fine to me...

    Name                   Updated              Rows  Rows Sampled  Steps  Density  Average key length  String Index  Filter Expression  Unfiltered Rows  Persisted Sample Percent
    ---------------------  -------------------  ----  ------------  -----  -------  ------------------  ------------  -----------------  ---------------  ------------------------
    PK_CurrencyShareds_Id  May 23 2018 10:43PM  98    98            75     1        8                   NO            NULL               98               0

    (1 row affected)

    All density  Average Length  Columns
    -----------  --------------  -------
    0,01020408   8               Id

    (1 row affected)

    RANGE_HI_KEY        RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    ------------------  ----------  -------  -------------------  --------------
    119762190797406464  0           1        0                    1
    119762190797406466  1           1        1                    1
    119762190797406468  1           1        1                    1
    119762190797406470  1           1        1                    1
    119762190797406472  1           1        1                    1
    119762190797406474  1           1        1                    1
    119762190797406476  1           1        1                    1
    119762190797406478  1           1        1                    1
    119762190797406480  1           1        1                    1
    119762190797406482  1           1        1                    1
    119762190797406484  1           1        1                    1
    119762190797406486  1           1        1                    1
    119762190797406488  1           1        1                    1
    119762190797406490  1           1        1                    1
    119762190797406492  1           1        1                    1
    119762190797406494  1           1        1                    1
    119762190797406496  1           1        1                    1
    119762190797406498  1           1        1                    1
    119762190797406500  1           1        1                    1
    119762190797406502  1           1        1                    1
    119762190797406504  1           1        1                    1
    119762190797406506  1           1        1                    1
    119762190797406507  0           1        0                    1
    478531702587687680  0           1        0                    1
    478531702591881728  0           1        0                    1
    478531702591881729  0           1        0                    1
    478531702591881984  0           1        0                    1
    478531702591881985  0           1        0                    1
    478531702596076032  0           1        0                    1
    478531702596076033  0           1        0                    1
    478531702596076288  0           1        0                    1
    478531702600270336  0           1        0                    1
    478531702600270592  0           1        0                    1
    478532235583062528  0           1        0                    1
    478532235583062784  0           1        0                    1
    478532235587256832  0           1        0                    1
    530792464911467264  0           1        0                    1
    530792464924049920  0           1        0                    1
    530792464924050176  0           1        0                    1
    530792464928244224  0           1        0                    1
    530792464928244480  0           1        0                    1
    530792464932438528  0           1        0                    1
    530792464932438784  0           1        0                    1
    530792464936632832  0           1        0                    1
    530792464936632833  0           1        0                    1
    530792464936633088  0           1        0                    1
    530792464940827136  0           1        0                    1
    530792464940827392  0           1        0                    1
    530792464949216000  2           1        2                    1
    530792464953410048  0           1        0                    1
    530792464953410304  0           1        0                    1
    530792464957604352  0           1        0                    1
    530792464957604353  0           1        0                    1
    530792464957604608  0           1        0                    1
    530792464961798656  0           1        0                    1
    530792464961798912  0           1        0                    1
    530792464965992960  0           1        0                    1
    530792464965993216  0           1        0                    1
    530792464965993217  0           1        0                    1
    530792464970187264  0           1        0                    1
    530792464970187265  0           1        0                    1
    530792464970187520  0           1        0                    1
    530792464974381568  0           1        0                    1
    530792464974381824  0           1        0                    1
    530792464974381825  0           1        0                    1
    530792464978575872  0           1        0                    1
    530792464978575873  0           1        0                    1
    530792464978576128  0           1        0                    1
    867420708903354880  0           1        0                    1
    867420708903355136  0           1        0                    1
    867420708903355137  0           1        0                    1
    960876568220042240  0           1        0                    1
    976385263448130048  0           1        0                    1
    977302121709864192  0           1        0                    1
    977955748426318592  0           1        0                    1

And the information for the second index:

    Name                              Updated             Rows    Rows Sampled  Steps  Density  Average key length  String Index  Filter Expression  Unfiltered Rows  Persisted Sample Percent
    --------------------------------  ------------------  ------  ------------  -----  -------  ------------------  ------------  -----------------  ---------------  ------------------------
    IX_FK_Amount_TransactionCurrency  May 21 2018 3:29PM  107204  107204        5      0        16                  NO            NULL               107204           0

    (1 row affected)

    All density  Average Length  Columns
    -----------  --------------  ---------------------------------
    0,2          8               Amount_TransactionCurrency_id
    9,32801E-06  16              Amount_TransactionCurrency_id, Id

    (2 rows affected)

    RANGE_HI_KEY        RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    ------------------  ----------  -------  -------------------  --------------
    119762190797406475  0           160      0                    1
    119762190797406478  0           867      0                    1
    119762190797406481  0           106      0                    1
    119762190797406494  0           105742   0                    1
    119762190797406496  0           329      0                    1
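One way to see why the statistics "look fine" is to check that both objects are internally consistent. The Python sketch below hard-codes the step values transcribed from the two DBCC SHOW_STATISTICS outputs above. The run-length grouping of the CurrencyShareds steps is only a shorthand of this sketch, not part of the output. The sketch then checks that rows and distinct values add up to the reported Rows and All density. It assumes every step boundary value itself exists (EQ_ROWS of at least 1), which holds for both histograms here.

```python
# Each histogram step is (RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS),
# transcribed from the DBCC SHOW_STATISTICS output above.

# PK_CurrencyShareds_Id: 75 steps, grouped by identical step values.
currency_steps = (
    [(0, 1, 0)]          # step 1
    + [(1, 1, 1)] * 21   # steps 2-22
    + [(0, 1, 0)] * 26   # steps 23-48
    + [(2, 1, 2)]        # step 49 (530792464949216000)
    + [(0, 1, 0)] * 26   # steps 50-75
)

# IX_FK_Amount_TransactionCurrency: 5 steps.
annexes_steps = [(0, 160, 0), (0, 867, 0), (0, 106, 0), (0, 105742, 0), (0, 329, 0)]


def total_rows(steps):
    """Rows represented by a histogram: range rows plus equality rows."""
    return sum(rng + eq for rng, eq, _ in steps)


def distinct_values(steps):
    """One distinct value per step boundary (assumes EQ_ROWS >= 1) plus distinct values inside ranges."""
    return sum(1 + d for _, _, d in steps)


for name, steps in (("CurrencyShareds", currency_steps), ("Annexes", annexes_steps)):
    n, d = total_rows(steps), distinct_values(steps)
    print(f"{name}: steps={len(steps)} rows={n} distinct={d} density={1 / d:.8f}")
```

Both objects are consistent. CurrencyShareds has 98 rows and 98 distinct values (1/98 = 0.01020408). Annexes has 107,204 rows over 5 distinct values (density 0.2). Both match the reported figures. It is also worth noticing that the CurrencyShareds statistics describe 98 rows, while the trace reports a table cardinality of 100.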

Joe Obbish · Accepted Answer · 2018-05-24 01:22:22Z

Based on your histograms I was able to repro the issue in 2017 CU6. I wouldn't say that you're doing something wrong. Rather, something is going wrong with cardinality estimation. Here's what I get before inserting a row:

The final cardinality estimate falls quite a bit after inserting a row:

You have a pretty simple repro here, so my advice is to file product feedback or open a support ticket with Microsoft. I was able to find a few workarounds that worked on your sample data, and one of them might be acceptable for you.

Drop the unique index on CurrencyShareds.Id. I can't get the repro to work without a unique index. The table is small, so maybe you can get by without the index. Of course, you might have very good reasons for keeping it.
Materialize the results of the join into a temp table. Based on your question it's important to get a reasonable estimate at this step so the larger query performs well. A temp table is one way to make that happen.
Use the legacy CE. I can't get the issue to reproduce with it. Of course, this might have negative consequences on the rest of your query.
Trick the query optimizer with silly code. For example, in my testing the following rewrite works great:


    select Amount_TransactionCurrency_id, CurrencyShareds.id
    from CurrencyShareds
    INNER JOIN annexes
        ON Amount_TransactionCurrency_id % 9223372036854775809 = CurrencyShareds.Id % 9223372036854775809

I suspect that this works because the CE appears to use the density instead of the histogram. Other similar rewrites may have the same effect. There's no guarantee that this type of query will continue to work well in the future. That's why you should contact Microsoft: it improves the odds that a fix for your issue will one day make it into the released product.
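The rewrite does not change the join result. The divisor 9223372036854775809 is larger than the biggest bigint (9223372036854775807), so for every bigint value the remainder is the value itself. The sketch below models T-SQL's % in Python. In T-SQL the remainder takes the sign of the dividend, unlike Python's own %. The sketch assumes both join columns are bigint, which the 8-byte average key length in the statistics suggests. It only shows that the expressions preserve values; their effect on the optimizer is the behaviour described above.

```python
BIGINT_MIN = -(2**63)           # -9223372036854775808
BIGINT_MAX = 2**63 - 1          #  9223372036854775807
DIVISOR = 9223372036854775809   # constant used in the rewritten join predicate


def tsql_mod(dividend: int, divisor: int) -> int:
    """Model of T-SQL %: truncating division, so the remainder has the dividend's sign."""
    remainder = abs(dividend) % abs(divisor)
    return remainder if dividend >= 0 else -remainder


# A few ids from the CurrencyShareds histogram plus the bigint extremes.
samples = [119762190797406464, 530792464949216000, 977955748426318592,
           BIGINT_MIN, BIGINT_MAX, 0, -1]

for value in samples:
    # |value| <= 2**63 < DIVISOR, so the "modulo" leaves every bigint unchanged.
    assert tsql_mod(value, DIVISOR) == value
print("x % 9223372036854775809 == x for all sampled bigint values")
```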

LeMaciek · Accepted Answer · 2018-06-08 09:38:54Z

OK, I hope I understand it now. Here is our case:

Given

A reference table (CurrencyShareds) with ~100 rows whose ids are large, with very different min and max values: min 119,762,190,797,406,464 vs max 977,955,748,426,318,592.
A table (Annexes) with a simple FK to CurrencyShareds, but only a few currencies are used: the histogram for IX_FK_Amount_TransactionCurrency lists 5 ids and, importantly, only "low" ids, because the others are not used.

When all stats are up to date then

    CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

    Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7
    Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1

    Selectivity: 0.01

The selectivity calculated for the join is then fine: 100 * 107,131 * 0.01 = 107,131.

When stats for CurrencyShareds are not up to date, then

    CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

    Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7
    Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1
    *** WARNING: badly-formed histogram ***

    Selectivity: 4.59503e-018

The selectivity drops dramatically, and hence the estimated number of rows for the join is 1.
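The arithmetic in the two traces can be checked directly. The sketch below multiplies the two input cardinalities from the trace by each selectivity. It also shows that the "good" selectivity of 0.01 equals the textbook containment estimate 1 / max(distinct values), using the distinct counts the trace reports: 5 for Annexes and 100 for the unique key. That formula is a standard model used here for illustration only, not necessarily the exact computation SQL Server performed. The floor at one row is an assumption of the sketch, consistent with the trace reporting CARD=1 for a raw estimate far below one.

```python
ANNEXES_ROWS = 107_131     # CARD of annexes in the trace
CURRENCY_ROWS = 100        # CARD of CurrencyShareds in the trace
ANNEXES_DISTINCT = 5       # "Result of computation: 5"
CURRENCY_DISTINCT = 100    # unique key, "Result of computation: 100"


def containment_selectivity(distinct_left: int, distinct_right: int) -> float:
    """Textbook equijoin selectivity: every value on the side with fewer distinct values finds a match."""
    return 1.0 / max(distinct_left, distinct_right)


def join_rows(left: int, right: int, selectivity: float):
    """Return (raw estimate, estimate floored at one row) for an inner join."""
    raw = left * right * selectivity
    return raw, max(1.0, raw)


good = containment_selectivity(ANNEXES_DISTINCT, CURRENCY_DISTINCT)
print(good)                                                 # 0.01
print(join_rows(ANNEXES_ROWS, CURRENCY_ROWS, good))         # about 107,131 rows
print(join_rows(ANNEXES_ROWS, CURRENCY_ROWS, 4.59503e-18))  # raw about 4.9e-11, floored to 1.0
```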

When the histogram changes

After I add a single row to Annexes that references a CurrencyShareds row with a high id, the histogram for IX_FK_Amount_TransactionCurrency changes to:

    RANGE_HI_KEY        RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    ------------------  ----------  -------  -------------------  --------------
    119762190797406475  0           173      0                    1
    119762190797406478  0           868      0                    1
    119762190797406481  0           107      0                    1
    119762190797406494  0           105745   0                    1
    119762190797406496  0           330      0                    1
    119762190797406618  0           1        0                    1
    119762190797406628  0           1        0                    1
    977955748426318623  0           1        0                    1

With this histogram the problem disappears: adding a new row to CurrencyShareds no longer causes a dramatic drop in the cardinality estimate.

Why is that?

I suspect this is how the coarse histogram estimation algorithm works in SQL Server 2014+. I am basing my guess on this great post: https://www.sqlshack.com/join-estimation-internals/

Coarse Histogram Estimation is a new algorithm and less documented, even in terms of general concepts. It is known that instead of aligning histograms step by step, it aligns them with only minimum and maximum histogram boundaries. This method potentially introduces less CE mistakes (not always however, because we remember that this is just a model).
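A toy model helps show why aligning histograms only on their minimum and maximum boundaries could go so wrong with these keys. It is a deliberately crude simplification, not SQL Server's coarse alignment algorithm. It treats each histogram as a single range [min, max] and asks what fraction of the CurrencyShareds key range the Annexes key range covers. The boundary values come from the histograms above. Exact fractions are used because the keys are too large for floating-point differences.

```python
from fractions import Fraction

# Boundaries of the CurrencyShareds histogram (PK_CurrencyShareds_Id)
CURRENCY_MIN, CURRENCY_MAX = 119762190797406464, 977955748426318592

# Boundaries of IX_FK_Amount_TransactionCurrency before and after the new Annexes row
ANNEXES_BEFORE = (119762190797406475, 119762190797406496)
ANNEXES_AFTER = (119762190797406475, 977955748426318623)


def covered_fraction(outer_lo: int, outer_hi: int, inner_lo: int, inner_hi: int) -> Fraction:
    """Fraction of [outer_lo, outer_hi] that overlaps [inner_lo, inner_hi], in exact arithmetic."""
    lo, hi = max(outer_lo, inner_lo), min(outer_hi, inner_hi)
    if hi <= lo:
        return Fraction(0)
    return Fraction(hi - lo, outer_hi - outer_lo)


before = covered_fraction(CURRENCY_MIN, CURRENCY_MAX, *ANNEXES_BEFORE)
after = covered_fraction(CURRENCY_MIN, CURRENCY_MAX, *ANNEXES_AFTER)
print(f"before: {float(before):.3e}")  # the 5 used ids span only 21 units of a ~8.6e17-wide range
print(f"after:  {float(after):.6f}")   # one high id stretches Annexes over almost the whole range
```

In this toy model the overlap goes from about 2e-17 to essentially 1 once a single high id appears in Annexes. That matches the pattern described above. It does not reproduce the trace's exact 4.59503e-018.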

Just to make everything clear: why do we have such strange ids in CurrencyShareds?

It's quite simple: our ids are globally unique and partly based on a timestamp (the implementation is based on Snowflake). The most common currencies were added when the application started several years ago, and only those few are really used in production. That is why the histogram contains only those with "low" ids.
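A Snowflake-style id puts a millisecond timestamp in the high bits. Rows created years apart therefore get ids that differ by huge amounts, even when there are only ~100 of them. The sketch assumes the classic layout: a 41-bit timestamp, a 10-bit worker id and a 12-bit sequence. The actual bit layout and epoch of the implementation described above are not given, so `snowflake_id` and its parameters are hypothetical.

```python
WORKER_BITS = 10
SEQUENCE_BITS = 12
TIMESTAMP_SHIFT = WORKER_BITS + SEQUENCE_BITS  # 22 (assumed classic layout)


def snowflake_id(ms_since_epoch: int, worker: int, sequence: int) -> int:
    """Hypothetical composer: timestamp in the high bits, then worker id, then per-millisecond sequence."""
    if not 0 <= worker < 1 << WORKER_BITS:
        raise ValueError("worker out of range")
    if not 0 <= sequence < 1 << SEQUENCE_BITS:
        raise ValueError("sequence out of range")
    return (ms_since_epoch << TIMESTAMP_SHIFT) | (worker << SEQUENCE_BITS) | sequence


ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000

first = snowflake_id(ms_since_epoch=0, worker=1, sequence=0)
same_ms = snowflake_id(ms_since_epoch=0, worker=1, sequence=1)
year_later = snowflake_id(ms_since_epoch=ONE_YEAR_MS, worker=1, sequence=0)

print(same_ms - first)     # 1: rows inserted together get adjacent ids
print(year_later - first)  # ONE_YEAR_MS << 22, roughly 1.3e17
```

Under this assumed layout, a year between two inserts already adds about 1.3e17 to the id. That is the order of magnitude of the larger gaps between the CurrencyShareds histogram steps.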

The problem surfaced in our test environments, where some automated tests started adding test currencies, causing some queries to run longer or time out...

How to fix the problem?

We'll update statistics on those reference tables more often (we might have the same problem with other, similar reference data tables). Those tables are small, so updating their stats is not a problem.
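Manual updates are needed because the statistics do not refresh on their own. With AUTO_UPDATE_STATISTICS on, SQL Server only treats statistics as out of date once the number of modifications since the last update crosses a threshold. The sketch encodes the thresholds for permanent tables as given in the SQL Server statistics documentation; verify them against the docs for your version. Tables of up to 500 rows need 500 modifications. Larger tables need MIN(500 + 0.20 * n, SQRT(1000 * n)) at compatibility level 130 and above, or 500 + 0.20 * n at older levels. For a ~100-row reference table, a single new row is far below the threshold, so only a manual or scheduled update refreshes the histogram.

```python
import math


def auto_update_threshold(n: int, compat_level: int = 130) -> float:
    """Modifications before auto-update considers stats on a permanent table stale.

    n is the table cardinality when the statistics were last evaluated.
    Thresholds as documented for SQL Server; treat this as a sketch, not a reference.
    """
    if n <= 500:
        return 500
    if compat_level >= 130:
        # Dynamic threshold: grows like sqrt(n), so large tables update more often.
        return min(500 + 0.20 * n, math.sqrt(1000 * n))
    return 500 + 0.20 * n


print(auto_update_threshold(98))                          # 500: one new currency row is nowhere near it
print(auto_update_threshold(1_000_000))                   # about 31623 at level 130
print(auto_update_threshold(1_000_000, compat_level=120)) # 200500.0 at older levels
```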

Lessons Learned

Up-to-date stats are important!!!
A plain old identity column would not have caused these problems :)

Regarding coarse alignment: sqlperformance.com/2018/11/sql-optimizer/… (Paul White, commented Jun 8, 2019 at 16:40)
