0

I am working with SQL Server 2019 and have a TVF that retrieves products based on a list of categories. However, I noticed that when I use SELECT *, the query results in a table scan, whereas selecting specific indexed columns results in an index seek, leading to better performance.

I need to keep using SELECT * because this function is used in multiple areas where different columns are needed. Additionally, forcing an index is not an option because the results might be sorted later outside the TVF, and forcing an index could degrade performance in such cases.

Reproducible example

I have a Products table with an index on CategoryId:

CREATE TABLE Products ( ProductId INT PRIMARY KEY, Name NVARCHAR(100), CategoryId CHAR(1), Price DECIMAL(10,2), CreatedAt DATETIME ); CREATE INDEX IX_Products_CategoryId ON Products (CategoryId); 

I also have a table type to pass category filters:

CREATE TYPE Categories AS TABLE ( CategoryId CHAR(1) PRIMARY KEY ); 

And my TVF query is:

DECLARE @Categories Categories; INSERT INTO @Categories (CategoryId) VALUES ('B'), ('C'); SELECT TOP 10 * FROM Products WHERE CategoryId IN (SELECT CategoryId FROM @Categories); 

Observations

  • If I select specific columns covered by the index, I get an Index Seek (good performance):
SELECT TOP 10 ProductId, CategoryId FROM Products WHERE CategoryId IN (SELECT CategoryId FROM @Categories); 
  • If I do SELECT * (all columns), I get a table scan (bad performance):
SELECT TOP 10 * FROM Products WHERE CategoryId IN (SELECT CategoryId FROM @Categories); 

Constraints & requirements

  1. I cannot avoid SELECT * because different usages of this TVF require different columns
  2. I cannot force an index because it may negatively impact performance when sorting is applied outside the TVF.
  3. Sorting the data inside the TVF is not an option since sorting is handled by the caller
  4. The function must remain a TVF as it is used in multiple queries with additional filters applied

Questions

  • How can I modify the query or structure to ensure index seek behavior while keeping SELECT *?

  • Are there any best practices for handling this inside a TVF without forcing an index?

Execution Plan

Attaching execution plan (generated with small amount of data):

SELECT *:

enter image description here

SELECT CategoryId:

enter image description here

Any insights or alternative approaches would be greatly appreciated!

9
  • 1
    So, what changes do you want to apply if no changes are allowed? SQL Server will use whatever QP is applicable, the only changes you could try is perhaps either change to INNER / LOOP JOIN or OPTION(RECOMPILE). Seek might not be correct and slower anyway, because for categories with a lot of rows, seeking might be slow. Without real query plans, i'm gonna vote on closeCommentedMar 10 at 12:44
  • 3
    "I need to keep using SELECT * because this function is used in multiple areas where different columns are needed." - so maybe those multiple areas should have dedicated TVFs that fulfil their needs, instead of having a swiss-army-knife trying to satisfy too many competing requirements.CommentedMar 10 at 14:02
  • 1
    Please show the full TVF definition, as well as any queries using it. Is the TVF an inline TVF or multi-statement? Also "I cannot force an index because it may negatively impact performance when sorting is applied outside the TVF" why would you think that? If forcing the index will get these 10 rows fast then sorting 10 rows on the outside will make little impact. "Sorting the data inside the TVF is not an option since sorting is handled by the caller." doesn't make sense, how is sorting relevant at all here?CommentedMar 10 at 14:10
  • 4
    Having TOP 10 without any ORDER BY to define "what 10" is extremely questionableCommentedMar 10 at 14:12
  • 1
    @Charlieface i'm guessing what OP means is that if one puts a query hint or loop join in the function, it will not be properly "unwrapped" if rest of selects make some other selection where the index hint doesn't make sense. Basically, i think OP is asking: how can i make my queries always take the best choicesCommentedMar 10 at 16:59

2 Answers 2

1

This is a normal behaviour. Look at the cost of the plan and, for both queries, you can compare in forcing the other alternative...

SELECT TOP 10 ProductId, CategoryId FROM Products WITH(INDEX(???)) --> put then PK index name in place of ??? WHERE CategoryId IN (SELECT CategoryId FROM @Categories); 

and :

SELECT TOP 10 * FROM Products WITH(INDEX(IX_Products_CategoryId )) WHERE CategoryId IN (SELECT CategoryId FROM @Categories); 

Compare the cost and you will see that MS Query Engine do a good job because if the index has not all the columns into the definition, a join is required between the index and the table to get all extra columns... This cost a lot and table scan is often a good choice to avoid this join

4
  • For SELECT *, I get table scan which takes few seconds (with million of rows) however when I force the index use I get the expected behavior - Index Seek + Key Lookup which take milliseconds to complete. I would expect it to do it by default not sure what I'm missing in this scenario
    – Ron537
    CommentedMar 10 at 13:32
  • 1
    Perhaps statsistics are not accurate. Execute an UPDATE STATISTICS ... WITH FULLSCAN on the "Products " table. Another thing is to use a temporary table with the CategoryId AS PK instaed of table variable...
    – SQLpro
    CommentedMar 10 at 13:53
  • 1
    @Ron537: Key Lookup is a very expensive operation - so if you're doing this on a large number of rows, it will be awfully slow. The SQL Server query optimizer is quite smart that way and will opt to use a table scan instead, since for a large number of rows (not your "dummy" TOP 10 rows....) and requesting ALL columns, this will ultimately be significantly faster
    – marc_s
    CommentedMar 10 at 16:00
  • @marc_s Arguably the optimizer is overly cautious about key lookups. If the table is in the buffer pool already, or stored on fast SSDs, then the tipover point should be much higher than it is.CommentedMar 10 at 16:33
0

Interestingly, if the query is written in this form, there should be an index scan, then a Key Lookup(Index seek) and Join.
If the selection is IN(...) it gives a small number of rows in the output(for example - 10 rows), it should be better this way without hints.
Key Lookup for 10 rows - an inexpensive operation, no matter how the rows are scattered. Maximum of 10 readings))

select p.* from( SELECT TOP 10 ProductId, CategoryId FROM Products p2 WHERE CategoryId IN (SELECT CategoryId FROM @Categories) )filt inner join Products p on p.ProductId=filt.ProductId and p.CategoryId=filt.CategoryId ; 

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.