Usually we use constraints to specify the allowable values for the columns in tables. And it works perfect! But beyond that, constraints help query analizer to generate more effective query plans. How it works? Read below.
There is an opinion «Give MS SQL Server all information you know about data you store». What does it mean? First of all, obvious things: use the most suitable data type (use int, instead of bigint or float for example), set size of strings and binary data (nvarchar(10) instead of nvarchar(max)). And second — the constraints: if the value in a column may be in the range from 1 to 4, you should specify this as a contraint, SQL Server will thank you!
Example
Let’s create a table:
CREATE TABLE [TestTable](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Value] [nvarchar](max) NOT NULL,
[Status] [int] NOT NULL
)
Add one million rows with data like this:
Id Value Status
1 value 1
2 value 2
3 value 3
4 value 1
5 value 2
6 value 3
7 value 1
8 value 2
9 value 3
10 value 1
And execute a query (with statistics output):
SET STATISTICS IO ON
SELECT tt.Status, COUNT(*)
FROM [TestTable] AS tt
WHERE tt.Status <> 1
GROUP BY tt.Status
The query is quite real — we need amount of rows in all statuses excluding 1 («Calc amount of orders in all statuses excluding Open»).
Query statistics:
Table 'TestTable'. Scan count 9, logical reads 4590, physical reads 0, read-ahead reads 4038, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Execution plan:
Plain Table Scan — server had to look through a table because it does not have indexes.
Add new index:
CREATE NONCLUSTERED INDEX [IX_TestTable_Status] ON [TestTable]
(
[Status] ASC
)
And run query again:
Table 'TestTable'. Scan count 2, logical reads 1501, physical reads 0, read-ahead reads 10, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
As expected, we see Index Seek with interesting predicate:Staus < 1
иStatus > 1
. If you remember index is implemented as a tree, and search operation on tree is quite fast. SQL Server converted inequality into two comparison, that can be processed in the most effective way. Wherein, logical reads is also decreased.
And let’s tell SQL Server, that column Status can store only values 1, 2, 3 and 4:
ALTER TABLE [TestTable] WITH CHECK ADD CONSTRAINT [CK_TestTable]
CHECK (([Status]=(4) OR [Status]=(3) OR [Status]=(2) OR [Status]=(1)))
Run query:
Table 'TestTable'. Scan count 3, logical reads 1508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Surprisingly, there is nothing about status «1» in actual execution plan. Instead of this analizer replaced<> 1
with predicateIN (2, 3, 4)
.
But we dont get any advantage in logical reads.
Here is another example with a little bit different query:
SET STATISTICS IO ON
SELECT tt.Status, COUNT(*)
FROM [TestTable] AS tt
WHERE tt.Status NOT IN (1,3)
GROUP BY tt.Status
Remove another one status. Remove constraint and run query:
Table 'TestTable'. Scan count 2, logical reads 1501, physical reads 1, read-ahead reads 1959, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Analizer converted NOT IN into a series of comparisons «less-more».
Add constraint and run query:
Table 'TestTable'. Scan count 2, logical reads 757, physical reads 0, read-ahead reads 12, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Analizer knows that column can store only values from set {1,2,3,4}, so predicates NOT IN (1, 3)
and IN (2, 4)
are equivalent. But logical read is decreased in 2 times!
Result by logical reads
Predicate | No index, no constraint | With index | With index and constraint |
---|---|---|---|
<> 1 |
4590 | 1501 | 1508 |
NOT IN (1,3) |
4590 | 1501 | 757 |