Surrogate Key Vs Natural Key Differences and When To Use in SQL Server
By: Ben Snaidero | Updated: 2022-01-31
Problem
If you polled any number of Microsoft SQL Server database professionals and asked the
question, "Which is better when defining a primary key, a surrogate key or natural
key column(s)?", I'd bet the answer would be very close to a 50/50 split. About the only
definitive answer you will get on the subject is that most people agree that when
implementing a data warehouse, you have to use surrogate keys for your dimension and fact
tables. This is because a source OLTP relational database can change at any time due to
business requirements, and your data warehouse should be able to handle these changes
without needing any updates. This tip will go through some of the pros and cons of each
type of primary key so that you can make a better decision about which one to
implement in your own environments.
Solution
Before we get into the pros and cons let's first make sure we understand the difference
between a surrogate and natural key.
Natural Key Overview
A natural key is a column or set of columns that already exist in the table (i.e. they are
attributes of the entity within the data model) and uniquely identify a record in the table.
Since these columns are attributes of the entity, they have business meaning.
The following is an example of a table with a natural key (SSN column) along with some
sample data. Notice that the key for the data in this table has business meaning.
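As a minimal T-SQL sketch of such a table, the definition below uses the SSN column as the primary key. The table name dbo.Employee_NK, the other column names, and the sample rows are hypothetical and made up for illustration (the SSN values are deliberately not valid numbers), and a real system would not store SSNs in clear text.

```sql
-- Hypothetical table: SSN is a business attribute and also the primary key.
CREATE TABLE dbo.Employee_NK (
    SSN       CHAR(11)    NOT NULL,
    FirstName VARCHAR(50) NOT NULL,
    LastName  VARCHAR(50) NOT NULL,
    CONSTRAINT PK_Employee_NK PRIMARY KEY CLUSTERED (SSN)
);

-- Made-up sample rows; the key values carry business meaning.
INSERT INTO dbo.Employee_NK (SSN, FirstName, LastName)
VALUES ('000-00-0001', 'Ann', 'Example'),
       ('000-00-0002', 'Bob', 'Sample');

-- The key itself can be used as the search argument.
SELECT FirstName, LastName
FROM dbo.Employee_NK
WHERE SSN = '000-00-0001';
```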
Natural Key Pros
Key values have business meaning and can be used as a search key when querying the table
Column(s) and primary key index already exist, so no extra disk space is required for the extra column/index that a surrogate key would need
Fewer table joins since join columns have meaning. For example, this can reduce disk IO by not having to perform extra reads on a lookup table
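The point about fewer joins can be sketched as follows. This is only an illustration under assumed names: dbo.Staff_NK and dbo.StaffPayment_NK, their columns, and the sample values are all hypothetical. Because the child table carries the natural key, a search by SSN can be answered from the child table alone.

```sql
-- Hypothetical parent table keyed on the natural key.
CREATE TABLE dbo.Staff_NK (
    SSN      CHAR(11)    NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    CONSTRAINT PK_Staff_NK PRIMARY KEY CLUSTERED (SSN)
);

-- Hypothetical child table: the foreign key column is the natural key itself.
CREATE TABLE dbo.StaffPayment_NK (
    SSN    CHAR(11)       NOT NULL,
    PaidOn DATE           NOT NULL,
    Amount DECIMAL(10, 2) NOT NULL,
    CONSTRAINT PK_StaffPayment_NK PRIMARY KEY CLUSTERED (SSN, PaidOn),
    CONSTRAINT FK_StaffPayment_NK_Staff_NK
        FOREIGN KEY (SSN) REFERENCES dbo.Staff_NK (SSN)
);

INSERT INTO dbo.Staff_NK (SSN, LastName)
VALUES ('000-00-0001', 'Example');

INSERT INTO dbo.StaffPayment_NK (SSN, PaidOn, Amount)
VALUES ('000-00-0001', '20220131', 100.00);

-- No join to dbo.Staff_NK is needed: the child row already holds the business key.
SELECT PaidOn, Amount
FROM dbo.StaffPayment_NK
WHERE SSN = '000-00-0001';
```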
Surrogate Key Cons
Extra column(s)/index for the surrogate key will require extra IO when inserting/updating data
Requires more table joins to child tables since the key value has no meaning on its own
Can have duplicate values of the natural key in the table if there is no other unique constraint defined on the natural key
Difficult to differentiate between test and production data. For example, since surrogate key values are just auto-generated values with no business meaning, it's hard to tell if someone took production data and loaded it into a test environment.
Key value has no relation to the data, so technically the design breaks 3NF (i.e. normalization)
The surrogate key value can't be used as a search key
Different implementations are required based on database platform. For example, SQL Server identity columns are implemented a little differently than they are in Postgres or DB2.
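Several of these points can be seen in a short T-SQL sketch. The names dbo.Staff_SK and dbo.StaffPayment_SK, their columns, and the sample values are hypothetical. The sketch uses a SQL Server IDENTITY column as the surrogate key, adds a UNIQUE constraint so the natural key cannot be duplicated, and shows the extra join needed to search child rows by a business value.

```sql
-- Hypothetical parent table with an auto-generated surrogate key.
CREATE TABLE dbo.Staff_SK (
    StaffID  INT IDENTITY(1, 1) NOT NULL,
    SSN      CHAR(11)           NOT NULL,
    LastName VARCHAR(50)        NOT NULL,
    CONSTRAINT PK_Staff_SK PRIMARY KEY CLUSTERED (StaffID),
    -- Without this constraint the same SSN could be inserted more than once.
    CONSTRAINT UQ_Staff_SK_SSN UNIQUE (SSN)
);

-- Hypothetical child table: it only stores the meaningless surrogate value.
CREATE TABLE dbo.StaffPayment_SK (
    StaffID INT            NOT NULL,
    PaidOn  DATE           NOT NULL,
    Amount  DECIMAL(10, 2) NOT NULL,
    CONSTRAINT PK_StaffPayment_SK PRIMARY KEY CLUSTERED (StaffID, PaidOn),
    CONSTRAINT FK_StaffPayment_SK_Staff_SK
        FOREIGN KEY (StaffID) REFERENCES dbo.Staff_SK (StaffID)
);

-- StaffID is generated by SQL Server, so it is not listed in the insert.
INSERT INTO dbo.Staff_SK (SSN, LastName)
VALUES ('000-00-0001', 'Example');

-- Look up the generated key by the natural key before inserting the child row.
INSERT INTO dbo.StaffPayment_SK (StaffID, PaidOn, Amount)
SELECT StaffID, '20220131', 100.00
FROM dbo.Staff_SK
WHERE SSN = '000-00-0001';

-- Searching child rows by SSN now requires a join back to the parent table.
SELECT p.PaidOn, p.Amount
FROM dbo.StaffPayment_SK AS p
JOIN dbo.Staff_SK AS s
    ON s.StaffID = p.StaffID
WHERE s.SSN = '000-00-0001';
```

The extra unique index on SSN is also an instance of the additional insert/update IO listed above.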
Summary
As you can see from the lists above, it's easy to see why this continues to be debated.
Each type of key has a similar number of pros and cons. If you read through them, though,
you can see how, based on your requirements, some of the cons might not even apply in your
environment. If that's the case, then it is much easier to decide which type of key is
the best fit for your application.
Next Steps
Read more tips on SQL Server constraints
Read other tips on data warehousing
Read more information on auto-generated keys in SQL Server
About the author
Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on
performance tuning.
This author pledges the content of this article is based on professional experience and
not AI generated.
Article Last Updated: 2022-01-31
I am shocked no one pointed out that you shouldn't be storing clear text SSN. Period.
Wednesday, April 18, 2018 - 8:21:47 PM - Joe Celko
I wish more people would read Codd's original work. His definition of a surrogate key is
that it is hidden from the view of the user, and the engine uses it to build the joins or
other constructs. Think of a hash code or something, it's only used by the engine and
never exposed. Unfortunately, the SQL Server community wants to define it as
something they actually build themselves and expose. Obviously, you have to keep the
"natural" keys for data integrity, and then carry the extra burden of the exposed
surrogates. Given modern hardware and software, it's not that much trouble to use
insanely long natural keys for joins.
Well, here are a couple more very big factors. First, that most SQL Server pros, most
of the time, do use surrogate keys, most frequently an identity int or bigint, sometimes a
GUID. And that they even use this as the clustered PK more often than not.
And second, that they do this for a good reason, and that's because the CK and PK
have special uses in SQL Server, the nonclustered keys go through them, they are
used to validate FKs, and more. SQL Server does not really separate the logical and
physical implementations that well. This causes surrogates to be much more highly
used in SQL Server than might otherwise be true. I'd say also that the optimizer often
has trouble with multi-field indexes, but that's a whole separate discussion.
Monday, April 16, 2018 - 9:28:36 AM - Adel Yousuf
Good Topic
Monday, April 16, 2018 - 3:52:24 AM - Arno Tolmeijer
Hi Ben,
Great article, but I miss one point: due to security regulations, such as GDPR,
encryption and data masking may influence the usability of a natural key. Greetings,
Arno Tolmeijer
Monday, April 16, 2018 - 2:48:13 AM - Vinod Arvind Bhilare
Hi ,