Surrogate Key vs Natural Key Differences and When to Use in SQL Server


mssqltips.com/sqlservertip/5431/surrogate-key-vs-natural-key-differences-and-when-to-use-in-sql-server/

By: Ben Snaidero | Updated: 2022-01-31

Problem
If you polled any number of Microsoft SQL Server database professionals and asked the
question, "Which is better when defining a primary key: a surrogate key or natural key
column(s)?", I'd bet the answers would be very close to a 50/50 split. About the only
point most people agree on is that when implementing a data warehouse, you have to use
surrogate keys for your dimension and fact tables. This is because a source OLTP
relational database can change at any time due to business requirements, and your data
warehouse should be able to handle these changes without needing any updates. This tip
goes through some of the pros and cons of each type of primary key so that you can make
a better decision about which one to implement in your own environments.

Solution

Before we get into the pros and cons let's first make sure we understand the difference
between a surrogate and natural key.

Surrogate Key Overview


A surrogate key is a system-generated value (for example a GUID/uniqueidentifier, a
sequence value, or an identity value) with no business meaning that is used to uniquely
identify a record in a table. The key itself could be made up of one or multiple columns
(i.e., a composite key). For example, consider an address table with a surrogate key
(AddressID column) and some sample data. The key itself has no business meaning; it's
just a sequential integer serving as a unique key.
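
As a rough sketch of such a table, the snippet below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server. The non-key column names and the sample rows are made up for illustration; only the AddressID column comes from the description above.

```python
import sqlite3

# SQLite (in memory) stands in for SQL Server in this sketch.
# In SQL Server the key column would more likely be declared as
# INT IDENTITY(1,1) PRIMARY KEY rather than with AUTOINCREMENT.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Address (
        AddressID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        Street    TEXT NOT NULL,                      -- hypothetical column
        City      TEXT NOT NULL                       -- hypothetical column
    )
    """
)

# Hypothetical sample rows. No AddressID is supplied; the engine generates it.
rows = [
    ("1 Main St", "Springfield"),
    ("22 Oak Ave", "Shelbyville"),
    ("9 Elm Rd", "Springfield"),
]
conn.executemany("INSERT INTO Address (Street, City) VALUES (?, ?)", rows)

address_ids = [
    r[0] for r in conn.execute("SELECT AddressID FROM Address ORDER BY AddressID")
]
print(address_ids)  # [1, 2, 3]
```

The generated values say nothing about the address they identify, which is exactly the property that defines a surrogate key.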

Natural Key Overview
A natural key is a column or set of columns that already exist in the table (i.e., they
are attributes of the entity within the data model) and uniquely identify a record in the
table. Since these columns are attributes of the entity, they have business meaning. For
example, consider an employee table with a natural key (SSN column) and some sample
data. The key for the data in this table has business meaning.
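
A comparable sketch for the natural key case, again using sqlite3 as a stand-in for SQL Server. The name columns are assumptions, and the SSN values are deliberately invalid placeholders rather than real numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employee (
        SSN       TEXT NOT NULL PRIMARY KEY,  -- natural key: an attribute of the entity
        FirstName TEXT NOT NULL,              -- hypothetical column
        LastName  TEXT NOT NULL               -- hypothetical column
    )
    """
)
conn.execute("INSERT INTO Employee VALUES ('000-00-0001', 'Ann', 'Lee')")

# Because the key has business meaning, it doubles as a search value.
found = conn.execute(
    "SELECT FirstName, LastName FROM Employee WHERE SSN = ?", ("000-00-0001",)
).fetchone()

# A second row with the same SSN violates the primary key.
try:
    conn.execute("INSERT INTO Employee VALUES ('000-00-0001', 'Bob', 'Ray')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(found, duplicate_rejected)
```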

Natural Key vs. Surrogate Key for Database Design


Since this topic has been debated for years with no definitive answer as to which is better,
I thought with this tutorial I would put together a list of all the pros and cons of each type
of key. This list can then be used as a reference when deciding what type of key would
be best suited for your own environment/application. After all, everyone's requirements
are different. What works or performs well in one application might not work so well in
another.

Natural Key Pros

- Key values have business meaning and can be used as a search key when
  querying the table.
- Column(s) and primary key index already exist, so no extra disk space is required for
  the extra column/index that a surrogate key would need.
- Fewer table joins since join columns have meaning. For example, this can reduce
  disk IO by not having to perform extra reads on a lookup table.
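
To make the "fewer joins" point concrete, here is a small sketch (sqlite3 standing in for SQL Server) that models the same lookup twice. All table and column names are hypothetical; the suffix N marks the natural-key design and S the surrogate-key design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Natural-key design: the child row carries the meaningful code itself.
    CREATE TABLE CountryN (
        CountryCode TEXT NOT NULL PRIMARY KEY,
        Name        TEXT NOT NULL
    );
    CREATE TABLE CustomerN (
        CustomerName TEXT NOT NULL,
        CountryCode  TEXT NOT NULL REFERENCES CountryN (CountryCode)
    );

    -- Surrogate-key design: the child row carries only an opaque integer.
    CREATE TABLE CountryS (
        CountryID   INTEGER PRIMARY KEY,
        CountryCode TEXT NOT NULL UNIQUE,
        Name        TEXT NOT NULL
    );
    CREATE TABLE CustomerS (
        CustomerName TEXT NOT NULL,
        CountryID    INTEGER NOT NULL REFERENCES CountryS (CountryID)
    );

    INSERT INTO CountryN VALUES ('CA', 'Canada'), ('US', 'United States');
    INSERT INTO CustomerN VALUES ('Ann', 'CA'), ('Bob', 'US');
    INSERT INTO CountryS VALUES (1, 'CA', 'Canada'), (2, 'US', 'United States');
    INSERT INTO CustomerS VALUES ('Ann', 1), ('Bob', 2);
    """
)

# Natural key: the filter runs on the child table alone.
natural = conn.execute(
    "SELECT CustomerName FROM CustomerN WHERE CountryCode = 'CA'"
).fetchall()

# Surrogate key: the lookup table must be read to translate the code.
surrogate = conn.execute(
    """
    SELECT c.CustomerName
    FROM CustomerS AS c
    JOIN CountryS AS k ON k.CountryID = c.CountryID
    WHERE k.CountryCode = 'CA'
    """
).fetchall()

print(natural, surrogate)
```

Both queries return the same customer; the difference is only in how many tables have to be touched to get there.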

Natural Key Cons


- May need to change/rework the key if business requirements change. For example, if
  you used SSN for your employee as in the example above and your company
  expands outside of the United States, not all employees would have an SSN, so you
  would have to come up with a new key for your database tables.
- More difficult to maintain if the key requires multiple columns. It's much easier on
  the application side to deal with a key that consists of just a single
  column.
- Poorer performance since the key value is usually larger and/or is made up of multiple
  columns. Larger keys require more IO both when inserting/updating data and
  when you query.
- Can't enter a record until the key value is known. It's sometimes beneficial for an
  application to load a placeholder record in one table, then load other tables, and then
  come back and update the main table.
- Can sometimes be difficult to pick a good key. There might be multiple candidate
  keys, each with its own trade-offs when it comes to design and/or performance.
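
The "can't enter a record until the key value is known" item can be sketched as follows, with sqlite3 standing in for SQL Server. The helper try_insert and the sample values are hypothetical; the same failure would also hit an employee who simply has no SSN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employee (
        SSN       TEXT NOT NULL PRIMARY KEY,  -- natural key, so it can never be missing
        FirstName TEXT NOT NULL
    )
    """
)


def try_insert(ssn, first_name):
    """Return True if the row was accepted, False if the key was rejected."""
    try:
        conn.execute(
            "INSERT INTO Employee (SSN, FirstName) VALUES (?, ?)", (ssn, first_name)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# A placeholder row whose SSN is not yet known (or does not exist) is refused.
placeholder_accepted = try_insert(None, "Claire")
# A row with a known key value goes in normally.
known_key_accepted = try_insert("000-00-0002", "Dev")

print(placeholder_accepted, known_key_accepted)  # False True
```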

Surrogate Key Pros


- No business logic in the key, so no changes based on business requirements. For
  example, if the Employee table above used an integer surrogate key, you could
  simply add a separate column for SIN if you added an office in Canada (to be used
  in place of a Social Security Number).
- Less code if maintaining the same key strategy across all entities. For example,
  application code can be reused when referencing primary keys if they are all
  implemented as a sequential integer.
- Better performance since the key value is smaller. Less disk IO is required when
  accessing a narrow single-column index.
- Surrogate key is guaranteed to be unique. For example, when moving data
  between test systems you don't have to worry about duplicate keys since a new key
  will be generated as data is inserted.
- If a sequence is used, there is little index maintenance required since the value is
  ever increasing, which leads to less index fragmentation.
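
The SIN example from the first item might look like the sketch below, again with sqlite3 standing in for SQL Server. The Timesheet child table, the column names other than SSN and SIN, and all sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript(
    """
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        FirstName  TEXT NOT NULL,
        SSN        TEXT                                -- plain attribute, not the key
    );
    CREATE TABLE Timesheet (
        TimesheetID INTEGER PRIMARY KEY AUTOINCREMENT,
        EmployeeID  INTEGER NOT NULL REFERENCES Employee (EmployeeID),
        Hours       REAL NOT NULL
    );
    INSERT INTO Employee (FirstName, SSN) VALUES ('Ann', '000-00-0001');
    INSERT INTO Timesheet (EmployeeID, Hours) VALUES (1, 8.0);

    -- New requirement: an office in Canada. Only a new attribute column is added;
    -- the primary key and the foreign key in Timesheet are left alone.
    ALTER TABLE Employee ADD COLUMN SIN TEXT;
    INSERT INTO Employee (FirstName, SIN) VALUES ('Claire', '000-000-001');
    INSERT INTO Timesheet (EmployeeID, Hours) VALUES (2, 7.5);
    """
)

hours_by_employee = conn.execute(
    """
    SELECT e.FirstName, t.Hours
    FROM Timesheet AS t
    JOIN Employee AS e ON e.EmployeeID = t.EmployeeID
    ORDER BY t.TimesheetID
    """
).fetchall()
print(hours_by_employee)  # [('Ann', 8.0), ('Claire', 7.5)]
```

With SSN as the primary key, the same change would have meant reworking the key and every child table that references it.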

Surrogate Key Cons


- Extra column(s)/index for the surrogate key will require extra disk space.
- Extra column(s)/index for the surrogate key will require extra IO when
  inserting/updating data.
- Requires more table joins to child tables since the key has no meaning on its own.
- Can have duplicate values of the natural key in the table if there is no other unique
  constraint defined on the natural key.
- Difficult to differentiate between test and production data. For example, since
  surrogate key values are just auto-generated values with no business meaning, it's
  hard to tell if someone took production data and loaded it into a test environment.
- Key value has no relation to the data, so some argue the design technically departs
  from strict normalization (3NF).
- The surrogate key value can't be used as a search key.
- Different implementations are required based on database platform. For example,
  SQL Server identity columns are implemented a little bit differently than they are in
  Postgres or DB2.
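
The duplicate-value item is easy to reproduce. In this sketch (sqlite3 standing in for SQL Server, with hypothetical table names and a placeholder SSN), the only difference between the two tables is a unique constraint on the natural key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Surrogate key only: nothing protects the natural key column.
    CREATE TABLE EmployeeLoose (
        EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,
        SSN        TEXT NOT NULL
    );
    -- Surrogate key plus a unique constraint on the natural key.
    CREATE TABLE EmployeeStrict (
        EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,
        SSN        TEXT NOT NULL UNIQUE
    );
    """
)


def insert_twice(table):
    """Insert the same SSN twice and return how many rows the table accepted."""
    accepted = 0
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} (SSN) VALUES ('000-00-0001')")
            accepted += 1
        except sqlite3.IntegrityError:
            pass
    return accepted


loose_rows = insert_twice("EmployeeLoose")    # both rows get a fresh EmployeeID
strict_rows = insert_twice("EmployeeStrict")  # second row is rejected
print(loose_rows, strict_rows)  # 2 1
```

The cost of the second design is the extra index behind the unique constraint, which ties back to the disk space and IO items above.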

Summary
As mentioned above, it's easy to see why this continues to be debated. Each type of key
has a similar number of pros and cons. If you read through them, though, you can see
how, based on your requirements, some of the cons might not even apply in your
environment. If that's the case, it becomes much easier to decide which type of key is
the best fit for your application.

Next Steps
Read more tips on SQL Server constraints
Read other tips on data warehousing
Read more information on auto-generated keys in SQL Server

About the author

Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on
performance tuning.

This author pledges the content of this article is based on professional experience and
not AI generated.

Article Last Updated: 2022-01-31

Comments For This Article

Friday, October 9, 2020 - 10:49:14 AM - Fred smith

I am shocked no one pointed out that you shouldn't be storing clear text SSN. Period.

Wednesday, April 18, 2018 - 8:21:47 PM - Joe Celko

I wish more people would read Codd's original work. His definition of a surrogate key is
that it is hidden from the view of the user, and the engine uses it to build the joins or
other constructs. Think of a hash code or something, it's only used by the engine and
never exposed. Unfortunately, the SQL Server community wants to define it as
something they actually build themselves and expose. Obviously, you have to keep the
"natural" keys for data integrity, and then carry the extra burden of the exposed
surrogates. Given modern hardware and software, it's not that much trouble to use
insanely long natural keys for joins.

Monday, April 16, 2018 - 12:19:57 PM - JRStern

Well, here are a couple more very big factors. First, that most SQL Server pros, most
of the time, do use surrogate keys, most frequently an identity int or bigint, sometimes a
GUID. And that they even use this as the clustered PK more often than not.
And second, that they do this for a good reason, and that's because the CK and PK
have special uses in SQL Server, the nonclustered keys go through them, they are
used to validate FKs, and more. SQL Server does not really separate the logical and
physical implementations that well. This causes surrogates to be much more highly
used in SQL Server than might otherwise be true. I'd say also that the optimizer often
has trouble with multi-field indexes, but that's a whole separate discussion.

Monday, April 16, 2018 - 9:28:36 AM - Adel Yousuf

Good Topic

Monday, April 16, 2018 - 3:52:24 AM - Arno Tolmeijer

Hi Ben,

Great article, but I miss one point: due to security regulations, such as GDPR,
encryption and data masking may influence the usability of a natural key. Greetings,
Arno Tolmeijer

Monday, April 16, 2018 - 2:48:13 AM - Vinod Arvind Bhilare

Hi,

It helped me a lot to improve my SQL knowledge
