SQL Server Internals:
In-Memory OLTP
Inside the SQL Server 2016 Hekaton Engine
By Kalen Delaney
Contents
Introduction 16
Memory-optimized tables 22
Performance 29
Summary 33
Additional Resources 34
Chapter 2: Creating and Accessing In-Memory OLTP Databases and Tables 35
Creating Databases 35
Creating Tables 38
Durability 39
Interpreted T-SQL 46
Summary 52
Additional Resources 53
Row header 55
Payload area 56
Processing phase 59
Validation phase 62
Post-processing 62
Summary 64
Additional Resources 64
Hash indexes 67
Row organization 68
Range indexes 73
The Bw-tree 74
Columnstore Indexes 86
Summary 97
Additional Resources 98
Post-processing 123
Summary 126
Checkpoint 135
The threads and tasks comprising the full checkpoint process 149
Recovery 155
Summary 156
Summary 174
Summary 200
It is my great pleasure to write a foreword to Kalen's book, SQL Server Internals: In-Memory
OLTP, Inside the SQL Server 2016 Hekaton Engine. Kalen has been working with SQL
Server for over 29 years, and even after three decades of working with the SQL Server
technology, she is still very excited to teach, write, and share her knowledge about SQL
Server. She is one of the most respected technology writers, and her Twitter handle, @
sqlqueen, says it best. I remember when I first started developing for the SQL Server engine,
many years ago, as an intern at Microsoft, I complemented all of our internal documentation
with her Inside SQL Server book. Who knows, maybe if not for that book, I might have not
excelled at my internship, which eventually led to an incredibly rewarding career working
inside the SQL Server engine. Thank you, Kalen!
This book, SQL Server Internals: In-Memory OLTP, is a true gem for modern data developers
and data professionals. It contains an unprecedented level of detail about the
in-memory technology of SQL Server. In my many conversations with customers, I often
hear confusion between the terms "In-Memory" and "Memory-Optimized." Many think
that they are one and the same. If you continue reading this book, you will realize the
distinction. In-Memory OLTP is a game changer for relational databases, and especially for
OLTP systems. Processors are not getting dramatically faster, but the number of cores and
the amount of memory are increasing drastically. Machines with terabytes of memory are
becoming a commodity. Over the last 30 years, memory prices have dropped by a factor of
10 every 5 years. Both core counts and memory sizes are increasing at an accelerated pace.
The majority of OLTP databases fit entirely in 1 TB and even the largest OLTP databases
can keep the active working set in memory. A technology that takes advantage of this ever-
changing hardware landscape is Microsoft's in-memory OLTP.
To understand this better, let us travel back in time a few years to when the sizes of OLTP
databases were much larger than the memory available on the server. For example, your
OLTP database could be 500 GB while your SQL Server has 128 GB of memory. We all
know the familiar strategy to address this, by storing data and indexes in smaller chunks,
or pages. SQL Server uses 8 KB pages, which are read into and written out of
memory using sophisticated heuristics implemented as part of the buffer pool in SQL Server.
When running a query, if the page containing the requested row(s) is not found in the buffer
pool, an explicit physical I/O is incurred to bring it into memory. This explicit physical I/O
can significantly reduce query performance. Today, you can get around this issue by buying a
server machine with terabytes of physical memory and keeping your entire 500 GB database
in memory, effectively removing any bottleneck due to I/O. However, the more important
question to be asked is: "Is your database optimized for being in-memory?" This book will
teach you how to do it.
Another aspect to consider is locking and latching, and its impact on performance. When you
query your data, SQL Server loads data pages in-memory and keeps them there until it needs
more memory for something else. But traditional tables are still optimized for disk access,
a slow medium, and that design carries a variety of bottlenecks. A big issue is the contention due to
the different locking mechanisms. Each time a transaction reads data, it acquires a read lock
on that data. When another transaction wants to write on the same data, it must acquire a
write-lock and therefore wait for the first transaction to complete, since you can't write while
data is being read. SQL Server also implements latches and spinlocks at different granularity
levels. All of these locking mechanisms take time to manage and, moreover, they further
hinder performance. From the ground up, in-memory OLTP is designed for high levels
of concurrency. Any thread can access any row in a table without acquiring latches or locks.
The engine uses latch-free and lock-free data structures to avoid physical interference among
threads and a new optimistic, multi-version concurrency control technique to avoid interfer-
ence among transactions using a state-of-the-art lock- and latch-free implementation.
This book describes all of the key aspects of the in-memory OLTP technology that can help
improve the performance of your transactional workloads, including:
• new data structures and data access methods built around the assumption that the
active working set resides in memory
• lock- and latch-free implementation that provides high scalability
• native compilation of T-SQL for more efficient transaction processing.
Announced several years ago, in-memory OLTP has been implemented in production by
numerous companies. Whether you need to support a high data insert rate for system telem-
etry or smart metering, or get high read performance at scale for social network browsing,
or do compute-heavy data processing for manufacturing or retail supply chains, or achieve
low latency for online gaming platforms or capital markets, or do session management for
heavily-visited websites, in-memory OLTP can help you. One of the best things about this
technology is that it is not all or nothing. The in-memory OLTP engine is integrated into SQL
Server; it is not a separate DBMS. Even if much of your processing is not OLTP, even if your
total system memory is nowhere near the terabyte range, you can still choose one or more
critical tables to migrate to the new in-memory structures. You can also choose a frequently-
run stored procedure to recreate as a natively compiled procedure. And you can get measur-
able performance improvements as a result!
Acknowledgements
First of all, I would like to thank Kevin Liu, formerly with Microsoft, who brought me
onboard with the Hekaton project at the end of 2012, with the goal of providing in-depth
white papers describing this exciting new technology. Under Kevin's guidance, I wrote two
white papers, which were published around the release of each of the CTPs for SQL Server
2014. As the paper grew longer with each release, it became clear that a white paper for the final
released product would be as long as a book. So, with Kevin's encouragement, it became the first
edition of this book. For the SQL Server 2016 version of in-memory OLTP, Marko Hotti
brought me onboard to write two white papers, including one for the final version of SQL
Server 2016. This book expands upon that second white paper.
My main mentors for this second edition, based on the changes in SQL Server 2016, were
Sunil Agarwal and Jos de Bruijn. I know they thought my questions would never stop, but
they kept answering them anyway. I am deeply indebted to both of them.
I would also like to thank my devoted reviewers and question answerers at Microsoft,
without whom this work would have taken much longer: in addition to Sunil and Jos,
Kevin Farlee was always willing and able to answer in-depth questions about storage, and
Denzil Ribeiro and Alex Budovski were always quick to respond and were very thorough
in answering my sometimes seemingly-endless questions. Thank you for all your assistance
and support. And THANK YOU to the entire SQL Server team at Microsoft for giving us this
incredible technology!
Introduction
The original design of the SQL Server engine, as well as that of most other RDBMS products
of the time, assumed that main memory was very limited, and so data needed to reside on
disk except when it was actually needed for processing. However, over the past thirty years,
the sustained fulfillment of Moore's Law, which predicts that the number of transistors on a chip,
and with it the available computing power, doubles roughly every two years, has rendered this
assumption largely invalid.
Moore's law has had a dramatic impact on the availability and affordability of both large
amounts of memory and multiple-core processing power. Today one can buy a server with
32 cores and 1 TB of memory for under $30K. Looking further ahead, it's entirely possible
that in a few years we'll be able to build distributed DRAM-based systems with capacities of
1–10 Petabytes at a cost of less than $5/GB. It is also only a question of time before non-
volatile RAM becomes viable as main-memory storage.
At the same time, the near-ubiquity of 64-bit architectures removes the previous 4 GB limit
on "addressable" memory and means that SQL Server has, in theory, near-limitless amounts
of memory at its disposal. This has helped to significantly drive down latency for read
operations, simply because we can fit so much more data in memory. For example, many, if
not most, of the OLTP databases in production can fit entirely in 1 TB. Even for the largest
financial, online retail and airline reservation systems, with databases between 500 GB and
5 TB in size, the performance-sensitive working dataset, i.e. the "hot" data pages, is signifi-
cantly smaller and could reside entirely in memory.
However, the fact remains that the traditional SQL Server engine is optimized for disk-
based storage, for reading specific 8 KB data pages into memory for processing, and writing
specific 8 KB data pages back out to disk after data modification, having first "hardened" the
changes to disk in the transaction log. Reading and writing 8 KB data pages from and to disk
can generate a lot of random I/O and incurs a higher latency cost.
In fact, given the amount of data we can fit in memory, and the high number of cores avail-
able to process it, the end result has been that most current SQL Server systems are I/O
bound. In other words, the I/O subsystem struggles to "keep up," and many organizations
sink huge sums of money into the hardware that they hope will improve write latency. Even
when the data is in the buffer cache, SQL Server is architected to assume that it is not, which
leads to inefficient CPU usage, with latching and spinlocks.
Assuming that all, or most, of the data will need to be read from disk also leads to unrealistic cost
estimates for the possible query plans, and so to the risk that the optimizer cannot determine which
plans will really perform best.
As a result of these trends, and the limitations of traditional disk-based storage structures,
the SQL Server team at Microsoft began building a database engine optimized for large main
memories and many-core CPUs, driven by the recognition that systems designed for a partic-
ular class of workload can frequently outperform more general purpose systems by a factor of
ten or more. Most specialized systems, including those for Complex Event Processing (CEP),
Data Warehousing and Business Intelligence (DW/BI), and Online Transaction Processing
(OLTP), optimize data structures and algorithms by focusing on in-memory structures.
The team set about building a specialized database engine specifically for in-memory work-
loads, which could be tuned just for those workloads. The original concept was proposed at
the end of 2008, envisioning a relational database engine that was 100 times faster than the
existing SQL Server engine. In fact, the codename for this feature, Hekaton, comes from the
Greek word hekaton (ἑκατόν) meaning 100.
Serious planning and design began in 2010, and product development began in 2011. At that
time, the team did not know whether the current SQL Server product could support this new
concept, and the original vision was that it might be a separate product. Fortunately, it soon
became evident that it would be possible to incorporate the "in-memory" processing engine
into SQL Server itself. The team then established four main goals as the foundation for
further design and planning:
1. Optimized for data that is stored completely in-memory but is also durable across
SQL Server restarts.
2. Fully integrated into the existing SQL Server engine.
3. Very high performance for OLTP operations.
4. Architected for modern CPUs (e.g. use of complex atomic instructions).
SQL Server In-Memory OLTP, formerly known and loved as Hekaton, meets all of these
goals, and in this book, you will learn how it meets them. The focus will be on the features
that allow high performance for OLTP operations. As well as eliminating read latency, since
the data will always be in memory, fundamental changes to the memory-optimized versions
of tables and indexes, as well as changes to the logging mechanism, mean that in-memory
OLTP also offers greatly reduced latency when writing to disk.
The first four chapters of the book offer a basic overview of how the technology works
(Chapter 1), how to create in-memory databases and tables (Chapter 2), the basics of row
versioning and the new multi-version concurrency control model (Chapter 3), how memory-
optimized tables and their indexes store data, and how columnstore indexes, available for
memory-optimized tables as of SQL Server 2016, allow you to perform efficient OLTP
operations, as well as run analytic queries, on your in-memory data (Chapter 4).
Chapters in the latter half of the book focus on how the new in-memory engine delivers the
required performance boost, while still ensuring transactional consistency (ACID compli-
ance). In order to deliver on performance, the SQL Server team realized they had to address
some significant performance bottlenecks. Two major bottlenecks were the traditional
locking and latching mechanisms: if the new in-memory OLTP engine retained these mecha-
nisms, with the waiting and possible blocking that they could cause, it could negate much
of the benefit inherent in the vastly increased speed of in-memory processing. Instead, SQL
Server In-Memory OLTP delivers a completely lock- and latch-free system, and true opti-
mistic multi-version concurrency control (Chapter 5).
Other potential bottlenecks were the existing CHECKPOINT and transaction logging
processes. The need to write to durable storage still exists for in-memory tables, but in SQL
Server In-Memory OLTP these processes are adapted to be much more efficient, in order to
prevent them becoming performance limiting, especially given the potential to support vastly
increased workloads (Chapter 6).
The final bottleneck derives from the fact that the SQL Server query processor is essentially
an interpreter; it re-processes statements continually, at runtime. It is not a true compiler. Of
course, this is not a major performance concern, when the cost of physically reading data
pages into memory from disk dwarfs the cost of query interpretation. However, once there
is no cost of reading pages, the difference in efficiency between interpreting queries and
running compiled queries can be enormous. Consequently, the new SQL Server In-Memory
OLTP engine component provides the ability to create natively compiled procedures, i.e.
machine code, for our most commonly executed data processing operations (Chapter 7).
Finally, we turn our attention to tools for managing SQL Server In-Memory OLTP structures,
for monitoring and tuning performance, and considerations for migrating existing OLTP
workloads over to in-memory (Chapter 8).
Intended Audience and Prerequisites
This book is for anyone using SQL Server as a programmer or as an administrator who
wants to understand how the new Hekaton engine works behind the scenes. It is specifically
a book about Hekaton internals, focusing on details of memory-optimized tables and
indexes, how the in-memory engine delivers transactional consistency (ACID compliance)
without locking or latching, and the mechanics of its checkpointing, logging and garbage
collection mechanisms.
SQL Server In-Memory OLTP is a new technology and this is not a book specifically on
performance tuning and best practices. However, as you learn about how the Hekaton engine
works internally to process your queries, certain best practices and opportunities for
performance tuning will become obvious.
This book does not assume that you're a SQL Server expert, but I do expect that you have
basic technical competency and familiarity with the standard SQL Server engine, and relative
fluency with basic SQL statements.
You should have access to a SQL Server 2016 installation, even if it is the Evaluation edition
available free from Microsoft. In addition, SQL Server Developer Edition, which doesn't
have an expiration date, is also available free of charge. Downloads are available from this
link: http://preview.tinyurl.com/lea3ep8. As of SQL Server 2016, Service Pack 1, in-memory
OLTP is available in all editions of SQL Server.
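One quick way to confirm that a particular instance supports the feature is to query the IsXTPSupported server property (a simple check, not one of the book's listings):

-- Returns 1 if this instance supports in-memory OLTP, 0 if not
-- (NULL on versions that predate the property)
SELECT SERVERPROPERTY('IsXTPSupported') AS IsXTPSupported;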
1: What's Special About
In-Memory OLTP?
SQL Server 2016's in-memory OLTP feature provides a suite of technologies for working
with memory-optimized tables, in addition to the disk-based tables which SQL Server has
always provided.
The SQL Server team designed the in-memory OLTP engine to be transparently accessible
through familiar interfaces such as T-SQL and SQL Server Management Studio (SSMS).
Therefore, during most data processing operations, users may be unaware that they are
working with memory-optimized tables rather than disk-based ones.
However, SQL Server works with the data very differently if it is stored in memory-opti-
mized tables. This chapter describes, at a high level, some of the fundamental differences
between data storage structures and data operations, when working with memory-optimized,
rather than standard disk-based tables and indexes.
It will also discuss SQL Server In-Memory OLTP in the context of similar, competing
memory-optimized database solutions, and explain why the former is different.
The in-memory OLTP engine takes no locks or latches on its memory-optimized data
structures during reading or writing, so it can allow concurrent access without
blocking. Also, logging changes to memory-optimized tables is usually much more efficient
than logging changes to disk-based tables.
Figure 1-1: The SQL Server engine including the in-memory OLTP components.
On the left side of Figure 1-1 we have the memory-optimized tables and indexes, added as
part of in-memory OLTP, and on the right we see the disk-based tables, which use the data
structures that SQL Server has always used, and which require writing and reading 8 KB data
pages, as a unit, to and from disk.
In-memory OLTP also supports natively compiled stored procedures, an object type that is
compiled to machine code by a new in-memory OLTP compiler and which has the potential
to offer a further performance boost beyond that available solely from the use of memory-
optimized tables. The standard counterpart is interpreted T-SQL stored procedures, which is
what SQL Server has always used. Natively compiled stored procedures can reference only
memory-optimized tables.
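To give a flavor of the syntax (a minimal sketch only, with illustrative table, column, and procedure names; Chapter 7 covers the real requirements in detail):

CREATE PROCEDURE dbo.usp_GetCustomerCity
    @Name varchar(32)
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT,
                   LANGUAGE = N'us_english')
    -- only memory-optimized tables (such as the hypothetical dbo.T1) may be referenced here
    SELECT City, State_Province
    FROM dbo.T1
    WHERE [Name] = @Name;
END;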
The Query Interop component allows interpreted T-SQL to reference memory-optimized
tables. If a transaction can reference both memory-optimized tables and disk-based tables, we
refer to it as a cross-container transaction.
Notice that the client application uses the same TDS Handler (Tabular Data Stream, the
underlying networking protocol that is used to communicate with SQL Server), regardless
of whether it is accessing memory-optimized tables or disk-based tables, or calling natively
compiled stored procedures or interpreted T-SQL.
Memory-optimized tables
This section takes a broad look at three of the key differences between memory-optimized
tables and their disk-based counterparts; subsequent chapters will fill in the details.
As user processes modify in-memory data, SQL Server still needs to perform some disk I/O
for any table that we wish to be durable, in other words where we wish a table to retain the
in-memory data in the event of a server crash or restart. We'll return to this a little later in this
chapter, in the Data durability and recovery section.
As this implies, many versions of the same row can coexist at any given time. This allows
concurrent access of the same row, during data modifications, with SQL Server displaying the
row version relevant to each transaction according to the time the transaction started relative
to the timestamps of the row version. This is the essence of the new multi-version concur-
rency control (MVCC) mechanism for in-memory tables, which I'll describe in a little more
detail later in the chapter.
However, there are limitations on the T-SQL language constructs that are allowed inside a
natively compiled stored procedure, compared to the rich feature set available with inter-
preted code. In addition, natively compiled stored procedures can only access memory-opti-
mized tables and cannot reference disk-based tables. Chapter 7 discusses natively compiled
stored modules in detail.
A nonclustered range index, useful for retrieving ranges of values, is more like the sort of
index we're familiar with when working with disk-based tables. However, again, the structure
is different. The memory-optimized counterparts use a special Bw-tree storage structure.
A Bw-tree is similar to a disk-based B-tree index in that it has index pages organized into
a root page, a leaf level, and possibly intermediate-level pages. However, the pages of a
Bw-tree are very different structures from their disk-based counterparts. The pages can be
of varying sizes, and the pages themselves are never modified; new pages are created when
necessary, when the underlying rows are modified.
Columnstore indexes, which were added to the product in SQL Server 2012, are available
with memory-optimized tables starting in SQL Server 2016. These indexes allow you to
perform analytics, efficiently, on the data that is stored in memory. As the name "in-memory
OLTP" implies, memory-optimized tables are optimized for transaction processing, but by
adding columnstore indexes, we can also get good performance with analytical operations,
such as reports which need to process and summarize all the rows in the table. Columnstore
indexes are described in detail in Chapter 4.
SQL Server In-Memory OLTP also continuously persists the table data to disk in special
checkpoint files. It uses these files only for database recovery, and only ever writes to them
"offline," using a background thread. Therefore, when we create a database that will use
memory-optimized data structures, we must create not only the data file (used only for disk-
based table storage) and the log file, but also a special MEMORY_OPTIMIZED_DATA file-
group that will contain the checkpoint files. There are four types of checkpoint files: DATA
files and DELTA files exist as pairs, and so are frequently referred to as checkpoint file pairs.
In SQL Server 2016, we also have ROOT files and LARGE DATA files. We'll see more on
these checkpoint files in Chapter 6.
These checkpoint files are append-only and SQL Server writes to them strictly sequentially,
in the order of the transactions in the transaction log, to minimize the I/O cost. In case of a
system crash or server shutdown, SQL Server can recreate the rows of data in the memory-
optimized tables from the checkpoint files and the transaction log.
When we insert a data row into a memory-optimized table, the background thread (called the
offline checkpoint thread) will, at some point, append the inserted row to the corresponding
DATA checkpoint file. Likewise, when we delete a row, the thread will append a reference to
the deleted row to the corresponding DELTA checkpoint file. So, a "deleted" row remains in
the DATA file but the corresponding DELTA file records the fact that it was deleted. As the
checkpoint files grow, SQL Server will at some point merge them, so that rows marked as
deleted actually get deleted from the DATA checkpoint file, and create a new file pair. Again,
further details of how all this works, as well as details about the other types of checkpoint
files, can be found in Chapter 6.
In-memory OLTP does provide the option to create a table that is non-durable, using an
option called SCHEMA_ONLY. As the option indicates, SQL Server will log the table
creation, so the table schema will be durable, but will not log any data manipulation language
(DML) on the table, so the data will not be durable. These tables do not require any I/O
operations during transaction processing, but the data is only available in memory while SQL
Server is running. These non-durable tables could be useful in certain cases, for example as
staging tables in ETL scenarios or for storing web server session state.
In the event of a SQL Server shutdown, or an AlwaysOn Availability Group fail-over,
the data in these non-durable tables is lost. When SQL Server runs recovery on the database,
it will recreate the tables but without the data. Although the data is not durable, operations
on these tables meet all the other transactional requirements; they are atomic, isolated,
and consistent.
We'll see how to create both durable and non-durable tables in Chapter 2.
Performance
The special data structures for rows and indexes, the elimination of locks and latches, and the
ability to create natively compiled stored procedures and functions, all allow for incredible
performance when working with memory-optimized tables. In the Microsoft lab, one partic-
ular customer achieved 1 million batch requests/sec, with 4 KB of data read and written with
each batch, without maxing out the CPU.
Although that workload used SCHEMA_ONLY tables, durable (SCHEMA_AND_DATA) tables
can also get impressive results. Lab results repeatedly showed a sustained ingestion of 10M
rows/second, with an average of 100 bytes per row. A representative order processing work-
load showed 260 K transactions per second, with 1 GB/sec of log generation.
Both the SCHEMA_ONLY and SCHEMA_AND_DATA tables were created on 4-socket servers
with a total of 72 physical cores.
SQL Server 2014 was consistently showing in-memory OLTP applications achieving a
30–40x improvement, measured mainly with batch requests processed per second. In SQL
Server 2016, the visionary improvement of a 100-fold increase has been achieved by one
of the earliest adopters of SQL Server In-Memory OLTP. A process running on SQL Server
2012 had measured overall throughput of 12,000 batch requests per second, with the main
bottleneck being latch contention. The same process running on SQL Server 2016, taking
advantage of memory-optimized tables with large object (LOB) support, and natively
compiled procedures, showed 1.2 million batch requests per second. The bottleneck in that
case was CPU.
The designers of the in-memory OLTP technology in SQL Server were so sure of the
performance gains that would be realized, that the component name "xtp," which stands for
"eXtreme Transaction Processing," is used in most of the Hekaton metadata object names.
Summary
This chapter took a first, broad-brush look at the new SQL Server In-Memory OLTP engine.
Memory-optimized data structures are entirely resident in memory, so user processes will
always find the data they need by traversing these structures in memory, without the need for
disk I/O. Furthermore, the new MVCC model means that SQL Server can mediate concur-
rent access of these data structures, and ensure ACID transaction properties, without the use
of any locks and latches; no user transactions against memory-optimized data structures will
ever be forced to wait to acquire a lock!
Natively compiled stored procedures, as well as table-valued user-defined functions, provide
highly efficient data access to these data structures, offering a further performance boost.
Even the logging mechanisms for memory-optimized tables, to ensure transaction durability,
are far more efficient than for standard disk-based tables.
In-memory OLTP for SQL Server 2016 removes some of the major "adoption blockers" for
some prospective users of the first release, in SQL Server 2014. Among other enhancements,
the new version supports foreign keys and other constraints and allows ALTER operations to
be performed on your memory-optimized tables and indexes.
Combined, all these features make the use of SQL Server In-Memory OLTP a very attractive
proposition for many OLTP workloads. Of course, as ever, it is no silver bullet. While it can
and will offer substantial performance improvements to many applications, its use requires
careful planning, and almost certainly some redesign of existing tables and procedures, as
we'll discuss as we progress deeper into this book.
Additional Resources
As well as bookmarking the online documentation for in-memory OLTP (see below),
you should keep your eyes on whatever Microsoft has to say on the topic, on their
SQL Server website (http://www.microsoft.com/sqlserver), or on their documentation
site: http://preview.tinyurl.com/ydclrypc.
• SQL Server 2016 online documentation – high-level information about
SQL Server's in-memory OLTP technology:
http://msdn.microsoft.com/en-us/library/dn133186(v=sql.130).aspx.
• Wikipedia – general background about in-memory databases, with
links to other vendors:
http://en.wikipedia.org/wiki/In-memory_database.
• Hekaton: SQL Server's Memory-Optimized OLTP Engine – a publication
submitted to the ACM by the team at Microsoft Research that was responsible
for the Hekaton project:
http://research.microsoft.com/apps/pubs/default.aspx?id=193594.
• The Path to In-Memory Database Technology – an excellent blog post about
the history of relational databases and the path that led to in-memory OLTP:
http://preview.tinyurl.com/n9yv6rp.
• To see the details of the 100x performance improvement achieved by bwin,
take a look at this post:
http://preview.tinyurl.com/y9snrqew.
Chapter 2: Creating and Accessing
In-Memory OLTP Databases and Tables
Prior to SQL Server 2016 SP1, and in all releases of SQL Server 2014, the in-memory
OLTP components are installed automatically in 64-bit installations of the Enterprise or
Developer editions. For SQL Server 2016 SP1 and later, in-memory OLTP is an automatic
component of the SQL Server setup process for any installation of SQL Server 2016 that
includes the database engine components. In-memory OLTP is not available at all with any
32-bit installations of SQL Server.
Therefore, with no further setup, we can begin creating databases and data structures that
will store memory-optimized data.
Creating Databases
Any database that will contain memory-optimized tables needs to have a single MEMORY_
OPTIMIZED_DATA filegroup with at least one container, which stores the checkpoint files
needed by SQL Server to recover the memory-optimized tables. These are the checkpoint
DATA and DELTA files, plus the ROOT and LARGE DATA files that were introduced briefly in
Chapter 1. SQL Server populates these files during CHECKPOINT operations, and reads them
during the recovery process, which we'll discuss in Chapter 6.
The syntax for creating a MEMORY_OPTIMIZED_DATA filegroup is almost the same as that
for creating a regular FILESTREAM filegroup, but it must specify the option CONTAINS
MEMORY_OPTIMIZED_DATA. Listing 2-1 provides an example of a CREATE DATABASE
statement for a database that can support memory-optimized tables (edit the path names to
match your system; if you create the containers on the same drive you'll need to differentiate
the two file names).
USE master
GO
DROP DATABASE IF EXISTS HKDB;
GO
CREATE DATABASE HKDB
ON
PRIMARY (NAME = [HKDB_data],
         FILENAME = 'C:\HKData\HKDB_data.mdf'),  -- paths are illustrative; edit them for your system
FILEGROUP [HKDB_mod_fg] CONTAINS MEMORY_OPTIMIZED_DATA
        (NAME = [HKDB_mod_dir1], FILENAME = 'D:\HKData\HKDB_mod_dir'),
        (NAME = [HKDB_mod_dir2], FILENAME = 'E:\HKData\HKDB_mod_dir')
LOG ON (NAME = [HKDB_log], FILENAME = 'F:\HKLog\HKDB_log.ldf');
GO
Listing 2-1: Creating a database that can support memory-optimized tables.
In Listing 2-1, we create a regular data file (HKDB_data.mdf), used for disk-based
table storage only, and a regular log file (HKDB_log.ldf). In addition, we create a
memory-optimized filegroup, HKDB_mod_fg with, in this case, two file containers
each called HKDB_mod_dir.
Note that although the syntax indicates that we are specifying files, using the FILENAME
argument, there is no file suffix specified because we are actually specifying a path for a
folder that will contain the checkpoint files. We will discuss the checkpoint files in detail in
a later section. Many of my examples will use IMDB or HKDB (or some variation thereof) as
the database name, to indicate it is an ''in-memory" or "Hekaton" "database." Also, many of
my filegroups and container names for storing the memory-optimized data will contain the
acronym MOD, for memory-optimized data.
These containers host the checkpoint files to which the CHECKPOINT process will write
data, for use during database recovery. The DATA checkpoint file stores inserted rows and
the DELTA files reference deleted rows, and these files always come in pairs. The paired
DATA and DELTA files may be in the same or different containers, depending on the number
of containers specified. With multiple containers, the checkpoint files will be spread across
them. SQL Server 2016 uses a round-robin allocation algorithm for each type of file (DATA,
DELTA, LARGE DATA, and ROOT). Thus, each container contains all types of files.
Multiple containers can be used to parallelize data load. Basically, if creating a second
container reduces data load time (most likely because it is on a separate drive), then use it.
If the second container does not speed up data transfer (because it is just another directory
on the same drive), then don't do it. The basic recommendation is to create one container per
spindle (or I/O bus).
Notice that we place the primary data file, each of the checkpoint file containers, and the
transaction log, on separate drives. Even though the data in a memory-optimized table is
never read from or written to disk "inline" during query processing, it can still be useful to
consider placement of your checkpoint files and log file for optimum I/O performance during
logging, checkpoint, and recovery. To help ensure optimum recovery speed, you will want to
put each of the containers in the MEMORY_OPTIMIZED filegroup on a separate drive, with
fast sequential I/O.
To reduce any log waits, and improve overall transaction throughput, it's best to place the log
file on a drive with fast random I/O, such as an SSD drive. As the use of memory-optimized
tables allows for a much greater throughput, we'll start to see a lot of activity needing to be
written to the transaction log although, as we'll see in Chapter 6, the overall efficiency of the
logging process is much higher for in-memory tables than for disk-based tables.
If, instead of creating a new database, we want to allow an existing database to store
memory-optimized objects and data, we simply add a MEMORY_OPTIMIZED_DATA
filegroup to an existing database, and then add a container to that filegroup, as shown in
Listing 2-2.
ALTER DATABASE AdventureWorks2014
ADD FILEGROUP AW2014_mod_fg CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE AdventureWorks2014
ADD FILE (NAME='AW2014_mod_dir',
FILENAME='c:\HKData\MyAW2014_mod_dir')
TO FILEGROUP AW2014_mod_fg;
GO
Listing 2-2: Adding a filegroup and file for storing memory-optimized table data.
of a transaction, your application may be able to determine that the transaction was not
committed, so the data does not exist, but if the failure occurs outside of a transaction, Azure
SQL Database will not notify you of the failure. It will automatically recover, but any non-
durable memory-optimized tables will then be completely empty.
Creating Tables
The syntax for creating memory-optimized tables is almost identical to the syntax for
creating disk-based tables, but with a few required extensions, and a few restrictions on the
data types, indexes and other options, that memory-optimized tables can support.
To specify that a table is a memory-optimized table, we use the MEMORY_OPTIMIZED =
ON clause. Apart from that, and assuming we're using only the supported data types and other
objects, the only other requirement is that CREATE TABLE statement includes at least one
index, which could be the index automatically created to support the PRIMARY KEY. Listing
2-3 shows a basic example.
USE HKDB;
GO
CREATE TABLE T1
(
[Name] varchar(32) not null PRIMARY KEY NONCLUSTERED HASH
        WITH (BUCKET_COUNT = 100000),
[City] varchar(32) null,
[State_Province] varchar(32) null,
[LastModified] datetime not null
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
Listing 2-3: Creating a memory-optimized table with the index definition inline.
Durability
We can define a memory-optimized table with one of two DURABILITY values:
SCHEMA_AND_DATA or SCHEMA_ONLY, with the former being the default. If we define
a memory-optimized table with DURABILITY=SCHEMA_ONLY, then SQL Server will not
log changes to the table's data, nor will it persist the data in the table to the checkpoint files,
on disk. However, it will still persist the schema (i.e. the table structure) as part of the data-
base metadata, so the empty table will be available after the database is recovered during a
SQL Server restart.
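For example, a non-durable staging or session-state table might be declared as follows (a sketch; the table and column names are illustrative):

CREATE TABLE dbo.SessionState
(
    SessionID   int             NOT NULL PRIMARY KEY NONCLUSTERED,
    SessionData varbinary(8000) NULL,
    LastTouch   datetime2       NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

The schema survives a restart, but the rows do not.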
CREATE TABLE T1
(
[Name] varchar(32) not null PRIMARY KEY NONCLUSTERED,
[City] varchar(32) not null INDEX T1_hdx_c2 HASH
WITH (BUCKET_COUNT = 10000),
[State_Province] varchar(32) null,
[LastModified] datetime not null
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
Listing 2-4: Creating an in-memory table with two indexes defined inline.
For non-PRIMARY KEY columns, the NONCLUSTERED keyword is optional, but we must
specify it explicitly when defining the PRIMARY KEY because otherwise SQL Server will
try to create a clustered index, the default for a PRIMARY KEY, and will generate an error
because clustered indexes are not allowed on memory-optimized tables.
For composite indexes, we create them after the column definitions. Listing 2-5 creates a new
table, T2, with the same hash index for the primary key on the Name column, plus a range
index on the City and State_Province columns.
CREATE TABLE T2
(
[Name] varchar(32) not null PRIMARY KEY NONCLUSTERED HASH
WITH (BUCKET_COUNT = 100000),
[City] varchar(32) not null,
[State_Province] varchar(32) not null,
[LastModified] datetime not null,
INDEX T2_ndx_City_State NONCLUSTERED ([City], [State_Province])  -- composite range index; index name is illustrative
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
Listing 2-5: Creating a table with a composite range index on the City and State_Province columns.
Note that there are no specific index DDL commands (i.e. CREATE INDEX, ALTER INDEX,
DROP INDEX). We always define indexes as part of the table creation, or, as of SQL Server
2016, as part of an ALTER TABLE operation.
Most of the prior restrictions on table properties have been removed in SQL Server 2016.
For example, FOREIGN KEY, UNIQUE and CHECK constraints can be created. We can
create AFTER triggers, if they are natively compiled. One of the few remaining limitations is that
IDENTITY columns can still only be defined with SEED and INCREMENT values of 1.
• Time
• In a disk-based table, storage size can be 3, 4 or 5 bytes, depending on the
precision of the fractional seconds.
• In a memory-optimized table, the time data type is always stored in 8 bytes.
• Date
• In a disk-based table, storage is always 3 bytes.
• In a memory-optimized table, storage is always 4 bytes.
• Datetime2
• In a disk-based table, storage size can be 6, 7 or 8 bytes, depending on the
precision of the fractional seconds (this is 3 bytes for the date, plus the bytes
needed for time).
• In a memory-optimized table, the datetime2 data type is always
stored in 8 bytes.
Listing 2-6: Creating a memory-optimized table type and declaring a variable using it.
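A minimal sketch of what such a definition looks like (the type and column names here are illustrative; like a memory-optimized table, the type must include at least one index):

CREATE TYPE dbo.OrderDetailsType AS TABLE
(
    OrderID   int      NOT NULL INDEX IX_OrderID NONCLUSTERED,
    ProductID int      NOT NULL,
    Quantity  smallint NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);
GO
DECLARE @od dbo.OrderDetailsType;
INSERT @od (OrderID, ProductID, Quantity) VALUES (1, 42, 5);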
Table variables, whether in-memory or disk-based, generally do not have their statistics
updated due to changes to data values, so recompiles of queries using table variables must be
forced, when desired. However, with trace flag 2453, which was added in SQL Server 2014,
SQL Server will keep rowcount information about table variables, whether in-memory
or disk-based. These rowcount values can then be used to trigger recompilation when the
number of rows modified exceeds the recompile threshold. You can get more details of this
behavior in the KnowledgeBase article at: https://support.microsoft.com/en-us/kb/2952444.
The article refers to SQL Server 2012, but that is irrelevant for our purposes, because we are
only concerned with memory-optimized tables, which weren't available in that version.
Memory-optimized table variables offer the following advantages, when compared to disk-
based table variables:
• The variables are only stored in memory. Data access is more efficient because
memory-optimized table types use the same data structures used for memory-opti-
mized tables. The efficiency is increased further when the memory-optimized table
variable is used in a natively compiled module.
• Table variables are not stored in tempdb and do not use any resources
in tempdb.
If multiple changes need to be made to a single table, changes of the same type can be
combined into a single ALTER TABLE command. For example, you can ADD multiple
columns, indexes, and constraints in a single ALTER TABLE and you can DROP multiple
columns, indexes, and constraints in a single ALTER TABLE. It is recommended that you
combine changes wherever possible, so that the table will need to be rebuilt as few times
as possible.
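For instance, a single ALTER TABLE (and therefore a single rebuild) can add a column, an index, and a constraint together, as in this sketch against the OrderDetails table defined later in Listing 2-8 (the new column, index, and constraint names are illustrative):

ALTER TABLE dbo.OrderDetails
    ADD Discount2 real NULL,
        INDEX IX_Quantity NONCLUSTERED (Quantity),
        CONSTRAINT CK_Quantity CHECK (Quantity > 0);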
Most ALTER TABLE operations can run in parallel and the operation is log-optimized, which
means that only metadata changes are written to the transaction log. However, there are some
ALTER TABLE operations that can only run single-threaded and will write every row to the
log as the ALTER TABLE is being performed. The following operations are the ones that run
single-threaded and will log every row:
• ADD/ALTER a column to use a LOB type: nvarchar(MAX), varchar(MAX),
or varbinary(MAX).
• ADD/DROP a COLUMNSTORE index.
• ADD/ALTER an off-row column and ADD/ALTER/DROP operations that cause an
in-row column to be moved off-row, or an off-row column to be moved in-row.
Note that ALTER operations that increase the length of a column that is already
stored off-row are log-optimized.
Listing 2-7 provides some code examples illustrating the SQL Server 2016 ALTER TABLE
operations on a memory-optimized table.
USE HKDB
GO
DROP TABLE IF EXISTS dbo.OrderDetails;
GO
GO
-- index operations
-- add index
ALTER TABLE dbo.OrderDetails
ADD INDEX IX_UnitPrice NONCLUSTERED (UnitPrice);
GO
-- drop index
ALTER TABLE dbo.OrderDetails
DROP INDEX IX_UnitPrice;
GO
-- Drop a column
ALTER TABLE dbo.OrderDetails
DROP COLUMN Discount;
GO
Listing 2-7: Examples of ALTER TABLE operations on a memory-optimized table.
Interpreted T-SQL
When accessing memory-optimized tables using interpreted T-SQL, via Query Interop, we
have access to virtually the full T-SQL surface area (i.e. the full list of statements and expres-
sions). However, we should not expect the same performance that we could achieve if we
accessed memory-optimized tables using natively compiled stored procedures (Chapter 7
shows a performance comparison).
Use of interop is the appropriate choice when running ad hoc queries, or while
migrating your applications to in-memory OLTP, as a step in the migration process, before
migrating the most performance-critical procedures. Interpreted T-SQL should also be used
when you need to access both memory-optimized tables and disk-based tables.
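For example, an interop query can freely join memory-optimized and disk-based tables in a single statement (a sketch; dbo.Orders is assumed here to be an ordinary disk-based table, and dbo.OrderDetails a memory-optimized one):

SELECT o.OrderID, od.ProductID, od.UnitPrice, od.Quantity
FROM dbo.Orders AS o                   -- disk-based table (hypothetical)
JOIN dbo.OrderDetails AS od            -- memory-optimized table
    ON od.OrderID = o.OrderID;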
The only T-SQL features or capabilities not supported when accessing memory-optimized
tables using interop are the following:
• TRUNCATE TABLE.
• MERGE (when a memory-optimized table is the target).
• Dynamic and keyset cursors (these are automatically degraded to static cursors).
• Cross-database queries.
• Cross-database transactions.
• Linked servers.
• Locking hints: TABLOCK, XLOCK, PAGLOCK, etc. (NOLOCK is supported,
but is quietly ignored.)
• Isolation level hints: READUNCOMMITTED, READCOMMITTED and
READCOMMITTEDLOCK.
• Other table hints: IGNORE_CONSTRAINTS, IGNORE_TRIGGERS, NOWAIT,
READPAST, SPATIAL_WINDOW_MAX_CELLS.
Prior to SQL Server 2016, you won't see parallel plans for operations on memory-optimized
tables. The XML plan for the query will indicate that the reason for no parallelism is because
one of the tables is a memory-optimized table. Queries run in interpreted Transact-SQL can
be run in parallel in SQL Server 2016. However, this only applies to SELECT statements
and parallelism will not be used for any data modification operations on memory-optimized
tables.
In SQL Server 2016, although parallelism can be used for both index scans and table scans, it
is not supported for any operations inside a natively compiled module.
SQL Server allocates it from a page dedicated to sizes equal to or greater than the required
number of bytes. So, for example, a request for 100 bytes would be allocated from a 128-byte
allocation, on a page dedicated to allocations of that size.
Each memory-optimized table has space allocated from its own varheap. We can examine
the metadata for each varheap, and the other memory consumers, in a Dynamic Management
View (DMV) called sys.dm_db_xtp_memory_consumers. Each memory-optimized
table has a row in this view for each varheap, and for each hash index (allocated from a
hash memory consumer). We will see one more type of memory consumer for columnstore
indexes, called HkCS Allocator, in Chapter 4.
To see this in action, Listing 2-8 creates an OrderDetails table with both hash and
range indexes.
USE HKDB;
GO
DROP TABLE IF EXISTS dbo.OrderDetails;
GO
CREATE TABLE dbo.OrderDetails
(
OrderID INT NOT NULL,
ProductID INT NOT NULL,
UnitPrice MONEY NOT NULL,
Quantity SMALLINT NOT NULL,
Discount REAL NOT NULL,
Description VARCHAR(2000),
INDEX IX_OrderID NONCLUSTERED ( OrderID ),
INDEX IX_ProductID NONCLUSTERED ( ProductID ),
CONSTRAINT PK_Order_Details PRIMARY KEY NONCLUSTERED HASH
( OrderID, ProductID )
WITH ( BUCKET_COUNT = 1048576 )
)
WITH (MEMORY_OPTIMIZED = ON,
      DURABILITY = SCHEMA_AND_DATA);
GO
Listing 2-8: Create a table with two kinds of indexes to use in examining
memory consumers.
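The memory consumers for this table can be examined by joining sys.dm_db_xtp_memory_consumers to the sys.memory_optimized_tables_internal_attributes catalog view, which exposes the xtp_object_id, minor_id, and type_desc values discussed below (a sketch along these lines; not necessarily the book's exact listing):

SELECT OBJECT_NAME(a.object_id)      AS table_name,
       a.xtp_object_id,
       a.minor_id,
       a.type_desc,
       c.memory_consumer_id          AS consumer_id,
       c.memory_consumer_type_desc   AS consumer_type_desc,
       c.allocated_bytes,
       c.used_bytes
FROM sys.memory_optimized_tables_internal_attributes AS a
JOIN sys.dm_db_xtp_memory_consumers AS c
    ON  a.object_id = c.object_id
    AND a.xtp_object_id = c.xtp_object_id
WHERE a.object_id = OBJECT_ID('dbo.OrderDetails');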
The results are shown in Figure 2-1, although your values for xtp_object_id and
consumer_id (which is just an internal ID for the memory consumer) will most likely
be different than mine.
Each table has a unique object_id and xtp_object_id value, which are used by all
the indexes. The minor_id indicates the column number in the table, and is used for large
object columns that have their own storage. In this example there are no memory consumers
for individual columns, so all the minor_id values are 0.
The four rows returned for the table (with type_desc = 'USER_TABLE') comprise three rows for
the indexes (two range indexes and one hash index) plus one row for the table itself. The range
indexes and the table are stored in VARHEAP structures and the hash index uses a HASH structure.
Note in the fourth row that there is no space preallocated for the table heap; it will not have any
size until we start inserting rows. The preallocated space for indexes will be discussed in Chapter 4.
SQL Server In-Memory OLTP also supports large object, or LOB, columns, specified with
the MAX qualifier in the datatype definition, as well as large, variable length columns, like the
row-overflow columns for disk-based tables. Each LOB column is stored in its own varheap.
For large, variable length columns, on memory-optimized tables, SQL Server decides,
when the table is created, which of your variable length columns will be stored in the table's
varheap and which will have their own varheap. This is different than for disk-based tables,
where the determination of whether a variable length column is stored in the actual row or on
a row_overflow page is made when the data values are inserted or updated, based on the
total row size. You can think of LOB and row-overflow columns as being stored in their own
internal tables.
Let's alter our OrderDetails table to add a large, variable length column, and a LOB
column. The column in the table called description_medium will be row_overflow
and description_full will be LOB.
ALTER TABLE dbo.OrderDetails
ADD
Description_medium VARCHAR(8000),
Description_full VARCHAR(MAX);
GO
Listing 2-10: Altering the OrderDetails table to add large and LOB columns.
Each additional varheap, for the row_overflow and LOB columns will have its own xtp_
object_id. Notice that having altered the OrderDetails table, its xtp_object_
id changed, but its object_id did not. We can also see two new xtp_object_id
values, one for the LOB column (minor_id = 8) and one for the row_overflow column
(minor_id = 7). The row_overflow column has two varheaps, and the LOB column
has three.
For the row_overflow column, the actual off-row column data is stored in a Table heap
structure, a specific usage of a varheap, for the column. Only indexes have space preallo-
cated, which explains why it currently takes up 0 bytes.
The range index heap on the off-row data allows SQL Server to find the specific column
values it needs. For the LOB column, the LOB data is stored in the LOB Page Allocator
and the Table heap for the LOB column just contains pointers. Like for row_overflow,
the range index heap allows fast access to various parts of the LOB data.
Summary
This chapter covered the basics of creating databases, tables, and indexes to store memory-
optimized data. In creating the database, we must define a special memory-optimized file-
group, which is built on the FILESTREAM technology. When creating a memory-optimized
table, we just have to specify a special MEMORY_OPTIMIZED = ON clause, and create at
least one index on the table. It sounds simple, and it is, but we must remember that there are
currently many restrictions on the data types, indexes, constraints, and other options, that
memory-optimized tables can support.
Memory-optimized tables and indexes use special memory allocators to keep track of the
information in memory. There is metadata available to show us the different allocators
needed for the tables, for their indexes, and for the large object columns (if any). Fortunately,
understanding the exact mechanics of these structures is not crucial for making optimum use
of the in-memory OLTP technology.
We can access memory-optimized data structures with T-SQL, either in interop mode or via
natively compiled stored procedures. In the former case, we can use more or less the full
T-SQL surface area, but in the latter case, there is a longer list of restrictions (but a shorter
list than for SQL Server 2014).
Additional Resources
• Details of supported query constructs in natively compiled procedures:
http://tinyurl.com/y7po8m7z.
• Details of T-SQL Constructs Not Supported by In-Memory OLTP:
http://preview.tinyurl.com/y8tugw8k.
• White paper discussing SQL Server Filestream storage, explaining how files
in the filegroups containing memory-optimized data are organized and managed
internally: http://preview.tinyurl.com/o3zrnp2.
Chapter 3: Row Structure and
Multi-Versioning
In the previous two chapters, we discussed how the storage structures for in-memory
tables and indexes are very different from their disk-based counterparts. In-memory data
structures are optimized for byte-addressable memory instead of block-addressable disk.
SQL Server does not store the data rows on pages like it does for disk-based tables, nor
does it pre-allocate space for the storage from extents. Instead, it stores the data rows to
memory, written sequentially in the order the transactions occurred, linked by pointers in
an "index chain."
Each in-memory table must have at least one index, as this index provides structure for each
table. An in-memory table can have up to eight nonclustered indexes, comprising a mixture
of both hash and range indexes, plus an optional clustered columnstore index (all covered in
Chapter 4).
The structure of a data row within a memory-optimized data structure reflects the fact that
the in-memory OLTP engine supports a truly optimistic concurrency model, called a multi-
version concurrency control (MVCC) model, which is based on in-memory row versioning.
For memory-optimized tables, SQL Server never modifies any existing row. Instead, any
UPDATE operation is a two-step process that marks the current version of the row as invalid
and then creates a new version of the row. If a row is subject to multiple updates, then many
versions of the same row will coexist simultaneously. SQL Server displays the correct
version to each transaction that wishes to access a row by examining timestamps stored in
the row header, and comparing them to the time the accessing transaction started.
In this chapter, we're going to explore the row structure that enables this row versioning, and
then take a high-level view of how the new MVCC model works.
Row Structure
The data rows that comprise in-memory tables have a structure very different than the row
structures used for disk-based tables. Each row consists of a row header, and a payload
containing the row attributes (the actual data). Figure 3-1 shows this structure, as well as
expanding on the content of the header area.
Row header
The row header for every data row consists of the following fields:
• Begin-Ts – the "insert" timestamp. It reflects the time that the transaction that
inserted a row issued its COMMIT.
• End-Ts – the "delete" timestamp. It reflects the time that the transaction that
deleted a row issued its COMMIT.
• StmtId – every statement within a transaction has a unique StmtId value,
which identifies the statement that created the row. If the same row is then
accessed again by the same statement, it can be ignored. This can provide
Halloween protection within transactions on memory-optimized tables.
• IdxLinkCount – a reference count that indicates the number of indexes that
reference this row.
• Padding – extra bytes added (and not used) so the row will be a multiple of
8 bytes in length.
• Index Pointers – these are C language pointers to the next row in the index
chain. There is a pointer for each nonclustered index on the table. It is the index
pointers, plus the index data structures, that connect the rows of a table.
Payload area
The payload is the row data itself, containing the index key columns plus all the other
columns in the row, meaning that all indexes on a memory-optimized table can be thought of
as covering indexes. The payload format can vary depending on the table, and based on the
table's schema. As described in Chapter 1, the in-memory OLTP compiler generates the DLLs
for table operations. These DLLs contain code describing the payload format, and so can also
generate the appropriate commands for all row operations.
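If you would like to confirm that these generated modules are present on your own instance, you can list the modules the engine has loaded. This is only a sketch, and it assumes the generated modules carry an XTP-related description, which is how they are typically labeled:

SELECT name, description
FROM sys.dm_os_loaded_modules
WHERE description LIKE '%XTP%';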
As an example, suppose the "white" version of a row has a validity interval of 5 to 10 and the
"brown" version has a validity interval of 10 to infinity. An active transaction with a logical read
time greater than or equal to 5, and less than 10, such as 5 or 9, should see the "white" row,
whereas one that started at time 10 or higher should see the "brown" row version.
After a transaction issues a COMMIT, SQL Server performs some validation checks (more
on this shortly). Having determined the transaction is valid, it hardens it to disk and writes
the commit timestamp into the row header of all affected rows. If the transaction was
an INSERT, it writes the commit timestamp to Begin-Ts and, if it was a DELETE, to
End-Ts. An UPDATE is simply an atomic operation consisting of a DELETE followed
by an INSERT.
Suppose a transaction inserted the rows <Greg, Beijing> and <Susan, Bogota> at
timestamp 20. Notice that SQL Server uses a special value, referred to as "infinity," for the
End-Ts value for rows that are active (not yet marked as invalid).
Processing phase
During the processing phase, SQL Server processes the transaction, creating new row
versions (and linking them into index structures – covered in Chapter 4), and marking rows
for deletion as necessary, as shown in Figure 3-3.
Figure 3-3: Row versions during an in-flight data modification transaction, Tx1.
During processing, SQL Server uses the Transaction-ID for the Begin-Ts value of
any row it needs to insert, and for the End-Ts value for any row that it needs to mark for
deletion. SQL Server uses an extra bit flag to indicate to other transactions that these are
transaction IDs, not timestamps.
So, to delete the <Susan, Bogota> row (remember, the row isn't removed during processing;
it's more a case of marking it as deleted), transaction Tx1 first locates the row, via one of the
indexes, and then sets the End-Ts value to the Transaction-ID Tx1.
The update of <Greg, Beijing> occurs in an atomic step consisting of two separate opera-
tions that will delete the original row, and insert a completely new row. Tx1 constructs the
new row <Greg, Lisbon>, storing the transaction-ID, Tx1, in Begin-Ts, and then setting
End-Ts to ∞ (infinity). As part of the same atomic action, Tx1 deletes the <Greg, Beijing>
row, just as described previously. Finally, it inserts the new <Jane, Helsinki> row.
At this stage our transaction, Tx1, issues a COMMIT. SQL Server generates the commit
timestamp, at 120, say, and stores this value in the global transaction table. This timestamp
identifies the point in the serialization order of the database where this transaction's updates
have logically all occurred. It does not yet write this timestamp to the row header because
SQL Server has yet to validate the transaction (more on this shortly), and so has not hardened
the transaction to the log, on disk. As such, the transaction is not yet "guaranteed;" it could
still abort and roll back, and SQL Server will not acknowledge the commit to the user until
validation completes. However, SQL Server will optimistically assume that the transaction
will actually commit, and makes the row available to other transactions as soon as it receives
the COMMIT.
Suppose another transaction, Tx2, started before Tx1 issued its COMMIT. When Tx2 reads the
<Greg, Lisbon> row, it will find a Transaction-ID rather than a timestamp in Begin-Ts, determine
from the global transaction table that Tx1 has not yet committed, follow the pointer in the header
back to the previous row version, and instead return the row version
<Greg, Beijing>. Likewise, Tx2 will not return the row <Jane, Helsinki>.
However, what if, instead, we assume Tx2 started at timestamp 121, after Tx1 issued the
commit, but before SQL Server completed validation of Tx1? If Tx2 started at timestamp
121, then it will be able to access data rows that have a commit timestamp of less than or
equal to 121 for Begin-Ts and greater than 121 for End-Ts.
Tx2 reads the <Susan, Bogota> row, finds Tx1 in End-Ts indicating it may be deleted,
locates Tx1 in the global transaction table and checks the internal transaction table, where
this time it will find the commit timestamp (the "prospective" Begin-Ts value) of 120 for
Tx1. The commit for Tx1 is issued but not confirmed (since it hasn't completed validation),
but SQL Server optimistically assumes that Tx1 will commit, and therefore that the <Susan,
Bogota> row is deleted, and Tx2 will not return this row. By a similar argument, it will return
the rows <Greg, Lisbon> and <Jane, Helsinki>, since their prospective Begin-Ts are 120
(<= 121) and End-Ts are infinity (> 121).
However, since SQL Server has yet to validate transaction Tx1, it registers a commit depen-
dency between Tx2 and Tx1. This means that SQL Server will not complete validation of
Tx2, nor acknowledge the commit of Tx2 to the user, until it completes validation of Tx1.
In other words, while a transaction will never be blocked waiting to acquire a lock, it may
need to wait a short period for commit dependencies to resolve, during validation. However,
generally, any blocking waits that arise from the resolution of commit dependencies will be
minimal. Of course, a "problematic" (e.g. long-running) transaction in an OLTP system is
still going to cause some blocking, although never lock-related blocking.
Validation phase
Once our transaction Tx1 issues a commit, and SQL Server generates the commit timestamp,
the transaction enters the validation phase. While SQL Server will immediately detect direct
update conflicts, such as those discussed in the previous section, it is not until the validation
phase that it will detect other potential violations of the properties specified by the transac-
tion isolation level. So, for example, let's say Tx1 was accessing the memory-optimized table
in REPEATABLE READ isolation. It reads a row value and then Tx2 updates that row value,
which it can do because SQL Server acquires no locks in the MVCC model, and issues a
commit before Tx1 commits. When Tx1 enters the validation phase, it will fail the
validation check and SQL Server will abort the transaction. If there are no violations,
SQL Server proceeds with other actions that will culminate in guaranteeing the durability
of the transaction.
The following summarizes the actions that SQL Server will take during the validation phase
(Chapter 5 discusses each of these actions in more detail).
• Validate the changes made by a transaction – for example, it will
perform checks to ensure that there are no violations of the current transaction
isolation level.
• Wait for any commit dependencies to resolve – (i.e. for the dependency count to
reduce to 0).
• Harden the transaction to disk – for durable tables only, SQL Server generates
a "write set" of changes, basically a list of DELETE/INSERT operations
with pointers to the version associated with each operation, and writes it
to the transaction log, on disk.
• Mark the transaction commit as validated in the global transaction table.
• Clear dependencies of transactions that are dependent on the validated transaction
(in our example, once Tx1 validates, Tx2 can now complete validation).
At this stage, Tx1 is validated and it moves to the post-processing stage.
Post-processing
In this stage, SQL Server writes the commit timestamp into the row header of all affected
rows (note this is the timestamp from when Tx1 first issued the commit). Therefore, our final
row versions look as shown in Figure 3-4.
As noted earlier, the storage engine has no real notion of row "versions." There is no implicit
or explicit reference that relates one version of a given row to another. There are just rows,
connected by the table's nonclustered indexes, as we'll see in the next chapter, and visible to
active transactions, or not, depending on the validity interval of the row version compared to
the logical read time of the accessing transaction.
In Figure 3-4, the rows <Greg, Beijing> and <Susan, Bogota> have a validity interval of 20
to 120 and so any user transaction with a starting timestamp greater than or equal to 20 and
less than 120, will be able to see those row versions. Any transaction with a starting time-
stamp greater than 120 will see <Greg, Lisbon> and <Jane, Helsinki>.
Eventually, there will be no active transactions for which the rows <Greg, Beijing> and
<Susan, Bogota> are still valid, and so SQL Server can delete them permanently (remove the
rows from the index chains and deallocate memory). These "stale" rows may be removed by
user threads or by a separate "garbage collection" thread (we'll cover this in Chapter 5).
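If you would like an early look at this garbage collection activity, two DMVs expose its statistics. The following is just a sketch (a SELECT * is used because the exact columns are easiest to explore interactively); the garbage collection process itself is covered in Chapter 5:

-- instance-wide garbage collection statistics
SELECT * FROM sys.dm_xtp_gc_stats;

-- per-database garbage collection cycle statistics
SELECT * FROM sys.dm_db_xtp_gc_cycle_stats;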
Summary
The SQL Server In-Memory OLTP engine supports true optimistic concurrency via an
MVCC model based on in-memory row versioning. This chapter described the row structure that
underpins the MVCC model, and then examined how SQL Server maintains multiple row
versions, and determines the correct row version that each concurrent transaction should
access. This model means that SQL Server can avoid read-write conflicts without the need
for any locking or latching, and will raise write-write conflicts immediately, rather than after
a delay (i.e. rather than blocking for the duration of a lock-holding transaction).
In the next chapter, we'll examine how SQL Server uses indexes to connect all rows that
belong to a single in-memory table, as well as to optimize row access.
Additional Resources
• Hekaton: SQL Server's Memory-Optimized OLTP Engine – a white paper
by Microsoft Research:
http://preview.tinyurl.com/lcg5m4x.
• Table and Row Size in Memory-Optimized Tables:
http://preview.tinyurl.com/ybjgqa7m.
Chapter 4: Indexes on Memory-Optimized Tables
The previous chapter discussed the data row structure, and how the in-memory OLTP engine
maintains row versions, as part of its optimistic MVCC system.
The row header for each data row contains a set of index pointers, one for each index on
the table to which the row belongs. Each pointer points to the next logical row in that table,
according to the key for that index. As such, it is these indexes that provide order to the rows
in a table.
Beyond this obligatory index, we can add up to seven more nonclustered indexes, for a
maximum of eight nonclustered indexes per table, in any combination of hash and range
indexes, to optimize access paths for that table. In addition, as of SQL Server 2016, we
can create a clustered columnstore index on a memory-optimized table to allow efficient
analytical queries to be run.
In this chapter, we're going to explore, in a lot more detail, the storage structure of indexes on
memory-optimized tables. We'll start by discussing hash indexes, how SQL Server can use
such an index to link together and organize the rows of a table, and then we'll look at the tactical use
of these indexes for query optimization.
We'll then move on to discuss, in depth, the range index and its new Bw-tree internal struc-
ture, and the internal maintenance that SQL Server performs on these structures to maintain
optimum query performance.
Finally, we'll look at columnstore indexes, which are a separate structure. I'll give a little
background about columnstore indexes in general before discussing how they are created,
stored and managed with memory-optimized tables.
Note
The Microsoft documentation (and the syntax for creating indexes on memory-optimized
tables) does not use the word "range" when referring to the indexes built with Bw-trees. It
simply calls them "nonclustered indexes." However, since hash indexes are referred to as "non-
clustered hash indexes," I find it useful to have comparable terms and avoid confusion,
by calling this type of index "nonclustered range index" or simply "range index."
The following summarizes some of what we've discussed about the "rules" governing the use
of nonclustered indexes on memory-optimized tables.
• All memory-optimized tables must have at least one nonclustered index, either
range or hash.
• A maximum of 8 nonclustered indexes per table is allowed, including the index
supporting the PRIMARY KEY.
• We must define all indexes at the time we create the memory-optimized table
– SQL Server writes the number of index pointers, and therefore the number of
indexes, into the row header on table creation.
• To change the number of indexes, we use the ALTER TABLE command, which will
completely rebuild the table internally.
• Nonclustered indexes on memory-optimized tables are entirely in-memory struc-
tures – SQL Server never logs any changes made to data rows in nonclustered
indexes, during data modification.
• During database recovery SQL Server recreates all nonclustered indexes based on
the index definitions. We'll go into detail in Chapter 6.
With a maximum limit of 8 nonclustered indexes, all of which we must define on table
creation, we must exert even more care than usual to choose the correct and most useful set
of indexes.
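As a sketch of what this looks like in practice (the table, column, and index names here are invented for illustration), all indexes are declared inline in the CREATE TABLE statement, and adding one later requires an ALTER TABLE, which rebuilds the table:

CREATE TABLE dbo.OrderHeaders
(
    OrderID int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CustomerID int NOT NULL
        INDEX ix_CustomerID NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    OrderDate datetime2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- SQL Server 2016 allows an index to be added later, but the ALTER TABLE
-- rebuilds the memory-optimized table internally.
ALTER TABLE dbo.OrderHeaders
    ADD INDEX ix_OrderDate NONCLUSTERED (OrderDate);
GO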
We discussed earlier in the book how data rows are not stored on pages, so there is no collec-
tion of pages or extents, and there are no partitions or allocation units. Similarly, although we
do refer to index pages in range indexes, they are very different structures from their disk-
based counterparts.
In disk-based indexes, the pointers locate physical, fixed-size pages on disk. As we modify
data rows, we run into the problem of index fragmentation, as gaps appear in pages during
DELETEs, and page splits occur during INSERTs and UPDATEs. Once this fragmentation
occurs, the I/O overhead associated with reads and writes grows.
Hash indexes
A hash index, which is stored as a hash table, consists of an array of pointers, where each
element of the array is called a hash bucket and stores a pointer to the location in memory
of a data row. When we create a hash index on a column (or columns), SQL Server applies a
hash function to the value in the index key column(s) in each row and the result of the func-
tion determines which bucket will contain the pointer for that row.
More on hashing
Hashing is a well-known search algorithm, which stores data based on a hash key generated
by applying a hash function to the search key (in this case, the index key). A hash table can be
thought of as an array of "buckets," one for each possible value that the hash function can gen-
erate, and each data element (in this case, each data row) is added to the appropriate bucket
based on its index key value. When searching, the system will apply the same hash function
to the value being sought, and will only have to look in a single bucket. For more information
about what hashing and hash searching are all about, take a look at the Wikipedia article at:
http://en.wikipedia.org/wiki/Hash_function.
Let's say we insert the first row into a table and the index key value hashes to the value 4.
SQL Server stores a pointer to that row in hash bucket 4 in the array. If a transaction inserts
a new row into the table, where the index key value also hashes to 4, it becomes the first row
in the chain of rows accessed from hash bucket 4 in the index, and the new row will have a
pointer to the original row.
In other words, all key values that hash to the same value (that is, produce the same result
from the hash function) are accessed from the same hash bucket, with the rows linked
together in a chain, one row pointing to the next. If there is duplication of key values,
the duplicates will always generate the same function result and thus will always be in the
same chain.
Ideally, there shouldn't be more than one key value in each hash chain. If two different key
values hash to the same value, which means they will end up in the same hash bucket, then
this is a hash collision.
Row organization
As discussed previously, SQL Server stores these index pointers in the index pointer array
in the row header. Figure 4-1 shows two rows in a hash index on a Name column. For this
example, assume there is a very simple hash function that results in a value equal to the
length of the string in the index key column. The first value of Jane will then hash to 4, and
Susan to 5, and so on. In this simplified illustration, different key values (Jane and Greg, for
example) will hash to the same bucket, so we have a hash collision. Of course, the real hash
function is much more random and unpredictable, but I am using the length example to make
it easier to illustrate.
The figure shows the pointers from the 4 and 5 entries in the hash index to the rows
containing Jane and Susan, respectively. Neither row points to any other rows, so the index
pointer in each of the row headers is NULL.
In Figure 4-1, we can see that the <Jane, Helsinki> and <Susan, Vienna> rows have a
Begin-Ts timestamp of 50 and 70 respectively, and each is the current, active version of
that row.
In Figure 4-2, a transaction, which committed at timestamp 100, has added to the same table
a row with a Name value of Greg. Using our string length hash function, Greg hashes to 4,
and so maps to the same bucket as Jane, and the row is linked into the same chain as the row
for Jane. The <Greg, Beijing> row has a pointer to the <Jane, Helsinki> row and SQL Server
updates the hash index to point to Greg. The <Jane, Helsinki> row needs no changes.
Finally, what happens if another transaction, which commits at timestamp 200, updates
<Greg, Beijing> to <Greg, Lisbon>? The new version of Greg's row is simply linked in as
any other row, and will be visible to active transactions depending on their timestamps, as
described in Chapter 3. Every row has at least one pointer to it for the index on Name, either
directly from a hash index bucket or from another row. In this manner, each index provides
an access path to every row in the table, in the form of a singly-linked list joining every
row in the table.
Of course, this is just a simple example with one index, in this case a hash index, which is the
minimum required to link the rows together. However, for query performance purposes, we
may want to add other hash indexes (as well as range indexes).
For example, if equality searches on the City column are common, and if it were quite a
selective column (small number of repeated values), then we might decide to add a hash
index to that column, too. This creates a second index pointer field. Each row in the table
now has two pointers pointing to it, and can point to two rows, one for each index. The first
pointer in each row points to the next value in the chain for the Name index; the second
pointer points to the next value in the chain for the City index.
Figure 4-4 shows the same hash index on Name, this time with three rows that hash to 4, and
two rows that hash to 5, which uses the second bucket in the Name index. The second index
on the City column uses three buckets. The bucket for 6 has three values in the chain, the
bucket for 7 has one value in the chain, and the bucket for 8 also has one value.
Now we have another access path through the rows, using the second hash index.
Another reason a hash index is less effective when there are lots of duplicates is because of
the likelihood of hash collisions. If both 'Smith' and 'Jones' end up in the same bucket,
even if we just need all the 'Smith' rows, we will encounter all the 'Jones' rows as
well, and then need to eliminate them from the possible results.
When defining a hash index, bear in mind that the hash function used is based on all the
key columns. This means that if you have a hash index on the columns: lastname,
firstname in an employees table, a row with the values <Harrison, Josh> will have a
different value returned from the hash function than <Harrison, John>. This means that a
query that just supplies a lastname value, i.e. Harrison, will not be able to use the index
at all, since Harrison may appear in many hash buckets. Therefore, in order to "seek" on
hash indexes the query needs to provide equality predicates for all of the key columns.
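For example, with a hypothetical Employees table that has a hash index on (LastName, FirstName), only the first of the following two queries can seek on that hash index; the second, which supplies only LastName, cannot:

-- Can seek: equality predicates on all of the hash index key columns
SELECT EmployeeID
FROM dbo.Employees
WHERE LastName = 'Harrison'
  AND FirstName = 'Josh';

-- Cannot seek on the hash index: only part of the key is supplied
SELECT EmployeeID
FROM dbo.Employees
WHERE LastName = 'Harrison';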
SQL Server rounds up the number we supply for the BUCKET_COUNT to the next power of
two, so it will round up a value of 100,000 to 131,072.
The number of buckets for each hash index should be determined based on the characteris-
tics of the column on which we are building the index. It is recommended that you choose a
number of buckets equal to or greater than the expected cardinality (i.e. the number of unique
values) of the index key column, so that there will be a greater likelihood that each bucket's
chain will point to rows with the same value for the index key column. In other words, we
want to try to make sure that two different values will never end up in the same bucket.
If there are fewer buckets than possible values, multiple values will have to use the same
bucket, i.e. a hash collision.
This can lead to long chains of rows and significant performance degradation of all
operations on individual rows, including SELECT and INSERT. On the other hand, be
careful not to choose a number that is too big because each bucket uses memory (8 bytes per
bucket). Having extra buckets will not improve performance but will simply waste memory.
As a secondary concern, it might also reduce the performance of index scans, which will
have to check each bucket for rows.
A DMV, sys.dm_db_xtp_hash_index_stats, provides information on the number
of buckets and hash chain lengths, which is useful for understanding and tuning the bucket
counts. We can also use the view to detect cases where the index key has many duplicates.
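A query along these lines (a sketch; adjust the join and filter as needed) returns the bucket and chain-length statistics for each hash index in the current database:

SELECT object_name(hs.object_id) AS table_name,
       i.name                    AS index_name,
       hs.total_bucket_count,
       hs.empty_bucket_count,
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON hs.object_id = i.object_id
   AND hs.index_id = i.index_id;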
If this DMV returns a large average chain length, it indicates that many rows are hashed to
the same bucket. This could happen for the following reasons:
• If the number of empty buckets is low or the average and maximum chain lengths
are similar, it is likely that the total BUCKET_COUNT is too low. This causes many
different index keys to hash to the same bucket.
• If the BUCKET_COUNT is high or the maximum chain length is high relative to the
average chain length, it is likely that there are many rows with duplicate index key
values or there is a skew in the key values. All rows with the same index key value
hash to the same bucket, hence there is a long chain length in that bucket.
Conversely, short chain lengths along with a high empty bucket count are an indication of a
BUCKET_COUNT that is too high.
Range indexes
Hash indexes are useful for relatively unique data that we can query with equality predicates.
However, if you don't know the cardinality, and so have no idea of the number of buckets
you'll need for any column, or if you know you'll be searching your data based on a range of
values, you should consider creating a range index instead of a hash index.
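For instance (a sketch using an invented table), a range index declared inline suits queries that filter on an interval of values, something a hash index cannot support:

CREATE TABLE dbo.Orders_Sketch
(
    OrderID   int       NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    OrderDate datetime2 NOT NULL
        INDEX ix_OrderDate NONCLUSTERED    -- a range index
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- A range predicate like this can be satisfied by the range index,
-- but not by a hash index on OrderDate.
SELECT OrderID
FROM dbo.Orders_Sketch
WHERE OrderDate >= '20160101'
  AND OrderDate <  '20160201';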
With a range index, every row in a memory-optimized table will be accessible by a pointer
in the leaf. Range indexes are implemented using a new data structure called a Bw-tree,
originally envisioned and described by Microsoft Research in 2011. A Bw-tree is a lock- and
latch-free variation of a B-tree.
The Bw-tree
The general structure of a Bw-tree is similar to SQL Server's regular B-trees, except that
the index pages are not a fixed size, and once they are built they cannot be changed. Like a
regular B-tree page, each index page contains a set of ordered key values, and for each value
there is a corresponding pointer. At the upper levels of the index, on what are called the
internal pages, the pointers point to an index page at the next level of the tree, and at the leaf
level, the pointers point to a data row. Just like for in-memory OLTP hash indexes, multiple
data rows can be linked together. In the case of range indexes, rows that have the same value
for the index key will be linked.
One big difference between Bw-trees and SQL Server's B-trees is that, in the former, a page
pointer is a logical page ID (PID), instead of a physical page address. The PID indicates a
position in a mapping table, which connects each PID with a physical memory address. Index
pages are never updated; instead, they are replaced with a new page and the mapping table is
updated so that the same PID indicates a new physical memory address.
Figure 4-5 shows the general structure of a Bw-tree, plus the Page Mapping Table.
Each index row in the internal index pages contains a key value, and a PID of a page at the
next level down. The index pages show the key values that the index references. Not all the
PID values are indicated in Figure 4-5, and the mapping table does not show all the PID
values that are in use.
The key value is the highest value possible on the page referenced. Note that this is different
from a regular B-tree index, for which the index row stores the minimum value on the page
at the next level down. The leaf level index pages also contain key values but, instead of a
PID, they contain an actual memory address of a data row, which could be the first in a chain
of data rows, all with the same key value (these are the same rows that might also be linked
using one or more hash indexes).
Another big difference between Bw-trees and SQL Server's B-trees is that, at the leaf level,
SQL Server keeps track of data changes using a set of delta values. As noted above, index
pages are never updated, they are just replaced with a new page. However, the leaf pages
themselves are not replaced for every change. Instead, each update to an index page, which
can be an INSERT or DELETE of a key value on that page, produces a page containing a
delta record describing the change.
An UPDATE is represented by two new delta records, one for the DELETE of the original
value, and one for the INSERT of the new value. When SQL Server adds each delta record,
it updates the mapping table with the physical address of the page containing the newly
added delta record for the INSERT or DELETE operation.
Figure 4-6 illustrates this behavior. The mapping table shows only a single page with
logical address P. The physical address in the mapping table was originally the memory
address of the corresponding leaf level index page, shown as page P. After we insert a new
row into the table, with index key value 50 (which we'll assume did not already occur in the
table's data), in-memory OLTP adds a delta record linked to page P, indicating the insert of
the new key, and the physical address of page P is updated to indicate the address of this first
delta record page.
Assume, then, that a separate transaction deletes from the table the only row with index
key value 48. In-memory OLTP must then remove the index row with key 48, so it creates
another delta record, and once again updates the physical address for page P.
When searching through a range index, SQL Server must combine the delta records with
the base page, making the search operation a bit more expensive. However, not having to
completely replace the leaf page for every change gives us a performance saving. As we'll
see in the later section, Consolidating delta records, eventually SQL Server will combine the
original page and chain of delta pages into a new base page.
Figure 4-7: Attempting to insert a new row into a full index page.
Assume we have executed an INSERT statement that inserts a row with key value of 5 into
this table, so that 5 now needs to be added to the range index. The first entry in page Pp is a 5,
which means 5 is the maximum value that could occur on the page to which Pp points, which
is Ps. Page Ps doesn't currently have a value 5, but page Ps is where the 5 belongs. However,
the page Ps is full, so it is unable to add the key value 5 to the page, and it has to split.
The split operation occurs in one atomic operation consisting of two steps, as described in the
next two sections.
In the same atomic operation as splitting the page, SQL Server updates the page mapping
table to change the pointer to point to P1 instead of Ps. After this operation, page Pp points
directly to page P1; there is no pointer to page Ps, as shown in Figure 4-9.
Figure 4-9: The pointer from the parent points to the first new child page.
In the same atomic operation as creating the new pointer, SQL Server then updates the page
mapping table to change the pointer from Pp to Ppp, as shown in Figure 4-11.
The merge operation occurs in three atomic steps, as described over the following sections.
Figure 4-13: The delta page and the merge-delta page are added to indicate a deletion.
Figure 4-14: Pointers are adjusted to get ready for the merge.
Columnstore Indexes
Columnstore indexes were first introduced in SQL Server 2012, and SQL Server 2014 added
updateable clustered columnstore indexes, as well as making many other improvements to
the technology. The SQL Server 2014 version of in-memory OLTP did not support column-
store indexes, but in SQL Server 2016, for the first time, we can create a clustered column-
store index on a memory-optimized table.
Columnstore indexes were intended for analytics, including reports and analysis, and
in-memory OLTP, as the name "OLTP" implies, was intended for operational data that was
very quickly growing and changing. So surely the two technologies are "incompatible"? In
fact, in many systems the line is blurred as to what data is operational and what is used for
analytics, and the same data may need to be available for both purposes.
Instead of trying to "bend" memory-optimized tables to be better with analytic queries, and
columnstore indexes to be better with OLTP data modification, SQL Server 2016 tries to offer
a solution that uses the strength of both technologies, and hides the seams and storage details
from the users.
SQL Server will attempt to put the full 2^20 (1,048,576) values in each rowgroup, and place however many
rows are left over in a final rowgroup. For example, if there are exactly 10 million rows in a
table, there would be 9 rowgroups of 1,048,576 values and one of 562,816 values. However,
because the index is usually built in parallel, with each thread processing its own subset of
rows, there may be multiple rowgroups with fewer than the full 1,048,576 values.
Within each rowgroup, SQL Server applies its Vertipaq compression technology which
encodes the values and then rearranges the rows within the rowgroup to give the best
compression results.
Figure 4-16 represents a table of about 3 million rows, which has been divided into three row
groups. All four columns from the table are defined as part of the columnstore index, so we
end up with 12 compressed column segments, three segments for each of the four columns.
• The "deleted bitmap" – a separate, internal table, which stores the Row IDs of all
rows that have been deleted.
Remember that on memory-optimized tables, "new rows" include both newly-inserted
rows and updated rows, since UPDATE is always performed by deleting the old row and
inserting a new one. In other words, the delta rowgroup stores both newly-inserted rows
and updated row values, and the deleted bitmap indicates deleted rows and the old versions
of updated rows.
You can think of the delta rowgroup as similar to the delta store in a disk-based table with
a clustered columnstore index, but it's not exactly the same because rows in a delta rowgroup
are part of the memory-optimized table and not technically part of the columnstore index.
As new rows are added to a memory-optimized table with a columnstore index, instead of
being immediately added to the compressed rowgroups of the columnstore index, they are
initially only available as regular rows, accessible through any of the other memory-opti-
mized table's indexes, but are maintained separately from the rest of the table, in the delta
rowgroup, as shown in Figure 4-17.
If a SQL Server 2016 memory-optimized table has a clustered columnstore index, there will
be two varheaps in addition to the varheap for the table itself. One of the varheaps will be for
the compressed rowgroups, and the new rows in the delta rowgroup will be allocated from a
separate varheap. This separate storage allows SQL Server to quickly identify the rows that
have not yet been compressed into segments of the columnstore index.
A background thread wakes up every 2 minutes, by default, and examines the rows that
have been added to the delta rowgroup. If the count of such rows exceeds 1 million, the thread
performs the following two operations:
1. The rows are copied into one or more rowgroups, from which each of the segments
will be compressed and encoded to become part of the clustered columnstore index.
2. The rows will be moved from the special delta rowgroup varheap to the varheap
that the rest of the rows from the table use.
SQL Server does not actually count the rows; it uses a best-guess estimate, but the number
of rows in a columnstore index rowgroup can never exceed 1,048,576. If there are more
than 100,000 leftover rows after the rowgroup is compressed, another rowgroup will be
created and compressed. If there are fewer than 100,000 rows left over, those rows will not be
compressed and will only be available through the delta rowgroup in the memory-optimized
table itself.
If you would like to delay the compression of newly added rows, because UPDATE opera-
tions might be performed on them, or they might even be quickly removed due to your
application logic, you can configure a waiting period. When a memory-optimized table with
a clustered columnstore index is created, you can add an option called COMPRESSION_
DELAY that specifies how many minutes a row must exist in the delta rowgroup before it is
considered for compression. Only when the delta rowgroup accumulates a million rows that
have been there for more than this number of minutes will those rows be compressed into
the regular columnstore index rowgroups. This allows the "hot" data, that is the most likely
data to be updated, to be updated more efficiently. If we assume most data manipulations
happen within a short time window, after insertion, we can delay conversion to compressed
rowgroups until after most of the manipulations have been done.
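A sketch of the syntax (the table definition here is invented for illustration), asking SQL Server to leave rows in the delta rowgroup for at least 60 minutes before considering them for compression:

CREATE TABLE dbo.SalesFacts
(
    SaleID   int       NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Amount   money     NOT NULL,
    SaleDate datetime2 NOT NULL,
    INDEX cci CLUSTERED COLUMNSTORE WITH (COMPRESSION_DELAY = 60)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);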
After rows have been converted to compressed rowgroups, all DELETE operations are
indicated by marking the row as deleted in the "deleted bitmap," the same as for a clustered
columnstore index on a disk-based table.
Lookups can be very inefficient once you have many deleted rows. There is no way to do
any kind of reorganization on a columnstore index, other than dropping and rebuilding the
index. However, once 90% of the rows in a rowgroup have been deleted, the remaining 10%
are automatically reinserted into the uncompressed varheap, in the delta rowgroup of the
memory-optimized table.
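The kind of output shown in Figure 4-18 can be obtained by querying the memory consumers DMV; the following is a sketch, listing all consumers in the current database so you can pick out the rows that belong to the table of interest:

SELECT memory_consumer_id,
       memory_consumer_type_desc,
       memory_consumer_desc,
       object_id,
       xtp_object_id,
       index_id,
       allocated_bytes,
       used_bytes
FROM sys.dm_db_xtp_memory_consumers
ORDER BY xtp_object_id, index_id;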
Figure 4-18: Memory consumers for memory-optimized table with columnstore index.
The output in Figure 4-18 shows six rows for the table itself. There is one memory consumer
for the compressed rowgroups of the columnstore index (the HKCS_COMPRESSED
consumer, with a consumer description of HkCS Allocator), one for the range index, two for
the hash indexes and two for the table heap, sometimes called a rowstore to differentiate it
from a columnstore.
One of the rowstore varheaps is for most of the table rows, and the second one is for
the delta rowgroup, which will contain the newly added rows that are not yet also part
of the compressed rowgroups. There are also four internal tables for any table with a
columnstore index; note that they all have different xtp_object_id values. Each of
these internal tables has at least one index to help SQL Server access the information in the
internal table efficiently.
The four internal tables, which you can see in Figure 4-18, are ROW_GROUPS_INFO_
TABLE (plus hash index), SEGMENTS_TABLE (plus two hash indexes), DICTIONARIES_
TABLE (plus hash index) and DELETED_ROWS_TABLE (plus range index). The details of
what these internal tables are used for is not specifically an in-memory OLTP topic, so it is
out of scope for this book.
Besides looking at the DMVs for memory consumers, another DMV that you might want
to examine is sys.dm_db_column_store_row_group_physical_stats.
This view not only tells you how many rows are in each compressed and delta rowgroup
in a columnstore index, but for rowgroups with fewer than the maximum number of rows, it
tells you why there isn't the maximum number. If you would like to see this information for
yourself, run the code in Listing 4-3, which inserts 10,000,000 rows into the table created
above, and examines several values in sys.dm_db_column_store_row_group_
physical_stats.
USE IMDB;
GO
SET NOCOUNT ON;
GO
BEGIN TRAN
DECLARE @i int = 0;
WHILE (@i < 10000000)
BEGIN
    INSERT INTO dbo.OrderDetailsBig
        VALUES (@i, @i % 1000000, @i % 57, @i % 10, 0.5);
    SET @i = @i + 1;
    IF (@i % 264 = 0)
    BEGIN
        COMMIT TRAN;
        BEGIN TRAN;
    END
END
COMMIT TRAN;
GO

SELECT row_group_id, state_desc, total_rows, trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = object_id('dbo.OrderDetailsBig')
ORDER BY row_group_id;
My results are shown in Figure 4-19. Note that the actual numbers you get may vary.
In the output in Figure 4-19 you can see three different values in the trim_reason_desc
column, which is an explanation of the reason why a COMPRESSED rowgroup has fewer than
1,048,576 rows. Obviously, for a rowgroup with the maximum number of rows, we don't
need any explanation, so the value is NO_TRIM. The OPEN rowgroup is the delta rowgroup
and is not compressed, so its value is always NULL. The value STATS_MISMATCH indicates
the estimate of the number of rows was too low, and there actually weren't the full number
when the compression was performed. The fourth value, SPILLOVER, is used for rowgroups
that contain the leftover rows after a full rowgroup is created.
In addition, the view sys.hash_indexes, which contains all the columns from
sys.indexes, but only the rows where type = 7, has one additional column:
bucket_count.
Storage space used by your memory-optimized tables and their indexes is shown in the DMV
sys.dm_db_xtp_table_memory_stats. Listing 4-5 lists each memory-optimized
table and the space allocated and used, both for the data and for the indexes.
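A query along the following lines (a sketch using the documented columns of this DMV) returns one row per memory-optimized table, showing the memory allocated and used for the data rows and for the indexes:

SELECT object_name(object_id) AS table_name,
       memory_allocated_for_table_kb,
       memory_used_by_table_kb,
       memory_allocated_for_indexes_kb,
       memory_used_by_indexes_kb
FROM sys.dm_db_xtp_table_memory_stats
WHERE object_id > 0
ORDER BY table_name;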
Listing 4-5: Examine space allocated and used for memory-optimized tables.
You might also want to inspect the following dynamic management objects to help manage
your memory-optimized tables:
• dm_db_xtp_index_stats – This view reports on the number of rows in each
index, as well as the number of times each index has been accessed and the number
of rows that are no longer valid, based on the oldest timestamp in the database.
• dm_db_xtp_hash_index_stats – This view can help you manage and
tune the bucket counts of your hash indexes, as it reports on the number of empty
buckets, and the average and maximum chain length.
• dm_db_xtp_nonclustered_index_stats – This view can help you
manage your range indexes. It includes information about the operation on the
Bw-tree including page splits and merges.
Other DMVs will be discussed as they become relevant to particular topics.
Summary
Memory-optimized tables comprise individual rows connected by indexes. This chapter
described the two special index structures available for memory-optimized tables: hash
indexes and range indexes.
Hash indexes have a fixed number of buckets, each of which holds a pointer to a chain of
rows. Ideally, all the rows in a single bucket's chain will have the same key value, and the
correct choice for the number of buckets, which is declared when the table is created, can
help ensure this.
Range indexes are stored as Bw-trees, which are like SQL Server's traditional B-trees in
some respects, but very different in others. The internal pages in Bw-trees contain key values
and pointers to pages at the next level. The leaf level of the index contains pointers to
chains of rows with matching key values. Just like for our data rows, index pages are never
updated in place. If an index page needs to add or remove a key value, a new page is created
to replace the original.
This chapter also discussed the SQL Server 2016 feature that allows us to build clustered
columnstore indexes on memory-optimized tables. We looked at a brief overview of column-
store indexes in general, but focused on the specifics of how memory-optimized data can be
stored and managed in a clustered columnstore index.
When choosing the correct set of indexes for a table at table creation time, evaluate each
indexed column to determine the best type of index. If the column stores lots of duplicate
values, or queries need to search the column by a range of values, then a range index is the
best choice. Otherwise, choose a hash index.
In the next chapter, we'll look at how concurrent operations are processed and how transac-
tions are managed and logged.
Additional Resources
• Guidelines for Using Indexes on Memory-Optimized Tables:
http://msdn.microsoft.com/en-gb/library/dn133166.aspx.
• The Bw-Tree: A B-tree for New Hardware Platforms:
http://preview.tinyurl.com/pvd5fdk.
• Comprehensive series of blog posts on columnstore indexes by Niko Neuge-
bauer: http://www.nikoport.com/columnstore/.
• Stairway to SQL Server Columnstore Indexes, by Hugo Kornelis:
http://www.sqlservercentral.com/stairway/121631/.
Chapter 5: Transaction Processing
Regardless of whether we access disk-based tables or memory-optimized tables, SQL Server
must manage concurrent transactions against these tables in a manner that preserves the
ACID properties of every transaction. Every transaction runs in a particular transaction isola-
tion level, which determines the degree to which it is isolated from the effects of changes
made by the concurrent transactions of other users.
In this chapter, we'll discuss transaction management for memory-optimized tables, and the
isolation levels supported for operations on memory-optimized tables. We'll also explore the
possible validation errors that can occur, and describe how SQL Server In-Memory OLTP
deals with them.
Transaction Scope
SQL Server supports several different types of transaction, in terms of how we define the
beginning and end of the transaction, and when accessing memory-optimized tables the
transaction type can affect the isolation levels that SQL Server supports. The two default
types of transactions are:
• Explicit transactions – use the BEGIN TRANSACTION statement to indicate the
beginning of the transaction, and either a COMMIT TRANSACTION or a ROLL-
BACK TRANSACTION statement to indicate the end. In between, the transaction
can include any number of statements.
• Autocommit transactions – any single data modification operation. In other words,
any INSERT, UPDATE, or DELETE statement (as well as others, such as MERGE
and BULK INSERT), by itself, is automatically a transaction. If we modify one
row, or one million rows, in a single UPDATE statement, SQL Server will consider
the UPDATE operation to be an atomic operation, and will modify either all the
rows or none of them. With an autocommit transaction, there is no way to force
a rollback, manually. A transaction rollback will only occur when there is a
system failure.
In addition, we can also define a non-default type of transaction called an implicit transac-
tion, invoked under the session option SET IMPLICIT_TRANSACTIONS ON. In implicit
transaction mode, the start of any transaction is implied. In other words, any DML statement
(such as INSERT, UPDATE, DELETE and even SELECT) will automatically start a transac-
tion. The end of the transaction must still be explicit, and the transaction is not finished until
we issue either a ROLLBACK TRAN or COMMIT TRAN. However, because this is a non-
default type of transaction, we will not be discussing it further.
By contrast, SQL Server regulates all access of data in memory-optimized tables using
completely optimistic MVCC. SQL Server does not use locking or latching to provide
transaction isolation, and so data operations never wait to acquire locks. Instead, SQL Server
assumes that concurrent transactions won't interfere and then performs validation checks
once a transaction issues a commit to ensure that it obeys the required isolation properties.
If it does, then SQL Server will confirm the commit. Otherwise, an error will be generated.
We'll look at more details of these validation checks and possible errors later in this chapter.
SQL Server still supports multiple levels of transaction isolation when accessing memory-
optimized tables, but there are differences in the way the isolation levels are guaranteed when
accessing disk-based versus memory-optimized tables.
First, for comparative purposes, let's review briefly the transaction isolation levels that SQL
Server supports when accessing disk-based tables, and then contrast that to the isolation
levels we can use with memory-optimized tables and how they work.
• SNAPSHOT – guarantees that a transaction sees the data as it existed when the
transaction started; any changes made after that are invisible to it. It does not
prevent non-repeatable reads or phantoms, but they won't appear in the results,
so this level has the outward appearance of SERIALIZABLE. For disk-based
tables, SQL Server implements this level using row versioning in tempdb.
• REPEATABLE READ – prevents dirty reads and non-repeatable reads but allows
phantom reads. Transactions take shared locks and exclusive locks until the end of
the transaction to guarantee read stability.
• SERIALIZABLE – prevents all read phenomena. To avoid phantoms, SQL Server
adopts a special locking mechanism, using key-range locks, and holds all locks
until the end of the transaction, so that other transactions can't insert new rows into
those ranges.
The following sections will explain the restrictions, and the reasons for them, with examples.
CREATE TABLE T1
(
    Name varchar(32) not null PRIMARY KEY NONCLUSTERED HASH
        WITH (BUCKET_COUNT = 100000),
    City varchar(32) null
        INDEX T1_ndx_City NONCLUSTERED HASH      -- index name and bucket count assumed;
            WITH (BUCKET_COUNT = 10000),         -- Figure 5-2 shows a hash index on City
    State_Province varchar(32) null,
    LastModified datetime not null
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
Open a new query window in SQL Server Management Studio, and start an explicit transac-
tion accessing a memory-optimized table, as shown in Listing 5-2.
USE HKDB;
BEGIN TRAN;
SELECT *
FROM dbo.T1;
COMMIT TRAN;
By default, this transaction will run in the READ COMMITTED isolation level, which is the
standard isolation level for most SQL Server transactions, and guarantees that the transac-
tion will not read any dirty (uncommitted) data. If a transaction running under this default
isolation level tries to access a memory-optimized table, it will generate the following error
message, since READ COMMITTED is unsupported for memory-optimized tables:
Accessing memory optimized tables using the READ COMMITTED isolation level is
supported only for autocommit transactions. It is not supported for explicit
or implicit transactions. Provide a supported isolation level for the memory
optimized table using a table hint, such as WITH (SNAPSHOT).
As the message suggests, the transaction needs to specify a supported isolation level, using
a table hint. For example, Listing 5-3 specifies the snapshot isolation level. This combination,
READ COMMITTED for accessing disk-based tables and SNAPSHOT for memory-optimized,
is the one that most cross-container transactions should use. However, alternatively, we
could also use the WITH (REPEATABLEREAD) or WITH (SERIALIZABLE) table hints,
if required.
USE HKDB;
BEGIN TRAN;
SELECT * FROM dbo.T1 WITH (SNAPSHOT);
COMMIT TRAN;
Listing 5-3: Explicit transaction using a table hint to specify snapshot isolation.
SQL Server does support READ COMMITTED isolation level for autocommit (single-
statement) transactions, so we can run Listing 5-4, inserting three rows into our table
T1 successfully.
INSERT dbo.T1
( Name, City, LastModified )
VALUES ( 'Jane', 'Helsinki', CURRENT_TIMESTAMP ),
( 'Susan', 'Vienna', CURRENT_TIMESTAMP ),
( 'Greg', 'Lisbon', CURRENT_TIMESTAMP );
Listing 5-4: READ COMMITTED isolation is supported only for autocommit transactions.
For snapshot isolation, all operations need to see the versions of the data that existed as of the
beginning of the transaction. For SNAPSHOT transactions, the beginning of the transaction
is considered to be when the first table is accessed. In a cross-container transaction, however,
since the sub-transactions can each start at a different time, another transaction may have
changed data between the start times of the two sub-transactions. The cross-container trans-
action then will have no one point in time on which to base the snapshot, so using transaction
isolation level SNAPSHOT is not allowed.
Table 5-2 shows an example of running the two cross-container transactions, Tx1 and
Tx2 (both of which we can think of as having two "sub-transactions," one for accessing disk-
based and one for accessing memory-optimized tables). It illustrates why a transaction can't
use REPEATABLE READ or SERIALIZABLE to access both disk-based and memory-
optimized tables, and it essentially boils down to the fact that SQL Server implements the
isolation levels in very different ways in memory-optimized tables, without using any locks.
In Table 5-2, RHk# indicates a row in a memory-optimized table, and RSql# indicates a row
in a disk-based table. Transaction Tx1 reads a row from a memory-optimized table first. SQL
Server acquires no locks. Now assume the second transaction, Tx2, starts after Tx1 reads
RHk1. Tx2 reads and updates RSql1 and then reads and updates RHk1, then commits. Now
when Tx1 reads the row from the disk-based table, it would now have a set of values for
the two rows that could never have existed if the transaction were run in isolation, i.e. if the
transaction were truly serializable, and so this combination is not allowed.
Time 1, Tx1: BEGIN SQL/in-memory sub-transactions
Time 2, Tx1: Read RHk1
Time 3, Tx2: BEGIN SQL/in-memory sub-transactions
Time 4, Tx2: Read RSql1 and update to RSql2
Time 5, Tx2: Read RHk1 and update to RHk2
Time 6, Tx2: COMMIT
Time 7, Tx1: Read RSql2
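One way to avoid specifying an isolation level hint on every reference to a memory-optimized table is a database option that automatically elevates the isolation level to SNAPSHOT. The statement below is a sketch, assuming the sample HKDB database used in this chapter:

ALTER DATABASE HKDB
    SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;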
Listing 5-8: Setting the database option to elevate isolation level to SNAPSHOT.
We can verify whether this option has been set in two ways, shown in Listing 5-9,
either by inspecting the sys.databases catalog view or by querying the
DATABASEPROPERTYEX function.
SELECT is_memory_optimized_elevate_to_snapshot_on
FROM sys.databases
WHERE name = 'HKDB';
GO
SELECT DATABASEPROPERTYEX('HKDB',
'IsMemoryOptimizedElevateToSnapshotEnabled');
Listing 5-9: Verifying if the database has been set to elevate the isolation level
to SNAPSHOT.
Otherwise, as demonstrated earlier, simply set the required isolation level on the fly, using a
table hint. We should also consider that having accessed a table in a cross-container transac-
tion using an isolation level hint, a transaction should continue to use that same hint for all
subsequent access of the table, though this is not enforced. Using different isolation levels for
the same table, whether a disk-based table or memory-optimized table, will usually lead to
failure of the transaction.
The output should look similar to that shown in Figure 5-1, with two transactions.
When the first statement accessing a memory-optimized table is executed, SQL Server
obtains a transaction-ID for the T-SQL part of the transaction (transaction_id) and a
transaction-ID for the in-memory OLTP portion (xtp_transaction_id).
The xtp_transaction_id values are generated by the Transaction-ID counter
(described in Chapter 3). It is this value that SQL Server inserts into End-Ts for rows that
an active transaction is deleting, and into Begin-Ts for rows that an active transaction is
inserting. We can also see that both of these transactions have the same value for begin_
tsn, which is the current timestamp for the last committed transaction at the time the trans-
action started.
Since both transactions are still active, there is no value for the end_tsn timestamp. The
begin_tsn timestamp is only important while the transaction is running and is never saved
in row versions, whereas the end_tsn, upon COMMIT, is the value written into the Begin-
Ts and End-Ts for the affected rows.
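A query along the lines of Listing 5-10 (a sketch; the column list here is abbreviated) exposes these values for the currently active transactions:

SELECT xtp_transaction_id,
       transaction_id,
       session_id,
       begin_tsn,
       end_tsn,
       state_desc
FROM sys.dm_db_xtp_transactions;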
During the processing phase, SQL Server links the new <Jane, Perth> row into the index
structure and marks the <Greg, Lisbon> and <Jane, Helsinki> as deleted. Figure 5-2 shows
what the rows will look like at this stage, within our index structure (with hash indexes on Name
and City; see Chapter 4).
I've just used Tx1 for the transaction-id, but you can use Listing 5-10 to find the real
values of xtp_transaction_id.
Write-write conflicts
What happens if another transaction, TxU, tries to update Jane's row? (Remember Tx1 is
still active.)
USE HKDB;
BEGIN TRAN TxU;
UPDATE dbo.T1 WITH ( SNAPSHOT )
SET City = 'Melbourne'
WHERE Name = 'Jane';
COMMIT TRAN TxU
Listing 5-12: TxU attempts to update a row while Tx1 is still uncommitted.
As discussed in Chapter 3, TxU sees Tx1's transaction-id in the <Jane, Helsinki> row
and, because SQL Server optimistically assumes Tx1 will commit, immediately aborts TxU,
raising a conflict error.
Read-write conflicts
If a query tries to update a row that has already been updated by an active transaction, SQL
Server generates an immediate "update conflict" error. However, SQL Server does not catch
most other isolation level errors until the transaction enters the validation phase. Remember,
no transaction acquires locks so it can't block other transactions from accessing rows. We'll
discuss the validation phase in more detail in the next section, but it is during this phase that
SQL Server will perform checks to make sure that any changes made by concurrent transac-
tions do not violate the specified isolation level. Let's continue our example, and see the sort
of violation that can occur.
Our original Tx1 transaction, which started at timestamp 240, is still active, and let's now
start two other transactions that will read the rows in table T1:
• Tx2 – an autocommit, single-statement SELECT that starts at timestamp 243.
• Tx3 – an explicit transaction that reads a row and then updates another row based
on the value it read in the SELECT; it starts at a timestamp of 246.
Tx2 starts before Tx1 commits, and Tx3 starts before Tx2 commits. Figure 5-3 shows the
rows that exist after each transaction commits.
Figure 5-3: Version visibility after each transaction ends, but before validation.
When Tx1 starts at timestamp 240, three rows are visible, and since Tx1 does not commit
until timestamp 250, after Tx2 and Tx3 have started, those are the rows all three of the trans-
actions see. After Tx1 commits, there will only be two rows visible, and the City value for
Jane will have changed. When Tx3 commits, it will attempt to change the City value for
Susan to Helsinki.
In a second query window in SSMS, we can run our autocommit transaction, Tx2, which
simply reads the T1 table.
USE HKDB;
SELECT Name ,
City
FROM T1;
Tx2's session is running in the default isolation level, READ COMMITTED but, as described
previously, for a single-statement transaction accessing a memory-optimized table, we can
think of Tx2 as running in snapshot isolation level, which for a single-statement SELECT
will give us the same behavior as READ COMMITTED.
Tx2 started at timestamp 243, so it will be able to read rows that existed at that time. It will
not be able to access <Greg, Beijing>, for example, because that row was valid between
timestamps 100 and 200. The row <Greg, Lisbon> is valid starting at timestamp 200, so
transaction Tx2 can read it, but it has a transaction-id in End-Ts because Tx1 is currently
deleting it. Tx2 will check the global transaction table and see that Tx1 has not committed,
so Tx2 can still read the row. <Jane, Perth> is the current version of the row with "Jane," but
because Tx1 has not committed, Tx2 follows the pointer to the previous row version, and
reads <Jane, Helsinki>.
Tx3 is an explicit transaction that starts at timestamp 246. It will run using REPEATABLE
READ isolation, and read one row and update another based on the value read, as shown in
Listing 5-14 (again, don't commit it yet).
DECLARE @City NVARCHAR(32);
BEGIN TRAN TX3
SELECT @City = City
FROM T1 WITH ( REPEATABLEREAD )
WHERE Name = 'Jane';
UPDATE T1 WITH ( REPEATABLEREAD )
SET City = @City
WHERE Name = 'Susan';
COMMIT TRAN -- commits at timestamp 260
Listing 5-14: Tx3 reads the value of City for "Jane" and updates the "Susan" row
with this value.
In Tx3, the SELECT will read the row <Jane, Helsinki> because that row is still
accessible as of timestamp 246, when Tx3 started. It will then delete the <Susan, Bogota> row
and insert the row <Susan, Helsinki>.
What happens next depends on which of Tx1 or Tx3 commits first. In our scheme from
Figure 5-3, Tx1 commits first. When Tx3 tries to commit after Tx1 has committed, SQL
Server will detect during the validation phase that the <Jane, Helsinki> row has been updated
by another transaction. This is a violation of the requested REPEATABLE READ isolation, so
the commit will fail and transaction Tx3 will roll back.
To see this in action, commit Tx1, and then try to commit Tx3. You should see the following
error message:
Msg 41305, Level 16, State 0, Line 0
The current transaction failed to commit due to a repeatable read validation
failure.
So Tx1 commits and Tx3 aborts and, at this stage, the only two rows visible will be <Susan,
Vienna> and <Jane, Perth>.
If Tx3 had committed before Tx1, then both transactions would succeed, and the final rows
visible would be <Jane, Perth> and <Susan, Helsinki>, as shown in Figure 5-3.
Let's now take a look in a little more detail at other isolation level violations that may occur
in the validation stage, and at the other actions SQL Server performs during this phase.
Validation Phase
Once a transaction issues a commit and SQL Server generates the commit timestamp, but
prior to the final commit of transactions involving memory-optimized tables, SQL Server
performs a validation phase. As discussed briefly in Chapter 3, this phase consists broadly of
the following three steps:
1. Validate the changes made by Tx1 – verifying that there are no isolation level
violations.
2. Wait for any commit dependencies to reduce the dependency count to 0.
3. Log the changes.
Once it logs the changes (which are thereby guaranteed to be durable), SQL Server marks the transaction
as committed in the global transaction table, and then clears the dependencies of any transac-
tions that are dependent on Tx1.
Note that the only waiting that a transaction on memory-optimized tables will experience is
during this phase. There may be waiting for commit dependencies, which are usually very
brief, and there may be waiting for the write to the transaction log. Logging for memory-
optimized tables is much more efficient than logging for disk-based tables (as we'll see in
Chapter 6), so these waits can also be very short.
The following sections review each of these three steps in a little more detail.
Consider, as an example, two concurrent transactions that both try to insert a row with the
same primary key value:

Time 1:  Tx1: BEGIN TRAN
Time 2:  Tx1: INSERT INTO [dbo].[T1] WITH (SNAPSHOT)
              ( Name, City, LastModified )
              VALUES ( 'Bob', 'Basingstoke', CURRENT_TIMESTAMP )
         Tx2: BEGIN TRAN
Time 3:  Tx2: INSERT INTO [dbo].[T1] WITH (SNAPSHOT)
              ( Name, City, LastModified )
              VALUES ( 'Bob', 'Bognor', CURRENT_TIMESTAMP )
Time 4:  Tx2: COMMIT TRAN
Time 5:  Tx1: COMMIT TRAN
              Error 41325: The current transaction failed to commit due to a
              serializable validation failure
During validation, Error 41325 is generated, because we can't have two rows with the same
primary key value, and Tx1 is aborted and rolled back.
Error 41305: The current transaction failed to commit due to a repeatable read
validation failure.
The transaction will abort. We saw an example of this earlier, in the section on
read-write conflicts.
As an example of a serializable validation failure, consider a transaction, Tx1, that reads a
set of rows which a second transaction, Tx2, then adds to before Tx1 commits:

Time 1:  Tx1: BEGIN TRAN
Time 2:  Tx1: SELECT Name FROM Person
              WHERE City = 'Perth'
         Tx2: BEGIN TRAN
Time 3:  Tx2: INSERT INTO Person VALUES ('Charlie', 'Perth')
Time 4:  Tx1: --- other operations
Time 5:  Tx2: COMMIT TRAN
Time 6:  Tx1: COMMIT TRAN
              During validation, Error 41325 is generated and Tx1 is rolled back
If any of the dependent transactions fails to commit, there is a commit dependency failure.
This means the transaction will fail to commit with the following error:
Error 41301: A previous transaction that the current transaction took a dependency
on has aborted, and the current transaction can no longer commit.
Note that Tx1 can only acquire a dependency on Tx2 when Tx2 is in the validation or post-
processing phase and, because these phases are typically extremely short, commit dependen-
cies will be quite rare in a true OLTP system. If you want to be able to determine if you have
encountered such dependencies, you can monitor two extended events. The event
dependency_acquiredtx_event will be raised when Tx1 takes a dependency on Tx2, and
the event waiting_for_dependenciestx_event will be raised when Tx1 has
explicitly waited for a dependency to clear.
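As a rough sketch, an extended events session along the following lines would capture both
events; the session name and file target are arbitrary choices here, and the sqlserver package
prefix for the two events is an assumption:
CREATE EVENT SESSION CommitDependencies ON SERVER
    ADD EVENT sqlserver.dependency_acquiredtx_event,
    ADD EVENT sqlserver.waiting_for_dependenciestx_event
    ADD TARGET package0.event_file (SET filename = N'CommitDependencies.xel');
GO
ALTER EVENT SESSION CommitDependencies ON SERVER STATE = START;
GO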
SQL Server limits the number of dependencies that can be taken on a single transaction, Tx1,
to a maximum of 8. In SQL Server 2014, the same error number, 41301, would be generated
when that number was exceeded as when there was a failure of the Tx1 transaction. In SQL
Server 2016, error 41301 is still used for a failure of the Tx1 transaction, but a different error
number, 41839, is generated when the limit of 8 dependencies on a single Tx1 is exceeded.
In addition, trace flag 9962 was introduced to remove the limit and allow unlimited commit
dependencies. Note that this is an undocumented trace flag and should be thoroughly tested
and used with caution.
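Enabling it globally would look like the following; since the trace flag is undocumented, treat
this as something to test carefully rather than as a recommendation:
DBCC TRACEON (9962, -1);  -- -1 applies the trace flag at the global level
GO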
Figure 5-5 shows the write-set for transaction Tx1, from our previous example in this
chapter, in the green box.
The final step in the validation process is to go through the linked list of dependent transac-
tions and reduce their dependency counters by one. Once this validation phase is finished, the
only reason that this transaction might fail is due to a log write failure. Once the log record
has been hardened to storage, the state of the transaction is changed to committed in the
global transaction table.
Post-processing
The final phase is the post-processing, which is sometimes referred to as "commit"
processing, and is usually the shortest. The main operations are to update the timestamps of
each of the rows inserted or deleted by this transaction.
• For a DELETE operation, set the row's End-Ts value to the commit timestamp of
the transaction, and clear the type flag on the row's End-Ts field to indicate it is
really a timestamp, and not a transaction-ID.
• For an INSERT operation, set the row's Begin-Ts value to the commit time-
stamp of the transaction and clear the type flag.
If the transaction failed or was explicitly rolled back, inserted rows will be marked as
garbage and deleted rows will have their end-timestamp changed back to infinity.
The actual unlinking and deletion of old row versions is handled by the garbage collection
system. This final step of removing any unneeded or inaccessible rows is not always done
immediately and may be handled either by user threads, once a transaction completes, or by a
separate garbage collection thread.
Rows that are no longer visible to any active transaction, in other words any rows with an
End-Ts value that is earlier than the timestamp of the oldest active transaction, are considered
stale. Stale rows can be removed and their memory can be released back to
the system.
The garbage collection system is designed to be non-blocking, cooperative, and scalable. Of
particular interest is the "cooperative" attribute. Although there is a dedicated system thread
for the garbage collection process, called the idle worker thread, user threads actually do
most of the work.
If, while scanning an index during a data modification operation (all index access on
memory-optimized tables is considered to be scanning), a user thread encounters a stale row
version, it will either mark the row as expired, or unlink that version from the current
chain and adjust the pointers. For each row it unlinks, it will also decrement the reference
count in the row header area (reflected in the IdxLinkCount value).
When a user thread completes a transaction, it adds information about the transaction to
a queue of transactions to be processed by the idle worker thread. Each time the garbage
collection process runs, it processes the queue of transactions, and determines whether the
oldest active transaction has changed.
It moves the transactions that have committed into one or more "worker" queues, sorting the
transactions into "generations" according to whether they committed before or after the
oldest active transaction. We can view the transactions in each generation using the
sys.dm_db_xtp_gc_cycle_stats DMV (see Chapter 8). It groups the rows asso-
ciated with transactions that committed before the oldest active transaction into "work items,"
each consisting of a set of 16 "stale" rows that are ready for removal. The final act of a user
thread, on completing a transaction, is to pick up one or more work items from a worker
queue and perform garbage collection, i.e. free the memory used by the rows making up the
work items.
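For a quick look at the most recent garbage collection cycles and the generations they track,
a simple query against that DMV is enough (its columns are discussed in Chapter 8):
SELECT *
FROM sys.dm_db_xtp_gc_cycle_stats;
GO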
The idle worker thread will dispose of any stale rows that were eligible for garbage collec-
tion, but not accessed by a user transaction, as part of what is termed a "dusty corner" scan.
Every row starts with a reference count of 1, so the row can be referenced by the garbage
collection mechanism even if the row is no longer connected to any indexes. The garbage
collector process is considered the "owner" of the initial reference.
The garbage collection thread processes the queue of completed transactions about once a
minute, but the system can adjust the frequency internally, based on the number of completed
transactions waiting to be processed. As noted above, each work item it adds to the worker
queue currently consists of a set of 16 rows, but that number is subject to change in future
versions. These work items are distributed across multiple worker queues, one for each CPU
used by SQL Server.
The DMV sys.dm_db_xtp_index_stats has a row for each index on each memory-
optimized table, and the column rows_expired indicates how many rows have been
detected as being stale during scans of that index. There is also a column called rows_
expired_removed that indicates how many rows have been unlinked from that index.
As mentioned above, once a row has been unlinked from all indexes on a table, that row can
be removed by the garbage collection thread. So you will not see the rows_expired_
removed value going up until the rows_expired counters have been incremented for
every index on a memory-optimized table.
The query in Listing 5-15 allows us to observe these values. It joins the sys.dm_db_xtp_
index_stats DMV with the sys.indexes catalog view to be able to return the name
of the index.
SELECT name AS 'index_name' ,
s.index_id ,
scans_started ,
rows_returned ,
rows_expired ,
rows_expired_removed
FROM sys.dm_db_xtp_index_stats s
JOIN sys.indexes i ON s.object_id = i.object_id
AND s.index_id = i.index_id
WHERE OBJECT_ID('<memory-optimized table name>') = s.object_id;
GO
Depending on the volume of data changes and the rate at which new versions are generated,
SQL Server can be using a substantial amount of memory for old row versions and we need
to make sure that our system has enough memory available. I'll tell you more about memory
management for a database supporting memory-optimized tables in Chapter 8.
Summary
This chapter contains a lot of detail on the transaction isolation levels that SQL Server
supports when accessing memory-optimized tables, and also on the valid combination of
levels for cross-container transactions, which can access both disk-based and memory-
optimized tables. In most cases, our cross-container transactions will use standard READ
COMMITTED for accessing disk-based tables, and SNAPSHOT isolation for memory-
optimized tables, set either via a table hint or using the MEMORY_OPTIMIZED_ELEVATE_
TO_SNAPSHOT database property for that database.
In the MVCC model, no transaction acquires locks, and no transaction can prevent another
transaction reading, or attempting to modify, rows that it is currently accessing. Due to the
optimistic model, SQL Server will raise an immediate conflict if one transaction tries to
modify a row that another active transaction is already modifying. However, it will only
detect other read-write conflicts during a validation phase which occurs after a transaction
issues a commit. We investigated the sort of violations that can occur during validation,
depending on the isolation levels being used, and we also considered what happens during
other phases of the validation cycle, such as resolving commit dependencies and hardening
the log records to disk. Finally, we discussed the cooperative garbage collection system that
disposes of stale rows which are no longer visible to any transactions.
We're now ready to take a closer look at the processes by which in-memory OLTP writes to
durable storage, namely the CHECKPOINT process and the transaction logging process.
Additional Resources
• General background on isolation levels:
http://en.wikipedia.org/wiki/Isolation_(database_systems).
• A Critique of ANSI SQL Isolation Levels:
http://research.microsoft.com/apps/pubs/default.aspx?id=69541.
• Understanding Transactions on Memory-Optimized Tables:
http://msdn.microsoft.com/en-us/library/dn479429.aspx.
• Transaction dependency limits with memory-optimized tables – Error 41839:
http://preview.tinyurl.com/y87yzpkg.
• System-Versioned Temporal Tables with Memory-Optimized Tables:
https://msdn.microsoft.com/en-us/library/mt590207.aspx.
Chapter 6: Logging, Checkpoint,
and Recovery
SQL Server must ensure transaction durability for memory-optimized tables, so that it
can guarantee to recover to a known state after a failure. In-memory OLTP achieves this
by having both the checkpoint process and the transaction logging process write to
durable storage.
The information that SQL Server writes to disk consists of transaction log streams and
checkpoint streams.
• Log streams contain the changes made by committed transactions logged as
insertion and deletion of row versions.
• Checkpoint streams come in three varieties:
• data streams contain all versions inserted during a timestamp interval
• delta streams are associated with a particular data stream and contain a
list of integers indicating which versions in its corresponding data stream
have been deleted
• large data streams contain data from the columnstore index compressed
rowgroups for memory-optimized tables.
The combined contents of the transaction log and the checkpoint streams are sufficient to
allow SQL Server to recover the in-memory state of memory-optimized tables to a transac-
tionally-consistent point in time.
Although the overall requirement for the checkpoint and transaction logging process to write
to durable storage is no different than for normal disk-based tables, for in-memory tables
the mechanics of these processes are rather different, and often much more efficient, as we'll
discuss throughout this chapter.
Though not covered in this book, in-memory OLTP is also integrated with the
AlwaysOn Availability Group feature, and so supports fail-over and recovery to
highly available replicas.
Transaction Logging
The combined contents of the transaction log and the checkpoint streams are sufficient to
recover the in-memory state of memory-optimized tables to a transactionally-consistent point
in time. Before we go into more detail of how the log and the checkpoint files are generated
and used, here are a few crucial points to keep in mind:
• Log streams are stored in the regular SQL Server transaction log.
• Checkpoint streams are stored in SQL Server FILESTREAM files which are
sequential files fully managed by SQL Server.
• The transaction log contains enough information about committed transactions to
redo the transaction. The changes are recorded as INSERTs and DELETEs of row
versions marked with the table they belong to. The transaction log stream for the
changes to memory-optimized tables is generated at the transaction commit time.
This is different than disk-based tables where each change is logged at the time of
operation irrespective of the final outcome of the transaction. No undo information
is written to the transaction log for operations on memory-optimized tables.
• Index operations on memory-optimized tables are not logged. With the exception
of compressed segments for columnstore indexes on memory-optimized tables, all
indexes are completely rebuilt on recovery.
By contrast, for in-memory OLTP, all data modifications are in-memory; it has no concept
of a "dirty page" that needs to be flushed to disk and, since it generates log records only at
commit time, checkpoint will never write to disk log records related to uncommitted trans-
actions. So, while the transaction log contains enough information about committed trans-
actions to redo the transaction, no undo information is written to the transaction log, for
memory-optimized tables.
In order to demonstrate the greatly reduced logging for memory-optimized tables over disk-
based tables, the simple script in Listing 6-1 creates a database, with a single memory-opti-
mized filegroup holding a single container. As always, you may need to edit the file paths to
reflect drives and folders available to you, or you may need to create a new folder. I am using
a single data folder, C:\HKData\.
USE master
GO
DROP DATABASE IF EXISTS LoggingDemo;
GO
CREATE DATABASE LoggingDemo ON
PRIMARY (NAME = [LoggingDemo_data],
FILENAME = 'C:\HKData\LoggingDemo_data.mdf'),
FILEGROUP [LoggingDemo_FG] CONTAINS MEMORY_OPTIMIZED_DATA
(NAME = [LoggingDemo_container1],
FILENAME = 'C:\HKData\LoggingDemo_container1')
LOG ON (name = [LoggingDemo_log],
Filename='C:\HKData\LoggingDemo.ldf', size= 100 MB);
GO
Listing 6-2 creates one memory-optimized table, and the equivalent disk-based table, in the
LoggingDemo database.
USE LoggingDemo
GO
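A minimal sketch of such a pair of tables, consistent with the INSERT statements used later in
this chapter (an integer key plus a 100-character column, with the memory-optimized table made
durable), might look like this; the exact definitions in Listing 6-2 may differ:
CREATE TABLE dbo.t1_inmem
    (
      c1 INT NOT NULL
             PRIMARY KEY NONCLUSTERED ,
      c2 CHAR(100) NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
CREATE TABLE dbo.t1_disk
    (
      c1 INT NOT NULL
             PRIMARY KEY NONCLUSTERED ,
      c2 CHAR(100) NOT NULL
    );
GO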
Next, Listing 6-3 populates the disk-based table with 100 rows, and examines the contents of
the transaction log using the undocumented (and unsupported) function fn_dblog(). You
should see 200 log records for operations on t1_disk.
SET NOCOUNT ON;
GO
BEGIN TRAN
DECLARE @i INT = 0
WHILE ( @i < 100 )
BEGIN
INSERT INTO dbo.t1_disk
VALUES ( @i, REPLICATE('1', 100) );
SET @i = @i + 1;
END;
COMMIT TRAN;
GO
-- you will see that SQL Server logged 200 log records
SELECT *
FROM sys.fn_dblog(NULL, NULL)
Listing 6-3: Populate the disk-based table with 100 rows and examine the log.
Listing 6-4 runs a similar INSERT on the memory-optimized table. Note that, since the
partition_id is not shown in the output for memory-optimized tables, we cannot filter
based on the specific object. Instead, we need to look at the most recent log records, so the
query performs a descending sort based on the LSN.
BEGIN TRAN
DECLARE @i INT = 0
WHILE ( @i < 100 )
BEGIN
INSERT INTO t1_inmem
VALUES ( @i, REPLICATE('1', 100) );
SET @i = @i + 1;
END;
COMMIT TRAN;
-- look at the log
SELECT *
FROM sys.fn_dblog(NULL, NULL)
ORDER BY [Current LSN] DESC;
GO
Listing 6-4: Examine the log after populating the memory-optimized tables with 100 rows.
You should see only three log records related to this transaction, as shown in Figure 6-1, one
marking the start of the transaction, one for the COMMIT, and then just one log record for
inserting all 100 rows.
Figure 6-1: SQL Server transaction log showing one log record for a 100-row transaction.
The output implies that all 100 inserts have been logged in a single log record, using an
operation of type LOP_HK, with LOP indicating a "logical operation" and HK being an arti-
fact from the project codename, Hekaton.
We can use another undocumented, unsupported function to break apart a LOP_HK
record, as shown in Listing 6-5 (replace the current LSN value with the LSN for
your LOP_HK record).
SELECT [current lsn] ,
       [transaction id] ,
       operation ,
       operation_desc ,
       tx_end_timestamp ,
       total_size ,
       ( SELECT OBJECT_NAME(object_id)
         FROM   sys.dm_db_xtp_object_stats
         WHERE  xtp_object_id = -1 * tlog.xtp_object_id
       ) AS TableName
FROM   sys.fn_dblog_xtp(NULL, NULL) AS tlog
WHERE  [Current LSN] = '00000022:00000241:0002';
The first few rows and columns of output should look similar to those shown in Figure 6-2.
It should return 102 rows, including one *_INSERT_ROW operation for each of the 100
rows inserted.
Figure 6-2: Breaking apart the log record for the inserts on the memory-optimized table.
The single log record for the entire transaction on the memory-optimized table, plus the
reduced size of the logged information, can help to make transactions on memory-optimized
tables much more efficient. This is not to say, however, that transactions on memory-opti-
mized tables are always going to be more efficient, in terms of logging, than operations on
disk-based tables. For very short transactions particularly, disk-based and memory-optimized
will generate about the same amount of log. However, transactions on memory-optimized
tables should never be any less efficient than on their disk-based counterparts.
Checkpoint
The two main purposes of the checkpoint operation for disk-based tables are to improve
performance by batching up I/O rather than continually writing a page to disk every time it
changes, and to reduce the time required to run recovery. If checkpoint ran only very infre-
quently then, during recovery, there could be a huge number of data rows to which SQL
Server needed to apply redo, as a result of committed transactions hardened to the log but
where the data pages were not hardened to disk before SQL Server entered recovery.
Similarly, one of the main reasons for checkpoint operations for memory-optimized tables, is
to reduce recovery time. The checkpoint process for memory-optimized tables is designed to
satisfy two important requirements:
• Continuous checkpointing – Checkpoint-related I/O operations occur incremen-
tally and continuously as transactional activity accumulates. This is in contrast to
the hyper-active checkpoint scheme for disk-based tables, defined as checkpoint
processes which sleep for a while, after which they wake up and work as hard as
possible to finish up the accumulated work, and which can potentially be disruptive
to overall system performance.
• Streaming I/O – Checkpointing for memory-optimized tables relies on streaming
I/O rather than random I/O for most of its operations. Even on SSD devices
random I/O is slower than sequential I/O and can incur more CPU overhead due to
smaller individual I/O requests. In addition, SQL Server 2016 can now read the log
in parallel using multiple serializers, which will be discussed below.
Since checkpointing is a continuous process for memory-optimized tables, when we talk
about a checkpoint "event," we're actually talking about the closing of a checkpoint. The
later section, The checkpoint event, describes exactly what happens during the checkpoint
closing process.
A checkpoint delta file stores information about which rows contained in its partner data file
have been subsequently deleted. When we delete rows, the checkpoint thread will append
a reference to the deleted rows (their IDs) to the corresponding delta files. Delta files are
append-only for the lifetime of the data file to which they correspond. At recovery time, the
delta file is used as a filter to avoid reloading deleted versions into memory. Because each
data file is paired with exactly one delta file, the smallest unit of work for recovery is a
data/delta file pair. This allows the recovery process to be highly parallelizable. For some
operations, all we're concerned about are the data and delta files, and since they always
come in pairs, they can be referenced together as checkpoint file pairs, or CFPs.
A large data file stores your large column values or the contents of one rowgroup for a
columnstore index. If you have no large columns or columnstore indexes, there will be
several PRECREATED data files of the size usually used for large data, but they will keep
the file type FREE.
A root file keeps track of the files generated for each checkpoint event, and a new active root
file is created each time a checkpoint event occurs.
As mentioned, the data and delta files are the main types, because they contain the informa-
tion about all the transactional activity against memory-optimized tables. Because the data
and delta files always exist in a 1:1 relationship (once they actually contain data), a data file
and its corresponding delta file are sometimes referred to as a checkpoint file pair or CFP.
As soon as your first memory-optimized table is created, SQL Server will create about 16
files of various sizes. The sizes are always a power-of-two number of megabytes, from 8 MB to 1 GB.
When a file of one of the other types in this list is needed, SQL Server takes one of the FREE
files that is at least as big as the size needed. If a file bigger than 1 GB is required, SQL
Server must use a 1 GB file and enlarge it to the appropriate size.
Initially SQL Server will create at least three FREE files of sufficient size for each of the
four types mentioned above, plus a few 1 MB files that can grow as needed. The initial sizes
for each of the file types depend on the specification of the computer that your SQL Server
instance is running on. A machine with more than 16 cores and 128 GB of memory is consid-
ered high-end. We can also consider a system as low-end if it has less than 16 GB memory.
Table 6-1 shows the sizes of the FREE files created for each of the four file types, depending
on the size of your machine. (Note that, although the sizes are determined in anticipation of
being used for a particular type of file, there is no actual requirement that a FREE file be used
for a specific purpose.) In addition to the FREE files, SQL Server may create one or more
CFPs (with both data and delta files) in the PRECREATED state when the first memory-
optimized table is created.
Let's take an initial look, at the file system level, at how SQL Server creates these checkpoint
files. First, create a CkptDemo database, with a single container, as shown in Listing 6-6.
USE master
GO
DROP DATABASE IF EXISTS CkptDemo;
GO
CREATE DATABASE CkptDemo ON
PRIMARY (NAME = [CkptDemo_data],
FILENAME = 'C:\HKData\CkptDemo_data.mdf'),
FILEGROUP [CkptDemo_FG] CONTAINS MEMORY_OPTIMIZED_DATA
(NAME = [CkptDemo_container1],
FILENAME = 'C:\HKData\CkptDemo_mod')
LOG ON (name = [CkptDemo_log],
Filename='C:\HKData\CkptDemo.ldf', size= 100 MB);
GO
ALTER DATABASE CkptDemo SET RECOVERY FULL;
GO
At this point, you might want to look in the folder containing the memory-optimized data
files, in this example HKData\CkptDemo_mod. Within that folder are two subfolders, one called
$FSLOG and another called $HKv2. Open the $HKv2 folder, and you will find it is empty. It will
remain empty until we create a memory-optimized table, as shown in Listing 6-7.
USE CkptDemo;
GO
-- create a memory-optimized table with each row of size > 8KB
CREATE TABLE dbo.t_memopt (
c1 int NOT NULL,
c2 char(40) NOT NULL,
c3 char(8000) NOT NULL,
CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1)
-- the remainder of the listing is cut off in this copy; a hash index needs a bucket count,
-- and the table must be declared memory-optimized and durable, so the statement presumably
-- ends along these lines (the exact BUCKET_COUNT value is illustrative):
WITH (BUCKET_COUNT = 100000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
At this point, if I re-examine the previously empty folder, I find that it now contains 17 files,
as shown in Figure 6-3. The larger ones are for use as data files, and the smaller ones are for
use as delta files, large data files, or root files. Your files may be different sizes if you are
using a high-end or standard machine configuration. Mine is low-end, with less than 16 GB
of memory.
Figure 6-3: The checkpoint files in the container for our memory-optimized tables.
Multiple containers can be used to parallelize data load. Basically, if creating a second
container reduces data load time (most likely because it is on a separate hard drive), then use
it. If the second container does not speed up data transfer (because it is just another directory
on the same hard drive), then don't do it. The basic recommendation is to create one container
per spindle (or I/O bus).
Finally, assuming a log backup (which requires at least one database backup) has occurred,
these CFPs are no longer required and can be removed by the checkpoint file garbage
collection process.
Let's take a more detailed look at the various states through which checkpoint files transition.
• MERGE TARGET – These files are in the process of being constructed by merging
ACTIVE files that have adjacent transaction ranges. Since these files are in the
process of being constructed and they duplicate information in the ACTIVE files,
they will not be used for crash recovery. Once the merge operation is complete, the
MERGE TARGET files will become ACTIVE.
• WAITING FOR LOG TRUNCATION – Once a merge operation is complete, the
old ACTIVE files, which were the source of the MERGE operation, will transition
to WAITING FOR LOG TRUNCATION. CFPs in this state are needed for the
operational correctness of a database with memory-optimized tables. For example,
these files would be needed to recover from a durable checkpoint to restore to
a previous point in time during a restore. A CFP can be marked for garbage
collection once the log truncation point moves beyond its transaction range.
As described, files can transition from one state to another, but only a limited number of
transitions are possible. Figure 6-4 shows the possible transitions.
Figure 6-4: Possible state transitions for checkpoint files in SQL Server 2016.
Listing 6-8 returns the following metadata columns (other columns are available; see the
documentation for a full list); a sketch of an equivalent query appears after the list:
• file_type_desc
Identifies the file as FREE, DATA, DELTA, LARGE OBJECT or ROOT.
• state_desc
The state of the file (see previous bullet list).
• internal_storage_slot
This value is the pointer to an internal storage array (described below), but is not
populated until a file becomes ACTIVE.
• file_size_in_MB
The DMV contains a size in bytes, but my code converts it to megabytes.
• logical_row_count
This column contains the number of rows inserted for DATA files and the
number of rows deleted for DELTA files.
• lower_bound_tsn
This is the timestamp for the earliest transaction covered by this checkpoint file.
• upper_bound_tsn
This is the timestamp for the last transaction covered by this checkpoint file.
• checkpoint_file_id
This is the internal identifier for the file.
• relative_file_path
The location of the file relative to the checkpoint file container. This value allows
you to map the rows in this DMV to the rows seen in the CkptDemo_mod
container. Also note that the final component of the path, once the brackets are
removed, is the checkpoint_file_id.
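A query along the following lines, against sys.dm_db_xtp_checkpoint_files, returns the columns
just described (the byte-to-megabyte conversion reflects the note on file_size_in_MB); the
exact query used in Listing 6-8 may differ in detail:
SELECT file_type_desc ,
       state_desc ,
       internal_storage_slot ,
       file_size_in_bytes / 1024 / 1024 AS file_size_in_MB ,
       logical_row_count ,
       lower_bound_tsn ,
       upper_bound_tsn ,
       checkpoint_file_id ,
       relative_file_path
FROM   sys.dm_db_xtp_checkpoint_files
ORDER BY file_type_desc;
GO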
The output from the code in Listing 6-9 shows me 16 PRECREATED files: 12 of type FREE,
and two each of DATA and DELTA. There is one file in the ACTIVE state, of type ROOT.
We'll see some actual output in an example below.
The metadata of all checkpoint file pairs that exist on disk is stored in an internal array
structure referred to as the storage array in which each entry refers to a CFP. As of
SQL Server 2016 the number of entries (or slots) in the array is dynamic. The entries
in the storage array are ordered by timestamp and, as mentioned, each CFP contains
transactions in a particular range.
The CFPs referenced by the storage array (along with the tail of the log) represent all the
on-disk information required to recover the memory-optimized tables in a database. The
internal_storage_slot value in the sys.dm_db_xtp_checkpoint_files
DMV refers to a location in the storage array.
Let's see an example of some of these checkpoint file state transitions in action. At this stage,
our CkptDemo database has one empty table, and we've seen that SQL Server has created
17 files. If we're not interested in all the details of the checkpoint files, we can take a look at
some of the basic checkpoint file metadata, using the query in Listing 6-9. In this case, we
just return the file type, the state of each file, and the relative path to each file.
SELECT file_type_desc ,
state_desc ,
relative_file_path
FROM sys.dm_db_xtp_checkpoint_files
ORDER BY file_type_desc
GO
Listing 6-9: Examine the basic metadata for your checkpoint files.
Figure 6-5 shows that of the 17 checkpoint files (2 DATA files, 2 DELTA files, 12 FREE files
and 1 ROOT), 16 have the state PRECREATED, and only one is ACTIVE.
Let's now put some rows into the t_memopt table, as shown in Listing 6-10. The script also
backs up the database so that we can make log backups later (although the full backup does
not affect what we will shortly see in the metadata).
-- INSERT 8000 rows.
-- This should load 5 16MB data files on a machine with <= 16GB of memory.
Listing 6-10: Populate the memory-optimized tables with 8000 rows and
back up the database.
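As a sketch of the kind of script Listing 6-10 runs (consistent with the t_memopt definition
and with the delete loop in Listing 6-11, but with the loop details and the backup location
chosen purely for illustration):
SET NOCOUNT ON;
DECLARE @i INT = 0;
WHILE ( @i < 8000 )
BEGIN
    INSERT dbo.t_memopt
    VALUES ( @i, 'a', REPLICATE('b', 8000) );
    SET @i += 1;
END;
GO
-- a full database backup allows log backups to be taken later in the chapter
BACKUP DATABASE CkptDemo TO DISK = N'C:\HKBackups\CkptDemo_data.bak' WITH INIT;
GO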
If I look again into the CkptDemo_mod\$HKv2 subfolder in the file system browser, I now
see 33 files.
Now let's return to look at the checkpoint file metadata in a little more detail by rerunning
the query in Listing 6-8. Figure 6-6 shows the 33 checkpoint files returned, and the property
values for each file.
Notice that there are still no ACTIVE data files because there has been no checkpoint event
yet. However, I now have seven UNDER CONSTRUCTION CFPs and, because of the contin-
uous checkpointing, the data files of these CFPs contain almost the full 8000 data rows (six
files have 1178 rows each and one has 917, as we can see from the logical_
row_count column). If SQL Server needed to recover this table's data at this point, it
would do it completely from the transaction log.
However, let's see what happens when a checkpoint event occurs, also referred to as
"closing a checkpoint."
CHECKPOINT;
GO
In the output, we'll see one or more CFPs (in this case, five) with the state ACTIVE and with
non-NULL values for the internal_storage_slot, as shown in Figure 6-7.
Controller thread
This thread scans the transaction log to find ranges of transactions that can be given to
sub-threads, called "serializers." Each range of transactions is referred to as a "segment."
Segments are identified by a special segment log record which has information about the
range of transactions within the segment. When the controller sees such a log record, the
referenced segment is assigned to a serializer thread.
Segment generation
A set of transactions is grouped into a segment when a user transaction (with transac-
tion_id T) generates log records that cause the log to grow and cross the 1 MB boundary
beyond the end point of the previous segment. The user transaction T will then close
the segment. Any transactions with a transaction_id less than T will continue to be
associated with this segment and their log records will be part of the segment. The last
transaction will write the segment log record when it completes. Newer transactions, with
a transaction_id greater than T, will be associated with subsequent segments.
Serializer threads
As each segment log record, representing a unique transaction range, is encountered by the
controller thread, it is assigned to a different serializer thread. The serializer thread processes
all the log records in the segment, writing all the inserted and deleted row information to the
data and delta files.
Timer task
A special timer task wakes up at regular intervals to check if the active log has exceeded
1.5 GB since the last checkpoint event. If so, an internal transaction is created which closes
the current open segment in the system and marks it as a special segment that should close
a checkpoint. Once all the transactions associated with the segment have completed, the
segment definition log record is written. When this special segment is processed by the
controller thread, it wakes up a "close thread."
Close thread
The close thread generates the actual checkpoint event by generating a new root file which
contains information about all the files that are active at the time of the checkpoint. This
operation is referred to as "closing the checkpoint."
With a completed checkpoint (i.e. the ACTIVE checkpoint files that a checkpoint event
creates), combined with the tail of the transaction log, SQL Server can recover any memory-
optimized table. A checkpoint event has a timestamp, which indicates that the effects of all
transactions before the checkpoint timestamp are recorded in files created by the checkpoint
and thus the transaction log is not needed to recover them. Of course, just as for disk-based
tables, even though that section of the log has been covered by a checkpoint, it still cannot be
truncated until a log backup has been taken.
As discussed previously, the ACTIVE checkpoint files created by a checkpoint event are
"closed" in the sense that the continuous checkpointing process no longer writes new rows to
these data files, but it will still need to write to the associated delta files, to reflect deletion of
existing row versions.
Merging can also occur when two adjacent data files are each less than 50% full. Data files
can end up only partially full if a manual checkpoint has been run, which closes the currently
open (UNDER CONSTRUCTION) checkpoint data file and starts a new one.
Automatic merge
To identify a set of files to be merged, a background task periodically looks at all ACTIVE
data/delta file pairs and identifies zero or more sets of files that qualify.
Each set can contain two or more data/delta file pairs that are adjacent to each other such
that the resultant set of rows can still fit in a single data file of size 128 MB (or 16 MB for
machines with 16 GB memory or less). Table 6-2 shows some examples of files that will be
chosen to be merged under the merge policy.
Table 6-2: Examples of files that can be chosen for file merge operations.
It is possible that two adjacent data files are each 60% full. They will not be merged, and 40%
of the storage is unused. So the total disk storage used for durable memory-optimized tables is
effectively larger than the corresponding memory-optimized size. In the worst case, the size
of storage space taken by durable tables could be two times larger than the corresponding
memory-optimized size.
Continuing our previous example, let's now delete half the rows in the t_memopt table, as
shown in Listing 6-11.
SET NOCOUNT ON;
DECLARE @i INT = 0;
WHILE ( @i <= 8000 )
BEGIN
DELETE t_memopt
WHERE c1 = @i;
SET @i += 2;
END;
GO
CHECKPOINT;
GO
The metadata will now look something like that shown in Figure 6-8, with one additional
CFP and with DELTA files showing logical_row_count values which indicate the
deleted rows.
Figure 6-8: The checkpoint file metadata after deleting half the rows in the table.
The number of deleted rows, spread across seven delta files, adds up to 4000, as expected.
After a while, SQL Server will detect that files can be merged, and you will see files with the
state MERGE TARGET, as shown in Figure 6-9.
Once the actual merge has taken place, the original source files for the merge can
be removed.
Notice in Figure 6-10 that the total of the rows in the ACTIVE DATA files is 4000 and
there are no rows in any DELTA files. In addition, note that the lower and upper transaction
ranges stay the same. In Figure 6-9, there are 10 ACTIVE DATA files, with the first having
a lower_bound_tsn of 0 and the last having an upper_bound_tsn of 12035. In
Figure 6-10, after the merge, there are only two ACTIVE DATA files but lower_bound_
tsn is still 0 and the last upper_bound_tsn is 12035.
Once the merge operation is complete, the files in the state WAITING FOR LOG TRUNCA-
TION can be removed by a garbage collection process as long as the log is being regularly
truncated. Truncation will happen if regular log backups are taken, or if the database is in
auto_truncate mode. Before a checkpoint file can be removed, the in-memory OLTP
engine must ensure that it will not be further required. The garbage collection process is
automatic, and does not require any intervention.
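For example, with a full database backup already in place (as in Listing 6-10), a routine log
backup along these lines (the target path is again only an example) allows the log to be
truncated so that these files can eventually be removed:
BACKUP LOG CkptDemo
TO DISK = N'C:\HKBackups\CkptDemo_log.bak';
GO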
Recovery
Recovery on in-memory OLTP tables starts after the location of the most recent checkpoint
inventory has been recovered during a scan of the tail of the log. Once the SQL Server host
has communicated the location of the checkpoint inventory to the in-memory OLTP engine,
SQL Server and in-memory OLTP recovery proceed in parallel. The Global Transaction
Timestamp is initialized during the recovery process with the highest transaction timestamp
found among the transactions recovered.
In-memory OLTP recovery itself is parallelized. Each delta file represents a filter to eliminate
rows that don't have to be loaded from the corresponding data file. This data/delta file pair
arrangement means that checkpoint file loading can proceed in parallel across multiple I/O
streams with each stream processing a single data file and delta file. The in-memory OLTP
engine creates one thread per core to handle parallel insertion of the data produced by the I/O
streams. The insert threads load into memory all active rows in the data file after removing
the rows that have been deleted. Using one thread per core means that the load process is
performed as efficiently as possible.
As the data rows are loaded they are linked into each index defined on the table the row
belongs to. For each hash index, the row is added to the chain for the appropriate hash
bucket. For each range index, the row is added to the chain for the row's key value, or a
new index entry is created if the key value doesn't duplicate one already encountered during
recovery of the table.
Finally, once the checkpoint file load process completes, the tail of the transaction log is
replayed from the timestamp of the last checkpoint, with the goal of bringing the database
back to the state that existed at the time of the crash.
Summary
In this chapter, we looked at how the logging process for memory-optimized tables is more
efficient than that for disk-based tables, providing additional performance improvement for
your in-memory operations. We also looked at how your data changes are persisted to disk
using streaming checkpoint files, so that your data is persisted and can be recovered when
the SQL Server service is restarted, or your databases containing memory-optimized tables
are restored.
Additional Resources
• A white paper describing FILESTREAM storage and management:
http://msdn.microsoft.com/en-us/library/hh461480.aspx.
• Durability for memory-optimized tables:
http://preview.tinyurl.com/ks4lxd4.
• State transitions during merging of checkpoint files:
http://preview.tinyurl.com/mgmrpfr.
Chapter 7: Native Compilation of Tables
and Native Modules
In-memory OLTP introduced the concept of native compilation of stored procedures into
SQL Server 2014. SQL Server 2016 allows you to also create natively compiled user-defined
functions (scalar or inline table-valued functions) and triggers. SQL Server can natively
compile modules that access memory-optimized tables and, in fact, memory-optimized
tables themselves are natively compiled. In many cases, native compilation allows faster data
access and more efficient query execution than traditional interpreted T-SQL. Some of the
features and limitations discussed in this section will not apply to natively compiled functions
or triggers, but in general that is due to limitations of those objects, whether they are natively
compiled or not.
The performance benefit of using natively compiled modules increases with the number of
rows and the complexity of the module's code. If a module needs to process just a single
row, it's unlikely to benefit from native compilation, but it will almost certainly exhibit better
performance, compared to interpreted modules, if it uses one or more of the following:
• aggregation
• nested-loops joins
• multi-statement SELECT, INSERT, UPDATE, and DELETE operations
• complex expressions
• procedural logic, such as conditional statements and loops.
You should consider using natively compiled modules for the most performance-critical parts
of your applications, including modules that you execute frequently, that contain logic such
as that described above, and that need to be extremely fast.
The T-SQL language consists of high-level constructs such as CREATE TABLE and
SELECT…FROM. The in-memory OLTP compiler takes these constructs, and compiles them
down to native code for fast runtime data access and query execution. The in-memory OLTP
compiler in SQL Server 2016 takes the table and module definitions as input. It generates
C code, and leverages the Visual C compiler to generate the native code. The result of the
compilation of tables and modules is DLLs that are loaded into memory and linked into the
SQL Server process.
SQL Server compiles both memory-optimized tables and natively compiled modules to
native DLLs at the time of creation. Following a SQL Server instance restart or a fail-over,
table and module DLLs are recompiled on first access or execution. The information
necessary to recreate the DLLs is stored in the database metadata; the DLLs themselves
are not part of the database and are not included as part of database backups.
Maintenance of DLLs
The DLLs for memory-optimized tables and natively compiled stored procedures are stored
in the file system, along with other generated files, which are kept for troubleshooting and
supportability purposes.
The query in Listing 7-1 shows all table and module DLLs currently loaded in memory on
the server.
SELECT name ,
description
FROM sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL'
Listing 7-1: Display the list of all table and procedure DLLs currently loaded.
Database administrators do not need to maintain the files that native compilation generates.
SQL Server automatically removes generated files that are no longer needed, for example on
table and module deletion and on dropping a database, but also on server or database restart.
USE NativeCompDemo
GO
CREATE TABLE dbo.t1
(
c1 INT NOT NULL
PRIMARY KEY NONCLUSTERED ,
c2 INT
)
WITH (MEMORY_OPTIMIZED=ON);
GO
The table creation results in the compilation of the table DLL, and also in loading that DLL
in memory. The DMV query immediately after the CREATE TABLE statement retrieves the
path of the table DLL. My results are shown in Figure 7-1.
The name of the DLL contains several components. The "xtp" indicates it is the DLL for a
memory-optimized object. The "t_" indicates it's a table (modules will have a "p_"). Next is
the database_id (in my case, 16) and then the object_id. Finally, the last component
is the xtp_object_id which was mentioned in Chapter 2. This value will change every
time an object is altered, while the object_id will stay the same. (Note that filenames
in SQL Server 2014 were slightly different, and did not include the xtp_object_id
component.)
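Based on that naming convention, a query along these lines (a sketch, not necessarily the exact
query from the listing above) locates the DLL for the dbo.t1 table in the current database:
SELECT name ,
       description
FROM sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL'
  AND name LIKE '%xtp_t_' + CAST(DB_ID() AS VARCHAR(10)) + '_'
                + CAST(OBJECT_ID('dbo.t1') AS VARCHAR(10)) + '%';
GO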
The table DLL for t1 understands the index structures and row format of the table. SQL
Server uses the DLL for traversing indexes and retrieving rows, as well as for determining
the data contents of the rows.
Consider the stored procedure in Listing 7-3, which inserts a million rows into the table t1
from Listing 7-2.
CREATE PROCEDURE dbo.p1
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL=snapshot, LANGUAGE=N'us_english')
DECLARE @i INT = 1000000;
WHILE @i > 0
BEGIN
INSERT dbo.t1
VALUES ( @i, @i + 1 );
SET @i -= 1;
END
END;
GO
EXEC dbo.p1;
GO
Two other requirements when creating a natively compiled module are as follows:
• Use of WITH SCHEMABINDING – guarantees that the tables accessed by the
module are not dropped or altered.
• Use of BEGIN ATOMIC – natively compiled modules must start by defining an
ATOMIC block. This is an ANSI construct that SQL Server allows only inside
natively compiled modules. An ATOMIC block guarantees that the enclosed opera-
tions are processed within a transaction. It starts a transaction if one is not open,
and otherwise it creates a SAVEPOINT.
The DLL for the procedure p1 can interact directly with the DLL for the table t1, as well as
the in-memory OLTP storage engine, to insert the rows very quickly.
The in-memory OLTP compiler leverages the query optimizer to create an efficient execu-
tion plan for each of the queries in the stored procedure. However, due to the limitations of
what can be included in natively compiled modules, the optimizer does not have as many
options to choose from in the plans it develops, when compared to the plans for interpreted
T-SQL code. Note that, for natively compiled stored procedures, the query execution plan is
compiled into the DLL.
SQL Server 2016 does not support automatic recompilation of natively compiled modules, so
if you make changes to table data that result in the statistics being updated, you can use the
sp_recompile procedure to force a natively compiled module to be recompiled at its next
execution. Alternatively, you can ALTER affected modules which will cause all the module's
queries to be recompiled. SQL Server also recompiles natively compiled modules on first
execution after server restart, as well as after fail-over to an AlwaysOn secondary, or after a
database is taken offline and then brought back online again. In any of these cases, the query
optimizer will create new query plans that are subsequently compiled into the module DLLs.
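For example, to force the procedure from Listing 7-3 to be recompiled, and so pick up current
statistics, at its next execution:
EXEC sp_recompile N'dbo.p1';
GO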
As discussed in Chapter 2, there are limitations on the T-SQL constructs that can be included
in a natively compiled module, although the list is greatly reduced in SQL Server 2016
compared to SQL Server 2014. Natively compiled stored modules are intended for short,
basic OLTP operations, so many of the complex query constructs provided in the language
are not allowed. In fact, there are so many restrictions, that the documentation lists the
features that are supported, rather than those that are not. You can find the list at this link:
http://msdn.microsoft.com/en-us/library/dn452279.aspx.
Parameter Sniffing
Interpreted T-SQL modules are compiled into intermediate physical execution plans at first
execution (invocation) time, in contrast to natively compiled modules, which are natively
compiled at creation time. When interpreted modules are compiled at invocation, the values
of the parameters supplied for that invocation are used by the optimizer when generating the
execution plan. This use of parameters during compilation is called "parameter sniffing."
SQL Server does not use parameter sniffing for compiling natively compiled modules. All
parameters to the stored procedure are considered to have UNKNOWN values.
When optimizing queries on memory-optimized tables, the query optimizer faces differences in
the choices it may need to make, and certain execution plan options are not available when working
with these tables. The following subsections describe the most important differ-
ences between optimizing queries on disk-based tables and optimizing queries on memory-
optimized tables.
Hash indexes
There are no ordered scans with hash indexes. If a query is looking for a range of values, or
requires that the results be returned in sorted order, a hash index will not be useful, and the
optimizer will not consider it.
The optimizer cannot use a hash index unless the query filters on all columns in the index
key. The hash index examples in Chapter 4 illustrated an index on just a single column.
However, just like indexes on disk-based tables, hash indexes on memory-optimized tables
can be composite, but the hash function used to determine to which bucket a row belongs is
based on all columns in the index. So, if we had a hash index on (city, state), a row for
a customer from Springfield, Illinois would hash to a completely different value than a row
for a customer from Springfield, Missouri, and also would hash to a completely different
value than a row for a customer from Chicago, Illinois. If a query only supplies a value for
city, a hash value cannot be generated and the index cannot be used, unless the entire index
is used for a scan.
For similar reasons, a hash index can only be used if the filter is based on an equality. If the
query does not specify an exact value for one of the columns in the hash index key, the hash
value cannot be determined. So, if we have a hash index on city, and the query is looking
for city LIKE 'San%', a hash lookup is not possible.
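As a hypothetical illustration of both points (the table, the composite hash index, and the
bucket count below are invented for the example):
CREATE TABLE dbo.CustomerAddress
(
    CustomerID INT         NOT NULL PRIMARY KEY NONCLUSTERED ,
    City       VARCHAR(32) NOT NULL ,
    State      VARCHAR(32) NOT NULL ,
    INDEX ix_hash_city_state HASH (City, State) WITH (BUCKET_COUNT = 1000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- Equality predicates on both index key columns: a hash lookup is possible.
SELECT CustomerID FROM dbo.CustomerAddress
WHERE City = 'Springfield' AND State = 'Illinois';
-- Only part of the key is supplied, or the predicate is not an equality:
-- no hash value can be computed, so the best the index can offer is a full scan.
SELECT CustomerID FROM dbo.CustomerAddress
WHERE City = 'Springfield';
SELECT CustomerID FROM dbo.CustomerAddress
WHERE City LIKE 'San%' AND State = 'California';
GO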
Range indexes
Although range indexes have an inherent ordering, a limitation is that they cannot be scanned
in reverse order. There is no concept of "previous pointers" in a range index on a memory-
optimized table. With on-disk indexes, if a query requests the data to be sorted in DESC
order, the on-disk index could be scanned in reverse order to support this. With in-memory
tables, an index would have to be created as a DESC index. In fact, it is possible to have
two indexes on the same column, one defined as ASC (ascending) and one defined as DESC
(descending). It is also possible to have both a range and a hash index on the same column.
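A hypothetical sketch of that arrangement, using the inline index syntax (the table and index
names are invented for the example):
CREATE TABLE dbo.OrderHeader
(
    OrderID   INT       NOT NULL PRIMARY KEY NONCLUSTERED ,
    OrderDate DATETIME2 NOT NULL ,
    INDEX ix_date_asc  NONCLUSTERED (OrderDate ASC) ,  -- supports ORDER BY OrderDate
    INDEX ix_date_desc NONCLUSTERED (OrderDate DESC)   -- supports ORDER BY OrderDate DESC
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO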
No Halloween protection
Halloween protection is not incorporated into the query plans. Halloween protection provides
guarantees against accessing the same row multiple times during query processing. Opera-
tions on disk-based tables use spooling operators to make sure rows are not accessed repeat-
edly, but this is not necessary for plans on memory-optimized tables.
The storage engine provides Halloween protection for memory-optimized tables by including
a statement ID as part of the row header. Since the statement ID for the statement within a
batch that created a row version is stored with the row, if the same statement encounters that
row again, it knows it has already been processed.
No automatic recompile
Although this was mentioned earlier in this chapter, it bears repeating. Natively compiled
plans will never be recompiled on the fly even when the statistics on the underlying tables
have been updated. SQL Server recompiles natively compiled modules in the following
situations:
• when sp_recompile is run on the module
• when the module is altered
• on first execution after SQL Server restart
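For example, after a large data load has changed the data distribution, you can refresh statistics and then flag the module for recompilation yourself. The table and procedure names below are hypothetical placeholders:

-- Refresh statistics on the memory-optimized table (hypothetical name)...
UPDATE STATISTICS dbo.SalesOrder_inmem;
GO
-- ...then mark the natively compiled module for recompilation, so that its
-- next execution builds a new plan using the updated statistics.
EXEC sys.sp_recompile N'dbo.usp_InsertOrder_native';
GO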
Performance Comparisons
Since the very first versions of SQL Server, stored procedures have been inaccurately
described as being stored in a compiled form. The process of coming up with a query plan
for a batch is also frequently described as compilation. However, until SQL Server 2014 and
in-memory OLTP, what was described as compilation wasn't true compilation. SQL Server
would store query plans in an internal form, after they had been parsed and normalized, but
they were not truly compiled. When executing the plan, the execution engine walks the query
tree and interprets each operator as it is executed, calling appropriate database functions. This
is far more expensive than for a true compiled plan, composed of machine language calls to
actual CPU instructions.
When processing a query, the runtime costs include locking, latching, and disk I/O, and the
relatively small cost and overhead associated with interpreted code, compared to compiled
code, gets "lost in the noise." However, in true performance tuning methodology, there is
always a bottleneck; once we remove one, another becomes apparent. Once we remove the
overhead of locking, latching, and disk I/O, the cost of interpreted code becomes a major
component, and a potential bottleneck.
The only way to substantially speed up processing is to reduce the number of CPU
instructions executed per transaction. Assume that in our system we use one million CPU instructions
per transaction, which results in 100 transactions per second (TPS). To achieve a 10-times
performance improvement, to 1,000 TPS, we would have to decrease the number of instruc-
tions per transaction to 100,000, which is a 90% reduction.
To satisfy the original vision for Hekaton, and achieve a 100-times performance improve-
ment, to 10,000 TPS, would mean reducing the number of instructions per transaction to 10,000,
or a 99% reduction! A reduction of this magnitude would be impossible with SQL Server's
existing interpretive query engine, or with any other interpretive engine.
With natively compiled code, SQL Server 2016 In-Memory OLTP has reduced the number of
instructions executed per transaction by well over 95%, achieving in some cases an improvement in perfor-
mance approaching the 100-times goal.
The script in Listing 7-4 creates a memory-optimized table called bigtable_inmem. This
is a SCHEMA_ONLY memory-optimized table so SQL Server will log the table creation, but
will not log any DML on the table, so the data will not be durable.
USE IMDB;
GO
) WITH ( MEMORY_OPTIMIZED=ON,
DURABILITY=SCHEMA_ONLY );
GO
Note that the commented lines in the script allow you to test the performance with either hash
indexes or range indexes. Running the script as is, the table will have three hash indexes, but you
can experiment by commenting out the hash index definitions and creating the
range indexes instead (see the sketch that follows).
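As a rough sketch, a table definition consistent with the columns used in Listings 7-5 and 7-6, and with the note above about swapping hash indexes for range indexes, might look like the following; the data types, bucket counts, and the commented-out range index alternatives are assumptions, not the book's exact listing.

CREATE TABLE dbo.bigtable_inmem
(
    id            uniqueidentifier NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 2097152 ),
    account_id    int NOT NULL
        INDEX ix_account_id NONCLUSTERED HASH WITH ( BUCKET_COUNT = 65536 ),
    trans_type_id int NOT NULL
        INDEX ix_trans_type_id NONCLUSTERED HASH WITH ( BUCKET_COUNT = 64 ),
    -- To test with range indexes instead, comment out the two column-level
    -- hash indexes above and uncomment these table-level indexes:
    -- INDEX ix_account_id    NONCLUSTERED ( account_id ),
    -- INDEX ix_trans_type_id NONCLUSTERED ( trans_type_id ),
    shop_id       int NOT NULL,
    trans_made    datetime NOT NULL,
    trans_amount  decimal(20,2) NOT NULL
)
WITH ( MEMORY_OPTIMIZED = ON,
       DURABILITY = SCHEMA_ONLY );
GO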
Next, Listing 7-5 creates an interop (not natively compiled) stored procedure called
ins_bigtable that inserts rows into bigtable_inmem. The number of rows to
insert is passed as a parameter when the procedure is called.
------- Create the procedure -------
CREATE PROC ins_bigtable ( @rows_to_INSERT int )
AS
BEGIN
  DECLARE @i int = 1;
  DECLARE @newid uniqueidentifier;
  WHILE @i <= @rows_to_INSERT
  BEGIN
    SET @newid = newid();
    INSERT dbo.bigtable_inmem ( id, account_id, trans_type_id,
                                shop_id, trans_made, trans_amount )
    VALUES ( @newid,
             32767 * rand(),
             30 * rand(),
             100 * rand(),
             dateadd( second, @i, cast( '20170731' AS datetime ) ),
             ( 32767 * rand() ) / 100. );
    SET @i = @i + 1;
  END
END
GO
Listing 7-5: Creating an interop procedure, ins_bigtable, to insert rows into the table.
Finally, Listing 7-6 creates the equivalent natively compiled stored procedure.
CREATE PROC ins_native_bigtable ( @rows_to_INSERT int )
  WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
  ( TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = N'us_english' )
  DECLARE @i int = 1;
  DECLARE @newid uniqueidentifier;
  WHILE @i <= @rows_to_INSERT
  BEGIN
    SET @newid = newid();
    INSERT dbo.bigtable_inmem ( id, account_id, trans_type_id,
                                shop_id, trans_made, trans_amount )
    VALUES ( @newid,
             32767 * rand(),
             30 * rand(),
             100 * rand(),
             dateadd( second, @i, cast( '20130410' AS datetime ) ),
             ( 32767 * rand() ) / 100. );
    SET @i = @i + 1;
  END
END
GO
Listing 7-6: Creating a natively compiled procedure, ins_native_bigtable, to insert rows into the table.
Now we're going to run comparative tests for one million row inserts into bigtable_
inmem, via the interop and natively compiled stored procedures. We'll delete all the rows
from the table before we insert the next million rows. I am even including the DELETE
before the first insertion, in case we want to rerun the tests.
First, Listing 7-7 calls the interop procedure, with a parameter value of 1,000,000, outside of
a user-defined transaction, so each INSERT in the procedure is an autocommit transaction.
DELETE bigtable_inmem;
GO
EXEC ins_bigtable @rows_to_INSERT = 1000000;
GO
Listing 7-7: Inserting a million rows into a memory-optimized table via ins_bigtable.
When I executed the EXEC statement above, it took 1 minute 28 seconds, as indicated in the status bar
in SQL Server Management Studio. You might want to record the amount of time it takes
on your own SQL Server instance.
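The status bar is only accurate to the second; if you prefer to capture the elapsed time in the batch itself, a simple sketch (using the same procedure and parameter as Listing 7-7) is:

DECLARE @start datetime2 = SYSDATETIME();
EXEC ins_bigtable @rows_to_INSERT = 1000000;
-- Report the elapsed time in milliseconds.
SELECT DATEDIFF( millisecond, @start, SYSDATETIME() ) AS elapsed_ms;
GO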
Next, Listing 7-8 calls the interop procedure inside a transaction, so that all the INSERT
operations are a single transaction.
DELETE bigtable_inmem;
GO
BEGIN TRAN
EXEC ins_bigtable @rows_to_INSERT = 1000000;
COMMIT TRAN
Listing 7-8: Inserting a million rows into the table as a single transaction.
When I executed the EXEC above, it took 31 seconds, which was about a third of the time
it took to insert the same number of rows in separate transactions. The savings here are
primarily due to the reduction in the overhead of managing a million separate transactions.
Since this is an interop procedure, each transaction is both a regular SQL Server transaction
and an in-memory OLTP transaction, so there is a lot of overhead. The difference is not due
to any additional logging, because the memory-optimized table is a SCHEMA_ONLY table
and no logging is done at all.
Lastly, Listing 7-9 calls the natively compiled procedure called ins_native_bigtable
with a parameter of 1,000,000.
DELETE bigtable_inmem;
GO
EXEC ins_native_bigtable @rows_to_INSERT = 1000000;
GO
Listing 7-9: Calling the natively compiled procedure, ins_native_bigtable, to insert a million rows.
Running this natively compiled procedure to insert the same 1,000,000 rows took only 5
seconds, about 5% of the time it took to insert the rows through an interop procedure.
Of course, your results may vary depending on the kinds of operations you are performing
and the machine resources you have available; and keep in mind that I was testing this
on a SCHEMA_ONLY table. For this example, I wanted to show you the impact that native
compilation itself can have, without interference from the overhead of the disk writes
performed by the CHECKPOINT process, or from any logging that the query thread would have
to perform.
Listing 7-10: Syntax for procedures to enable statistics collection for natively
compiled procedures.
As you might expect, enabling statistics collection reduces performance, but collecting
statistics at the procedure level with sys.sp_xtp_control_proc_exec_stats
will be less expensive than using sys.sp_xtp_control_query_exec_stats
to gather statistics for every query within every procedure.
If we only need to troubleshoot one, or a few, natively compiled modules, there is a param-
eter for sys.sp_xtp_control_query_exec_stats to enable statistics collection for
a single module, so we can run sys.sp_xtp_control_query_exec_stats once for
each of those modules.
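As a sketch of both calls, using the database and procedure from the earlier listings as placeholders you would substitute with your own objects:

-- Enable procedure-level statistics collection for all natively compiled
-- modules on the instance.
EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 1;
GO
-- Enable query-level statistics collection for a single natively compiled
-- module only.
DECLARE @dbid  int = DB_ID( N'IMDB' ),
        @objid int = OBJECT_ID( N'dbo.ins_native_bigtable' );
EXEC sys.sp_xtp_control_query_exec_stats
     @new_collection_value = 1,
     @database_id = @dbid,
     @xtp_object_id = @objid;
GO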
Summary
This chapter discussed the creation and internal management of natively compiled modules
to access memory-optimized tables. These modules can be stored procedures, triggers,
user-defined scalar functions or user-defined inline table-valued functions. These modules
generate far fewer CPU instructions for the engine to execute than the equivalent interpreted
T-SQL module, and can be executed directly by the CPU, without the need for further compi-
lation or interpretation.
There are some limitations in the T-SQL constructs allowed in natively compiled modules,
and so certain transformations that the optimizer might have chosen are not supported. In
addition, because of differences in the way that memory-optimized tables are organized and
managed, the optimizer often needs to make different choices than it would make from a
similar operation on a disk-based table. We reviewed some of the main differences.
When we access memory-optimized tables, which are also compiled, from natively
compiled modules, we have a highly efficient data access path, and the fastest possible query
processing. We looked at how to run our own performance tests, and also how to collect
performance diagnostic data from some Dynamic Management Views.
Additional Resources
• Native Compilation of Tables and Stored Procedures:
http://preview.tinyurl.com/y8v9fg6q.
• Architectural Overview of SQL Server 2014's In-Memory OLTP Technology:
http://preview.tinyurl.com/lpzkray.
• A peek inside the in-memory OLTP engine:
http://preview.tinyurl.com/q3sb3o3.
• Hekaton: SQL Server's Memory-Optimized OLTP Engine:
http://preview.tinyurl.com/lcg5m4x.
Chapter 8: SQL Server Support and Manageability
SQL Server In-Memory OLTP is an integral part of SQL Server 2016 and, as of SQL Server
2016 Service Pack 1, it is available in the Standard edition, as well as in the Enterprise and
Developer editions. It uses the same management tools, including SQL Server Management
Studio.
Most of the standard SQL Server features work seamlessly with memory-optimized tables.
This chapter will discuss feature support, including the new Native Compilation Advisor,
which will highlight unsupported features in any stored procedures that you wish to convert
to natively compiled procedures, and the Memory Optimization Advisor, which will report
on unsupported features in tables that you might want to convert to memory-optimized
tables. We'll then move on to discuss metrics and metadata objects added to SQL Server
2014 in order to help us manage the objects as well as track memory usage and performance,
including:
• memory allocation, and management using Resource Governor – a key
concern when working with in-memory OLTP databases and objects
• enhancements to system catalog views – such as sys.tables,
sys.indexes, and others
• new xtp (eXtreme Transaction Processing) Dynamic Management Objects,
extended events and performance counters – for performance monitoring and
troubleshooting in-memory OLTP databases.
To round off the chapter, and the book, I'll summarize some of the key points to remember
when designing efficient memory-optimized tables and indexes, and then review consider-
ations for migrating existing tables and procedures over to SQL Server In-Memory OLTP.
Feature Support
In-memory OLTP and databases containing memory-optimized tables support much,
though not all, of the SQL Server feature set. As we've seen throughout the book,
SQL Server Management Studio works seamlessly with memory-optimized tables,
filegroups, and natively compiled procedures. In addition, we can use SQL Server
Data Tools (SSDT), Server Management Objects (SMO) and PowerShell to manage
our memory-optimized objects.
Database backup and restore are fully supported, as is log shipping. In terms of other "High
Availability" solutions, AlwaysOn components are supported, but database mirroring and
replication of memory-optimized tables are unsupported; a memory-optimized table can be a
subscriber in transactional replication, but not a publisher.
Natively compiled stored procedures support only a limited subset of the full T-SQL "surface
area," but the number of supported features has expanded in SQL Server 2016 compared to
SQL Server 2014. Fortunately, SQL Server Management Studio includes a tool called Native
Compilation Advisor, shown in Figure 8-1, which will highlight any constructs contained in
an existing stored procedure that are incompatible with natively compiled procedures.
Another feature that works similarly to the Native Compilation Advisor is the Memory
Optimization Advisor, available from SQL Server Management Studio when you right-click
on a disk-based table. This tool will report on table features that are unsupported, such as
computed columns, SPARSE columns, or IDENTITY columns with an increment other than 1.
Figure 8-3 shows an example of the output from this tool for a table that failed
several checks.
The Memory Optimization Advisor will also provide information such as the estimated
memory requirement for the table if it is converted to be memory optimized. Finally, the
Memory Optimization Advisor can convert the table to a memory-optimized table, if it
doesn't contain any unsupported features.
Planning space requirements for hash indexes is straightforward. Each bucket requires
8 bytes, so the memory required is simply the number of buckets times 8 bytes. Planning
space for range indexes is slightly trickier. The size for a range index depends on both the
size of the index key and the number of rows in the table. We can assume each index row is
8 bytes plus the size of the index key (assume K bytes), so the maximum number of rows
that fit on a page would be 8176 / (K+8). Divide that result into the expected number of rows
to get an initial estimate. Remember that not all index pages are 8 K, and not all pages are
completely full. As SQL Server needs to split and merge pages, it will need to create new
pages and we need to allow space for them, until the garbage collection process removes
them. For more details on how to estimate memory requirements for your memory-optimized
tables, take a look at this article: https://msdn.microsoft.com/library/dn282389.aspx.
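As a worked example under assumed numbers: a hash index declared with 1,048,576 buckets always consumes 1,048,576 * 8 bytes = 8 MB, while a range index with an 8-byte key on a 5-million-row table fits roughly 8176 / (8 + 8) = 511 index rows per page, giving about 9,785 leaf pages, or roughly 76 MB, before allowing for partially full pages and pages awaiting garbage collection. The same arithmetic in T-SQL:

DECLARE @rows    bigint = 5000000,   -- expected number of rows (assumed)
        @buckets bigint = 1048576,   -- declared bucket count (a power of two)
        @K       int    = 8;         -- range index key size in bytes (assumed)
SELECT ( @buckets * 8 ) / 1048576.0                           AS hash_index_MB,
       CEILING( @rows / ( 8176.0 / ( @K + 8 ) ) ) * 8 / 1024  AS range_index_leaf_MB;
GO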
In fact, Resource Governor manages all memory consumed by memory-optimized tables and
their indexes. If we don't map a database to a pool explicitly, then SQL Server will map it
implicitly to the default pool.
Listing 8-1: Creating a resource pool for a database containing memory-optimized tables.
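A sketch of such a statement follows; the pool name matches the binding shown below, but the memory percentages are placeholders to be sized for your own workload, not a recommendation.

-- Create a resource pool for the memory-optimized data in HkDB.
-- The 50% figures are placeholders; size the pool for your own workload.
CREATE RESOURCE POOL HkPool
    WITH ( MIN_MEMORY_PERCENT = 50, MAX_MEMORY_PERCENT = 50 );
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO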
Next, we need to bind the databases that we wish to manage to their respective pools, using
the procedure sp_xtp_bind_db_resource_pool. Note that one pool may contain
many databases, but a database is only associated with one pool at any point in time.
EXEC sp_xtp_bind_db_resource_pool 'HkDB', 'HkPool';
Listing 8-2: Binding the HkDB database to the HkPool resource pool.
Listing 8-3: Taking a database offline and then online to allow memory to be associated
with the new resource pool.
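Memory already allocated for the database stays in its original pool until the database is taken offline and brought back online, so a sketch of that step looks like this (make sure no other connections are using the database first):

USE master;
GO
ALTER DATABASE HkDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO
ALTER DATABASE HkDB SET ONLINE;
GO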
We can remove the binding between a database and a pool using the procedure sp_xtp_
unbind_db_resource_pool, as shown in Listing 8-4. For example, we may wish to
move the database to a different pool, or to delete the pool entirely, to replace it with some
other pool or pools.
Listing 8-4: Removing the binding between a database and a resource pool.
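A sketch of the call for the HkDB database used above:

EXEC sp_xtp_unbind_db_resource_pool 'HkDB';
GO

As with binding, the change to where the memory is tracked only takes full effect after the database has been cycled offline and back online.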
This report shows you the space used by the table rows and the indexes, as well as the
small amount of space used by the system. Remember that hash indexes will have memory
allocated for the declared number of buckets as soon as they're created, so this report will
show memory usage for those indexes before any rows are inserted. For range indexes,
only a very small amount of memory is allocated before any rows are added, and the
memory requirement will depend on the size of the index keys and the number of rows,
as discussed previously.
As a simple example, the query in Listing 8-5 reports which databases on a SQL Server
instance can support memory-optimized tables, based on the requirement of having
a memory-optimized filegroup that contains at least one file. It uses the procedure
sp_MSforeachdb to loop through all databases and print a message for each database
that meets the requirement.
EXEC sp_MSforeachdb 'USE ?
    IF EXISTS ( SELECT 1 FROM sys.filegroups FG
                JOIN sys.database_files F
                  ON FG.data_space_id = F.data_space_id
                WHERE FG.type = ''FX'' AND F.type = 2 )
        PRINT ''?'' + '' can contain memory-optimized tables.'' ';
GO
Listing 8-5: Listing the databases that can contain memory-optimized tables.
A catalog view, sys.hash_indexes, has been added to support hash indexes. This view
is based on sys.indexes, so it has the same columns as that view, with one extra column
added: the bucket_count column shows the number of hash buckets specified for the index.
In SQL Server 2016, the bucket count can be changed by rebuilding the index with
ALTER TABLE ... ALTER INDEX ... REBUILD WITH (BUCKET_COUNT = n), rather than having to drop and recreate the index.
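For example, a quick way to review the declared bucket counts across all hash indexes in the current database:

SELECT OBJECT_NAME( hi.object_id ) AS table_name,
       hi.name                     AS index_name,
       hi.bucket_count
FROM   sys.hash_indexes AS hi
ORDER  BY table_name, index_name;
GO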
In addition, there are several dynamic management objects that provide information
specifically for memory-optimized tables.
• sys.dm_db_xtp_checkpoint_files
Displays information about checkpoint files, including file size, physical location,
and the transaction ID. For the current checkpoint, which has not yet closed, the state
column of this DMV will display UNDER CONSTRUCTION for new files. A
checkpoint closes automatically when the transaction log has grown by 512 MB since the
last checkpoint, or when you issue the CHECKPOINT command.
• sys.dm_xtp_gc_stats
Provides information about the current behavior of the in-memory OLTP garbage
collection process. The parallel_assist_count represents the number of
rows processed by user transactions and the idle_worker_count represents
the rows processed by the idle worker.
• sys.dm_xtp_gc_queue_stats
Provides details of activity on each garbage collection worker queue on the server
(one queue per logical CPU). As described in Chapter 5, the garbage collec-
tion thread adds "work items" to this queue, consisting of groups of "stale" rows,
eligible for garbage collection. By taking regular snapshots of these queue lengths
(a sample snapshot query appears after this list), we can make sure garbage collection
is keeping up with the demand. If the queue lengths remain steady, garbage collection
is keeping up. If the queue lengths are growing over time, this is an indication that
garbage collection is falling behind (and you may need to allocate more memory).
• sys.dm_db_xtp_gc_cycle_stats
For the current database, outputs a ring buffer of garbage collection cycles
containing up to 1024 rows (each row represents a single cycle). As discussed
in Chapter 5, to spread out the garbage collection work, the garbage collection
thread arranges transactions into "generations" according to when they committed
compared to the oldest active transaction. They are grouped into units of 16 trans-
actions across 16 generations as follows:
• Generation 0: Stores all transactions that have committed earlier than the
oldest active transaction and therefore the row versions generated by them can
be immediately garbage collected.
• Generations 1–14: Store transactions with a timestamp greater than the oldest
active transaction, meaning that the row versions they generated can't yet be garbage collected.
Each generation can hold up to 16 transactions, so a total of 224 (14 * 16) transac-
tions can exist in these generations.
• Generation 15: Stores any remaining, most recently committed, transactions that
do not fit in generations 1–14.
• sys.dm_db_xtp_memory_consumers
Reports the database-level memory consumers in the in-memory OLTP database
engine. The view returns a row for each memory consumer that the database
engine uses.
• sys.dm_xtp_transaction_stats
Reports accumulated statistics about transactions that have run since the
server started.
• sys.dm_db_xtp_transactions
Reports the active transactions in the in-memory OLTP database engine
(covered in Chapter 5).
• sys.dm_xtp_threads
(undocumented, for internal use only). Reports on the performance of
the garbage collection threads, whether they are user threads or a
dedicated garbage collection thread.
• sys.dm_xtp_transaction_recent_rows
(undocumented, for internal use only). Provides information that allows the
in-memory OLTP database engine to perform its validity and dependency checks
during post-processing.
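For the garbage collection queues described above, a simple snapshot query that could be captured on a schedule (for example, from a SQL Server Agent job) might look like this:

SELECT queue_id,
       current_queue_depth,
       total_enqueues,
       total_dequeues,
       SYSDATETIME() AS sample_time
FROM   sys.dm_xtp_gc_queue_stats
ORDER  BY queue_id;
GO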
Extended events
The in-memory OLTP engine provides three extended event packages to help in monitoring
and troubleshooting. Listing 8-6 reveals the package names and the number of events in
each package.
SELECT p.name AS PackageName ,
COUNT(*) AS NumberOfEvents
FROM sys.dm_xe_objects o
JOIN sys.dm_xe_packages p ON o.package_guid = p.guid
WHERE p.name LIKE 'XTP%'
GROUP BY p.name;
GO
Listing 8-6: Retrieve package information for in-memory OLTP extended events.
Listing 8-7 returns the names of all 178 of the extended events currently available in the
in-memory OLTP packages.
SELECT p.name AS PackageName ,
o.name AS EventName ,
o.description AS EventDescription
FROM sys.dm_xe_objects o
JOIN sys.dm_xe_packages p ON o.package_guid = p.guid
WHERE p.name LIKE 'XTP%';
GO
Listing 8-7: Retrieve the names of the in-memory OLTP extended events.
Performance counters
The in-memory OLTP engine provides performance counters to help in monitoring and
troubleshooting. Listing 8-8 returns the performance counters currently available.
SELECT object_name AS ObjectName ,
counter_name AS CounterName
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%XTP%';
GO
Listing 8-8: Retrieve the names of the in-memory OLTP performance counters.
My results show 57 counters for seven different objects. The objects are listed and described
in Table 8-2. The object names start with the string "SQL Server 2016 XTP" followed by one
of the names shown.
Cursors: Contains counters related to internal XTP engine cursors. Cursors are the low-level building blocks that the XTP engine uses to process T-SQL queries. As such, you do not typically have direct control over them.
Garbage Collection: Contains counters related to the XTP engine's garbage collector. Counters include the number of rows processed, the number of scans per second, and the number of rows expired per second.
I/O Governor: Contains counters related to the in-memory OLTP I/O Rate Governor, which must be enabled using an undocumented trace flag. These counters should only be used when troubleshooting under the direction of Microsoft SQL Server support.
Phantom Processor: Contains counters related to the XTP engine's phantom processing subsystem. This component is responsible for detecting phantom rows in transactions running at the SERIALIZABLE isolation level.
Storage: Contains counters related to the checkpoint files. Counters include the number of checkpoints closed, and the number of files merged.
Transaction Log: Contains counters related to XTP transaction logging in SQL Server. Counters include the number of log bytes and the number of log records per second written by the in-memory OLTP engine.
Transactions: Contains counters related to XTP engine transactions in SQL Server. Counters include the number of commit dependencies taken and the number of commit dependencies that have been rolled back.
Relatively speaking, in-memory OLTP is still a new technology and best practices are still
being discovered as more and more production applications are written to make use of
memory-optimized tables and natively compiled objects.
Migration Considerations
Although in-memory OLTP might sound like a panacea for all your relational database
performance problems, it isn't, of course. There are some applications that can experience
enormous improvement when using memory-optimized tables and natively compiled stored
procedures, and others that will see no drastic gains, or perhaps no gains at all.
Finally, the code to process the INSERTs must be run repeatedly for each row inserted,
and when using interop T-SQL this imposes a lot of overhead. If the code to process the
INSERTs meets the criteria for creating a natively compiled procedure, executing the
INSERTs through compiled code can improve performance dramatically, as demonstrated
in Chapter 7.
CPU-intensive operations
A common requirement is to load large volumes of data, as discussed previously, but then
to process the data in some way, before it is available for reading by the application. This
processing can involve updating or deleting some of the data, if it is deemed inappropriate, or
it can involve computations to put the data into the proper form for use.
The biggest bottleneck that the application will encounter in this case is the locking and
latching as the data is read for processing, and then the CPU resources required once
processing is invoked, which will vary depending on the complexity of the code executed.
As discussed, in-memory OLTP can provide a solution for all of these bottlenecks.
For example, an application might use the READPAST hint to manage work
queues, which requires SQL Server to use locks in order to find the next row in
the queue to process. Alternatively, let's say the application is written to expect the
behavior delivered by accessing disk-based tables using SNAPSHOT isolation. In
the event of a write-write conflict, the correct functioning of the application code
may rely on the expectation that SQL Server will not report the conflict until the
first process commits. This expected behavior is incompatible with that delivered
by the use of SNAPSHOT isolation with memory-optimized tables (the standard
isolation level when accessing memory-optimized tables). If an application relies
on specific locking behavior, then you'll need to delay converting to in-memory
OLTP until you can rewrite the relevant sections of your code.
Current applications
As noted earlier, there were relatively few applications using memory-optimized tables in a
production environment in SQL Server 2014, but the list is growing rapidly for SQL Server
2016 with the addition of many new features supported with memory-optimized tables and
natively compiled procedures. When considering a migration, you might want to review
the published information regarding the types of application that are already benefiting
from running SQL Server In-Memory OLTP. For example (the URLs refer to Microsoft
case studies):
• Bwin.party (http://preview.tinyurl.com/y8uv3vog), the world's largest regulated
online gaming company. SQL Server 2014 allows bwin to scale its applications
to 250 K requests a second, a 16x increase from before, and to provide an overall
faster and smoother customer playing experience.
• Moneris (https://customers.microsoft.com/en-us/story/moneris) is one of the
largest payment processors in North America, serving 350,000 merchant locations
and more than 3 billion transactions each year. They are using in-memory OLTP in
SQL Server 2016 to support over 600 transactions a second, also taking advantage
of the Always Encrypted feature and the new load-balancing capabilities in Always
On Availability Groups.
• Derivco (http://preview.tinyurl.com/y7z7mwvq) manages online gaming and
sports betting, and after first developing products using SQL Server 2014,
upgraded to SQL Server 2016 and immediately reported a 25-percent increase in the
number of concurrent players they could support. They then made minor adjust-
ments to allow the system to use in-memory OLTP, and the scalability improved
beyond all expectations. A system that used to support 5,000 players now supports
30,000 players! They are confident that, with a few architectural changes, that
number will increase to over 100,000.
• Infosys (https://customers.microsoft.com/en-us/story/infosyssql) is a global leader
in technology services and consulting with clients in more than 50 countries.
Infosys needed an analytics system that could provide real-time data for business
analytics while not compromising on performance or security issues of core
systems. They decided to migrate from SQL Server 2012 to SQL Server 2016 to
leverage the in-memory OLTP and columnstore index features to enable real-time
analytics data to be available on the OLTP systems with no performance impact.
Consider the following list of steps as a guide, as you work through a migration to
in-memory OLTP:
1. Capture baseline performance metrics running queries against existing tables.
2. Identify the tables with the biggest bottlenecks.
3. Address the constructs in the table DDLs that are not supported for memory-
optimized tables. The Memory Optimization Advisor can tell you what constructs
are unsupported for memory-optimized tables.
4. Recreate the tables as in-memory, to be accessed using interop code.
5. Identify any procedures, or sections of code, which experience performance
bottlenecks when accessing the converted tables.
6. Address the T-SQL limitations in the code. If the code is in a stored procedure,
you can use the Native Compilation Advisor. Recreate the code in a natively
compiled procedure.
7. Compare performance against the baseline.
You can think of this as a cyclical process. Start with a few tables and convert them, then
convert the most critical procedures that access those tables, convert a few more tables, and
then a couple more procedures. You can repeat this cycle as needed, until you reach the point
where the performance gains are minimal.
You can also consider running a report called Transaction Performance Analysis to
help with the performance analysis prior to migrating to in-memory OLTP. This report, part of
what was known as the AMR tool in SQL Server 2014, will provide recommendations on the tables
and procedures that may benefit most from
migrating to in-memory OLTP. It is available from the list of standard reports that you'll see
when right-clicking a database name in Object Explorer and clicking Reports | Standard
Reports | Transaction Performance Analysis Overview. A page will open with two options
for reporting.
One of the reports will describe which tables are prime candidates for conversion to memory-
optimized tables, as well as providing an estimate of the size of the effort required to perform
the conversion, based on how many unsupported features the table currently uses. For
example, it will point out unsupported data types and constraints used in the table.
Another report will contain recommendations on which stored procedures might benefit from
being converted to natively compiled procedures for use with memory-optimized tables.
Based on recommendations from these reports, you can start converting tables into
memory-optimized tables one at a time, starting with the ones that would benefit most
from the memory-optimized structures. As you start seeing the benefit of the conversion to
memory-optimized tables, you can continue to convert more of your tables, but access them
using your normal T-SQL interface, with very few application changes.
Once your tables have been converted, you can then start planning a rewrite of the code into
natively compiled stored procedures, again starting with the ones that the reports indicate
would provide the most benefit.
Summary
Using SQL Server In-Memory OLTP, we can create and work with tables that are memory-
optimized and extremely efficient to manage, often providing performance optimization for
OLTP workloads. They are accessed with true multi-version optimistic concurrency control
requiring no locks or latches during processing. All in-memory OLTP memory-optimized
tables must have at least one index, and all access is via indexes. In-memory OLTP memory-
optimized tables can be referenced in the same transactions as disk-based tables, with only
a few restrictions. Natively compiled stored procedures are the fastest way to access your
memory-optimized tables and to perform business logic computations.
If most, or all, of an application's data can be entirely memory resident, then there is no
wait time required for disk reads, which for disk-based tables is frequently one of the most
significant causes of query waits. However, once the waits for disk reads are removed,
other wait types, such as waiting for locks to be released, for latches to become
available, or for log writes to complete, become disproportionately large.
In-memory OLTP addresses all these issues. It removes the issues involved in waiting for
locks to be released, using a new type of multi-version optimistic concurrency control. It
also reduces the delays of waiting for log writes by generating far less log data, and needing
fewer log writes.
Additional Resources
• Managing Memory for In-Memory OLTP:
http://msdn.microsoft.com/en-us/library/dn465872.aspx.
• Using the Resource Governor – extensive white paper written when the
feature was introduced in SQL Server 2008:
http://preview.tinyurl.com/yd9kq6t8.
• Resource Governor in SQL Server 2012 – covers significant changes
in this release:
http://msdn.microsoft.com/en-us/library/jj573256.aspx.
• Extended Events – the best place to get a start on working with extended
events is in the SQL Server documentation:
https://msdn.microsoft.com/en-us/library/bb630282(v=sql.130).aspx.
• Common Workload Patterns and Migration Considerations – types of
bottlenecks and workloads that are most suited to in-memory OLTP:
http://msdn.microsoft.com/en-us/library/dn673538.aspx.
• Transact-SQL Constructs Not Supported by In-Memory OLTP – recom-
mended workarounds for the current limitations in support for the T-SQL surface:
http://msdn.microsoft.com/en-us/library/dn246937(v=sql.130).aspx.
• Statistics for Memory-Optimized Tables:
https://msdn.microsoft.com/en-us/library/dn232522(v=sql.130).aspx.
• Management Data Warehouse:
https://msdn.microsoft.com/en-us/library/bb677306.aspx.
• Migrating to In-Memory OLTP:
https://msdn.microsoft.com/en-us/library/dn247639.aspx.
• Determining if a Table or Stored Procedure Should Be Ported to
In-Memory OLTP:
https://msdn.microsoft.com/en-us/library/dn205133.aspx.
• The coming in-memory database tipping point:
http://preview.tinyurl.com/ybre4m9o.