Dumbriguda Data
Junior DBAs
Mid-level DBAs
Senior DBAs
DBA consultants
Manager or director of database administration/information technology
Data Architects
Release Managers
Change Managers
Main Responsibilities
As I mentioned, today’s jobs are more demanding. Nowadays DBAs perform many roles, and as they
specialise in those roles they grow into one of the following:
Production Support DBAs: These DBAs are focused on the physical aspects of database
administration such as DBMS installation, configuration, patching, upgrades, backups, restores,
refreshes, performance optimization, maintenance and disaster recovery.
Development DBAs: These DBAs are focused on the logical and development aspects of
database administration such as data model design and maintenance, DDL (data definition
language) generation, SQL writing and tuning, coding stored procedures, collaborating with
developers to help choose the most appropriate DBMS feature/functionality and other pre-
production activities.
Application DBAs: These DBAs are usually found in organizations that have purchased 3rd
party application software such as ERP (enterprise resource planning) and CRM (customer
relationship management) systems.
Hybrid DBAs: As the name suggests, these DBAs perform almost all of the tasks mentioned
above. They might not be specialised, but they are the ones who grow into architects at the
enterprise level.
SQL Server 2012 Standard Edition provides the core relational database engine and basic
business intelligence (BI) capabilities.
Limited to 16 Cores
64GB of RAM
It doesn’t include support for the advanced availability features or the more powerful BI
features such as PowerPivot, Power View, and Master Data Services.
It includes support for two-node AlwaysOn Failover Clusters, and it’s licensed either per
core or per server. We will talk about licensing in the next chapter, so don’t worry.
SQL Server 2012 Business Intelligence Edition
SQL Server 2012 Business Intelligence Edition is a new member of the SQL Server
family.
The database engine is limited to 16 cores and 64GB of RAM. However, because this is the
BI edition, Analysis Services and Reporting Services can use the maximum number of cores
supported by the OS.
This Edition includes all of the features in the Standard edition and support for advanced
BI features such as Power View and PowerPivot, but it lacks support for the advanced
availability features like AlwaysOn Availability Groups and other online operations. The
BI Edition supports two-node AlwaysOn Failover Clusters, and it’s licensed per server.
SQL Server 2012 Enterprise Edition has everything that SQL Server offers.
It supports the maximum number of cores and RAM in the host OS and provides the
complete SQL Server feature set, including support for all of the advanced availability
and BI features.
The Enterprise edition supports up to 16-node AlwaysOn Failover Clusters, AlwaysOn
Availability Groups, PowerPivot, Power View, Master Data Services, advanced auditing,
transparent data encryption, the ColumnStore index, and more.
The Enterprise edition is licensed per core.
SQL Server 2012 offers a free edition, SQL Server Express Edition.
The Express editions are limited to support for one CPU and 1GB of RAM. Databases are
limited to 10GB per database.
A new option called LocalDB is also available.
SQL Server 2012 Web Edition and SQL Server 2012 Developer Edition are the same as in previous
versions. The Developer edition provides the same feature set as the Enterprise edition.
However, it’s licensed per developer and can’t be used for production work. The Web edition is
licensed only to hosting companies with a Services Provider License Agreement (SPLA).
Every SQL Server instance relies on five primary system databases, each of which must be present
for the server to operate effectively. If you are new to SQL Server, or have never set up a
server from scratch, these databases may seem mysterious. Since you have already installed
SQL Server 2012, they should be easy to understand. It’s important to understand their purposes
and some key activities that you should be doing. Remember, all of them except tempdb and the
Resource database should be backed up on a regular basis.
Now, connect to your SQL Server instance and see where these databases are. Open
Management Studio and expand Databases > System Databases to see your very own
databases. Let’s talk about each of them one by one.
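If you prefer a query to clicking through Management Studio, here is a minimal sketch that lists the visible system databases. It assumes the usual numbering in which master, tempdb, model and msdb hold database_id values 1 to 4; the Resource database is hidden and will not appear in the result.

```sql
-- List the visible system databases and their recovery models.
-- Assumption: system databases occupy database_id 1 to 4.
SELECT name, database_id, recovery_model_desc
FROM sys.databases
WHERE database_id <= 4
ORDER BY database_id;
```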
System Databases
Master
Master is THE most important database required to run SQL Server. Although all of the system
databases are required for SQL Server to be up and running, the master database stores the basic
configuration information for the server instance, and without it SQL Server will not run at all.
The master database contains these important things:
Information about all databases and their logical and physical files
Information about user logins
Server configuration settings
Startup procedures
You should be careful when working with the master database: you shouldn’t be
altering, updating or deleting any object or record in it. Doing so may crash your SQL Server.
However, I will not stop you, and you should try modifying some of its objects in your test
environment. Try crashing it and recovering it; that’s how you will learn. Don’t attempt it in
production.
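A safe way to explore master is to read from it rather than change it. The queries below are a read-only sketch of where each kind of information in the list above can be seen through catalog views; the login filter (SQL logins, Windows logins and Windows groups) is my own choice for illustration.

```sql
USE master;
GO
-- Databases and their logical and physical files.
SELECT DB_NAME(database_id) AS database_name, name AS logical_name, physical_name
FROM sys.master_files;

-- Logins known to the instance (S = SQL login, U = Windows login, G = Windows group).
SELECT name, type_desc
FROM sys.server_principals
WHERE type IN ('S', 'U', 'G');

-- Server configuration settings and the values currently in use.
SELECT name, value_in_use
FROM sys.configurations
ORDER BY name;

-- Procedures in master that are marked to run at startup.
SELECT name
FROM sys.procedures
WHERE OBJECTPROPERTY(object_id, 'ExecIsStartup') = 1;
```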
Model
The model database is a template database that is copied into a new database whenever it is
created on the instance. Even on a system where new databases are created infrequently, the
model database must exist because it is used to create tempdb every time the server starts. It’s a
best practice to back up model whenever a change is made. This includes
MSDB
The msdb database is used to support a number of technologies within SQL Server, including the
SQL Server Agent, SQL Server Management Studio, Database Mail, and Service Broker.
MSDB stores
Depending on the workload, this database can grow out of control, so you should regularly purge
the history that you no longer need. It’s important to back up msdb at regular intervals. In some
cases, changing its recovery model from Simple to Full is advisable.
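As a rough sketch of msdb housekeeping, the script below purges old backup and job history and then checks the recovery model. The 90-day retention period is a hypothetical value chosen only for illustration; pick one that matches your own policy. The ALTER DATABASE line is left commented out because switching to Full only makes sense if you also take log backups of msdb.

```sql
-- Hypothetical retention: keep 90 days of history.
DECLARE @cutoff datetime = DATEADD(DAY, -90, GETDATE());

-- Remove backup and restore history older than the cutoff.
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;

-- Remove SQL Server Agent job history older than the cutoff.
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @cutoff;

-- Check the current recovery model of msdb.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'msdb';

-- Uncomment only if you plan to take log backups of msdb.
-- ALTER DATABASE msdb SET RECOVERY FULL;
```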
TEMPDB
The tempdb system database is like a shared temporary storage resource used by a number of
features of SQL Server, and made available to all users. Tempdb is used for
Temporary objects
Worktables
Online index operations
Cursors
Temporary tables and table variables
It is recreated every time SQL Server is restarted, which means that no objects in tempdb
are permanently stored. Since tempdb is non-permanent storage, backups and restores are not
allowed for this database.
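To see tempdb at work, here is a small sketch that creates a temporary table and a table variable. The names #Scratch and @ScratchVar are made up for this example. Run the whole script as one batch, because the table variable exists only for the duration of the batch.

```sql
-- A local temporary table: stored in tempdb, visible only to this session.
CREATE TABLE #Scratch (Id int NOT NULL, Note varchar(50) NULL);
INSERT INTO #Scratch (Id, Note) VALUES (1, 'stored in tempdb');

-- A table variable: also backed by tempdb, scoped to this batch.
DECLARE @ScratchVar TABLE (Id int NOT NULL);
INSERT INTO @ScratchVar (Id) VALUES (1);

-- The temporary table appears in tempdb's catalog under a system-generated name
-- that starts with the name you gave it.
SELECT name
FROM tempdb.sys.tables
WHERE name LIKE '#Scratch%';

DROP TABLE #Scratch;
```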
Resource database
The Resource database is a read-only database that contains all the system objects that are
included with SQL Server. This database is written at the time of SQL server installation only.
SQL Server system objects, such as sys.objects, are physically persisted in the Resource
database, but they logically appear in the sys schema of every database. User data or user
metadata is not stored in this database because it’s created at the time of installation only and no
user or user activity is involved at this time. It makes upgrading to a new version of SQL Server
an easier and faster procedure.
By now you must have seen all the databases except Resource, because the Resource database is not
visible in Management Studio. However, if you want, you can find its files in
<drive>:\Program Files\Microsoft SQL Server\MSSQL11.<instance_name>\MSSQL\Binn\
Try locating your Resource database file, mssqlsystemresource.mdf
Database Objects:
Technically, everything within a database is an object. However, let’s talk about some important
database objects. Before going through them, you must create a user database. To do so,
open Management Studio, connect to your instance, right-click Databases and choose
New Database. Follow the on-screen instructions to create the database.
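The same thing can be done with a script. This is a minimal sketch in which PracticeDB is a hypothetical name; with no file options specified, the data and log files are created in the instance's default locations.

```sql
-- PracticeDB is a hypothetical name used for illustration.
CREATE DATABASE PracticeDB;
GO
USE PracticeDB;
GO
-- Confirm where the data and log files were placed.
SELECT name, physical_name, type_desc
FROM sys.database_files;
```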
Table
A SQL Server database stores all information in two-dimensional objects made up of rows and
columns; this structure is called a table. In general, there are system tables and user tables.
Data types
Data types specify the type of data that can be stored in a column of a table. Data types are used
to apply data integrity to the column. SQL Server supports many data types, such as char, varchar,
int, binary, decimal and money. You can also create your own data type (a user-defined
data type) based on a system data type.
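Here is a small sketch that puts tables and data types together. The type dbo.PhoneNumber, the table dbo.Customer and all column names are invented for this example; run it in your practice database. The script cleans up after itself.

```sql
-- A user-defined (alias) data type built on a system data type.
CREATE TYPE dbo.PhoneNumber FROM varchar(15) NOT NULL;
GO
-- A user table whose columns use several system types plus the alias type.
CREATE TABLE dbo.Customer
(
    CustomerId  int           NOT NULL,
    FullName    varchar(100)  NOT NULL,
    Phone       dbo.PhoneNumber,
    CreditLimit money         NULL,
    Discount    decimal(5, 2) NULL
);
GO
INSERT INTO dbo.Customer (CustomerId, FullName, Phone, CreditLimit, Discount)
VALUES (1, 'Asha Rao', '555-0100', 2500.00, 7.50);

SELECT CustomerId, FullName, Phone, CreditLimit, Discount
FROM dbo.Customer;
GO
-- Clean up: the table must be dropped before the type it depends on.
DROP TABLE dbo.Customer;
DROP TYPE dbo.PhoneNumber;
```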
Function
Microsoft SQL Server allows you to create functions. These are known as user-defined
functions. A function represents business logic using one or more Transact-SQL statements. It can
accept parameters and can return a scalar value or a table. It can be used inside a SQL
statement, for example in a SELECT list or a WHERE clause, which is an advantage over a stored
procedure.
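A minimal sketch of a scalar user-defined function follows. The name dbo.fn_NetPrice and its parameters are hypothetical; the point is only that the function is called from inside a SELECT statement.

```sql
-- Scalar user-defined function: returns a single value.
CREATE FUNCTION dbo.fn_NetPrice (@price money, @discountPct decimal(5, 2))
RETURNS money
AS
BEGIN
    RETURN @price - (@price * @discountPct / 100);
END;
GO
-- The function is used directly inside a query.
SELECT dbo.fn_NetPrice(200.00, 10.00) AS NetPrice;
GO
DROP FUNCTION dbo.fn_NetPrice;
```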
Index
An index can be thought of as the index of a book, which is used for fast retrieval of information.
An index uses one or more key columns, plus pointers to the records, to locate a record. Indexes
are used to speed up query performance. The main kinds of index are clustered and nonclustered.
Both are stored as B-tree structures. We will talk about indexes in detail later.
Constraint
Using constraints, SQL Server enforces the integrity of the database. A constraint defines rules
that keep unwanted data out of a column. Constraints can be table constraints or column
constraints. We will talk about constraints in detail later. For now, a quick note about the
different constraints.
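To make the idea concrete, the sketch below declares several common constraint types (PRIMARY KEY, UNIQUE, CHECK and DEFAULT) on one hypothetical table, dbo.Employee. Most are written as column constraints; the last one is a table constraint because it spans two columns. The INSERT is expected to fail, which is the constraint doing its job.

```sql
CREATE TABLE dbo.Employee
(
    EmployeeId int          NOT NULL CONSTRAINT PK_Employee PRIMARY KEY,
    Email      varchar(100) NOT NULL CONSTRAINT UQ_Employee_Email UNIQUE,
    Salary     money        NOT NULL CONSTRAINT CK_Employee_Salary CHECK (Salary > 0),
    HireDate   date         NOT NULL CONSTRAINT DF_Employee_HireDate DEFAULT (GETDATE()),
    FirstName  varchar(50)  NOT NULL,
    LastName   varchar(50)  NOT NULL,
    -- Table constraint: it covers more than one column.
    CONSTRAINT UQ_Employee_Name UNIQUE (FirstName, LastName)
);
GO
-- This row violates the CHECK constraint and is rejected with an error.
INSERT INTO dbo.Employee (EmployeeId, Email, Salary, FirstName, LastName)
VALUES (1, 'asha@example.com', -10, 'Asha', 'Rao');
GO
DROP TABLE dbo.Employee;
```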
Stored Procedures
A stored procedure is a compiled set of Transact-SQL statements. Business logic can be
encapsulated in a stored procedure. It reduces network traffic by running a set of Transact-SQL
statements in a single call.
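A short sketch of a stored procedure follows. The table dbo.Orders and the procedure dbo.usp_AddOrder are hypothetical names. Notice that a single EXEC call from the client runs both statements inside the procedure.

```sql
CREATE TABLE dbo.Orders
(
    OrderId int   NOT NULL PRIMARY KEY,
    Amount  money NOT NULL
);
GO
CREATE PROCEDURE dbo.usp_AddOrder
    @OrderId int,
    @Amount  money
AS
BEGIN
    SET NOCOUNT ON;
    -- Insert the new order, then return running totals.
    INSERT INTO dbo.Orders (OrderId, Amount) VALUES (@OrderId, @Amount);
    SELECT COUNT(*) AS OrderCount, SUM(Amount) AS TotalAmount FROM dbo.Orders;
END;
GO
-- One round trip from the client runs both statements.
EXEC dbo.usp_AddOrder @OrderId = 1, @Amount = 99.50;
GO
DROP PROCEDURE dbo.usp_AddOrder;
DROP TABLE dbo.Orders;
```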
Trigger
A trigger is a special type of event-driven stored procedure. It fires when an INSERT, DELETE
or UPDATE event occurs. It can be used to maintain referential integrity. A trigger can call a
stored procedure.
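As an illustration, here is a sketch of an AFTER UPDATE trigger that records price changes in an audit table. All object names (dbo.Product, dbo.PriceAudit, dbo.trg_Product_PriceChange) are invented for this example.

```sql
CREATE TABLE dbo.Product
(
    ProductId int   NOT NULL PRIMARY KEY,
    Price     money NOT NULL
);
CREATE TABLE dbo.PriceAudit
(
    ProductId int      NOT NULL,
    OldPrice  money    NOT NULL,
    NewPrice  money    NOT NULL,
    ChangedAt datetime NOT NULL DEFAULT (GETDATE())
);
GO
CREATE TRIGGER dbo.trg_Product_PriceChange
ON dbo.Product
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- "deleted" holds the old row versions and "inserted" holds the new ones.
    INSERT INTO dbo.PriceAudit (ProductId, OldPrice, NewPrice)
    SELECT d.ProductId, d.Price, i.Price
    FROM deleted AS d
    JOIN inserted AS i ON i.ProductId = d.ProductId
    WHERE i.Price <> d.Price;
END;
GO
INSERT INTO dbo.Product (ProductId, Price) VALUES (1, 10.00);
UPDATE dbo.Product SET Price = 12.00 WHERE ProductId = 1;

-- The update above fired the trigger, so the audit table now has a row.
SELECT ProductId, OldPrice, NewPrice, ChangedAt FROM dbo.PriceAudit;
GO
DROP TABLE dbo.Product;     -- dropping the table also drops its trigger
DROP TABLE dbo.PriceAudit;
```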
View
A view can be created to retrieve data from one or more tables. The query used to create a view
can include other views in the database. You can also access remote data using a distributed
query in a view.
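A minimal sketch of a view over two tables follows. The tables dbo.Dept and dbo.Staff and the view dbo.vw_StaffWithDept are hypothetical names.

```sql
CREATE TABLE dbo.Dept
(
    DeptId   int         NOT NULL PRIMARY KEY,
    DeptName varchar(50) NOT NULL
);
CREATE TABLE dbo.Staff
(
    StaffId   int         NOT NULL PRIMARY KEY,
    StaffName varchar(50) NOT NULL,
    DeptId    int         NOT NULL
);
GO
-- The view stores the query, not the data.
CREATE VIEW dbo.vw_StaffWithDept
AS
SELECT s.StaffId, s.StaffName, d.DeptName
FROM dbo.Staff AS s
JOIN dbo.Dept AS d ON d.DeptId = s.DeptId;
GO
INSERT INTO dbo.Dept (DeptId, DeptName) VALUES (1, 'Finance');
INSERT INTO dbo.Staff (StaffId, StaffName, DeptId) VALUES (100, 'Asha Rao', 1);

-- Query the view exactly as you would query a table.
SELECT StaffId, StaffName, DeptName FROM dbo.vw_StaffWithDept;
GO
DROP VIEW dbo.vw_StaffWithDept;
DROP TABLE dbo.Staff;
DROP TABLE dbo.Dept;
```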
Users
Each database has users who are allowed to access data within the database. A user is the identity
of a login when it is connected to the database. A database user can have the same name as its
login. However, this is not required: you can map a login to a database user with a different name.
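The sketch below creates a login and maps it to a database user with a different name. The names app_login and app_user and the password are placeholders for illustration only; it assumes you have permission to create logins on a test instance.

```sql
-- Server-level login (placeholder name and password).
CREATE LOGIN app_login WITH PASSWORD = 'Str0ng!Passw0rd#2012';
GO
-- Database user with a different name, mapped to that login.
CREATE USER app_user FOR LOGIN app_login;
GO
-- Show the mapping: the user and the login share the same SID.
SELECT dp.name AS database_user, sp.name AS login_name
FROM sys.database_principals AS dp
JOIN sys.server_principals AS sp ON sp.sid = dp.sid
WHERE dp.name = N'app_user';
GO
DROP USER app_user;
DROP LOGIN app_login;
```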
Schema
A database schema is a way to logically group objects such as tables, views, stored procedures
etc. Think of a schema as a container of objects.
Synonyms
A synonym is an alias or alternate name for a table, view, sequence, or other schema object.
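Schemas and synonyms are easy to try together. In this sketch, Sales, Sales.Invoice and dbo.Inv are hypothetical names: the schema groups the table, and the synonym gives the table a shorter alternate name.

```sql
-- A schema is a container for objects.
CREATE SCHEMA Sales;
GO
CREATE TABLE Sales.Invoice (InvoiceId int NOT NULL PRIMARY KEY);
GO
-- A synonym is an alternate name for the table.
CREATE SYNONYM dbo.Inv FOR Sales.Invoice;
GO
-- Writing through the synonym changes the underlying table.
INSERT INTO dbo.Inv (InvoiceId) VALUES (1);
SELECT InvoiceId FROM Sales.Invoice;
GO
DROP SYNONYM dbo.Inv;
DROP TABLE Sales.Invoice;
DROP SCHEMA Sales;
```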
You should go and try creating every object listed above. You will get a better
understanding only after practising. Once you are done, check your progress.
Database Keys:
Let’s keep this as easy as possible for a better understanding of the concepts. A point to remember:
the same definitions, principles and naming apply equally to Entity Modelling and Normalisation.
Keys, as the name suggests, are a part of a relational database and an important part of the
structure of a table. They ensure each record within a table can be uniquely identified by one or a
combination of fields within the table. Fields are nothing but columns. Keys help enforce integrity
and help identify the relationships between tables. There are three main types of keys:
Candidate keys
Primary keys
Foreign keys.
There is also an alternative key or secondary key that can be used, as the name suggests, as a
secondary or alternative key to the primary key.
Super Key
A Super key is any combination of fields within a table that uniquely identifies each record
within that table.
Candidate Key
A candidate key is a minimal super key. Let’s say a table has 5 super keys; then one or more of
them can be candidate keys. A candidate key is a single field or the least combination of
fields that uniquely identifies each record in the table. The least combination of fields is what
distinguishes a candidate key from a super key. Every table must have at least one candidate key,
but it can have several. To be eligible as a candidate key, a key must pass
certain criteria:
It must contain unique values
It must not contain null values
It contains the minimum number of fields to ensure uniqueness
It must uniquely identify each record in the table
Once your candidate keys have been identified, you can select one to be your primary key.
Primary Key
A primary key is a candidate key that is most appropriate to be the main reference key for the
table. As its name suggests, it is the primary key of reference for the table and is used throughout
the database to help establish relationships with other tables. As with any candidate key the
primary key must contain unique values, must never be null and uniquely identify each record in
the table.
Ideally every table has a primary key, although SQL Server does not force you to define one. Each
record in the table must have a value for its primary key. When choosing a primary key from the
pool of candidate keys, it is always advisable to choose a single simple key rather than a
composite key.
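Here is a small sketch of how candidate keys turn into constraints. In the hypothetical table dbo.StudentInfo there are two candidate keys, StudentId and Email; one is chosen as the primary key and the other is kept unique as an alternative key. The final query tests whether FirstName plus LastName would also be unique in the current data. Keep in mind that passing this test on today's rows does not prove the combination will stay unique.

```sql
CREATE TABLE dbo.StudentInfo
(
    StudentId int          NOT NULL,
    Email     varchar(100) NOT NULL,
    FirstName varchar(50)  NOT NULL,
    LastName  varchar(50)  NOT NULL,
    CONSTRAINT PK_StudentInfo PRIMARY KEY (StudentId),   -- the chosen candidate key
    CONSTRAINT UQ_StudentInfo_Email UNIQUE (Email)       -- an alternative key
);
GO
INSERT INTO dbo.StudentInfo (StudentId, Email, FirstName, LastName)
VALUES (1, 'asha@example.com', 'Asha', 'Rao'),
       (2, 'ravi@example.com', 'Ravi', 'Rao');

-- Any row returned here is a duplicate, which would rule the combination out.
SELECT FirstName, LastName, COUNT(*) AS Occurrences
FROM dbo.StudentInfo
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1;
GO
DROP TABLE dbo.StudentInfo;
```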
Foreign Key
A foreign key is generally a primary key from one table that appears as a field in another where
the first table has a relationship to the second. In other words, if we had a table A with a primary
key X that linked to a table B where X was a field in B, then X would be a foreign key in B.
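The table A and table B description can be sketched directly. The names dbo.TableA, dbo.TableB and BId are made up for this example; X is the primary key of A and a foreign key in B. The last INSERT is expected to fail because it refers to a value that does not exist in A.

```sql
CREATE TABLE dbo.TableA
(
    X int NOT NULL CONSTRAINT PK_TableA PRIMARY KEY
);
CREATE TABLE dbo.TableB
(
    BId int NOT NULL CONSTRAINT PK_TableB PRIMARY KEY,
    X   int NOT NULL CONSTRAINT FK_TableB_TableA FOREIGN KEY REFERENCES dbo.TableA (X)
);
GO
INSERT INTO dbo.TableA (X) VALUES (1);
INSERT INTO dbo.TableB (BId, X) VALUES (10, 1);   -- accepted: X = 1 exists in TableA
INSERT INTO dbo.TableB (BId, X) VALUES (11, 99);  -- rejected: no X = 99 in TableA
GO
-- Drop the referencing table first.
DROP TABLE dbo.TableB;
DROP TABLE dbo.TableA;
```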
A table may have one or more choices for the primary key. Collectively these are known as
candidate keys, as discussed earlier. One is selected as the primary key. Those not selected are
known as secondary keys or alternative keys.
Simple Key
Any of the keys described before (i.e. primary, secondary or foreign) may comprise one or more
fields. For example, if firstName and lastName were our key, this would be a key of two fields,
whereas studentId is only one. A simple key consists of a single field that uniquely identifies a
record. In addition, the field itself cannot be broken down into other fields. For example,
studentId, which uniquely identifies a particular student, is a single field and therefore is a simple
key. No two students would have the same student number.
Compound Key
A compound key consists of more than one field to uniquely identify a record. A compound key
is distinguished from a composite key because each field that makes up the primary key is
also a simple key in its own right. An example might be a table that represents the modules a
student is attending. This table has a studentId and a moduleCode as its primary key. Each of the
fields that make up the primary key is a simple key, because each represents a unique reference:
one identifies a student and the other identifies a module.
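The student and module example can be sketched as three tables. The table names dbo.Student, dbo.Module and dbo.StudentModule and the column types are assumptions; studentId and moduleCode come from the text. Each half of the compound primary key is a simple key in its own parent table.

```sql
CREATE TABLE dbo.Student (studentId  int     NOT NULL PRIMARY KEY);
CREATE TABLE dbo.Module  (moduleCode char(6) NOT NULL PRIMARY KEY);

CREATE TABLE dbo.StudentModule
(
    studentId  int     NOT NULL REFERENCES dbo.Student (studentId),
    moduleCode char(6) NOT NULL REFERENCES dbo.Module (moduleCode),
    -- Compound key: two fields, each a simple key elsewhere.
    CONSTRAINT PK_StudentModule PRIMARY KEY (studentId, moduleCode)
);
GO
INSERT INTO dbo.Student (studentId) VALUES (1);
INSERT INTO dbo.Module (moduleCode) VALUES ('DB101'), ('DB102');

-- The same student may attend several modules, but each pair only once.
INSERT INTO dbo.StudentModule (studentId, moduleCode) VALUES (1, 'DB101'), (1, 'DB102');
GO
DROP TABLE dbo.StudentModule;
DROP TABLE dbo.Module;
DROP TABLE dbo.Student;
```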
Composite Key
A composite key consists of more than one field to uniquely identify a record. This differs from a
compound key in that one or more of the attributes that make up the key are not simple keys
in their own right. Taking the example from compound key, imagine we identified a student by
their firstName + lastName. In our table representing students on modules our primary key
would now be firstName + lastName + moduleCode. Because firstName + lastName together
represent a unique reference to a student, they are not each simple keys; they have to be combined
in order to uniquely identify the student. Therefore the key for this table is a composite key.
Database design can be divided into two portions, logical design and physical design. Logical
design is nothing but understanding the business, its logic and its rules, and then
converting them into tables, columns, constraints, rules, keys, stored procedures, views etc.
In simple words, if you are creating a new database for a
school, then Roll Number, Student Name, Standard and Address are good columns to have in a
Student_Record table.
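As a sketch, that logical design could become the table below. The column names follow the example; the data types, the nullability and the choice of RollNumber as primary key are my own assumptions for illustration.

```sql
CREATE TABLE dbo.Student_Record
(
    RollNumber  int          NOT NULL PRIMARY KEY,  -- assumed to identify a student
    StudentName varchar(100) NOT NULL,
    Standard    tinyint      NOT NULL,              -- the class or grade
    Address     varchar(200) NULL
);
GO
DROP TABLE dbo.Student_Record;
```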
Physical database design, on the other hand, involves mapping the logical design onto physical
media, taking advantage of the hardware and software features available (or simply the RDBMS or
any other database tool), including indexing, so that the data can be physically accessed and
maintained as quickly as possible. In simple words, physical database design is nothing but
designing your physical data storage plan: decisions about storing data files on SAN or local disk,
which drive, how many data files or filegroups etc.
Bad logical database design results in bad physical database design as well, which generally leads
to poor database performance. So, if it is the DBA’s responsibility to design a database from
scratch, spend a good amount of time and take the necessary steps to get the logical database
design right. Once the logical design is right, you also need to take the time to get the
physical design right.
Both the logical and physical design must be right before you can expect to get good
performance out of your database. If the logical design is not right before you begin the
development of your application, it is too late to fix it after the application has been
implemented. No amount of fast, expensive hardware can fix the poor performance caused by poor
logical database design.
This is the time when we should talk about indexes in SQL Server. Indexes may be confusing,
aren’t they? It’s easy to say that indexes are performance boosters, but there are so many to
remember. Here I will try to keep this chapter as crisp as I can. Remember, an index is always
modified by SQL Server when inserts, updates and deletes are performed. This leads to CPU and disk
overhead, so be wise when you create indexes and test them thoroughly. First, let’s see what the
differences are between clustered and nonclustered indexes.
CLUSTERED And NONCLUSTERED INDEXES
A clustered index is a SQL Server index that sorts and stores the data rows of a table based on key
values. We will talk about keys later in the section. A nonclustered index is a SQL Server index
that contains a key value and a pointer to the data in the heap or clustered index.
The key difference between clustered and nonclustered SQL Server indexes is that a clustered
index controls the physical order of the data pages. The data pages of a clustered index will
always include all the columns in the table, even if you only create the index on one column. The
columns you specify as key columns affect how the pages are stored in the B-tree index structure
(we will talk about B-trees later). A nonclustered index does not affect the ordering and
storing of the data.
A B-tree structure has at least two levels: the root and the leaves. If there are enough records,
intermediate levels may be added as well. Clustered index leaf-level pages contain the data in the
table. Nonclustered index leaf-level pages contain the key value and a pointer to the data row in
the clustered index or heap. According to Knuth’s definition, a B-tree of order ‘m’ is a tree which
satisfies the following properties:
Every node has at most m children
Every internal node other than the root has at least m/2 children (rounded up)
The root has at least two children unless it is a leaf
An internal node with k children contains k - 1 keys
All leaves appear on the same level
Now, another good point is to know the difference between a PRIMARY KEY and a
CLUSTERED INDEX
A primary key is a constraint that enforces uniqueness in a table. The primary key columns cannot
hold NULL values. In SQL Server, when you create a primary key on a table, if a clustered index
is not already defined and a nonclustered index is not specified, a unique clustered index is
created to enforce the constraint. So, by default, the primary key is enforced by a unique
clustered index. This is only the default, not a requirement, and there is no guarantee that the
primary key is the best choice of clustered index for that table.
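To see that the primary key and the clustered index can be two different things, here is a sketch with a hypothetical table dbo.OrderLog. The primary key is explicitly declared NONCLUSTERED and the clustered index is created on another column. The query against sys.indexes then shows both indexes and which one backs the primary key.

```sql
CREATE TABLE dbo.OrderLog
(
    OrderLogId int      NOT NULL,
    LoggedAt   datetime NOT NULL,
    -- Override the default: enforce the primary key with a nonclustered index.
    CONSTRAINT PK_OrderLog PRIMARY KEY NONCLUSTERED (OrderLogId)
);
-- The clustered index goes on a different column.
CREATE CLUSTERED INDEX CIX_OrderLog_LoggedAt ON dbo.OrderLog (LoggedAt);
GO
SELECT name, type_desc, is_primary_key, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.OrderLog');
GO
DROP TABLE dbo.OrderLog;
```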
Key columns are the columns that the index is created on; non-key columns are included
columns. An example makes it clearer.
Example:
CREATE NONCLUSTERED INDEX idx ON Table1 (Col1, Col2) INCLUDE (Col3, Col4)
In the above example, Col1 and Col2 are key columns, and Col3 and Col4 are non-key columns.
Another Example:
CREATE CLUSTERED INDEX cidx ON Table1 (Col1)
In the above example Col1 is the key column, and all other columns in the table are classed as
non-key columns, as the clustered index is the table. A column cannot be both a key and a non-
key. It is either a key column or a non-key, included column.
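You can check which columns of an index are key columns and which are included columns from the catalog views. The sketch below builds a throwaway dbo.Table1 with the same column names as the first example (the int data types are an assumption) and then reads sys.index_columns, where is_included_column is 1 for the non-key columns.

```sql
CREATE TABLE dbo.Table1
(
    Col1 int NOT NULL,
    Col2 int NOT NULL,
    Col3 int NULL,
    Col4 int NULL
);
CREATE NONCLUSTERED INDEX idx ON dbo.Table1 (Col1, Col2) INCLUDE (Col3, Col4);
GO
-- Key columns have a key_ordinal of 1 or more; included columns have is_included_column = 1.
SELECT c.name AS column_name, ic.key_ordinal, ic.is_included_column
FROM sys.index_columns AS ic
JOIN sys.indexes AS i
    ON i.object_id = ic.object_id AND i.index_id = ic.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.Table1')
  AND i.name = N'idx'
ORDER BY ic.is_included_column, ic.key_ordinal;
GO
DROP TABLE dbo.Table1;
```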