SQL Server DBA / SME Interview: Me and the Interviewer
Me: I have gone through the job role description but would like to know more about the responsibilities.
Interviewer: Well, to describe the role simply, the SME will take care of the entire XXXXX product suite database
systems. He or she will be the point of contact for all database development and administration activities, and the
role is also responsible for all product-related technical escalations.
Me: Got it, but how can a SQL Server SME be a product owner? Apart from the technical side, what other skill set
are you looking for?
Interviewer: Well, the SME will not be responsible from the business or domain perspective. But the person should be
able to handle all product-related escalations: if a website is not working or end customers are facing a technical
issue, they will come to you, and you need to take ownership and resolve the problem with the help of the
concerned technical teams.
Me: I understand; in fact, I am also working for a product development company. Apart from database activities, I
participate in technical discussions and brainstorming sessions related to product enhancements.
Interviewer: Yes, I know that. In our environment you may need to attend to performance issues on a frequent
basis. Can we talk about performance?
Me: Sure! It's one of my areas of interest, and I have very good experience in dealing with performance issues.
Interviewer: That's great! OK, let's say you have newly joined an environment and are assigned to handle all
database operations for a mission-critical application; from the technical point of view you are also the owner of this
application. On the very first day you get a call from a business analyst saying that the application is running dead
slow, and you are asked to check it immediately. What is your action plan? Tell me how you would handle it, and
please include each and every step in detail.
Me: As a first step I would check whether the entire application is slow or just a specific part is not running well. I
would also check for details such as "any errors", "timeouts", "access from my location", etc.
Interviewer: OK, "the entire application is running slow; I am not even able to open the home page. It takes a long
time and ends with a timeout error." Then what is your next step?
Me: Since it's a mission-critical application, I'll immediately open an incident with a bridge call and add all the
required teams to it, e.g. Application, DBA, Server, Network, Proxy, Storage, Middleware, etc. Meanwhile I will
send a high-alert notification to all the required teams and business stakeholders.
Interviewer: Great. OK, all teams have joined the call and you have sent the notification too. What next?
Me: First, I would check whether any maintenance activity is in place without proper communication. I would check the
ADM (Application Dependency Mapping), which covers all components / layers / tiers involved in the application, and
ask the respective teams to quickly perform a high-level health check of their components, which shouldn't
take more than 5 minutes.
Interviewer: As I told you, I need the maximum information; can you please elaborate?
Me:
1. Application team to check the exact error message that comes up while trying to access the application, and to
verify that the problem is not location- or resource-specific.
2. Server team to perform Health Check for all Servers involved
3. DBA team to check the Database Servers Health
4. Web-service / Middleware team to perform HC on all Middleware components
5. Network / Proxy / DNS Team to perform HC (Health Check) on DNS, Firewall, Load Balancers and Network
Connectivity
6. I would also check whether any of these teams recently made changes to the infrastructure.
Interviewer: Perfect. Let's say you are representing the database systems and need to take care of the database
part. Can you tell me what you check while doing the HC?
Me: We use pre-defined scripts with scheduled jobs. We just run the job; it captures all the required
information and emails the HC report.
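For illustration, a minimal sketch of the kind of checks such a scheduled health-check job might run (the exact checks and thresholds here are assumptions, not the actual script):
-- Databases that are not ONLINE
SELECT name, state_desc FROM sys.databases WHERE state_desc <> 'ONLINE';
-- Current blocking
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;
-- Server memory pressure
SELECT total_physical_memory_kb, available_physical_memory_kb, system_memory_state_desc
FROM sys.dm_os_sys_memory;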
Interviewer: See, the problem is that the application is running slow; does it make sense to check the database Servers? If
you told me you were not able to log in to the app, then you might check the DB side. But here the case is different: they are
not able to open the application home page.
Me: Yes, it clearly makes sense, because we have already asked the respective teams to investigate from different
perspectives. I have seen applications where the complete metadata is dynamic, meaning everything on an application
page is dynamic, including the text box size, location, labels, etc. In that case, a failure to establish a
connection with the database may delay the page load.
Interviewer: Agreed! OK, now tell me your approach if one of the parameters fails in the database
health check, say CPU / memory / disk / service / database / page errors in the log file. How do you handle these
scenarios?
Me: Well, if there is a problem with one of the resources on a given Server, the first thing is to make sure the
problem is actually coming from SQL Server. A Server may host services other than SQL Server, or there
might be more than one SQL instance on the Server; therefore the first and foremost task is to identify the
problematic component.
If it is SQL Server causing high CPU, identify the database and then the query / batch / procedure causing the high CPU.
High I/O operations are a common driver of high CPU. The most common issues are row-by-row processing (e.g. cursors),
user-defined functions, updating statistics with a full scan, parallelism issues, etc.
If it is SQL Server causing high memory usage, identify the database and then the actual query / batch that is
taking the major portion of memory. There are different caches available; identify the problematic area, which
might be the data cache, the plan cache, or wrong memory-limit configurations.
If a disk is full for some reason, quickly check the options: first try to copy files to another drive to free some space.
Possible causes are huge log backups, dump files, an increased ldf file size, etc.
In the same way, I'll work through the possible options / solutions once I see the actual problem.
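As a hedged illustration of how the problematic statements can be identified (a sketch, not a complete diagnostic script), the plan cache and buffer pool DMVs can be queried like this:
-- Top CPU consumers currently in the plan cache
SELECT TOP (10) qs.total_worker_time, qs.execution_count,
       SUBSTRING(st.text, qs.statement_start_offset / 2 + 1, 100) AS statement_start
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
-- Buffer pool (data cache) usage per database
SELECT DB_NAME(database_id) AS database_name, COUNT(*) * 8 / 1024 AS buffer_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_mb DESC;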
Interviewer: Great! OK, after the initial 10-minute investigation everyone has reported back that everything is working fine:
network, database Servers, Windows / Linux Servers, application Servers, services, firewall, middleware, etc. Everyone
reported that everything looks good, and of course you also didn't see any issue from the database side. But the
application is still not opening and ends with a timeout message. What's your next step?
Me: In my experience, anything and everything in technology is traceable; mostly the problem can be
identified from the returned error message and from the log files. If we still can't identify the problem, we would
ask the Change Management team to provide all recent changes implemented on any Servers / components / services
supporting the application, for example application / database code changes, patching, network /
firewall changes, etc. But I am sure we can find the bottleneck somewhere in these checks.
Interviewer: By the way, from the very first step you have been involving the network and firewall teams; any specific reason
for that?
Me: As you said, my role is not restricted to database ownership but extends to being an application owner. In this
situation the application home page is not opening and is failing with a timeout error. In my experience there is a
higher chance of a bottleneck at the application level or at the network / firewall level, and a lower chance of a problem
on the DB side. That's the reason I am involving the network and firewall teams along with the app team.
Interviewer: Appreciated! Coming back to our actual question: unfortunately no luck, you got a response from all
teams saying that there are no recent changes implemented on any of these Servers / components related
to the application. Time is running out, but the application is still timing out. What's your next step?
Me: Are there any third-party tools / services involved in this entire architecture? As I told you, it should have been caught in
the initial check itself. I have seen a few scenarios where an application was down only for a few customers and was
working for the rest; at that time we had difficulty identifying the problem. But in the scenario we
have been discussing, the application is down / not responding for everyone and we are still not able to identify the problem.
Interviewer: The application was not working only for specific customers? Is that an internal application?
Me: No! It's a public, internet-facing application, and customers from a few countries were not able to access it
from the internet.
Interviewer: That’s interesting, can you explain more about the situation and how did you resolve it?
Me: It was a problem with a third-party component, and we came to know that only after a few hours of investigation.
Interviewer: Can you explain the exact situation and how you resolved it? I need all possible details; I have the
patience, so please explain if you have time.
Me: Any request coming from the internet first reaches the third-party CDN firewall; the request is verified there and then
redirected to our company firewall. Every time it redirects a request to our company firewall it uses a
predefined list of IP addresses. The third-party team added new IPs to expand their capacity, and these IPs were
not shared with us. When our end customers tried to access the website, their requests were being redirected to our
company firewall from the new IP addresses. Since our company firewall could not recognize those IP addresses
(they were the new ones), it was dropping the connections. The moment our proxy team added these IP addresses to
our firewall whitelist, the application started working in those regions and the customers sent confirmation.
In this scenario I didn't do anything special from a technical point of view, but I analyzed the architecture diagram,
identified the third-party network path, and provided the details to the network and proxy teams.
Interviewer: That's a fantastic experience. OK, you found the problem, you quickly provided the solution,
and the application started working fine. In this entire incident procedure, do you see any process-level mistakes, either
in my questions or in your answers?
Me: I can't exactly call it a problem, but from my point of view, if it is a mission-critical application we should already have
all the required monitoring in place, and from the business perspective any incident should be identified by the automated
alerts or the monitoring system. If an incident is first identified by a customer or by business people, it indicates a failure
on the infrastructure team's side.
Me: I would insist on opening a Root Cause Analysis with a problem request, to provide a long-term solution and prevent these
kinds of failures.
Interviewer: Hmm, OK, let's come back to my first question again, now from a different perspective: users are able
to log in to the application without any issue, but they are facing a problem with a specific page, which is
timing out. How do you deal with that?
Me: As we discussed earlier, in most cases when a specific page or part of the application is running slow, it
means the problem is with the application / SQL code or with the underlying database.
Me: I would quickly perform a health check on the database Servers just to make sure there are no issues with resource
utilization. Based on the SQL Server version, I would quickly run a Server-side trace or take the help of Extended Events to
identify the actual query / SQL code / procedure / function that is causing the problem. In the meantime I would ask the
application team to investigate from the application side.
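For example, on SQL Server 2012 and later an Extended Events session along these lines could capture statements that run longer than a few seconds (the session name, file name, and duration threshold are assumptions, not values from the original discussion):
CREATE EVENT SESSION LongRunningQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE duration > 5000000)          -- duration is in microseconds, i.e. longer than 5 seconds
ADD TARGET package0.event_file (SET filename = N'LongRunningQueries.xel');
GO
ALTER EVENT SESSION LongRunningQueries ON SERVER STATE = START;
GO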
Interviewer: OK, you narrowed down the filter and identified the bottleneck: one of the procedures is running slow,
taking 7 minutes. How do you troubleshoot it?
Me: I would check the stored procedure's execution plan to identify the primary bottleneck. I have handled a lot of situations
like this.
Interviewer: Great to know that. So can you tell me the possible failure points which might cause the performance
slowdown?
Me:
1. Bad Parameter Sniffing
2. Scalar user-defined functions used in a query or inside a loop
3. TEMPDB PAGELATCH contention
4. The CXPACKET WAIT TYPE
5. Blocking or long running transactions
6. Cross Database Queries on huge datasets
7. Poorly written queries
8. Wrongly configured Server/database options
9. Poorly designed Indexes
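Two quick checks that go with the list above (illustrative sketches only): the first looks at the top wait types to confirm issues such as CXPACKET or PAGELATCH contention, and the second shows one common workaround for bad parameter sniffing (the procedure name dbo.usp_GetOrders is hypothetical).
-- Top waits accumulated since the last restart (or since the wait stats were cleared)
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- One possible mitigation for bad parameter sniffing: force a fresh plan
EXEC sp_recompile N'dbo.usp_GetOrders';
-- or, inside the procedure body, add OPTION (RECOMPILE) to the affected statement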
Interviewer: Great! In a project you are representing the technology lead role. You have a scheduled meeting with
the customers and it's a critical deal: you are going to give them a presentation on how a cloud-based model works for
database management systems, and this meeting determines a million-dollar deal. You prepared well and are all set for the
meeting; the moment you are about to enter the meeting room you get a call, and unfortunately it's a medical
emergency at home and no one is there to help except you. How do you handle that?
Me: I would inform my manager of the situation and attend to the emergency at home. I would try to find an alternative
person who can manage the meeting with my assistance.
Interviewer: So what is your priority in general: your career or your family?
Me: Not even my family; it's always "me" that is the priority: my dreams, my wishes, my career. I
work hard and work smart because it's my career and I have my own goals. To answer your question, my priority is my
health, then my family, then my career. In fact these are equally important for anyone, but only the one who manages the
balance between them can enjoy success.
Interviewer: I mean, you had discussions with our techies; I think this is the 3rd time you are interacting with us,
correct? What's your opinion of our technical panel?
Me: To be frank, I have never seen a two-and-a-half-hour technical discussion on a single question. Overall I had an
unusual experience with your technical team, and it unknowingly created enthusiasm about your work environment.
Me: You still haven't told me why the application was down!
Interviewer: I think you covered the maximum scenarios! So what do you think: are you going to win this interview?
Me: Certainly!
Interviewer: How do you say that?
Interviewer: Okay, would you join us if I offered you less than your current salary?
Interviewer: Aww, smart enough… you may expect a call from our team sometime next week.
Where do you find the default index fill factor and how do you change it?
The default index fill factor is a Server-level setting that controls how full SQL Server makes index pages. To change it
through SSMS, in Object Explorer right-click the required Server, select Properties, and go to the Database Settings
page. In the Default index fill factor box, enter the fill factor value you want.
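The same setting can be read and changed with T-SQL; a brief sketch (the value 90 is just an example):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'fill factor (%)';        -- show the current default fill factor
EXEC sp_configure 'fill factor (%)', 90;    -- change the Server-wide default
RECONFIGURE;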
What are the different authentication modes in SQL Server and how can you change the authentication mode?
SQL Server supports Windows Authentication mode and Mixed mode (Windows plus SQL Server authentication). To change the
authentication mode, in SSMS right-click the Server and select Properties; on the Security page, under Server authentication,
select the new mode and click OK. A restart of the SQL Server service is required for the change to take effect.
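You can also confirm the current mode with T-SQL:
-- 1 = Windows Authentication only, 0 = Mixed mode
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS windows_auth_only;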
Due to some maintenance being done, the SQL Server on a failover cluster needs to be brought down. How do you
bring the SQL Server down?
In the cluster administrator (Failover Cluster Manager on recent Windows versions), right-click the SQL Server group/role and select Take Offline from the popup menu.
When setting Replication, can you have Distributor on SQL Server 2012, Publisher on SQL Server 2016?
No. The Distributor must run the same version of SQL Server as the Publisher or a later version, so a SQL Server 2012 Distributor cannot serve a SQL Server 2016 Publisher.
What are the different types of database compression introduced in SQL Server 2016?
Row-level and page-level compression (strictly speaking, these were first introduced in SQL Server 2008 and remain the data compression options in SQL Server 2016).
Can we take full backup (not copy_only) from secondary replica of Always-On ?
No. On a secondary replica only COPY_ONLY full backups are supported; a regular (non-copy-only) full backup must be taken on the primary.
What is the default value of Cost Threshold of Parallelism in SQL Server 2016 ?
The default value is 5 (0 is the default for max degree of parallelism, not for the cost threshold for parallelism).
How will you know why your transaction log file is waiting to be truncated/shrunk in SQL Server 2012?
Check the log_reuse_wait_desc column in sys.databases. It reports the reason the transaction log cannot currently be
truncated (and therefore cannot be shrunk), for example LOG_BACKUP, ACTIVE_TRANSACTION, or REPLICATION.
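For example:
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases;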
What permission is required and for which account to enable Instant File Initialization in SQL Server 2016?
The SQL Server service account requires the Perform Volume Maintenance Tasks permission (the SE_MANAGE_VOLUME_NAME privilege); in SQL Server 2016 setup it can be granted via the "Grant Perform Volume Maintenance Task privilege" checkbox.
Which Isolation level in SQL Server resolves both Non-Repeatable Read & Phantom Read problems ?
Serializable Isolation Level
How many synchronous secondary replicas supported in SQL Server 2016 Always-On ?
3 replicas
What are different maintenance and monitoring tools you have worked on?
1. DB Performance Analyzer
2. Red Gate SQL monitor
3. Apex SQL Monitor
4. Activity monitor
5. DTA
6. SQL Profiler
7. PerfMon (a native Windows tool)
8. SQLDIAG and SQLNEXUS
What are the different categories of DBCC commands?
DBCC commands fall into four categories: maintenance, informational, validation, and miscellaneous. Maintenance commands
perform maintenance tasks on a database, index, or filegroup (for example DBCC SHRINKDATABASE). Informational commands
provide feedback regarding the database, such as information about the procedure cache. Validation commands validate the
database, such as the ever-popular CHECKDB. Finally, miscellaneous commands are those that obviously don't fit in the other
three categories; this includes statements like DBCC HELP, which provides the syntax for a given DBCC command.
How can you control the amount of free space in your index pages?
You can set the fill factor on your indexes. This tells SQL Server how much free space to leave in the index pages
when re-indexing. The performance benefit here is fewer page splits (where SQL Server has to copy rows from one
index page to another to make room for an inserted row) because there is room for growth built into the index.
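A small illustration (the table and index names are hypothetical):
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90);   -- leave roughly 10% free space in each leaf page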
What are the High-Availability solutions in SQL Server? Differentiate them briefly.
Failover Clustering, Database Mirroring, Log Shipping, and Replication are the main High-Availability features in SQL Server
(AlwaysOn Availability Groups were added in SQL Server 2012). Failover Clustering protects the whole instance using shared
storage, so there is no second copy of the data; Database Mirroring maintains a copy of an individual database on a mirror
Server; Log Shipping restores transaction log backups on one or more secondary Servers, with some latency; Replication copies
and distributes selected data and objects to subscriber databases.
How many files can a Database contain in SQL Server? How many types of data files exist in SQL Server?
1. A Database can contain a maximum of 32,767 files.
2. There are Primarily 2 types of data files Primary data file and Secondary data file(s)
3. There can be only one Primary data file and multiple secondary data files as long as the total # of files is less than
32,767 files
If you are given access to a SQL Server, how do you find if the SQL Instance is a named instance or a default
instance?
I would go to the SQL Server Configuration Manager. In the left pane of the tool, I would select SQL Server Services,
the right side pane displays all of the SQL Server Services/components that are installed on that machine. If the
Service is displayed as (MSSQLSERVER), then it indicates it is a default instance, else there will be the Instance name
displayed.
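This can also be checked from T-SQL:
-- InstanceName returns NULL for a default instance, otherwise the named instance name
SELECT @@SERVERNAME AS server_name, SERVERPROPERTY('InstanceName') AS instance_name;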
When setting Replication, is it possible to have a Publisher as 64 Bit SQL Server and Distributor or Subscribers as a
32 Bit SQL Server.
Yes, it is possible to have various configurations in a Replication environment.
What are the different types of database compression introduced in SQL Server 2008?
Row compression and Page Compression.
What are the different types of Upgrades that can be performed in SQL Server?
1. In-place upgrade
2. Side-by-Side Upgrade
How many system databases are present in SQL Server 2008?
SQL Server 2008 (and 2005) contains six special databases: master, model, tempdb, msdb, mssqlsystemresource (the
Resource database), and distribution (present only when the instance is configured as a replication Distributor).
What services are available along with SQL Server?
Reporting Services, Analysis Services, Integration Services, Full-Text Search, SQL Server Browser, SQL Server Agent, etc.
(SQL Server Management Studio is a client tool rather than a service).
What are main services for SQL Server?
SQL Server and SQL Server Agent
What is Replication?
Source data is copied to one or more destination (subscriber) databases through replication agents.
What is mirroring?
The principal database's log records are continuously sent over the network to a mirror Server, which maintains a copy of the database.
What is clustering?
The data is stored on shared storage that is accessible to both the primary and secondary (failover) nodes; only one node
owns the SQL Server resources at a time, and ownership moves to another node when the active Server fails.
In your backup strategy you have lost some data; how do you convince your client?
First check the SLA (Service Level Agreement) and make sure the data loss is within the limits acceptable per the SLA; if not,
you may have missed something when designing the backup strategy to match the client's SLA. However, the first thing should
be to identify the possible ways to get the data back. There are several places we may be able to recover data from, and it
always depends on the situation: check whether we can take a tail-log backup, see whether the data is static or dynamic (if it
is static / master data we may get it from a mirror / partner database), or see whether we can match it with a Pre-Prod
environment, etc. If, after checking all possible ways, you still fail to get the data back, then approach your manager with a
detailed RCA (Root Cause Analysis) covering:
Why this happened?
Where is the mistake?
Can we provide any quick / temporary solution to get our data back?
What is the prevention from future failures?
Provide all possible logs / event information that you collected during the RCA
Where Does The Copy Job Run In Log Shipping, On The Primary Or The Secondary?
On the secondary Server. This question is usually asked to find out whether you have hands-on experience with log
shipping.
I Have All The Primary Data Files, Secondary Data Files As Well As Logs. Now, Tell Me Can I Still Restore The
Database Without Having A Full Backup?
You cannot restore the database without having a full database backup. However, if you have
a copy of all the data files (.mdf and .ndf) and log files (.ldf) from when the database was in a working condition (or your
desired state), it is possible to attach the database using sp_attach_db (or, in current versions, CREATE DATABASE ... FOR ATTACH).
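A sketch of the attach; the database name and file paths are hypothetical:
CREATE DATABASE SalesDB
ON (FILENAME = N'D:\Data\SalesDB.mdf'),
   (FILENAME = N'D:\Data\SalesDB_1.ndf'),
   (FILENAME = N'L:\Log\SalesDB_log.ldf')
FOR ATTACH;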
Can We Perform Backup Restore Operation On Tempdb?
No
Where SQL Server User Names And Passwords Are Stored In SQL Server?
They are stored in the system catalog views sys.server_principals and sys.sql_logins (only a salted hash of the password is stored, in the password_hash column).
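For example:
SELECT name, type_desc, create_date FROM sys.server_principals;
SELECT name, is_disabled, password_hash FROM sys.sql_logins;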
What are the common issues a SQL DBA should deal with as a part of DBA daily job?
1. Backup Failure
2. Restore Failure
3. Log Full Issues
4. Blocking Alerts
5. Deadlocks Alerts
6. TEMPDB full issues
7. Disk Full Issues
8. SQL Connectivity Issues
9. Access issues
10. Installation and Upgrade Failures
11. SQL Agent Job failures
12. Performance Issues
13. Resource (Memory/IO/CPU etc.) Utilization Alerts
14. High-Availability and Disaster Recovery related issues
“model” system DB is down and we are trying to create a new database. Is it possible to create a new database
when model DB is down?
We can't create a new database when the model database is down, because every new database is created as a copy of model.
A SQL Server restart will also be unsuccessful while model is down, because TEMPDB is created from the model database's
configuration at startup; since model is down, TEMPDB cannot be created.
What is the maximum limit of SQL Server instances for a standalone computer?
50 instances on a stand-alone Server for all SQL Server editions. SQL Server supports 25 instances on a failover
cluster.
Let’s say we have a situation. We are restoring a database from a full backup. The restore operation ran for 2
hours and failed with an error 9002 (Insufficient logspace). And the database went to suspect mode. How do you
troubleshoot this issue?
In that case we can add a new log file on another drive and let SQL Server complete recovery by using the system
stored procedure "sp_add_log_file_recover_suspect_db". Its parameters are the same as when adding a new log file.
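A hedged example call (the database name, logical name, path, and size are assumptions); the data-file counterpart sp_add_data_file_recover_suspect_db in the next scenario is used the same way:
EXEC sp_add_log_file_recover_suspect_db
     @dbname   = N'SalesDB',
     @name     = N'SalesDB_log2',
     @filename = N'F:\Log\SalesDB_log2.ldf',
     @size     = '20MB';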
Let’s say we have a situation. We are restoring a database from a full backup. The restore operation ran for 2
hours and failed with an error 1105 (Insufficient space on the file group). And the database went to suspect mode.
How do you troubleshoot this issue?
In that case we can add a new data file on another drive and let recovery complete by using the system
stored procedure "sp_add_data_file_recover_suspect_db". Its parameters are the same as when adding a new data
file.
Can you describe the factors that cause the log file to grow?
1. CHECKPOINT has not occurred since last log truncation
2. No log backup happens since last full backup when database is in full recovery
3. An active BACKUP or RESTORE operation is running from long back
4. Long running active transactions
5. Database mirroring is paused or mode is in high performance
6. In replication, publisher transactions have not yet been delivered to the distributor
7. Huge number of database snapshots is being created
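Two quick checks that usually reveal which of these factors is in play:
DBCC SQLPERF (LOGSPACE);                                   -- log size and percentage used per database
SELECT name, log_reuse_wait_desc FROM sys.databases;       -- why the log cannot currently be truncated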
On my PROD SQL Server all system DBs are located on the E drive and I need the Resource DB on the H drive. How can you
move it?
No, the Resource database cannot be moved on its own. In SQL Server 2005 its location was tied to the master database, so
moving it meant moving master as well; from SQL Server 2008 onwards the Resource database resides in the instance's Binn
folder and is not supposed to be moved at all.
HA / DR:
High Availability means your database will be available for use a very large proportion of the time. It is
measured as a percentage of uptime, or a number of nines, e.g. 99.99%. Basically it means "your DB and its data will (almost)
always be available to users".
Disaster recovery is a process of getting back to "available" state once your db becomes unavailable to users.
That's why DR is tied to HA, it's a process that resolves other part of those nines percentage (e.g. spending 0.01% of
time in recovery). Sometimes that DR will be with data loss. There should be defined upfront how much data loss is
acceptable.
You need a combination of healthy and redundant hardware and software (not just SQL Server software) to achieve
HA and do DR.
Each solution differs in what it protects from (e.g. a user dropping a table, motherboard failure, storage failure,
network failure, OS patching downtime, or site failure caused by fire, tsunami, WW3, the Earth exploding, or getting eaten by
a black hole).
Clustering and mirroring help you achieve HA. Log shipping will not give you very high availability, but it will be
higher than not having it, and because of the time lag there is data loss. Synchronous mirroring gives you zero data
loss only if it is in the "synchronized" (not "synchronizing") state. A cluster protects from motherboard failure and patching
downtime, but does not protect from storage failure, etc.
Full-Text Search troubleshooting (SQL Server 7.0/2000):
Verify that both the MSSQLServer and MSSearch services are running. Full-text search runs as a service named
Microsoft Search Service (the MSSearch service), so if this service is not started, full-text search cannot work.
If a full-text catalog population or query fails, check that there is no mismatch of user account information between
the MSSQLServer and MSSearch services.
Change the password for the MS SQLServer service using the SQL Server Enterprise Manager (do not use Services in
Control Panel to change user account information). Changing the password for the MSSQLServer service results in an
update of the account the MSSearch service runs under.
Verify that the MSSearch service runs under the local system account. The Microsoft Search service is assigned to
the local system account during full-text search installation. Do not change the MSSearch service account
information after the installation. Otherwise, it cannot keep track of the MSSQLServer service account.
Verify whether you have a UNC path specification in your PATH variable. Having the UNC path specification(s) in
the SYSTEM or USER PATH variables can result in a full-text query failure with the message that full-text catalog is
not yet ready for queries. To work around this, you should replace the UNC path(s) with remapped drive(s) or add
the location \%SYSTEMDRIVE%\Program Files\Common Files\SYSTEM\ContentIndex in front of any UNC path
specification in the SYSTEM path.
If you encounter an error indicating that insufficient memory is available, remember that full-text search is very
resource-intensive, so make sure you have enough physical and virtual memory. Set the virtual memory size to at least
three times the physical memory installed in the computer, and set the SQL Server 'max Server memory' configuration
option to half the virtual memory size (i.e. 1.5 times the physical memory).
If you encountered an error indicating that your full-text query contains only ignored words, try to rewrite the
query to a phrase-based query, removing the noise words. You will get the error indicating that full-text query
contains ignored words when the CONTAINS predicate is used with words such as 'OR', 'AND' or 'BETWEEN' as
searchable phrase. For example, this select statement returns an error:
SELECT ProductName FROM Products WHERE CONTAINS(ProductName, 'and OR between')
Rewrite the English Query's questions, so that these questions will not require a full-text search on a table with a
uniqueidentifier key. Asking the English Query's questions that require a full-text search on a table with a
uniqueidentifier key may cause English Query to stop responding.
If you decide to install the full-text search by using the BackOffice 4.5 custom setup, after a successful installation of
SQL Server 7.0 (without installing full-text search), you should run SetupSQL.exe from the BackOffice CD-ROM(2) (\
SQL70\Machine_platform\Setup\SetupSQL.exe) You are not allowed to install it by using the BackOffice 4.5 setup,
because the BackOffice Custom Installation dialog box falsely indicates that the full-text search has been installed
already.
If you encountered an error indicating that the full-text query timed out trying to reduce the size of the result set,
increase the 'remote query timeout' setting or insert the full-text query result set into a temporary table instead of
streaming the results directly to the client.
Make a single column unique index for the table you want to be used in a full-text query. The full-text indexing
cannot work on a table that has a unique index on multiple columns. If the table you want to be used in a full-text
query does not currently have a single column unique index, add an IDENTITY column with a UNIQUE index or
constraint. Upgrade to SQL Server 2000 if you need to work with full-text search in a clustered environment. The full
text search is not available in SQL Server 7.0 clustered environments.
If all of the alerts are not firing, check that the SQLServerAgent and EventLog services are running. These services
must be started, if you need the alerts to be fired. So, if these services are not running, you should run them.
If an alert is not firing, make sure that it is enabled. The alert can be enabled or disabled. To check whether an alert is
enabled, you can do the following:
1. Run SQL Server Enterprise Manager.
2. Expand a Server group; then expand a Server.
3. Expand Management; then expand SQL Server Agent.
4. Double-click the appropriate alert to see whether the alert is enabled.
Also check the history values of the alert to determine the last date that the alert worked. To view the history
values of the alert, you can do the following:
1. Run SQL Server Enterprise Manager.
2. Expand a Server group; then expand a Server.
3. Expand Management; then expand SQL Server Agent.
4. Double-click the appropriate alert to see the alert history.
Verify that the counter value is maintained for at least 20 seconds. Because SQL Server Agent polls the performance
counters at 20 second intervals, if the counter value is maintained for only a few seconds (less than 20 seconds),
there is a high likelihood that the alert will not fire.
If the alert fires, but the responsible operator does not receive notification, try to send 'e-mail', 'pager', or 'net send'
message to this operator manually. In most cases, this problem arises when you have entered an incorrect 'e-mail',
'pager', or 'net send' addresses. If you can send an 'e-mail', 'pager', or 'net send' message manually to this operator,
check the account the SQL Server Agent runs under, as well as the operator's on-duty schedule.
What are the new features in SQL Server 2005 when compared to SQL Server 2000?
There are quite a lot of changes and enhancements in SQL Server 2005. A few of them are listed here:
1. Database Partitioning
2. Dynamic Management Views
3. System Catalog Views
4. Resource Database
5. Database Snapshots
6. SQL Server Integration Services
7. Support for Analysis Services on a Failover Cluster.
8. Profiler being able to trace the MDX queries of the Analysis Server.
9. Peer-to-Peer Replication
10. Database Mirroring
What are the differences in Clustering in SQL Server 2005 and 2008 or 2008 R2?
On SQL Server 2005, installing the SQL Server failover cluster is a single-step process whereas on SQL Server 2008 or
above it is a multi-step process. That is, in SQL Server 2005, the Installation process itself installs on all of the nodes
(be it 2 nodes or 3 nodes). In 2008 or above this has changed, we would need to install separately on all the nodes. 2
times if it is a 2 node cluster or 3 times in a 3 node cluster, and so on.
List out some of the requirements to set up a SQL Server failover cluster.
Virtual network name for the SQL Server, Virtual IP address for SQL Server, IP addresses for the Public Network and
Private Network(also referred to as Heartbeat) for each node in the failover cluster, shared drives for SQL Server
Data and Log files, Quorum Disk, and MSDTC Disk.
System stored procedures:
Stored procedures are pre-compiled sets of one or more statements that are stored together in the database; they reduce
network load and benefit from plan reuse. The system/procedure analogy applies to admin work as well: you have a procedure
for processing an expense report and a system for doing the monthly accounting; a procedure is simply one component of a
larger system, and most systems are made up of a series of procedures. In SQL Server, many administrative and informational
activities can be performed by using system stored procedures, grouped into the categories below.
Change Data Capture Stored Procedures: Used to enable, disable, or report on change data capture objects.
Cursor Stored Procedures: Used to implement cursor variable functionality.
Data Collector Stored Procedures: Used to work with the data collector and the following components: collection sets, collection items, and collection types.
Database Engine Stored Procedures: Used for general maintenance of the SQL Server Database Engine.
Database Mail Stored Procedures (Transact-SQL): Used to perform e-mail operations from within an instance of SQL Server.
Database Maintenance Plan Stored Procedures: Used to set up core maintenance tasks that are required to manage database performance.
Distributed Queries Stored Procedures: Used to implement and manage distributed queries.
Filestream and FileTable Stored Procedures (Transact-SQL): Used to configure and manage the FILESTREAM and FileTable features.
Firewall Rules Stored Procedures (Azure SQL Database): Used to configure the Azure SQL Database firewall.
Full-Text Search Stored Procedures: Used to implement and query full-text indexes.
General Extended Stored Procedures: Used to provide an interface from an instance of SQL Server to external programs for various maintenance activities.
Log Shipping Stored Procedures: Used to configure, modify, and monitor log shipping configurations.
Management Data Warehouse Stored Procedures (Transact-SQL): Used to configure the management data warehouse.
OLE Automation Stored Procedures: Used to enable standard Automation objects for use within a standard Transact-SQL batch.
Policy-Based Management Stored Procedures: Used for Policy-Based Management.
PolyBase Stored Procedures: Add or remove a computer from a PolyBase scale-out group.
Query Store Stored Procedures (Transact-SQL): Used to tune performance.
Replication Stored Procedures: Used to manage replication.
Security Stored Procedures: Used to manage security.
Snapshot Backup Stored Procedures: Used to delete the FILE_SNAPSHOT backup along with all of its snapshots or to delete an individual backup file snapshot.
Spatial Index Stored Procedures: Used to analyze and improve the indexing performance of spatial indexes.
SQL Server Agent Stored Procedures: Used by SQL Server Agent to manage scheduled and event-driven activities.
SQL Server Profiler Stored Procedures: Used by SQL Server Profiler to monitor performance and activity.
Stretch Database Stored Procedures: Used to manage stretch databases.
Temporal Tables Stored Procedures: Used for temporal tables.
XML Stored Procedures: Used for XML text management.
User Defined Stored Procedures in SQL:
1. Creating a Table
2. User Defined Stored Procedure
3. To Execute the Stored Procedure
4. Stored Procedure with Parameters
5. Modifying the Stored Procedure
6. Deleting the Stored Procedure
7. Adding Security through Encryption
A stored procedure is a block of SQL statements in a relational database management system (RDBMS); it is typically
written by a programmer, database administrator, or data analyst, and is saved and re-used by multiple
programs. The exact types available depend on the RDBMS, but the two most important kinds found in any RDBMS are as follows:
User-defined stored procedures
System stored procedures
A stored procedure is prepared, compiled code that is stored or cached and then re-used. Having cached code for re-use makes
maintainability far easier (it doesn't need to be changed in multiple places) and also helps maintain security, since
permissions can be granted on the procedure rather than on the underlying tables.
Now, if you create a covering index that contains all the columns used in the SELECT statement, SQL Server doesn't need to go
to the base table, because everything required is available in the index pages.
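A covering nonclustered index achieves this without changing the table's clustered index; a sketch with hypothetical table and column names:
CREATE NONCLUSTERED INDEX IX_Orders_Covering
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalDue, Status);   -- the query's SELECT list is fully covered by the index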
One Of The Developers In My Company Moved One Of The Columns From One Table To Some Other Table In The
Same Database. How Can I Find The Name Of The New Table Where The Column Has Been Moved?
This question can be answered by querying system views.
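The query itself is not included in the original notes; one way to write it (the column name in the WHERE clause is hypothetical) is:
SELECT t.name AS table_name, c.name AS column_name
FROM sys.columns AS c
JOIN sys.tables AS t ON t.object_id = c.object_id
WHERE c.name = N'MovedColumnName';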
The previous query will return all the tables that use the column name specified in the WHERE condition.
What Is Service Broker?
Service Broker is a message-queuing technology in SQL Server that allows developers to integrate SQL Server fully
into distributed applications. It is a feature that lets SQL Server send asynchronous, transactional messages: a database can
send a message to another database without waiting for the response, so the application continues to function even if the
remote database is temporarily unavailable.
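As a minimal illustration of the objects involved (the names are hypothetical, and a complete conversation also needs an initiator service plus BEGIN DIALOG / SEND statements):
CREATE MESSAGE TYPE [//Demo/RequestMessage] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Demo/Contract] ([//Demo/RequestMessage] SENT BY INITIATOR);
CREATE QUEUE dbo.TargetQueue;
CREATE SERVICE [//Demo/TargetService] ON QUEUE dbo.TargetQueue ([//Demo/Contract]);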
What are The Steps To Take To Improve Performance Of A Poor Performing Query?
1. Use indexes efficiently
2. Create all primary and foreign keys and relationships among tables.
3. Avoid using cursors
4. Avoid SELECT *; instead list only the needed columns and narrow the result set as needed.
5. Denormalize
6. Use partitioned views
7. Use temporary tables and table variables
8. Reduce joins and heavy clauses like GROUP BY if not needed
9. Implement queries as stored procedures.
10. Have a WHERE Clause in all SELECT queries.
11. Use data types wisely
12. Instead of NULLS use string values such as N/A
Sandbox types in SQL Server:
Safe Access Sandbox: Here a user can perform SQL operations such as creating stored procedures, triggers, etc. but
cannot have access to the memory and cannot create files.
External Access Sandbox: User can have access to files without having a right to manipulate the memory allocation.
Unsafe Access Sandbox: This contains untrusted codes where a user can have access to memory.
Backup types:
Transaction log backup: This backs up the transaction log records that have been generated on the Server, capturing the
details of how the database has been modified. It cannot be a stand-alone backup mechanism, but it can save a lot of time
when we already have the data files of the DB restored on the new deployment Server.
Differential backup: This is a subset of the complete (full) backup, where only the extents modified since the last full
backup are backed up. It can save time when we are maintaining a standby/backup Server alongside the main Server.
File/filegroup backup: This backs up individual data files or filegroups rather than the whole database, which is useful for
very large databases; the file backups, combined with the transaction log backups of the original system, allow the database
we are trying to protect to be restored to a consistent state.
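The corresponding backup statements, sketched with a hypothetical database name and paths:
BACKUP DATABASE SalesDB TO DISK = N'B:\Backup\SalesDB_full.bak';                       -- full backup
BACKUP DATABASE SalesDB TO DISK = N'B:\Backup\SalesDB_diff.bak' WITH DIFFERENTIAL;     -- differential backup
BACKUP LOG SalesDB TO DISK = N'B:\Backup\SalesDB_log.trn';                             -- transaction log backup
BACKUP DATABASE SalesDB FILEGROUP = N'PRIMARY' TO DISK = N'B:\Backup\SalesDB_fg.bak';  -- filegroup backup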
What is the difference between the 2 operating modes of Database Mirroring (mentioned in the above answer)?
High-Safety Mode ensures that the principal and mirror databases are in a synchronized state: transactions are
committed on both Servers to guarantee consistency, at the cost of some additional latency.
High-Performance Mode lets the principal database run faster by not waiting for the mirror database to commit the
transactions. There is a small possibility of data loss, and the mirror database may lag behind (in terms of being up
to date with the principal database).
How do you recover a database that is marked as SUSPECT?
2. Open the latest SQL Server Error Log and check for errors logged for the database which is marked as suspect.
You can open the SQL Server Error Log by expanding the Management node -> SQL Server Error Logs. On my Server I could
find the entries below in the SQL Server Error Logs.
Sample error message within the SQL Server Error Log when a database is marked as SUSPECT:
Starting up database 'BPO'.
3. When a database is in SUSPECT mode you will not be able to get connected to the database. Hence you need to
bring the database first in EMERGENCY mode to repair the database. Execute the below mentioned TSQL code to
bring the database in EMERGENCY mode.
USE master
GO
ALTER DATABASE BPO SET EMERGENCY
GO
Once the database is in EMERGENCY mode you will be able to query the database.
4. Execute the DBCC CHECKDB command which will check the logical and physical integrity of all the objects within
the specified database.
DBCC CHECKDB (BPO)
GO
DBCC CHECKDB will take time depending upon the size of the database. It's always recommended to run DBCC
CHECKDB as part of your regular maintenance schedule for all SQL Server databases.
5. Next step will be to bring the user database in SINGLE_USER mode by executing the below mentioned TSQL
code.
ALTER DATABASE BPO SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
When you repair your database using REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB command there can be
some loss of data.
Once the database is successfully repaired using REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB command
then there is no way to go back to the previous state.
6. Once the database is in SINGLE_USER mode execute the below TSQL code to repair the database.
DBCC CHECKDB (BPO, REPAIR_ALLOW_DATA_LOSS)
GO
7. Finally, execute the below mentioned TSQL command to allow MULTI_USER access to the database.
ALTER DATABASE BPO SET MULTI_USER
GO
How to Start SQL Server with Minimal Configuration or without TempDB database
1. Open Command Prompt as an administrator and then go to the BINN directory where SQL Server is installed
and type SQLservr.exe /f /c. On our Production Server SQL Server is installed on the following drive location
“E:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2014\MSSQL\Binn\“.
SQLservr.exe /f /c
2. Open New Command Prompt window as an administrator and then Connect to SQL Server Instance
Using SQLCMD.
SQLCMD -S localhost -E
3. Once you have performed the troubleshooting steps, exit the SQLCMD window by typing Quit and pressing Enter.
For more information, see How to Start SQL Server without TempDB.
4. In the initial window press Ctrl+C and enter Y to stop the SQL Server service.
5. Finally, start SQL Server Database Engine Using SQL Server Configuration Manager.
Best practices to follow when connecting to a SQL Server instance with minimal configuration:
It is recommended to use the SQLCMD command-line utility and the Dedicated Administrator Connection (DAC) to connect
to the SQL Server instance. If you use a normal connection instead, stop the SQL Server Agent service, as it will take
the first available connection and thereby block other users.
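A DAC connection from the command line looks like this (the -A switch requests the Dedicated Administrator Connection):
SQLCMD -S localhost -E -A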
How many data files I can put in TEMPDB? What is the effect of adding multiple data files?
It is recommended that you have one TempDB data file per CPU core; for example, a Server with 2 dual-core
processors would get 4 TempDB data files. With that in mind, Microsoft recommends a maximum of
8 TempDB files per SQL Server instance. That said, there is really no set rule for how many TempDB files you should have
to maximize performance. Each system is different. The only real way to know what will work best for your system is
to scale out the number of TempDB files one at a time and analyze the performance after each additional file is
added.
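Additional TempDB data files can be added like this (the logical name, path, and sizes are examples only):
ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev2',
          FILENAME = N'T:\TempDB\tempdb2.ndf',
          SIZE = 4GB,
          FILEGROWTH = 512MB);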
One of the disk drives has reached 95% full in a very short time. Now you are getting disk-full alerts and you need to
free up space. What are the different options you would try to clear the space, and what are the possible reasons
that cause sudden disk-full issues?
Monitor these metrics, and set up alerts to warn you when:
1. Free disk space dips below a threshold value
2. Unallocated space in a database file dips below a threshold value
3. Database files grow, physically, in size
By monitoring each of these metrics you will, firstly, cut out all the unplanned downtime caused by ‘unexpectedly’
running out of disk space; an event that can adversely affect a DBA’s career. Secondly, you’ll increase the time
available to react. If you can predict when a database is going to need to grow, you can schedule in maintenance
time, to increase capacity, at a point when it will have the least impact on your business. It will also help you avoid
file auto-growth occurring at unpredictable times, when it could block and disrupt important business processes.
Finally, you’ll also be able to investigate and fix any problem that may be causing excessive database growth. All this
will result in much less disruption to business processes, and a drastic reduction in time spent on ad-hoc disk space
management.
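A quick way to see free space on every volume that hosts a database file (a sketch using the volume-stats DMV):
SELECT DISTINCT vs.volume_mount_point,
       vs.total_bytes / 1048576 AS total_mb,
       vs.available_bytes / 1048576 AS free_mb
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;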
Let's say we have not been receiving emails for the last 4 hours because there is a problem with Database Mail, and we would
like to restart Database Mail. Have you ever faced this issue? If yes, how do you restart Database Mail?
Troubleshooting Database Mail involves checking the following general areas of the Database Mail system. These
procedures are presented in a logical order, but can be evaluated in any order.
Permissions: You must be a member of the sysadmin fixed Server role to troubleshoot all aspects of Database Mail.
Users who are not members of the sysadmin fixed Server role can only obtain information about the e-mails they
attempt to send, not about e-mails sent by other users.
Is database mail enabled
In SQL Server Management Studio, connect to an instance of SQL Server by using a query editor window, and then
execute the following code:
sp_configure 'show advanced', 1;
GO
RECONFIGURE;
GO
sp_configure;
GO
In the results pane, confirm that the run_value for Database Mail XPs is set to 1. If the run_value is not 1, Database
Mail is not enabled. Database Mail is not automatically enabled to reduce the number of features available for attack
by a malicious user.
If you decide that it is appropriate to enable Database Mail, execute the following code:
sp_configure 'Database Mail XPs', 1;
GO
RECONFIGURE;
GO
To restore the sp_configure procedure to its default state, which does not show advanced options, execute the
following code:
sp_configure 'show advanced', 0;
GO
RECONFIGURE;
GO
To see which users have access to which Database Mail profiles, execute the following statement:
EXEC msdb.dbo.sysmail_help_principalprofile_sp;
Use the Database Mail Configuration Wizard to create profiles and grant access to profiles to users.
Is database mail started
The Database Mail External Program is activated when there are e-mail messages to be processed. When there have
been no messages to send for the specified time-out period, the program exits. To confirm the Database Mail
activation is started, execute the following statement:
EXEC msdb.dbo.sysmail_help_status_sp;
If the Database Mail activation is not started, execute the following statement to start it:
EXEC msdb.dbo.sysmail_start_sp;
If the Database Mail external program is started, check the status of the mail queue with the following statement:
EXEC msdb.dbo.sysmail_help_queue_sp @queue_type = 'mail';
The mail queue should have the state of RECEIVES_OCCURRING. The status queue may vary from moment to
moment. If the mail queue state is not RECEIVES_OCCURRING, try restarting the queue. Stop the queue using the
following statement:
EXEC msdb.dbo.sysmail_stop_sp;
Then start the queue using the following statement:
EXEC msdb.dbo.sysmail_start_sp;
Note
Use the length column in the result set of sysmail_help_queue_sp to determine the number of e-mails in the Mail
queue.
Do problems affect some or all accounts
If you have determined that some but not all profiles can send mail, then you may have problems with the Database
Mail accounts used by the problem profiles. To determine which accounts are successful in sending mail, execute the
following statement:
SELECT sent_account_id, sent_date FROM msdb.dbo.sysmail_sentitems;
If a non-working profile does not use any of the accounts listed, then it is possible that all the accounts available to
the profile are not working properly. To test individual accounts, use the Database Mail Configuration Wizard to
create a new profile with a single account, and then use the Send Test E-Mail dialog box to send mail using the new
account.
To view the error messages returned by Database Mail, execute the following statement:
SELECT * FROM msdb.dbo.sysmail_event_log;
Note
Database Mail considers mail to be sent when it is successfully delivered to a SMTP mail Server. Subsequent errors,
such as an invalid recipient e-mail address, can still prevent mail from being delivered, but will not be contained in
the Database Mail log.
Retry mail delivery
If you have determined that the Database Mail is failing because the SMTP Server cannot be reliably reached, you
may be able to increase your successful mail delivery rate by increasing the number of times Database Mail attempts
to send each message. Start the Database Mail Configuration Wizard, and select the View or change system
parameters option. Alternatively, you can associate more accounts to the profile so upon failover from the primary
account, Database Mail will use the failover account to send e-mails.
On the Configure System Parameters page, the default values of five times for the Account Retry Attempts and 60
seconds for the Account Retry Delay means that message delivery will fail if the SMTP Server cannot be reached in 5
minutes. Increase these parameters to lengthen the amount of time before message delivery fails.
Note
When large numbers of messages are being sent, large default values may increase reliability, but will substantially
increase the use of resources as many messages are attempted to be delivered over and over again. Address the root
problem by resolving the network or SMTP Server problem that prevents Database Mail from contacting the SMTP
Server promptly.
Can you explain what happens to Virtual log file (VLF’s) when we see the error “Log File is full”?
Each physical transaction log file is divided internally into numerous virtual log files, or VLFs. The virtual log files are
not a certain size nor can you specify how many VLF’s are in a physical log file. The Database Engine does this for us,
but for performance reasons it tries to maintain a small number of virtual files.
System performance is affected when the log file is defined with small size and growth_increment values: if the log file grows to a large size through many small increments, it will contain a large number of virtual log files. This is why it is a good idea to set autogrow to a larger increment. If the log is set to grow 1 MB at a time, it may grow continuously, resulting in more and more virtual log files. An increased number of VLFs can slow down database startup and log backup/restore operations.
SQL Server internally manages the Log file into multiple smaller chunks called Virtual Log Files or VLFs. A Virtual Log
File is a smaller file inside Log file which contains the actual log records which are actively written inside them. New
Virtual Log Files are created when the existing ones are already active and new space is required. This brings us to why the number of Virtual Log Files matters: whenever there is a crash-and-recovery condition, SQL Server first needs to read the Virtual Log Files. If the number of Virtual Log Files is huge, then the time taken by the recovery will also be long, which we do not want.
There’s not a right or wrong number of VLF’s per database, but remember, the more you have the worse
performance you may have. You can use DBCC LOGINFO to check the number of VLF’s in your database.
If you found that you have a high number of Virtual Log Files, then you might be at risk, but don’t worry, the solution
is pretty simple.
Just shrink the Log file and re-grow it again. There is a catch: this might require downtime, so keep in mind that it's not always easy to shrink the Log file. Follow the steps below and you will most probably be in a safe position (a T-SQL sketch of the key commands follows the list).
1. Backup the Transaction Log file if the recovery model of the database is FULL (which generally speaking
should be FULL for all production databases for good recovery processes).
2. Issue a CHECKPOINT manually so that pages from the buffer can be written down to the disk.
3. Make sure there are no huge transactions running and keeping the Log file full.
4. Shrink the Log file to a smaller size
5. Re-grow the log file to a larger size on which your log file generally keeps on working.
6. That’s all! You will have a very low number of Virtual Log Files as of now.
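A minimal T-SQL sketch of the steps above, assuming a database named MyDB with a logical log file name MyDB_log (all names, paths and sizes are placeholders):
USE MyDB;
GO
-- Step 1: back up the transaction log (FULL recovery model assumed)
BACKUP LOG MyDB TO DISK = N'D:\Backup\MyDB_log.trn';
-- Step 2: issue a manual checkpoint
CHECKPOINT;
-- Step 4: shrink the log file (target size in MB)
DBCC SHRINKFILE (MyDB_log, 1024);
-- Step 5: re-grow the log file in one operation, with a sensible autogrowth
ALTER DATABASE MyDB MODIFY FILE (NAME = MyDB_log, SIZE = 8GB, FILEGROWTH = 1GB);
-- Verify the new, lower number of VLFs
DBCC LOGINFO;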
If I apply both FORCESCAN and FORCESEEK on the same query what will happen?
If both FORCESCAN and FORCESEEK are specified as hints for the same table, the query simply fails with an error, because the two hints conflict and the query processor cannot produce a plan that honors both. More generally, it is commonly assumed that seeks are always better than scans for retrieving data from SQL Server. In reality, sometimes a seek is the optimal approach, but in some cases a scan can actually be better.
Using AdventureWorks2012, create the following index on the Sales.SalesOrderHeader table:
CREATE INDEX IX_SOH_OrderDate
ON Sales.SalesOrderHeader(OrderDate, CustomerID, TotalDue);
Now, compare these two queries:
DECLARE @start_date DATETIME = '20060101', @end_date DATETIME = '20120102';
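A sketch of the kind of comparison intended here, reusing the variables declared above (the column list and the use of the FORCESEEK hint are assumptions):
-- Query 1: let the optimizer choose the access method (for a range this wide it typically scans)
SELECT OrderDate, CustomerID, TotalDue
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN @start_date AND @end_date;
-- Query 2: force a seek on the table
SELECT OrderDate, CustomerID, TotalDue
FROM Sales.SalesOrderHeader WITH (FORCESEEK)
WHERE OrderDate BETWEEN @start_date AND @end_date;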
Here we see that the query that was forced to seek actually took longer, used more CPU, and had to perform a much
larger number of reads.
So, what is a “deadlock”? It can be divided into two root words: “dead” and “lock”. We could intuitively
understand it as a lock that leads to a dead end…
In relational database management systems, locking is a mechanism that comes into play on virtually every data access. We can acquire a lock on different kinds of resources (row identifier, key, page, table…) and using different modes (shared, exclusive…). Choosing one mode rather than another to access a given resource in a session will either let other sessions (or transactions) access the same resource, or will make other sessions wait for that resource to be unlocked. Note that not all locking modes are compatible with each other. For that reason, Microsoft provides a documentation page about what they call lock compatibility.
Still, intuitively, we could say that a deadlock falls into the second case, the one that tells other sessions to wait for a
resource, but this wait might never end. This second case is commonly referred to as “blocking”.
Understanding deadlocks
Although it's based on the same principles, a deadlock is different from blocking. When a deadlock situation happens, there is no identifiable head blocker, as both sessions involved hold incompatible locks on objects the other session needs to access. It's a circular blocking chain.
For better understanding, we will go back to the situation we used for blocking presentation and add some
complexity to that situation.
Let’s say that in order to modify a row in Invoice table, UserA must also read from an InvoiceDetails table to get the
total that is billed to customer. Let’s say that, no matter the reason, UserB has already acquired an exclusive lock on
a page containing a row of InvoiceDetails table that UserA needs to read.
In such a case, both threads are waiting for a lock that will never be released, as the activity of each one is suspended until the other releases the locks it has acquired. Real-life situations can be more complicated, and I would suggest that those interested in the subject search the web for resources such as the one written by Minette Steynberg in 2016, entitled What is a SQL Server deadlock?.
Fortunately, the SQL Server database engine comes with a deadlock monitor thread that periodically checks for deadlock situations and chooses one of the processes involved as a victim for termination. While this is good news for one of the sessions, it is not for the other: the "victim" is terminated with an error and has to be run again.
Here is some further information about the deadlock monitor thread:
1. It runs every 5 seconds by default
2. When it detects a deadlock, this interval falls from 5 seconds to as low as 100 milliseconds, based on the frequency of deadlock occurrences
3. When it finally finds no more deadlocks, it sets the interval back to its default of 5 seconds
4. Once the deadlock victim is chosen, it will roll back the transaction of that victim and return a 1205 error message to the user. The error message looks as follows:
5. Transaction (Process ID 89) was deadlocked on resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
6. By default, the deadlock victim is chosen based on the estimated amount of resources required to roll it back; it is the least expensive one that is chosen. We can use the SET DEADLOCK_PRIORITY <Value> statement to influence the choice of deadlock victim.
There are two trace flags of interest for deadlock monitoring: 1204 and 1222.
According to the documentation on Microsoft's website, the first one tells SQL Server to return the resources and types of locks participating in a deadlock, as well as the current command affected, while the second one returns the same information in an XML format. Basically, both do the same job, and I would recommend using trace flag 1222 in preference to the first one, as it generates XML, which is easier to parse and integrate.
The output of both trace flags will be visible in SQL Server Error Log.
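For example, trace flag 1222 can be turned on globally with DBCC TRACEON (the -1 argument makes the flag global to all sessions):
DBCC TRACEON (1222, -1);
-- Check which trace flags are currently enabled
DBCC TRACESTATUS (-1);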
When the deadlock graph event is selected (in SQL Server Profiler, for example), we can go to the third tab and specify whether to save deadlock XML events in a separate results file. We will then need to parse this file with a script or some other tool.
This method is simple and provides results we can use quite easily to investigate deadlocks. We can also run the T-SQL equivalent code to generate the trace.
Note: This solution can impact performance, as it will use additional resources to collect and provide deadlock information to the user. As such, this isn't the optimal solution.
Once auditing is set up properly, it shouldn't be changed unless the change is documented. The Audit settings history report shows the auditing setting changes. Unexpected changes should be investigated, as they can lead to incomplete audit trails and thus threaten data security.
When setting up replication, can you have a Distributor on SQL Server 2005 and a Publisher on SQL Server 2008?
No, you cannot have a Distributor on an earlier version than the Publisher.
Which SQL Server system object is used to hold the stored procedure scripts?
Sys.SQL_Modules is a system catalog view that stores the definition (script) of each stored procedure; the name of the stored procedure is stored in the Sys.Procedures catalog view.
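A sketch showing how the two catalog views can be joined to list each procedure together with its definition:
SELECT p.name AS procedure_name,
       m.definition AS procedure_script
FROM sys.procedures AS p
JOIN sys.sql_modules AS m
    ON m.object_id = p.object_id;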
You are working as a database administrator in an international company. You are administrating a SQL Server
database that contains critical data stored in specific columns. What should you use in order to encrypt the critical
data stored in these columns taking into consideration that the database administrators should not be able to
view the critical data?
You should use the Always Encrypted feature, by creating a column master key and a column encryption key, and then encrypting the required columns.
You are working as a database administrator in an international company. You are trying to enhance the
performance of the application that is connecting to the SQL Server instance by controlling the parallelism
behavior of the queries. This can be achieved by tuning the Max Degree of Parallelism and Cost Threshold for
Parallelism Server-level configurations. What are the two cases in which you will not consider tuning the Cost
Threshold for Parallelism?
When the Maximum Degree of Parallelism option is set to 1 or the number of logical processors available in the SQL
Server is only one. In this case, the queries will always run in a single thread.
How can we check whether a port on a Server is reachable or not?
TELNET <HOSTNAME> PORTNUMBER
TELNET PAXT3DEVSQL24 1433
TELNET PAXT3DEVSQL24 1434
Common Ports:
MSSQL Server: TCP 1433
HTTP: TCP 80
HTTPS: TCP 443
What port do you need to open on your Server firewall to enable named pipes connections?
Port 445. Named pipes communicate across TCP port 445.
What are the different log files and how to access it?
SQL Server Error Log: The Error Log, the most important log file, is used to troubleshoot system problems. SQL
Server retains backups of the previous six logs, naming each archived log file sequentially. The current error log file is
named ERRORLOG. To view the error log, which is located in the %ProgramFiles%\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG directory, open SSMS, expand a Server node, expand Management, and click SQL Server Logs.
SQL Server Agent Log: SQL Server’s job scheduling subsystem, SQL Server Agent, maintains a set of log files with
warning and error messages about the jobs it has run, written to the %ProgramFiles%\Microsoft SQL Server\MSSQL.1\MSSQL\LOG directory. SQL Server will maintain up to nine SQL Server Agent error log files. The current
log file is named SQLAGENT.OUT, whereas archived files are numbered sequentially. You can view SQL Server Agent
logs by using SQL Server Management Studio (SSMS). Expand a Server node, expand Management, click SQL Server
Logs, and select the checkbox for SQL Server Agent.
Windows Event Log: An important source of information for troubleshooting SQL Server errors, the Windows Event
log contains three useful logs. The application log records events in SQL Server and SQL Server Agent and can be
used by SQL Server Integration Services (SSIS) packages. The security log records authentication information, and the
system log records service startup and shutdown information. To view the Windows Event log, go to Administrative
Tools, Event Viewer.
SQL Server Setup Log: You might already be familiar with the SQL Server Setup log, which is located at
%ProgramFiles%\Microsoft SQL Server\90\Setup Bootstrap\LOG\Summary.txt. If the summary.txt log file shows a component failure, you can investigate the root cause by looking at the component's log, which you'll find in the %ProgramFiles%\Microsoft SQL Server\90\Setup Bootstrap\LOG\Files directory.
SQL Server Profiler Log: SQL Server Profiler, the primary application-tracing tool in SQL Server, captures the system’s
current database activity and writes it to a file for later analysis. You can find the Profiler logs in the log .trc file in the
%ProgramFiles%\Microsoft SQL Server\MSSQL.1\MSSQL\LOG directory.
How many data files I can put in Tempdb? What is the effect of adding multiple data files?
By far, the most effective configuration is to place tempdb on its own separate fast drive, away from the user databases. I would set the number of data files based on the number of CPUs divided by 2; so, if you have 8 CPUs, create 4 tempdb data files. Size tempdb large enough up front, with about 10% growth, and start at a general size of around 10 GB for each file. I also would not create more than 4 data files even if there are more than 8 CPUs; you can always add more later (see the sketch below).
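A sketch of adding one more tempdb data file (the file name, path and sizes are placeholders):
ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev2',
          FILENAME = N'T:\TempDB\tempdev2.ndf',
          SIZE = 10GB,
          FILEGROWTH = 1GB);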
Let’s say a user is performing a transaction on a clustered Server and failover has occurred. What will happen to
the Transaction?
If it is active/passive, there is a good chance the in-flight transaction was rolled back when the failover occurred, but active/passive is considered by some to be the better option as it is not as difficult to administer. Still, active/active may be best depending on what the requirements are for the system.
How do you know which node is active and which is passive? What are the criteria for deciding the active node?
Open Cluster Administrator and check the SQL Server group, where you can see the current owner. The current owner is the active node and the other nodes are passive.
I have my PROD SQL Server with all system DBs located on the E drive and I need my resource db on the H drive. How can you move it?
No, the resource db cannot be moved on its own; the resource db location always depends on the master database location. If you want to move the resource db, you should also move the master db.
Any idea what is the Resource db mdf and ldf file names?
1. msSQLsystemresource.mdf
2. msSQLsystemresource.ldf
What Changes In The Front End Code Is Needed If Mirroring Is Implemented For The High Availability?
You need to add only FAILOVER PARTNER information in your front end code. “Data Source=ServerA;Failover
Partner=ServerB;Initial Catalog=AdventureWorks;Integrated Security=True;”.
Where Does The Copy Job Runs In The Log Shipping Primary Or Secondary?
Secondary Server. This question is basically asked to find out whether you have hands on work on log shipping or
not.
Which system database is a read-only database that contains copies of all system objects?
Resource system database, that resides in the msSQLsystemresource.mdf file.
Which system database that is only created when the SQL Server instance is configured as a replication
distributor?
The Distribution database, which stores replication transactions and metadata.
Which DBCC command should you use to check the consistency of the system tables?
DBCC CHECKCATALOG. The command that checks the catalog consistency.
After running the DBCC CHECKDB command on a database, it is found that a non-clustered index is corrupted. What do you do?
Since the corruption is limited to a non-clustered index, the least impactful fix is to drop and re-create (or rebuild) that index, rather than restoring the database or running a repair option against it.
You are working as a database administrator in an international company. You are administrating a SQL Server
instance on which you need to write a message to the Server security log when a fixed Server role is modified.
How could you achieve that?
You need to define a SQL Server Audit Specification with the Server Audit target to the Server Security Log
As a SQL Server database administrator, what is the main tool that you can use to perform the SQL Server network
configuration changes?
The SQL Server Configuration Manager.
You are a SQL Server database administrator in a company, and you are configuring the firewall rules on the
Windows Server that is hosting your SQL Server. What are the ports that should be considered in the firewall
rules, in order to allow connections to the following SQL Server components?
1. SQL Server Engine default instance: (TCP port 1433)
2. Dedicated Admin Connection: (TCP port 1434)
3. SQL Server Browser Service: (UDP port 1434)
4. Database Mirroring: (No default, but the commonly used is TCP port 5022)
5. SQL Server Analysis Services: (TCP port 2383)
6. Reporting Services Web Services: (TCP port 80)
7. Microsoft Distributed Transaction Coordinator: (TCP port 135)
As a SQL Server database administrator, you are requested to prevent a client who is trying to search for a SQL
Server instance, using the “Connect to Server” Browse button, from seeing that instance. How could you achieve
that?
We can use the HideInstance flag under the Protocols for <Server instance> under the SQL Server Configuration
Manager. In this way, the SQL Server Browser service will not be able to expose the SQL Server instance to that
client. But, to be able to connect to that instance, you should provide the port number that this SQL Server instance
is listening on, in the connection string, even if the SQL Server Browser Service is running.
Take into consideration that, hiding the SQL Server instance may cause an issue when hiding a clustered instance,
where the cluster service will not be able to reach that instance. This can be fixed by creating an alias on each node
that shows the other nodes names and the port numbers that these nodes are listening on.
You are a SQL Server database administrator in a company. One of your clients complains that he is not able to
connect to the SQL Server and getting the error message below. How could you troubleshoot it?
Login Failed for User ‘<domainname>\<username>’
Check that the database is accessible by another authorized user.
Check that the user who is trying to connect has been granted access to the database and has permission to perform the required action.
Which are the third-party tools used in SQL Server and why would you use them?
1. SQL Check (Idera): For monitoring Server activities and memory levels
2. SQL Doc 2 (Redgate): For documenting databases
3. SQL Backup 5 (Redgate): For automating the backup process
4. SQL Prompt (Redgate): For providing IntelliSense for SQL Server 2005/2000
5. LiteSpeed 5.0 (Quest): For backup and restore processes
Benefits of using these third-party tools:
1. Faster and flexible backup and recovery options
2. Secure backups with encryption
3. An enterprise view of the backup and recovery environment
4. Easy identification of optimal backup settings
5. Visibility into the transaction log and transaction log backups
6. A timeline view of backup history and schedules
7. Recovery of individual database objects
8. Encapsulation of a complete database restore into a single file to speed up restore time
9. Improving SQL Server functionality
10. Saving time and proving better information or notification
Which isolation levels support an optimistic (row versioning-based) concurrency control model?
1. One is the READ_COMMITTED isolation level. This is the only level that supports both a pessimistic (locking-based) and an optimistic (version-based) concurrency control model.
2. The other is the SNAPSHOT isolation level, which supports only an optimistic concurrency control model.
If you are given access to a SQL Server, how do you find if the SQL Instance is a named instance or a default
instance?
I would go to the SQL Server Configuration Manager. In the left pane of the tool, I would select SQL Server Services,
and the right-side pane displays all of the SQL Server services/components that are installed on that machine. If the service is displayed as SQL Server (MSSQLSERVER), it is a default instance; otherwise the instance name is displayed in the parentheses.
What are the differences in Clustering in SQL Server 2005 and 2008 or 2008 R2?
On SQL Server 2005, installing a SQL Server failover cluster is a single-step process, whereas on SQL Server 2008 or above it is a multi-step process. That is, in SQL Server 2005 the installation process itself installs SQL Server on all of the nodes (be it 2 nodes or 3 nodes). In 2008 and above this has changed: we need to run the installation separately on each node, 2 times for a two-node cluster, 3 times for a three-node cluster, and so on.
What are the different kinds of data compression introduced in SQL Server 2008?
Row compression and Page compression.
What are the different types of Upgrades that can be performed in SQL Server?
In-place upgrade and Side-by-Side Upgrade.
On a Windows Server 2003 Active – Passive failover cluster, how do you find the node which is active?
Using Cluster Administrator, connect to the cluster and select the SQL Server group. On the right-hand side of the console, the "Owner" column gives the information about the node on which the SQL Server group is currently active.
What are statistics, under what circumstances they go out of date, and how do you update them?
Statistics determine the selectivity of the indexes. If an indexed column has unique values, the selectivity of that index is higher than that of an index with non-unique values. The query optimizer uses these statistics in determining whether or not to choose an index while executing a query. Some situations under which you should update statistics:
1. If there is significant change in the key values in the index
2. If a large amount of data in an indexed column has been added, changed, or removed (that is, if the
distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE
statement and then repopulated
3. Database is upgraded from a previous version
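Statistics can be refreshed for a single table or for the whole database; a minimal sketch (the table name is a placeholder):
-- Update all statistics on one table with a full scan
UPDATE STATISTICS Sales.SalesOrderHeader WITH FULLSCAN;
-- Or update every statistics object in the current database
EXEC sp_updatestats;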
What is a Schema in SQL Server ? Explain how to create a new Schema in a Database?
A schema is a container used to group database objects. It can be created using the CREATE SCHEMA statement, and objects can be moved between schemas. Multiple database users can share a single default schema. For example:
CREATE SCHEMA sample;
CREATE TABLE sample.sampleinfo
(
id int primary key,
name varchar(20)
);
“model” system DB is down and we are trying to create a new database. Is it possible to create a new database
when model DB is down?
We can't create a new database when the model database is down. A SQL Server restart will also be unsuccessful when the model database is down, because TEMPDB creation will fail: TEMPDB is created based on the model database's configuration, and since model is down, TEMPDB cannot be created.
What is the maximum limit of SQL Server instances for a standalone computer?
50 instances on a stand-alone Server for all SQL Server editions. SQL Server supports 25 instances on a failover
cluster.
How to apply service pack on Active / Passive cluster on 2008 and 2012?
1. Freeze the service groups on Node A (active node).
2. Confirm all SQL services are stopped on Node B.
3. Upgrade the SQL Server 2008 instance on Node B.
4. Reboot node B.
5. Unfreeze the service group on node A.
6. Fail over the service group to Node B.
7. After the service group comes online, freeze the service group on Node B.
8. Confirm all SQL services are stopped on Node A.
9. Upgrade the SQL Server 2008 instance on Node A.
10. Reboot Node A.
11. Unfreeze the service group on node B.
12. Fail back the service group to Node A.
How does log shipping work together with transactional replication on the Publisher?
For transactional replication, the behavior of log shipping depends on the sync with backup option. This option can be set on the publication database and distribution database; in log shipping for the Publisher, only the setting on the publication database is relevant.
Setting this option on the publication database ensures that transactions are not delivered to the distribution
database until they are backed up at the publication database. The last publication database backup can then be
restored at the secondary Server without any possibility of the distribution database having transactions that the
restored publication database does not have. This option guarantees that if the Publisher fails over to a secondary
Server, consistency is maintained between the Publisher, Distributor, and Subscribers. Latency and throughput are
affected because transactions cannot be delivered to the distribution database until they have been backed up at the
Publisher.
How do you monitor and troubleshoot replication latency in transactional replication?
1. Replication Monitor: In Replication Monitor, from the list of all subscriptions, double-click the desired subscription. There we find three tabs.
1. Publisher to Distributor History
2. Distributor to Subscriber History
3. Undistributed commands
2. Replication Commands:
1. Publisher.SP_ReplTran: Checks the pending transactions at the Publisher.
2. Distributor.MSrepl_commands and MSrepl_transactions: Give the transaction and command details. The actual T-SQL data is in binary format. From the entry time we can estimate the latency.
3. Distributor.SP_BrowseReplCmds: It shows the xact_seqno along with the corresponding T-SQL command.
4. sp_replmonitorsubscriptionpendingcmds: It shows the total number of pending commands to be applied at the Subscriber along with the estimated time.
3. Tracer Tokens:
Available from Replication Monitor or via TSQL statements, Tracer Tokens are special timestamp transactions written
to the Publisher’s Transaction Log and picked up by the Log Reader. They are then read by the Distribution Agent and
written to the Subscriber. Timestamps for each step are recorded in tracking tables in the Distribution Database and
can be displayed in Replication Monitor or via TSQL statements.
When the Log Reader picks up a token, it records the time in the MStracer_tokens table in the Distribution database. The Distribution Agent then picks up the token and records the Subscriber write time in the MStracer_history table, also in the Distribution database.
Below is the T-SQL code to use Tracer tokens to troubleshoot the latency issues.
-- A SQL Agent job to insert a new tracer token in the publication database.
USE [AdventureWorks]
GO
EXEC sys.sp_posttracertoken @publication = <PublicationName>
GO
-- Token tracking tables
USE distribution
GO
-- publisher_commit
SELECT TOP 20 * FROM MStracer_tokens ORDER BY tracer_id DESC
-- subscriber_commit
SELECT TOP 20 * FROM MStracer_history ORDER BY parent_tracer_id DESC
Let’s say we have a situation. We are restoring a database from a full backup. The restore operation ran for 2
hours and failed with an error 9002 (Insufficient logspace). And the database went to suspect mode. How do you
troubleshoot this issue?
In that case we can actually add a new log file on other drive and rerun the restore operation using the system
stored procedure “sp_add_log_file_recover_suspect_db”. Parameters are the same as while creating a new log file.
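A sketch of such a call, assuming a database named MyDB and a new log file placed on another drive (the database name, logical file name, path and size are placeholders; verify the parameter names against the documentation for your version):
EXEC sp_add_log_file_recover_suspect_db
    @dbName = N'MyDB',
    @name = N'MyDB_log2',
    @filename = N'F:\SQLLogs\MyDB_log2.ldf',
    @size = '10GB';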
Let’s say we have a situation. We are restoring a database from a full backup. The restores operation runs for 2
hours and failed with an error 1105 (Insufficient space on the file group). And the database went to suspect mode.
How do you troubleshoot this issue?
In that case we can actually add a new data file on another drive and rerun the restore operation using the system
stored procedure “sp_add_data_file_recover_suspect_db”. Parameters are the same as while creating a new data
file.
How do you troubleshoot a Full transaction log issue?
1. The log_reuse_wait and log_reuse_wait_desc columns of the sys.databases catalog view describe what is actually preventing the log from being truncated and causing it to fill up (see the query sketch after this list).
2. Backing up the log.
3. Freeing disk space so that the log can automatically grow.
4. Moving the log file to a disk drive with sufficient space.
5. Increasing the size of a log file.
6. Adding a log file on a different disk.
7. Completing or killing a long-running transaction.
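A quick check for step 1 above (the database name is a placeholder):
SELECT name, log_reuse_wait, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDB';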
How MAXDOP impacts SQL Server?
The Microsoft SQL Server max degree of parallelism (MAXDOP) configuration option controls the number of
processors that are used for the execution of a query in a parallel plan. This option determines the computing and
threads resources that are used for the query plan operators that perform the work in parallel.
For Servers that use more than eight processors, use the following configuration: MAXDOP = 8.
For Servers that use eight or fewer processors, use MAXDOP = 0 to N, where N is the number of processors.
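MAXDOP can be changed at the instance level with sp_configure; a sketch, using the value 8 from the guidance above:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;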
What are the phases of SQL Server database restore process?
1. Copy Data: copies all data, log and index pages from the backup file to the database mdf, ndf and ldf files.
2. REDO: rolls forward all committed transactions to the database; if it finds any uncommitted transactions, it goes to the final phase, UNDO.
3. UNDO: rolls back any uncommitted transactions and makes the database available to users.
See I have an environment, Sunday night full backup, everyday night diff backup and every 45 min a transactional
backup. Disaster happened at 2:30 PM on Saturday. You suddenly found that the last Sunday backup has been
corrupted. What’s your recovery plan?
When you find that the last full backup is corrupted or otherwise unrestorable, all differentials taken after that point become useless. You then need to go back a further week to the previous full backup (taken 13 days ago) and restore that, plus the differential from 8 days ago, and the subsequent 8 days of transaction logs (assuming none of those ended up corrupted!).
If you're taking daily full backups, a corrupted full backup only introduces an additional 24 hours of logs to restore.
Alternatively, a log shipped copy of the database could save your bacon (you have a warm standby, and you know
the log backups are definitely good).
What is .TUF file? What is the significance of the same? Any implications if the file is deleted?
.TUF file is the Transaction Undo File, which is created when performing log shipping to a Server in Standby mode.
When the database is in Standby mode the database recovery is done when the log is restored; and this mode also
creates a file on destination Server with .TUF extension which is the transaction undo file. This file contains
information on all the modifications performed at the time backup is taken.
The file plays an important role in Standby mode, for a fairly obvious reason: while restoring the log backup, all uncommitted transactions are recorded to the undo file, with only committed transactions written to disk, which enables users to read the database. When the next transaction log backup is restored, SQL Server fetches all the uncommitted transactions from the undo file and checks against the new transaction log backup whether they have committed or not. Transactions found to be committed are written to disk; the others remain in the undo file until they are committed or rolled back.
If the .tuf file gets deleted, there is no way to repair log shipping other than reconfiguring it from scratch.
Explain how the SQL Server Engine uses an Index Allocation Map (IAM)?
The SQL Server Engine uses an Index Allocation Map (IAM) to keep an entry for each page, in order to track the allocation of these pages. The IAM is the only logical connection between the data pages of a heap, and it is what the SQL Server Engine uses to move through the heap.
What is the “Forwarding Pointers issue” and how can we fix it?
When a data modification operation is performed on heap table data pages, Forwarding Pointers will be inserted
into the heap to point to the new location of the moved data. These forwarding pointers will cause performance
issues over time due to visiting the old/original location vs the new location specified by the forwarding pointers to
get a specific value.
Starting from SQL Server version 2008, a new method was introduced to overcome the forwarding pointers
performance issue, by using the ALTER TABLE REBUILD command that will rebuild the heap table.
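A sketch showing how to check a heap for forwarded records and rebuild it (the schema and table names are placeholders):
-- Count forwarded records in the heap (index_id 0)
SELECT index_type_desc, forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyHeapTable'), 0, NULL, 'DETAILED');
-- Rebuild the heap to remove the forwarding pointers (SQL Server 2008 and later)
ALTER TABLE dbo.MyHeapTable REBUILD;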
Describe the structure of a SQL Server Index that provides faster access to the table’s data?
A SQL Server index is created using the shape of B-Tree structure, that is made up of 8K pages, with each page, in
that structure, called an index node. The B-Tree structure provides the SQL Server Engine with a fast way to move
through the table rows based on index key, that decides to navigate left or right, to retrieve the requested values
directly, without scanning all the underlying table rows. You can imagine the potential performance degradation that
may occur due to scanning large database table.
The B-Tree structure of the index consists of three main levels:
1. Root Level, the top node that contains a single index page, from which SQL Server starts its data search,
2. Leaf Level, the bottom level of nodes that contains the data pages we are looking for, with the number of
leaf pages depends on the amount of data stored in the index,
3. Intermediate Level, one or multiple levels between the root and the leaf levels that holds the index key
values and pointers to the next intermediate level pages or the leaf data pages. The number of intermediate
levels depends on the amount of data stored in the index.
Explain Index Depth, Density and Selectivity factors and how these factors affect index performance?
1. Index depth is the number of levels from the index root node to the leaf nodes. An index that is quite deep
will suffer from performance degradation problem. In contrast, an index with a large number of nodes in
each level can produce a very flat index structure. An index with only 3 to 4 levels is very common.
2. Index density is a measure of the lack of uniqueness of the data in a table. A dense column is one that has a
high number of duplicates.
3. Index selectivity is a measure of how many rows are scanned compared to the total number of rows. An index with high selectivity means that a small number of rows are scanned relative to the total number of rows.
What are the pros and cons of using ONLINE index creation or rebuilding options?
Setting the ONLINE option to ON when you create or rebuild the index will enable other data retrieving or
modification processes on the underlying table to continue, preventing the index creation process from locking the
table. On the other hand, the ONLINE index creation or rebuilding process will take longer time than the offline
default index creation process.
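A sketch of an online rebuild (the index and table names are placeholders; ONLINE = ON requires an edition that supports online index operations, such as Enterprise):
ALTER INDEX IX_MyIndex ON dbo.MyTable
REBUILD WITH (ONLINE = ON);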
Which type of index is used to maintain the data integrity of the columns on which it is created?
Unique indexes, which ensure that there are no duplicate values in the index key columns of the table on which the index is created.
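A minimal sketch (the table and column names are placeholders):
CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_Email
ON dbo.Customers (Email);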
How can we benefit from a Filtered index in improving the performance of queries?
It uses a filter predicate to improve the performance of queries that retrieve a well-defined subset of rows from the table, by indexing only that portion of the table's rows. The smaller size of the Filtered index, which consumes a small
amount of the disk space compared with the full-table index size, and the more accurate filtered statistics, that cover
the filtered index rows with only minimal maintenance cost, help in improving the performance of the queries by
generating a more optimal execution plan.
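A minimal sketch of a filtered index, assuming an Orders table where most queries only touch open orders (all names are placeholders):
CREATE NONCLUSTERED INDEX IX_Orders_Open
ON dbo.Orders (CustomerID, OrderDate)
WHERE Status = 'Open';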
How can you find the missing indexes that are needed to potentially improve the performance of your queries?
The Missing Index Details option in the query execution plan, if available.
The sys.dm_db_missing_index_details dynamic management view, which returns detailed information about missing indexes (excluding spatial indexes).
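A commonly used sketch that joins the missing-index DMVs to list suggestions for the current database:
SELECT d.statement AS table_name,
       d.equality_columns, d.inequality_columns, d.included_columns,
       s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY s.avg_user_impact DESC;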
Why is an index described as a double-edged sword?
A well-designed index will enhance the performance of your system and speed up the data retrieval process. On the
other hand, a badly-designed index will cause performance degradation on your system and will cost you extra disk
space and delay in the data insertion and modification operations. It is better always to test the performance of the
system before and after adding the index to the development environment, before adding it to the production
environment.
What do some of the common SQL Server wait types indicate?
CXPACKET—This wait type is involved in parallel query execution and indicates the SPID is waiting on a parallel process to complete or start. Excessive CXPACKET waits may indicate a problem with the WHERE clause in the query. Look at researching and changing the Maximum Degree of Parallelism (MAXDOP).
DTC—This wait type is not on the local system. When using Microsoft Distributed Transaction Coordinator (MS-DTC),
a single transaction is opened on multiple systems at the same time, and the transaction cannot be concluded until
it’s been completed on all of the systems.
NETWORKIO—The async_network_io (in SQL 2005/2008) and networkio (in SQL 2000) wait types can point to
network-related issues, but most often are caused by a client application not processing results from the SQL Server
quickly enough.
OLEDB—This wait type indicates that a SPID has made a function call to an OLE DB provider and is waiting for the
function to return the required data. This wait type may also indicate the SPID is waiting for remote procedure calls,
linked Server queries, BULK INSERT commands, or full-search queries.
PAGEIOLATCH_*—Buffer latches, including the PAGEIOLATCH_EX wait type, are used to synchronize access to BUF
structures and associated pages in the SQL Server database. The most frequently occurring buffer latching situation
is when SQL Server is waiting to read a data file page or workload from storage. These pages and workloads were not
cached in memory and need to be retrieved from the disk. Additional memory will help prevent pages from getting
pushed out.
SOS_SCHEDULER_YIELD—SQL Server instances with high CPU usage often show the SOS_SCHEDULER_YIELD wait
type. This doesn’t mean the Server is underpowered; it means further research is needed to find which individual
task in a query needs more CPU time. Check the Max Degree of Parallelism (MAXDOP) to ensure proper core usage.
Ensure high CPU performance from both within Windows and the system BIOS.
WRITELOG—When a SQL Server session waits on the WRITELOG wait type, it is waiting to write the contents of the log cache (user delete/update/insert operations) to the disk where the transaction log is stored, before telling the end user that his or her transactions have been committed. Disabling unused indexes will help, but the disk is the bottleneck here, and the transaction log should be moved to more appropriate storage.
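Cumulative wait statistics since the last restart (or since the statistics were last cleared) can be inspected with a query like the sketch below:
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;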
SQL SERVER – ACID (Atomicity, Consistency, Isolation, Durability)
ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a concept that database professionals generally look for when evaluating databases and application architectures. For a reliable database, all four of these attributes should be achieved.
Atomicity is an all-or-none proposition.
Consistency guarantees that a transaction never leaves your database in a half-finished state.
Isolation keeps transactions separated from each other until they’re finished.
Durability guarantees that the database will keep track of pending changes in such a way that the Server can
recover from an abnormal termination.
Above four rules are very important for any developers dealing with databases.
Atomicity: – The atomicity ACID property means that either all the operations (insert, update, delete) inside a transaction take place, or none of them do. In other words, all the statements inside a transaction are either completed or rolled back. Every transaction follows the atomicity model: if a transaction is started, it must either be completed or rolled back. For example, if a person is transferring an amount from account "A" to account "B", account "B" should be credited only after the transaction completes; if any failure happens after debiting the amount from account "A", the change should be rolled back.
Consistency: - This ACID property ensures database consistency. Whatever happens in the middle of the transaction, this property will never leave your database in a half-completed state. If the transaction completes successfully, all of its changes are applied to the database. If there is an error in the transaction, all the changes already made are rolled back automatically, so the database is restored to the state it had before the transaction started; the same happens if there is a system failure in the middle of the transaction. In other words, after the completion of a transaction, the changes made during the transaction should leave the data consistent. Referring to the example above, if account "A" has been debited by 200 RS, then after completion of the transaction account "B" should be credited by 200 RS.
Isolation: - Every transaction is individual: one transaction cannot access the result of another transaction until that transaction has completed, and the same operation cannot be performed by multiple transactions at the same time. Isolation states that every transaction should be isolated from the others; there should not be any interference between two transactions.
Durability: - Once a transaction has completed, the changes it has made to the database are permanent. Even in the case of a system failure or any other abnormal event, this property safeguards the committed data; changes belonging to a completed transaction must not be lost.
Before proceeding further with isolation levels, we need a clear understanding of two things:
Dirty Reads: This is when we read uncommitted data, when doing this there is no guarantee that data read will ever
be committed.
Phantom Reads: This is when data that we are working with has been changed by another transaction since we first read it. This means subsequent reads of this data in the same transaction could well be different.
To check the current isolation level:
DBCC USEROPTIONS
READ COMMITTED is the default isolation level for SQL Server. It prevents dirty reads by specifying that statements
cannot read data values that have been modified but not yet committed by other transactions. If the
READ_COMMITTED_SNAPSHOT option is set as ON, the Read transactions need not wait and can access the last
committed records. Other transactions can modify, insert, or delete data between executions of individual SELECT
statements within the current transaction, resulting in non-repeatable reads or phantom rows.
REPEATABLE READ is a more restrictive isolation level than READ COMMITTED. It encompasses READ COMMITTED
and additionally specifies that no other transactions can modify or delete data that has been read by the current
transaction until the current transaction commits. Concurrency is lower than for READ COMMITTED because shared
locks on read data are held for the duration of the transaction instead of being released at the end of each
statement. But Other transactions can insert data between executions of individual SELECT statements within the
current transaction, resulting in phantom rows.
The highest isolation level, serializable, guarantees that a transaction will retrieve exactly the same data every time
it repeats a read operation, but it does this by performing a level of locking that is likely to impact other users in
multi-user systems.
SNAPSHOT isolation specifies that data read within a transaction will never reflect changes made by other
simultaneous transactions. The transaction uses the data row versions that exist when the transaction begins. No
locks are placed on the data when it is read, so SNAPSHOT transactions do not block other transactions from writing
data. Transactions that write data do not block snapshot transactions from reading data. If no waiting is acceptable
for the SELECT operation but the last committed data is enough to be displayed, this isolation level may be
appropriate.
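Setting the isolation level for a session is straightforward; a sketch (the database and table names are placeholders, and SNAPSHOT additionally requires ALLOW_SNAPSHOT_ISOLATION to be enabled on the database):
-- Database-level prerequisite for SNAPSHOT isolation
ALTER DATABASE MyDB SET ALLOW_SNAPSHOT_ISOLATION ON;
-- Session-level setting
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    -- Reads here use the row versions that existed when the transaction started
    SELECT * FROM dbo.MyTable;
COMMIT;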
In a Cartesian Join, all possible combinations of the variables are created. In this example, if we had 1,000 customers
with 1,000 total sales, the query would first generate 1,000,000 results, then filter for the 1,000 records where
CustomerID is correctly joined. This is an inefficient use of database resources, as the database has done 100x more
work than required. Cartesian Joins are especially problematic in large-scale databases, because a Cartesian Join of
two large tables could create billions or trillions of results.
To prevent creating a Cartesian Join, use INNER JOIN instead:
SELECT Customers.CustomerID, Customers.Name, Sales.LastSaleDate
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
The database would only generate the 1,000 desired records where CustomerID is equal.
Some DBMS systems are able to recognize WHERE joins and automatically run them as INNER JOINs instead. In those
DBMS systems, there will be no difference in performance between a WHERE join and INNER JOIN. However, INNER
JOIN is recognized by all DBMS systems.
The next optimization is to filter rows with WHERE instead of HAVING wherever possible. A query that filters on the sale date in the HAVING clause (after grouping) would pull all 1,000 sales records from the Sales table, then filter for the 200 records generated in the year 2016, and finally count the records in the dataset.
In comparison, WHERE clauses limit the number of records pulled:
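A sketch of the WHERE-based version, matching the combined example shown further below:
SELECT Customers.CustomerID, Customers.Name, Count(Sales.SalesID)
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2016# AND #12/31/2016#
GROUP BY Customers.CustomerID, Customers.Name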
This query would pull the 200 records from the year 2016, and then count the records in the dataset. The first step in
the HAVING clause has been completely eliminated.
HAVING should only be used when filtering on an aggregated field. In the query above, we could additionally filter
for customers with greater than 5 sales using a HAVING statement.
SELECT Customers.CustomerID, Customers.Name, Count(Sales.SalesID)
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2016# AND #12/31/2016#
GROUP BY Customers.CustomerID, Customers.Name
HAVING Count(Sales.SalesID) > 5
How do you keep transaction log file growth under control, and what tools can be used in SQL Server?
It is always better to be a proactive database administrator and keep an eye on the SQL Server Transaction Log file
growth, in order to prevent the issues when having the log file running out of free space for a long time. Rather than
sleeping beside the Server, you can use a monitoring tool such as the System Center Operations Manager (SCOM)
tool, Performance Monitor counters, or simply create an alert that reads from one of the system catalog views and
notify an operator by email when the free space of the SQL Transaction Log file becomes under a predefined
threshold.
sys.dm_db_log_space_usage is a dynamic management view, introduced in SQL Server 2012, that is used to return
space usage information for the transaction log. The below query can be used to check the free space percentage in
the SQL Transaction Log file of the current database:
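A sketch of such a query against sys.dm_db_log_space_usage (the column arithmetic is just one way of expressing the numbers in MB):
SELECT total_log_size_in_bytes / 1048576.0 AS total_log_size_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_space_mb,
       100 - used_log_space_in_percent AS free_log_space_percent
FROM sys.dm_db_log_space_usage;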
If the free-space percentage returned by the previous query falls below a predefined threshold, the DBA should be notified by email, SMS or call, based on the monitoring tool used in your environment, before the SQL Server Transaction Log file runs out of free space.
Cause
Transaction log files store all of the transactions and the database modifications made by these transactions as
updates and changes to the database occur. As more and more transactions modify the database, and there is more transaction activity than maintenance activity on the transaction log, the log will grow. In particular, an unusual amount of activity on the database will cause the log to grow quickly. Another possible cause could be that the Automatic growth option has been set too high, such that the transaction log file grows by x%, resulting in exponential file growth.
How to transfer user credentials from source to destination Server in SQL Server
Method 1:
1. Reset the password on the destination SQL Server computer (Server B)
2. To resolve this issue, reset the password on the SQL Server computer, and then script out the login.
Method 2:
1. Transfer logins and passwords to destination Server (Server B) using scripts generated on source Server
(Server A)
2. Create stored procedures that will help generate the necessary scripts to transfer logins and their passwords. To do this, connect to Server A using SQL Server Management Studio (SSMS) or any other client tool and run the script that creates these helper procedures (Microsoft's documented sp_hexadecimal and sp_help_revlogin procedures).
What proactive monitoring do you set up as a DBA?
1. Set up SQL Agent jobs and alerts such as backup failure alerts, blocking alerts, database creation or deletion alerts, login creation or deletion alerts, disk space monitoring, log space monitoring, etc., to have point-in-time monitoring
2. Regular backup restore drills to ensure recovery time
3. regular analysis of queries for performance optimization
There are many things you can plan as per your environment.
2: Monitor Index Usage: Querying the sys.dm_db_index_operational_stats() DMV can yield a plethora of information regarding your index usage. Use this information to find ways to ensure a smoother user experience.
3: Locate Problem Queries: Up to 80 to 90 percent of poor SQL Server performance is the result of five to ten
queries or stored procedures. This is common for most SQL Server instances. So, if your Server performance is down,
be sure to check your queries and procedures.
4: Utilize the Tools Available: Many performance issues can be resolved using the tools available from Microsoft.
SQL Server comes with Dynamic Management Views or DMVs, SQL Server Profiler, Extended Events, Execution Plans,
and newer versions also contain Live Query Stats and the Query Store tools that provide an arsenal to performance
issues.
5: Find I/O Choke Points: I/O stalls are one of the most common reasons for SQL Server performance issues. Find
any I/O bottlenecks and fix them.
6: Don't Just Throw Hardware at the Problem: Perhaps it's no surprise that one of the biggest IT expenses for most businesses is memory and CPU hardware for SQL Server instances. Applications that don't utilize stored procedures or correctly defined queries place an extreme load on the Server, no matter how much hardware you add.
7: Avoid Shrinking Data Files: Contrary to popular belief, shrinking data files should be avoided at all costs, as it can
significantly impact SQL Server performance in a negative way. Shrinking causes fragmentation to occur and any
following queries to suffer.
8: Monitor Log Files: Transaction log files can be vital to monitoring overall performance of your SQL Server. One of
the most common issues is failing to leave enough space for the transaction log file to function normally. In addition,
forcing an autogrow operation may cause issues with SQL Server performance. Decreasing the log file backup
interval can also prevent log files from growing out of control.
9: Organize Data: Just as programmers keep their code clean, any savvy SQL Server user knows why it’s important to
be diligent when it comes to organizing data and log files onto different physical drive arrays to reduce drive
contention and latency. It is also prudent to locate those files away from the operating system drive to remove single
points of failure.
10: Reduce tempdb Contention: Some applications tend to rely heavily on tempdb. If this is the case for your
organization, you may run into some contention relating to internal structures that work with tempdb file. If you
experience contention, you should increase the number of tempdb data files ensuring that the files are equally sized.
11: Modify MAX Memory Limit: Fortunately for 64-bit users, Microsoft has greatly improved memory allocation and
sharing within the OS and other apps in the SQL Server. As a result, the MAX Memory setting being set at default is
not ideal. Don't be excessive when setting the memory on SQL Server; you want the OS and other applications to
have access to this memory. The ideal practice is to set the MAX memory setting to 80-90% of the available memory
on the Server, if there are no other major applications residing on the Server.
SCOM vs SCCM
Both SCOM and SCCM look like twins, but they have different traits and roles. SCCM, the System Center Configuration Manager (also called Configuration Manager), is a tool which helps administrators manage all aspects of a business's Windows environment. SCOM, the System Center Operations Manager (also called Operations Manager), provides a single interface which displays the crucial pieces of our IT environment. In this blog post we are going to discuss what SCOM is, why SCOM, what SCCM is, why SCCM, a comparison of SCOM vs SCCM, and the benefits and disadvantages of SCOM and SCCM.
What is SCOM?
System Center Operations Manager uses a single interface which shows the state, health and performance information of computer systems. It creates alerts based on availability, configuration, and security situations being identified. It works with both Unix hosts and Microsoft Windows Servers, and it uses sets of filtering rules which are specific to the monitored applications. It supports custom management packs; while an administrator role is needed to install agents and create management packs, any valid user account can be given rights to simply view the list of recent alerts.
Why SCOM?
It is employed to connect many separately managed groups to a central location. Its basic idea is to put a piece of software (an agent) on each computer to be monitored. The agent forwards alerts to a central Server when an alert occurs and is detected. The Server application maintains a database which includes a history of alerts.
What is SCCM?
System Center Configuration Manager is a product from Microsoft which enables the management, deployment and security of devices and applications across an enterprise. Administrators commonly use it for patch management, endpoint protection, and software distribution; it is part of the Microsoft System Center systems management suite. It integrates a console which enables management of Microsoft applications such as Application Virtualization. It relies on a single infrastructure, with the goal of unifying physical and virtual clients under one roof, and it adds tools to help IT administrators with access control.
Why SCCM?
It discovers desktops, Servers and mobile devices that are connected to a network, with the help of Active Directory, and installs client software on each node. It then manages application deployments on a per-device basis, which allows automated patching with the help of Windows Server Update Services and policy enforcement with the help of Network Access Protection. Its Endpoint Protection Manager is built into System Center Configuration Manager, which helps secure data stored on devices.
2. SCCM:
Microsoft's System Center Configuration Manager, or "Configuration Manager" as most call it, is a tool which offers administrators a way to manage all aspects of an organization's Windows-based desktops, Servers, and devices from a single hub.
SCOM: Microsoft's System Center Operations Manager, or "Operations Manager," is just as useful as Microsoft's System Center Configuration Manager. It is a monitoring tool which provides a look at the health and performance of all our IT services in one spot.
3. SCCM:
The Configuration Manager's unified infrastructure pulls all of the organization's clients, physical, virtual, and mobile, under a single large umbrella. It contains tools and resources that give administrators the ability to control access, both in the cloud and on site. Administrators may grant end users access to the devices and applications they require without worrying about compromised security.
SCOM: The Operations Manager may monitor performance of both Server and client applications, and it may provide
us the information about the health of our services across both datacenter and cloud infrastructures.
4. SCCM:
The unified design of Configuration Manager helps us do many essential jobs within a single platform:
1. Deploy software
2. Protect data
3. Manage updates
4. Monitor system health
5. Enforce organizational compliance
SCCM can automate the associated tasks so we may be sure that they happen on the schedule that we set and
within our organization’s policies and standards.
SCOM: Operations Manager with a single interface, shows the administrators a view of the vital pieces of our IT
environment all at a time.
Status
Health
Security
Performance
Configuration
5. Both SCCM and SCOM are just two components of a large product family, which help administrators manage the vast array of applications and services found in a business. SCCM can assist us with the ongoing tasks related to keeping our infrastructure secure and up-to-date. SCOM will monitor our devices and services and then share the information about them that we require. SCOM and SCCM are distinct, but they are complementary pieces of a productive and safe IT landscape.
Advantages of SCOM
1. SCOM is user friendly and doesn't need much maintenance. Its availability monitoring is the most valuable part, and it also has the required capacity and the ability to send notifications.
2. Its solutions save you a lot of work by reducing the effort required to start monitoring, which removes a lot of headaches elsewhere in managing your software and data centers. It also helps our application teams by allowing them to drill into issues and perform root cause analysis.
3. It improves our organization by simplifying the monitoring process; it reports when there is a connectivity issue which needs to be fixed, making it easier to concentrate on what needs to get fixed.
4. Extensibility is one of its most valuable features; there are few limits on what you can do with it. It allows you to standardize all the reports for monitoring the network, which helps a lot for auditing purposes.
Disadvantages of SCOM
1. It lacks detail for certain products, like granular access and application monitoring, and it needs Microsoft-
provided management packs for those products.
2. We may have difficulties integrating Linux and some networking devices; there is a need for drastic
improvement in on-premises network monitoring. It also has issues with capacity and limited space which need
some improvement.
3. It needs more standard, out-of-the-box libraries for common market solutions, so that you don't
need to do a lot of work on them.
Advantages of SCCM
1. Graphical reports of software updates are the most valuable feature; updates have been almost always successful
except in rare cases such as security breaches. It also saves a lot of money by allowing you to install things
automatically, in exactly the same way, on every computer.
2. It deploys patches and updates from Windows, one of the main valuable features, which we can utilize a
lot. To become more competitive in the marketplace, Microsoft is investing a lot in new features.
3. It has a very straightforward initial setup which is not that complicated, and it provides integration
between products. It is a good choice for deployment and performs very well; scalability is another strong
feature it provides.
Disadvantages of SCCM
1. With Microsoft premier support, you get what you pay for: you will get excellent support only by
paying for it. The on-screen display also needs some improvement.
2. Those who are new to this solution may face a lot of problems initially with its complex setup, and
the deployment process is lengthy and should be quicker to complete.
3. There is a chance of virus attacks through new applications being added suddenly and silently, and in case of a
flawed software installation version all of its users are affected. During software installation there is no
immediate notice of failure until the user gets a pop-up warning.
Conclusion
SCOM and SCCM are both part of the Microsoft System Center family; they are strictly different but they are
complementary components of a safe and productive IT infrastructure. They are part of a large family of products
which assist admins in managing the large variety of applications and services that can be found in
organizations. SCCM can help you manage ongoing tasks related to maintaining infrastructure security, while SCOM
helps you monitor services and devices and share information about them as per your needs and
requirements. They both have their own distinctive traits and roles.
Type in a Server Name, by default the Server you are currently on will be populated in the name. You are able to set
up MDW on a separate Server and do the data collection for another Server.
Click on New next to Database Name. A pop-up will appear to create a database for the MDW to collect and write
the data to. Type in Database Name and click ok.
The Database Name will now be populated and you can click on next.
Select a user to map to the MDW and set the Database membership role and click Next.
A summary window for the configuration will display and you can click Finish.
Next we need to set up a data collection to collect data from the Server and databases you are interested in getting
information from. Right click on the Data Collection node again and select Configure Management Data Warehouse.
This time you will select Set up data collection and click Next.
Select the Server and Database where you want to store the data and click Next.
A summary screen will display; click Finish. Make sure SQL Server Agent is running, otherwise the setup will fail.
Once complete you will see the Data Collection sets in the Object Explorer.
To access the reports, right-click on the Data Collection node > Reports > Management Data Warehouse.
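Once the collection sets are running, they can also be verified with T-SQL against msdb; a small sketch:
-- List the data collection sets and whether each one is currently running
SELECT name, is_running
FROM msdb.dbo.syscollector_collection_sets;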
What is review of KEDB and SOP in SQL Server?
The Known Error Database (KEDB) is created by Problem Management and used by Incident and Problem
Management to manage all Known Error Records.
There are three ITIL terms that you need to be familiar with to understand KEDB. These include:
1. Incident
2. Problem
3. Known error
When you face an unplanned interruption to an IT service, it is referred to as an incident. For example, if your email
service goes down without notice from your provider, this could be tagged as an incident.
A problem is the underlying cause of an incident. Simply put, this is the thing that caused the issue in the first place.
In the example above, the reason behind the email outage is the problem. Let's say that the root cause of a problem
is identified. Now it's no longer a problem, but a known error. For the email incident, the root cause is identified as
one of the critical services on the email Server which was in hung mode. So, what was once a problem is now a
known error.
Why do you need KEDB? How exactly does a business justify expending capital and operational costs on the
database?
Getting back to the email incident, let's say that the critical service was found to be in hung mode after running a number of
diagnostics and carrying out a series of tests. Once it was identified, the resolution itself was quick: the
service was stopped and restarted. But to get to the resolution, it took plenty of effort and, more importantly, cut
into some precious time. While the diagnostics and resolution were being applied, the email service was down. This
could result in penalties imposed by customers, and intangible losses like future business opportunities and
customer satisfaction.
However, this organization that provides email services to its customers maintains a KEDB, and this particular
incident was recorded. When the email service goes down again, the technical support team can simply refer to the
previous outage in the KEDB, and can start diagnosis with the service that caused the issue last time. If it happens to
be the same service causing the issue, resolution now happens within a fraction of the time. As you can see, this
greatly reduces downtime and all other negative effects that stem from service outages. This is KEDB in action!
A KEDB record will have details of the incident, when the outage happened and what was done to resolve it.
However, for a speedy resolution, the KEDB must be powerful enough to retrieve relevant records using filters and
search keywords. Without a KEDB in place, service management organizations tend to reinvent the wheel time and
again, rather than working toward building a mature organization that allocates its funds toward improving services.
Workaround and permanent solution.
When there are service outages, there are two ways of restoring them. The first, and most ideal, is a permanent
solution. A permanent solution entails a fix that guarantees no more outages, at least on a certain level. The second,
and most common, type of restoration is the workaround, which looks for a temporary, alternate solution. A
workaround is generally followed by identifying and implementing a permanent solution at a later date.
In the email service outage, restarting the service is a workaround. The technical staff knows that this will solve the
issue for the moment (which is of vast importance), but that it is bound to repeat in the future. Before the incident
occurs again, it's on the technical team to investigate why the service is unresponsive and to find a permanent
solution.
Let's look at another classic example that I have used time and again during trainings – this one really drives home
the concepts of workaround and permanent solution. Imagine that the printer in your cabin stops working and you
need it right away. You log an incident with your technical staff, stating that you are about to get into a client
meeting and you need to print some documents. The support person determines that he is unable to fix the printer
in time and provides you a workaround to send your files to a common printer in the foyer.
The workaround helps, as your objective is to get the prints and run into a meeting. But, there's no way you want the
hassle of having to do this every time you need to print. So, when the meeting is over, you push for a permanent
solution. When you return, your printer is working and there is a note from the support staff stating that the power
cable was faulty and has been replaced. This is a permanent solution. And while there's a chance that the new cable
could also go faulty, the odds are in your favor.
Why did I discuss workaround and permanent solution on a post that is aimed at KEDB?
Known errors exist because the fix is temporary. The known error database consists of records where a permanent
solution does not exist, but a workaround does. For a known error record, if a permanent solution was to be
implemented, then the record can be expunged or archived for evidence. Known error records with an implemented
permanent solution must not be a part of the KEDB in principle.
This concept is further built upon in the next section where we'll talk about the various process trees for creating,
using and archiving known error records.
2. When an incident is reported, the support team refers to the KEDB first to check if a workaround exists in the
database. If it does, they will refer to the known error record and follow the resolution steps involved. Suppose the
fix provided is inaccurate, the support staff can recommend alternate resolution steps to ensure that KEDB is high on
quality.
Let's say that at another time and place, MS Outlook application starts to crash. The technical staff can refer to the
KEDB to check what was done on previous occasions, and can recommend the workaround to the customer until a
permanent solution is in place.
3. If a permanent solution to a known error is identified and implemented, the incident must not happen anymore.
So, the known error record is either taken out of the KEDB or archived with a different status. This is done to ensure
that the database is optimized with only the known errors, and accessing records does not become cumbersome due
to a high volume of known error records.
While the user accesses the email service via webmail as a workaround, the issue is investigated and it is identified that a Bluetooth
extension is causing Outlook to crash. The permanent solution is to disable the extension or even uninstall it. This
solution is implemented not only on the Outlook that crashed, but on all the systems accessing Outlook, to ensure
the same incident doesn't happen again. After implementing and testing the permanent solution, the known error
record can either be archived with a pre-defined status or deleted.
This technology was designed to have the entire encryption process be completely transparent to the applications
accessing the database. It does this by using either Advanced Encryption Standard (AES) or Triple DES, encrypting
the file pages on disk and then decrypting them as the information is read into memory. This places no limitations on querying the
data in an encrypted database. It is essentially real-time I/O encryption and decryption and does not increase the
size of the database.
Also note, that as a result of Transparent Data Encryption, database backups will also be encrypted. In the event that
a backup of the database gets lost or stolen, the culprit will not be able to restore the database without the
appropriate certificate, keys and passwords.
Also, the TempDB database will be automatically encrypted, since tempdb is used by all user databases
(for processing/storing temporary objects). You shouldn’t notice much of a difference in how Transparent Data
Encryption operates, but this is good to know and often overlooked. What good is an encrypted database if the data
placed in TempDB isn’t encrypted?
If you’re a DBA there is a very strong chance that you are in charge of securing some very sensitive information.
First we must determine the correct version of SQL Server that allows Transparent Data Encryption. I like to call it an
expensive feature as it requires Enterprise Editions. It also works with Developer Edition, but of course, this is just for
testing and development purposes. When implementing this in a production environment you must have the correct
version of SQL Server. I’ve listed the eligible editions below.
SQL 2016 Evaluation, Developer, Enterprise
SQL 2014 Evaluation, Developer, Enterprise
SQL Server 2012 Evaluation, Developer, Enterprise
SQL Server 2008 R2 Evaluation, Developer, Enterprise, Datacenter
SQL Server 2008 Evaluation, Developer, Enterprise
Now let’s have a quick overview of the Transparent Data Encryption architecture and hierarchy. First we have the
Windows Operating System Level Data Protection API, which encrypts the Service Master Key found at the SQL
Server instance level. The Service Master Key is created at the time of the initial SQL Server instance setup. From
there we go to the database level. The Service Master Key encrypts the Database Master Key for the master database.
The database master key creates a certificate in the master database. Keep in mind that you must create a backup of
this certificate, not only for environmental refreshes but also for disaster recovery purposes. Once Transparent Data
Encryption is enabled on the database you won’t be able to restore or move it to another Server unless this same
certificate has been installed. Keep good (and secure) records of the certificate and password. The certificate is then
used to enable encryption at the database level, thus creating the database encryption key.
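To tie the hierarchy together, here is a minimal sketch of enabling Transparent Data Encryption; the database name MyDatabase, the certificate name TDECert, the passwords and the file paths are all assumptions and should be replaced with your own.
USE master;
GO
-- Create the database master key in master (if one does not already exist)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseAStrongPasswordHere>';
GO
-- Create the certificate that will protect the database encryption key
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';
GO
-- Back up the certificate and private key immediately and store them securely
BACKUP CERTIFICATE TDECert
TO FILE = 'D:\Backup\TDECert.cer'
WITH PRIVATE KEY (FILE = 'D:\Backup\TDECert.pvk',
                  ENCRYPTION BY PASSWORD = '<AnotherStrongPassword>');
GO
USE MyDatabase;
GO
-- Create the database encryption key and turn encryption on
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
ALTER DATABASE MyDatabase SET ENCRYPTION ON;
GO
-- Check the encryption state (3 = encrypted)
SELECT DB_NAME(database_id) AS DatabaseName, encryption_state
FROM sys.dm_database_encryption_keys;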
Dynamic Data Masking is a new security feature introduced in SQL Server 2016 that limits the access of unauthorized
users to sensitive data at the database layer.
An example of the need for such a feature is allowing application developers to access production data for
troubleshooting purposes while preventing them from accessing the sensitive data at the same time, without affecting
their troubleshooting process. Another example is the call center employee who accesses the customer’s
information to help with a request, but the critical financial data, such as the bank account number or the full credit
card number, will be masked for that person.
Dynamic Data Masking, also known as DDM, is a very simple security feature that can be fully built using T-SQL
commands we are familiar with; it is easy to use and also flexible to design. This data protection method allows
you to designate your “sensitive” data, field by field, in order to configure the suitable masking function to hide it from
queries. This feature requires no coding effort from the application side, no encryption, and no change to the
real data stored on disk.
Dynamic Data Masking masks the sensitive data “on the fly” to protect it from non-privileged users, using
built-in or customized masking functions, without changing the actual data and without preventing those users from querying the masked columns.
To implement DDM, first, you need to specify your sensitive data, the role to mask it and specify designated
privileged users that have access to that sensitive data. The next step is to select and implement a masking function
Masking functions
There are four main types of masking functions that can be configured in Dynamic Data Masking, which we will
introduce briefly here and use in the demo later.
The first type is the Default function that masks the data according to the field data type; if the field data type is
binary, varbinary or image, a single byte of binary value 0 will be used to mask that field. For the date and time data
types, the 01.01.1900 00:00:00.0000000 value will be used to mask that date field. If the data type of the masked
field is one of the numeric data types, a zero value will be used to mask that field. For the string data types, XXXX
value will be used to mask that field. If the field length is less than 4 characters, fewer Xs will be used to
mask its value.
The second masking method is the Email function that is used to mask the fields that store email addresses. The
Email function shows only the first character of the email address and masks the rest of the email, in the form
aXXX@XXXX.com.
The Random masking function is used to mask any numeric data type by replacing the original value with a random
value within the range specified in that function.
The last masking type is the Custom function, which allows you to define your own mask for the specified field by
exposing the first and last letters, defined by the prefix and suffix, and adding a padding shown in the middle
in the form of prefix, [padding value], suffix, taking into consideration that part of the prefix or the suffix will not be
exposed if the field’s original value is too short to be masked.
Like any feature in SQL Server, there are a number of limitations for the Dynamic Data Masking feature: you
can’t define DDM on an encrypted column, a column with FILESTREAM, a COLUMN_SET or a sparse column that is
part of a column set, a computed column, or a key column in a FULLTEXT index. Also, if the column to be masked is
part of an index or any type of dependency, we should drop that dependency, configure DDM on that column and
then create the dependency again. Also, Dynamic Data Masking will not prevent privileged users from altering the
masked column or modifying the masked data.
How it works
Let’s start our demo to understand how to configure the Dynamic Data Masking feature practically and how it works.
Assume that we need to mask the employees’ critical data in order to prevent the developer who is responsible for
developing and troubleshooting that system from viewing that sensitive data. First, we will create the
Employee_Financial table where we will store the critical data:
Let us create a table with relevant masking functions as shown in the below table script.
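Since the original script is not reproduced here, the following is a hedged sketch of such a table; the column names, sample data and mask formats are assumptions, not the author's exact script.
CREATE TABLE Employee_Financial
(
    EMP_ID          INT IDENTITY(1,1) PRIMARY KEY,
    EMP_Name        VARCHAR(100) MASKED WITH (FUNCTION = 'partial(1, "XXXXXXX", 0)') NOT NULL,
    EMP_Email       VARCHAR(100) MASKED WITH (FUNCTION = 'email()') NULL,
    EMP_BankAccount VARCHAR(34)  MASKED WITH (FUNCTION = 'default()') NULL,
    EMP_Salary      MONEY        MASKED WITH (FUNCTION = 'default()') NULL,
    EMP_Bonus       INT          MASKED WITH (FUNCTION = 'random(1000, 9999)') NULL
);
GO
INSERT INTO Employee_Financial (EMP_Name, EMP_Email, EMP_BankAccount, EMP_Salary, EMP_Bonus)
VALUES ('John Smith',  'john.smith@example.com',  'GB29NWBK60161331926819', 5500, 2500),
       ('Sara Connor', 'sara.connor@example.com', 'GB94BARC10201530093459', 6200, 3100);
GO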
Following is the dataset that was inserted from the above script.
Let us create a user to demonstrate data masking which has the SELECT permissions to the created data.
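A minimal sketch of such a user; the name MaskUser matches the user that is granted UNMASK a little further down, and the table name reuses the hypothetical Employee_Financial table above.
CREATE USER MaskUser WITHOUT LOGIN;
GRANT SELECT ON Employee_Financial TO MaskUser;
GO
-- Query the table in the context of the non-privileged user to see the masked output
EXECUTE AS USER = 'MaskUser';
SELECT * FROM Employee_Financial;
REVERT;
GO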
The following is the output as you can see that data is masked.
When you need to provide the UNMASK permissions to the above user.
GRANT UNMASK TO MaskUser
Then the user will be able to view the data. You cannot grant column-level UNMASK permissions to users. If you
want to hide the data, you can instead DENY SELECT permission on the relevant column for the user. However, there
are no special permissions needed to create masked columns. You only need the CREATE TABLE permission and
ALTER permission on the column to create or alter a masked column.
Further, if you want to find out what are the masked columns in the database, you can use the following script.
SELECT OBJECT_NAME(OBJECT_ID) TableName,
Name ,
is_masked,
masking_function
FROM sys.masked_columns
SQL Server indexes are created on a column level in both tables and views. Their aim is to provide a “quick to locate”
mechanism for data based on the values within indexed columns. If an index is created on the primary key, whenever a search for a
row of data based on one of the primary key values is performed, SQL Server will locate the searched value in the
index, and then use that index to locate the entire row of data. This means that SQL Server does not have to
perform a full table scan when searching for a particular row, which is a much more performance-intensive task,
consuming more time and more SQL Server resources.
Relational indexes can be created even before there is data in the specified table, or even on tables and views in
another database.
Even so, these automatic modifications will continuously scatter the information in the index throughout the
database – fragmenting the index over time. The result – indexes now have pages where logical ordering (based on
the key-value) differs from the physical ordering inside the data file. This means that there is a high percentage of
free space on the index pages and that SQL Server has to read a higher number of pages when scanning each index.
Also, ordering of pages that belong to the same index gets scrambled and this adds more work to the SQL Server
when reading an index – especially in IO terms.
The Index fragmentation impact on the SQL Server can range from decreased efficiency of queries – for Servers with
low-performance impact, all the way to the point where SQL Server completely stops using indexes and resorts to
the last-straw solution – full table scans for each and every query. As mentioned before, full table scans will
drastically impact SQL Server performance and this is the final alarm to remedy index fragmentation on the SQL
Server.
The solution to fragmented indexes is to rebuild or reorganize indexes. But, before considering maintenance of
indexes, it is important to answer two main questions:
Detecting fragmentation
Generally, in order to solve any problem, it is essential to first and foremost locate it, and isolate the affected area
before applying the correct remedy.
Fragmentation can be easily detected by running the system function sys.dm_db_index_physical_stats, which returns
the size and the fragmentation information for the data and indexes of tables or views in SQL Server. It can be run
against a specific index in a table or view, against all indexes in a specific table or view, or against all indexes in all
databases:
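A hedged example of running it against all indexes in the current database (the 'LIMITED' mode is the lightest scan level):
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON ips.object_id = i.object_id
   AND ips.index_id = i.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;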
The results returned after running the function include the following information:
After the fragmentation has been detected, the next step is to determine its impact on the SQL Server and if any
course of action needs to be taken.
There is no exact information on the minimal amount of fragmentation that affects the SQL Server in a specific way
to cause performance congestion, especially since the SQL Server environments greatly vary from one system to
another.
However, there is a generally accepted solution based on the percent of fragmentation
(avg_fragmentation_in_percent column from the previously described sys.dm_db_index_physical_stats function)
Fragmentation is less than 10% – no de-fragmentation is required. It is generally accepted that in the majority of
environments index fragmentation of less than 10% is negligible and its performance impact on the SQL Server is
minimal.
Fragmentation is between 10-30% – it is suggested to perform index reorganization
Fragmentation is higher than 30% – it is suggested to perform index rebuild
Here is the reasoning behind the thresholds above which will help you to determine if you should perform index
rebuild or index reorganization:
Index reorganization is a process where the SQL Server goes through the existing index and cleans it up. Index
rebuild is a heavy-duty process where an index is deleted and then recreated from scratch with an entirely new
structure, free from all piled up fragments and empty-space pages.
While index reorganization is a pure cleanup operation that leaves the system state as it is without locking out
affected tables and views, the rebuild process locks the affected table for the whole rebuild period, which may result
in long downtimes that may not be acceptable in some environments.
With this in mind, it is clear that the index rebuild is a process with a ‘stronger’ solution, but it comes with a price –
possible long locks on affected indexed tables.
On the other side, index reorganization is a ‘lightweight’ process that will solve the fragmentation in a less effective
way – since a cleaned index will always be second to a new one made fully from scratch. But reorganizing an index is
much better from the availability standpoint, since it does not lock the affected indexed table during the course of the
operation.
Servers with regular maintenance periods (e.g. regular maintenance over weekend) should almost always opt for the
index rebuild, regardless of the fragmentation percent, since these environments will hardly be affected by the table
lock-outs imposed by index rebuilds due to regular and long maintenance periods.
How to reorganize and rebuild index: Using SQL Server Management Studio:
1. In the Object Explorer pane navigate to and expand the SQL Server, and then the Databases node
2. Expand the specific database with fragmented index
3. Expand the Tables node, and the table with fragmented index
4. Expand the specific table
5. Expand the Indexes node
6. Right-click on the fragmented index and select Rebuild or Reorganize option in the context menu (depending on the
desired action):
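The same actions can also be scripted with T-SQL; a minimal sketch assuming an index named IX_Example on a table dbo.MyTable:
-- Lightweight cleanup; the table stays available
ALTER INDEX IX_Example ON dbo.MyTable REORGANIZE;
GO
-- Full rebuild of a single index
ALTER INDEX IX_Example ON dbo.MyTable REBUILD;
GO
-- Rebuild every index on the table
ALTER INDEX ALL ON dbo.MyTable REBUILD;
GO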
We can get details about any particular statistics as well. Right-click on the statistics and go to properties. It opens
the statistics properties and shows the statistics columns and last update date for the particular statistics.
Click on the Details, and it shows the distribution of values and the frequency of each distinct value occurrence
(histogram) for the specified object.
T-SQL to view SQL Server Statistics
We can use DMV sys.dm_db_stats_properties to view the properties of statistics for a specified object in the current
database.
Execute the following query to check the statistics for HumanResources.Employee table.
SELECT sp.stats_id,
name,
filter_definition,
last_updated,
rows,
rows_sampled,
steps,
unfiltered_rows,
modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = OBJECT_ID('HumanResources.Employee');
When we update statistics at the table level, all the statistics on the table get an update at the same time.
When we specify a particular statistics name, only that specific statistics object is updated.
1. We should not specify 0 PERCENT or 0 ROWS to update the statistics, because that just updates the statistics
object but does not contain any statistics data
2. We cannot use FULL SCAN and SAMPLE together
3. We should use SAMPLE only under specific requirements; if we take a sample size that is too small, it might not
be suitable for the query optimizer to choose the appropriate plan
4. We should not disable auto-update statistics even if we are regularly updating the SQL Server Statistics.
Auto Update Statistics allows SQL Server to automatically update stats according to the predefined threshold
5. Updating statistics with FULL SCAN might take longer for an extensive data set object. We should plan it and
do it during off-business hours
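For reference, hedged examples of the options discussed above, reusing the HumanResources.Employee table referenced earlier (the statistics name AK_Employee_LoginID is an assumption):
-- Full scan of all statistics on the table
UPDATE STATISTICS HumanResources.Employee WITH FULLSCAN;
GO
-- Sample-based update of a single statistics object
UPDATE STATISTICS HumanResources.Employee AK_Employee_LoginID WITH SAMPLE 50 PERCENT;
GO
-- Update out-of-date statistics for all user tables in the current database
EXEC sp_updatestats;
GO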
We usually perform database maintenance such as index rebuild or index reorganize. SQL Server automatically
updates the statistics after an index rebuild. This is equivalent to updating statistics with FULL SCAN; however, it does not
update the column statistics. We should update column statistics after an index rebuild as well. We can use the
following query to do the task for the column statistics on a specified object.
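A hedged sketch, again reusing the HumanResources.Employee table as the example object:
-- Refresh only the column (non-index) statistics on the object after an index rebuild
UPDATE STATISTICS HumanResources.Employee WITH FULLSCAN, COLUMNS;
GO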
SQL Server does not update statistics with the index reorganize activity. We should manually update the statistics if
required, or rely on the automatically updated statistics.
Select the Update Statistics maintenance task from the list of tasks.
Click Next, and you can define the Update Statistics task.
In this page, we can select the database (specific database or all databases), objects (specific or all objects). We can
also specify to update all, column or index statistics only.
We can further choose the scan type as a Full Scan or sample by. In the Sample by, we need to specify the sample
percentage or sample rows as well.
CMDBs capture attributes of the CIs, including CI importance, CI ownership and CI identification code. A CMDB also
provides details about the relationships (dependencies) between CIs, which is a powerful tool if used correctly. As a
business enters more CIs into the system, the CMDB becomes a stronger resource to predict changes within the
organization. For example, if an outage occurs, IT can understand through the CI data who or which systems will be
affected.
Another benefit of a CMDB is the ability to integrate data from another vendor's software, reconcile that data,
identify any inconsistencies within the database and then ensure all data is synchronized. A CMDB system can also
integrate other configuration-related processes, such as change management and incident management, to better
manage the IT environment.
Once implemented, an initial challenge is to import all relevant data into the CMDB. This can be a tedious task, as
admins must input a wealth of information about each IT asset, including financial information, upgrade history and
performance profile. Modern CMDB tools offer enhanced discovery capabilities, allowing the tool to find and profile
CIs automatically. However, this data doesn't always come from the same source. In theory, a process called data
federation brings together data from disparate locations to prevent IT from replacing or eliminating other data
systems. In practice, data is dispersed across sources that are not well integrated, which prevents IT managers from
federating data.
Over time, IT must maintain and update the CMDB's data. It's common for a CMDB to fail because IT does not
update the information and, therefore, it becomes stale and unusable.
Recently, the term configuration management has expanded to reflect the increased use of software-based configurations and
interactions: scripting the configuration of a software stack, container management and Kubernetes, automation
down to the code level, and cloud resources and provisioning. The DevOps universe of technologies and practices --
containers, microservices, infrastructure as code (IaC), source control, package management and release automation
-- has changed what it means to map and track assets' configurations and dependencies. Machine learning (ML) and
AI promise to more quickly and accurately predict the impact of undesirable results from configuration changes and
their propagation.
One important thing to note here is that SQL Server can only truncate the log up to the oldest open transaction. Therefore,
if you are not seeing the expected relief from a checkpoint, it could very well be that someone forgot to commit or
roll back their transaction. It is very important to finalize all transactions as soon as possible.
How can you control the amount of free space in your index pages?
You can set the fill factor on your indexes. This tells SQL Server how much free space to leave in the index pages
when re-indexing. The performance benefit here is fewer page splits (where SQL Server has to copy rows from one
index page to another to make room for an inserted row) because there is room for growth built into the index.
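For illustration, a fill factor can be set when creating or rebuilding an index; the object names below are assumptions:
-- Leave roughly 20 percent free space in each leaf-level page
ALTER INDEX IX_Example ON dbo.MyTable REBUILD WITH (FILLFACTOR = 80);
GO
-- Or set it at index creation time
CREATE NONCLUSTERED INDEX IX_MyTable_Col1 ON dbo.MyTable (Col1) WITH (FILLFACTOR = 80);
GO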
Beware though that re-computing the query statistics causes queries to be recompiled. This may or may not negate
all performance gains you might have achieved by calling update statistics. In fact, it could have a negative impact on
performance depending on the characteristics of the system.
The execution history of the job is displayed and you may choose the execution time (if the job failed multiple times
during the same day). There would be information such as the time it took to execute that Job and details about the
error that occurred.
Does Transparent Data Encryption provide encryption when transmitting data across the network?
No, Transparent Data Encryption (TDE) does not encrypt the data during transfer over a communication channel.
Slow operation: All pointers, being moved to/from the page/rows, have to be fixed and the SHRINKFILE operation is
single-threaded, so it can be really slow (the single-threaded nature of SHRINKFILE is not going to change any time
soon)
Recommendations: First, use TRUNCATEONLY to shrink the file. It removes the inactive part of the log and then
performs the shrink operation.
Rebuild/reorganize indexes once the shrink is done so that the fragmentation level is decreased.
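A sketch of the recommendation above, assuming a transaction log file logically named MyDatabase_log:
-- Release only the inactive space at the end of the log file, with no page movement
DBCC SHRINKFILE (N'MyDatabase_log', TRUNCATEONLY);
GO
-- If a specific target size is required instead, shrink to roughly 1 GB (1024 MB)
DBCC SHRINKFILE (N'MyDatabase_log', 1024);
GO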
What is Collation?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using
rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana
character types, and character width.
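For reference, collation can be checked at the Server, database and column level; the table name below is only an example:
SELECT SERVERPROPERTY('Collation') AS ServerCollation;
SELECT name, collation_name FROM sys.databases;
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('HumanResources.Employee');   -- non-character columns return NULL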
SQL Server Replication is based on the “Publish & Subscribe” metaphor. Let us discuss each SQL entity or SQL
component in detail below.
Article
An article is the basic unit of SQL Server replication, consisting of tables, views, and stored procedures. With a filter option, the
article in the SQL replication process can be scaled either vertically or horizontally. It is possible to create multiple
articles on the same object with certain limitations or restrictions.
With the help of the New Publication Wizard, you can navigate the properties of an article and set permissions when
needed. You can set permissions at the time of publication as well, and these are read-only permissions only.
SELECT
Pub.[publication] [PublicationName]
,Art.[publisher_db] [DatabaseName]
,Art.[article] [Article Name]
,Art.[source_owner] [Schema]
,Art.[source_object] [Object]
FROM
[distribution].[dbo].[MSarticles] Art
INNER JOIN [distribution].[dbo].[MSpublications] Pub
ON Art.[publication_id] = Pub.[publication_id]
ORDER BY
Pub.[publication], Art.[article]
Once the article has been created successfully and you want to change some properties, a new replication
snapshot should be generated. If the article has one or more subscriptions, then all of them should be reinitialized
independently. To list all the articles in SQL Server publications, you can use the query against the distribution database shown above.
Publication: A publication is the logical collection of articles within a database. It allows us to define and configure
article properties at a higher level so that they can be inherited by the other articles in the group. An article
cannot be distributed independently; it needs a publication.
EXEC sp_helppublication;
Publisher: This is the source database where SQL replication starts and which makes data available for replication.
Publishers define what they publish through a publication. A publisher can have one or more publications, where
each publication defines a logically related set of articles and its data propagation mechanism.
USE Distribution
GO
select * from MSpublications
Distributor: A distributor is a storehouse for replication data associated with one or more publishers. In a few cases,
it acts as both the publisher and the distributor; in that case it can be termed the local
distributor. If it is configured on a different Server, then it is termed the remote distributor. Each publisher is
associated with a distribution database and a distributor.
The distribution database identifies and stores the SQL replication status data, publication metadata and acts as a
queue sometimes to move the data from Publisher to Subscribers. Based on the replication model, the distributor is
responsible for notifying Subscriber that the user has subscribed to a publication and article properties are changed.
Also, distribution databases help to maintain data integrity. Each distributor should have a distribution database
consisting of article details, replication metadata, and the data. One distributor can have multiple distribution
databases. However, all publications defined on a single Publisher should use the same distribution database.
EXEC sp_get_distributor
Highlights
1. The Distributor acts as the mediator between a Publisher and the Subscriber.
2. It receives snapshots or published transactions and stores or forwards these publications to Subscribers.
3. It includes a set of 6 system databases including Distribution Database.
Subscriber
This is the destination database where replication ends. A subscriber can subscribe to multiple publications from
multiple publishers. A subscriber can send back the data to the publisher and publish data to other subscribers based
on the replication model and the design.
EXEC sp_helpsubscriberinfo;
Subscriptions
Pull Subscriptions: This subscription is created at the Subscriber Server. In this subscription, the subscriber initiates
the replication instead of the publisher.
Push Subscriptions: This subscription is created at the Publisher Server. In this subscription, the publisher is
responsible for updating all changes to the subscriber.
Snapshot Agent
1. It is an executable file that helps in preparing snapshot files that contain schema and published table data
and database objects.
2. It usually stores data in the snapshot folder and records the synchronization jobs in the distribution database.
Distribution Agent
1. It is mainly used with transactional and snapshot replication.
2. It applies the snapshot to the subscriber and moves transactions from the distribution database to
subscribers.
3. It runs at the Distributor for push subscriptions, or at the Subscriber for pull subscriptions.
Log Reader Agent
1. It is used with transactional replication; it moves the transactions marked for replication from the
transaction log on the publisher to the distribution database.
2. Each published database has its own Log Reader Agent that runs on the Distributor and connects to the
Publisher.
Merge Agent
1. It applies the initial snapshot at the Subscriber and then transfers the incremental data changes that happen.
2. Each merge subscription has its own Merge Agent that connects with both the Publisher and the
Subscriber.
3. It can capture changes with the help of triggers.
Queue Reader Agent
1. It is used with transactional replication along with the queued update option.
2. It runs at the Distributor and moves changes made at the Subscriber back to the Publisher.
3. In the case of Queue Reader Agent, only one instance exists to service all publications and publishers for an
assigned distribution database.
Snapshot Replication
1. The snapshot replication is used to provide an initial set of database objects for merge or transaction
publications.
2. It can copy and distribute database objects in the same manner as they appear at the current moment.
3. It is used to give an initial set of data for merge and transactional replication.
4. It is used when periodic full data refreshes are appropriate.
5. It is used when data is not changing frequently.
6. It is used to replicate a small amount of the data.
7. It is used to replicate lookup tables that don’t change frequently.
8. It keeps copies of data as of a certain point in time, as specified by the developers.
9. For example, if there is one product company that changes the prices of all its products only
once or twice a year, replicating the complete snapshot of data in this case is highly recommended.
Transactional Replication
SQL Server transactional replication offers a more flexible solution for databases that change frequently. Here, the
replication agent monitors the publisher for database changes and transmits those changes to the subscribers. You
can schedule these transmissions in transactional replication to occur periodically or on a continuous basis.
Merge Replication
This replication allows publishers and subscribers to make changes to the database independently. Both entities can
work without any network connection. When they are connected again, the merge replication
agent checks both entities for changes and modifies the databases accordingly.
If there are some conflicts in changes, then the agent uses one predefined conflict resolution algorithm to check on
the appropriate data. This type of replication is mainly used by laptop users and others who are not continually
connected to the publisher.
Each of the replication types has its own benefits and needs. They are suitable for different database scenarios. You
can choose any one of them based on your replication needs. It is clear at this point that the SQL Server replication
process offers database administrators a powerful tool for managing and scaling databases in an enterprise
environment.
For a complex or busy database application, replication will create additional traffic immediately even if your network links are up to
date. You are exponentially increasing the workload on each SQL Server, as each Server system must keep track of changes
and maintain the same on the other Servers too. At the same time, the publisher is also consolidating changes.
1 – Selectivity
We can define selectivity as: the degree to which one value can be differentiated within a wider group of similar
values. For instance, when Mario Zagalo was the coach of the Brazilian soccer team, one of his hardest jobs was to select the
players for each match. Brazil is a huge country and, as you probably know, we have a lot of very good players. So
the selectivity of good players in Brazil is very low, because there are a lot of them. If I asked you to select just the
players who would make good attackers, you would probably return to me with a lot of guys, and returning with
many players lowers the selectivity.
In database terms, suppose we have a table called Internationally_Renown_Players with a column called Country. If
we write:
SELECT * FROM Internationally_Renown_Players WHERE Country= 'Brazil'
… then we can say that this query is not very selective, because many rows will be returned (Update: Sadly, this
didn’t help us in the 2010 World Cup). However, if we write:
SELECT * FROM Internationally_Renown_Players WHERE Country = 'Tuvalu'
… then this query will be very selective, because it will return very few rows (I don’t know if they have a soccer team, but
they have yet to really make a name for themselves).
Another good (and broader interest) example would be to have a table called Customers, with columns for Gender
and, say, Passport_Number. The selectivity of the Gender column is very low because we can’t do much filtering with
just the F or M values, and so many rows will be returned by any query using it. By contrast, a query filtering on the
Passport_Number column will always return just one row, and so the selectivity of that column is very high.
2. Density:
The term “density” comes from physics, and is calculated by dividing the mass of a substance by the volume it
occupies, as represented mathematically below:
D = M / V, where D = Density, M = Mass and V = Volume.
The explanation can be stretched, as is the case with “geographic density” (something we’ve become used to
hearing and understanding). For example, the geographic density of Brazil is calculated by dividing the number of
inhabitants (the ‘mass’) by the size of the geographic area (the ‘volume’): 187,000,000 people divided by
8,514,215.3 km2 gives us 21.96 inhabitants per km2. In SQL Server, we can interpret this as:
The more dense a column is, the more rows that column returns for a given query.
Take note that this is exactly the opposite of selectivity, for which a higher value means fewer rows. To calculate the
density of a column, you can run the following query:
SELECT (1.0 / COUNT(DISTINCT <ColumnName>)) FROM <TableName>
… The larger the number that query returns, the more ‘dense’ your column is, and the more duplications it contains.
With this number, the QO can determine two important facts about the frequency of data in a column, and to
explain this better, let’s create an example table.
CREATE TABLE Test(Col1 INT)
GO
DECLARE @i INT
SET @i = 0
WHILE @i < 5000
BEGIN
INSERT INTO Test VALUES(@i)
INSERT INTO Test VALUES(@i)
INSERT INTO Test VALUES(@i)
INSERT INTO Test VALUES(@i)
INSERT INTO Test VALUES(@i)
SET @i = @i + 1
END
GO
CREATE STATISTICS Stat ON Test(Col1)
GO
As you can see, we have a table with 25,000 rows, and each value is duplicated across five rows; so the density is
calculated as 1.0 / 5000 = 0.0002. The first question we can answer with that information is: “How many unique
values do we have in the column ‘Col1’?” Just using the density information we can compute the following: 1.0 / 0.0002
= 5000. The second question is: “What is the average number of duplicates per value in the ‘Col1’ column?” Using
the density information we can calculate: 0.0002 * 25000 = 5 (which is exactly the average number of duplicated values in
this case).
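You can see that density value directly in the statistics object created above:
-- The density vector's "All density" value for Col1 should be 0.0002
DBCC SHOW_STATISTICS ('Test', 'Stat') WITH DENSITY_VECTOR;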
Why You Should Know This
Using the table created above, let’s see where the SQL Server uses that information in practice. Take the following
query:
DECLARE @I INT
SET @I = 999999
SELECT *
FROM Test
WHERE Col1 = @I
SQL Server will use the information about the average density to estimate how many rows will be returned (in
this case, 5 rows). We will see more about why SQL Server uses the density information in section 7.
Another example:
SELECT COUNT(*) FROM Test
GROUP BY Col1
In the first step, the Table Scan operator estimates that 25,000 rows will be returned, and then after the Hash Match
operator applies the Group By, only 5,000 rows will be returned. In that case, SQL Server uses the information about
how many unique rows are in the column.
The density information is used to give the QO an idea of how many duplicated rows exist in a column, which
allows it to choose the optimal operator or join algorithm in an execution plan. In certain cases, the QO also uses
density information to guess how many rows will be returned by a query – look at section 5 for more details.
3. Cardinality:
This is used to measure the number of rows which satisfy one condition. For instance, imagine one table with 500
rows, and a query with “WHERE NAME = ‘Joao'” – the Optimizer goes to the statistics and reads from the histogram
that ‘Joao’ represents 5% of the table, so the cardinality is (5% x 500) = 25. In the execution plan, we can think of the
cardinality as the “estimated number of rows” displayed in the tool tips of each operator. I probably don’t need to
spell it out, but a bad estimation of cardinality can easily result in an inefficient plan.
Why you should know that
In my opinion, the cardinality of a column is one of the most important pieces of information to the creation of an
efficient execution plan. To create an execution plan, we really need to know (or at least be able to make a decent
estimate about) how many rows will be returned to each operator. For example, a join query which returns a lot of
rows would be better processed with a hash join; but if the QO doesn’t know the cardinality, it can mistakenly
choose to use a Loop Join or, even worse, choose to order the key columns to make use of a merge join!
If you have upgraded your application from a previous version of SQL Server, different indexes may be more efficient
in new SQL Server build because of optimizer and storage engine changes. The Database Engine Tuning Advisor helps
you to determine if a change in indexing strategy would improve performance.
Join hints prevent an ad hoc query from being eligible for auto-parameterization and caching of the query plan.
When you use a join hint, it implies that you want to force the join order for all tables in the query, even if those
joins do not explicitly use a hint.
If the query that you are analyzing includes any hints, remove them, and then reevaluate the performance.
Examine the Execution Plan
After you confirm that the correct indexes exist, and that no hints are restricting the optimizer's ability to generate
an efficient plan, you can examine the query execution plan. You can use any of the following methods to view the
execution plan for a query:
SQL Profiler:
Capture the Execution Plan event in SQL Profiler; it occurs immediately before the StmtCompleted event for the query for
the same system process ID (SPID).
SQL Query Analyzer: Graphical Show plan
With the query selected in the query window, click the Query menu, and then click Display Estimated Execution Plan.
Note: If the stored procedure or batch creates and references temporary tables, you must use a SET STATISTICS
PROFILE ON statement or explicitly create the temporary tables before you display the execution plan.
SHOWPLAN_ALL and SHOWPLAN_TEXT
To receive a text version of the estimated execution plan, you can use the SET SHOWPLAN_ALL and
SET SHOWPLAN_TEXT options. See the SET SHOWPLAN_ALL (T-SQL) and SET SHOWPLAN_TEXT (T-SQL) topics in SQL
Server Books Online for more details.
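A short sketch using the Test table created earlier; while the option is ON the query is not executed, only the estimated plan is returned:
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM Test WHERE Col1 = 10;
GO
SET SHOWPLAN_TEXT OFF;
GO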
Note
If the stored procedure or batch creates and references temporary tables, you must use the SET STATISTICS PROFILE
ON option or explicitly create the temporary tables before displaying the execution plan.
STATISTICS PROFILE
When you are displaying the estimated execution plan, either graphically or by using SHOWPLAN, the query is not
executed. Therefore, if you create temporary tables in a batch or a stored procedure, you cannot display the
estimated execution plans because the temporary tables will not exist. STATISTICS PROFILE executes the query first,
and then displays the actual execution plan. See the SET STATISTICS PROFILE (T-SQL) topic in SQL Server Books
Online for more details. When it is running in SQL Query Analyzer, this appears in graphical format on the Execution
Plan tab in the results pane.
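A comparable sketch for STATISTICS PROFILE; here the query is actually executed and the actual plan rows are returned alongside the results:
SET STATISTICS PROFILE ON;
GO
SELECT COUNT(*) FROM Test GROUP BY Col1;
GO
SET STATISTICS PROFILE OFF;
GO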
Using a join hint in a large query implicitly forces the join order for the other tables in the query as if FORCEPLAN was
set.
Correct Join Type
SQL Server uses nested loop, hash, and merge joins. If a slow-performing query is using one join technique over
another, you can try forcing a different join type. For example, if a query is using a hash join, you can force a nested
loops join by using the LOOP join hint. See the "FROM (T-SQL)" topic in SQL Server Books Online for more details on
join hints.
Using a join hint in a large query implicitly forces the join type for the other tables in the query as if FORCEPLAN was
set.
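A hedged illustration with hypothetical Customers and Orders tables, forcing a nested loops join for the single join in the query:
SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
INNER LOOP JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;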
Parallel Execution
If you are using a multiprocessor computer, you can also investigate whether a parallel plan is in use. If parallelism is
in use, you see a PARALLELISM (Gather Streams) event. If a particular query is slow when it is using a parallel plan,
you can try forcing a non-parallel plan by using the OPTION (MAXDOP 1) hint. See the "SELECT (T-SQL)" topic in SQL
Server Books Online for more details.
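For example, reusing the Test table from the earlier statistics example:
SELECT COUNT(*)
FROM Test
GROUP BY Col1
OPTION (MAXDOP 1);   -- force a serial (non-parallel) plan for this query only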
Data modeling is a technique to document a software system using entity relationship diagrams (ER Diagram) which
is a representation of the data structures in a table for a company’s database. It is a very powerful expression of the
company’s business requirements. Data models are used for many purposes, from high-level conceptual and logical
models to physical data models, and are typically represented by the entity-relationship diagram. A data model serves as a guide used
by database analysts and software developers in the design and implementation of a system and the underlying
database.
Attributes
Information such as property, facts you need to describe each table – Attributes are facts or descriptions of entities.
They are also often nouns and become the columns of the table. For example, for entity students, the attributes can
be first name, last name, email, address, and phone numbers.
Primary key is an attribute or a set of attributes that uniquely identifies an instance of the entity. For example,
for a Student entity, student number is the primary key since no two students have the same student number.
We can have only one primary key in a table. It uniquely identifies every row and it cannot be null.
Foreign key is a key used to link two tables together. Typically you take the primary key field from one table
and insert it into the other table where it becomes a foreign key (it remains a primary key in the original table).
We can have more than one foreign key in a table.
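A small sketch of these two key types, using hypothetical Student and Enrollment tables:
CREATE TABLE Student
(
    StudentNumber INT NOT NULL PRIMARY KEY,        -- primary key: uniquely identifies each student
    FirstName     VARCHAR(50),
    LastName      VARCHAR(50),
    Email         VARCHAR(100)
);
GO
CREATE TABLE Enrollment
(
    EnrollmentID  INT IDENTITY(1,1) PRIMARY KEY,
    StudentNumber INT NOT NULL
        CONSTRAINT FK_Enrollment_Student REFERENCES Student (StudentNumber),  -- foreign key back to Student
    CourseCode    VARCHAR(10) NOT NULL
);
GO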
Relationships: How tables are linked together – Relationships are the associations between the entities. Verbs often
describe relationships between entities. We will use Crow’s Foot Symbols to represent the relationships. Three types
of relationships are discussed in this lab. If you read or hear cardinality ratios, it also refers to types of relationships.
Cardinality: it defines the possible number of occurrences in one entity which is associated with the number of
occurrences in another. For example, ONE team has MANY players. When present in an ERD, the entity Team and
Player are inter-connected with a one-to-many relationship.
In an ER diagram, cardinality is represented as a crow’s foot at the connector’s ends. The three common cardinal
relationships are one-to-one, one-to-many, and many-to-many. Here are some examples of relationship cardinality in an
ERD:
The Data Model is defined as an abstract model that organizes data description, data semantics, and consistency
constraints of data. The data model emphasizes what data is needed and how it should be organized instead of
what operations will be performed on data. Data Model is like an architect's building plan, which helps to build
conceptual models and set a relationship between data items.
The two types of Data Modeling Techniques are
1. Entity Relationship (E-R) Model
2. UML (Unified Modelling Language)
Why use Data Model?
Types of Data Models
Conceptual Data Model
Logical Data Model
Physical Data Model
Advantages and Disadvantages of Data Model
Types of Data Models: There are mainly three different types of data models: conceptual data models, logical data
models, and physical data models, and each one has a specific purpose. The data models are used to represent the
data and how it is stored in the database and to set the relationship between data items.
Conceptual Data Model: This Data Model defines WHAT the system contains. This model is typically created by
Business stakeholders and Data Architects. The purpose is to organize, scope and define business concepts and rules.
Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This model is
typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of rules and
data structures.
Physical Data Model: This Data Model describes HOW the system will be implemented using a specific DBMS
system. This model is typically created by DBA and developers. The purpose is actual implementation of the
database.
Types of Data Model:
Conceptual Data Model
A Conceptual Data Model is an organized view of database concepts and their relationships. The purpose of creating
a conceptual data model is to establish entities, their attributes, and relationships. In this data modeling level, there
is hardly any detail available on the actual database structure. Business stakeholders and data architects typically
create a conceptual data model.
The 3 basic tenets of the Conceptual Data Model are
Entity: A real-world thing
Attribute: Characteristics or properties of an entity
Relationship: Dependency or association between two entities
Data model example:
Customer and Product are two entities. Customer number and name are attributes of the Customer entity
Product name and price are attributes of product entity
Sale is the relationship between the customer and product
Conclusion
1. Data modeling is the process of developing data model for the data to be stored in a Database.
2. Data Models ensure consistency in naming conventions, default values, semantics, security while ensuring
quality of the data.
3. Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.
4. There are three types of data models: conceptual, logical, and physical.
5. The main aim of conceptual model is to establish the entities, their attributes, and their relationships.
6. Logical data model defines the structure of the data elements and set the relationships between them.
7. A Physical Data Model describes the database specific implementation of the data model.
8. The main goal of designing a data model is to make certain that data objects offered by the functional team
are represented accurately.
9. The biggest drawback is that even a small change made in the structure requires modification in the entire
application.
Another advantage of filegroups is the ability to back up only a single filegroup at a time. This can be extremely
useful for a VLDB, because the sheer size of the database could make backing up an extremely time-consuming
process.
Yet another advantage is the ability to mark the filegroup and all data in the files that are part of it as either read-
only or read-write.
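Hedged examples of both points, assuming a database named MyDatabase with a user filegroup named FG_Archive:
-- Back up a single filegroup
BACKUP DATABASE MyDatabase
FILEGROUP = 'FG_Archive'
TO DISK = N'D:\Backup\MyDatabase_FG_Archive.bak';
GO
-- Mark the filegroup read-only (no further modifications allowed)
ALTER DATABASE MyDatabase MODIFY FILEGROUP FG_Archive READ_ONLY;
GO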
Disadvantages
The first is the administration that is involved in keeping track of the files in the filegroup and the database objects
that are placed in them.
The other is that if you are working with a smaller database and have RAID-5 implemented, you may not be
improving performance.
Is there any benefit to add multiple log files? Yes/No, Why?
No, there is no benefit to adding multiple log files to a database, because write operations to the transaction log file
are always serial.
What are the recommended settings for transaction Log File for file growth?
If you need to configure autogrowth for a transaction log file, always set it to a fixed size (for example, in MB)
instead of a percentage.
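As a quick, hedged illustration (the database and logical file names below are hypothetical), the growth setting can be
changed with ALTER DATABASE:
-- Grow the log file in fixed 512 MB increments instead of a percentage
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_log, FILEGROWTH = 512MB);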
What’s the difference between database version and database compatibility level?
Database version
The database version is a number stamped in the boot page of a database that indicates the SQL Server version of
the most recent SQL Server instance the database was attached to.
USE master;
GO
SELECT DatabaseProperty ('dbccpagetest', 'version');
GO
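The database compatibility level, by contrast, controls certain Transact-SQL and query optimizer behaviors so that a
database can behave as it did on an earlier version. A simple way to check it (a sketch, reusing the example database
name above) is via sys.databases:
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'dbccpagetest';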
The transaction log is used to guarantee the data integrity of the database and for data recovery.
The SQL Server Database Engine divides each physical log file internally into a number of virtual log files.
Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file.
The only time virtual log files affect system performance is if the log files are defined by small size and
growth_increment values. If these log files grow to a large size because of many small increments, they will
have lots of virtual log files. This can slow down database startup and also log backup and restore
operations.
Can you describe SQL Server Memory Architecture
SQL Server dynamically acquires and frees memory as required. Typically, an administrator does not have to
specify how much memory should be allocated to SQL Server, although the option still exists and is required
in some environments.
SQL Server supports Address Windowing Extensions (AWE), allowing use of physical memory over 4 gigabytes
(GB) on 32-bit versions of Microsoft Windows operating systems. This feature is deprecated from SQL Server
2012 (code-named Denali).
SQL Server tries to reach a balance between two goals:
Keep the buffer pool from becoming so big that the entire system is low on memory.
Minimize physical I/O to the database files by maximizing the size of the buffer pool.
Do you have any idea about Buffer Management?
A buffer is an 8 KB page in memory. To reduce I/O operations between the database files and disk, the buffer manager
uses the buffer cache: it reads pages from the database into the buffer cache, the data is modified there, and the
modified page is later written back to disk.
The buffer manager only performs reads and writes to the database. Other file and database operations such as
open, close, extend, and shrink are performed by the database manager and file manager components.
When you submit a query to a SQL Server database, a number of processes on the Server go to work on that query.
The purpose of all these processes is to manage the system such that it will provide your data back to you, or store
it, in as timely a manner as possible, whilst maintaining the integrity of the data.
All these processes go through two stages:
1. Relational Engine
2. Storage Engine
At Client:
The user enters data and clicks submit
The client database library transforms the original request into a sequence of one or more Transact-SQL
statements to be sent to SQL Server. These statements are encapsulated in one or more Tabular Data
Stream (TDS) packets and passed to the database network library
The database network library uses the network library available in the client computer to repackage the TDS
packets as network protocol packets.
The network protocol packets are sent to the Server computer network library across the network
At Server:
The Server network library extracts the TDS packets and sends them to Open Data Services (ODS), where the
original query is extracted.
ODS sends the query to the relational engine
A connection is established to the relational engine and a session ID (SPID) is assigned to the connection
At Relational Engine:
Checks permissions and determines if the query can be executed by the user associated with the request
The query is sent to the Query Parser
It checks that the T-SQL is written correctly
Build a Parse Tree \ Sequence Tree
The Parse Tree is sent to the Algebrizer
Verifies all the columns, objects and data types
Aggregate Binding (determines the location of aggregates such as GROUP BY, and MAX)
Builds a Query Processor Tree in Binary Format
The Query Processor Tree is sent to the Optimizer
Based on the query processor tree and Histogram (Statistics) builds an optimized execution plan
Stores the execution plan into the plan cache and sends it to the database engine
At Database Engine:
The database engine maps a batch into different tasks
Each task is associated with a process
Each process is assigned a Windows Thread or a Windows Fiber. The worker thread takes care of this.
The Thread/Fiber is sent to the execution queue and waits for CPU time.
The Thread/Fiber identifies the table location where the data needs to be stored
Goes to the file header, checks the PFS, GAM and SGAM pages, and goes to the correct page
Verifies the page is not corrupted using Torn Page Detection / Checksum and writes the data
If required, allocates new pages and stores data on them. Once the data is stored/updated/added in a page, it
updates the below locations
Page Header – Checksum / Torn Page Detection (Sector info)
In this process the Memory manager takes care of allocating buffers, new pages, etc.
The Lock manager takes care of acquiring appropriate locks on the objects/pages and releasing them when the task
is completed
Thread Scheduler: schedules the threads for CPU time
I/O manager: Establish memory bus for read/write operations from memory to disk and vice versa
Deadlock\Resource\Scheduler Monitor: Monitors the processes
If it is a DML operation, it picks the appropriate page from disk and puts the page in memory.
While the page is in memory, based on the ISOLATION LEVEL a shared / exclusive / update /
schema lock is issued on that page.
Once the page is modified in memory, that is, once the transaction is completed, the transactional
operation is logged into the log file (.ldf), into the concerned VLF.
Here we should understand that only the operation (T-SQL statements) is logged into the .ldf file. The modified
page waits in memory till a checkpoint happens. These pages are known as dirty pages because the page data differs
between the copy on disk and the copy in memory.
Once the checkpoint happens the page is written back to the disk.
Once the process is completed the result set is submitted to the relational engine, and the same
process is followed for sending the result set back to the client application.
The connection is closed and the session ID is removed
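To see how many dirty pages each database currently holds in the buffer pool, a hedged example using the
sys.dm_os_buffer_descriptors DMV could look like this:
-- Count dirty (modified but not yet flushed) pages per database in the buffer pool
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) AS dirty_page_count
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY database_id;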
Network Protocols:
What are the different types of network protocols? Explain each of them in detail
Shared Memory:
Clients using the shared memory protocol can only connect to a SQL Server instance running on the same computer;
it is not useful for most database activity. Use the shared memory protocol for troubleshooting when you suspect
the other protocols are configured incorrectly.
Server – Machine 1
Clients – Machine 1
TCP/IP:
TCP/IP is a common protocol widely used over the Internet. It communicates across interconnected networks of
computers that have diverse hardware architectures and various operating systems. TCP/IP includes standards for
routing network traffic and offers advanced security features. It is the most popular protocol that is used in business
today.
Server – Machine 1
Clients – WAN (Any machine from any network)
Named Pipes:
Named Pipes is a protocol developed for local area networks. A part of memory is used by one process to pass
information to another process, so that the output of one is the input of the other. The second process can be local
(on the same computer as the first) or remote (on a networked computer).
Server – Machine 1
Clients – LAN (Any machine from LAN)
VIA:
Virtual Interface Adapter (VIA) protocol works with VIA hardware. This feature will be deprecated in future releases.
How Min and Max Server memory options impact memory usage from SQL Server?
The min Server memory and max Server memory configuration options establish upper and lower limits to the
amount of memory used by the buffer pool of the Microsoft SQL Server Database Engine. The buffer pool starts with
only the memory required to initialize. As the Database Engine workload increases, it keeps acquiring the memory
required to support the workload. The buffer pool does not free any of the acquired memory until it reaches the
amount specified in min Server memory. Once min Server memory is reached, the buffer pool then uses the
standard algorithm to acquire and free memory as needed. The only difference is that the buffer pool never drops its
memory allocation below the level specified in min Server memory, and never acquires more memory than the level
specified in max Server memory.
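As a minimal sketch (the memory values below are placeholders, not recommendations), both options can be set
with sp_configure:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'min server memory (MB)', 4096;   -- example value
EXEC sp_configure 'max server memory (MB)', 55296;  -- example value
RECONFIGURE;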
I have restarted my windows Server. Can you be able to explain how memory allocation happens for SQL Server?
Memory allocation always depends on the CPU architecture.
32 Bit:
Initially allocates memory for “Memory To Leave” (MTL) also known as VAS Reservation (384 MB). This MTL
value can be modified using the start parameter “–g”
Then allocates memory for the Buffer Pool: Available VAS for the Buffer Pool = User VAS – MTL (Reserved VAS)
Maximum BPool Size = 2048 MB – 384 MB = 1664 MB.
64 Bit:
Allocates Memory for Buffer Pool based on Maximum Server Memory configuration
Non-Buffer Pool Memory region (MTL / VAS Reservation) = Total Physical Memory – (Max Server Memory +
Physical Memory Used by OS and Other Apps)
Ex: A Windows Server has 64 GB physical memory; SQL Server Max Server Memory = 54 GB, and the OS and
other apps are using 6 GB; then the memory available for
Non-BPool (MTL / VAS Reservation) = 64 – (54 + 6) = 4 GB
Max Buffer Pool Size = Max Server Memory = 54 GB
Can you technically explain how memory allocated and lock pages works?
Windows OS runs all processes on its own Virtual Memory known as Virtual Address Space and this VAS is divided
into Kernel (System) and User (Application) mode.
Default: No Lock Pages in Memory is enabled
SQL Server memory allocations are made under the User Mode VAS using the VirtualAlloc() Windows API function.
Any memory allocated via VirtualAlloc() can be paged to disk.
The SQLOS resource monitor checks the QueryMemoryResourceNotification Windows API, and when Windows sets a
Memory Low notification, SQLOS responds by releasing memory back to Windows.
Lock Pages in Memory is Enabled for SQL Server:
SQL Server memory allocations are made using calls to the AllocateUserPhysicalPages() function in the AWE API.
Memory allocated using AllocateUserPhysicalPages() is considered locked. That means these pages stay
in physical memory and are not released when Windows raises a Memory Low notification.
BCM (Bulk Changed Map): The BCM page is used in the bulk-logged recovery model to track extents changed due to bulk-
logged or minimally logged operations.
During a log backup, the database engine reads the BCM pages and includes all the extents that have been modified by
bulk-logged operations. A value of 1 means a modified extent and 0 means not modified. After each log backup, all
these extents are marked back to 0.
If a column in the table is having >50% NULL values then which index to use?
If a column in the table has more than 50% NULL values, index selection has to be done very carefully.
An index is based on a B-Tree structure, and if the column has more than 50% NULL values, most of the data will
reside on one side of the tree, resulting in almost zero benefit from the index during query execution.
SQL Server has a special index type called a filtered index which can be used here. You can create an index on the
column for only the non-NULL values; NULL data will not be included in the index.
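A minimal sketch of such a filtered index (the table and column names here are hypothetical):
-- Index only the non-NULL values of MiddleName
CREATE NONCLUSTERED INDEX IX_Employee_MiddleName
ON dbo.Employee (MiddleName)
WHERE MiddleName IS NOT NULL;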
Log shipping may fail due to multiple issues, which can cause data loss. Here are some possible reasons:
Changes in shared folder or share access permissions:
The Copy job is responsible for copying log backups from the primary to the secondary Server. If the shared
folder permissions on the primary Server are changed, the Copy job will not be able to access the share on the
primary Server to copy log backups.
A human error, like someone deleting the T-log backup file or truncating the T-log on the primary Server:
This is the most common issue. If someone has deleted a log backup by mistake, the log chain will break and the
restore might fail. Also, if someone truncates the T-log file on the primary, the log chain will break and new log
backups will not be generated.
Low free drive space on secondary:
The standard approach is to have a similar drive structure and space on the secondary as on the primary. If the
secondary has less drive space, it may get full and impact the copy or restore process.
Low I/O, Memory, Network resources:
The Copy and Restore jobs need resources, and if you have a long list of databases then you need more resources.
A secondary Server with low network, IO, or memory can cause Server slowness, a crash, or a delay in the
restore.
TUF file is missing:
TUF is the transaction undo file, which contains active transaction details. If the TUF file is deleted and you do not
have a backup of it, then you have to reconfigure log shipping.
MSDB database is full:
MSDB keeps track of log shipping backup and restore history. If MSDB gets full, the Copy and Restore jobs will start
failing.
Explain Locks in SQL Server and how many locks are available?
Locks allow the seamless functioning of SQL Server even with concurrent user sessions. As we all know,
multiple clients need to access databases simultaneously, so locks come to the rescue to keep information from being
corrupted or invalidated when numerous clients attempt data manipulation (DML) operations, for
example read, write, and update, on the database. A lock can be defined as a mechanism to guarantee data
integrity and consistency while enabling simultaneous access to information. It is used to implement concurrency
control when various clients access the database to manipulate its data at the same time.
Locks can be applied to various database components. Please find below areas where a lock can be applied :
RID (Row ID): This helps us in locking a single row inside a table.
Table: It locks the whole table, including data and indexes.
Key: It locks a key in an index, which means the primary key, candidate key, secondary key, and so forth.
Page: A page represents an 8-kilobyte (KB) data page or index page. A lock can be placed at the page level as well,
which means that if a specific page is locked, another client cannot update the data on it.
Extent: An extent is a contiguous group of eight data pages, which can also include index pages.
Database: The entire database can be locked for certain kinds of clients who have read authorization on the
database.
Exclusive (X): This lock type, when imposed, ensures that a page or row will be available only to the transaction
that imposed the lock, as long as the transaction is running. The X lock is imposed by a transaction when it needs
to manipulate the page or row data through DML operations like insert, update, and delete. This lock can be imposed
on a page or row only if there is no other shared or exclusive lock currently imposed on the resource. This ensures that
only one exclusive lock can be applied, and no other lock can be applied afterwards until the previous lock is
removed.
Shared (S): This lock type, when imposed, reserves a page or row to be available only for reading, which implies that
no other transaction is allowed to modify the record while the lock remains active. Nonetheless, a
shared lock can be imposed by multiple transactions concurrently over the same page or row, and in that manner
multiple transactions can share the capacity for data reading. A shared lock will permit write operations, but no DDL
changes will be permitted.
Update (U): An update lock is like an exclusive lock but is intended to be more flexible. An update lock can be
imposed on a record that already has a shared lock. In such a case, the update lock imposes another shared lock on
the intended resource. When the transaction that holds the update lock is ready to change the data, the update lock (U)
is converted to an exclusive lock (X). While an update lock can be imposed on a record that has a shared lock,
a shared lock cannot be imposed on a record that already has an update lock.
Intent (I): The idea behind such a lock is to guarantee that data modification is executed properly by stopping
another transaction from acquiring a lock on the next object up in the hierarchy. Generally, when a transaction wants
to acquire a lock on a row, it will acquire an intent lock on the table, which is higher in the hierarchy than the intended
object. By acquiring the intent lock, the transaction will not allow other transactions to acquire an exclusive lock on
that table.
Schema (Sch): This lock is applied on a table or index when we want to make changes to that resource's schema. We
can have only one schema lock at a given point in time. This lock gets applied when we perform operations that
depend on the schema of an object.
Bulk update (BU): This lock is required when bulk operations need to be performed. When a bulk update lock
is acquired, other transactions will not be able to access the table during the bulk load execution. However, a
bulk update lock will not prevent another bulk update from being executed in parallel.
The locking hierarchy of database objects goes from the Database at the top down to the row (RID) at the bottom
(Database → Table → Page → Row), and a lock is always obtained from top to bottom.
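To observe which locks are currently held or requested on an instance, a hedged example using the sys.dm_tran_locks
DMV:
-- Current lock requests: resource type, lock mode (S, X, U, IS, IX, Sch-M, ...), status and owning session
SELECT resource_type, resource_database_id, request_mode,
       request_status, request_session_id
FROM sys.dm_tran_locks;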
The optimistic approach ensures that, in case a dirty read is identified based on versioning, we start over afresh.
This strategy is popular where the system deals with a high volume of data, and in n-tier applications
where we are not always connected to the database with a single connection; instead there is a pool of connections
from which we connect to the database with whichever connection is free and available at that time. We cannot hold
locks in such cases. This strategy is applied in most banking operations.
Pessimistic locking is just the opposite of optimistic locking, as it takes an exclusive lock on resources until we finish
our operations. It keeps data integrity high, but performance will always be slower in this case. We need to connect to
the database with an active connection (which is the case in two-tier applications).
What is virtual log file?
Each transaction log file is internally divided into several virtual log files. Virtual log files have no fixed size, and there is
no fixed number of virtual log files for a physical log file.
How do you get total no of virtual log files in a transaction log file?
The DMF sys.dm_db_log_info, introduced in SQL Server 2017, returns the Virtual Log File information of the
transaction log files. If we specify NULL or DEFAULT, it returns the VLF information of the current database.
The built-in function DB_ID can also be specified to get the details of a particular database. The sys.dm_db_log_info
DMF replaces the DBCC LOGINFO statement from earlier versions.
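For example, to count the VLFs of the current database on SQL Server 2017 or later (on older versions DBCC LOGINFO
can be used instead):
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());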
1. Automatic Checkpoints
An automatic checkpoint occurs each time the number of log records reaches the number the Database Engine
estimates it can process during the time specified in the recovery interval Server configuration option. You can set
the value of recovery interval using below system stored procedure.
EXEC sp_configure 'recovery interval (min)', 1;  -- the value is specified in minutes; this is an advanced option
RECONFIGURE;
Recovery interval is maximum time that a given Server instance should use to recover a database during a system
restart. Database Engine estimates the maximum number of log records it can process within the recovery interval.
When a database using automatic checkpoints reaches this maximum number of log records, the Database Engine
issues a checkpoint on the database.
The frequency depends on the value of recovery interval option of Server configuration. The time interval between
automatic checkpoints can be highly variable. A database with a substantial transaction workload will have more
frequent checkpoints than a database used primarily for read-only operations.
The Database Engine generates automatic checkpoints for every database for which you have not defined a value for
target recovery time. If you define a user value for target recovery time for a database, the database engine will not
use automatic checkpoints and will generate indirect checkpoints instead, which I will discuss in the next section.
Under the simple recovery model, an automatic checkpoint is also queued if the log becomes 70 percent full.
After a system crash, the length of time required to recover a given database depends largely on the amount of
random I/O needed to redo pages that were dirty at the time of the crash. This means that the recovery
interval setting is unreliable. It cannot determine an accurate recovery duration. Furthermore, when an automatic
checkpoint is in progress, the general I/O activity for data increases significantly and quite unpredictably.
For an online transaction processing (OLTP) system using short transactions, recovery interval is the primary factor
determining recovery time. However, the recovery interval option does not affect the time required to undo a long-
running transaction. Recovery of a database with a long-running transaction can take much longer than the time
specified in the recovery interval option.
Typically, the default values provide optimal recovery performance. However, changing the recovery interval might
improve performance in the following circumstances:
If recovery routinely takes significantly longer than 1 minute when long-running transactions are not being rolled
back.
If you notice that frequent checkpoints are impairing performance on a database.
If you decide to increase the recovery interval setting, we recommend increasing it gradually by small increments
and evaluating the effect of each incremental increase on recovery performance. This approach is important because
as the recovery interval setting increases, database recovery takes correspondingly longer to complete. For example,
if you change the recovery interval to 10 minutes, recovery takes approximately 10 times longer to complete than
when the recovery interval is set to 1 minute.
2. Indirect Checkpoints
This checkpoint was introduced in SQL Server 2012. Indirect checkpoints provide a configurable database-level
alternative to automatic checkpoints. In the event of a system crash, indirect checkpoints provide potentially faster,
more predictable recovery time than automatic checkpoints. We can set the TARGET_RECOVERY_TIME in Indirect
checkpoint for any database by executing below ALTER DATABASE statement.
--Change DBNAME with your database name.
ALTER DATABASE DBNAME SET TARGET_RECOVERY_TIME =target_recovery_time {SECONDS | MINUTES}
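For instance (the database name below is hypothetical), a 60-second target could be set like this:
ALTER DATABASE SalesDB SET TARGET_RECOVERY_TIME = 60 SECONDS;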
The difference between automatic and indirect checkpoints is that the recovery interval configuration option uses the
number of transactions to determine the recovery time, whereas indirect checkpoints make use of the
number of dirty pages.
When indirect checkpoints are enabled on a database receiving a large number of DML operations, the background
writer can start aggressively flushing dirty buffers to disk to ensure that the time required to perform recovery is
within the target recovery time set of the database. This can cause additional I/O activity on certain systems which
can contribute to a performance bottleneck if the disk subsystem is operating above or nearing the I/O threshold.
Indirect checkpoints enable you to reliably control database recovery time by factoring in the cost of random I/O
during REDO.
Indirect checkpoints reduce checkpoint-related I/O spiking by continually writing dirty pages to disk in the
background. However, an online transactional workload on a database configured for indirect checkpoints can
experience performance degradation. This is because the background writer used by indirect checkpoint sometimes
increases the total write load for a Server instance.
Indirect checkpoint is the default behavior for new databases created in SQL Server 2016. Databases which were
upgraded in place or restored from a previous version of SQL Server will use the previous automatic checkpoint
behavior unless explicitly altered to use indirect checkpoint.
3. Manual Checkpoints
A manual checkpoint is issued when you execute the T-SQL CHECKPOINT command in a database. By default, manual
checkpoints run to completion. Throttling works the same way as for automatic checkpoints.
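A quick illustration (the database name is hypothetical; the optional argument is a requested duration in seconds):
USE SalesDB;
CHECKPOINT;     -- run to completion
CHECKPOINT 10;  -- ask the checkpoint to complete within about 10 seconds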
4. Internal Checkpoints
Internal checkpoints are generated by various Server components to guarantee that disk images match the current
state of the log. Internal checkpoints are generated in response to the following events:
Database files have been added or removed by using ALTER DATABASE.
A database backup is taken.
A database snapshot is created, whether explicitly or internally for DBCC CHECKDB.
An activity requiring a database shutdown is performed. For example, AUTO_CLOSE is ON and the last user
connection to the database is closed, or a database option change is made that requires a restart of the database.
An instance of SQL Server is stopped by stopping the SQL Server (MSSQLSERVER) service. Either action causes a
checkpoint in each database in the instance of SQL Server.
What is your troubleshooting strategy? Suppose a user reported an issue, how will you start to fix the issue?
Good troubleshooters are those who understand in-depth information about the product or feature where the issue
arises. To fix any issue, we should follow the below points.
First, we should understand the issue/error codes and ask questions to the user about when and how they got the error.
If we are not getting anything from the previous step, we need to read the error log file, job history log, and event log
files, based on the nature of the issue.
Look into SQL Server transactions and analyze the wait types and other parameters like CPU, IO, and memory if it is
related to a performance issue.
If we are still not getting any solution, try to search for the error online or take help from your seniors, because
you are not the only one who has received this error.
Tell me in which sequence SQL Server databases come ONLINE after starting or restarting a SQL Server Instance.
There is no defined sequence for user databases to come online. However, below is the sequence in which system
databases come online. User databases can come online any time after the master database becomes accessible.
1. Master
2. MSDB
3. ResourceDB (Although this database is not visible in SSMS)
4. Model
5. Tempdb
What will you do if your log file drive is full during data load?
Run Transaction log backup frequently.
Add another log file on another disk where there is enough space for the log to grow.
You can enable Auto shrink during data load if you are running with SIMPLE recovery model.
Minimal logging is more efficient than full logging, and it reduces the possibility of a large-scale bulk operation filling
the available transaction log space during a bulk transaction. However, if the database is damaged or lost when
minimal logging is in effect, you cannot recover the database to the point of failure.
Can you name few operations that logs Minimally during bulk-recovery model?
The following operations, which are fully logged under the full recovery model, are minimally logged under bulk-
logged recovery model:
Bulk import operations (bcp, BULK INSERT, and INSERT… SELECT).
SELECT INTO operations
CREATE INDEX operations
ALTER INDEX REBUILD or DBCC DBREINDEX operations.
DROP INDEX new heap rebuild
If I change my database recovery model from FULL to SIMPLE, will transactions still be logged to the log file?
Yes, transactions will be logged in the SIMPLE recovery model as well. The difference is that all logged transactions
will be cleared during the checkpoint operation in this recovery model.
If yes in above question then how is SIMPLE RECOVERY Model different from FULL Recovery Model?
All logged transactions will be cleared during the checkpoint operation, and transaction log backups are not allowed,
so point-in-time recovery is not possible in the SIMPLE recovery model. Transaction log backups are allowed in the
full recovery model and point-in-time recovery is also supported; there, logs get cleared only after taking a log backup
or switching the recovery model to SIMPLE.
How does a differential backup work? Or, how does a differential backup capture only the data updated since the full
backup in its dump file?
Differential Changed Map is a page type that stores information about extents that have changed since the last full
backup. Database engine reads just the DCM pages to determine which extents have been modified and captures
those extents in differential backup file.
Why is a database log file growing rapidly even though the database is running in the SIMPLE recovery model?
It means some transactions are active and running on your database. As we know, logs are captured in the simple
recovery model as well, so those active transactions are getting logged there. The inactive portion of the log file is
cleared during the checkpoint operation.
Suppose we are running a daily full backup at 8 PM and a transaction log backup every half an hour. Now your
database crashed at 3:41 PM. How will you recover your database to the point at which it crashed?
We will perform the below steps in sequence to recover this database to the point at which it crashed (a T-SQL sketch
follows these steps).
First, we will run a tail-log backup to capture all transactions, up to the point the database crashed, that were not
captured in previous log backups.
Restore last night's full backup (taken at 8 PM) with NORECOVERY.
Apply all transaction log backups since last night's full backup with NORECOVERY.
Apply the tail-log backup on the database with RECOVERY and the STOPAT parameter.
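A hedged sketch of that sequence in T-SQL (the database name, file paths, and times below are hypothetical):
-- 1. Tail-log backup on the damaged database
BACKUP LOG SalesDB TO DISK = 'D:\Backup\SalesDB_tail.trn' WITH NO_TRUNCATE, NORECOVERY;
-- 2. Restore last night's full backup
RESTORE DATABASE SalesDB FROM DISK = 'D:\Backup\SalesDB_full.bak' WITH NORECOVERY;
-- 3. Restore each half-hourly log backup in order
RESTORE LOG SalesDB FROM DISK = 'D:\Backup\SalesDB_0830.trn' WITH NORECOVERY;
-- ... repeat for the remaining log backups ...
-- 4. Restore the tail-log backup and recover to just before the crash
RESTORE LOG SalesDB FROM DISK = 'D:\Backup\SalesDB_tail.trn'
WITH RECOVERY, STOPAT = '2023-05-20 15:41:00';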
Take the same scenario as above; now you find that one log backup, let's say the 2 PM one, is corrupted and not
restorable. What will be the impact on your database recovery?
We cannot recover this database to the point at which it crashed; we would have data only up to 1:30 PM, considering
log backups run every half an hour.
Suppose we are running a weekly Sunday full backup at 8 PM, a daily differential backup at 10 PM, and a transaction
log backup every half an hour. Now your database crashed on Saturday at 3:11 PM. What would be your fastest way
to recover this database point-in-time?
We will perform the below steps to recover this database point-in-time:
Try to run a tail-log backup at the point the database crashed.
Restore the latest weekly Sunday full backup (taken at 8 PM) with NORECOVERY.
Restore the Friday night differential backup (taken at 10 PM) with NORECOVERY.
Apply all transaction log backups since the Friday differential backup with NORECOVERY.
Apply the tail-log backup on the database with RECOVERY and the STOPAT parameter.
In addition to the above question, suppose you came to know that the Friday night differential backup was corrupted;
what would be your strategy to recover the database point-in-time?
We will perform the below steps to recover this database point-in-time:
Try to run a tail-log backup at the point the database crashed.
Restore the latest weekly Sunday full backup (taken at 8 PM) with NORECOVERY.
Restore the Thursday night differential backup (taken at 10 PM) with NORECOVERY.
Apply all transaction log backups since the Thursday night differential backup with NORECOVERY.
Apply the tail-log backup on the database with RECOVERY and the STOPAT parameter.
Suppose you came to know that the differential backups that ran on Monday and Wednesday are corrupted and
you have only the Tuesday and Thursday differential backups along with the full backup and all log backups.
Explain your sequence to restore the database.
We will follow the same sequence as in the previous question: apply the weekly full backup, then the
Thursday differential backup, followed by all subsequent transaction log backups.
What is a COPY_ONLY full backup and how is it different from regular full backups?
Difference between regular full and copy-only full backup is that copy-only full backup does not break the differential
chain. Neither of them breaks the log chain as neither of them truncates the log file. A copy-only backup cannot
serve as a differential base or differential backup and does not affect the differential base.
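A minimal sketch (the database name and backup path are hypothetical):
BACKUP DATABASE SalesDB
TO DISK = 'D:\Backup\SalesDB_copyonly.bak'
WITH COPY_ONLY, STATS = 10;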
What is Fill Factor and what should be the perfect value for it?
A fill factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the
remainder on each page as free space for future growth. Fill factor values 0 and 100 are the same in all respects. The
fill-factor option is provided for fine-tuning index data storage and performance. However, there is no defined value
that we can call the perfect value for fill factor. You can set it to somewhere around 80% and monitor fragmentation
over time. Then, you can tweak its value up or down depending on how fragmented the indexes get.
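For example (the index and table names below are hypothetical), a fill factor can be set when creating or rebuilding an
index:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
WITH (FILLFACTOR = 80);

ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80);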
Explain Page Split and whether it is good for SQL Server or bad?
Whenever you update existing rows with data that is bigger than what fits on their current data page, a page split
operation occurs to make space for this new update. Page splits reduce performance, so we can say a page split is not
at all a good thing for our database.
Why is Secondary Replica Showing in Resolving state after AOAG Automatic failover?
Sometimes the AOAG configuration gets stuck in the resolving state and its secondary replica does not come online.
When you check the AlwaysOn Availability Group configuration, you find it showing in the resolving state after an
automatic failover of the AOAG configuration. Let us look at the reason behind the secondary replica getting stuck in
the resolving state and the solution to overcome such issues in the future.
SQL Server AlwaysOn Availability Group is an advanced version of database mirroring and was introduced in SQL
Server 2012. AOAG is an HA & DR solution for SQL Server databases. We can group a bunch of databases as one
Availability Group and failover/failback them as one entity, whereas this was not possible in database mirroring:
database mirroring can be configured only for a single database at a time. I also have an AOAG configuration in my
environment which is set for automatic failover between both replicas. Last week, we faced an issue during an
automatic failover. The secondary replica was not transitioning to primary during the automatic failover and was
stuck in the “Resolving” state. It was full downtime for the application, because the primary replica was lost and the
availability databases were not coming online on the secondary replica.
Automatic Failover Not Working and Secondary Replica Stuck in Resolving State
The secondary replica is not transitioning to primary after the failover because it is hung in the resolving
state, so here we will check the main root cause and its fix so that the secondary replica comes online after every
failover. Let's check the current configuration of this AOAG. Run the below command to get the AOAG details. We can
see the configuration is running in automatic failover mode.
SELECT replica_Server_name, availability_mode, availability_mode_desc, failover_mode, failover_mode_desc
FROM sys.availability_replicas
Now connect to secondary replica in SQL Server Management Studio. Expand the AlwaysOn High Availability folder
followed by Availability Group folder. Here, you can see the current state of this availability group which is showing
in Resolving state.
I checked the SQL Server error log file, but I did not get enough information about this issue to proceed. I searched
for this issue and got a few suggestions on the web that I should check the cluster log. I checked Failover Cluster
Manager for critical events and got the below event.
Clustered role 'AG_***' has exceeded its failover threshold. It has exhausted the configured number of failover
attempts within the failover period of time allotted to it and will be left in a failed state.
No additional attempts will be made to bring the role online or fail it over to another node in the cluster.
Please check the events associated with the failure. After the issues causing the failure are resolved the role can
be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
We can see the failover threshold is set to only 1 in a 6-hour window. It means that if any failover happens more than
once in the 6-hour period, the role will remain in a failed state and will not try to come online. Now increase this value
as per your need; I have changed it to 10. Click Apply and then OK.
Now, the secondary replica of the availability databases will not come online automatically; we need to bring it online.
We have multiple options to bring it online from the resolving state. You can directly run the failover by right-clicking
the availability group in SSMS and then proceeding with the failover.
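The same can also be done from T-SQL while connected to the secondary replica; since the primary replica is lost in
this scenario, a forced failover (which allows possible data loss) would be needed. The availability group name below
is hypothetical:
-- Run on the secondary replica that should become the new primary
ALTER AVAILABILITY GROUP [AG_Production] FORCE_FAILOVER_ALLOW_DATA_LOSS;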
Another option is to bring it online from Failover Cluster Manager. You can either right-click on the AOAG role and
click on the “Bring Online…” option, or you can fail it over to the best possible node. As the primary replica is down
and not accessible, it will be failed over to the secondary replica only, because this node is the only one available at
this time. Let's right-click on the AOAG role in Failover Cluster Manager, choose Move, and then click on “Best
Possible Node”.
It will process and come online within a few seconds, and this time you can see the role is online from node 2, as the
current owner is now showing as node 2.
We can also validate this change by making a database connection to the secondary replica in SSMS. Connect to the
secondary replica and expand the AlwaysOn Availability Group folders. You can now see the availability group showing
as primary. That means your database is online from the secondary replica and accepting user connections.
You can also test this exercise to make sure automatic failover is working fine, to prevent this kind of outage in the
future. Once your earlier primary replica comes online, shut down the SQL Server services on the current primary
(the former secondary) replica. Once it goes down, your current secondary replica becomes primary automatically.
Before shutting down the SQL Server services, make sure the databases are fully synchronized.
Here, we have fixed the AlwaysOn issue where the secondary replica was not coming online after failover and was
hung in the resolving state.
Why Should You Always Turn Off Database Auto Shrink Property?
Auto Shrink is a database property that allows SQL Server to automatically shrink database files if its value is set to
ON/True. Shrinking a database is not a good practice because it is a very expensive operation in terms of I/O, CPU
usage, locking, and transaction log generation. The database auto shrink operation also causes your indexes to
become fragmented because it runs frequently.
Database auto shrink or a manual shrink operation runs at the database file level. It is not recommended to shrink
your data files except in a few exceptional cases, like data deletion. You can run a shrink operation if you have
deleted a large portion of the data and want to reclaim that space. As this shrink operation can cause index
fragmentation, make sure to rebuild your fragmented indexes after performing the shrink operation. Shrinking the log
file may be necessary in some cases, if your log file is full and you need to clear some space. However, shrinking the
log file should not be part of any regular maintenance activity and should be done manually only whenever it is
required.
It is always recommended never to run the shrink operation as part of maintenance activity; you should even avoid
running the shrink operation manually. If you do need to run a shrink operation, make sure to rebuild your
indexes afterwards. So you should always turn off Auto Shrink for all databases to avoid future performance issues on
your database. By default, SQL Server keeps auto shrink turned off on SQL Server instances.
There will be a bad impact on performance if you turn on the database auto shrink and autogrowth settings together
for any database. Most databases have some autogrowth value enabled, or we set the database size to an optimum
value, keeping some room in the data files to grow and avoid frequent autogrowth events. If we enable the database
auto shrink property for such databases, it will shrink the data files and reclaim the free space that we kept
intentionally to avoid autogrow events. In that case, both operations will be performed frequently, autogrowth and
then autoshrink, which ultimately leads to file-system-level fragmentation and causes severe performance issues as
well.
So in short, auto shrink should not be turned on for any database. This is bad for several reasons that I have
summarized in the below points.
Database Auto Shrink or Manual Shrink causes index fragmentation that will reduce the database performance.
The shrink operation takes a lot of IO and CPU resources. If the Server is already pushing the limits of the IO subsystem,
running shrink may push it over, causing long disk queue lengths and possibly IO timeouts.
Repeatedly shrinking and growing the data files will cause file-system level fragmentation, which can slow down
performance. It wastes a huge amount of resources, basically running the shrink algorithm for no reason.
If you combine the autogrow and autoshrink options, you might create unnecessary overhead. Make sure that the
thresholds that trigger the grow and shrink operations will not cause frequent up and down size changes. So
Autogrow and Autoshrink together can seriously reduce your system performance.
Database Auto Shrink and autogrow must be carefully evaluated by a trained Database Administrator (DBA); they
must not be left unmanaged.
Auto shrink does not work in such a way that, as soon as the threshold is hit, the shrink operation starts to reclaim
space. SQL Server uses a round-robin method to shrink databases if multiple databases are set to use auto shrink: it
shrinks a database if needed, then waits several minutes before checking the next database that is configured for auto
shrink, so your database will need to wait for its turn for the auto shrink operation to execute.
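A quick sketch to check which databases currently have auto shrink enabled and to turn it off (the database name
below is hypothetical):
-- Find databases with auto shrink enabled
SELECT name FROM sys.databases WHERE is_auto_shrink_on = 1;
-- Turn auto shrink off for a database
ALTER DATABASE SalesDB SET AUTO_SHRINK OFF;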
Refresh Data Every (days). Enter the number of days after which the clone should get refreshed. For example, if you
have entered seven in this field, then Era refreshes the clone after every seven days.
Refresh Time. Enter the time at which clone refresh operation should start. Click Create.
A message appears indicating that the refresh operation has been created. You can click the message and then go
to Scheduled on the Operations page to view all the scheduled tasks.
To change the frequency and time slot of the schedule, you can select Refresh > Update from the Schedule drop-
down list, and then provide the new frequency and time slot.
Security of SQL Server environments is considered to be among database administrators’ prime responsibilities.
Fortunately, SQL Server is designed to be a secure database platform. It holds several features that can encrypt data,
limit access and authorization, and protect data from theft, destruction, and other types of malicious behavior.
Yet, innumerable organizations continue to experience SQL database vulnerabilities, SQL injection attacks, brute-
forcing SQL credentials, and other attacks launched to manipulate data retrieval.
The threat to SQL Servers is ubiquitous nowadays. But that doesn’t mean we cannot do anything about it. To protect
the organization from such attacks, DBAs and security professionals should understand the potential threats to the
database management platform and take proactive steps to mitigate the security risks.
SQL Server is one of the most popular data platforms in the world. It is used to run an organization’s critical
operations and processes. As a result, it offers a variety of security tools for protection against malicious attacks and
to secure SQL Server instances.
However, using the default security settings can leave security gaps, leaving the network vulnerable to attacks.
Here’s a SQL Server security checklist to effectively counter the threats to your database platform.
1. Run Multiple SQL Server Security Audits
Regular Server security, login, and permission audits are a prerequisite for preventing potential attacks and to aid in
any forensic analysis of a possible data breach. However, an enterprise-level SQL Server security audit isn’t merely a
security investment, it has become a legal requirement following the new legislation like HIPAA and GDPR.
To begin with, define what you want to audit. You may want to monitor the following in your Server audits.
User logins
C2 Auditing
Common Compliance Criteria
Login Auditing
Server configuration
Schema changes
SQL Server Auditing
SQL Trace
Extended Events
Change Data Capture
DML, DDL, and Logon Triggers
A routine audit can help in improving the health of your database and network. For instance, if a query won’t run, an
audit can point to the underlying reason. Does it point to a security threat? Is it due to an error with the SQL
order of operations?
Similarly, repeated failed logins to the Server, changes and deletions to restricted database objects, and changes to
configurations and permissions indicate that someone is trying to access your Server. Regular security audits
(including login auditing) can help you spot these signs of potential attacks on the Server and arrest them before
they cause significant damage.
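As a hedged sketch (the audit names and file path below are hypothetical), a basic server audit that captures failed
logins could be set up like this:
-- Create the audit target
CREATE SERVER AUDIT LoginAudit
TO FILE (FILEPATH = 'D:\SQLAudit\');
-- Capture failed login attempts
CREATE SERVER AUDIT SPECIFICATION FailedLoginSpec
FOR SERVER AUDIT LoginAudit
ADD (FAILED_LOGIN_GROUP)
WITH (STATE = ON);
-- Enable the audit
ALTER SERVER AUDIT LoginAudit WITH (STATE = ON);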
3. Keep It Lean
Having unnecessary software and extra applications allows hackers to exploit your SQL database Server.
Further, multiple applications are tough to manage and soon get outdated.
We all know how outdated or unpatched applications introduce security holes in the system, inviting
attackers to run unauthorized code on a SQL Server via an authorized path (SQL injection).
Limit the installation of the SQL database components and features to the ones that are necessary for
certain tasks. This will reduce the resource utilization by the database and simplify the administration, thus
minimizing the security risk.
SQL Server is undoubtedly a popular way to manage databases. Due to its efficiency, business organizations depend
heavily on it for data management. This dependency often causes performance bottlenecks and ultimately
hampers the productivity of the organization itself. So, it is important to find the bottlenecks and avoid them at any
cost. Read on to know everything you need to know about SQL Server bottlenecks, including how to find SQL Server
performance bottlenecks. First, we will see the definition of SQL Server bottlenecks.
What is Bottleneck in SQL Server
The term bottleneck means the neck of a bottle that reduces the flow from the bottle. Similarly, SQL Server
bottleneck means a reduction in the performance of SQL Server. This situation usually occurs when a shared
resource, like a SQL database, is concurrently accessed by too many people. Though bottlenecks are inevitable in
every system, they should be addressed to save users from loss of time and effort.
Symptoms of SQL Server Bottleneck
If you have this question in mind, how to find out whether SQL Server is having bottlenecks or not, here is the answer.
First, you have to set a certain standard for the performance and then consider the symptoms against that baseline.
This baseline will help you to determine the bottlenecks and low-activity periods, as well as to compare the effects of
the alterations made. That is why setting up a good baseline is important. Here are the symptoms you need to consider:
Disk Bottleneck: If the SQL Server is having slower response time or the disk counters are operating close to
maximum values for a longer period of time, it is having bottlenecks. In such cases, users will also get relevant error
messages in SQL application log and hear noises coming from the disks.
Memory Bottleneck: If the application log contains certain messages like out of memory and memory resource
timeout, the memory bottleneck is to blame. Some of the other symptoms of Memory bottleneck are increased
query execution time, decreased active queries, low buffer cache hit ratio, higher I/O usage, slow system, and low
page life expectancy.
CPU Bottlenecks: This type of bottleneck is the easiest to find. In this case, the CPU will be highly utilized by SQL
Server all the time, but with low overall throughput.
SQL Profiler
SQL Profiler is the tool with the ability to fetch and log the complete T-SQL activity of SQL Server. The default
template can be used to capture the execution of SQL statement. In SQL 2005, SQL Profiler also allows users to add
the blocked process report, usually found in the Errors and Warnings Events.
Failure Audit:
An audited security access attempt that fails. For example, if a user tries to access a network drive and fails,
the attempt will be logged as a Failure Audit event.
The Event Log service starts automatically when you start Windows. Application and System logs can be viewed by
all users, but Security logs are accessible only to administrators. Using the event logs in Event Viewer, you can gather
information about hardware, software, and system problems and monitor Windows security events.
To access the Event Viewer in Windows 8.1, Windows 10, and Server 2012 R2:
Right click on the Start button and select Control Panel > System & Security and double-click Administrative tools
Double-click Event Viewer
Select the type of logs that you wish to review (ex: Application, System)
NOTE: To access the Application Logs once in Event Viewer, go to Windows Logs > Application, for shutdown errors
refer to Application and System logs.
The Application event log is used to log application-specific events, such as Internet Information Server or SQL
Server. You can find the Event Viewer tool under the Start menu in Windows, under the Administrative Tools option.
When a database managed service provider connects their monitoring tools to a business’s database to begin their
proactive monitoring, a throughput baseline is established. The baseline is created so that all other operations can be
measured against it, which is highly important when updating and for any maintenance events. Proactive SQL Server
monitoring consistently checks whether databases are online at regular intervals during core business hours and
non-core business hours. These checks are done through automation that will alert you of an outage. Poor
performance can come from bad database design, non-existing indexes, unmanaged database statistics, and other
various factors. With proactive monitoring, the technical team is actively troubleshooting the database performance
to prevent issues that can derive from poor performance. Additionally, changes happening within the databases are
also being tracked in the case that database function, views, or tables are dropped because of a modification.
The Importance and Goals of SQL Server Database Monitoring and Management
There are numerous benefits of having a proactive method of database monitoring. With active monitoring and
management, there is an ability to solve and prevent urgent problems. If the method of SQL Server monitoring is
reactive, when an issue occurs, there is a need to replicate the issue which can be difficult and costly. Another
benefit of proactive monitoring is that it provides insight into bottlenecks, “noisy neighbors”, and peak utilization
which can result in a few issues occurring. With fewer issues, there is a reduction in the risk of downtime, and it is no
secret that downtime can be costly to a business due to services not being available to the customers, and loss of
productivity from the team since the team is now focusing on bringing the databases back to optimal availability.
To provide a high-level view, here are the greatest benefits that come from proactive monitoring:
Reduces the need for troubleshooting when a database begins to have performance issues, as proactive monitoring
allows DBAs to correct problems early
DBAs can work on bigger priority business functions as there is no need for the team to be putting out fires when
they occur as the systems become predictable
Database managed service provider gets to know your systems and provides consulting on where improvements
can be made such as consolidations, upgrades, etc.
Health checks are performed so that an awareness of the SQL Server is maintained, and the most critical issues can
be addressed first
Alerting for any crucial issues such as consumption changes; these issues become easy to diagnose and correct to
avoid extended downtime or a slow Server
Essentially, with proactive monitoring from a database managed service provider, a business can free the team from
the task of monitoring and managing which then will allow them to focus on supporting business growth and other
important internal tasks, rather than solving database issues that will arise with only reactive monitoring in place.
Locate the event log for backup failure in the Event Viewer
After you access the Event Viewer, use the following steps to review the error logs to establish the cause for the
backup failure:
In the Event Viewer, navigate to Windows Logs -> Application.
On the right-hand side in the Actions menu, navigate to Find.
Type in the name of the database for which the failure occurred, and click Find Next.
Every time you click Find Next, the previous event log for the database displays. Continue clicking Next until you find
the error log, labeled as Error, containing the backup failure. If the error log itself does not include the cause for the
failure, look for logs shortly before or after the error. Refer to the following section, Common reasons for backup
failures, to learn more about the different errors.
If you encounter a log for a successful backup before getting to the failure, you know that a subsequent backup
attempt succeeded. If you want to investigate the root cause for the failure, you can continue until you find the
backup error log and determine the issue.
SQL Server Architecture is a very deep subject. Covering it in a single post is an almost impossible task. However, this
subject is a very popular topic among beginners and advanced users. I have requested my friend Anil Kumar, who is an
expert in the SQL domain, to help me write a simple post about beginning SQL Server architecture. As stated earlier,
this is a very deep subject, and in this first article of the series he has covered basic terminologies. In future articles he
will explore the subject further.
1) Relational Engine: Also called as the query processor, Relational Engine includes the components of SQL Server
that determine what your query exactly needs to do and the best way to do it. It manages the execution of queries
as it requests data from the storage engine and processes the results returned.
Different Tasks of Relational Engine:
Query Processing
Memory Management
Thread and Task Management
Buffer Management
Distributed Query Processing
2) Storage Engine: The Storage Engine is responsible for storage and retrieval of the data on the storage system (disk,
SAN, etc.). To understand more, let’s focus on the concepts.
When we talk about any database in SQL Server, there are 2 types of files that are created at the disk level – Data file
and Log file. Data file physically stores the data in data pages. Log files that are also known as write ahead logs, are
used for storing transactions performed on the database.
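To see the data and log files of the current database, a simple hedged example (size is reported in 8 KB pages, so it is
converted to MB here):
SELECT name, type_desc, physical_name, size * 8 / 1024 AS size_mb
FROM sys.database_files;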
Data File: Data File stores data in the form of Data Page (8KB) and these data pages are logically organized in
extents.
Extents: Extents are logical units in the database. They are a combination of 8 data pages i.e. 64 KB forms an extent.
Extents can be of two types, Mixed and Uniform. Mixed extents hold different types of pages like index, system,
data etc (multiple objects). On the other hand, Uniform extents are dedicated to only one type (object).
Pages: We should know what types of pages can be stored in SQL Server; below are some of them:
Data Page: It holds the data entered by the user but not the data which is of type
text, ntext, nvarchar(max), varchar(max), varbinary(max), image and xml data.
Text/Image: It stores LOB ( Large Object data) like text, ntext, varchar(max), nvarchar(max), varbinary(max),
image and xml data.
GAM & SGAM (Global Allocation Map & Shared Global Allocation Map): They are used for saving information related
to the allocation of extents.
PFS (Page Free Space): Information related to page allocation and unused space available on pages.
IAM (Index Allocation Map): Information pertaining to extents that are used by a table or index per allocation unit.
BCM (Bulk Changed Map): Keeps information about the extents changed in a Bulk Operation.
DCM (Differential Change Map): This is the information of extents that have modified since the last BACKUP
DATABASE statement as per allocation unit.
Log File: Also known as the write-ahead log, it stores modifications to the database (DML and DDL).
Sufficient information is logged to be able to:
Roll back transactions if requested
Recover the database in case of failure
Write-ahead logging is used to create the log entries
Transaction log records are written in chronological order in a circular fashion
The truncation policy for the log is based on the recovery model
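Because log truncation depends on the recovery model, a quick way to see each database's model and why its log currently cannot be truncated is to query sys.databases. A minimal sketch:
-- Recovery model drives how much of the transaction log can be truncated
SELECT name,
recovery_model_desc,
log_reuse_wait_desc -- reason the log currently cannot be truncated
FROM sys.databases;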
SQL OS: This layer lies between the host machine (Windows OS) and SQL Server. All the activities performed by the database engine are taken care of by SQL OS. It is a highly configurable operating system layer with a powerful API (application programming interface), enabling automatic locality and advanced parallelism. SQL OS provides various operating system services, such as memory management (which deals with the buffer pool and log buffer) and deadlock detection (which uses the blocking and locking structures). Other services include exception handling and hosting of external components such as the Common Language Runtime (CLR).
What is blocking and what’s the best way to deal with it?
To deal with reads and writes from multiple connections to a database, either a read lock or an exclusive lock has to be applied to a piece of data. To prevent data changes while someone is reading it, a read lock is applied on that data; and if the data needs to be edited, an exclusive lock is used to prevent others from reading the data while it is being changed.
These two important features of a database result in the intentional blocking of data. Blocking is important for
administrators to understand because it’s a function they may be called upon to use or troubleshoot. Answering this
question also gives you the chance to explain a database issue in simple, clear terms that even people without the
same level of technical expertise can understand.
Example: “Blocking occurs when one connection is holding a lock on a resource while a second connection wants to
read or write to it. An even more complex situation occurs when two connections each hold a lock but want to
access each other’s resource, which is referred to as deadlocking.
Three effective ways of locating and dealing with blocking issues are:
sp_who2: This stored procedure provides a list of current sessions, and any process that is blocked will show, in its BlkBy column, the SPID of the process that is causing the block.
The DBCC set of commands, more specifically DBCC INPUTBUFFER, can be used to trace the “bad query” that is blocking other processes.
SQL Server Management Studio Activity Monitor: This tool can also be used to view processes and find out which ones are in a blocked state.”
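A quick, minimal sketch of the first two approaches (the SPID value 57 is purely illustrative – use the SPID reported in the BlkBy column of sp_who2):
-- List current sessions; the BlkBy column shows the SPID doing the blocking
EXEC sp_who2;

-- Inspect the last statement sent by the blocking session
DBCC INPUTBUFFER (57);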
Also, when we are using the HAVING clause, GROUP BY should come first, and HAVING should come next.
select e_gender, avg(e_age) from employee group by e_gender having avg(e_age)>30
The STUFF function deletes part of a string and inserts another string in its place.
Syntax:
STUFF(String1, Position, Length, String2)
Here, String1 is the one that would be overwritten. Position indicates the starting location for overwriting the string. Length is the number of characters of String1 (starting at Position) that will be deleted, and String2 is the string that is inserted in their place.
Example:
select stuff('SQL Tutorial',1,3,'Python')
This will change 'SQL Tutorial' to 'Python Tutorial'
Output:
Python Tutorial
What are Views? Give an example.
Views are virtual tables used to limit the data that we want to display; they are nothing but the result of a SQL statement that has a name associated with it. Since views are not physically materialized, they take up very little storage space.
Let's consider an example. In the employee table below, say we want to perform multiple operations on the records with gender 'Female'. We can create a view containing only the female employees from the entire employee table.
Now, let’s implement it on SQL Server.
Below is our employee table:
select * from employee
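A minimal sketch of such a view (the view name and the e_name column are illustrative; e_gender and e_age follow the earlier examples):
-- View exposing only the female employees
CREATE VIEW vw_female_employees
AS
SELECT e_name, e_gender, e_age
FROM employee
WHERE e_gender = 'Female';
GO

-- The view can now be queried like a regular table
SELECT * FROM vw_female_employees;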
What does it mean to normalize a database and why would you do it?
Normalization is the process of organizing data in the most optimized way. This includes creating tables and establishing relationships between those tables to eliminate redundancy and inconsistent data.
There are rules for database normalization. Each rule is called a “normal form”. If the first rule is observed, the
database is said to be in “first normal form”. If the first three rules are observed, the database is considered to be in
“third normal form”. Although additional levels of normalization are possible, third normal form is considered the
highest level necessary for most applications. If a database adheres to the first rule and the third rule, but is not in
accordance with the second rule, it is still considered in the first normal form.
Benefits of Normalization:
Minimizes amount of space required to store the data by eliminating redundant data.
Minimizes the risk of data inconsistencies within a database.
Minimizes the introduction of possible update and delete anomalies.
Maximizes the stability of the data structure.
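As a minimal illustration (table and column names are made up for the example), customer details that would otherwise be repeated on every order row are stored once and referenced by key:
-- Normalized design: customer data stored once, referenced from orders
CREATE TABLE Customers
(
CustomerID INT IDENTITY PRIMARY KEY,
CustomerName VARCHAR(100),
CustomerCity VARCHAR(100)
);

CREATE TABLE Orders
(
OrderID INT IDENTITY PRIMARY KEY,
CustomerID INT REFERENCES Customers (CustomerID),
OrderDate DATETIME
);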
What is denormalization?
Denormalization is the opposite of the normalization process: it deliberately introduces redundancy into a table design. In some situations, redundancy helps query performance because the number of joins can be reduced, and some calculated values can be kept in a column, reducing the overhead of performing the calculation with every query. An example of denormalization is a database that stores the quarterly sales figures for each item as a separate piece of data. This approach is often used in data warehouses or in online analytical processing (OLAP) applications.
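A minimal sketch of such a denormalized reporting table (names are illustrative), matching the quarterly-sales example above:
-- One row per item and year with pre-aggregated quarterly totals,
-- so reports avoid joining and summing the detail rows every time
CREATE TABLE ItemQuarterlySales
(
ItemID INT,
SalesYear SMALLINT,
Q1_Sales MONEY,
Q2_Sales MONEY,
Q3_Sales MONEY,
Q4_Sales MONEY
);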
You are testing the performance of a query. The first time you run the query, the performance is slow. The second
time you run the query, the performance is fast. Why is this?
The second query is faster because SQL Server caches the data pages in memory and keeps the query plans in memory as well.
What can you do to remove data from the cache and query plans from memory when testing the performance of a query repeatedly?
If you want to run your query and test its performance under the same circumstances each time, use the command DBCC DROPCLEANBUFFERS after each run to remove the data pages from memory and DBCC FREEPROCCACHE to remove the query plans from memory.
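A minimal sketch of that test cycle (run it on a test server only, since it flushes the caches for the whole instance):
-- Flush dirty pages to disk first so DROPCLEANBUFFERS can remove them all
CHECKPOINT;
-- Remove all data pages from the buffer cache
DBCC DROPCLEANBUFFERS;
-- Remove all cached query plans
DBCC FREEPROCCACHE;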
How can you fix a poorly performing query?
Some general issues that would cause a query to perform poorly that you could talk about would be: no indexes,
table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, having
procedures and triggers without the SET NOCOUNT ON directive, unnecessarily complicated joins, excessive
normalization of the database, or inappropriate or unnecessary usage of cursors and temporary tables.
Some of the tools and techniques that help you troubleshoot performance problems are: SET SHOWPLAN_ALL ON,
SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler, Windows NT /2000 Performance monitor, and
the graphical execution plan in Query Analyzer.
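As one example of these techniques, SET STATISTICS IO and SET STATISTICS TIME report the I/O and timing cost of a query; a minimal sketch (assuming the AdventureWorks sample database used elsewhere in this document):
SET STATISTICS IO ON; -- reports logical/physical reads per table
SET STATISTICS TIME ON; -- reports parse/compile and execution times

SELECT *
FROM Sales.SalesOrderDetail
WHERE ProductID = 870;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;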
Indexes are updated automatically. Is the full-text index also updated automatically?
No. Full-text indexes are not updated automatically.
You have a table with close to 100 million records. Recently, a huge amount of this data was updated. Now,
various queries against this table have slowed down considerably. What is the quickest option to remedy the
situation?
Run the UPDATE STATISTICS command.
UPDATE STATISTICS will update the statistics for that table. The sp_updatestats stored procedure would update the statistics for all the tables in the database, which is not required in this situation. There is no need to recreate the index; updating the statistics should be the first step, to see whether the query reaches the optimized level of speed.
Once the statistics have been updated, the query analyzer will be able to make better decisions about which index to
use.
If Auto Update Statistics is on, then you don't have to do any of this.
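A minimal sketch of the remedy (the table name is illustrative):
-- Refresh statistics for just the affected table
UPDATE STATISTICS dbo.LargeTable WITH FULLSCAN;

-- Or, to refresh every table in the database (heavier, not needed here):
-- EXEC sp_updatestats;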
You have developed an application which uses many stored procedures and triggers to update various tables.
Users occasionally get locking problems. Which tool is best suited to help you diagnose the problem?
SQL Server Profiler. Use SQL Server Profiler with a trace to observe Lock conflicts.
Why is connection pooling important?
For performance, for application stability, for avoiding exhaustion of database connection pools, and more. For example, in order to establish a single database connection from a C# program to a SQL Server instance, a series of steps takes place in the background. These steps can include: establishing the actual network connection over a physical channel (i.e. a TCP/IP socket/port), the initial handshake between the source and the destination, the parsing of the connection string information, database Server authentication, authorization checks, and much more.
Now, imagine an application opening and closing database connections all the time (or, even worse, not closing them), without reusing at least some of the already pooled database connections. In some cases this could lead to exhaustion of the database connection pool and consequently to an application crash (if proper exception handling is not in place).
The connection pooler removes connections from the pool after they have been idle for a few minutes, or when they are no longer connected to SQL Server. Connection pooling is very useful. Even though it works automatically in the background in .NET, it is also indirectly dependent on the quality of the data access code we write. Always try to reuse connection strings as much as possible, and close each database connection right after the database tasks are completed. By applying this practice, you help the connection pooler handle your database connection needs faster and more efficiently by reusing connections from the pool.
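From the database side, a quick way to spot an application that is leaking connections is to count current sessions per application and login; a steadily growing count for one program is a warning sign. A minimal sketch:
-- Current user connections grouped by application and login
SELECT s.program_name,
s.login_name,
COUNT(*) AS connection_count
FROM sys.dm_exec_sessions AS s
WHERE s.is_user_process = 1
GROUP BY s.program_name, s.login_name
ORDER BY connection_count DESC;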
Why can there be only one Clustered Index and not more than one?
There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When
a table has a clustered index, the table is called a clustered table.
A clustered index defines the way in which data is ordered physically on the disk, and there can be only one physical order for the data. Hence there can be only one clustered index per table.
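A minimal sketch using the employee table from the earlier examples (the e_id column and index names are illustrative, and it assumes no clustered index exists yet):
-- Only one clustered index is allowed per table
CREATE CLUSTERED INDEX IX_Employee_ID
ON dbo.employee (e_id);

-- Additional ordering needs are served by nonclustered indexes instead
CREATE NONCLUSTERED INDEX IX_Employee_Age
ON dbo.employee (e_age);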
After that, disable all the other SQL Server services using SQL Server Configuration Manager, as delineated in the image below.
Check the audit table again to verify that there is a single entry for a single login.
Logon Trigger on All Servers
I was recently working on security auditing for one of my clients. In this project, there was a requirement that all successful logins on the Servers should be recorded. The solution for this requirement is a breeze: just create logon triggers. I created a logon trigger on the Server to catch all successful Windows-authenticated as well as SQL-authenticated logins. When I was done with this project, I made an interesting observation: the logon trigger executed multiple times for a single login. It was absolutely unexpected! As I was logging in only once, I naturally expected a single entry. However, entries appeared multiple times on different threads – indeed an eccentric phenomenon at first sight!
Let us first pore over our example.
Create database, table and logon trigger
/* Create Audit Database */
CREATE DATABASE AuditDb
GO
USE AuditDb
GO
/* Create Audit Table */
CREATE TABLE ServerLogonHistory
(SystemUser VARCHAR(512),
DBUser VARCHAR(512),
SPID INT,
LogonTime DATETIME)
GO
/* Create Logon Trigger */
CREATE TRIGGER Tr_ServerLogon
ON ALL SERVER FOR LOGON
AS
BEGIN
INSERT INTO AuditDb.dbo.ServerLogonHistory
SELECT SYSTEM_USER,USER,@@SPID,GETDATE()
END
GO
Login using SQL Authentication and check audit table
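One simple way to check the audit table after logging in (a minimal sketch against the table created above):
-- Inspect the recorded logins, most recent first
SELECT SystemUser, DBUser, SPID, LogonTime
FROM AuditDb.dbo.ServerLogonHistory
ORDER BY LogonTime DESC;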
The above example clearly demonstrates that there are multiple entries in the audit table. On closer observation, it is also evident that there are multiple process IDs. Based on these two observations I can come to one conclusion: actions similar to logging in to the Server took place multiple times.
USE AdventureWorks
GO
----Disable Index
ALTER INDEX [IX_StoreContact_ContactTypeID] ON Sales.StoreContact DISABLE
GO
While enabling the same index, I have seen developers using the following INCORRECT syntax, which results in error.
USE AdventureWorks
GO
----INCORRECT Syntax Index
ALTER INDEX [IX_StoreContact_ContactTypeID] ON Sales.StoreContact ENABLE
GO
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near ‘ENABLE’.
This is because once an index is disabled, it cannot simply be re-enabled; it must be rebuilt. The following syntax will enable the index by rebuilding it, with optimal performance.
USE AdventureWorks
GO
----Enable Index
ALTER INDEX [IX_StoreContact_ContactTypeID] ON Sales.StoreContact REBUILD
GO
I hope that now you have understood why the ENABLE syntax throws an error on a disabled index and also how to enable an index, with optimal performance, by rebuilding it.
You can also disable an index from the Index Properties dialog box:
Click Options under Select a page, and uncheck the Use index option as shown below.
How to Verify Whether IntelliSense Feature is enabled in SQL Server Management Studio (SSMS )
One can verify whether the IntelliSense feature is enabled in SSMS by clicking on the Query menu and checking whether "IntelliSense Enabled" is ticked, as highlighted in the snippet below.
Make sure that IntelliSense is enabled for the current query window by checking the Query > IntelliSense Enabled
menu option (it should be enabled):
Since IntelliSense does not work in SQLCMD mode, ensure that you haven't enabled this by checking the Query >
SQLCMD Mode menu option (it should NOT be enabled):
Explain Query Editor Regions
SSMSBoost adds the possibility to use common regions syntax in SQL Editor:
--#region [Name]
--#endregion
Regions will be recognized and processed by the add-in, and expand/collapse symbols will be placed near the region head.
Regions functionality is available at SSMSBoost->Query->Regions:
This feature already exists in many programming languages, and the add-in brings it to SSMS 2008. The reason I am highlighting this feature is that there are cases when T-SQL code runs to hundreds of lines and, after a while, it keeps getting confusing.
The region commands work as follows:
Create region creates an unnamed region. If you run it with some part of the code selected, the selection will be wrapped into the newly created region.
Create named region creates a region with a name:
If you run it when some part of the code is selected, the selection will be wrapped into the newly created region.
Reparse/Refresh regions forces re-processing of the current document. All regions will be recreated. This can be
necessary if you apply massive changes to the document. Regions are parsed automatically when a script is opened
in the editor.
Make sure you check SSMSBoost->Settings->Regions for fine-tuning options. For example, you can customize
region Start and End markers.
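A minimal sketch of the region syntax in a script (table names assume the AdventureWorks sample database; SSMSBoost must be installed for the regions to collapse):
--#region Load reference data
SELECT * FROM Sales.SalesTerritory;
--#endregion

--#region Totals per territory
SELECT TerritoryID, SUM(TotalDue) AS TerritoryTotal
FROM Sales.SalesOrderHeader
GROUP BY TerritoryID;
--#endregion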
Object Explorer Enhancements:
Object Explorer Details initially looks the same as in the previous version, but when you right-click on the column header bar it reveals what it can do. This feature looks similar to the Vista OS folder options, but when you see how it is implemented for SQL Server data, it is really impressive. The Object Explorer Details view can be enabled either by going to Menu >> View >> Object Explorer Details or by pressing F7.
In Object Explorer Details, the new feature is Object Search. Enter any object name in the object search box and the search results will be displayed in the same Object Explorer Details window.
Additionally, there are new wizards which help you perform several tasks, from policy management to disk monitoring. One cool thing is that everything displayed in the Object Explorer Details screen can be copied and pasted straight into Excel without any formatting issues.
MultiServer Query:
Usually DBAs don't manage only one database Server; they have many Servers to manage, and there are cases when a DBA has to check the status of all of them. I have seen a DBA who managed 400 Servers write a query using xp_cmdshell to find out the status of the full backups on all the Servers. In a recent consultancy job, when I had to find out whether all three Servers were upgraded with Service Packs (SP), I ran a query to find the version information on the three instances separately, in three windows.
SSMS 2008 has a feature to run a query against different Servers from one query editor window. First of all, make sure that you have registered all the Servers under a Server group in Registered Servers. Once they are registered, right-click on the Server group name and click New Query as shown in the image below.
Now in the opened query window run the following query (you can find it in the sample code for this article):
SELECT
SERVERPROPERTY('Edition') AS Edition,
SERVERPROPERTY('ProductLevel') AS ProductLevel,
SERVERPROPERTY('ProductVersion') AS ProductVersion
The query above will give the result shown in the image below. Note that we have only three columns in the SELECT, but our output contains four columns. The very first column is "Server Name", and it is added by SQL Server to identify which rows belong to which Server.
If all of the above Servers are registered with a Central Management Server – the option right below the local Server groups – other administrators can also reach all of those Servers by simply registering the one central Server.
DIMENSION MODEL
A dimensional model is a database structuring technique. The dimensional model contains two kinds of entities: facts and dimensions.
The dimensional model is designed to optimize the database for fast retrieval of data.
The center of the star schema holds one fact table, surrounded by several associated dimension tables.
FACT TABLE
The fact table is central in a star or snowflake schema.
The primary keys of the dimension tables are mapped into the fact table as foreign keys.
It contains fewer attributes (columns) but more records (rows) than the dimension tables.
The fact table is loaded after the dimension tables it references.
It mostly holds numeric measures along with the dimension keys.
It is utilized for analysis and reporting.
TYPES OF FACT:-
A Snapshot Fact Table stores measurements captured as of a specific point in time.
A Cumulative (Accumulating) Fact Table describes what has happened over a period of time.
A Transaction Fact Table represents an event that occurred at an instantaneous point in time.
DIMENSIONAL TABLE
The dimension table is located at the edge of a star or snowflake schema.
Dimension tables are used to describe dimensions; they contain dimension keys, values, and attributes.
When we create a dimension, we logically define a structure for our projects.
The dimension's key is mapped into the fact table as a foreign key.
The dimension table mostly holds descriptive, text-format data.
TYPES OF DIMENSIONS:-
A Junk Dimension combines several unrelated, low-cardinality attributes (flags and indicators) into a single dimension table so they do not clutter the fact table.
A Conformed Dimension is shared across multiple data marts.
A Degenerate Dimension is derived from the fact table and does not have its own dimension table.
Role-playing Dimensions are used for multiple purposes within the same database (for example, one date dimension referenced as both order date and ship date).
Based on the frequency of data change, the types of dimension tables are:
Static Dimensions hold values that do not change.
Slowly Changing Dimensions have attribute values that change slowly, based on the frequency of data change and the need to preserve history.
Rapidly Changing Dimensions have attribute values that change rapidly.
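A minimal star-schema sketch tying the fact and dimension concepts together (all table and column names are illustrative):
-- Dimension tables describe the who/what/when
CREATE TABLE DimDate
(
DateKey INT PRIMARY KEY, -- e.g. 20240131
FullDate DATE,
CalendarQuarter TINYINT
);

CREATE TABLE DimProduct
(
ProductKey INT IDENTITY PRIMARY KEY,
ProductName VARCHAR(100)
);

-- Fact table holds the measures plus foreign keys to the dimensions
CREATE TABLE FactSales
(
DateKey INT REFERENCES DimDate (DateKey),
ProductKey INT REFERENCES DimProduct (ProductKey),
QuantitySold INT,
SalesAmount MONEY
);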
QUERY PLAN
PARSE:
When we execute a SQL statement, the SQL engine first checks the syntax (compilation errors, etc.).
It then generates the query processor tree, which is passed to the optimizer.
OPTIMIZE:
The optimizer analyzes the data statistics: how many rows are in the table, whether there is a unique key on the table, and so on.
Then, depending on the data statistics and the query processor tree, it generates an estimated plan.
EXECUTE:
Finally, the estimated plan is passed to the SQL engine for execution. After execution, the actual plan is available.
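A minimal sketch of how to see both plans from T-SQL (assuming the AdventureWorks sample database; each SET statement must run in its own batch, hence the GO separators):
-- Estimated plan: the statement is compiled but NOT executed
SET SHOWPLAN_XML ON;
GO
SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID = 11000;
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the statement is executed and runtime details are returned
SET STATISTICS XML ON;
GO
SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID = 11000;
GO
SET STATISTICS XML OFF;
GO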
LOGICAL AND PHYSICAL OPERATORS
The logical operator represents the conceptual operation in the plan, whereas the physical operator is the actual implementation used at run time.
When data is read from the database files on disk, it is called a physical read. When data is read from the cache, it is called a logical read.
Logical read = page access in memory. Physical read = page access from disk.
SQL Server allocates a portion of memory as its cache (the buffer pool). When we fire the same SQL statements again and again, the required pages are served from this cache.
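One way to see what is currently sitting in that cache is to query the buffer descriptors DMV; a minimal sketch:
-- Pages currently held in the buffer cache, per database
-- (logical reads are served from these pages; physical reads pull pages from disk)
SELECT DB_NAME(database_id) AS database_name,
COUNT(*) AS cached_pages,
COUNT(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_pages DESC;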
Solution:
The SQL Server Agent system tables can be updated without setting any sp_configure parameters or an equivalent command, as is the case with the master database, so building a script to meet your needs should be simple and straightforward. Before we dive into some scripts to address the need, another option to consider is simply stopping or starting the SQL Server Agent service. Since the SQL Server Agent service is responsible for permitting the Jobs to run in the first place, stopping and then starting this service would also prevent and then re-enable the Jobs. The one shortcoming of that approach is that you might not always have rights to manage the service, or that only specific types of jobs need to be enabled or disabled. When those conditions arise, using the scripts below may be the best approach.
Disable All SQL Server Agent Jobs
USE MSDB;
GO
UPDATE MSDB.dbo.sysjobs
SET Enabled = 0
WHERE Enabled = 1;
GO
Next Steps
Depending on your needs and your rights, keep in mind that SQL Server Agent can be stopped to prevent Jobs from running. When you need them to run on their schedules again, simply start SQL Server Agent and the Jobs should fire as expected.
SQL Server provides different system tables that give us information about SQL Server objects. For jobs and schedules, the tables are available in the MSDB system database.
People often make this mistake: they update the Enabled column to 0 in the sysjobs table. By doing that, the jobs appear disabled graphically, but a job will keep running if its schedule is enabled. If you want to go by this route, you have to disable the schedule for the job in the sysschedules table as well.
Using the sp_update_job stored procedure is the correct way to enable or disable a job. If you use this stored procedure, you don't have to worry about disabling the schedule for the job. The script below can be used to generate a disable/enable script for all the jobs; you simply change @enabled=0 (disable) to @enabled=1 (enable).
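A minimal sketch of such a generator script (it only prints the statements; copy the output and run it to actually disable the jobs):
-- Generate an sp_update_job statement for every job;
-- change @enabled = 0 to @enabled = 1 to produce the enable script instead
SELECT 'EXEC msdb.dbo.sp_update_job @job_name = N''' + name + ''', @enabled = 0;'
FROM msdb.dbo.sysjobs
ORDER BY name;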