Always ON
Q. Are FILESTREAM, Change Data Capture, and database snapshots supported by AlwaysOn Availability
Groups?
Yes, all these features are supported by AlwaysOn Availability Groups.
Asynchronous-commit mode: The primary replica commits a transaction without waiting for confirmation
from the secondary replica.
Synchronous-commit mode: The primary replica does not commit a transaction until it receives confirmation
from the secondary replica that the transaction log records have been written (hardened) to disk.
Synchronous-commit mode:
An availability replica that uses this availability mode is known as a synchronous-commit replica. Under
synchronous-commit mode, before committing transactions, a synchronous-commit primary replica waits
for a synchronous-commit secondary replica to acknowledge that it has finished hardening the log.
Synchronous-commit mode ensures that once a given secondary database is synchronized with the
primary database, committed transactions are fully protected. This protection comes at the cost of
increased transaction latency.
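Here is a minimal sketch of how the availability mode is set for each replica with ALTER AVAILABILITY GROUP. The group name ProAG comes from later in this document. The replica name SQLNODE2 is a hypothetical placeholder. Run the statements on the instance that hosts the primary replica.

```sql
-- Hypothetical names: availability group [ProAG], secondary replica SQLNODE2.
-- Run on the primary replica.

-- Synchronous commit: the primary waits for SQLNODE2 to harden the log before committing.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- Asynchronous commit: the primary commits without waiting for SQLNODE2.
-- Assumes SQLNODE2's FAILOVER_MODE is already MANUAL, because automatic failover
-- requires synchronous commit.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
```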
Benefits of readable secondary replicas:
Offloads read-only workloads from your primary replica to secondary replicas, which conserves the primary's
resources for your mission-critical workloads. If you have a mission-critical read workload, or a workload
that cannot tolerate latency, you should run it on the primary.
Improves your return on investment for the systems that host readable secondary replicas.
In addition, readable secondaries provide robust support for read-only operations, as follows:
Temporary statistics on readable secondary databases optimize read-only queries. For more information,
see "Statistics for Read-Only Access Databases" in the SQL Server documentation.
Read-only workloads use row versioning to remove blocking contention on the secondary databases.
All queries that run against the secondary databases are automatically mapped to the snapshot isolation
level, even when other transaction isolation levels are explicitly set. Also, all locking hints are ignored.
This eliminates reader/writer contention.
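The following query shows the temporary statistics mentioned above. Run it in the context of a readable secondary database. It assumes SQL Server 2012 or later, where sys.stats exposes an is_temporary flag for statistics that were created on the secondary.

```sql
-- Run in the context of a readable secondary database.
-- Statistics created temporarily on the secondary are flagged with is_temporary = 1.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                  AS stats_name,
       s.is_temporary
FROM sys.stats AS s
WHERE s.is_temporary = 1;
```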
Q. How many synchronous secondary replicas can I have?
In SQL Server 2012 and 2014, we can have up to two synchronous-commit secondary replicas, but we are not
required to use any. We could run all secondaries in asynchronous-commit mode if desired.
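To see how many replicas currently run in each mode, you can query the catalog view sys.availability_replicas. The sketch below borrows the group name ProAG from later in this document. The count includes the primary replica itself.

```sql
-- Replicas per availability mode for the group [ProAG] (primary included).
SELECT ar.availability_mode_desc,
       COUNT(*) AS replica_count
FROM sys.availability_replicas AS ar
JOIN sys.availability_groups AS ag ON ag.group_id = ar.group_id
WHERE ag.name = N'ProAG'
GROUP BY ar.availability_mode_desc;
```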
Q. Can we configure Automatic failover of Availability Groups with SQL Server Failover cluster instances?
SQL Server Failover Cluster Instances (FCIs) do not support automatic failover by availability groups, so any
availability replica that is hosted by an FCI can only be configured for manual failover.
Q. Suppose the primary database goes into suspect mode. Will the AG fail over to a secondary replica?
No. Issues at the database level, such as a database becoming suspect due to the loss of a data file, deletion
of a database, or corruption of a transaction log, do not cause an availability group to fail over.
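Q. Does AG support automatic page repair to protect against page corruption?
Yes. When certain page errors are detected (for example, 823 and 824 errors), the replica automatically
requests a fresh copy of the damaged page from a partner replica and repairs the page.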
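Automatic page repair attempts are recorded in the DMV sys.dm_hadr_auto_page_repair, so a quick check could look like the following. Run it on any replica. It returns rows for the availability databases hosted on that instance.

```sql
-- Automatic page-repair attempts for availability databases on this instance.
SELECT DB_NAME(apr.database_id) AS database_name,
       apr.file_id,
       apr.page_id,
       apr.error_type,
       apr.page_status,
       apr.modification_time
FROM sys.dm_hadr_auto_page_repair AS apr
ORDER BY apr.modification_time DESC;
```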
Q. How many data synchronization preference options are available in AlwaysOn?
There are three options: Full, Join only, and Skip initial data synchronization.
Q. Can I redirect the read-only connections to the secondary replica instead of Primary replica?
Yes. Specify read-only intent in the connection string (ApplicationIntent=ReadOnly) and add only the
secondaries (not the primary) to the READ_ONLY_ROUTING_LIST. If you want to refuse read-only connections
made directly to the primary, set the primary's ALLOW_CONNECTIONS option to READ_WRITE.
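The steps above could be scripted as in the sketch below. SQLNODE1 is the primary, SQLNODE2 and SQLNODE3 are secondaries, and contoso.com and port 1433 are placeholders; all of these names are hypothetical. Read-only routing works only when the client connects through the availability group listener and names a database in the connection string.

```sql
-- Hypothetical names: AG [ProAG]; primary SQLNODE1; secondaries SQLNODE2, SQLNODE3.
-- Run on the primary replica.

-- 1. Allow read-only connections on each secondary and give it a routing URL.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SQLNODE2.contoso.com:1433'));

ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE3'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE3'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SQLNODE3.contoso.com:1433'));

-- 2. While SQLNODE1 is primary, route read-intent connections to the secondaries only.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE1'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQLNODE2', N'SQLNODE3')));

-- 3. Refuse read-intent connections made directly to the primary.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE1'
WITH (PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE));

-- Client side (hypothetical listener and database names):
-- Server=tcp:ProAG-Listener,1433;Database=SalesDB;ApplicationIntent=ReadOnly
```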
Q. If a DBA expands a data file manually on the primary, will SQL Server automatically grow the same file
on secondaries?
Yes. The file growth is a logged operation, so the file is automatically expanded on the secondary replicas as well.
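You can observe this with the sketch below. SalesDB and its logical data file SalesDB_Data are hypothetical names. The script assumes the file is currently smaller than 20 GB.

```sql
-- 1. On the primary replica: grow the data file manually.
ALTER DATABASE [SalesDB]
MODIFY FILE (NAME = N'SalesDB_Data', SIZE = 20GB);

-- 2. On the secondary replica: once the log has been redone, the file should
--    report the new size. sys.master_files stores size in 8-KB pages.
SELECT name,
       physical_name,
       size * 8 / 1024 AS size_mb
FROM sys.master_files
WHERE database_id = DB_ID(N'SalesDB');
```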
Q. What’s the difference between AGs in SQL 2012 and SQL 2014?
SQL Server 2014’s biggest improvement is that the replica’s databases stay visible when the primary drops
offline – as long as the underlying cluster is still up and running. If I have one primary and four secondary
replicas, and I lose just my primary, the secondaries are still online servicing read-only queries. (Now, you
may have difficulties connecting to them unless you’re using the secondary’s name, but that’s another
story.) Back in SQL 2012, when the primary dropped offline, all of the secondaries’ copies immediately
dropped offline – breaking all read-only reporting queries.
Q: How does licensing work with AlwaysOn Availability Groups in SQL 2012 and 2014?
All replicas have to have Enterprise Edition. If you run queries, backups, or DBCCs on a replica, you have to
license it. For every server licensed with Software Assurance, you get one standby replica for free – but
only as long as it’s truly standby, and you’re not doing queries, backups, or DBCCs on it.
Q: Can I use AlwaysOn Availability Groups with Standard Edition?
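Not in SQL Server 2012 or 2014; availability groups there require Enterprise Edition. It is certainly something
folks have been asking for since database mirroring was deprecated. (SQL Server 2016 later added Basic
Availability Groups to Standard Edition.)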
Q: If I fail over to an asynchronous replica, and it’s behind, how do I sync up changes after the original
primary comes back online?
When I go through an AG design with a team, we talk about the work required to merge the two databases
together. If it’s complex (like lots of parent/child tables with identity fields, and no update datestamp field
on the tables), then management agrees to a certain amount of data loss upon failover. For example, "If
fifteen minutes or less of data is involved, we're just going to walk away from it." Then we build a
project plan for what it would take to actually recover >15 minutes of data, and management decides
whether they want to build that tool ahead of time, or wait until disaster strikes.
Possible Reasons:
This issue can be caused by a cluster service issue or by the loss of the quorum in the cluster.
Possible Solutions:
Use the Cluster Administrator tool to perform the forced quorum or disaster recovery workflow. Once
WSFC is started, you must re-evaluate and reconfigure NodeWeight values to correctly construct a new
quorum before bringing other nodes back online; otherwise, the cluster may go offline again.
Re-establishment may be required for any high-availability features (AlwaysOn Availability Groups, log
shipping, database mirroring) in use on the affected nodes.
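After quorum has been forced, you can check SQL Server's own view of the cluster from T-SQL before you bring the high-availability features back. NodeWeight is changed with the cluster tools, not with T-SQL. These DMVs only show the resulting state and votes. While the cluster runs in forced-quorum mode, quorum_state_desc should report FORCED_QUORUM.

```sql
-- Cluster name, quorum model, and quorum state as seen by this instance.
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;

-- State and quorum votes of each cluster member (nodes and witness).
SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```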
Q. How do you force a WSFC (Windows Server Failover Cluster) cluster to start without a quorum?
This can be done using
Possible Reasons:
The availability group is not configured with automatic failover mode. The primary replica becomes
unavailable, and the role of every replica in the availability group becomes RESOLVING.
The availability group is configured with automatic failover mode and does not complete successfully.
The availability group resource in the cluster becomes offline.
There is an automatic, manual, or forced failover in progress for the availability group.
Possible Solutions:
If the SQL Server instance of the primary replica is down, restart the server and then verify that the
availability group recovers to a healthy state.
If the automatic failover appears to have failed, verify that the databases on the replica are
synchronized with the previously known primary replica, and then fail over to that replica. If the
databases are not synchronized, select the replica with the minimum data loss and perform a forced
failover to it.
If the resource in the cluster is offline while the instances of SQL Server appear to be healthy, use
Failover Cluster Manager to check the cluster health or other cluster issues on the server. You can also
use the Failover Cluster Manager to attempt to turn the availability group resource online.
If there is a failover in progress, wait for the failover to complete.
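Here is a minimal sketch of the two failover commands referred to above. Run them on the instance that hosts the target secondary replica; the group name ProAG comes from later in this document. The forced variant is commented out because it can lose committed transactions.

```sql
-- Planned manual failover without data loss: the target must be a
-- synchronous-commit secondary in the SYNCHRONIZED state.
ALTER AVAILABILITY GROUP [ProAG] FAILOVER;

-- Forced failover: use only when the primary is unavailable and the target
-- cannot be synchronized. Committed transactions may be lost.
-- ALTER AVAILABILITY GROUP [ProAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```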
Q. We have got an alert “Availability group is not ready for automatic failover”. Can you explain about
this warning and your action plan?
This alert is raised when the failover mode of the primary replica is automatic; however, none of the
secondary replicas in the availability group is failover ready.
Possible Reasons:
The primary replica is configured for automatic failover; however, the secondary replica is not ready for
automatic failover as it might be unavailable or its data synchronization state is currently not
SYNCHRONIZED.
Possible Solutions:
Verify that at least one secondary replica is configured as automatic failover. If there is not a secondary
replica configured as automatic failover, update the configuration of a secondary replica to be the
automatic failover target with synchronous commit.
Use the policy to verify that the data is in a synchronization state and the automatic failover target is
SYNCHRONIZED, and then resolve the issue at the availability replica.
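The checks and the fix above could look like the sketch below. Run it on the primary replica. SQLNODE2 is a hypothetical replica name. is_failover_ready is 1 when that database on that replica could fail over without data loss.

```sql
-- 1. Which databases are failover-ready on each replica?
SELECT ar.replica_server_name,
       drcs.database_name,
       drcs.is_failover_ready
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drcs.replica_id;

-- 2. Make SQLNODE2 an automatic failover target:
--    synchronous commit first, because automatic failover requires it.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (FAILOVER_MODE = AUTOMATIC);
```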
Q. In your environment, data inserted on the primary replica is not visible on the secondary replica. The
availability group is in a healthy state, and in most cases the data appears on the secondary within a few
minutes, but this time it did not. You need to find the bottleneck and fix the issue. Can you explain your
views and workaround in this situation?
Possible Reasons:
Long-Running Active Transactions
High Network Latency or Low Network Throughput Causes Log Build-up on the Primary Replica
Another Reporting Workload Blocks the Redo Thread from Running
Redo Thread Falls behind Due to Resource Contention
Possible Workaround:
Run DBCC OPENTRAN on the primary replica to find the oldest active transaction, and check whether it can
be rolled back.
A high value of log_send_queue_size in the DMV sys.dm_hadr_database_replica_states can indicate that log
is being held back at the primary replica. Dividing this value by log_send_rate gives a rough estimate of how
soon the secondary replica can catch up.
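The estimate described above can be computed directly. Run the query on the primary replica. log_send_queue_size is reported in KB and log_send_rate in KB/sec, so their ratio is an approximate number of seconds. NULLIF avoids a divide-by-zero error when the send rate is 0.

```sql
-- Rows with is_local = 0 describe the remote (secondary) replicas.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)  AS database_name,
       drs.log_send_queue_size   AS log_send_queue_kb,
       drs.log_send_rate         AS log_send_rate_kb_per_sec,
       drs.log_send_queue_size * 1.0 / NULLIF(drs.log_send_rate, 0) AS est_catch_up_sec
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id
WHERE drs.is_local = 0;
```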
Check two performance objects SQL Server:Availability Replica > Flow Control Time (ms/sec) and SQL
Server:Availability Replica > Flow control/sec. Multiplying these two values shows you in the last
second how much time was spent waiting for flow control to clear. The longer the flow control wait
time, the lower the send rate.
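The two flow-control counters can also be read through sys.dm_os_performance_counters instead of Performance Monitor. This sketch takes two samples one second apart and reports the difference. It assumes that these counters are stored cumulatively, as other per-second counters in this DMV are. That assumption is why the script uses deltas rather than the raw values.

```sql
-- Assumption: cntr_value for these per-second counters is cumulative.
DECLARE @first TABLE (
    instance_name nvarchar(128),
    counter_name  nvarchar(128),
    cntr_value    bigint
);

INSERT INTO @first (instance_name, counter_name, cntr_value)
SELECT RTRIM(instance_name), RTRIM(counter_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Availability Replica%'
  AND counter_name LIKE N'Flow Control%';

WAITFOR DELAY '00:00:01';

-- Change over the last second for Flow Control Time (ms/sec) and Flow Control/sec.
SELECT RTRIM(pc.instance_name)      AS replica,
       RTRIM(pc.counter_name)       AS counter_name,
       pc.cntr_value - f.cntr_value AS value_last_second
FROM sys.dm_os_performance_counters AS pc
JOIN @first AS f
  ON f.instance_name = RTRIM(pc.instance_name)
 AND f.counter_name  = RTRIM(pc.counter_name)
WHERE pc.object_name LIKE N'%Availability Replica%'
  AND pc.counter_name LIKE N'Flow Control%';
```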
When the redo thread is blocked, an extended event called sqlserver.lock_redo_blocked is generated.
Additionally, you can query the DMV sys.dm_exec_requests on the secondary replica to find out which
session is blocking the redo thread, and then take corrective action. You can let the reporting workload
finish, at which point the redo thread is unblocked, or you can unblock the redo thread immediately by
executing the KILL command on the blocking session ID. The following query returns the session ID of the
reporting workload that is blocking the redo thread.
Transact-SQL
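-- Run on the secondary replica; the redo thread appears with command 'DB STARTUP'.
SELECT session_id, command, blocking_session_id, wait_time, wait_type, wait_resource
FROM sys.dm_exec_requests
WHERE command = 'DB STARTUP';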
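To capture the lock_redo_blocked event named above, you can create an Extended Events session on the secondary replica that writes to a ring buffer. The session name RedoBlocked is hypothetical. The KILL line is commented out, and 57 is a placeholder rather than a real session ID.

```sql
-- Capture redo-blocked events into a ring buffer (run on the secondary replica).
CREATE EVENT SESSION [RedoBlocked] ON SERVER
ADD EVENT sqlserver.lock_redo_blocked
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [RedoBlocked] ON SERVER STATE = START;
GO

-- Read the captured events as XML.
SELECT CAST(t.target_data AS xml) AS captured_events
FROM sys.dm_xe_session_targets AS t
JOIN sys.dm_xe_sessions AS s ON s.address = t.event_session_address
WHERE s.name = N'RedoBlocked'
  AND t.target_name = N'ring_buffer';

-- Unblock redo immediately by killing the blocking reporting session
-- (use the blocking_session_id value returned by the sys.dm_exec_requests query).
-- KILL 57;
```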
Redo Thread Falls Behind Due to Resource Contention: a large reporting workload on the secondary replica
has slowed down the secondary, and the redo thread has fallen behind. You can use the following DMV
query to see how far the redo thread has fallen behind by measuring the gap between last_redone_lsn and
last_received_lsn.
Transact-SQL
SELECT recovery_lsn, truncation_lsn, last_hardened_lsn,
       last_received_lsn, last_redone_lsn, last_redone_time
FROM sys.dm_hadr_database_replica_states;
If the redo thread is indeed falling behind, investigate the cause properly; Resource Governor can be used
to limit the CPU cycles available to the reporting workload.
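Here is a sketch of the Resource Governor setup mentioned above. Run it on the secondary replica's instance, because Resource Governor is configured per instance and is not replicated. The pool, group, function, and login names are all hypothetical. MAX_CPU_PERCENT applies only when there is CPU contention. CAP_CPU_PERCENT would set a hard limit instead.

```sql
-- Hypothetical names: pool ReportingPool, group ReportingGroup, login ReportingUser.
USE master;
GO

-- Limit the CPU share of the reporting workload under contention.
CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 30);
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
GO

-- Classifier: send sessions of the reporting login to ReportingGroup.
CREATE FUNCTION dbo.fn_rg_classifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'ReportingUser'
                THEN N'ReportingGroup'
                ELSE N'default'
           END;
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
```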
Note: Read the related MSDN articles and make sure you understand these solutions, because as soon as you
mention possible solutions, you may be asked how you would actually resolve the issue.
Q. After an automatic failover or a planned manual failover without data loss on an availability group,
you find that the failover time exceeds your recovery time objective (RTO). Or, when you estimate the
failover time of a synchronous-commit secondary replica (such as an automatic failover partner) using
the method in Monitor Performance for AlwaysOn Availability Groups, you find that it exceeds your
RTO. What are the possible reasons for the failover time exceeding your RTO?
Reporting Workload Blocks the Redo Thread from Running: On the secondary replica, the read-only
queries acquire schema stability (Sch-S) locks. These Sch-S locks can block the redo thread from
acquiring schema modification (Sch-M) locks to make any DDL changes. A blocked redo thread cannot
apply log records until it is unblocked. Once unblocked, it can continue to catch up to the end of log
and allow the subsequent undo and failover process to proceed.
Redo Thread Falls Behind Due to Resource Contention: When applying log records on the secondary
replica, the redo thread reads the log records from the log disk, and then for each log record it accesses
the data pages to apply the log record. The page access can be I/O bound (accessing the physical disk) if
the page is not already in the buffer pool. If there is I/O bound reporting workload, the reporting
workload competes for I/O resources with the redo thread and can slow down the redo thread.
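To see how much redo is still pending before a failover can complete, run the query below on the synchronous-commit secondary, for example the automatic failover partner. redo_queue_size is reported in KB and redo_rate in KB/sec, so their ratio gives an approximate redo time in seconds. This is only a rough guide and does not cover the whole failover time.

```sql
-- Approximate pending redo per local availability database.
SELECT DB_NAME(drs.database_id) AS database_name,
       drs.redo_queue_size      AS redo_queue_kb,
       drs.redo_rate            AS redo_rate_kb_per_sec,
       drs.redo_queue_size * 1.0 / NULLIF(drs.redo_rate, 0) AS est_redo_sec,
       drs.last_redone_time
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1;
```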
Q. Let's say you have configured automatic failover in a SQL Server 2012 AlwaysOn environment. An
automatic failover was triggered but did not succeed in making the secondary replica the PRIMARY. How do
you identify that the failover was not successful, and what are the possible reasons for an unsuccessful
failover?
If an automatic failover event is not successful, the secondary replica does not successfully transition to the
primary role. Therefore, the availability replica will report that this replica is in Resolving status.
Additionally, the availability databases report that they are in Not Synchronizing status, and applications
cannot access these databases.
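The RESOLVING status described above can be confirmed from T-SQL on any replica that is still reachable. This is a minimal sketch that joins the replica-state DMV with the catalog views.

```sql
-- Role and health of each replica; after a failed automatic failover,
-- role_desc typically shows RESOLVING.
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.operational_state_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar ON ar.replica_id = ars.replica_id
JOIN sys.availability_groups   AS ag ON ag.group_id   = ars.group_id;
```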
Q. Let’s say you added a new file to a database which is a part of AlwaysOn Availability Groups. The add
file operation succeeded on primary replica but failed in secondary replica. What is the impact and how
you troubleshoot?
This can happen when the file paths differ between the systems that host the primary and secondary
replicas. A failed add-file operation causes the secondary database to be suspended, which in turn causes
the secondary replica to enter the NOT SYNCHRONIZING state.
Resolution:
Remove the secondary database from the availability group.
On the existing secondary database, restore a full backup of the filegroup that contains the added file,
using WITH NORECOVERY and WITH MOVE (specify the correct file path for the secondary).
Back up the transaction log that contains the add-file operation on the primary database, and manually
restore that log backup on the secondary database using WITH NORECOVERY and WITH MOVE. Restore the last
transaction log backup WITH NORECOVERY as well, so that the database stays in the RESTORING state.
Rejoin the secondary database to the availability group.
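The resolution steps above could be scripted roughly as follows. Every name is hypothetical: database SalesDB, filegroup FG_New containing logical file SalesDB_New, backup share \\backupsrv\sql, and secondary data path E:\Data. If other log backups have been taken since the secondary was suspended, restore them in sequence before rejoining.

```sql
-- Step 1 (secondary): remove the secondary database from the availability group.
ALTER DATABASE [SalesDB] SET HADR OFF;

-- Step 2 (primary): back up the filegroup that contains the new file.
BACKUP DATABASE [SalesDB] FILEGROUP = N'FG_New'
TO DISK = N'\\backupsrv\sql\SalesDB_FG_New.bak';

-- Step 3 (secondary): restore it, relocating the file to a valid local path.
RESTORE DATABASE [SalesDB] FILEGROUP = N'FG_New'
FROM DISK = N'\\backupsrv\sql\SalesDB_FG_New.bak'
WITH NORECOVERY, MOVE N'SalesDB_New' TO N'E:\Data\SalesDB_New.ndf';

-- Step 4 (primary): back up the log that contains the add-file operation.
BACKUP LOG [SalesDB] TO DISK = N'\\backupsrv\sql\SalesDB_log.trn';

-- Step 5 (secondary): restore the log, again relocating the file.
RESTORE LOG [SalesDB]
FROM DISK = N'\\backupsrv\sql\SalesDB_log.trn'
WITH NORECOVERY, MOVE N'SalesDB_New' TO N'E:\Data\SalesDB_New.ndf';

-- Step 6 (secondary): rejoin the database to the availability group.
ALTER DATABASE [SalesDB] SET HADR AVAILABILITY GROUP = [ProAG];
```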
Q. Can you write the T-SQL statement for joining a replica to an availability group? (AG name "ProAG")
Connect to the server instance that hosts the secondary replica and issue the below statement:
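Here is a minimal sketch of that statement. It assumes the replica has already been added on the primary with ALTER AVAILABILITY GROUP [ProAG] ADD REPLICA. It also assumes the database, given the hypothetical name SalesDB, has been restored on this instance WITH NORECOVERY.

```sql
-- Run on the instance that hosts the secondary replica.
-- Join this instance to the availability group ProAG.
ALTER AVAILABILITY GROUP [ProAG] JOIN;

-- Then join each restored (NORECOVERY) database to the group.
ALTER DATABASE [SalesDB] SET HADR AVAILABILITY GROUP = [ProAG];
```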
The same operation can be done using SSMS or PowerShell.
Q. The data synchronization state of one of the availability databases is not healthy. Can you tell me the
possible reasons?
If this is an asynchronous-commit availability replica, all availability databases should be in the
SYNCHRONIZING state. If this is a synchronous-commit availability replica, all availability databases should
be in the SYNCHRONIZED state. This issue can be caused by the following:
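To compare each database's actual state with the state its replica's mode calls for, you can list the mode next to the synchronization state. The commented RESUME statement applies only when a database has been suspended. SalesDB is a hypothetical name.

```sql
-- Synchronization state of every availability database visible from this instance.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       ar.availability_mode_desc,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc,
       drs.is_suspended,
       drs.suspend_reason_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id
ORDER BY ar.replica_server_name, database_name;

-- If is_suspended = 1, resume data movement on the replica that hosts the database:
-- ALTER DATABASE [SalesDB] SET HADR RESUME;
```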
Q. Let's say we have a premium production server in an AlwaysOn Availability Group. You observe that CPU
utilization peaks at a specific time every day. Your RCA shows that most of the CPU is consumed by the
backup process because backup compression is enabled. What do you suggest? Is there any feature that can
help with backups?
Yes. Backups can be performed on secondary replicas. In the availability group properties, on the "Backup
Preferences" page, we can choose one of the following options:
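The backup preference can also be set in T-SQL through AUTOMATED_BACKUP_PREFERENCE. The accepted values are PRIMARY, SECONDARY_ONLY, SECONDARY (prefer secondary), and NONE (any replica). The sketch below uses the group name ProAG from this document. The replica name SQLNODE1 is hypothetical.

```sql
-- Prefer secondary replicas for automated backups of [ProAG].
ALTER AVAILABILITY GROUP [ProAG]
SET (AUTOMATED_BACKUP_PREFERENCE = SECONDARY);

-- Optionally exclude a specific replica from backup selection
-- (BACKUP_PRIORITY ranges from 0 to 100; 0 means it is never chosen).
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE1'
WITH (BACKUP_PRIORITY = 0);
```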
Q. Are there any specific limitations if we need to perform automated backups on secondary replicas?
Yes. There are a few:
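SQL Server does not enforce the backup preference by itself. Backup jobs are expected to check sys.fn_hadr_backup_is_preferred_replica before they run. The job step below is meant to be deployed on every replica. It also reflects one restriction on secondaries: full backups taken there must be COPY_ONLY. The database name and backup path are hypothetical, and in practice full and log backups would usually run as separate jobs.

```sql
-- Deploy on every replica; only the preferred replica actually takes the backup.
IF sys.fn_hadr_backup_is_preferred_replica(N'SalesDB') = 1
BEGIN
    -- Full backups on a secondary replica must be COPY_ONLY.
    BACKUP DATABASE [SalesDB]
    TO DISK = N'\\backupsrv\sql\SalesDB_full.bak'
    WITH COPY_ONLY, COMPRESSION;

    -- Regular (non-COPY_ONLY) log backups are supported on secondaries.
    BACKUP LOG [SalesDB]
    TO DISK = N'\\backupsrv\sql\SalesDB_log.trn';
END;
```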
Q. Have you ever applied patches / CU / service packs on Alwayson Availability Groups? Did you face any
issues while applying?
Yes. For example, I applied SQL Server 2012 SP2 Cumulative Update 4.
After CU4 was applied, we saw that the AlwaysOn availability groups were in the NOT SYNCHRONIZING state.
After an RCA, we found heavy blocking between user sessions and an unknown session (a CHECKPOINT) whose
command was shown as "DB STARTUP".
Through the MSDN site, we found that Microsoft had acknowledged this as a bug, and we chose the following
solution:
We took a backup from the primary replica and restored it on the secondary replica.
When we tried to add the secondary replica back to the availability group, to our surprise SQL Server shut
down, and we found this error message:
(Error: 3449, Severity: 21, State: 1.
SQL Server must shut down in order to recover a database (database ID 1). The database is either a user
database that could not be shut down or a system database. Restart SQL Server. If the database fails to
recover after another startup, repair or restore. SQL Trace was stopped due to server shutdown. Trace ID =
‘1’. This is an informational message only; no user action is required. )
Cause:
We did RCA and found the below.
Additionally, the availability databases report that they are in the NOT SYNCHRONIZING state and are not accessible.