LogShipping Issues
Problem
You notice that your log shipping is broken. In
the SQL Server error log, the message below is
displayed:
Error: 14421, Severity: 16, State: 1.
The log shipping secondary database
myDB.logshippingPrimary has restore threshold
of 45 minutes and is out of sync. No restore was
performed for 6258 minutes.
Description of error message 14420 and error
message 14421 that occur when you use log
shipping in SQL Server
http://support.microsoft.com/default.aspx?scid=329133
Cause
Inside the LSRestore job history, you can find two kinds of
messages:
- Restore job skipping the logs on secondary server
Skipped log backup file. Secondary DB: 'logshippingSecondary',
File:
'\\myDB\logshipping\logshippingPrimary_20090808173803.trn'
- An older log backup is missing
*** Error 4305: The file
'\\myDB\logshipping\logshippingPrimary_20090808174201.trn'
is too recent to apply to the secondary database
'logshippingSecondary'.
**** Error : The log in this backup set begins at LSN
18000000005000001, which is too recent to apply to the
database. An earlier log backup that includes LSN
18000000004900001 can be restored.
This shows that another log backup was taken outside of the log shipping process. To recover, you just have to restore that backup on the secondary and then run the LSRestore job.
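If the out-of-band backup file can be located, a manual restore along these lines brings the secondary back in sync (the file name below is hypothetical; use the actual out-of-band backup file):

```sql
-- Sketch: apply the missing out-of-band log backup on the secondary,
-- then re-run the LSRestore job. The file name below is hypothetical.
RESTORE LOG logshippingSecondary
FROM DISK = N'\\myDB\logshipping\out_of_band_backup.trn'
WITH NORECOVERY;
GO
```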
Logshipping monitor incorrectly raises error number 14420 instead of 14421 when the
secondary database is out of sync
Symptoms
Consider the following configuration of a log shipping environment:
Server C hosts a monitor server instance on which the Log Shipping Monitor job is configured to use
an impersonated proxy account for the connections to Server A and Server B.
When you use this configuration, the Log Shipping Monitor job incorrectly raises error message 14420 instead of
14421 when the secondary database is out of sync. The description of these error messages in SQL Server
2005 and SQL Server 2008 is as follows:
Error: 14420, Severity: 16, State: 1
The log shipping primary database %s.%s has backup threshold of %d minutes and has not performed a
backup log operation for %d minutes. Check agent log and logshipping monitor information.
Error: 14421, Severity: 16, State: 1
The log shipping secondary database %s.%s has restore threshold of %d minutes and is out of sync. No
restore was performed for %d minutes. Restored latency is %d minutes. Check agent log and logshipping
monitor information.
The alert message 14421 indicates that the difference between the current time (UTC) and the
last_restored_date_utc value in the log_shipping_monitor_secondary table on the monitor server is greater
than the value set for the Restore Alert threshold. The alert message 14420 indicates that the
difference between the current time (UTC) and the last_backup_date_utc value in the
log_shipping_monitor_primary table on the monitor server is greater than the value set for the Backup
Alert threshold.
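To see the values the monitor compares, you can query the monitor tables directly on the monitor server (Server C). A minimal sketch against the msdb tables mentioned above:

```sql
-- Sketch: minutes since the last restore, as seen by the monitor server.
-- Compare minutes_since_restore against restore_threshold.
SELECT  secondary_server,
        secondary_database,
        restore_threshold,
        last_restored_date_utc,
        DATEDIFF(MINUTE, last_restored_date_utc, GETUTCDATE()) AS minutes_since_restore
FROM    msdb.dbo.log_shipping_monitor_secondary;
```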
Cause
The issue happens because of a problem in the Log Shipping user interface: when creating the monitor job
for the secondary, the alert error number 14420 is passed instead of 14421.
Resolution
To resolve the problem, correct the @threshold_alert parameter value for the secondary database by
executing the following statement on the monitor server (Server C):
USE master
GO
EXEC sp_change_log_shipping_secondary_database
    @secondary_database = 'dbname',
    @threshold_alert = 14421
GO
SQL SERVER Log Shipping Restore Job Error: The file is too recent to apply to the secondary database
If you are a DBA who has handled log shipping as a high-availability solution, there are a number of common errors that, over a period of
time, you become a pro at resolving. Here is one of the most common errors you must have seen:
Message
2015-10-13 21:09:05.13 *** Error: The file C:\LS_S\LSDemo_20151013153827.trn is too recent to apply to the secondary database LSDemo.(Microsoft.SqlServer.Management.LogShipping) ***
2015-10-13 21:09:05.13 *** Error: The log in this backup set begins at LSN 32000000047300001, which is too recent to apply to the database. An earlier log backup that includes LSN 32000000047000001 can be restored. RESTORE LOG is terminating abnormally.(.Net SqlClient Data Provider) ***
***
The above error is shown in the failure history of the restore job. If the failure lasts longer than the configured thresholds, then we would also start seeing the below error in the SQL ERRORLOG on the secondary:
2015-10-14 06:22:00.240 spid60 Error: 14421, Severity: 16, State: 1.
2015-10-14 06:22:00.240 spid60 The log shipping secondary database PinalServer.LSDemo has restore threshold of 45 minutes and is out of sync. No restore was performed for 553 minutes. Restored latency is 4 minutes. Check agent log and logshipping monitor information.
To start troubleshooting, we can look at the Job Activity Monitor on the secondary, where the restore job shows as failed.
If you know SQL transaction log backup basics, you might be able to guess the cause. If we look closely at the error, it talks about an LSN mismatch. In most
cases, a manual transaction log backup was taken. I remember a few scenarios where a third-party tool had taken a transaction log backup of a
database that was also part of a log shipping configuration.
Since we know the cause now, what we need to figure out is where that out-of-band backup came from. Here is the kind of query I have written about on my earlier
blog.
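A query along these lines does the job (a sketch against the msdb backup history tables, not necessarily the exact query from that blog post):

```sql
-- Sketch: list log backups for the database, newest first, including
-- who took them and where the files were written.
SELECT  bs.backup_start_date,
        bs.type,                        -- D = full, I = differential, L = log
        bs.first_lsn,
        bs.last_lsn,
        bs.user_name,                   -- helps spot out-of-band backups
        bmf.physical_device_name
FROM    msdb.dbo.backupset AS bs
JOIN    msdb.dbo.backupmediafamily AS bmf
        ON bs.media_set_id = bmf.media_set_id
WHERE   bs.database_name = N'LSDemo'    -- database name from this example
  AND   bs.type = 'L'
ORDER BY bs.backup_start_date DESC;
```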
Once we run the query, we get the list of backups taken against the database. This information is picked up from the msdb database.
Once we have found the problematic backup, we need to restore it manually on the secondary database. Make sure to use either the NORECOVERY or
STANDBY option so that subsequent logs can be restored. Once the file is restored, the restore job will be able to pick up from the same place and catch up
automatically.
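The manual restore can be sketched as follows (the file name is hypothetical; use the problematic backup found in the msdb history):

```sql
-- Sketch: manually apply the out-of-band log backup on the secondary.
RESTORE LOG LSDemo
FROM DISK = N'C:\LS_S\LSDemo_out_of_band.trn'
WITH NORECOVERY;
-- Or, to keep the database readable between restores:
-- WITH STANDBY = N'C:\LS_S\LSDemo.tuf'
GO
```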
Consider a scenario where you have a log shipping setup with a STANDBY secondary
database and things are working just fine. One fine day you notice that the
secondary database is not in sync with the primary. The seasoned DBA that you are,
you go ahead and look at the log shipping jobs, and identify that the restore is
taking a lot of time.
The obvious question that comes to your mind is whether a lot of transactions have
happened recently, causing the log backup to be much larger. So you check the
folder and see that the .TRN file sizes remain pretty much the same. What next?
I will cover some basic troubleshooting that you can do, to identify why the
restore process is so slow.
To give you some perspective, let's say that earlier a restore of a 4 MB transaction log
backup used to take less than a minute. Now it takes approximately 20-25
minutes. Before I get into troubleshooting, make sure that you have ruled out these
factors:
1. The log backup size (.TRN) is pretty much the same as it was before.
2. The disk is not a bottleneck on the secondary server.
3. The Copy job is working just fine and there is no delay there. From the job history
you can clearly see that the Restore step is where the time is being spent.
4. The Restore job is not failing and no errors are reported during this time (e.g. Out
of Memory).
Troubleshooting
The first thing to do to get more information on what the restore is doing is to enable
these trace flags (3004 prints detailed backup/restore progress messages, and 3605
routes that output to the SQL Server error log):
DBCC TRACEON (3004, 3605, -1)
2008/01/26(09:32:02), first LSN: 296258:29680:1, last LSN: 298258:40394:1, number of dump devices:
1, device information: (FILE=1, TYPE=DISK:
{'S:\SQL\SQLLogShip\TESTDB\TESTDB_20101229011500.trn'}). This is an informational message. No
user action is required.
2010-12-29 16:24:39.12 spid64
Writing backup history records
2010-12-29 16:24:39.21 spid64
Restore: Done with MSDB maintenance
2010-12-29 16:24:39.21 spid64
RestoreLog: Finished
From the above output we see that the restore took ~13 minutes. If you look closely
at the timestamps, the gap between the initial restore messages and the final
history-writing steps is where most of the time is spent.
Now when we talk about log restores, the number of VLFs plays a very important
role. More about the effect of VLFs on restore time is given
here: http://blogs.msdn.com/b/psssql/archive/2009/05/21/how-a-log-filestructure-can-affect-database-recovery-time.aspx
The bottom line is that a large number of virtual log files (VLFs) can slow down
transaction log restores. To find out whether this is the case here, use the following
command:
DBCC LOGINFO (TESTDB) WITH NO_INFOMSGS
The following information can be deciphered from the above output:
1. The number of rows returned is the number of VLFs.
2. The number of VLFs that had to be restored from this log backup can be calculated
from the first LSN / last LSN values in the restore output:
first LSN: 296258:29680:1, last LSN: 298258:40394:1
3. The size of each VLF can be calculated from the FileSize column (in bytes):
9175040 = 8.75 MB
9437184 = 9 MB
10092544 = 9.62 MB
Problem(s)
So based on the above there are two possibilities:
1. The number of VLFs is rather large, which we know will impact restore performance.
2. The size of each VLF is large, which is a cause for concern if STANDBY mode is in effect.
The 2nd problem is aggravated if there are batch jobs or long-running transactions
that span multiple backups (e.g. Rebuild Indexes). In this case the work of
repeatedly rolling back the long-running transaction, writing the rollback work to the
standby file (TUF file), then undoing all the rollback work with the next log restore
just to start the process over again can easily cause a log shipping secondary to get
behind.
While we are talking about the TUF file, I know many people out there are not clear on what it is used for. So here goes:
In STANDBY mode (which we have for the secondary database), database recovery is
performed when the log is restored, and this mode also creates a file with the extension
.TUF (the Transaction Undo File) on the destination server. This is why in
this mode we are able to access the database (read-only access). Before the
next transaction log backup is applied, the changes saved in the undo file are reapplied to the
database. Since the database is in STANDBY mode, for any large transactions the restore
process also does the work of writing the rollback to the standby file (TUF), so we
might be spending time initializing the whole virtual log.
Solution 1
You need to reduce the number of VLFs. You can do this by running DBCC
SHRINKFILE to reduce the LDF file(s) to a small size, thereby reducing the number of
VLFs. Note: you need to do this on the primary database.
After the shrink is complete, verify that the VLF count has dropped by running DBCC
LOGINFO again. A good range would be somewhere between 500 and 1000. Then resize the
log file to the desired size using a single growth operation. You can do this by
adjusting the Initial Size setting; also pay attention to the Auto-Growth setting for
the LDF file, since too small a value can lead to too many VLFs.
ALTER DATABASE DBNAME MODIFY FILE (NAME='ldf_logical_name', SIZE=<target>MB)
Also remember that you still have to apply the pending log backups before you
get to the one that contains the shrink operation. Once you reach it, you can
measure the restore time to see if the changes above had a positive impact.
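Put together, the Solution 1 sequence on the primary might look like this (the logical log file name and target size are placeholders):

```sql
USE DBNAME;
GO
-- 1. Shrink the log file to collapse unused VLFs
DBCC SHRINKFILE (N'ldf_logical_name', 1);
GO
-- 2. Verify the VLF count dropped (one row per VLF)
DBCC LOGINFO;
GO
-- 3. Regrow the log to its working size in a single operation
ALTER DATABASE DBNAME MODIFY FILE (NAME = N'ldf_logical_name', SIZE = 8192MB);
GO
```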
Solution 2
For problem 2, where the size of the VLFs is causing havoc with STANDBY mode,
you will have to truncate the transaction log. This means that log shipping has to be
broken.
You can truncate the transaction log on the source database by setting the recovery model to
SIMPLE (using the ALTER DATABASE command). On SQL Server 2005 or earlier versions, you
can instead use BACKUP LOG DBNAME WITH TRUNCATE_ONLY.
Then modify the log file Auto-Grow setting to an appropriate value. Pay attention
to the value you set here: not so high that transactions
have to wait while the file is being grown, and not so low that it creates too many
VLFs. Take a full database backup immediately and use it to re-set up the log
shipping.
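The whole Solution 2 sequence can be sketched as follows for SQL Server 2008 and later (the database name, logical file name, sizes, and backup path are all placeholders):

```sql
-- Sketch: truncate the log via SIMPLE recovery, resize it cleanly,
-- then take a new full backup to re-initialize log shipping.
ALTER DATABASE DBNAME SET RECOVERY SIMPLE;
GO
DBCC SHRINKFILE (N'ldf_logical_name', 1);     -- collapse the truncated log
GO
ALTER DATABASE DBNAME SET RECOVERY FULL;
GO
ALTER DATABASE DBNAME MODIFY FILE
    (NAME = N'ldf_logical_name', SIZE = 8192MB, FILEGROWTH = 512MB);
GO
BACKUP DATABASE DBNAME TO DISK = N'C:\Backup\DBNAME_full.bak';  -- re-seed the secondary from this
GO
```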
Tip: You can use the DBCC SQLPERF(LOGSPACE) command to find out what percent of
your log file is used.