SCOM - Understanding and Modifying Data Warehouse Retention and Grooming
SCOM - Understanding and Modifying Data Warehouse Retention and Grooming
25
You will likely find that the default retention in the OpsMgr data warehouse will need to be adjusted for your environment. I often find customers are reluctant to adjust these because they don't know what they want to keep. So they assume the defaults are good and they just keep EVERYTHING. This is a bad idea. A data warehouse will often be one of the largest databases supported by a company. Large databases cost money. They cost money to support. They are more difficult to maintain. They cost more to backup in time, tape capacity, network impact, etc. They take longer to restore in the case of a disaster. The larger they get, the more they cost in hardware (disk space) to support them. The larger they get, can impact how long reports take to complete. For these reasons you should give STRONG consideration to reducing your warehouse retention to your reporting REQUIREMENTS. If you don't have any MAKE SOME! Originally when the product released you had to directly edit SQL tables to adjust this. Then a command line tool was released to adjust these values making the process easier and safer. This post is just going to be a walk through of this process to better understand using this tool and what each dataset actually means. Here is the link to the command line tool: http://blogs.technet.com/momteam/archive/2008/05/14/data-warehouse-data-retention-policy-dwdatarp-exe.aspx
Different data types are kept in the Data Warehouse in unique Datasets. Each dataset represents a different data type (events, alerts, performance, etc..) and the aggregation type (raw, hourly, daily) Not every customer will have exactly the same data sets. This is because some management packs will add their own dataset if that MP has something very unique that it will collect that does not fit into the default buckets that already exist.
So first we need to understand the different datasets available and what they mean. All the datasets for an environment are kept in the Dataset table in the Warehouse database. select * from dataset order by DataSetDefaultName This will show us the available datasets. Common datasets are: Alert data set Client Monitoring data set Event data set Microsoft.Windows.Client.Vista.Dataset.ClientPerf Microsoft.Windows.Client.Vista.Dataset.DiskFailure Microsoft.Windows.Client.Vista.Dataset.Memory Microsoft.Windows.Client.Vista.Dataset.ShellPerf Performance data set State data set Alert, Event, Performance, and State are the most common ones we look at.
However in the warehouse we also keep different aggregations of some of the datasets where it makes sense. The most common datasets that we will aggregate are Performance data, State data, and Client Monitoring data (AEM). The reason we have raw, hourly, and daily aggregations is to be able to keep data for longer periods of time but still have very good performance on running reports. In MOM 2005 we used to stick ALL the raw performance data into a single table in the Warehouse. After a year of data was reached this meant the perf table would grow to a HUGE size and running multiple queries against this table would be impossible to complete with acceptable performance. It also meant grooming this table would take forever, and would be prone to timeouts and failures. In OpsMgr now we aggregate this data into hourly and daily aggregations. These aggregations allow us to summarize the performance, or state data, into MUCH smaller table sizes. This means we can keep data for a MUCH longer period of time than ever before. We also optimized this by splitting these into multiple tables. When a table reaches a pre-determined size, or number of records we will start a new table for inserting. This allows grooming to
be incredibly efficient because now we can simply drop the old tables when all of the data in a table is older than the grooming retention setting.
Ok thats the background on aggregations. To see this information we will need to look at the StandardDatasetAggregation table. select * from StandardDatasetAggregation That table contains all the datasets, and their aggregation settings. To help make more sense of this - I will join the dataset and the StandardDatasetAggregation tables in a single query to only show you what you need to look at: SELECT DataSetDefaultName, AggregationTypeId, MaxDataAgeDays FROM StandardDatasetAggregation sda INNER JOIN dataset ds on ds.datasetid = sda.datasetid ORDER BY DataSetDefaultName This query will give us the common dataset name, the aggregation type, and the current maximum retention setting. For the AggregationTypeId: 0 = Raw 20 = Hourly 30 = Daily Here is my output:
DataSetDefaultName AggregationTypeId MaxDataAgeDays Alert data set 0 400 Client Monitoring data set 0 30 Client Monitoring data set 30 400 Event data set 0 100 Microsoft.Windows.Client.Vista.Dataset.ClientPerf 0 7 Microsoft.Windows.Client.Vista.Dataset.ClientPerf 30 91
Microsoft.Windows.Client.Vista.Dataset.DiskFailure Microsoft.Windows.Client.Vista.Dataset.DiskFailure Microsoft.Windows.Client.Vista.Dataset.Memory Microsoft.Windows.Client.Vista.Dataset.Memory Microsoft.Windows.Client.Vista.Dataset.ShellPerf Microsoft.Windows.Client.Vista.Dataset.ShellPerf Performance data set Performance data set Performance data set State data set State data set State data set
0 30 0 30 0 30 0 20 30 0 20 30
You will probably notice that we only keep 10 days of RAW Performance by default. Generally you don't want to mess with this. This is simply to keep a short amount of raw data to build our hourly and daily aggregations from. All built in performance reports in SCOM run from Hourly, or Daily aggregations by default.
Now we are cooking! Fortunately there is a command line tool published that will help make changes to these retention periods, and provide more information about how much data we have currently. This tool is called DWDATARP.EXE. It is available for download HERE. This gives us a nice way to view the current settings. Download this to your tools machine, your RMS, or directly on your warehouse machine. Run it from a command line. Run just the tool with no parameters to get help: C:\>dwdatarp.exe To get our current settings run the tool with ONLY the s (server\instance) and d (database) parameters. This will output the current settings. However it does not format well to the screen so output it to a TXT file and open it: C:\>dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW > c:\dwoutput.txt Here is my output (I removed some of the vista/client garbage for brevity)
Dataset name Alert data set Client Monitoring data set Client Monitoring data set Configuration dataset Event data set Performance data set Performance data set Performance data set State data set State data set State data set
Aggregation name Max Age Current Size, Kb Raw data 400 18,560 ( 1%) Raw data Daily aggregations Raw data Raw data Raw data Hourly aggregations Daily aggregations Raw data Hourly aggregations Daily aggregations 30 400 400 100 10 400 400 180 400 400 0 ( 0%) 16 ( 0%) 153,016 ( 4%) 1,348,168 ( 37%) 467,552 ( 13%) 1,265,160 ( 35%) 61,176 ( 2%) 13,024 ( 0%) 305,120 ( 8%) 20,112 ( 1%)
Right off the bat I can see how little data that daily performance actually consumes. I can see how much data that only 10 days of RAW perf data consume. I also see a surprising amount of event data consuming space in the database. Typically you will see that perf hourly will consume the most space in a warehouse.
I can know what is using up most of the space in my warehouse. I can know the Dataset name, and Aggregation name to input to the command line tool to adjust it!
First thing I will need to gather my Reporting service level agreement from management. This is my requirement for how long I need to keep data for reports. I also need to know what kind of reports they want to be able to run for this period. From this discussion with management we determined:
We require detailed performance reports for 90 days (hourly aggregations) We require less detailed performance reports (daily aggregations) for 1 year for trending and capacity planning. We want to keep a record of all ALERTS for 6 months. We don't use any event reports, so we can reduce this retention from 100 days to 30 days. We don't use AEM (Client Monitoring Dataset) so we will leave this unchanged. We don't report on state changes much (if any) so we will set all of these to 90 days.
Now I will use the DWDATARP.EXE tool to adjust these values based on my company reporting SLA: dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Hourly aggregations" m 90 dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Daily aggregations" -m 365 dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Alert data set" -a "Raw data" -m 180 dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Event data set" -a "Raw Data" -m 30 dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Raw data" -m 90 dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Hourly aggregations" -m 90 dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Daily aggregations" -m 90
Now my table reflects my reporting SLA and my actual space needed in the warehouse will be much reduced in the long term:
Dataset name Aggregation name Alert data set Raw data Client Monitoring data set Raw data Client Monitoring data set Daily aggregations Configuration dataset Raw data Event data set Raw data Performance data set Raw data Performance data set Hourly aggregations Performance data set Daily aggregations State data set Raw data State data set Hourly aggregations State data set Daily aggregations
Current Size, Kb 18,560 ( 1%) 0 ( 0%) 16 ( 0%) 152,944 ( 4%) 1,348,552 ( 37%) 468,960 ( 13%) 1,265,992 ( 35%) 61,176 ( 2%) 13,024 ( 0%) 305,120 ( 8%) 20,112 ( 1%)
Here are some general rules of thumb (might be different if your environment is unique)
Only keep the maximum retention of data in the warehouse per your reporting requirements. Do not modify the performance RAW dataset. Most performance reports are run against Perf Hourly data for detail performance throughout the day. For reports that span long periods of time (weeks/months) you should generally use Daily aggregation. Daily aggregations should generally be kept for the same retention as hourly or longer. Hourly datasets use up much more space than daily aggregations. Most people don't use events in reports and these can often be groomed much sooner than the default of 100 days. Most people don't do a lot of state reporting beyond 30 days, and these can be groomed much sooner as well if desired. Don't modify a setting if you don't use it. There is no need. The Configuration dataset generally should not be modified. This keeps data about objects to report on, in the warehouse. It should be set to at LEAST the longest of any perf, alert, event, or state datasets that you use for reporting.