September 2000 PAMS Data Analysis Workbook: Data Validation 1
Introduction
The Importance of Data Validation
Data Validation Definitions
Data Validation Procedures and Results
VOC Definitions and PAMS Target Species
Available Tools and Methods
Example VOC Data Validation Tools
Examples of Problems Encountered in Databases (and Validation Actions)
VOC Data Validation Tasks
Tips and Tricks for VOC QC and Data Analysis
VOC Data Validation Examples
Data Access
Summary
References
Introduction
This section provides example procedures for validating data collected at PAMS sites, including routine air quality measurements (e.g., ozone, NOx), routine meteorological measurements, and VOC measurements. Validation of upper-air meteorological measurements collected as part of the PAMS network is discussed in a separate section of the workbook. Several comprehensive documents exist regarding data quality control and quality assurance of PAMS VOC data (U.S. EPA, 1998), routine air quality and meteorological data, and upper-air meteorological data. The intended audience of this section of the workbook is the data analyst who wishes to explore the rich PAMS database. The principal topic of this section is VOC data validation, with a focus on Level 1-3 validation (i.e., internal, temporal, and spatial consistency).
In this example, data were averaged over the month of July 1995 at a PAMS site for 0700, 1200, and 1800 ST. The 0700 data stand out as significantly different from the other two time periods. One might expect high concentrations in the morning due to traffic and low mixing heights. However, the measurements were made at a very rural site. Upon further investigation, several high-concentration calibration runs were found to have remained in the data set at 0700 ST. Once these data were removed, the three time periods were much more similar, as expected for the site. Without careful screening of data, the wrong conclusions may be drawn.
[Figure: two panels, "All Data" and "Outliers Removed". y-axis: Concentration, ppbC; x-axis: hydrocarbon species abbreviations (acety through nundc); series: 0700, 1200, and 1800 ST.]
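The effect described in this example can be illustrated with a minimal sketch. The record layout, the `is_calibration` flag, the function name `hourly_means`, and all concentration values below are hypothetical and chosen only to show how a single calibration run left in the data can dominate an hourly average.

```python
from collections import defaultdict

def hourly_means(records, skip_flagged=False):
    """Average concentration (ppbC) by hour of day.

    records: iterable of (hour, conc_ppbc, is_calibration) tuples.
    is_calibration is a hypothetical flag set during data screening.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, conc, is_cal in records:
        if skip_flagged and is_cal:
            continue  # leave flagged calibration runs out of the average
        sums[hour] += conc
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}

# Made-up values for one species; one calibration run remains at 0700.
records = [
    (7, 1.0, False), (7, 2.0, False), (7, 36.0, True),
    (12, 1.0, False), (12, 2.0, False),
    (18, 2.0, False), (18, 1.0, False),
]
all_data = hourly_means(records)
screened = hourly_means(records, skip_flagged=True)
```

With these made-up numbers, the 0700 mean drops from 13.0 to 1.5 ppbC once the flagged run is excluded, which brings it in line with the 1200 and 1800 means.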
Example description of codes used to nullify data in AIRS. Knowledge of these codes helps the data analyst understand why data are missing from a database.
9974 9978 9979 9980 9984 9985 9986 9990 9991 9992 9993 9995 9996
U.S. EPA, 1989
Definitions of TNMOC, NMHC, and VOC can vary widely because they are operational (i.e., based on the analytical techniques used).
AIRS No.  Abbreviation  Compound
43232     nhept         n-Heptane
43261     mcyhx         Methylcyclohexane
43252     234tmp        2,3,4-Trimethylpentane
45202     tolu          Toluene
43960     2mhep         2-Methylheptane
43253     3mhep         3-Methylheptane
43233     noct          n-Octane
45203     ebenz         Ethylbenzene
45109     m/pxy         m/p-Xylene
45220     styr          Styrene
45204     oxyl          o-Xylene
43235     nnon          n-Nonane
45210     ispbz         Isopropylbenzene
45209     npbz          n-Propylbenzene
45212     metol         m-Ethyltoluene
45213     petol         p-Ethyltoluene
45207     135tmb        1,3,5-Trimethylbenzene
45211     oetol         o-Ethyltoluene
45208     124tmb        1,2,4-Trimethylbenzene
43238     ndec          n-Decane
45225     123tmb        1,2,3-Trimethylbenzene
45218     mdeben        m-Diethylbenzene
45219     pdeben        p-Diethylbenzene
43954     nundc         n-Undecane
43502     form          Formaldehyde
43551     acet          Acetone (optional)
43503     aceta         Acetaldehyde
43000     PAMHC         Sum of PAMS target compounds
43102     TNMOC         Total NMOC
Abbreviations from the PAMS manual (U.S. EPA, 1998).
Calculates species group sums, including paraffins, olefins, aromatics, unidentified, carbonyls, and PAMS target species.
Available free at:
ftp://ftp.sonomatech.com/public/vocdat/
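As a rough illustration of what a species group sum involves, the following sketch adds up one sample's concentrations by group. It is not the tool's actual implementation: the partial `SPECIES_GROUP` map, the function name `group_sums`, and the convention that "unidentified" equals total NMOC minus the identified sum are all assumptions made for this example.

```python
# Hypothetical, partial group map using abbreviations from this section.
SPECIES_GROUP = {
    "ethan": "paraffins", "propa": "paraffins", "nbuta": "paraffins",
    "ethyl": "olefins", "prpyl": "olefins",
    "benz": "aromatics", "tolu": "aromatics",
}

def group_sums(sample, tnmoc):
    """Sum one sample's concentrations (ppbC) by species group.

    sample: dict of abbreviation -> concentration; None means missing.
    tnmoc: reported total NMOC (ppbC) for the same sample.
    Species absent from SPECIES_GROUP are not counted as identified.
    'unidentified' is taken as TNMOC minus the identified sum (an assumption).
    """
    sums = {"paraffins": 0.0, "olefins": 0.0, "aromatics": 0.0}
    for species, conc in sample.items():
        group = SPECIES_GROUP.get(species)
        if group is not None and conc is not None:
            sums[group] += conc
    identified = sum(sums.values())  # computed before adding the summary keys
    sums["identified"] = identified
    sums["unidentified"] = tnmoc - identified
    return sums

# Made-up sample values in ppbC.
sample = {"ethan": 4.0, "propa": 3.0, "nbuta": None,
          "ethyl": 2.0, "benz": 1.5, "tolu": 2.5}
result = group_sums(sample, tnmoc=20.0)
```

A negative "unidentified" value from a calculation like this would itself be worth investigating, since the identified species should not sum to more than the reported total.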
Example of identification of suspect data values from the Northeast (NESCAUM, 1993). The ozone concentration of 139 ppb reported at Cape Elizabeth on May 26, 1992 at 4:00 a.m. appears erroneous when viewed in a spatial and temporal context.
Example of identification of suspect data values from the Northeast (NESCAUM, 1993). Two values are anomalously high when inspected both temporally and spatially.
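A temporal screen for isolated high values such as these can be sketched as follows. The function name `flag_spikes`, the `min_jump` threshold, and the surrounding hourly values are hypothetical; only the 139 ppb value comes from the Cape Elizabeth example. A spatial check against nearby sites would be a separate step and is not shown.

```python
def flag_spikes(values, min_jump):
    """Return indices of isolated values that exceed both neighbors by min_jump.

    values: hourly concentrations in time order; None marks a missing hour.
    """
    flagged = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue  # cannot judge a value without both neighbors
        if cur - prev >= min_jump and cur - nxt >= min_jump:
            flagged.append(i)
    return flagged

# Made-up early-morning ozone series (ppb) containing one 139 ppb value.
ozone_ppb = [31, 29, 28, 139, 27, 30, 35]
suspect = flag_spikes(ozone_ppb, min_jump=50)
```

Flagged values are candidates for inspection, not automatic invalidation; a real ozone episode can also produce sharp changes.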
Example of identification of suspect data values from the Northeast (NESCAUM, 1993). Reported isolated low values were probably the result of misplaced decimal points.
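One way to screen for a misplaced decimal point is to test whether ten times an isolated low value would fit its neighbors. The sketch below is a heuristic under stated assumptions: the function name `flag_decimal_shift`, the window size, the tolerance, and the data are all hypothetical.

```python
from statistics import median

def flag_decimal_shift(values, window=2, tolerance=0.3):
    """Flag isolated low values that look like a misplaced decimal point.

    A value is flagged when ten times the value lies within `tolerance`
    (as a fraction) of the median of its neighbors, but the value itself
    does not. None marks a missing hour.
    """
    flagged = []
    for i, cur in enumerate(values):
        nearby = values[max(0, i - window):i] + values[i + 1:i + 1 + window]
        neighbors = [v for v in nearby if v is not None]
        if cur is None or not neighbors:
            continue
        ref = median(neighbors)
        if ref <= 0:
            continue
        plain_ok = abs(cur - ref) <= tolerance * ref
        shifted_ok = abs(10 * cur - ref) <= tolerance * ref
        if shifted_ok and not plain_ok:
            flagged.append(i)
    return flagged

# Made-up hourly ozone values (ppb); 5.6 may have been 56.
ozone_ppb = [52, 55, 5.6, 58, 60, 57]
suspect = flag_decimal_shift(ozone_ppb)
```

The check only suggests a likely cause. Whether a flagged value is corrected or invalidated is a decision for the reporting agency.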
(1 of 3)
Proceed from the big picture to the details. For example, proceed from inspecting total VOC to species groups to individual species. Inspect every species, even to confirm that a species normally absent met that expectation. Know the site topography, prevalent meteorology, and major emissions sources nearby.
(2 of 3)
Total NMOC vs. species groups (i.e., aromatics, paraffins)
Total NMOC vs. all individual species
Benzene vs. acetylene and toluene (these species typically correlate, with some toluene outliers where toluene is greater than benzene)
Benzene vs. cyclohexane (look for split in the scatter plot indicating misidentification)
Benzene vs. ethane (low or missing ethane concentrations when benzene is abundant may indicate cold trap problems)
Species that elute close together, e.g., 2,3-dimethylbutane, 2-methylpentane, and 3-methylpentane
Isomers (e.g., o-, m-, and p-xylene)
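The benzene vs. ethane comparison above can also be run as a simple screen alongside the scatter plot. In this sketch the function name `flag_cold_trap_suspects`, the thresholds, and the sample values are hypothetical; the thresholds are placeholders, not recommended values.

```python
def flag_cold_trap_suspects(samples, benzene_min=1.0, ethane_max=0.5):
    """Return indices of samples with abundant benzene but low or missing ethane.

    samples: list of dicts with 'benz' and 'ethan' in ppbC (None = missing).
    benzene_min and ethane_max are illustrative placeholders.
    """
    flagged = []
    for i, s in enumerate(samples):
        benz = s.get("benz")
        ethan = s.get("ethan")
        if benz is None or benz < benzene_min:
            continue  # benzene not abundant, so the check does not apply
        if ethan is None or ethan <= ethane_max:
            flagged.append(i)
    return flagged

# Made-up samples (ppbC).
samples = [
    {"benz": 2.4, "ethan": 6.1},
    {"benz": 3.0, "ethan": None},
    {"benz": 0.2, "ethan": 0.1},
    {"benz": 2.8, "ethan": 0.3},
]
suspect = flag_cold_trap_suspects(samples)
```

The third sample is not flagged because both species are low, which is consistent with a clean air mass and not with an analytical problem.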
(3 of 3)
These checks should be used as a starting point for data validation and not as hard and fast rules; there are always exceptions!
Main et al., 1998
Time series plots of species groups (top) and individual species (bottom) at a PAMS site during early June 1996. Example of possible contamination of either the shelter air or the analytical equipment. (Level 1, AIRS data) Data during this time period were invalidated.
[Figure panel annotations: "Odd" and "okay"]
Example of an analytical system change between two months that affected the relationship between three isomers. The p- and o-ethyltoluene concentrations were typically high together when m-ethyltoluene concentrations were reported as 0 ppbC (possible misidentification?) during July. In August, this occurrence was not noted (Main et al., 1999). These data were reinvestigated by the reporting agency.
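The isomer pattern described here (m-ethyltoluene reported as 0 ppbC while the p- and o- isomers are high) can be screened for directly. The sketch below uses the abbreviations `metol`, `petol`, and `oetol` from the target species list; the function name `flag_isomer_dropout`, the `high` threshold, and the sample values are hypothetical.

```python
def flag_isomer_dropout(samples, high=1.0):
    """Return indices where m-ethyltoluene is 0 ppbC while the
    o- and p-ethyltoluene concentrations are both at least `high`.

    samples: list of dicts with 'metol', 'petol', and 'oetol' in ppbC.
    """
    return [
        i for i, s in enumerate(samples)
        if s["metol"] == 0 and s["oetol"] >= high and s["petol"] >= high
    ]

# Made-up samples (ppbC).
samples = [
    {"metol": 0.8, "petol": 0.6, "oetol": 0.5},
    {"metol": 0.0, "petol": 1.9, "oetol": 1.7},
    {"metol": 0.0, "petol": 0.1, "oetol": 0.0},
]
suspect = flag_isomer_dropout(samples)
```

Comparing the number of flagged samples month by month is one way to spot a change in the analytical system like the one between July and August.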
Example of finding species misidentification in a data set using a time series plot (top) and scatter plot (bottom). In this example, 2-methylheptane and 3-methylheptane peaks were misidentified as toluene beginning on June 19. Data were collected at a PAMS site during June 1995. (Level 0, AIRS) Typical scatter plots may show well-defined edges but will have data values filling in the area between the edges. These data were corrected by the reporting agency.
[Figure panel label: Calibration gas]
Examples of typical (top) and calibration (bottom) fingerprints. Hydrocarbon species are listed in order of elution from the gas chromatograph and in these plots are represented by numbers. Typical fingerprints show low concentration of many of the hydrocarbons and higher concentrations of others. The calibration gases typically contain roughly the same concentration of each hydrocarbon (e.g., about 35 ppbC) with a few species missing from the mixture. (Level 1, AIRS data) Calibration data need to be identified as such and not used in any analyses of the ambient data.
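Because a calibration fingerprint has roughly the same concentration for most species, a flat, elevated fingerprint can be used as a rough screen. The following is a heuristic sketch only: the function name `looks_like_calibration`, the `min_mean` and `max_cv` thresholds, and the fingerprints are hypothetical, and a flagged sample still needs to be confirmed against site records.

```python
from statistics import mean, pstdev

def looks_like_calibration(fingerprint, min_mean=20.0, max_cv=0.25):
    """Heuristic: most species at roughly the same, fairly high concentration.

    fingerprint: species concentrations (ppbC) for one sample, in elution order.
    Zero or None entries are treated as species absent from the mixture.
    """
    present = [c for c in fingerprint if c is not None and c > 0]
    if len(present) < 2:
        return False
    avg = mean(present)
    cv = pstdev(present) / avg  # coefficient of variation across species
    return avg >= min_mean and cv <= max_cv

# Made-up fingerprints (ppbC).
ambient = [4.2, 6.0, 0.3, 0.8, 2.5, 0.1, 9.4, 0.6]
calibration = [35.1, 34.2, 0.0, 36.0, 35.5, 33.9, 0.0, 35.2]

ambient_flag = looks_like_calibration(ambient)
calibration_flag = looks_like_calibration(calibration)
```

Ambient fingerprints vary widely from species to species, so their coefficient of variation is large and their mean is usually well below the calibration level.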
Example of possible calibration carryover in data collected at a PAMS site during July 1995. Note relatively high concentrations of n-undecane, for example, occurring after an hour with missing data and the tailing off of concentrations over the next few hours. (Level 0, AIRS) Typically, only a few species are affected by carryover, and these species should be invalidated in the affected samples.
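The carryover pattern described here (a high value right after an hour with missing data, tailing off over the next few hours) can be sketched as a per-species screen. The function name `flag_carryover`, the `min_start` and `run_length` parameters, and the n-undecane values are hypothetical.

```python
def flag_carryover(values, min_start, run_length=3):
    """Return start indices of suspected carryover for one species.

    values: hourly concentrations (ppbC) in time order; None = missing hour.
    An index is flagged when it follows a missing hour, its value is at
    least min_start, and the run_length values starting there strictly decrease.
    """
    flagged = []
    for i in range(1, len(values) - run_length + 1):
        if values[i - 1] is not None:
            continue  # only look at the first sample after a gap
        run = values[i:i + run_length]
        if any(v is None for v in run) or run[0] < min_start:
            continue
        if all(a > b for a, b in zip(run, run[1:])):
            flagged.append(i)
    return flagged

# Made-up n-undecane series (ppbC) with one missing hour.
nundc_ppbc = [0.3, 0.2, None, 6.0, 3.1, 1.4, 0.4, 0.3]
suspect = flag_carryover(nundc_ppbc, min_start=2.0)
```

Running the screen species by species matches the guidance that only the affected species, not the whole sample, should be invalidated.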
Data Access (1 of 2)
Official data sources:
AIRS Data via public web at http://www.epa.gov/airsdata
AIRS Air Quality System (AQS) for registered users; register with EPA/NCC (703-487-4630)
Data Access (2 of 2)
Secondary data sources:
Meteorological parameters from the National Weather Service (NWS): http://www.nws.noaa.gov
Meteorological parameters from PAMS/AIRS AQS; register with EPA/NCC (703-487-4630)
Collocated or nearby SO2, nitrogen oxides, CO, VOC from AIRS AQS
Private meteorological agencies (e.g., forestry service, agricultural monitoring, industrial facilities)
Summary
Data validation is vital because serious errors in data analysis and modeling results can be caused by erroneous individual data values. Once initial data validation steps have been taken, data validation continues throughout the data interpretation process.
Overall data validation guidelines include:
Proceed from the big picture to the details.
Inspect every species, even to confirm that a species normally absent met that expectation.
Know the site topography, prevalent meteorology, and major emissions sources nearby.
This workbook section provides a discussion of data validation levels, example validation checks, available data validation tools, and suggested steps to take in the data validation process.
References (1 of 2)
LADCO (1995) Lake Michigan Ozone Study. 1994 data analysis report, version 1.1. Report prepared by Lake Michigan Air Directors Consortium, Des Plaines, IL, May.
Main H.H., Roberts P.T., and Chinkin L.R. (1997) PAMS data analysis workshop: illustrating the use of PAMS data to support ozone control programs. Prepared for U.S. Environmental Protection Agency, Research Triangle Park, NC, presented at California Air Resources Board and EPA Region IX, Sacramento, CA, STI-997100-1719-WD7, May.
Main H.H., Roberts P.T., and Prouty J.P. (1998) VOCDat user's guide. Report prepared for the U.S. Environmental Protection Agency, Research Triangle Park, NC by Sonoma Technology, Inc., Petaluma, CA, STI-997160-1763DFR2, July.
Main H.H., Roberts P.T., and Hurwitt S.B. (1999) Validation of PAMS VOC data in the Mid-Atlantic region. Report prepared for MARAMA, Baltimore, MD by Sonoma Technology, Inc., Petaluma, CA, STI-998481-1835-FR, February.
NESCAUM (1993) 1992 regional ozone concentrations in the northeastern United States. Report prepared by the Ambient Monitoring and Assessment Committee and the Data Management Committee of the Northeast States for Coordinated Air Use Management, Boston, MA.
NESCAUM (1995) Preview of the 1994 ozone precursor concentrations in the northeastern U.S. 5/1/94 draft report prepared by the Ambient Monitoring and Assessment Committee of the Northeast States for Coordinated Air Use Management, Boston, MA.
PAMSgrams available at http://www.epa.gov/ttn/amtic/pamsgram.html
Roberts P.T., Dye T.S., Korc M.E., and Main H.H. (1994) Air quality data analysis for the 1991 Lake Michigan Ozone Study. Final report prepared for Lake Michigan Air Directors Consortium, Des Plaines, IL by Sonoma Technology, Inc., Santa Rosa, CA, STI-92022-1410-FR, September.
References (2 of 2)
Stoeckenius T.E., Ligocki M.P., Shepard S.B., and Iwamiya R.K. (1994a) Analysis of PAMS data: application to summer 1993 Houston and Baton Rouge data. Draft report prepared by Systems Applications International, San Rafael, CA, SYSAPP94-94/115d, November.
Stoeckenius T.E., Ligocki M.P., Cohen B.L., Rosenbaum A.S., and Douglas S.G. (1994b) Recommendations for analysis of PAMS data. Final report prepared by Systems Applications International, San Rafael, CA, SYSAPP94-94/011r1, February.
Systems Applications International, Sonoma Technology Inc., Earth Tech, and Alpine Geophysics (1995) Gulf of Mexico Air Quality Study. Vol 1: Summary of data analysis and modeling. Final report prepared for U.S. Department of the Interior, Minerals Management Service, Gulf of Mexico OCS Region, New Orleans, LA, OCS Study, MMS 95-0038.
U.S. Environmental Protection Agency (1980) Validation of Air Monitoring Data. EPA-600/4-80-030.
U.S. Environmental Protection Agency (1984) Quality assurance handbook for air pollution measurement systems, Volume II: ambient air specific methods (interim edition), EPA/600/R-94/0386, April.
U.S. Environmental Protection Agency (1989) AIRS user's guide volume III: AIRS codes and values. Office of Air Quality Planning & Standards Technical Support Division, Research Triangle Park, NC, June.
U.S. Environmental Protection Agency (1994) Photochemical assessment monitoring stations implementation manual. Office of Air and Radiation, Office of Air Quality Planning and Standards, Technical Support Division, Research Triangle Park, NC, EPA/454/B-93-051, March.
U.S. Environmental Protection Agency (1998) Technical assistance document for sampling and analysis of ozone precursors. National Exposure Research Laboratory, Research Triangle Park, NC, EPA/600-R-98/161, September.