12: Data Management: Practical Primer Using Epidata. The Epidata Documentation Project. Available
Introduction
Data management includes all aspects of data planning, handling, analysis, documentation, and storage, and takes place during all stages of a study. The objective is to create a reliable database containing high-quality data. Data management is too often a neglected part of study design [1] and includes:
  Planning the data needs of the study
  Data collection
  Data entry
  Data validation and checking
  Data manipulation
  Data file backup
  Data documentation
Each of these processes requires thought and time; each requires painstaking attention to detail. The main elements of data management are database files. Database files contain text, numbers, images, and other data in machine-readable form. Such files should be viewed as part of a database management system (DBMS), which allows for a broad range of data functions, including data entry, checking, updating, documentation, and analysis.
Spreadsheets are to be avoided for all but the smallest data systems, since they are unreliable and easily corrupted (e.g., it is easy to type over cells, lose track of records, duplicate data, mis-enter data, and so on). Commercially available database programs are expensive, tend to be large and slow, and often lack controlled data-entry facilities. Specialty data entry programs are ideal for data entry and storage. We use EpiData for this purpose because it is fast, reliable, allows for controlled data entry, and is open-source. Use of EpiData is introduced in the accompanying lab.
[1] Bennett, S., Myatt, M., Jolley, D., & Radalowicz, A. (2001). Data Management for Surveys and Trials: A Practical Primer Using EpiData. The EpiData Documentation Project. Available: www.epidata.dk/downloads/dmepidata.pdf
[2] This is distinct from measurement errors, which are differences between the true state of affairs and what appears on the data collection form.
  Transpositions (e.g., 19 becomes 91 during data entry)
  Copying errors (e.g., 0 (zero) becomes O (the letter) during data entry)
  Coding errors (e.g., a racial group gets improperly coded because of changes in the coding scheme)
  Routing errors (e.g., the interviewer asks the wrong question or asks questions in the wrong order)
  Consistency errors (contradictory responses, such as the reporting of a hysterectomy after the respondent has identified himself as a male)
  Range errors (responses outside of the range of plausible answers, such as a reported age of 290)
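Range and consistency errors are the two types in this list that lend themselves most directly to automated detection. The following is a minimal Python sketch of such checks, not a description of how EpiData implements them. The field names (age, sex, hysterectomy), the coding of sex as "M"/"F", and the 0 to 110 age limits are assumptions made for illustration.

```python
# Hypothetical records: field names, codes, and limits are illustrative assumptions.
def check_record(record, min_age=0, max_age=110):
    """Return a list of problems found in one data-entry record."""
    problems = []
    age = record.get("age")
    # Range check: flag missing ages and ages outside the plausible interval.
    if age is None or not (min_age <= age <= max_age):
        problems.append(f"range error: age={age!r}")
    # Consistency check: a male respondent should not report a hysterectomy.
    if record.get("sex") == "M" and record.get("hysterectomy") is True:
        problems.append("consistency error: hysterectomy reported for male respondent")
    return problems


records = [
    {"id": 1, "age": 34, "sex": "F", "hysterectomy": True},
    {"id": 2, "age": 290, "sex": "F", "hysterectomy": False},
    {"id": 3, "age": 52, "sex": "M", "hysterectomy": True},
]

# Map each record ID to the list of problems found for that record.
flagged = {r["id"]: check_record(r) for r in records}
for record_id, problems in flagged.items():
    for problem in problems:
        print(record_id, problem)
```

Checks of this kind catch impossible or contradictory values only. A transposition such as 19 entered as 91 stays within the plausible range and passes unnoticed.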
To prevent such errors, you must identify the stage at which they occur and correct the problem. Methods to prevent data entry errors include:
  Manual checks during data collection (e.g., checks for completeness and handwriting legibility)
  Range and consistency checking during data entry (e.g., preventing impossible results, such as ages greater than 110)
  Double entry and validation following data entry
  Screening for outliers during data analysis
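Screening for outliers can be done in many ways, and the text does not prescribe one. As one possible sketch, the Python code below flags values that lie more than 1.5 interquartile ranges beyond the quartiles (Tukey's fences). The choice of this rule, the multiplier k=1.5, and the sample ages are assumptions made for illustration.

```python
import statistics


def screen_outliers(values, k=1.5):
    """Return values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical ages as entered; 290 is the implausible value used in the text.
ages = [23, 25, 29, 31, 34, 36, 41, 45, 290]
outliers = screen_outliers(ages)
print(outliers)
```

A flagged value is a candidate for review against the original data collection form. It is not automatically an error, since genuine extreme values do occur.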
EpiData provides a range and consistency checking program and allows for double entry and validation, as demonstrated in the accompanying lab.
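The idea behind double entry and validation is that the same forms are keyed in twice, independently, and the two files are then compared so that every disagreement can be resolved against the paper form. The Python sketch below illustrates that comparison in general terms. It is not EpiData's implementation, and the record IDs, field names, and data structures are hypothetical.

```python
def compare_entries(first, second):
    """Compare two independent entries of the same forms, keyed by record ID.

    Returns a list of (record_id, field, detail) tuples, one per discrepancy.
    """
    discrepancies = []
    for record_id in sorted(set(first) | set(second)):
        a, b = first.get(record_id), second.get(record_id)
        # A record present in only one file was skipped during one entry pass.
        if a is None or b is None:
            missing_from = "first" if a is None else "second"
            discrepancies.append((record_id, None, f"missing from {missing_from} entry"))
            continue
        # Compare the two versions of the record field by field.
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, (a.get(field), b.get(field))))
    return discrepancies


# Hypothetical data: record 101 has a transposed age in the second pass,
# and record 103 was never entered in the second pass.
first_pass = {
    101: {"age": 19, "sex": "F"},
    102: {"age": 40, "sex": "M"},
    103: {"age": 55, "sex": "F"},
}
second_pass = {
    101: {"age": 91, "sex": "F"},
    102: {"age": 40, "sex": "M"},
}

discrepancies = compare_entries(first_pass, second_pass)
for item in discrepancies:
    print(item)
```

Unlike a range check, this comparison catches the transposition of 19 into 91, because the two entries disagree even though both values are plausible.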