Secondary Data Analysis File Note
In social science research, you may often hear the terms primary data and secondary data. Primary
data is data that was collected by the researcher, or team of researchers, for the specific purpose or
analysis under consideration. Here, a research team conceives of and develops a research project,
collects data designed to address specific questions, and performs their own analyses of the data
they collected. The people involved in the data analysis are therefore familiar with the research design
and data collection process.
Secondary data analysis, however, is the use of data that was collected by someone else for some
other purpose. In this case, the researcher poses questions that are addressed through the analysis
of a data set that they were not involved in collecting. The data was not collected to answer the
researcher’s specific research questions and was instead collected for another purpose. The same
data set can therefore be a primary data set to one researcher and a secondary data set to a different
researcher.
A great many secondary data resources and data sets are available for sociological research,
many of which are public and easily accessible.
A second major advantage of using secondary data is the breadth of data available. The federal
government conducts numerous studies on a large, national scale, gathering data that individual
researchers would have a difficult time collecting on their own. Many of these data sets are also
longitudinal, meaning that the same data has been collected from the same population over several
different time periods. This allows researchers to look at trends and changes in phenomena over time.
A third major advantage of using secondary data is that the data collection process is often guided by
expertise and professionalism that may not be available to individual researchers or small research
projects. For example, data collection for many federal data sets is often performed by staff members
who specialize in certain tasks and have many years of experience in that particular area and with
that particular survey. Many smaller research projects do not have that level of expertise available,
because their data is usually collected by students working in part-time or temporary jobs.
A related problem is that the variables may have been defined or categorized differently than the
researcher would have chosen. For example, age may have been collected in categories rather than
as a continuous variable, or race may be defined as “White” and “Other” instead of including a
separate category for every major racial group.
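The age example can be made concrete with a minimal sketch. The bracket boundaries and labels below are hypothetical, chosen only for illustration. The sketch shows that exact ages can always be collapsed into categories, whereas a data set that stores only the category gives the analyst no way to recover the exact age.

```python
from bisect import bisect_right

# Hypothetical age brackets, chosen only for illustration.
BRACKET_EDGES = [18, 30, 45, 65]  # lower bound of each bracket after the first
BRACKET_LABELS = ["under 18", "18-29", "30-44", "45-64", "65+"]


def age_to_bracket(age):
    """Map an exact age in years to its bracket label."""
    return BRACKET_LABELS[bisect_right(BRACKET_EDGES, age)]


# A primary researcher who recorded exact ages can always collapse them.
exact_ages = [17, 18, 29, 30, 52, 71]
bracketed = [age_to_bracket(a) for a in exact_ages]

# The reverse is not possible: ages 18 and 29 both become "18-29", so a
# secondary analyst who receives only the label cannot tell them apart.
print(bracketed)
```

An analyst who needs different cut points, for example a bracket starting at age 21, cannot build it from the labels alone.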
Another major disadvantage of using secondary data is that the researcher/analyst often does not know
exactly how the data collection process was carried out or how well it was done. The researcher is
therefore not usually privy to information about how seriously the data is affected by problems such
as low response rates or respondent misunderstanding of specific survey questions. Sometimes this
information is readily available, as is the case with many federal data sets. However, many other
secondary data sets are not accompanied by this type of information, and the analyst must learn to
read between the lines and consider what problems might have been encountered in the data
collection process.
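One modest way to begin reading between the lines is to check how often each variable is missing, since a question that many respondents skipped may have been confusing or sensitive. The sketch below is only an illustration under stated assumptions: the records and variable names are made up, and missing answers are assumed to be stored as None.

```python
# Hypothetical records; None marks a question the respondent did not answer.
records = [
    {"age": 34, "income": 52000, "education": "BA"},
    {"age": 51, "income": None, "education": "HS"},
    {"age": None, "income": None, "education": "MA"},
    {"age": 27, "income": 31000, "education": None},
]


def missing_share(rows, variable):
    """Return the fraction of rows in which `variable` is missing (None)."""
    missing = sum(1 for row in rows if row.get(variable) is None)
    return missing / len(rows)


# Share of missing answers for every variable in the data set.
shares = {name: missing_share(records, name) for name in records[0]}

for name, share in shares.items():
    print(f"{name}: {share:.0%} missing")
```

A high share does not say why answers are missing. It only flags variables whose collection the analyst may want to look into before relying on them.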