A Review of Pornography Use Research


A review of pornography use research: Methodology and results from four sources
Michael Gmeiner, Joseph Price, Michael Worley
Brigham Young University, Provo, Utah, United States

Abstract
The widespread electronic transmission of pornography allows for a variety of new data
sources to objectively measure pornography use. Recent studies have begun to use these
data to rank order US states by per capita online pornography use and to identify the
determinants of pornography use at the state level. The aim of this paper is to compare
two previous methodologies for evaluating pornography use by state, as well as to measure
online pornography use using multiple data sources. We find that state-level rankings from
Pornhub.com, Google Trends, and the New Family Structures Survey are significantly
correlated with each other. In contrast, we find that rankings based on data from a single
large paid subscription pornography website have no significant correlation with rankings
based on the other three data sources. Since so much of online pornography is accessed
for free, research based solely on paid subscription data may yield misleading conclusions.
Keywords: Pornography, internet use, data, representative

Introduction
While most researchers would agree that pornography has become more pervasive in recent
decades, the accurate measurement of the level of pornography use in the population remains
an empirical challenge for social scientists. The array of technologies used to access
pornography has changed over time, making it almost impossible to consistently measure the
same metric of pornography use. High-speed internet, which has penetrated markets gradually
over the last fifteen years, enables unprecedented affordability, anonymity, and ease of access
in pornography consumption (Cooper, 1998), contributing to the apparent general rise in
pornography use (Wright, 2011). Hertlein and Stevenson (2010) also note other features
particular to broadband internet pornography that contribute to the growth of the industry: closer
approximation to the physical world, acceptability, ambiguity, and accommodation between
one’s “real” and “ought” self.
Past approaches to pornography use measurement have relied heavily on survey data (see
Buzzell, 2005). The electronic nature of online pornography, however, increasingly makes
possible a number of alternative methods for obtaining reliable proxies of pornography use,
including those gathered from subscription or online search data. The ability to use an objective
measure based on subscription or search data is advantageous since survey-based data
generally suffers from a social desirability bias: respondents may underreport activities that
violate social norms (Fisher, 1993). In addition, subscription data does not depend on an
individual’s opinion about what constitutes pornography, a natural limitation of subjective
survey questions about pornography use.
Two recent studies have tapped into innovative sources of data about online pornography use.
Edelman (2009) uses subscription data from a single top-ten provider of paid pornographic
content to create a ranking of which states use the most online pornography and correlates
these with several state-level measures of social or religious attitudes. MacInnis and Hodson
(2014) use Google Trends search term data as a proxy for pornography use and examine the
relationship between state-level pornography use and measures of religiosity and conservatism.
They find that states with more right-leaning ideological attitudes have higher rates of
pornography-related Google searches.
This paper assesses some of the claims made in past studies about the rank order of states and
the relationship between state-level pornography use and various state-level social measures.
We also give a framework that future researchers can use to assess the representativeness of
future state-level or even county-level datasets about pornography use. Edelman (2009) was a
pioneer in accessing the subscription data of a single provider of paid pornographic content and
this use of individual consumer data from private companies will become a useful tool for
gathering data on hard-to-measure behavior. Key for the future use of this type of rich data will
be identifying the degree to which the data from a single firm can provide the same insights as a
nationally representative sample.
In this paper, we expand on the data used in these two recent studies and combine it with two
additional data sources. Since each of the four data sources we use in this paper yields a
measure of the level of pornography use, we estimate the validity of each source by comparing it
against the state-level rankings that we obtain for the other sources.

Data
Our paper draws on four data sources that include information on state-level variation in
pornography use. The first two data sources are nationally representative samples while the last
two are based on paid subscriptions or page views connected to a specific provider of
pornographic content. In each data source our measure of pornography use is based on
circumstances in which individuals seek out pornographic content rather than accidentally
viewing pornography.
Our first dataset is based on a nationally representative sample of 2,988 respondents in the New
Family Structures Survey (NFSS). The data collection was conducted by Knowledge Networks
(KN), a research firm with a record of generating high-quality data. Knowledge Networks
recruited members of its panel randomly by telephone and mail surveys; households are
provided with internet access if needed. This panel has advantages in that it is not limited to
current Internet users or computer owners, and does not accept self-selected volunteers.
The NFSS includes a question about whether the respondent intentionally viewed pornography
in the previous year. This type of question has the advantage of capturing pornography use
across whatever source the individual uses to access it. There are other nationally
representative samples such as the General Social Survey that include pornography questions.
We use the data from the NFSS because it can be easily accessed by other scholars and includes
state identifiers in its publicly available form. In contrast, state identifiers can only be obtained
in the confidential version of the General Social Survey. For the analysis in this paper, we use the
set of forty-six states from the NFSS survey for which there were at least 50 respondents.
The second data source, Google Trends, functions as a time series index of the volume of
searches entered into Google in a specific geographic area. These data have proven useful in
economic and medical endeavors such as predicting influenza outbreaks (Carneiro & Mylonakis,
2009) and forecasting short term economic indicators such as consumer confidence or
unemployment (Choi & Varian, 2012). Preis, Moat, and Stanley (2013) quantify trading behavior
using Google Trends, showing that certain terms are linked with stock value increasing or
decreasing. The adult entertainment industry can likewise be examined by using Google Trends
search data to the extent that important features of its industry can be measured quantitatively.
The most important challenge in using Google Trends data is selecting the specific terms on
which we draw data. The terms selected must be an actual indicator of pornography use for our
analysis to be useful. Ho and Watters (2004) analyzed structural trends in pornographic
websites. As part of their analysis they create a list of terms which appear frequently on
pornographic websites and which frequently fail to appear on non-pornographic websites. The
top four terms were “porn”, “xxx”, “sex”, and “f***”. Using search statistics we find that searches
for these four terms are highly correlated. In contrast, searches for the term “pornography” are
uncorrelated with searches for any of these four terms; that term is more likely to be used by people
seeking information about pornography rather than accessing actual pornographic content.
There is also a distinction between “hard” and “soft” pornography, with “soft” generally referring
to media that is sexual in nature, but does not depict penetration. The four terms previously
listed will draw data only on users seeking hard content, but we still consider this to be an
effective analysis for two reasons. First, soft porn is not considered to be pornography by many
viewers, and as a result it is pervasive even in mainstream media, including television and
movies. Second, we find that the relative searches for soft pornography terms are minimal in
comparison to searches for hard pornography terms. We compared the relative search volume of the
search terms “porn” and “nude girls” over 2005-2013. Searches for both terms were normalized
such that the maximum search volume took on the value 100, occurring for the term “porn”. In
comparison to the normalized maximum, “nude girls” never has a search volume index greater
than 6.
The data from Google Trends do not indicate the actual number of searches for a specific term
in a geographic area. Each data point is normalized by dividing the number of searches for the
term by the total number of all searches in that area. The data is therefore controlled for both
population and the differences in search volume among states. Google Trends also eliminates
repeated searches by a single individual in a short period of time to prevent a single individual
from skewing the results.
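The normalization described above can be illustrated with a minimal sketch. It is not Google's actual procedure: the function name `search_share_index`, the raw counts, and the rescaling to a maximum of 100 are assumptions made for the example, since Google Trends does not publish raw search counts.

```python
def search_share_index(term_searches, total_searches):
    """Return a 0-100 index of a term's share of all searches per area.

    Both arguments map an area name to a raw count. The counts used
    below are hypothetical and serve only to show the arithmetic.
    """
    # Share of all searches in each area that were for the term.
    shares = {area: term_searches[area] / total_searches[area]
              for area in term_searches}
    # Rescale so that the area with the largest share reads 100.
    top = max(shares.values())
    return {area: share / top * 100 for area, share in shares.items()}


# Made-up counts for two areas of very different size.
term = {"State A": 2_000, "State B": 300}
total = {"State A": 1_000_000, "State B": 100_000}
index = search_share_index(term, total)
print(index)
```

In this made-up case the smaller area receives the higher index even though it has far fewer raw searches for the term, which is the sense in which the measure controls for population and overall search volume.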
Data are available at the state-week level from Google Trends. We use data over the year July
2013-July 2014. Our observations are adjusted to a 1-100 scale. A state with the highest
normalized searches of a specific term during a one week period in our dataset has a reading of
100. Using these data on each term, we construct an index of pornography searches for each
state-week of our data as a weighted sum over the four terms. We weight “porn” and “sex” more
heavily because their relative search volumes are much greater than those of “f***” and “xxx”.
Specifically, we use the mean relative weighting of each term over the past year. We then use
this weighted search volume ranking of states by Google Trends to geographically model the
adult entertainment industry.
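One way to read the index construction is sketched below. The weekly volumes, the state-week readings, and the helper names `term_weights` and `porn_search_index` are all hypothetical; the sketch assumes that each term's weight is proportional to its mean relative search volume, which is our interpretation of the weighting rather than a documented formula.

```python
from statistics import mean


def term_weights(relative_volume):
    """Weights proportional to each term's mean relative search volume."""
    means = {term: mean(series) for term, series in relative_volume.items()}
    total = sum(means.values())
    return {term: m / total for term, m in means.items()}


def porn_search_index(state_week_readings, weights):
    """Weighted sum of the per-term readings for one state-week."""
    return sum(weights[term] * state_week_readings[term] for term in weights)


# Hypothetical weekly relative volumes of the four terms (not real data).
relative_volume = {
    "porn": [100, 96, 98],
    "sex": [80, 82, 78],
    "xxx": [20, 22, 18],
    "f***": [10, 10, 10],
}
weights = term_weights(relative_volume)

# Hypothetical 1-100 readings for one state in one week.
readings = {"porn": 70, "sex": 60, "xxx": 90, "f***": 50}
print(round(porn_search_index(readings, weights), 2))
```

Because the weights sum to one, the index stays on the same 1-100 scale as the individual term readings, and a high reading on a rarely searched term moves the index only slightly.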
One of the advantages of using data from Google Trends as opposed to website-specific
subscription data is that it includes the information about individuals searching out both free
and paid adult entertainment. Doran (2008) notes that about 80-90% of visitors to pornographic
websites only access free pornographic material, suggesting that analysis of paid adult
entertainment may obscure actual patterns of pornography consumption in general.
Our third data source records the number of subscriptions to one of the top-ten largest
providers of paid pornographic content used in a recent study by Edelman (2009). Edelman’s
analysis of this dataset was a novel contribution to the literature; previous studies of
pornography use had only examined survey data. The specific data used was the zip code
associated with all credit card subscriptions between 2006 and 2008. This particular content
provider has hundreds of sites covering a broad range of adult entertainment. Edelman (2009)
acknowledges, however, that “it is difficult to confirm rigorously that this seller is representative.”
Although the source of this subscription data is a top-10 seller of adult entertainment, the
subscriptions are very low relative to the patterns of pornography use we observe in survey data
like the NFSS, where 47% of adults report using pornography in the last year. The state with the
most subscriptions per broadband household is Utah with 5.47 for every 1,000 households with
broadband. The lowest state is Montana with 1.92 subscriptions for every 1,000 households with
broadband. These low rates suggest that the market share for individual content providers of
pornography is small, making it difficult to know whether the data from one provider can
provide an accurate cross-state comparison. As mentioned before, the vast majority of
individuals who access pornography online only access free content rather than using a paid site
such as those studied by Edelman (Doran, 2010).
Our fourth data source is page view data from Pornhub.com, which was the third largest online
host of adult entertainment in the United States at the time. We use the Pornhub data due to its
size as well as the availability of data. Pornhub made the page views per capita during the year
2013 publicly available and reported this data separately by state. The Pornhub data is similar in
nature to Edelman’s data in that it is a provider-side objective measure of pornography use.
However, the data records page views instead of subscribers; intuitively, the data would reveal
patterns of heavy per-person use as well as patterns of proliferation among the population. The
data also has the relative advantage of including both paid and unpaid use.
Assessing the representativeness of new data sources

The big data revolution is beginning to dramatically open up the types of data sources that can
be used to measure and study behaviors, such as pornography use. The subscription data used
by Edelman (2009) represents the type of large datasets that will increasingly become available
to scholars in their research. An important first step in using this type of proprietary data will be
assessing the degree to which the data from a single provider is representative of the general
population of interest. In this section, we provide a framework for assessing the representativeness
of a dataset by comparing it to the patterns observed in another data source that is known to be
nationally representative or by comparing it to a combination of other data sources that
collectively are likely to represent the true underlying pattern of behavior.
In Table 1 we list the top ten and bottom ten states for pornography use based on each of the
four sources: subscription data, Pornhub, NFSS, and Google Trends. Mississippi is one state that
ranks in the top four states in pornography use across all four datasets and Idaho consistently
ranks near the lowest rates of any states across most of the measures. In contrast, other states
such as Arkansas and Utah rank in the top ten along some measures but in the bottom ten
along other measures. These results suggest that identifying the state with the
highest rate of pornography use from a single data source is unreliable.

Table 1. Rank Order of States Based on Four Different Data Sources Controlled for Broadband Internet Access.

Rank              NFSS             Google Trends    Paid Subscription  Pornhub
                  (2012)           (2013-2014)      (2006-2008)        (2013)
1 (highest)       Nevada           Mississippi      Utah               Kansas
2                 Mississippi      Texas            Alaska             Nevada
3                 Tennessee        Arkansas         Mississippi        Illinois
4                 Kansas           Louisiana        Hawaii             Mississippi
5                 Missouri         Tennessee        Washington D.C.    Georgia
6                 Wyoming          Oregon           Oklahoma           Texas
7                 Washington D.C.  Kentucky         Arkansas           Missouri
8                 Oklahoma         Michigan         North Dakota       Oklahoma
9                 Illinois         Missouri         Louisiana          Colorado
10                Indiana          Georgia          Florida            Kentucky

42 (10th lowest)  New Hampshire    Maine            Michigan           South Carolina
43                New Jersey       New Jersey       Wyoming            Vermont
44                Virginia         Connecticut      Connecticut        Arkansas
45                New York         Maryland         Delaware           South Dakota
46                Idaho            Utah             New Jersey         West Virginia
47                New Mexico       Washington D.C.  Oregon             Wyoming
48                Colorado         Vermont          Ohio               Montana
49                Vermont          Massachusetts    Tennessee          Maine
50                Utah             New Hampshire    Idaho              Idaho
51 (lowest)       Delaware         Delaware         Montana            Utah

Notes: Pornhub data does not include Washington D.C. and hence ranks only to 50. The NFSS sample
excludes states for which there were fewer than 50 respondents. For datasets without the full set of 51 states/DC,
51 refers to the lowest rank and 42 refers to the 10th lowest rank.

In Table 2 panel A we estimate the correlation between each pair of data sources using the actual
measures of pornography use from each source rather than the ordinal rankings derived from
those measures and reported in Table 1. The paid subscription data has, by far, the weakest
correlation with the other three sources and is even negatively correlated with the NFSS survey
data. The paid subscription data has a correlation of -0.0358 with the NFSS, 0.076 with Google
Trends, and 0.0066 with Pornhub. None of these correlations is statistically significant;
the corresponding t-statistics are all less than 0.6 (which correspond to directional p-values greater
than .3). In contrast, the other three sources show notable correlations with one another. Google
Trends and Pornhub have a correlation of .487, NFSS and Google Trends have a correlation of
.655, and Pornhub and NFSS have a correlation of .551. All of these correlations are statistically
significant, with t-statistics of 3.78 between Google Trends and Pornhub, 5.68 between NFSS and
Google Trends, and 4.28 between Pornhub and NFSS. All of these correspond to
directional p-values of less than .0004.
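For readers who wish to run this kind of comparison on their own state-level measures, the following is a minimal sketch of a Pearson correlation and its textbook t-statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2). The paper does not state how its t-statistics were computed, so this formula is an assumption and need not reproduce the reported values exactly; the function names and the five data points are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Textbook t-statistic for testing r = 0 with n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical measures of pornography use for five states in two sources.
source_a = [1, 2, 3, 4, 5]
source_b = [2, 1, 4, 3, 5]
r = pearson_r(source_a, source_b)
print(r, t_statistic(r, len(source_a)))
```

The t-statistic grows with both the size of the coefficient and the number of states, which is why a coefficient near zero remains insignificant even with all fifty states in the sample.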
In panel B we report correlations using the ordinal rankings created from each data source.
Correlations among NFSS, Google Trends, and Pornhub have coefficients and significance
levels comparable to those in panel A; likewise, the correlation between Google Trends
and paid subscription is similar. The panel is notable because, when using ordinal rankings, the paid
subscription data correlate better with the Pornhub and NFSS survey data; however, the correlations
are still insignificant. The two panels allow us to draw similar conclusions, although the larger
coefficients for paid subscription data are worth noting despite the fact that they are
insignificant and notably weaker than the correlations of the other sources with each other. We
believe the correlations using the actual measures of pornography use rather than ordinal
rankings best represent the industry because they account for the actual differences in
pornography use rather than just the specific ordering of the states.
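The difference between the two panels can be made concrete with a short sketch. It assumes that the rank correlations are Pearson correlations computed on ranks (the Spearman approach), which the paper does not state explicitly; the function names and the four data points are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation(x, y):
    """Pearson correlation computed on ranks instead of raw values."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical measures: the fourth state is far above the rest in source_a.
source_a = [1.9, 2.0, 2.1, 5.5]
source_b = [10, 20, 30, 40]
print(pearson_r(source_a, source_b), rank_correlation(source_a, source_b))
```

In this example the two sources order the states identically, so the rank correlation is perfect, while the correlation on the raw values is lower because it also reflects how far apart the states are.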

Table 2. Correlation between the Four Data Sources.

                  Paid Subscription   NFSS       Google Trends
A. Continuous measures
NFSS              -0.0358
                  (0.25)
Google Trends     0.0760              0.6547
                  (0.52)              (5.68)
Pornhub           0.0066              0.5510     0.4867
                  (0.05)              (4.28)     (3.78)

B. Rank correlations
NFSS              0.2670
                  (1.838)
Google Trends     0.0821              0.6886
                  (0.577)             (6.299)
Pornhub           0.2424              0.5344     0.4490
                  (1.749)             (4.194)    (3.518)

Notes: Correlation coefficients between datasets for each metric of pornography use, controlling for broadband
internet access. T-statistics are provided in parentheses.

The significant correlations among the three sources other than the paid subscription data, despite the
different variables they measure (search volume, page views, and proportion of pornography
viewers), suggest that they are measuring a real underlying pattern of variation in pornography
use across states, one that is not correlated with the subscription data used by Edelman (2009).
Sensitivity of estimates to data source used

In order to illustrate the importance of accounting for the differences in state pornography rates
across different data sources, we replicate the results of a recent study that found that more
religious and more conservative states were more likely to search for sexual content on Google
(MacInnis & Hodson, 2014). We examine whether the conclusions of that paper apply to other
measures of pornography use using the other data sources that we have described in this
paper. The results of this replication are given in Table 3. We standardized the pornography-use,
religiosity, and conservatism measures by subtracting the mean and dividing by the standard
deviation to allow for comparisons across the different pornography use measures (this
approach is equivalent to converting each of the measures into a Z-score).
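The standardization step can be sketched in a few lines. The use of the sample standard deviation and the example values are assumptions; the paper does not say whether the sample or population formula was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert values to Z-scores: subtract the mean, divide by the SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption)
    return [(v - m) / s for v in values]


# Hypothetical state-level measure on an arbitrary scale.
z = standardize([2.0, 4.0, 6.0])
print(z)
```

After this transformation every measure has mean zero and standard deviation one, so coefficients estimated on different pornography-use measures are expressed in the same units. In a regression of one standardized variable on a single other standardized variable, the slope equals the Pearson correlation between the two.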

Table 3. Correlations between State-Level Religiosity or Conservatism and Each Metric of Pornography Use.

                      Religiosity                Conservatism
                      No controls   Controls     No controls   Controls
Google Trends         0.610***      0.223        0.479***      0.266*
                      (.163)        (.176)       (.151)        (.146)
NFSS                  0.213         -0.0782      0.215         0.0879
                      (.195)        (.310)       (.200)        (.304)
Pornhub               0.129         0.0930       -0.0732       -0.0265
                      (.153)        (.232)       (.163)        (.207)
Paid Subscriptions    0.299         0.487        0.167         0.221
                      (.192)        (.314)       (.183)        (.254)

Notes: N = 50. Controls include state population, state GDP, percentage of individuals below the poverty
line, and internet use both in-home and out-of-home. Data on conservatism and religiosity by state in 2013
is drawn from Gallup, GDP by state from the U.S. Census Bureau, and number in poverty, internet use, and
population from the Census. Measures of pornography, conservatism, and religiosity are all normalized to
have a standard deviation of one. Robust standard errors in parentheses. ***, **, and * indicate statistical
significance at the 1%, 5%, and 10% levels respectively.

In the original study, MacInnis and Hodson (2014) gave results based on Google Trends data
separately for specific search terms such as sex, porn, and XXX, similar to the terms that we are
using in our Google Trends measure. The results in the first row of Table 3 show that we also
find a statistically significant relationship between pornography searches and religiosity or
conservatism in most cases when we use the Google Trends data. However, the other rows in Table 3
show that we get a much weaker statistical relationship when using any of the other three data sources. These
results suggest that if MacInnis and Hodson (2014) had used any of the other three data sources,
they probably would have come to a different conclusion in their paper about the strength of
the relationship they were examining.
The fact that MacInnis and Hodson (2014) find a statistically significant relationship between
state-level religiosity and state-level pornography use is interesting considering that past studies
using individual level data find that individuals who regularly attend church are much less likely
to use pornography (Doran & Price, 2014; Patterson & Price, 2012; Stack, Wasserman, & Kearns,
2004). This type of pattern in which group-level relationships are opposite what is found at the
individual level has also been found in the relationship between education and religion (Glaeser
& Sacerdote, 2008) and the relationship between income and political affiliation (Glaeser &
Sacerdote, 2007).

Discussion
Each of the data sources considered above captures a different cross-sectional view of the
online pornography industry, and each has important vulnerabilities for researchers interested
in general levels of pornography use by state. NFSS survey data, for example, probably
underreports pornography consumption because of social desirability bias and subjects’ faulty
memory. Google Trends data fails to capture any pornography use that is accessed through
means other than a Google search. Pornhub and paid subscription data may be limited in their
representativeness; they measure use with respect to only a single firm in the industry.
When data from any source is used in research, results must be presented in the context of the data
that led to those results. Issues arise when individuals mistakenly interpret a given data source
as representing the entirety of the pornography industry. There are many other settings in
which similarly non-representative data may be erroneously overgeneralized. Researchers and
individuals must be aware of the external validity of their findings, while the media and readers
must be careful not to overgeneralize results.
We also recognize a limitation of our data sources in that they capture the pornography industry
at different historical moments: Google Trends (2013-2014), paid subscription (2006-2008),
Pornhub (2013), and NFSS (2012). Paid subscription data were collected approximately 6-7 years
prior to the other sources. This time difference may bias our results; however, the general trends
in the data sources as a whole are such that we believe our findings to be accurate. Major shifts
in the relative use of pornography across states from 2006-2013 would be needed for this bias
to occur, which we believe is unlikely.
When attempting to rank order individuals regarding some form of activity, multiple sources (if
available) must be viewed for the sake of contrasting results. Should the orderings be similar,
their accuracy can be more readily assumed. Should they differ, an opportunity arises to
understand more about the issue. In our particular case, the differences are likely to arise
because the sources capture different types of pornography use.
Past research on pornography use has touched on the degree to which it might affect important
areas of interest such as divorce, happiness, worker productivity and sexual violence (Bergen &
Bogle, 2000; Doran & Price, 2014; Patterson & Price, 2012; Young & Case, 2004). When such
research is being conducted, data must be from a reliable and generalizable source (or sources).
Results and findings of any such effects must be considered in light of the age, gender, and
sexual identity of individuals as well, factors which are not considered in this paper (Sevcikova
& Daneback, 2014; Stoops, 2015; Traeen & Daneback, 2013; Tripodi et al., 2015). In such research,
pornography use by state may play a role in the analysis. Given the results of this
paper, the data source for such a variable must be carefully considered in any regression, and
results must be interpreted in the context of that data source.
Conclusion
Data provided by specific companies have the potential to provide important insights into public
issues. A major challenge is determining when the data of a single company, even a very large
one, can provide insights that are representative of the entire population. Assuming relative
rates of pornography across states did not have major changes from 2006-2013, the results of
our paper suggest that in some cases the information from a single company may make for a
misleading picture of the geographic patterns of a specific behavior. This can be particularly
important for pornography use since the vast majority of individuals who access pornography
online only access free content rather than using a paid site (Doran, 2008).
The results of this paper draw on four different data sources about pornography use including
two that involve nationally representative data (Google Trends and NFSS). We find a significant
correlation between three of our data sources suggesting that they all reflect a similar
underlying pattern in pornography use across states. In contrast, paid subscription data, the one
source that has received a fair amount of media attention, actually correlates rather poorly with
the other sources. We also show that choices across data sources can affect the conclusions that
studies draw and suggest that future studies include sensitivity tests across data sources when
examining issues for which it is challenging to get an ideal measure of the specific behavior.
