CrunchIt! 2.0 Quick Start Guide: Texas A&M University
Alan Dabney
Texas A&M University
©2011 by W.H. Freeman and Company
ISBN: 1-4292-6208-7
www.whfreeman.com
What is CrunchIt!?
CrunchIt! Version 2 is a web-based statistical calculator and data analysis tool. It can
perform most of the statistical functions described in your W. H. Freeman and Company
text.
Test Drive
Let’s take a quick test drive to get a glimpse into what CrunchIt! can do. First of all—
here’s what CrunchIt! looks like:
It looks a lot like spreadsheets you’re familiar with, right? On the left are links to
different datasets. In the middle are links to the main CrunchIt! tools: ‘Data’ for loading
and playing with data, ‘Statistics’ for computing summary statistics and specific
statistical operations, ‘Graphics’ for making pictures, and so on. Let’s load some data. If
we click on ‘Chapter 4’, we’ll see a bunch of datasets. Here’s the data from ‘Exercise
33’:
These data come from a study of which colors best attract beetles to sticky boards; we’d
like to use the color that tends to bring in (and catch) the most bugs. Let’s save a copy
of these data to a comma-delimited (CSV) text file by clicking ‘Data’ then ‘Save Data
Table’:
This brings up a window that prompts us to select a location for the file:
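We’ll just use the default and save to the Desktop. Now, let’s make a set of box plots,
with one box per color. Go to ‘Graphics’, then ‘Box Plot’:
Side-by-side box plots need one column per box, but our counts are all in a single column,
with the colors in another. So first we unstack the data, using the unstack command in the
‘Data’ menu (described in the ‘Detailed Specifications’ section below):
Now we’ve got separate columns of counts for each color, so let’s make the box
plots:
Shift-click to highlight the new columns.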
Let’s look at some summary numbers by selecting ‘Statistics’ > ‘Summary Statistics’,
then ‘Column’, to indicate that we want to compute summaries on one column:
The following window will appear:
I’ve selected the ‘Yellow’ column; by default, all the possible summaries are selected,
but we can turn some of them off if we’d like. Here’s what we get:
If you recall, we could tell from the box plot that the median was between 45 and 50; it
turns out that it’s actually 46.5. We also saw that, while the overall variability was low
(the standard deviation came out as about 6.8), there were a couple of extreme values.
These are shown by the minimum and maximum values of 38 and 59, respectively. We
can recompute the data summaries without these two extreme values by using the
‘Filter data’ feature:
In the ‘Select’ box, we specify the ‘Yellow’ column. We then need to filter the data so
that only numbers greater than 38 and less than 59 are included in the calculations. To
do this, we need two filtering criteria. We create an extra filter criterion by clicking the
‘+’ sign:
The median didn’t change (because we removed one large and one small value, we
knew it wouldn’t), the mean dropped just a bit, and the standard deviation dropped a lot
(s.d.’s are pretty sensitive to extreme values).
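The reasoning about the median can be checked with a short Python sketch. The six counts below are made-up values chosen for illustration (they are not the Exercise 33 data); the filter keeps values strictly greater than the minimum and strictly less than the maximum, mirroring the two filtering criteria above.

```python
import statistics

# Hypothetical counts for a single color (illustrative only, NOT the textbook data)
counts = [20, 25, 27, 28, 30, 45]

# Two filter criteria: greater than the minimum AND less than the maximum.
# Note: if the minimum or maximum occurred more than once, every copy would be dropped.
lo, hi = min(counts), max(counts)
trimmed = [x for x in counts if lo < x < hi]

for label, data in [("all values", counts), ("filtered", trimmed)]:
    print(label,
          "median:", statistics.median(data),
          "mean:", round(statistics.mean(data), 2),
          "sd:", round(statistics.stdev(data), 2))
```

With six values, the median is the average of the 3rd and 4th smallest; after one value is dropped from each end, those same two values are still in the middle, so the median stays at 27.5. In this made-up example the standard deviation falls sharply because the 45 sits far from the other counts.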
Going back to the box plots, it looks pretty clear that there are some substantial
differences in beetle counts by wood color. Even so, let’s do a formal test for whether
the mean counts with each color are the same. We’ll use one-way ANOVA:
This brings up the following window:
Notice that we’ve got the option of specifying our data by ‘Columns’ or ‘Factored’. The
‘Columns’ choice means the counts for each color appear in different columns, whereas
‘Factored’ puts all the counts in one column, with a separate “factor” column that
indicates color. Since we’ve unstacked our data, we can go either route. Here we’ll use
‘Factored’:
Our “response” is the ‘Beetles’ column containing the counts, whereas the “treatment”
is the ‘Color’ column containing the color for each count. Here’s what we get (note that I
didn’t do any filtering):
The last column contains the p-value, which is extremely small, indicating strong
evidence against the null hypothesis that the mean counts are the same for all board
colors (as we expected, based on the box plots).
There’s a lot more to explore in CrunchIt! (see the ‘Detailed Specifications’ section
below), but this should give you a feel for the main features. Enjoy!
Detailed Specifications
Data are entered manually or loaded from publisher-created files into a spreadsheet;
you can then create statistical graphs from the data or perform computations on them.
Results of computations and graphs typically appear in a new results window on your
screen. To close the results window, click the button at its top right. You will typically
use four main menu options, shown on the left of the spreadsheet:
• Data
• Statistics
• Graphics
• Preferences
The Data Table is the spreadsheet into which you enter or load data. Each column
(labeled as Col 1, Col 2, etc.) should contain the values for a single variable in your
dataset. For this reason, we may refer to columns and variables interchangeably. It is
recommended that you name each column by entering text just below the numbered
column header, to the right of the #.
Often, each row of the data table will correspond to a single observation, subject, or run
of an experiment.
To open and use a data set from your text, double-click on the chapter in the list at left,
then double-click on the appropriate example, exercise, or table number.
Entering Data
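Each cell may hold a numeric value or text. Numeric values must be entered in plain
decimal format: mathematical expressions are not evaluated, and commas are not allowed.
Valid numeric entries:
• 3
• −2.7
• 0.00000956
• 200000
Not valid as numeric entries:
• five (text)
• 2.4e4 (not decimal format)
• 1,250 (contains a comma)
• 1+1 (expressions are not evaluated)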
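The entry rules above can be summarized as a pattern check. This Python sketch is a hypothetical approximation of those rules, not CrunchIt!’s own input parser; it accepts an optional ASCII hyphen-minus (the −2.7 above is typeset with a true minus sign), digits, and an optional decimal part, and rejects everything else.

```python
import re

# Hypothetical approximation of the stated rules (NOT CrunchIt!'s parser):
# optional "-", then digits with an optional decimal part (or a leading "."),
# with no commas, no exponents, and no arithmetic operators.
DECIMAL = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")

def is_valid_numeric_entry(text: str) -> bool:
    """Return True if `text` is a plain decimal number."""
    return DECIMAL.fullmatch(text.strip()) is not None

for entry in ["3", "-2.7", "0.00000956", "200000", "five", "2.4e4", "1,250", "1+1"]:
    print(f"{entry!r:>14} -> {is_valid_numeric_entry(entry)}")
```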
Data
Clicking this menu option opens a menu that allows you to clear the data table, save it,
or load data from a file.
Data: Clear Data Table
This menu item erases all entries and labels from the Data Table.
This menu item allows you to load a data file that is stored on your computer. CrunchIt!
will recognize data files in comma-separated value (.csv) and data (.dat) formats. The
first line of the data file is read as the variable (column) names, so be sure your files are
laid out this way.
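If you prepare data outside CrunchIt!, put the column names on the first line. As a minimal sketch, Python’s standard `csv` module writes a file in this layout; the column names, values, and file name below are placeholders.

```python
import csv
import io

# Placeholder column names and values; the first row holds the variable names
rows = [
    ("Group", "Value"),
    ("A", 1.5),
    ("B", 2.0),
]

# An in-memory buffer keeps the example self-contained; to write a real file, use
# open("mydata.csv", "w", newline="") in place of io.StringIO()
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```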
This menu item allows you to load a data file that is stored on the World Wide Web.
Link to Data
This option reopens CrunchIt! with the current dataset loaded in its original form; any
changes you have made will be erased. This is convenient if you have modified a dataset
and wish to discard the changes.
This menu item allows you to store the current data table to your computer in comma-
separated value (.csv) format. You may then load this file into CrunchIt! at a later time,
or import it into a spreadsheet program.
This menu item allows you to save the results returned by one of CrunchIt!’s functions.
Graphics are normally saved as JPEG (.jpg) files. Results of computations and tests are
saved in a comma-separated format that can be opened with a spreadsheet program like
Excel; if you add the “.doc” extension to the file name, the results can be opened with
Word instead.
CrunchIt! has a built-in random-number generator that allows you to fill cells in the Data
Table with Uniform random data. This is mainly useful for trying out statistical tests and
graphs on made-up data.
This option allows you to add columns (variables) to a dataset, remove columns, and add
or delete rows. Added columns appear at the right side of the spreadsheet and added
rows at the bottom; by default, 50 data rows are available. NOTE: The minimum
spreadsheet size is 11 columns and 50 rows; you cannot delete rows or columns below
these dimensions.
This function takes two or more columns and stacks them on top of one another. The
result has two columns: one holding the data values, and one (labeled “ind”) holding the
name of the original column each value came from, which identifies its “group.”
This function takes a single column of values, together with a separate column of group
"labels", then adds a separate column for the values of each label. For example, ex04-
33 in BPS has beetle counts in one column and wood color in another; there are six
counts for each of the four colors. The unstack command would create four new
columns of beetle counts, one for each of the four colors.
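To make the two layouts concrete, here is a small Python sketch of both operations on plain lists. The function names `stack` and `unstack` and the sample values are hypothetical; only the “ind” label follows the description above.

```python
from collections import defaultdict

def stack(columns):
    """Stack {column name: values} into a values list plus an "ind" list of group names."""
    values, ind = [], []
    for name, col in columns.items():
        values.extend(col)
        ind.extend([name] * len(col))
    return values, ind

def unstack(values, labels):
    """Split one column of values into one column per distinct label (first-seen order)."""
    columns = defaultdict(list)
    for value, label in zip(values, labels):
        columns[label].append(value)
    return dict(columns)

# Hypothetical two-group data in the 'one column per group' layout
wide = {"A": [1, 2, 3], "B": [4, 5, 6]}
values, ind = stack(wide)
print(values)                        # [1, 2, 3, 4, 5, 6]
print(ind)                           # ['A', 'A', 'A', 'B', 'B', 'B']
print(unstack(values, ind) == wide)  # True
```

Stacking and unstacking are inverses, which is why the same data can be analyzed either by ‘Columns’ or ‘Factored’.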
Statistics
In general, statistical calculations and tests are performed by clicking on the desired
item in the Statistics menu, choosing one or more columns/variables, and entering
appropriate parameters. The results of the test will appear in a separate results window.
To copy the results into another program (like Word), right-click in the results window
and click Select All; right-click again and select Copy. You can then paste the contents
of the results window into your document. To print the results directly, choose Print from
the right-click menu.
The dialog for each menu item has a check box labeled "Insert results into main grid."
Activating this option causes the results of the statistical calculation to be entered into
the data grid itself, rather than appearing in the results pane at the right.
Some statistical tests can be performed only on numeric columns/variables. These are
columns containing only numbers, and no text.
Alternative Hypotheses
Many of the statistical functions are hypothesis tests and consequently require you to
choose the direction of the alternative hypothesis: two-sided, less, or greater.
The selected value determines the alternative hypothesis. For instance, in the case of a
one-sample Z test, the null hypothesis could be that the underlying distribution has
mean 0. If ‘Two-sided’ were selected, the alternative hypothesis would be that the true
mean is any value other than 0; if ‘less’ were chosen, the alternative hypothesis would be
that the true mean is less than 0. CrunchIt! also reports a confidence interval for the
parameter that corresponds to the selected alternative hypothesis. NOTE: Your text
mostly discusses two-sided confidence intervals (the parameter lies between values a
and b) at a confidence level you specify in the input box. One-sided intervals place all of
the “error” (α) in a single tail of the distribution; they can be interpreted as “the
parameter is at most b” or “the parameter is at least a.”
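The difference between two-sided and one-sided intervals comes down to where α goes. This Python sketch builds a Z interval for a mean with a known population standard deviation (the setting of the one-sample Z test); the function name, the `side` labels, and the numbers are illustrative assumptions, not CrunchIt!’s internals.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95, side="two-sided"):
    """Z confidence interval for a mean with known population sd `sigma`.

    side="two-sided": alpha is split between both tails (a, b).
    side="less":      all of alpha in the upper tail, "the mean is at most b".
    side="greater":   all of alpha in the lower tail, "the mean is at least a".
    """
    se = sigma / n ** 0.5
    alpha = 1 - conf
    std_normal = NormalDist()
    if side == "two-sided":
        margin = std_normal.inv_cdf(1 - alpha / 2) * se
        return (xbar - margin, xbar + margin)
    margin = std_normal.inv_cdf(1 - alpha) * se
    if side == "less":
        return (float("-inf"), xbar + margin)
    return (xbar - margin, float("inf"))

# Hypothetical summary: sample mean 10, known sd 2, n = 25
print(z_interval(10, 2, 25))                  # two-sided 95% interval
print(z_interval(10, 2, 25, side="greater"))  # one-sided lower bound
```

Because the one-sided version puts all of α in one tail, its single bound (about 9.34 here) is closer to the sample mean than the matching two-sided bound (about 9.22).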
The Summary Statistics function calculates a variety of standard statistics for any
numeric column. By default, all of the available statistics are selected; you can turn off
any you don’t need.
This function allows you to calculate the correlation matrix (direction and strength of
linear relationships) for any number of numeric columns in your data grid. You should
always plot the data to ensure the relationships are linear and not curved in some
manner.
This function allows you to calculate the covariance matrix for any number of numeric
columns in your data grid. The covariance matrix has the variances of the variables ($s^2$)
on the diagonal and the product of the correlation and the two standard deviations
($r s_x s_y$) in the off-diagonal entries. (This concept is usually not covered in an
Introductory Statistics course.)
Statistics: Tables: Frequency
This function generates a frequency table for any column of data. The resulting table will
include the number of times each unique value appears, as well as the corresponding
percentage.
Depending on the selected option, the table may be ordered in one of three ways:
A cutoff value may also be specified using the slider; any value whose relative
frequency in the column is below the cutoff is grouped into an "Other" category. The
cutoff is entered as a proportion: for example, a cutoff of 0.1 groups together, as "Other,"
all values that make up less than ten percent of the column.
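A frequency table with an “Other” cutoff can be sketched in a few lines of Python. Here the cutoff is a proportion (0.1 means ten percent), rows are ordered by decreasing count, and the column of labels is hypothetical; CrunchIt!’s other ordering options are not reproduced.

```python
from collections import Counter

def frequency_table(column, cutoff=0.0):
    """Return {value: (count, percent)}, ordered by decreasing count.

    Values whose relative frequency is below `cutoff` (a proportion, e.g. 0.1)
    are pooled into a final "Other" row.
    """
    n = len(column)
    kept, other = {}, 0
    for value, count in Counter(column).most_common():
        if count / n < cutoff:
            other += count
        else:
            kept[value] = count
    if other:
        kept["Other"] = other
    return {value: (count, 100 * count / n) for value, count in kept.items()}

# Hypothetical column of 20 labels
col = ["Red"] * 10 + ["Green"] * 8 + ["Blue"] + ["Gray"]
for value, (count, pct) in frequency_table(col, cutoff=0.1).items():
    print(f"{value:<6} {count:>3} {pct:5.1f}%")
```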
Given any pair of categorical columns, this function will generate a contingency (two-
way) summary table for those columns, along with the results of a Chi-squared test of
the independence of the selected columns.
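The Chi-squared statistic compares each observed cell count with the count expected under independence, (row total × column total) / n. A minimal Python sketch, assuming the plain Pearson statistic with no continuity correction (CrunchIt!’s exact settings are not stated here), with hypothetical data; the p-value is the upper-tail probability of a Chi-squared distribution with the returned degrees of freedom.

```python
from collections import Counter

def chi_squared_independence(col_a, col_b):
    """Pearson chi-squared statistic and df for the two-way table of two categorical columns."""
    n = len(col_a)
    observed = Counter(zip(col_a, col_b))        # cell counts; missing cells count as 0
    row_totals, col_totals = Counter(col_a), Counter(col_b)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical pair of categorical columns (20 rows)
group = ["M"] * 10 + ["F"] * 10
answer = ["Yes"] * 8 + ["No"] * 2 + ["Yes"] * 2 + ["No"] * 8
print(chi_squared_independence(group, answer))
```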
Statistics: Z Statistics
The Z Statistics functions perform one- and two-sample tests based on the standard
normal distribution and a “known” (or assumed) population standard deviation. These
tests may only be applied to numeric columns. For the one-sample test, a single variable
and the standard deviation of the population from which it is drawn are both required.
The null hypothesis will be that the mean of the population from which the variable is
drawn is equal to a hypothesized value.
For the two-sample test, two numeric variables are required, along with the standard
deviation for the population from which each is drawn. The null hypothesis will be that
the means of the two populations differ by a hypothesized value.
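Both Z tests reduce to a standardized difference referred to the standard normal distribution. This Python sketch shows the textbook formulas; the function names, the `alternative` labels, and the data are hypothetical, and CrunchIt!’s implementation may differ in detail.

```python
from statistics import NormalDist, fmean

def z_p_value(z, alternative="two-sided"):
    """P-value for a standard normal test statistic."""
    cdf = NormalDist().cdf
    if alternative == "less":
        return cdf(z)
    if alternative == "greater":
        return 1 - cdf(z)
    return 2 * (1 - cdf(abs(z)))

def one_sample_z(data, sigma, mu0=0.0, alternative="two-sided"):
    """H0: population mean == mu0, with known population sd `sigma`. Returns (z, p)."""
    z = (fmean(data) - mu0) / (sigma / len(data) ** 0.5)
    return z, z_p_value(z, alternative)

def two_sample_z(x, y, sigma_x, sigma_y, diff0=0.0, alternative="two-sided"):
    """H0: mean_x - mean_y == diff0, with known population sds. Returns (z, p)."""
    se = (sigma_x ** 2 / len(x) + sigma_y ** 2 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y) - diff0) / se
    return z, z_p_value(z, alternative)

# Hypothetical data: sample mean 11, known sd 2, H0: mean = 10
print(one_sample_z([11, 9, 12, 10, 13], sigma=2, mu0=10))
```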
Statistics: Proportions
The Proportions functions will perform one- and two-sample tests on both raw data and
summarized data.
For the one-sample test, select a column, a success criterion, and a null proportion. A
success criterion is one of the values contained in the column. For instance, the column
might be labeled "Color," with each cell containing either "Red," "Green," or "Blue." Your
success criterion might be "Blue," and your null proportion might be 0.33. In this case,
the null hypothesis is that the proportion of "Blue" values is 0.33 in the underlying
population.
With the "One Sample with Summary" option, a one-sample proportion test may also be
calculated based on a summary of the data; instead of selecting a variable and providing
a success criterion, you simply supply the number of successes and total number of
observations.
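For reference, the usual large-sample Z test for one proportion can be sketched in Python as below. This is the textbook normal-approximation version; whether CrunchIt! uses it or an exact binomial calculation is not stated here, so treat the sketch as an assumption. The hypothetical helper `successes_from_column` mirrors the raw-data option by counting the cells that match the success criterion.

```python
from statistics import NormalDist

def one_prop_z(successes, n, p0, alternative="two-sided"):
    """Large-sample z test of H0: population proportion == p0. Returns (z, p-value)."""
    p_hat = successes / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5   # standard error uses p0 under H0
    cdf = NormalDist().cdf
    if alternative == "less":
        return z, cdf(z)
    if alternative == "greater":
        return z, 1 - cdf(z)
    return z, 2 * (1 - cdf(abs(z)))

def successes_from_column(column, criterion):
    """Raw-data option: count cells equal to the success criterion; returns (successes, n)."""
    return sum(1 for value in column if value == criterion), len(column)

# Summary-style input: hypothetical 40 successes out of 100, null proportion 0.33
print(one_prop_z(40, 100, 0.33))
print(successes_from_column(["Blue", "Red", "Blue", "Green"], "Blue"))   # (2, 4)
```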
Statistics: T Statistics
CrunchIt! will perform one-sample, two-sample, and paired T-tests on numeric columns.
For the one-sample T-test, the null hypothesis is that the distribution has the specified
mean. For the two-sample and paired T-tests, the null hypothesis is that the means of
the distributions differ by the specified amount.
The two-sample T-test requires you to indicate whether to use the pooled variance to
estimate the variance of the difference (typically this box is left unchecked).
The paired T-test requires that both columns have the same number of entries.
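The T statistics can be sketched in Python as follows. The unpooled version uses the Welch standard error with Welch-Satterthwaite degrees of freedom, a common choice in software; whether CrunchIt! uses exactly this degrees-of-freedom formula is an assumption. Python’s standard library has no t distribution, so the sketch returns the statistic and degrees of freedom only.

```python
from statistics import fmean, variance

def one_sample_t(data, mu0=0.0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(data)
    return (fmean(data) - mu0) / (variance(data) / n) ** 0.5, n - 1

def paired_t(x, y, diff0=0.0):
    """Paired t test: a one-sample t test on the row-by-row differences x - y."""
    if len(x) != len(y):
        raise ValueError("paired columns must have the same number of entries")
    return one_sample_t([a - b for a, b in zip(x, y)], diff0)

def two_sample_t(x, y, diff0=0.0, pooled=False):
    """t statistic and df for H0: mean(x) - mean(y) == diff0."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)
    diff = fmean(x) - fmean(y) - diff0
    if pooled:
        # Pooled variance: assumes both populations share one variance
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        return diff / (sp2 * (1 / n1 + 1 / n2)) ** 0.5, n1 + n2 - 2
    # Unpooled (Welch) standard error with Welch-Satterthwaite degrees of freedom
    a, b = v1 / n1, v2 / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / (a + b) ** 0.5, df

# Hypothetical data
print(two_sample_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
print(paired_t([5, 7, 9], [4, 5, 9]))
```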
Given two numeric columns, this function tests the hypothesis that the ratio of variances
of the underlying populations is a specified value. Note: use of this test assumes Normal
populations and can give misleading results for any departures from Normality.
Statistics: Regression
CrunchIt! can perform both linear and logistic regression with multiple independent
variables. On the left side of the dialog select one or more columns containing
explanatory variables (also called "independent variables" or "treatment variables"). On
the right side of the dialog select a single column containing a response variable (also
called a "dependent variable").
For linear regression, all the columns must be numeric. For logistic regression, the
explanatory variables must be numeric, but the response variable may contain text.
Also, for logistic regression the value of the response variable that corresponds to
success must be specified.
Statistics: ANOVA
CrunchIt! will perform one- and two-way ANOVA. One-way ANOVA will accept data in
separate columns for each “treatment” (Columns) or with all data in a single column with
a separate column to indicate the treatment (group). Two-way ANOVA requires all data
in a single column with two group indicator variables and a balanced design—that is,
each treatment combination has the same number of observations.
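The one-way ANOVA F statistic compares the variation between group means with the variation within groups. This Python sketch takes data in the ‘Factored’ layout (one response column plus one group column); data in the ‘Columns’ layout can be stacked first. The function name and data are hypothetical, and the p-value (the upper tail of the F distribution with the returned degrees of freedom) is not computed.

```python
from collections import defaultdict
from statistics import fmean

def one_way_anova(response, treatment):
    """One-way ANOVA for 'Factored' data. Returns (F, df_between, df_within)."""
    groups = defaultdict(list)
    for y, g in zip(response, treatment):
        groups[g].append(y)
    grand_mean = fmean(response)
    means = {g: fmean(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    df_between, df_within = len(groups) - 1, len(response) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data in the 'Factored' layout
response = [1, 2, 3, 4, 5, 6, 7, 8, 9]
treatment = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
print(one_way_anova(response, treatment))
```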
Given a numeric column, the Sign Test One-Sample function performs a sign test to test
the null hypothesis that the underlying median is the specified value.
The Sign Test Two-Sample function for paired data takes two numeric columns of the
same length and tests the null hypothesis that the median of the differences in each row
is the specified value.
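The sign test only counts how many values fall above the hypothesized median, so under the null hypothesis that count follows a Binomial(n, 1/2) distribution. A minimal Python sketch of the exact test, with hypothetical data; dropping values equal to the hypothesized median is a common convention assumed here. For the two-sample (paired) version, pass the row-by-row differences.

```python
from math import comb

def sign_test(data, median0=0.0, alternative="two-sided"):
    """Exact sign test of H0: population median == median0.

    Returns (number above median0, number of non-tied values, p-value).
    """
    diffs = [x - median0 for x in data if x != median0]   # ties with median0 are dropped
    n = len(diffs)
    n_plus = sum(d > 0 for d in diffs)

    def cdf(k):
        """P(X <= k) for X ~ Binomial(n, 1/2)."""
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    if alternative == "greater":
        p = 1 - cdf(n_plus - 1)
    elif alternative == "less":
        p = cdf(n_plus)
    else:
        p = min(1.0, 2 * min(cdf(n_plus), 1 - cdf(n_plus - 1)))
    return n_plus, n, p

# Hypothetical data; H0: median = 4
print(sign_test([3, 5, 6, 7, 8, 9, 10, 11], median0=4))
```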
This function performs a Chi-squared goodness-of-fit test of whether the data agree with
a specified discrete distribution.
This function performs a Wilcoxon signed rank test on a single numeric column of data.
The null hypothesis is that the underlying distribution has the specified median.
This function performs a Wilcoxon signed rank test on two numeric columns. The null
hypothesis is that the medians of the underlying distributions differ by the specified
amount. (If this amount is 0, the null hypothesis amounts to both distributions being the
same.)
Given a pair of numeric columns, this function performs a Mann-Whitney test (the
nonparametric equivalent to a two-sample t test). The null hypothesis is that the
locations (medians) of the underlying distributions differ by the specified amount; if the
difference is 0, this amounts to a null hypothesis that the distributions are the same.
This procedure requires the two samples to be in different columns; if the data are in a
single column with another column indicating group membership, use the Kruskal-Wallis
Factored function to perform the test.
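The Mann-Whitney statistic U counts, over all pairs of observations, how often a value from the first column (after subtracting the hypothesized shift) exceeds a value from the second. This Python sketch uses hypothetical data and a large-sample normal approximation without tie or continuity corrections; CrunchIt! may compute exact or corrected p-values instead.

```python
from statistics import NormalDist

def mann_whitney_u(x, y, shift=0.0):
    """U statistic for H0: location(x) - location(y) == shift.

    Counts pairs with (xi - shift) > yj; tied pairs count one half.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            a = xi - shift
            if a > yj:
                u += 1.0
            elif a == yj:
                u += 0.5
    return u

def mann_whitney_p_two_sided(u, n1, n2):
    """Large-sample normal approximation (no tie or continuity correction)."""
    mean = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(u - mean) / sd))

# Hypothetical samples in two separate columns
x = [12, 15, 18, 21, 24]
y = [10, 11, 13, 14, 16]
u = mann_whitney_u(x, y)
print(u, mann_whitney_p_two_sided(u, len(x), len(y)))
```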
These functions perform a Kruskal-Wallis rank sum test of the null hypothesis that the
location parameters (medians) of the underlying distributions of two or more groups are
the same.
If each column contains a group, use the Kruskal-Wallis function. If one column contains
the group labels ("factor specification variables"), and one column contains the response
variable, use the Kruskal-Wallis Factored function.
The distribution calculators built into CrunchIt! allow you to calculate the probability that
a random variable will take on certain values, given a wide variety of both continuous
and discrete distributions.
For both the Continuous Distribution Calculator and the Discrete Distribution Calculator,
first choose a specific distribution, such as normal or binomial. Then enter appropriate
parameters for the distribution. For example, the normal distribution has two
parameters: the mean and the standard deviation (sd), whereas the binomial distribution
requires the number of trials (n), and the probability of success for each trial (p).
Once the distribution and its parameters have been selected, choose a comparison and a
value of X. Click "Submit," and CrunchIt! will calculate the probability of a random
variable with the specified distribution taking on a value that satisfies the selected
comparison with respect to X.
For example, suppose you enter:
• Distribution: normal
• mean: −1
• sd: 0.5
• Comparison: less than or equal to
• X: 0
CrunchIt! would report that the probability that a value sampled from the Normal
distribution with mean −1 and standard deviation 0.5 is less than or equal to 0 is
0.977.
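The example’s answer can be cross-checked with Python’s standard library: 0 lies two standard deviations above the mean of −1, so the probability is Φ(2) ≈ 0.977. A binomial probability with made-up parameters is included to show the kind of calculation the Discrete Distribution Calculator performs; this is a sketch, not CrunchIt!’s code.

```python
from math import comb
from statistics import NormalDist

# Continuous case from the example: X ~ Normal(mean = -1, sd = 0.5), P(X <= 0)
p_normal = NormalDist(mu=-1, sigma=0.5).cdf(0)
print(round(p_normal, 3))   # 0.977

# Discrete case with hypothetical parameters: X ~ Binomial(n = 10, p = 0.5), P(X <= 3)
n, p, k = 10, 0.5, 3
p_binom = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))
print(p_binom)              # 0.171875
```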
Graphics
Just like the statistical tests, plots are generated by clicking on the desired menu item in
the Graphics menu, choosing one or more columns/variables, and entering appropriate
parameters. The resulting image will appear in a results window. For most plots, the title
and x- and y-axis labels may be specified.
To copy the graph into a document, right-click in the graph and choose Select All, then
right-click again and choose Copy. You can then paste the graph into the document.
Alternatively, you can choose Print from the right-click menu.
The two-way bar plot option creates either side-by-side or stacked bar plots. Here, you
specify the Group factor (the outer category; a bar will be formed for each value of this
variable), the Series Factor (the inner category label), and a column of counts.
Pie charts may be generated in two ways. The Get Frequencies option takes a single
column and generates a pie chart of that column's frequency table. The With Data option
should be used if you already have the table that you want to turn into a pie chart; it
takes one column of category or group labels and one column of counts.
Graphics: Histogram
The Histogram function generates a histogram from a single numeric column. Three
kinds of histograms may be generated:
This function generates a box-and-whisker plot from a numeric column. If more than one
column is specified, side-by-side box plots will be created.
This function generates a Cleveland dot plot from a single numeric column.
Given one numeric column of X-values and one numeric column of Y-values, this
function generates a scatter plot. The plot may consist of dots, lines, or both. Lines
connect the points in row order, so for time plots (with numeric time indices) choose the
line option to connect successive observations.
Given any number of numeric columns, this function produces a matrix of scatter plots.
In each plot, one column provides the X-values, and one column provides the Y-values.
Graphics: QQ Plot
This function generates a normal QQ plot of the values in a numeric column. The
reference line drawn on the plot passes through the points corresponding to the first and
third quartiles.
This function generates a parallel coordinates plot for any number of numeric columns.
It is used to graph high-dimensional data and is not typically used in Introductory
Statistics courses.
The star plot is used to display multivariate data: each “spoke” represents a variable,
and each star represents an observation. This plot is not typically used in Introductory
Statistics courses.
Preferences
This function allows you to set the number of decimal places shown in output—either full
precision or, using the slider, a specified number of places.
Miscellaneous
Filtering Data
In the interface for Statistics and Graphics functions, there is an option to ‘Filter Data’.
This is used to select for display or analysis only a subset of the data. There are three
components to the ‘Filter Data’ widget. The first is a drop-down menu for selecting the
variable for which you wish to create filtering criteria. The second is a drop-down menu
for specifying the operation to be applied to the selected variable. The third is a text box
for entering the number to use as the filtering threshold. There is also a ‘+’ button on
the right, with which you create additional filtering widgets for more complicated filtering
criteria. See the ‘Test Drive’ section at the top of this document for an example.
With functions that involve multiple columns, there will sometimes be the option of
carrying out the analysis by ‘Columns’ or ‘Factored’. The ‘Columns’ option is appropriate
when the values of the dependent variable are in separate columns for each value of the
independent variable. The ‘Factored’ option is appropriate when all values of the
dependent variable are in a single column, and the different values of the independent
variable are in their own column. See the ‘Test Drive’ section at the top of this
document for an example.
Technical Requirements
If you have or can install the recommended browser on your computer, you can run
CrunchIt!
Internet Access
Operating Systems
CrunchIt! runs on any of the three major platforms (Mac, PC, Unix). The only requirement
is a compatible browser and version of the Flash plug-in. Virtually all computers
purchased in the last five years will have this software pre-installed.
Recommended Browsers
To download the latest available browser versions for your operating system, click the
link(s) below.
You will need to disable any pop-up blocking software. For more information on disabling
different pop-up blocking software visit:
http://www.safetyontheweb.com/support/disablepb.asp
The Macromedia Flash 6.0 (or above) plug-in is also required. While this is likely to be on
your computer already, to download Flash go to:
http://get.adobe.com/flashplayer/otherversions/
If you are uncertain about whether your system can run CrunchIt!, go to:
http://courses.bfwpub.com/syscheck/
Technical Support
For Students:
http://www.bfwpub.com/newcatalog.aspx?page=support/studenttechsupport.html#topform
For Instructors:
http://www.bfwpub.com/newcatalog.aspx?page=support\instructortechsupport.html#topform