
What is statistics?

Kamakshaiah Musunuru
Associate Professor-Business Analytics
Dhruva College of Management

Contents
1 Business Statistics
1.1 Descriptive Statistics
1.2 Inferential Statistics
2 Key Statistical Concepts
2.1 Population
2.2 Sample
2.3 Statistical Inference
3 Statistical Applications in Business
4 Large Real Data Sets
4.1 Data repositories
5 Statistics and Computers
6 Important Questions
6.1 Short-answer questions
1 Business Statistics
Statistics is a way to get information from data. The term statistician is used to describe
so many different kinds of occupations that it has ceased to have any meaning. It is used,
for example, to describe a person who calculates baseball statistics as well as an individual
educated in statistical principles. We call the former a statistics practitioner and the latter a
statistician. A statistics practitioner is a person who uses statistical techniques properly.
Examples of statistics practitioners include the following:
a financial analyst who develops stock portfolios based on historical rates of return;
an economist who uses statistical models to help explain and predict variables such as
inflation rate, unemployment rate, and changes in the gross domestic product; and
a market researcher who surveys consumers and converts the responses into useful information.
The term statistician refers to an individual who works with the mathematics of statistics. His
or her work involves research that develops techniques and concepts that in the future may
help the statistics practitioner. Statisticians are also statistics practitioners, frequently conducting
empirical research and consulting. For instance, your instructor is probably a statistician.

1.1 Descriptive Statistics

Descriptive statistics deals with methods of organizing, summarizing, and presenting data in a
convenient and informative way. One form of descriptive statistics uses graphical techniques that
allow statistics practitioners to present data in ways that make it easy for the reader to extract
useful information.
Another form of descriptive statistics uses numerical techniques to summarize data. One such
method that you have already used frequently calculates the average or mean. Other numerical
techniques include measures of central tendency and measures of variability.
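
As a small illustration of the numerical techniques mentioned above, the following sketch computes a few measures of central tendency and variability with Python's standard statistics module. The data values are hypothetical and chosen only for the example; they do not come from any real survey.

```python
import statistics

# Hypothetical data: soft drinks consumed last week by ten students.
drinks = [3, 5, 2, 8, 5, 4, 7, 5, 1, 6]

# Measures of central tendency
mean = statistics.mean(drinks)
median = statistics.median(drinks)
mode = statistics.mode(drinks)

# Measures of variability
data_range = max(drinks) - min(drinks)
sample_variance = statistics.variance(drinks)  # divides by n - 1
sample_std_dev = statistics.stdev(drinks)      # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={sample_variance:.2f}, std dev={sample_std_dev:.2f}")
```

The variance and standard deviation here are the sample versions (dividing by n - 1); statistics.pvariance and statistics.pstdev would be the choice if the ten values were the whole population.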

1.2 Inferential Statistics

Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data.

2 Key Statistical Concepts

Statistical inference problems involve three key concepts: the population, the sample, and the statistical inference.

2.1 Population

A population is the group of all items of interest to a statistics practitioner. It is frequently very large
and may, in fact, be infinitely large. In the language of statistics, population does not necessarily
refer to a group of people. It may, for example, refer to the population of ball bearings produced
at a large plant.
A descriptive measure of a population is called a parameter. For instance, the parameter of
interest could be the mean number of soft drinks consumed by all the students at the university.

2.2 Sample

A sample is a set of data drawn from the studied population. A descriptive measure of a sample
is called a statistic. We use statistics to make inferences about parameters. For instance, we may
compute, as the statistic, the mean number of soft drinks consumed in the last week by the 500 students
in the sample. We may then use the sample mean to infer the value of the population mean, which
is the parameter of interest in this problem.
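
The relationship between a parameter and a statistic can be sketched with a short simulation. The population below is made up (uniformly random counts between 0 and 14 drinks per week for 50,000 students) and is an assumption for illustration only; in practice the population values are not available, which is why a sample is taken.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly soft drink counts for 50,000 students.
population = [rng.randint(0, 14) for _ in range(50_000)]
parameter = statistics.mean(population)  # population mean, normally unknown

# Draw a simple random sample of 500 students without replacement.
sample = rng.sample(population, 500)
statistic = statistics.mean(sample)      # sample mean, which we can compute

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; quantifying that gap is the subject of statistical inference.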

2.3 Statistical Inference

Statistical inference is the process of making an estimate, prediction, or decision about a population based on sample data. Because populations are almost always very large, investigating
each member of the population would be impractical and expensive. It is far easier and cheaper
to take a sample from the population of interest and draw conclusions or make estimates about
the population on the basis of information provided by the sample. However, such conclusions
and estimates are not always going to be correct. For this reason, we build into the statistical
inference a measure of reliability. There are two such measures: the confidence level and the significance
level. The confidence level is the proportion of times that an estimating procedure will be correct.
For example, we can produce an estimate of the average number of soft drinks consumed by
all 50,000 students that has a confidence level of 95%. In other words, estimates based on this form
of statistical inference will be correct 95% of the time. When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the
conclusion will be wrong. For example, suppose that, as a result of the analysis, we conclude
that more than 50% of the electorate will vote for BJP, and thus that the party will win. A 5% significance
level means that samples that lead us to conclude that BJP wins the election will be wrong 5% of the time.
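
The idea that a 95% confidence level describes how often an estimating procedure is correct can be checked by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with a hypothetical mean of 7 and standard deviation of 3, and the interval uses the large-sample normal critical value 1.96. The function name coverage and its parameters are invented for this example.

```python
import random
import statistics

def coverage(trials=1000, sample_size=100, seed=1):
    """Fraction of 95% intervals for the mean that contain the true mean."""
    rng = random.Random(seed)
    true_mean, true_sd = 7.0, 3.0  # hypothetical population values
    z = 1.96                       # normal critical value for 95% confidence
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        centre = statistics.mean(sample)
        margin = z * statistics.stdev(sample) / sample_size ** 0.5
        # The procedure is "correct" when the interval covers the true mean.
        if centre - margin <= true_mean <= centre + margin:
            hits += 1
    return hits / trials

observed_coverage = coverage()
print(f"intervals containing the true mean: {observed_coverage:.1%}")
```

Across many repeated samples the observed proportion should settle near 95%, which is what the confidence level promises about the procedure rather than about any single interval.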

3 Statistical Applications in Business

Statistics is the mathematical science involving the collection, analysis and interpretation of data.
A number of specialties have evolved to apply statistical theory and methods to various disciplines.
Certain topics have "statistical" in their name but relate to manipulations of probability distributions
rather than to statistical analysis. The following list gives several applications; it
is not exhaustive.
Actuarial science is the discipline that applies mathematical and statistical methods to assess
risk in the insurance and finance industries.
Astrostatistics is the discipline that applies statistical analysis to the understanding of astronomical data.
Biostatistics is a branch of biology that studies biological phenomena and observations by
means of statistical analysis, and includes medical statistics.
Business analytics is a rapidly developing business process that applies statistical methods to
data sets (often very large) to develop new insights and understanding of business performance
opportunities.
Chemometrics is the science of relating measurements made on a chemical system or process
to the state of the system via application of mathematical or statistical methods.
Demography is the statistical study of all populations. It can be a very general science that
can be applied to any kind of dynamic population, that is, one that changes over time or
space.
Econometrics is a branch of economics that applies statistical methods to the empirical study
of economic theories and relationships.

Environmental statistics is the application of statistical methods to environmental science.
Weather, climate, air and water quality are included, as are studies of plant and animal
populations.
Epidemiology is the study of factors affecting the health and illness of populations, and serves
as the foundation and logic of interventions made in the interest of public health and preventive
medicine.
Geostatistics is a branch of geography that deals with the analysis of data from disciplines
such as petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, and geography.
Operations research (or Operational Research) is an interdisciplinary branch of applied mathematics and formal science that uses methods such as mathematical modeling, statistics, and
algorithms to arrive at optimal or near optimal solutions to complex problems.
Population ecology is a sub-field of ecology that deals with the dynamics of species populations
and how these populations interact with the environment.
Psychometrics is the theory and technique of educational and psychological measurement of
knowledge, abilities, attitudes, and personality traits.
Quality control reviews the factors involved in manufacturing and production; it can make
use of statistical sampling of product items to aid decisions in process control or in accepting
deliveries.
Quantitative psychology is the science of statistically explaining and changing mental processes and behaviors in humans.
Reliability engineering is the study of the ability of a system or component to perform its
required functions under stated conditions for a specified period of time.
Statistical finance, an area of econophysics, is an empirical attempt to shift finance from its
normative roots to a positivist framework using exemplars from statistical physics with an
emphasis on emergent or collective properties of financial markets.
Statistical mechanics is the application of probability theory, which includes mathematical
tools for dealing with large populations, to the field of mechanics, which is concerned with
the motion of particles or objects when subjected to a force.
Statistical physics is one of the fundamental theories of physics, and uses methods of probability theory in solving physical problems.
Statistical thermodynamics is the study of the microscopic behaviors of thermodynamic systems using probability theory and provides a molecular level interpretation of thermodynamic
quantities such as work, heat, free energy, and entropy.

4 Large Real Data Sets

As we studied in the first unit, organizations are literally starving for the right data, because right
decisions depend on right and meaningful data. Data is abundant everywhere; however,
finding the right data to solve business problems remains difficult in spite of the large data repositories available. The following are certain important data repositories for statisticians (the list is
not exhaustive).

4.1 Data repositories

AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public
data sets that can be seamlessly integrated into AWS cloud-based applications.
BigML big list of public data sources.
Bioassay data, described in Virtual screening of bioassay data, by Amanda Schierz, J. of Cheminformatics, with 21 Bioassay datasets (Active / Inactive compounds) available for download.
Bitly 1.usa.gov data, anonymized clicks on gov links.
Canada Open Data, pilot project with many government and geospatial datasets.
Causality Workbench data repository.
at Texas Advanced Computing Center, supporting data-centric science.
Data Source Handbook, A Guide to Public Data, by Pete Warden, O'Reilly (Jan 2011).
Datacatalogs.org, open government data from US, EU, Canada, CKAN, and more.
Data.gov.uk, publicly available data from UK (also London datastore.)
Data.gov/Education, central guide for education data resources including high-value data sets,
data visualization tools, resources for the classroom, applications created from open data and
more.
DataMarket, visualize the world's economy, societies, nature, and industries, with 100 million
time series from UN, World Bank, Eurostat and other important data providers.
Datamob, public data put to good use.
DataSF.org, a clearinghouse of datasets available from the City and County of San Francisco,
CA.
DataFerrett, a data mining tool that accesses and manipulates TheDataWeb, a collection of
many on-line US Government datasets.
Delve, Data for Evaluating Learning in Valid Experiments.
EconData, thousands of economic time series, produced by a number of US Government
agencies.
Enron Email Dataset, data from about 150 users, mostly senior management of Enron.
Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage
content.
FEDSTATS, a comprehensive source of US statistics and more.
For more information you may find resources at this place.

5 Statistics and Computers

Computational statistics, or statistical computing, is the interface between statistics
and computer science. It is the area of computational science (or scientific computing) specific to
the mathematical science of statistics. This area is also developing rapidly, leading to calls that a
broader concept of computing should be taught as part of general statistical education.
The terms computational statistics and statistical computing are often used interchangeably,
although Carlo Lauro (a former president of the International Association for Statistical Computing)
proposed making a distinction, defining statistical computing as the application of computer science
to statistics, and computational statistics as aiming at the design of algorithms for implementing
statistical methods on computers, including the ones unthinkable before the computer age (e.g.
bootstrap, simulation), as well as to cope with analytically intractable problems.
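
The bootstrap mentioned above is a good example of a method that is only practical on a computer. The following is a minimal sketch of a percentile bootstrap confidence interval using the Python standard library; the function name bootstrap_ci, its parameters, and the data values are all hypothetical and chosen for illustration.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(
        stat(rng.choices(data, k=n))
        for _ in range(resamples)
    )
    lower_index = int((1 - level) / 2 * resamples)
    upper_index = int((1 + level) / 2 * resamples) - 1
    return estimates[lower_index], estimates[upper_index]

# Hypothetical sample: soft drinks consumed last week by ten students.
data = [3, 5, 2, 8, 5, 4, 7, 5, 1, 6]
low, high = bootstrap_ci(data)
print(f"95% bootstrap interval for the median: ({low}, {high})")
```

The median is used here because it has no simple textbook formula for its sampling variability, which is the kind of analytically awkward problem where resampling is helpful. Other bootstrap variants exist; the percentile method is simply the shortest to write down.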
Statistical computing software can be categorized by its nature as follows:
Open source statistical packages:
DAP - a free replacement for SAS.
gretl - GNU Regression, Econometrics and Time-series Library.
Mondrian (software) - data analysis tool using interactive statistical graphics with a link
to R.
Octave - programming language (very similar to Matlab) with statistical features.
OpenMx - a package for structural equation modeling running in R.
R - a free implementation of the S language.
PSPP - a free software alternative to IBM SPSS Statistics.
Perl Data Language - scientific computing with Perl.
Pandas - high-performance computing (HPC) data structures and data analysis tools for Python,
written in Python and Cython (statsmodels, scikit-learn).
RapidMiner - a machine learning toolbox.
Weka - a suite of machine learning software written at the University of Waikato.
SciPy - a Python library for scientific computing; it contains the stats sub-package, which
is partly based on the venerable STAT (a.k.a. PipeStat, formerly UNIXSTAT)
software.
Freeware statistical packages:
BV4.1
GeoDA
IDAMS/WinIDAMS
MINUIT
Proprietary statistical packages:
Excel
SAS
IBM-SPSS
Minitab
Mathematica
S-PLUS
STATISTICA
STATA

6 Important Questions
1. What is the distinction between a statistician and a statistics practitioner?
2. What is descriptive statistics? Explain measures of central tendency and measures of
dispersion.
3. What is inferential statistics? Explain the goal of a statistician when computing inferential statistics.
4. Explain the following:
Population
Sample
Statistical inference
5. Give a brief note on different statistical applications in business.
6. What is a data repository? Give a few examples.
7. Describe the various types of computing tools available for statistical analysis. Categorize and
list a few important tools widely used by statisticians.
