
STA172 - STATISTICAL COMPUTING I

Module 1 & Part of Module 3


NDUKA, UCHENNA C. (Ph.D)
DEPARTMENT OF STATISTICS, UNIVERSITY OF NIGERIA, NSUKKA,

NIGERIA

1.1 Overview of Data Generation

▶ Data Generation - Data generation refers to the
process of creating, collecting, and assembling data
through various methods and techniques. It involves the
systematic gathering of information to build datasets that
can be used for analysis, research, decision-making, and
other purposes. The data generation process encompasses
activities such as data collection, recording, and storage,
and it plays a fundamental role in providing the raw
material for statistical analysis, machine learning, research
studies, and other data-driven applications.

Definition
▶ The goal of data generation is to produce accurate,
reliable, and relevant data that reflects the characteristics
of the phenomenon being studied or observed. This
process involves choosing appropriate data collection
methods, ensuring data quality, handling ethical
considerations, and using tools and technologies to record
and store the data securely.
▶ Data generation is a critical step in the broader data
lifecycle, and the quality of generated data significantly
influences the validity and reliability of subsequent
analyses and conclusions. Researchers, scientists, and
practitioners in various fields rely on well-executed data
generation processes to obtain insights, make informed
decisions, and contribute to the advancement of
knowledge in their respective domains.
Importance of DG
▶ Importance of generating data for statistical
analysis
1. Basis for Analysis: Data serves as the foundation
for statistical analyses. Without appropriate and relevant
data, statistical techniques and methods have no input to
process.
2. Informed Decision-Making: Reliable data provides
the basis for making informed decisions. Statistical
analyses help extract patterns, trends, and relationships
from the data, assisting decision-makers in understanding
the implications of various choices.
3. Research and Exploration: Researchers use data
generation to explore hypotheses, test theories, and
contribute to the body of knowledge in their respective
fields. New data helps advance understanding and may
lead to the development of new models or insights.
Importance of DG
▶ 4. Quality Assurance: Data generation is a critical
aspect of ensuring data quality. Properly collected and
documented data contributes to the reliability and validity
of statistical analyses, reducing the likelihood of biased or
inaccurate results.
5. Predictive Modeling: Statistical analyses enable the
development of predictive models. By identifying patterns
and relationships within existing data, these models can
be applied to make predictions or forecasts in new
situations.
6. Performance Evaluation: In various fields, data
generation and subsequent statistical analysis are used to
evaluate the performance of systems, processes, or
interventions. This evaluation is essential for making
improvements and optimizing outcomes.
1.2 Table of Random Numbers

▶ Random numbers: Random numbers are a sequence of
numbers that lack any pattern, predictability, or order.
True randomness is often associated with natural
phenomena, such as atmospheric noise or radioactive
decay, but in computer science and statistics, random
numbers are typically generated using algorithms. These
algorithms, known as random number generators (RNGs),
produce sequences of numbers that mimic randomness,
although they are ultimately deterministic.
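
▶ To illustrate that determinism, here is a minimal sketch of a
linear congruential generator (LCG), one of the oldest RNG
designs, in Python; the constants are the widely quoted
Numerical Recipes values and are chosen purely for illustration:

class LCG:
    """Minimal linear congruential generator:
    x_{n+1} = (a * x_n + c) mod m."""
    def __init__(self, seed=42):
        self.state = seed
        self.m = 2**32         # modulus
        self.a = 1664525       # multiplier (Numerical Recipes constants)
        self.c = 1013904223    # increment

    def next_uniform(self):
        # Each output is fully determined by the previous state,
        # yet the sequence of outputs looks random.
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m

gen = LCG(seed=42)
print([round(gen.next_uniform(), 5) for _ in range(5)])
# Creating LCG(seed=42) again reproduces exactly the same
# five numbers - deterministic, hence "pseudo" random.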

▶ A table of random numbers is a structured
arrangement of numbers devoid of any discernible pattern
or order.
▶ These tables are frequently employed in statistical
sampling, simulations, and other applications where a
source of unpredictability or randomness is necessary.
▶ Random numbers tables are particularly useful in
designing experiments, conducting surveys, and
implementing simulations that require a random and
unbiased selection process.
▶ The primary purpose of a table of random numbers is to
offer a systematic and impartial way of selecting values
for experimentation or analysis.

Example of RNT

Table: Table of Random Numbers

0.28999  0.07640  0.05954
0.67818  0.51647  0.82783
0.88404  0.56848  0.67874
0.51184  0.61581  0.43623
0.52828  0.91765  0.36030

▶ In this example, each cell of the table contains a random
number between 0 and 1. The numbers are generated
using a pseudorandom number generator.
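
▶ A table like the one above can be produced with any
pseudorandom generator; a minimal sketch using Python's
standard random module (the seed is fixed only so the table
is reproducible):

import random

random.seed(1)        # fix the seed for a reproducible table
rows, cols = 5, 3
for _ in range(rows):
    # one row of random numbers in [0, 1), to 5 decimal places
    print("  ".join(f"{random.random():.5f}" for _ in range(cols)))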

Methods of using table of random numbers for
data generation
▶ Random Sampling: Use the table to randomly select
elements from a population. Assign each element a
unique identifier, and use the random numbers to pick
samples without bias. This is useful in survey sampling or
experimental design.
▶ Random Assignment: For experimental studies, use
the table to randomly assign subjects to different
experimental conditions. This ensures that each
participant has an equal chance of being assigned to any
specific group. Both uses are sketched in the code below.
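
▶ A minimal sketch of both methods in Python, where
random.sample plays the role of reading values off the table;
the population size, sample size, and group labels here are
illustrative assumptions:

import random

# Random sampling: each of the 80 identifiers is equally likely
# to be drawn, and no identifier is drawn twice.
population = list(range(1, 81))
sample_ids = random.sample(population, k=32)
print("sampled identifiers:", sorted(sample_ids))

# Random assignment: shuffle the subjects, then split them into
# groups, so every subject has the same chance of either group.
subjects = [f"S{i}" for i in range(1, 21)]   # 20 hypothetical subjects
random.shuffle(subjects)
treatment, control = subjects[:10], subjects[10:]
print("treatment:", treatment)
print("control:  ", control)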

1.3 Practical Exercise
▶ Consider the following set of data on sales in millions of
Naira.

Table: (a) Data on sales (in millions)

10 11 9 10 12 7 10 9
8 10 9 9 12 8 11 10
10 9 8 9 12 10 8 10
10 10 11 11 10 11 11 8
9 10 8 9 10 10 9 11
9 10 10 11 8 10 12 11
11 8 9 8 11 9 9 9
11 11 11 10 12 9 10 11
9 9 12 8 10 10 11 9
9 12 10 9 9 9 9 10
Table: (b) Data on sales (in millions)

10 9 11 9 9 9 11 9
10 7 11 11 10 10 11 8
10 9 9 10 12 11 10 10
8 9 8 14 11 12 10 11
12 8 14 10 9 10 10 9
8 10 10 8 8 9 12 12
11 11 8 10 12 9 9 11
9 10 10 8 10 11 10 10
12 9 10 10 10 10 11 8
11 11 10 11 10 8 9 11
9 10 12 11 10 12 11 11
12 11 10 11 8 10 11 12

▶ Using Table (a) do the following:
1. give each entry in the table a number (1 - 80);
2. set your sample size n = 32;
3. use the Ran# function on your calculator to select 32
observations from the sales data in (a) without repetition;
4. use the necessary functions on your calculator to obtain
the mean and standard deviation of the selected
observations;
5. repeat (3) and (4) 10 times to obtain 10 sets of
averages and 10 sets of standard deviations;
6. use the necessary functions on your calculator to obtain
the mean and standard deviation of all the observations in
(a);
7. compare this mean and standard deviation with the
ones obtained from your samples. (A Python sketch of the
same procedure follows this list.)
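
▶ For checking your calculator work, here is a minimal Python
sketch of the same procedure; sample_mean_sd is a hypothetical
helper name, and the 80 values are those of Table (a):

import random
import statistics

sales_a = [10, 11, 9, 10, 12, 7, 10, 9,
           8, 10, 9, 9, 12, 8, 11, 10,
           10, 9, 8, 9, 12, 10, 8, 10,
           10, 10, 11, 11, 10, 11, 11, 8,
           9, 10, 8, 9, 10, 10, 9, 11,
           9, 10, 10, 11, 8, 10, 12, 11,
           11, 8, 9, 8, 11, 9, 9, 9,
           11, 11, 11, 10, 12, 9, 10, 11,
           9, 9, 12, 8, 10, 10, 11, 9,
           9, 12, 10, 9, 9, 9, 9, 10]

def sample_mean_sd(data, n, reps=10):
    # Steps 3-5: draw n observations without repetition, record the
    # sample mean and standard deviation, and repeat reps times.
    results = []
    for _ in range(reps):
        sample = random.sample(data, k=n)
        results.append((statistics.mean(sample), statistics.stdev(sample)))
    return results

for mean, sd in sample_mean_sd(sales_a, n=32):
    print(f"mean = {mean:.3f}, sd = {sd:.3f}")

# Steps 6-7: the mean and standard deviation of all 80
# observations, for comparison with the sample results above.
print(statistics.mean(sales_a), statistics.stdev(sales_a))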

▶ Using Table (b) do the following:
1. give each entry in the table a number (1 - 96);
2. set your sample size n = 50;
3. use the RanInt function on your calculator to select 50
observations from the sales data in (b) without repetition;
4. use the necessary functions on your calculator to obtain
the mean and standard deviation of the selected
observations;
5. repeat (3) and (4) 10 times to obtain 10 sets of
averages and 10 sets of standard deviations;
6. use the necessary functions on your calculator to obtain
the mean and standard deviation of all the observations in
(b);
7. compare this mean and standard deviation with the
ones obtained from your samples.
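
▶ The earlier sketch applies here unchanged: load the 96 values
of Table (b) into a list (say, a hypothetical sales_b) and call
sample_mean_sd(sales_b, n=50); random.sample then plays the
role of the calculator's RanInt function.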

3.1 Introduction to Statistical Calculations
Statistical calculations form the backbone of quantitative
analysis, providing essential tools for interpreting and making
sense of data. In various fields such as economics, finance,
biology, and social sciences, statistical methods are employed
to uncover patterns, trends, and relationships within data sets.
This introduction aims to shed light on the fundamental
concepts and purposes that underpin statistical calculations.
Purpose of Statistical Computations - Statistical
computations serve several crucial purposes in the realm of
data analysis:
▶ Summarization: Statistical calculations help summarize
large and complex data sets into meaningful measures,
providing a concise overview of essential characteristics.

▶ Description: They facilitate the description of data by
offering insights into its central tendencies, variability, and
distribution.
▶ Inference: Statistical computations enable the drawing
of inferences and conclusions about populations based on
sample data, supporting decision-making processes.
▶ Prediction: Through regression analysis and other
predictive models, statistical calculations empower
analysts to make informed predictions about future trends
or outcomes.
Role of Calculator in Statistical Analysis - The use of
calculators in statistical analysis has become indispensable due
to the following reasons:
▶ Efficiency: Calculators streamline complex mathematical
operations, making statistical calculations more efficient
and less prone to human error.
▶ Accessibility: Modern calculators come equipped with
built-in statistical functions, providing easy access to
measures of central tendency, dispersion, and other
statistical parameters.
▶ Complexity Handling: In scenarios involving large data
sets or intricate computations, calculators facilitate the
handling of complex statistical procedures with speed and
accuracy.
▶ Real-world Application: The integration of statistical
functions into calculators allows for immediate application
in various real-world situations, from business and finance
to scientific research.

3.2 Measures of Central Tendency and Dispersion
In statistical analysis, measures of central tendency and
dispersion provide valuable insights into the characteristics and
distribution of a data set. They help summarize the data and
understand its variability, aiding in making informed decisions
and drawing meaningful conclusions. Let's delve into each
concept:
Measures of Central Tendency - Measures of central
tendency represent the central or typical value around which
data points tend to cluster. They are primarily:
▶ Mean (Average): The mean is the sum of all data values
divided by the total number of observations. It is highly
sensitive to outliers and extreme values.
▶ Median: The median is the middle value of a data set
when arranged in ascending or descending order. It is less
affected by outliers and is often preferred for skewed
distributions.
▶ Mode: The mode is the value that appears most
frequently in a data set. A data set can have one mode
(unimodal), multiple modes (multimodal), or no mode.
Measures of Dispersion - Measures of dispersion quantify
the spread or variability of data points around the central
tendency. They provide insights into the consistency and
variability within the data set. The main measures of
dispersion include:
▶ Range: The range is the difference between the maximum
and minimum values in a data set. It is simple to
calculate but sensitive to outliers.
▶ Variance: The variance measures the average squared
deviation of each data point from the mean. It provides a
more comprehensive understanding of data dispersion but
is not in the original units of the data.
▶ Standard Deviation: The standard deviation is the square
root of the variance. It represents the average distance of
data points from the mean and is widely used due to its
intuitive interpretation.
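▶ For reference: if x_1, ..., x_n are the observations with mean
x̄ = (x_1 + ... + x_n)/n, the sample variance is
s^2 = Σ(x_i − x̄)^2 / (n − 1) and the standard deviation is
s = √(s^2). Many calculators also report the population
versions, which divide by n rather than n − 1.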
Step-by-step calculator computations for mean,
median, mode, range, variance, and standard
deviation
▶ Hands-on using the following data set: 45, 43, 42, 80, 84,
82, 56, 59, 52, 71, 72, 76, 67, 62, 65, 23, 26, 27, 34, 35,
37, 48, 49, 50, 53
▶ Using your scientific calculator, obtain the mean, median,
mode, standard deviation, variance, and range.
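
▶ The same quantities can be cross-checked with Python's
standard statistics module; a minimal sketch using the data
set above:

import statistics

data = [45, 43, 42, 80, 84, 82, 56, 59, 52, 71, 72, 76, 67,
        62, 65, 23, 26, 27, 34, 35, 37, 48, 49, 50, 53]

print("mean     :", statistics.mean(data))
print("median   :", statistics.median(data))
print("mode(s)  :", statistics.multimode(data))
# every value above occurs exactly once, so there is no single mode
print("range    :", max(data) - min(data))
print("variance :", statistics.variance(data))   # sample variance (n - 1)
print("std dev  :", statistics.stdev(data))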

3.3 Time Series Analysis
Time series data is a type of sequential data where
observations are recorded at regular intervals over time. It is a
fundamental component of many fields, including economics,
finance, weather forecasting, and social sciences. Time series
analysis involves studying the patterns, trends, and behaviors
exhibited by the data over time. Here's an overview of key
aspects of time series data:
▶ Temporal Structure: Time series data is organized
chronologically, with observations recorded at successive
time points. The time intervals between observations are
usually regular (e.g., daily, monthly, yearly).
▶ Components of Time Series: Trend, Seasonal
variations, Cyclical variations, and Irregular variations.

Time Series Analysis Techniques:
▶ Descriptive Analysis: Examining the basic
characteristics of the time series data, such as mean,
median, variance, and standard deviation.
▶ Trend Analysis: Identifying and modeling the underlying
trend in the data to understand long-term behavior.
Linear trend: y_t = a_0 + a_1 t
Quadratic trend: y_t = a_0 + a_1 t + a_2 t^2
Logarithmic trend: y_t = a_0 + a_1 ln(t)
Exponential trend: y_t = a_0 e^(b_1 t)

▶ Hands-on: fit each trend to the following series.

t    1    2    3    4    5    6    7    8    9    10   11   12
y_t  450  501  523  550  570  601  624  700  758  805  809  801
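
▶ A minimal sketch of fitting all four trends by least squares,
using the numpy library (an assumption; any least-squares tool
would do). The logarithmic and exponential fits use the standard
linearising transforms ln(t) and ln(y), so the exponential
coefficients come from regressing ln(y) on t rather than from
nonlinear least squares:

import numpy as np

t = np.arange(1, 13)
y = np.array([450, 501, 523, 550, 570, 601,
              624, 700, 758, 805, 809, 801])

# Linear trend: y_t = a_0 + a_1 t
a1, a0 = np.polyfit(t, y, 1)
print(f"linear      : y = {a0:.2f} + {a1:.2f} t")

# Quadratic trend: y_t = a_0 + a_1 t + a_2 t^2
q2, q1, q0 = np.polyfit(t, y, 2)
print(f"quadratic   : y = {q0:.2f} + {q1:.2f} t + {q2:.2f} t^2")

# Logarithmic trend: y_t = a_0 + a_1 ln(t)
l1, l0 = np.polyfit(np.log(t), y, 1)
print(f"logarithmic : y = {l0:.2f} + {l1:.2f} ln(t)")

# Exponential trend: y_t = a_0 e^(b_1 t), via ln(y) = ln(a_0) + b_1 t
b1, ln_a0 = np.polyfit(t, np.log(y), 1)
print(f"exponential : y = {np.exp(ln_a0):.2f} e^({b1:.4f} t)")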

Practical Exercise
A) Consider the following data on monthly Australian beer
production: 164, 148, 152, 144, 155, 125, 153, 146, 138, 190,
192, 192, 147, 133, 163, 150, 129, 131, 145, 137, 138, 168,
176, 188, 139, 143, 150, 154, 137, 129, 128, 140, 143, 151,
177, 184, 151, 134, 164, 126, 131, 125, 127, 143, 143, 160,
190, 182, 138, 136, 152, 127, 151, 130, 119, 153
Obtain the linear, quadratic, logarithmic, and exponential
trend equations.
B) The following data are on sales of shampoo over a three-year
period: 266, 145.9, 183.1, 119.3, 180.3, 168.5, 231.8, 224.5,
192.8, 122.9, 336.5, 185.9, 194.3, 149.5, 210.1, 273.3, 191.4,
287, 226, 303.6, 289.9, 421.6, 264.5, 342.3, 339.7, 440.4,
315.9, 439.3, 401.3, 437.4, 575.5, 407.6, 682, 475.3, 581.3,
646.9
Obtain the linear, quadratic, logarithmic, and exponential
trend equations.
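▶ Both parts can be worked with the numpy sketch above: for
part A set t = np.arange(1, 57) and put the 56 beer-production
figures in y; for part B set t = np.arange(1, 37) and use the
36 shampoo figures, then rerun the four fits.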
▶ Embrace the journey, discover your potential!
