B.SC (CS With AI) Unit - 1

Maths bsc cs sem 3

Uploaded by

keerthivasanr869
Copyright: © All Rights Reserved

II B.Sc (Computer Science with Artificial Intelligence)

Subject: Statistics - I    Subject Code: 226E3B

Unit-1

Sampling

How do we study a population?
A population may be studied using one of two approaches: taking a census, or
selecting a sample.
It is important to note that whether a census or a sample is used, both provide
information that can be used to draw conclusions about the whole population.

What is a census (complete enumeration)?


A census is a study of every unit, everyone or everything, in a population. It is
known as a complete enumeration, which means a complete count.

Population: Population refers to the total set of observations that can be made.
Eg: All the students in a college.

Finite population: A population with a finite number of elements.
Eg: Number of chairs in a college, number of leaves on a tree, etc. (countable)

Infinite population: A population with an infinite number of elements, or one
so large that it cannot be counted in practice.
Eg: Number of stars in the sky, number of points on a line, number of fish in
the sea. (uncountable)

Sample: A subset, or part, of a population. Eg: A few drops of blood taken for a test are a sample.

Sample Size: Number of units in a sample

Census: Count of an entire population

Sampling unit: One unit (element) of a population. E.g., if the population
is defined to be 100 trees on a lot, then the sampling unit is a single tree.

Parameter: A constant that describes a population. Eg: In a college, 40% of the
students are boys. This figure describes the whole population, hence it is a
parameter.

Statistic (not to be confused with Statistics): A quantity computed from a
sample that describes the sample. Unlike a parameter, its value can change from
one sample to another. Eg: In a sample of 200 students from the same college,
45% are boys. This 45% is a statistic, as it describes the sample.

Notation:
                            Statistic    Parameter
Mean                        x̄            μ
Standard Deviation (S.D)    s            σ
Variance                    s²           σ²
Size                        n            N

Sampling Distribution: A probability distribution consisting of all possible
values of a sample statistic is known as a sampling distribution.

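Standard error: The standard deviation of the sampling distribution of a
statistic is called its standard error. For the sample mean x̄, the standard
error is σ_x̄ = σ / √n, where σ is the population standard deviation and n is
the sample size.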
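
As a small illustration of the standard error of the mean, the sketch below evaluates σ / √n in Python. The values of sigma and n are hypothetical and chosen only to keep the arithmetic easy; the function name standard_error is likewise just an illustrative choice.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

# Hypothetical values, used only for illustration.
sigma = 10.0   # assumed population standard deviation
n = 25         # assumed sample size

print(standard_error(sigma, n))   # 10 / 5 = 2.0
```

Note that the standard error becomes smaller as the sample size n increases.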

Sampling Error: The sampling error is the difference between a population
parameter and a sample statistic.

Characteristics of a good sampling technique

1. Much cheaper than a census.
2. Saves time.
3. Highly reliable.
4. Scientific in nature.
5. Very suitable for carrying out different surveys.

Advantages and limitations (disadvantages) of sampling

Advantages                    Disadvantages
Very accurate                 Chances of bias
Very reliable                 Problems of accuracy
Takes less time               Untrained manpower
Low cost of sampling          Absence of the informants
Scope of sampling is high     Chances of committing errors in sampling

TEN Mark Question and Answers
Explain Probability Sampling or Random sampling or Method of sampling.
1. Simple Random Sampling (SRS): Simple Random Sampling is a probability
sampling method in which each unit in the population has an equal chance of
being included in the sample.
There are two types of Simple Random Sampling: Simple Random Sampling
Without Replacement (SRSWOR) and Simple Random Sampling With
Replacement (SRSWR).
Suppose you are going to buy apples from a fruit shop. You select five apples
one by one from a basket without putting the selected ones back. This type of
sampling, in which all units have an equal chance of being included in the
sample, is called simple random sampling without replacement. If each selected
unit is put back before the next draw, it is called simple random sampling
with replacement. If a population consists of N units and a sample of n units
is to be taken, the number of possible samples is NCn in SRSWOR and N^n in
SRSWR.
Randomization can be carried out using a number of techniques, such as:
(a) Tossing a coin. (b) Throwing a die. (c) Lottery method. (d) Blindfolded
method. (e) Using a table of random numbers such as Tippett's table.
Lottery Method
For example, suppose we have to select five students out of 50 to visit an old
age home. We assign numbers from 1 to 50 to the students and make 50 identical
slips, one for each number. The slips are folded, put in a box and shuffled
thoroughly. Then five slips are drawn. Suppose the numbers drawn are 34, 6,
48, 37 and 20. Then the students bearing these numbers are selected for visiting
the home.
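
The lottery method above can be imitated on a computer. The following is a minimal sketch, assuming Python's standard random module stands in for the box of slips; the function name lottery_sample and the optional seed argument are illustrative choices, not part of the notes.

```python
import random

def lottery_sample(population_size, sample_size, seed=None):
    """Draw slips numbered 1..population_size without replacement (SRSWOR)."""
    rng = random.Random(seed)                # a seed makes the draw repeatable
    slips = range(1, population_size + 1)    # one slip per student
    return rng.sample(slips, sample_size)    # no slip can be drawn twice

# Select five students out of 50, as in the example above.
chosen = lottery_sample(50, 5)
print(chosen)   # five distinct numbers between 1 and 50; they vary from run to run
```

Because random.sample draws without replacement, this corresponds to SRSWOR. For SRSWR one could use random.choices instead, which allows the same number to be drawn more than once.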

Advantages
(a) Easy to understand. (b) Easy to analyse and interpret results.
(c) Easy to detect errors. (d) Usually representative of the population.
Disadvantages
(a) Selection on a strictly random basis is not always possible.
(b) Lack of control by the investigator.
(c) Random sampling does not suit heterogeneous groups.

2. Systematic Sampling: A sampling method in which one unit is selected at
random and the remaining units are selected at intervals of a predetermined
length is called systematic sampling.
Suppose we want to select a systematic sample of 8 units out of 48 units. To do
this we first find the sampling interval k = 48/8 = 6. The first unit in the
sample is selected using a random number r between 1 and 6. Let it be 3. Then
the third unit is selected into the sample. Thereafter every sixth unit is
selected automatically. Hence the resulting systematic sample will contain the
units with the following serial numbers: 3, 9, 15, 21, 27, 33, 39, 45.
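
The systematic selection described above can be sketched in a few lines of Python. This is only an illustration under the assumption that N is an exact multiple of n; the function name systematic_sample and its start argument are hypothetical.

```python
import random

def systematic_sample(N, n, start=None):
    """Select n of N serially numbered units at a fixed interval k = N // n."""
    k = N // n                          # sampling interval (assumes n divides N)
    if start is None:
        start = random.randint(1, k)    # random start r between 1 and k
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(n)]

# The example above: 8 units out of 48, with random start r = 3.
print(systematic_sample(48, 8, start=3))   # [3, 9, 15, 21, 27, 33, 39, 45]
```

Only the first unit is chosen at random; every later unit is fixed by the interval k.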
Advantages
(a) Easy to understand. (b) It reduces the field cost.
(c) Easy to analyse and interpret results.
(d) Usually representative of the population.
(e) This is a simple method of selecting a sample.
Disadvantages
(a) Periodicity in list of population elements.
(b) Information of each individual is essential.
(c) This method can’t ensure the representativeness.
(d) There is a risk in drawing conclusions from the observations of the sample.

3. Stratified Sampling: The universe, or entire population, is divided into a
number of strata (groups), hence the name stratified sampling. Once the whole
universe is divided into groups, a certain number of items is taken from each
group at random.
Eg: All the students of a college may be divided into groups of boys and girls.
Advantages
(a) Easy to understand. (b) Easy to analyse and interpret result
(c) Usually representative of the population.
(d) Allows subgroup comparisons.
(e) Results represent population without weight.
Disadvantages
(a) Require subgroup identification of each population element.
(b) May be costly and difficult to prepare lists of elements in each subgroup.
(c) Requires Knowledge of the proportion of each subgroup in the population.
(d) It is costly and time consuming method.

Example: Stratified Random Sampling

Consider a population which consists of males and females who are smokers
or non-smokers.

Population
    Male
        Smoker
        Non-Smoker
    Female
        Smoker
        Non-Smoker
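
A minimal sketch of stratified random sampling for the four strata above is given below. The unit labels (MS1, FN3 and so on), the stratum sizes and the choice of two units per stratum are all hypothetical, made up only to have something to draw from; the notes do not specify them.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of per_stratum units from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

# Hypothetical population: 10 labelled units in each of the four strata.
strata = {
    ("Male", "Smoker"): [f"MS{i}" for i in range(1, 11)],
    ("Male", "Non-Smoker"): [f"MN{i}" for i in range(1, 11)],
    ("Female", "Smoker"): [f"FS{i}" for i in range(1, 11)],
    ("Female", "Non-Smoker"): [f"FN{i}" for i in range(1, 11)],
}

sample = stratified_sample(strata, per_stratum=2)
for name, units in sample.items():
    print(name, units)   # two randomly chosen units from each stratum
```

Every stratum contributes units to the sample, so no subgroup can be left out by chance.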

Example: Simple Random Sampling

If a population consists of N units and a sample of n units is to be taken, the
number of possible samples is NCn in SRSWOR and N^n in SRSWR.
Suppose a population consists of the 5 numbers 2, 3, 6, 8 and 11. Consider all
simple random samples of size 2 that can be drawn
1. with replacement
2. without replacement

Samples: Using SRSWOR
(2,3), (2,6), (2,8), (2,11), (3,6), (3,8), (3,11), (6,8), (6,11), (8,11)
5C2 = 10 samples.

Samples: Using SRSWR
(2,2), (2,3), (2,6), (2,8), (2,11), (3,2), (3,3), (3,6), (3,8), (3,11), (6,2), (6,3),
(6,6), (6,8), (6,11), (8,2), (8,3), (8,6), (8,8), (8,11), (11,2), (11,3), (11,6),
(11,8), (11,11)
5^2 = 25 samples.
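
The two lists of samples can be checked by letting Python enumerate them. This is a sketch using the standard itertools module: combinations gives the unordered samples of SRSWOR, and product gives the ordered samples of SRSWR. The last lines also look at the sampling distribution of the mean for this small population.

```python
from itertools import combinations, product
from statistics import mean

population = [2, 3, 6, 8, 11]
n = 2

srswor = list(combinations(population, n))     # without replacement, order ignored
srswr = list(product(population, repeat=n))    # with replacement, order counted

print(len(srswor))   # 10, i.e. 5C2
print(len(srswr))    # 25, i.e. 5^2

# Average of all possible sample means, compared with the population mean (6).
print(mean(population))
print(mean(mean(s) for s in srswor))
print(mean(mean(s) for s in srswr))
```

In both schemes the average of all possible sample means works out to the population mean, 6.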
Definition

Statistics: The collection, presentation, analysis and interpretation of numerical data.

Data (or variable): Originally collected observations are called data.

Data Value or Datum: A datum is a single measurement or observation.

Raw Data: Data collected in original form.

Array: Numerical raw data arranged in ascending or descending order is
called an array.

Variable: A variable is a characteristic or an attribute that can assume different
values. Eg: Height and weight of a person
Qualitative Variables: Variables which assume non-numerical values.
Eg: Eye Color (White, Black and Blue), Religion, Gender (Female and Male)

Quantitative Variables: Variables which can assume numerical values.
Eg: Age, Weight, Height, Income, Expenditure
Discrete Variables: Variables which can assume only a finite or countable
number of distinct values. Eg: Number of pages in a book, Number of apples in a basket.
Continuous Variables: Variables which can assume any value within a range, and
hence an infinite number of possible values. Eg: Students' heights, ages, weights, Income of a family

Explain Scale of measurement

Nominal Data: Data at the nominal level consist of names, labels and
categories. Eg: Blood type (O, AB, A, B), Gender (male, female)
Ordinal Data: Data at the ordinal level consist of data that can be ordered.
Eg: Class ranking, Size of shirt (40, 42, 44),
Letter grades (A, B, C, D, F)
Interval Data: Data at the interval level can be ordered, and the differences
between values are meaningful. Eg: IQ Score, temperature, calendar date.
Ratio Data: Data at the ratio level can be ordered, the differences are
meaningful, and zero corresponds to the absence of the quantity (a true zero).
Eg: Age, Weight, Height.

Types of data
Primary and Secondary Data
Primary Data
Data collected directly from the source for the first time are called primary
data (also original data or first-hand data).
Eg: 1. List of absentees in a class.
2. Ram has collected the statistics marks from the students in person.
Secondary data
Secondary data consist of second-hand information which has already been
collected. Eg: Population census data, annual rainfall, budget records.
Difference between Primary and Secondary data.
Primary data                        Secondary data
Original data                       Not original data
First-hand information              Second-hand information
Costs more money                    Costs less money
Takes more time                     Takes less time
Becomes secondary data after use    Cannot be converted to primary data

Explain methods of primary data

Direct Personal interview
In this method the investigator collects the data personally, i.e. he approaches the respondent,
conducts the enquiry on the spot, collects the information and so on.
Advantages and Disadvantages of Personal interview method
Advantages                                    Disadvantages
Highest response rate                         Most expensive
Allows all types of questions                 Informants can be influenced
Allows clearing doubts regarding questions    Takes more time

Indirect oral investigation
In this method the investigator does not collect the data directly; instead he gets it through his
enumerator.
Advantages and Disadvantages of Indirect oral investigation method
Advantages                         Disadvantages
It is simple and convenient        The information cannot be relied upon because of the absence of direct contact
It saves time, money and labour    An interview with an unsuitable person will spoil the results
The information is unbiased        Witnesses may colour the information according to their interests

Investigation through Local Reports
In this method, data are collected through local agents or correspondents. They collect information in
their own fashion, according to their likes and dislikes.
Advantages and Disadvantages of Information from Correspondents method
Advantages                                               Disadvantages
Speedy information is possible                           Data may not be original
Extensive information can be had                         Uniformity cannot be maintained
It is useful where information is needed regularly       The information may be biased

Mailed Questionnaire
In this method the investigator sends a questionnaire to the respondent, who fills it in and
returns it.
Advantages and Disadvantages of Mailed questionnaire method
Advantages                            Disadvantages
Least expensive                       Long response time
Only method to reach remote areas     Cannot be used by illiterates
Informants are not influenced         Doubts regarding questions cannot be cleared

Telephonic interviews:
Data are collected by the interviewer through an interview with the respondent over the telephone.
Advantages and Disadvantages of Telephonic Interview method
Advantages                       Disadvantages
Relatively low cost              Limited use
Relatively high response rate    Reaction cannot be watched
Less influence on informants     Respondents can be influenced

Note:
Investigator: The person who conducts statistical investigation.
Enumerator: The person who helps the investigator in the collection of data.
Respondent: The person or institution that provides information to the
investigator or enumerator is called the Respondent or Informant.

Explain methods of Secondary Data
Published data are available in various sources, including:
Libraries
A common place to look for secondary data is a library. Here, data can be
obtained from magazines, journals and newspapers.
Government agencies
Government data can be obtained from publications issued by local, state,
national and international governments. Such data include laws, regulations,
statistics and consumer information.
Internet
Secondary data can be found on the internet using search engines such as
Yahoo, Google, MSN.com, etc.
Other published sources include:
• Government publications.
• Office records in panchayats, municipalities etc.
• Survey reports of various research organizations.
• Survey reports in Journals, Newspapers and other publications.
• Websites.
Unpublished
• Official records and files of government and private offices.
• Studies made by research institutions.
• Diaries.
• Letters.

Classification of Data

Classification: The process of arranging things into groups or classes.

Objective of classification (Use of classification)

1. To make the data easy to understand.
2. To make comparison easy.
3. To simplify complex data.
4. To help in interpreting the data.

Explain type of Classification.

Qualitative Classification: Classification of data based on some non-measurable
characteristics such as religion, occupation, employment, gender
etc. is known as qualitative classification.
Eg: Classification of the population based on the mother tongue.
Quantitative Classification: Classification of data based on some measurable
characteristics is known as quantitative classification.
Eg: age, experience, income, prices, production, sales
Geographical Classification: In this type, the data are classified based on the
place, area or region.
Eg: 1. Production of rice in different states 2. Population of India state-wise
Chronological Classification: In this type, the data are classified according to
the time of occurrence.
Eg: 1. Rainfall for 12 months. 2. Population of a country over a period of years.

Tabulation
Tabulation: The systematic arrangement of statistical data in rows and
columns. The result is called a table.

Objective of Tabulation (Use of Tabulation):


1. It helps in comparing data.
2. It saves space and time.
3. Tabulated data can be easily presented in the form of diagrams and graphs
4. To make analysis and interpretation easy.

Explain main parts of a table.


Table No.
Title
Stub        Caption        Total
            Body
Total
Foot-note

1. Number and Title: indicate the serial number of the table and the subject matter
of the table.
2. Stub: the headings of the rows.
3. Caption: the headings of the columns.
4. Body: the figures entered in the table.
5. Foot-note: gives the source from which the data have been obtained.

Explain types of table (tabulation).


Simple Table – Data are presented according to one characteristic only.
E.g. The students classified by the medium of instruction.
Double Table – Data are presented according to two interrelated characteristics of a
particular variable. E.g. The students classified by the medium of instruction
and their gender.
Three Way Table – This table gives information regarding three interrelated
characteristics of a particular variable.
E.g. The students classified by the medium of instruction, their gender and
their residence.
Manifold Table – This table presents more than three characteristics of the data.

Tally Marks
Tally marks represent counts in the form of vertical lines. We put one
vertical line (|) for each of the first four counts. A diagonal line (\) is put
across these four lines for the fifth count.

1. The tally marks for 4 are ||||
2. 5 is represented as |||| crossed by a diagonal line.
3. 6 is represented as the crossed group of five followed by |, and so on.
Frequency Distribution
A frequency table (or frequency distribution) is a method of presenting raw data
in a form from which the information contained in the raw data can be easily
understood.
Eg: The number of children in 20 families

Types of Data

1. Raw Data (Individual or ungrouped)

2. Discrete Data

3. Continuous Data(Grouped data)

Raw Data (Individual or ungrouped)

1. Weight of 7 students in class

42, 40, 45, 52, 41, 59, 70

2. Marks for ten students in statistics

32, 22, 45, 48, 13, 19, 34, 39, 40, 12

3. Age of 6 children

4, 3, 2, 5, 6, 3
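
Raw data such as the ages of the 6 children can be turned into a frequency table with tally marks. The sketch below uses Counter from the Python standard library. Writing a completed group of five as four bars followed by a backslash is only a plain-text convention assumed here, and the function name tally_marks is hypothetical.

```python
from collections import Counter

FIVE = "||||" + "\\"   # four vertical lines plus the diagonal stroke for the fifth count

def tally_marks(count):
    """Return plain-text tally marks for a count, in groups of five."""
    groups = [FIVE] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)

ages = [4, 3, 2, 5, 6, 3]    # raw data: age of 6 children
freq = Counter(ages)         # each value mapped to the number of times it occurs

print("Age (X)  Tally  f")
for value in sorted(freq):
    print(f"{value:<8} {tally_marks(freq[value]):<6} {freq[value]}")
print("Total", sum(freq.values()))
```

Here the age 3 occurs twice and every other age occurs once, so the frequencies add up to 6, the number of children.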

Discrete Data

Marks (X)    No. of Students (f)
0            11
1            12
2            10
3            17
4            14
Total        64

Age (X)      No. of Students (f)
1            7
2            10
3            15
4            8
5            4
Total        44

Size (X)     No. of Shoes (f)
4            5
5            10
6            15
7            12
8            4
Total        46

Continuous Data (Grouped data)

Marks (C.I)    No. of Students (f)
10-20          2
20-30          42
30-40          28
40-50          11
Total          N = 83

C.I            f
5 - 9          10
10 - 14        20
15 - 19        15
20 - 24        12
Total          N = 57

Class: Each interval is called a class.
Class Limits: The upper limit and the lower limit of a class.
Class Interval: The difference between the upper and lower limits.
Mid Value or Mid Point = (upper limit + lower limit) / 2

Exclusive Type (Closed Interval): There is no gap between the upper limit of one
class and the lower limit of the next.
Weight                          No. of Students
10-20 (L.L = 10, U.L = 20)      15
20-30                           20
30-40                           35
40-50                           19
50-60                           5

Inclusive Type (Open Interval): There is a gap between the upper limit of one
class and the lower limit of the next class. The second pair of columns shows the
same classes with the gaps closed.
Marks (C.I)    No. of Students    Marks (C.I)    No. of Students
10-19          6                  9.5-19.5       6
20-29          10                 19.5-29.5      10
30-39          15                 29.5-39.5      15
40-49          5                  39.5-49.5      5
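
The inclusive classes above can be converted to the gap-free form by moving each limit outwards by half the gap between consecutive classes. The sketch below shows this step together with the mid value formula. It assumes the gap is the same between every pair of neighbouring classes; the function names to_exclusive and mid_value are hypothetical.

```python
def to_exclusive(classes):
    """Close the gaps between inclusive classes such as (10, 19), (20, 29)."""
    gap = classes[1][0] - classes[0][1]   # 20 - 19 = 1 in the example
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

def mid_value(lower, upper):
    """Mid value = (upper limit + lower limit) / 2."""
    return (upper + lower) / 2

inclusive = [(10, 19), (20, 29), (30, 39), (40, 49)]

print(to_exclusive(inclusive))
# [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]

print(mid_value(10, 19))      # 14.5
print(mid_value(9.5, 19.5))   # 14.5, the mid value is not changed by the conversion
```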

Frequency: The number of times that a certain value appears.


Frequency Distribution: Raw data in table form with classes and frequencies.

Explain Frequency Distribution.
A frequency distribution organizes data into categories (ranges) and shows
how many times each category occurs.
Discrete Data: Discrete data can take only specific values.
Eg: The students' marks are given below:
Marks              5     10    15    20
No. of Students    3     12    8     2
Continuous Data: Continuous data can take any value within a range.
Eg: The heights of students are given below:
Height             3-4   4-5   5-6   6-7
No. of Students    5     10    6     1

Explain cumulative frequency distribution.

A cumulative frequency distribution is a table that shows the cumulative
frequencies of data values up to a certain point.
Less Than Cumulative Frequency Distribution: Adds up frequencies from
the lowest value to each successive class.
More Than Cumulative Frequency Distribution: Calculates cumulative
frequencies from the highest to the lowest class.

Marks    No. of Students    Less than    CF          More than    CF
10-20    8                  20           8           10           47
20-30    12                 30           8+12=20     20           47-8=39
30-40    18                 40           20+18=38    30           39-12=27
40-50    9                  50           38+9=47     40           27-18=9
         N=47                                        50           9-9=0
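
The two cumulative frequency columns in the table above can be reproduced with a running total. This is a small sketch using accumulate from the standard itertools module; the variable names are illustrative.

```python
from itertools import accumulate

limits = [10, 20, 30, 40, 50]   # class limits of 10-20, 20-30, 30-40, 40-50
freqs = [8, 12, 18, 9]          # number of students in each class

total = sum(freqs)                       # N = 47
less_than_cf = list(accumulate(freqs))   # running totals from the lowest class

# "More than" CF at each limit: N minus everything below that limit.
more_than_cf = [total - below for below in [0] + less_than_cf]

print(less_than_cf)   # [8, 20, 38, 47]      for "less than" 20, 30, 40, 50
print(more_than_cf)   # [47, 39, 27, 9, 0]   for "more than" 10, 20, 30, 40, 50
```

The last "less than" value and the first "more than" value both equal N, which is a quick check on the arithmetic.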

Bivariate Frequency Distribution
If only one characteristic of the sampling units is measured for the study, it is
called Univariate Data. If two characteristics are measured simultaneously
from each unit, it is known as Bivariate Data. Similarly data containing
measurements of more than two characteristics of each unit is called
Multivariate Data.

For example, if only the height of the students is measured for the study, it is
Univariate Data. Usually we represent it by x, y, z, etc.

Height (in inches): 52, 51, 57, 62, 68

If we measure the height and weight of each student for a study, it is
Bivariate Data. We represent it by (x, y) or (x1, y1), where the first variable is the
height and the second variable is the weight.

Height in inches and weight in Kg:

(52, 45), (51, 62), (57, 58), (62, 70), (68, 73)

The same data can also be represented as,

Height (x):    52    51    57    62    68
Weight (y):    45    62    58    70    73

The frequency distribution table of Bivariate Data is called a Bivariate
Frequency Table.

Contingency Table

Definition:

A contingency table is a simple table used to show how two different categories
or groups are related. It helps us see how often different combinations of these
categories occur.

Example: Let’s say we have a group of students and we want to see how their
choice of favorite fruit (Apple or Banana) relates to their choice of favorite
color (Red or Blue).

Here’s what the table might look like:

            Red    Blue    Total
Apple       10     5       15
Banana      8      12      20
Total       18     17      35

In this table:
- The rows show the favorite fruit (Apple or Banana).
- The columns show the favorite color (Red or Blue).
- Each cell shows the number of students who like a particular fruit and color
combination.
What It Shows:
10 students like Apple and Red.
5 students like Apple and Blue.
8 students like Banana and Red.
12 students like Banana and Blue.
The totals at the bottom and right side show the overall counts for each
category.
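
The row and column totals of the contingency table can be recomputed from the four cell counts. The sketch below stores the table as a nested dictionary, which is just one convenient representation assumed here.

```python
# Cell counts from the table above: table[fruit][color]
table = {
    "Apple": {"Red": 10, "Blue": 5},
    "Banana": {"Red": 8, "Blue": 12},
}
colors = ["Red", "Blue"]

row_totals = {fruit: sum(cells.values()) for fruit, cells in table.items()}
col_totals = {color: sum(table[fruit][color] for fruit in table) for color in colors}
grand_total = sum(row_totals.values())

print(row_totals)    # {'Apple': 15, 'Banana': 20}
print(col_totals)    # {'Red': 18, 'Blue': 17}
print(grand_total)   # 35
```

The row totals and the column totals both add up to the same grand total, 35.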
