Adv Data Analysis

Uploaded by

vijay tonny

BUSINESS:

The success of a business depends on making the right decisions.

Every decision has 2 possible outcomes: it is either correct or wrong.

Data helps in making correct decisions.

This is called data-driven (or data-backed) decision making.

Every business in the market generates a lot of data from its operations, services, transactions, procurement and expenses.

The profitability of a business relies on the data the company generates and how well it uses that data.

STATISTICS:

Statistics is a branch of mathematics that deals with processing, analysing and summarizing data.

Every business generates data, and companies rely on that data to excel.

The choice of statistical method depends on the type of data.

A DATA ANALYST is responsible for data collection, data cleaning, data preprocessing, data analysis, data visualization and exploratory data analysis (EDA).

NOTE:

There are 2 types of data: population data and sample data.

A population is the complete data; a sample is just a portion (subset) of the population.

SUB BRANCHES OF STATISTICS:

1. Descriptive statistics
2. Inferential statistics

DESCRIPTIVE STATISTICS:

Descriptive statistics is the study of describing and summarising data.

It converts raw data into useful information by applying statistical methods.

Descriptive statistics deals with the population (the complete data).

Common statistical methods include the mean (average), median, mode, variance, standard deviation and correlation.

INFERENTIAL STATISTICS:

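Inferential statistics works with sample information (a portion of the population).

Because it draws conclusions about the whole population from only a portion of it, inferential statistics gives estimates rather than exact results.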
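The sketch below illustrates the idea with a made-up population: every whole number from 1 to 1000. It draws a random sample and uses the sample mean to estimate the population mean. The estimate is usually close to the true value but generally not equal to it. The population, sample size and seed are all illustrative assumptions.

```python
import random
import statistics

# Hypothetical population: every value from 1 to 1000 (the complete data).
population = list(range(1, 1001))

# Draw a random sample (a portion of the population) without replacement.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, k=100)

population_mean = statistics.mean(population)  # exact value: 500.5
sample_mean = statistics.mean(sample)          # estimate from the sample only

print(f"population mean: {population_mean}")
print(f"sample mean (estimate): {sample_mean}")
print(f"estimation error: {abs(sample_mean - population_mean)}")
```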


In descriptive statistics, multiple methods are used to convert raw data into information.
Which method to use depends on the data type.

TYPES OF DATA:

1.NUMERICAL DATA:

Numerical data consists of real numbers.

Numerical data represents a quantity (how much or how many).

It is also called quantitative data.

TYPES OF NUMERICAL DATA:

1.CONTINUOUS:

Continuous data is always numerical data.

Pick any 2 valid data points in the dataset.

If every real number between the 2 picked data points is also a valid value, the data is called continuous data (for example, height or weight).

2.DISCRETE:

Discrete data has only a countable number of values between any 2 points.

If we pick any 2 valid data points, not every real number between them is a valid data point in the dataset (for example, the number of students in a class can be 30 or 31 but never 30.5).

2.CATEGORICAL DATA:

Categorical data consists of classes or categories that describe a quality or property.

It is also called qualitative data.

TYPES OF CATEGORICAL DATA:

1.ORDINAL:

Ordinal data is categorical data where there is an order among the categories / classes.

Example: the grade of an apple.

2.NOMINAL:

Nominal data is categorical data with no specified order among the classes / categories.

Example: colours.

3.BINARY:

Binary data has only 2 categories, which are opposites of each other.

Example: True / False.
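These data types can be mirrored in code. The sketch below maps each type onto a Python type:

- continuous data maps to `float`,
- discrete counts map to `int`,
- ordinal categories map to an ordered `IntEnum`,
- nominal categories map to plain strings,
- binary data maps to `bool`.

The `Apple` record, its fields and the `Grade` levels are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Grade(IntEnum):
    """Ordinal: categories with a meaningful order (hypothetical grade levels)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Apple:
    weight_grams: float   # numerical, continuous: any value in a range is valid
    seed_count: int       # numerical, discrete: countable whole numbers only
    grade: Grade          # categorical, ordinal: ordered categories
    colour: str           # categorical, nominal: no order among categories
    is_organic: bool      # categorical, binary: only True / False


apples = [
    Apple(152.7, 8, Grade.HIGH, "red", True),
    Apple(131.2, 5, Grade.LOW, "green", False),
    Apple(140.0, 6, Grade.MEDIUM, "yellow", True),
]

# Ordinal data has an order, so sorting by grade is meaningful.
best_first = sorted(apples, key=lambda a: a.grade, reverse=True)
print([a.grade.name for a in best_first])  # ['HIGH', 'MEDIUM', 'LOW']

# Ordering colours would not be meaningful, so nominal data is only
# compared for equality and counted.
print(sum(1 for a in apples if a.colour == "red"))  # 1
```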

NOTE:

Before applying any statistical method, first understand the data type.

STATISTICAL METHODS:

1.MEAN / AVERAGE:

The mean is a statistical method.

It gives a single number that represents the centre of the data.

It is used to compare multiple datasets for decision making.

Formula: mean = sum of all data points / number of data points.

The result is called a metric (by engineers) or a statistic (by statisticians).

APPLICATIONS OF MEAN:

Representing the central value of the data.

Comparing 2 datasets.
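A minimal sketch of the mean formula, followed by a comparison of 2 datasets by their means. The helper name `mean` and the daily sales figures for the two stores are hypothetical.

```python
def mean(values: list[float]) -> float:
    """Mean = sum of all data points / number of data points."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


# Hypothetical daily sales for two stores over five days.
store_a = [120, 135, 150, 110, 135]
store_b = [100, 160, 140, 125, 115]

mean_a = mean(store_a)  # 650 / 5 = 130.0
mean_b = mean(store_b)  # 640 / 5 = 128.0

print(f"Store A mean: {mean_a}")
print(f"Store B mean: {mean_b}")
print("Higher average sales:", "Store A" if mean_a > mean_b else "Store B")
```

Comparing the means is quick, but it summarises each dataset with one number. The median section below shows when that number can mislead.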

2.MEDIAN:

The median is a measure of the central tendency of the data.

It gives the middle value of the data after sorting.

The data must first be sorted in ascending order.

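There are 2 cases, where n is the number of data points:

(i) When n is odd,

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ term}$$

(ii) When n is even,

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ term} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ term}}{2}$$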
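The 2 cases can be sketched in code. The formulas give the 1-based position of the middle term(s), so this sketch converts those positions to Python's 0-based list indices. The function name `median` and the sample lists are illustrative.

```python
def median(values: list[float]) -> float:
    """Median computed with the odd / even cases."""
    if not values:
        raise ValueError("median requires at least one data point")
    data = sorted(values)  # the data must be sorted in ascending order
    n = len(data)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term, which is index (n + 1) // 2 - 1.
        return data[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th terms,
    # which are indices n // 2 - 1 and n // 2.
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median([7, 1, 5]))     # odd:  sorted [1, 5, 7] -> 5
print(median([7, 1, 5, 3]))  # even: sorted [1, 3, 5, 7] -> (3 + 5) / 2 = 4.0
```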

When outliers are present in the data, the median is a more reliable metric than the mean.

The mean is highly influenced by outliers.

The median is only slightly influenced, or not influenced at all, by outliers.

Outliers in the data distort the mean, which can lead to a misleading data analysis.
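A small, hypothetical example of this effect uses made-up salaries in thousands. Adding one extreme value moves the mean from 35 to about 95.8, but the median only moves from 35 to 36.5.

```python
import statistics

# Hypothetical salaries in thousands; the second list adds one outlier.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]

for label, data in [("without outlier", salaries), ("with outlier", with_outlier)]:
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    print(f"{label}: mean={mean_value:.1f}, median={median_value}")
```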

3.MODE:

The mode is the data point (term) that occurs most frequently in the data.

NOTE:

The mean and median are used for numerical data only.

The mode can be used for both numerical and categorical data.

A dataset can have multiple modes, but it has only one mean and one median.
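Python's `statistics.multimode` (Python 3.8+) returns every most-frequent value in order of first appearance. That makes it suitable for both categorical data and datasets with more than one mode. The example data below is made up.

```python
import statistics

# Categorical (nominal) data: the mode applies, the mean and median do not.
colours = ["red", "blue", "red", "green", "blue", "red"]
print(statistics.multimode(colours))  # ['red']

# Numerical data with two modes: 2 and 5 each appear twice.
scores = [1, 2, 2, 3, 5, 5]
print(statistics.multimode(scores))  # [2, 5]

# The same numerical data still has exactly one mean and one median.
print(statistics.mean(scores), statistics.median(scores))
```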
