Statistics Concepts

Uploaded by

vaidehi emani

Essential Statistics Concepts to Build a Basic Foundation for Modern Data Scientists 📊

In the world of data science, a few important ideas make the workflow
more efficient and serve as a powerful tool. These ideas help data
scientists make sense of all the information they work with.

Yes, it is none other than statistics: the foundational concepts on
which the data science process is built.

In this article, we are going to explore how statistical concepts
contribute to data science. Whether you’re new to data science or
have been doing it for a while, these ideas are like a guidebook. They
help you understand numbers better and use them to make smart
decisions.

So, let’s dive deep into the essential statistical ideas that make data
science so powerful.

First, let’s be clear about what data science is.

The name itself explains it: taking data and applying scientific
concepts such as statistics, probability, and calculus to derive
meaningful insights from it.

Data science is about understanding past information and
predicting future information.

Example:

Data science helps us predict the future, like a weather forecast
telling us whether it will rain tomorrow. It is not magic; it uses
numbers and machine learning. It’s about finding the truth in data. It
helps us answer questions and solve problems.

Now let’s look at why statistics is needed in data science and how it
contributes to it.

Statistics is the backbone of data science.

It provides the necessary tools, methods, and principles for data
scientists to explore, analyze, and extract valuable insights from
data. Without statistics, data science would lack the rigor and
reliability needed to make data-driven decisions and solve complex
problems.

It contributes to every stage of the data science process, such as:

✅Data Exploration and Summarization

✅Data Cleaning and Preprocessing

✅Inferential Analysis

✅Predictive Modeling

✅Feature Selection

✅Model Evaluation

✅Time Series Analysis


The areas of statistics that apply to data science can be broadly
classified as listed below.

1. Descriptive Statistics

2. Inferential Statistics

3. Regression Analysis
4. Data Sampling

5. Feature Selection

6. Statistical Evaluation of Models

1. Descriptive Statistics

Descriptive statistics is the branch of statistics that deals with
the presentation and summary of data. Its primary goal is to
provide a clear and concise overview of data, allowing for easier
interpretation and understanding.

It involves several concepts that make data easier to understand:

✅Mean (Average)- The arithmetic average of a numerical dataset: the
sum of the values divided by the number of values.

✅Median- The middle value of the sorted data. It describes the
center of the data more robustly than the mean because it is far less
affected by outliers.

✅Variance- Measures the spread of the data as the average squared
deviation from the mean.

✅Standard Deviation- The square root of the variance, providing
a more interpretable measure of data variability because it is in the
same units as the data.

✅Percentile- The value at or below which a given percentage of the
data points fall. For example, 90% of the data points are less than or
equal to the 90th percentile.

✅IQR (Interquartile range)- The range between the first quartile
and the third quartile, which covers the middle 50% of the data.

✅Histogram- A plot of the frequency or count of data points
falling into specific intervals (bins) along the horizontal axis.

✅PDF (Probability Density Function)- A statistical function
that describes the relative likelihood of a continuous random variable
taking values near a given point. The probability that the variable
falls within a given range is the area under the PDF over that range.

✅CDF (Cumulative Distribution Function)- A statistical function
that gives the cumulative probability that a random variable is less
than or equal to a specific value.

✅Skewness- Describes the asymmetry in the distribution of data.

✅Kurtosis- Measures the tailedness of the data distribution, that
is, how heavy its tails are.
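
As a rough illustration, most of the measures above can be computed with Python’s standard `statistics` module. This is only a sketch: the `data` list is a made-up sample, the 1.5 × IQR outlier rule is a common rule of thumb that this article does not mention, and `standardized_moment` is a hypothetical helper written for this example.

```python
import statistics

# Hypothetical sample: nine typical values plus one outlier (40).
data = [2, 4, 4, 5, 5, 5, 6, 7, 8, 40]

mean = statistics.mean(data)           # sum of values / number of values
median = statistics.median(data)       # middle of the sorted data
variance = statistics.pvariance(data)  # population variance (spread)
std_dev = statistics.pstdev(data)      # square root of the variance

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                          # width of the middle 50% of the data

# Rule of thumb (an assumption, not from the article): flag points
# more than 1.5 * IQR outside the quartiles as outliers.
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]


def standardized_moment(values, power):
    """Average of ((x - mean) / std) ** power over the data."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((x - m) / s) ** power for x in values) / len(values)


skewness = standardized_moment(data, 3)  # > 0 means a longer right tail
kurtosis = standardized_moment(data, 4)  # about 3 for a normal distribution

print(f"mean={mean:.2f} median={median} std={std_dev:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr} outliers={outliers}")
print(f"skewness={skewness:.2f} kurtosis={kurtosis:.2f}")
```

In this sample the single outlier pulls the mean well above the median, which is why the median is often the safer summary of the center when outliers are present.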


2. Inferential Statistics

Inferential statistics is the branch of statistics that uses sample
data to make inferences, predictions, or generalizations about
populations. It helps us draw conclusions or make statements about a
larger group (population) by analyzing a smaller, representative
subset of that group (sample).

✅Hypothesis Testing- Formulates hypotheses about population
parameters (e.g., the population mean) and uses sample data to test
whether the data are consistent with these hypotheses or provide
evidence against them.

✅Estimation- Estimates population parameters based on sample
data.

✅Confidence Interval- Provides a range of values within which a
population parameter is likely to fall, at a stated confidence level
(e.g., 95%).

✅Statistical Tests- A wide range of statistical tests, such as t-tests,
chi-squared tests, ANOVA, and regression analysis, are used in
inferential statistics to compare groups, assess relationships, and
make predictions.

✅Level of Significance- Often denoted by α, it represents
the probability of making a Type I error, i.e., incorrectly rejecting a
true null hypothesis.
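
To make these ideas concrete, here is a minimal sketch of estimation, a confidence interval, and a two-sided hypothesis test, using only the Python standard library. It makes several assumptions that are not in the article: the sample is simulated, the hypothesized mean `mu0` is invented, and a large-sample z-test (normal approximation) stands in for the t-test mentioned above, because the standard library has no t-distribution.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(0)
# Hypothetical sample: 200 simulated measurements (true mean 52, std 10).
sample = [random.gauss(52, 10) for _ in range(200)]

n = len(sample)
sample_mean = statistics.mean(sample)                 # point estimate
std_error = statistics.stdev(sample) / math.sqrt(n)   # standard error

alpha = 0.05                                   # level of significance
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05

# 95% confidence interval for the population mean (normal approximation).
ci = (sample_mean - z_crit * std_error, sample_mean + z_crit * std_error)

# Two-sided test of H0: population mean == mu0 (mu0 is a made-up value).
mu0 = 50
z = (sample_mean - mu0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(f"estimate={sample_mean:.2f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
print(f"z={z:.2f}  p-value={p_value:.4f}  reject H0: {reject_h0}")
```

For this kind of test, rejecting H0 at level α is equivalent to `mu0` lying outside the (1 - α) confidence interval, so the two outputs always agree.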


3. Regression Analysis

Regression analysis is a statistical technique used in data science
to quantify the relationship between one or
more independent variables (predictors) and a dependent
variable (outcome) in order to make predictions or understand the
impact of the predictors on the outcome.

✅Linear Regression- Models the relationship between a dependent
variable and one or more independent variables by fitting a linear
equation to the data.

✅Multiple Regression- Incorporates two or more independent
variables to predict a single dependent variable.

✅Polynomial Regression- When the relationship between variables
appears to be nonlinear, this model fits a polynomial (e.g., quadratic
or cubic) equation to the data.

✅Ridge Regression and Lasso Regression- Variations of linear
regression that incorporate regularization techniques to handle
multicollinearity and prevent overfitting.
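
The sketch below fits a simple linear regression with the closed-form least-squares formulas, then shows how a ridge penalty shrinks the slope. The data (hours studied versus exam score), the function name `fit_line`, and the penalty value are all invented for illustration. In practice a library such as scikit-learn would normally be used instead.

```python
def fit_line(xs, ys, ridge_penalty=0.0):
    """Fit y = intercept + slope * x by least squares.

    With ridge_penalty > 0 the slope is shrunk toward zero
    (ridge regression with an unpenalized intercept).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + ridge_penalty)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (predictor) and exam score (outcome).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
predicted_score = intercept + slope * 6   # prediction for 6 hours of study

# The same fit with an arbitrary penalty of 10 gives a smaller slope.
ridge_slope, ridge_intercept = fit_line(hours, scores, ridge_penalty=10.0)

print(f"OLS:   score = {intercept:.2f} + {slope:.2f} * hours")
print(f"Ridge: score = {ridge_intercept:.2f} + {ridge_slope:.2f} * hours")
print(f"Predicted score for 6 hours: {predicted_score:.1f}")
```

The slope is read as "each extra hour of study is associated with about 4.1 more points" in this made-up data. The ridge fit trades a little bias for a more stable coefficient.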

4. Data Sampling

Data sampling is a statistical technique used in data science to select
a subset of data points from a larger dataset. The purpose of
sampling is to make data analysis more manageable, cost-effective,
and practical, especially when working with large datasets.

✅Random Sampling- In this method, every item or member in the
population has an equal chance of being selected for the sample. It
reduces selection bias and tends to produce a sample that is
representative of the population.

✅Stratified Sampling- The population is divided into subgroups
or strata based on certain characteristics (e.g., age, gender, location).
Then, random sampling is performed within each stratum to ensure
representation of all groups.

✅Systematic Sampling- The starting point is randomly chosen,
and then every “kth” item is included in the sample. It’s simple and
often more convenient to carry out than simple random sampling.
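
The three sampling methods above can be sketched with Python’s built-in `random` module. The population below is invented for the example (100 records, 70 tagged "north" and 30 tagged "south"), and the 10% sampling rate is an arbitrary choice.

```python
import random
from collections import defaultdict

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical population: 100 records, each belonging to one region.
population = [
    {"id": i, "region": "north" if i < 70 else "south"} for i in range(100)
]

# Simple random sampling: every record has the same chance of selection.
simple_sample = random.sample(population, k=10)

# Stratified sampling: group by region, then sample 10% within each group.
strata = defaultdict(list)
for record in population:
    strata[record["region"]].append(record)

stratified_sample = []
for members in strata.values():
    k = round(len(members) * 0.10)
    stratified_sample.extend(random.sample(members, k))

# Systematic sampling: random starting point, then every k-th record.
k_step = len(population) // 10
start = random.randrange(k_step)
systematic_sample = population[start::k_step]

print([r["id"] for r in simple_sample])
print([(r["id"], r["region"]) for r in stratified_sample])
print([r["id"] for r in systematic_sample])
```

Note that the stratified sample always contains 7 "north" and 3 "south" records, matching the population proportions, while the simple random sample only matches them on average.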


5. Feature Selection

Feature selection uses statistical techniques to guide the selection
of relevant features (variables) for predictive modeling. Techniques
like feature importance and correlation analysis help data
scientists choose the most influential factors.

✅Correlation-Based Feature Selection- Selects features based
on their correlation with the target variable, and removes features
that are redundant because they are highly correlated with one another.

✅Tree-Based Feature Importance- Decision tree and ensemble

models (e.g., Random Forest, Gradient Boosting) can provide

feature importance scores, which can be used to select the most
important features.

✅Mutual Information- Measures the dependency between

features and the target variable, selecting features with high mutual
information.

✅L1 Regularization (Lasso)- Encourages sparsity in the model by

penalizing the absolute values of feature coefficients, effectively

selecting a subset of features.
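
As one possible illustration of correlation-based feature selection, the sketch below keeps features that correlate with the target and drops any feature that is nearly a copy of one already kept. Everything here is an assumption made for the example: the toy housing data, the function names `pearson` and `select_features`, and the two thresholds (0.5 and 0.95).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def select_features(features, target, min_target_corr=0.5, max_pair_corr=0.95):
    """Keep relevant features, skipping ones redundant with a kept feature."""
    # Visit the features most correlated with the target first.
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_target_corr:
            continue  # too weakly related to the target
        redundant = any(
            abs(pearson(features[name], features[kept])) > max_pair_corr
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical data: the target is a house price.
price = [10, 20, 30, 40, 50, 60]
features = {
    "area_m2": [50, 60, 70, 80, 90, 100],         # strongly related to price
    "area_ft2": [538, 646, 753, 861, 969, 1076],  # same information as area_m2
    "house_no": [5, 1, 4, 2, 6, 3],               # unrelated to price
}

selected = select_features(features, price)
print(selected)
```

Only one of the two area columns survives, because the second adds no new information, and the unrelated house number is dropped for its weak correlation with the price.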


6. Statistical Evaluation of Models

Statistical evaluation uses various statistical metrics and tests to
measure quantitatively how well a model performs.

✅Accuracy- Accuracy measures the proportion of correctly

classified instances in a classification model.

✅Mean Absolute Error (MAE)- MAE measures the average

absolute difference between the predicted values and the actual

values.
✅Mean Squared Error (MSE)- MSE calculates the average of the

squared differences between predicted and actual values.

✅Root Mean Squared Error (RMSE)- RMSE is the square root

of MSE, providing an interpretable metric in the same units as the

target variable.

✅R-squared (R²) or Coefficient of Determination- R²

measures the proportion of the variance in the dependent variable

that is explained by the independent variables in the model.

✅Area Under the Receiver Operating Characteristic Curve (ROC
AUC)- Measures the area under the receiver operating
characteristic curve, which plots the true positive rate (recall)
against the false positive rate at various thresholds.

✅Confusion Matrix- A table that shows the number of true

positives, true negatives, false positives, and false negatives,

providing detailed insights into the performance of a classification
model.

✅Precision- Measures the ratio of true positive predictions to the

total positive predictions, emphasizing the model’s ability to avoid

false positives.
✅Recall- Measures the ratio of true positives to the total actual

positives, emphasizing the model’s ability to find all relevant

instances.

✅F1-Score- The harmonic mean of precision and recall, offering a

balance between the two metrics.
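
Most of the metrics above reduce to a few lines of arithmetic. The sketch below computes them by hand on small made-up label and prediction lists, so the numbers are illustrative only. ROC AUC is left out because it needs predicted scores rather than hard labels.

```python
import math

# --- Classification metrics (hypothetical labels, 1 = positive class) ---
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Confusion matrix counts.
pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# --- Regression metrics (hypothetical actual and predicted values) ---
actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]

errors = [p - a for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)        # same units as the target variable

mean_actual = sum(actual) / len(actual)
ss_res = sum(e ** 2 for e in errors)                 # unexplained variation
ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
r_squared = 1 - ss_res / ss_tot

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy} precision={precision} recall={recall} f1={f1}")
print(f"MAE={mae} MSE={mse} RMSE={rmse:.3f} R2={r_squared:.2f}")
```

The classification example has 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, so accuracy is 0.8 while precision, recall, and F1 are all 0.75.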
