20ACS04 –
PROBLEM SOLVING AND
PROGRAMMING USING
PYTHON
PREPARED BY
Mr. P. NANDAKUMAR
ASSISTANT PROFESSOR,
DEPARTMENT OF INFORMATION TECHNOLOGY,
SVCET.
COURSE CONTENT
UNIT-V INTRODUCTION TO NUMPY, PANDAS,
MATPLOTLIB
Exploratory Data Analysis (EDA), Data Science life cycle,
Descriptive Statistics, Basic tools (plots, graphs and summary
statistics) of EDA, Philosophy of EDA. Data Visualization: Scatter
plot, bar chart, histogram, boxplot, heat maps, etc.
EXPLORATORY DATA ANALYSIS (EDA)
Exploratory Data Analysis (EDA) is an approach for analyzing data to
discover trends and patterns, or to check assumptions, with the help of
statistical summaries and graphical representations.
Types of EDA
Depending on the number of columns (variables) analyzed at a time, EDA
can be divided into three types:
1. Univariate analysis
2. Bivariate analysis
3. Multivariate analysis
1. Univariate analysis – We analyze only one variable at a time. This is
the simplest form of analysis, since the data involves only one quantity
that changes. It does not deal with causes or relationships; its main
purpose is to describe the data and find patterns within it.
2. Bivariate analysis – The data involves two different variables. The
analysis deals with causes and relationships, and its aim is to find the
relationship between the two variables.
3. Multivariate analysis – The data involves three or more variables,
which are analyzed together.
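The three types can be illustrated with a short sketch. The data set below (hours studied, exam score, and hours of sleep for five students) is hypothetical and chosen only for illustration; the correlation coefficient is one possible way to measure a relationship, not the only one.

```python
import numpy as np

# Hypothetical data set: three variables (columns) measured on five students.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hours studied
score = np.array([52.0, 58.0, 65.0, 70.0, 78.0])   # exam score
sleep = np.array([8.0, 7.5, 7.0, 6.0, 5.5])        # hours of sleep

# Univariate: describe one variable on its own.
print("Mean score:", np.mean(score))

# Bivariate: measure the relationship between two variables.
r = np.corrcoef(hours, score)[0, 1]
print("Correlation(hours, score):", round(r, 3))

# Multivariate: look at three or more variables together.
corr_matrix = np.corrcoef([hours, score, sleep])
print(corr_matrix.round(3))
```

Each row and column of the final matrix corresponds to one variable, so the entry in row i, column j is the correlation between variable i and variable j.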
Depending on the method of analysis, EDA can also be divided into two
categories:
1. Non-graphical analysis – We analyze the data using summary statistics
such as the mean, median, mode, or skewness.
2. Graphical analysis – We use charts and other visualizations to reveal
trends and patterns in the data.
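As a minimal sketch of non-graphical analysis, the summary statistics named above can be computed with Python's standard library. The sample values are hypothetical, and the skewness shown is the population (moment) skewness, which is only one of several common definitions.

```python
import statistics

# Hypothetical sample with one large value that pulls the tail to the right.
data = [2, 3, 3, 4, 5, 6, 19]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
std = statistics.pstdev(data)  # population standard deviation

# Population (moment) skewness: the average cubed z-score.
skewness = sum(((x - mean) / std) ** 3 for x in data) / len(data)

print("Mean =", mean)
print("Median =", median)
print("Mode =", mode)
print("Skewness =", round(skewness, 3))
```

A positive skewness, together with a mean larger than the median, suggests a long right tail in this sample.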
DATA SCIENCE LIFECYCLE
The data science lifecycle revolves around the use of machine learning
and other analytical methods to produce insights and predictions from
data in order to achieve a business objective.
The complete process includes a number of steps, such as data cleaning,
preparation, modelling, and model evaluation. It is a lengthy process
and may take several months to complete.
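The steps named above can be sketched on a toy scale. Everything in this example is an assumption made for illustration: the records are hypothetical, the model is a simple straight-line fit, and a real project would involve far more work at each step.

```python
import numpy as np

# Hypothetical raw records: (advertising spend, sales); None marks a missing value.
raw = [(1, 12), (2, 19), (3, None), (4, 41), (5, 52),
       (None, 30), (6, 58), (7, 71), (8, 79)]

# 1. Data cleaning: drop records with missing values.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 2. Preparation: convert to arrays and hold out the last two records for testing.
x = np.array([record[0] for record in clean], dtype=float)
y = np.array([record[1] for record in clean], dtype=float)
x_train, y_train = x[:-2], y[:-2]
x_test, y_test = x[-2:], y[-2:]

# 3. Modelling: fit a straight line y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x_train, y_train, 1)

# 4. Model evaluation: mean absolute error on the held-out records.
predictions = slope * x_test + intercept
mae = np.mean(np.abs(predictions - y_test))
print("Slope =", round(slope, 2), "Intercept =", round(intercept, 2), "MAE =", round(mae, 2))
```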
The following are some of the main reasons for using data science:
• It helps convert large quantities of raw, unstructured data into
meaningful insights.
• It can assist in making predictions, for example for surveys and elections.
• It helps automate transportation, for example in developing self-driving
cars, which are often described as the future of transportation.
• Companies are shifting towards data science. Amazon, Netflix, and others
that deal with large quantities of data use data science algorithms to
improve the customer experience.
THE LIFECYCLE OF DATA SCIENCE
DESCRIPTIVE STATISTICS
In descriptive statistics, we describe our data with the help of
representations such as charts, graphs, tables, and Excel files, and
present it in a meaningful way so that it can be easily understood.
It is most often performed on small data sets, and its findings can help
in anticipating future trends.
Types of descriptive statistics:
• Measures of central tendency
• Measures of variability
Measure of central tendency:
A measure of central tendency represents the whole data set by a single
value that gives the location of its central point. There are three main
measures of central tendency:
1. Mean
2. Mode
3. Median
Mean:
It is the sum of the observations divided by the total number of
observations. It is also called the average.
Mean = (x1 + x2 + ... + xn) / n
where n = number of terms
Python code to find the mean:
import numpy as np
# Sample data
arr = [5, 6, 11]
# Mean: (5 + 6 + 11) / 3
mean = np.mean(arr)
print("Mean =", mean)
Mode:
It is the value that has the highest frequency in the given data set. The data
set has no mode if all values occur with the same frequency, and it has more
than one mode if two or more values share the highest frequency.
Python code to find the mode:
from scipy import stats
# Sample data
arr = [1, 2, 2, 3]
# Mode: stats.mode returns the most frequent value and its count;
# if several values tie, only the smallest is returned
result = stats.mode(arr)
print("Mode =", result.mode)
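The cases of several modes and of no mode can be checked with `statistics.multimode` from the standard library (Python 3.8 or later). This is a small sketch with hypothetical samples; note that `multimode` returns every value when all frequencies are equal, which corresponds to the "no mode" case described above.

```python
import statistics

# Hypothetical samples covering the three cases described above.
one_mode = [1, 2, 2, 3]
two_modes = [1, 1, 2, 2, 3]
all_same_frequency = [1, 2, 3]

# multimode returns every value that shares the highest frequency.
print(statistics.multimode(one_mode))            # [2]
print(statistics.multimode(two_modes))           # [1, 2]
print(statistics.multimode(all_same_frequency))  # [1, 2, 3]
```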
Median:
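It is the middle value of the sorted data set. It splits the data into two
halves. If the number of elements in the data set is odd, the center element
is the median; if it is even, the median is the average of the two central
elements.
With the data sorted in ascending order:
Median = ((n + 1) / 2)th term, if n is odd
Median = average of the (n / 2)th and ((n / 2) + 1)th terms, if n is even
where n = number of terms
Python code to find the median:
import numpy as np
# Sample data
arr = [1, 2, 3, 4]
# Median: n is even, so the result is the average of 2 and 3
median = np.median(arr)
print("Median =", median)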
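The odd and even cases can also be written out by hand. The helper below is a hypothetical illustration, not part of any library, and it assumes a non-empty list; the key point is that the data must be sorted before the middle element is picked.

```python
def median(values):
    """Return the median by sorting and picking the middle element(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single central element.
        return ordered[middle]
    # Even count: the average of the two central elements.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```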
Measure of variability:
A measure of variability describes the spread of the data, that is, how
widely the values are distributed. The most common measures of variability are:
1. Range
2. Variance
3. Standard deviation
Range:
The range is the difference between the largest and smallest data points
in the data set. The bigger the range, the greater the spread of the data,
and vice versa.
Range = largest data value – smallest data value
Python code to find the range:
# Sample data
arr = [1, 2, 3, 4, 5]
# Find the maximum
maximum = max(arr)
# Find the minimum
minimum = min(arr)
# Difference between the maximum and the minimum
data_range = maximum - minimum
print("Maximum = {}, Minimum = {} and Range = {}".format(maximum, minimum, data_range))
Variance:
It is defined as the average squared deviation from the mean. It is
calculated by finding the difference between every data point and the mean,
squaring the differences, adding them all up, and then dividing by the number
of data points in the data set.
Variance = Σ (xi – u)² / N
where N = number of terms
u = mean
Python code to find the variance:
import statistics
# Sample data
arr = [1, 2, 3, 4, 5]
# Population variance (divides by N, as in the definition above);
# statistics.variance would divide by N - 1 instead
print("Var =", statistics.pvariance(arr))
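The definition above divides by N, which gives the population variance. Libraries also offer the sample variance, which divides by N - 1, so the two can give different answers for the same data. A short comparison, using the same sample data:

```python
import statistics
import numpy as np

arr = [1, 2, 3, 4, 5]

# Population variance: divide the sum of squared deviations by N.
population_var = statistics.pvariance(arr)
population_var_np = np.var(arr)          # ddof=0 is NumPy's default

# Sample variance: divide by N - 1 instead.
sample_var = statistics.variance(arr)
sample_var_np = np.var(arr, ddof=1)

print("Population variance:", population_var, population_var_np)
print("Sample variance:", sample_var, sample_var_np)
```

For this data the sum of squared deviations is 10, so the population variance is 10 / 5 = 2 and the sample variance is 10 / 4 = 2.5.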
Standard Deviation:
It is defined as the square root of the variance. It is calculated by finding
the mean, subtracting the mean from each value and squaring the result, adding
all the squared values, dividing by the number of terms, and finally taking
the square root.
Standard deviation = √( Σ (xi – u)² / N )
where N = number of terms
u = mean
Python code to find the standard deviation:
import statistics
# Sample data
arr = [1, 2, 3, 4, 5]
# Population standard deviation (divides by N, as in the definition above);
# statistics.stdev would divide by N - 1 instead
print("Std =", statistics.pstdev(arr))
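The calculation steps described above can be followed one at a time. This sketch uses only the standard library and divides by N (the population form), in line with the definition given for the variance.

```python
import math

arr = [1, 2, 3, 4, 5]

# Step 1: find the mean.
mean = sum(arr) / len(arr)

# Step 2: subtract the mean from each value and square the result.
squared_deviations = [(x - mean) ** 2 for x in arr]

# Step 3: average the squared deviations (this is the population variance).
variance = sum(squared_deviations) / len(arr)

# Step 4: take the square root.
std = math.sqrt(variance)

print("Variance =", variance)
print("Std =", std)
```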
BASIC TOOLS OF EDA
TYPES OF EXPLORATORY DATA ANALYSIS:
1. Univariate non-graphical - This is the simplest form of data analysis, as
it uses just one variable to analyze the data. The standard goal of
univariate non-graphical EDA is to understand the underlying sample
distribution and make observations about the population. Outlier
detection is also part of the analysis.
2. Multivariate non-graphical - Multivariate non-graphical EDA techniques
are generally used to show the relationship between two or more variables
in the form of either cross-tabulation or statistics.
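A cross-tabulation counts how often each combination of two categorical variables occurs. The sketch below builds one with the standard library; the survey records are hypothetical and serve only to show the idea.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred language).
records = [
    ("F", "Python"), ("M", "R"), ("F", "Python"),
    ("M", "Python"), ("F", "R"), ("M", "Python"),
]

# Count how often each combination of the two variables occurs.
counts = Counter(records)

rows = sorted({gender for gender, _ in records})
cols = sorted({language for _, language in records})

# Print the cross-tabulation as a small table.
print("Gender", *cols, sep="\t")
for gender in rows:
    print(gender, *(counts[(gender, language)] for language in cols), sep="\t")
```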
3. Univariate graphical - Non-graphical methods are quantitative and
objective, but they cannot give a complete picture of the data;
graphical methods, which involve a degree of subjective analysis, are
therefore also required. Common types of univariate graphics are:
• Histogram
• Stem-and-leaf plots
• Boxplots
• Quantile-normal plots
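To see what a histogram and a boxplot actually display, the numbers behind them can be computed directly. This is a sketch on a hypothetical sample; the 1.5 × IQR rule used for the outlier fences is the common boxplot convention, assumed here rather than taken from the slides.

```python
import numpy as np

# Hypothetical sample with one unusually large value.
data = np.array([2, 3, 3, 4, 5, 5, 6, 7, 8, 20])

# Histogram: count how many values fall into each of four equal-width bins.
counts, bin_edges = np.histogram(data, bins=4)
print("Bin edges:", bin_edges)
print("Counts:", counts)

# Boxplot: quartiles, interquartile range, and the 1.5 * IQR outlier fences.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print("Q1 =", q1, "Median =", median, "Q3 =", q3)
print("Outliers:", outliers)
```

Here the value 20 lies above the upper fence, so a boxplot of this sample would draw it as a separate point.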
4. Multivariate graphical - Multivariate graphical EDA uses graphics to
display relationships between two or more sets of data. The most commonly
used is the grouped bar plot, with each group representing one level of
one of the variables and each bar within a group representing a level of
the other variable.
Other common types of multivariate graphics are:
• Scatterplot
• Run chart
• Heat map
• Multivariate chart
• Bubble chart
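A grouped bar plot of the kind described above can be sketched with Matplotlib. The department names, years, and counts are hypothetical, and the output file name `grouped_bar.png` is an arbitrary choice; the `Agg` backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical counts: one group per department, one bar per year within each group.
departments = ["IT", "CSE", "ECE"]
counts_2022 = [30, 45, 25]
counts_2023 = [35, 40, 30]

positions = np.arange(len(departments))
width = 0.35

fig, ax = plt.subplots()
# Shift the two sets of bars left and right so they sit side by side in each group.
bars_2022 = ax.bar(positions - width / 2, counts_2022, width, label="2022")
bars_2023 = ax.bar(positions + width / 2, counts_2023, width, label="2023")

ax.set_xticks(positions)
ax.set_xticklabels(departments)
ax.set_xlabel("Department")
ax.set_ylabel("Number of students")
ax.legend()
fig.savefig("grouped_bar.png")
```

Each group on the x-axis is one level of the first variable (department), and each bar within a group is one level of the second variable (year).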
TOOLS REQUIRED FOR EXPLORATORY DATA ANALYSIS:
• R: An open-source programming language and free software environment
for statistical computing and graphics, supported by the R Foundation for
Statistical Computing.
• Python: An interpreted, object-oriented programming language with
dynamic semantics. Its high-level, built-in data structures, combined with
dynamic binding, make it very attractive for rapid application development,
as well as for use as a scripting or glue language to connect existing
components.