
UNIT II
PROGRAMMING TOOLS FOR DATA SCIENCE

SYLLABUS
 Matplotlib
 Bar Charts
 Line Charts
 Scatterplots
 Working with Data: Exploring Data
 Cleaning and Munging
 Manipulating Data
 Rescaling
 Dimensionality Reduction
Matplotlib
(Multi-platform data visualization library)

 Matplotlib is a multi-platform data visualization library built on NumPy
arrays and designed to work with the broader SciPy stack.
 It was introduced by John Hunter in 2002.
 One of the greatest benefits of visualization is that it gives us visual
access to huge amounts of data in an easily digestible form.
Matplotlib
What is data visualization?
 Data visualization is the graphical representation of information and data.
 By using visual elements like charts, graphs, and maps, data visualization tools provide
an accessible way to see and understand trends, outliers, and patterns in data.
Matplotlib
DATA VISUALIZATION

There are five key plots that are used for data visualization.
Matplotlib

 Matplotlib is an easy-to-use and powerful visualization library in Python.


 It is built on NumPy arrays and designed to work with the broader SciPy stack and
consists of several plots like line, bar, scatter, histogram, etc.
Matplotlib Architecture
There are three different layers in the architecture of matplotlib, which are the
following:
 Backend Layer
 Artist layer
 Scripting layer
Matplotlib

Backend layer
 The backend layer is the bottom layer of the figure, which consists of the
implementation of the various functions that are necessary for plotting.
 There are three essential classes in the backend layer:
 FigureCanvas (the surface on which the figure is drawn),
 Renderer (the class that takes care of drawing on the surface), and
 Event (handles mouse and keyboard events).
Matplotlib

Artist Layer
 The artist layer is the second layer in the architecture.
 It is responsible for the various plotting elements, such as the Axes,
which coordinate how the renderer draws on the figure canvas.
Scripting layer
 The scripting layer (pyplot) is the topmost layer, on which most of our code
will run.
 The methods in the scripting layer almost automatically take care of the
other layers, and all we need to care about is the current state (figure & axes).
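To make the three layers concrete, here is a minimal sketch that uses each layer's classes directly instead of pyplot (the Agg backend is just one choice of backend):

import matplotlib
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure()                  # artist layer: the top-level container
canvas = FigureCanvasAgg(fig)   # backend layer: the drawing surface + renderer
ax = fig.add_subplot()          # artist layer: the Axes that owns the plot
ax.plot([1, 2, 3], [1, 4, 9])
fig.savefig("layers_demo.png")  # asks the backend's renderer to draw the figure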
Matplotlib

Example:
 We will be plotting two lists containing the X, Y coordinates for the plot.
Pyplot :
 Pyplot is a Matplotlib module that provides a MATLAB-like interface.
 Pyplot provides functions that interact with the figure, i.e., it creates a figure, decorates
the plot with labels, and creates a plotting area in a figure.
Syntax:
 matplotlib.pyplot.plot(*args, scalex=True, scaley=True, data=None, **kwargs)
Matplotlib

SAMPLE CODE

import matplotlib.pyplot as plt

# initializing the data
x = [10, 20, 30, 40]
y = [20, 30, 40, 50]

# plotting the data
plt.plot(x, y)

# Adding the title
plt.title("Simple Plot")

# Adding the labels
plt.ylabel("y-axis")
plt.xlabel("x-axis")

plt.show()
Matplotlib

In the above example, the elements of x and y provide the coordinates for the x-axis
and the y-axis, and a straight line is plotted against those coordinates.
E.g. Matplotlib
• Matplotlib makes easy things easy.
• You can generate plots, histograms, power spectra, bar
charts, error charts, scatterplots, etc., with just a few lines
of code.
Bar Charts

A bar plot or bar chart is a graph that represents categories of data
with rectangular bars whose lengths or heights are proportional to the
values they represent.
 Bar plots can be drawn horizontally or vertically.
 A bar chart describes comparisons between discrete categories.
 It can be created using the bar() method.
 A legend (typically in the upper right corner) is used to describe the
elements for a particular area of a graph.
 Matplotlib has a function called legend() which is used to place a legend on
the plot.
Bar Charts

Syntax: plt.bar(x, height, width, bottom, align)

 x: the sequence of horizontal coordinates of the bars.
 height: the height(s) of the bars.
 width: optional; the width(s) of the bars, with default value 0.8.
 bottom: optional; the y coordinate(s) of the bar bases, with default value 0.
 align: optional; the alignment of the bars relative to the x coordinates.
Bar Charts

Types of Bar Charts


1. Horizontal bar chart
2. Column or Vertical Bar Charts
3. Stacked Bar Chart
4. Grouped Bar Chart
Types of Bar Charts

(Figure: examples of a horizontal bar chart, a vertical bar chart, a grouped bar chart, and a stacked bar chart; a sketch of all four follows.)
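As a minimal sketch (with made-up category data), all four types can be produced with bar(), barh(), offset x positions for grouping, and the bottom parameter for stacking:

import numpy as np
import matplotlib.pyplot as plt

# made-up category data for illustration
labels = ["A", "B", "C"]
g1 = np.array([3, 5, 2])
g2 = np.array([4, 1, 6])
x = np.arange(len(labels))

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
axes[0].bar(labels, g1)              # vertical (column) bar chart
axes[1].barh(labels, g1)             # horizontal bar chart
axes[2].bar(x - 0.2, g1, width=0.4)  # grouped: bars placed side by side
axes[2].bar(x + 0.2, g2, width=0.4)
axes[2].set_xticks(x)
axes[2].set_xticklabels(labels)
axes[3].bar(labels, g1)              # stacked: second series sits on top
axes[3].bar(labels, g2, bottom=g1)
plt.tight_layout()
plt.show()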
Bar Charts

import matplotlib.pyplot as plt

# data to display on plots
x = [3, 1, 3, 12, 2, 4, 4]
y = [3, 2, 1, 4, 5, 6, 7]

# This will plot a simple bar chart
plt.bar(x, y)

# Title to the plot
plt.title("Bar Chart")

# Adding the legends
plt.legend(["bar"])
plt.show()
Bar Charts

 Advantages:
 Summarize large data sets
 Performance tracking
 Accessible to all audiences
 Disadvantages:
 Too simple
 Too easily manipulated
Line Charts
A line plot or line graph is a graph that represents data with line
segments whose positions are proportional to the values they
represent.
 Line plots can be drawn horizontally or vertically.
 A line chart is typically used to show trends and comparisons across
categories or over time.
 It can be created using the plot() method.
Types of Line Graphs

 Simple Line Graph
 Multiple Line Graph
 Compound Line Graph
Line Charts
Different Parts of a Line Graph
 Title
 Scale
 Labels
 Lines
 Data values
Line Charts

SYNTAX:
plt.plot(x, y)

 x, y: these parameters are the horizontal and
vertical coordinates of the data points.
Line Charts
# importing the required libraries
import matplotlib.pyplot as plt
import numpy as np

# define data values
x = np.array([1, 2, 3, 4])  # X-axis points
y = x * 2                   # Y-axis points

plt.plot(x, y)  # Plot the chart
plt.show()      # display
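Building on this, a minimal sketch of a multiple line graph simply calls plot() more than once on the same axes:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4])
plt.plot(x, x * 2, label="y = 2x")    # first line
plt.plot(x, x ** 2, label="y = x^2")  # second line on the same axes
plt.legend()
plt.show()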
Scatter Plot

 Scatter plots are graphs that represent the relationship between two
variables in a dataset.
 A scatter plot represents data points on a two-dimensional plane or on a
Cartesian system.
 The independent variable or attribute is plotted on the X-axis, while the
dependent variable is plotted on the Y-axis.
 These plots are often called scatter graphs or scatter diagrams.
 Scatter plots are used to observe the relationship between variables, using
dots to represent it.
 The scatter() method in the matplotlib library is used to draw a scatter plot.
Scatter Plot

 A scatter plot is also called a scatter chart, scattergram, or XY graph. The scatter
diagram graphs numerical data pairs, with one variable on each axis, to show their relationship.

 Scatter plots are used in either of the following situations:

 When we have paired numerical data

 When there are multiple values of the dependent variable for a unique value of an independent variable

 When determining the relationship between variables in some scenarios, such as identifying potential
root causes of problems, or checking whether two products that appear to be related share the same
cause, and so on.
Scatter Plot

Scatter Plot Uses and Examples

 Scatter plots can convey a large volume of data at a glance.

They are beneficial in the following situations:

 When a large set of data points is given

 When each point comprises a pair of values

 When the given data is in numeric form

Scatter Plot Correlation

Types of correlation
 A scatter plot shows the correlation between two attributes or variables. It
represents how closely the two variables are connected.
 There are three possible situations for the relation between the two
variables:
 Positive Correlation
 Negative Correlation
 No Correlation
Positive Correlation

 When the points in the graph are rising, moving from left to right, then the scatter plot
shows a positive correlation.

 It means the values of one variable are increasing with respect to another. Now
positive correlation can further be classified into three categories:

 Perfect Positive – the points form a perfectly straight line

 High Positive – the points lie close to one another

 Low Positive – the points are widely scattered


(Figure: examples of positive correlation)
Negative Correlation

 When the points in the scatter graph fall while moving left to right, then it is called a
negative correlation.

 It means the values of one variable are decreasing with respect to another.

These are also of three types:

 Perfect Negative – the points form a perfectly straight line

 High Negative – the points lie close to one another

 Low Negative – the points are widely scattered


(Figure: examples of negative correlation)
No Correlation

 When the points are scattered all over the graph and it is difficult to conclude
whether the values are increasing or decreasing, then there is no correlation
between the variables.
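As a minimal sketch (with randomly generated data), the three situations look like this:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=100)
noise = rng.normal(scale=0.5, size=100)

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
axes[0].scatter(x, x + noise)             # positive: rises moving left to right
axes[1].scatter(x, -x + noise)            # negative: falls moving left to right
axes[2].scatter(x, rng.normal(size=100))  # no correlation: random scatter
for ax, title in zip(axes, ["Positive", "Negative", "No correlation"]):
    ax.set_title(title)
plt.tight_layout()
plt.show()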
Scatter Plot

import matplotlib.pyplot as plt

# data to display on plots
x = [3, 1, 3, 12, 2, 4, 4]
y = [3, 2, 1, 4, 5, 6, 7]

# This will plot a simple scatter chart
plt.scatter(x, y)

# Adding legend to the plot (a list of labels, one per plotted series)
plt.legend(["A"])

# Title to the plot
plt.title("Scatter chart")
plt.show()
Scatter Plot

The line drawn in a scatter plot that passes closest to almost all the points is
known as the "line of best fit" or "trend line"; a sketch of fitting one follows.
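A minimal sketch, fitting a degree-1 polynomial with np.polyfit (one common way to compute a trend line, not the only one):

import numpy as np
import matplotlib.pyplot as plt

x = np.array([3, 1, 3, 12, 2, 4, 4])
y = np.array([3, 2, 1, 4, 5, 6, 7])

plt.scatter(x, y)
slope, intercept = np.polyfit(x, y, 1)  # least-squares line of best fit
xs = np.sort(x)
plt.plot(xs, slope * xs + intercept, color="red", label="trend line")
plt.legend()
plt.show()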
Working with Data: Exploring Data
What is Data Exploration?
 Data exploration refers to the initial step in data analysis. Data analysts
use data visualization and statistical techniques to describe dataset
characteristics, such as size, quantity, and accuracy, to understand
the nature of the data better.
 Data exploration techniques include both manual analysis
and automated data exploration software solutions that
visually explore and identify relationships between different
data variables, the structure of the dataset, the presence of
outliers, and the distribution of data values to reveal patterns
and points of interest, enabling data analysts to gain greater
insight into the raw data.
 Data is often gathered in large, unstructured volumes from
various sources.
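In Python, a first pass at exploration is often just a few pandas calls; a minimal sketch (the file name is hypothetical):

import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical dataset

print(df.shape)       # size: (rows, columns)
df.info()             # column types and non-null counts
print(df.describe())  # summary statistics for numeric columns
print(df.head())      # first few rows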
Why is Data Exploration Important?
 Humans process visual data better than numerical data.
 Therefore it is extremely challenging for data scientists and
data analysts to assign meaning to thousands of rows and
columns of data points, and to communicate that meaning without
any visual components.
 The building blocks of data visualization are shapes, dimensions,
colors, lines, points, and angles.
Data Munging

 Data munging is the general technique of transforming data from an unusable or erroneous form into a useful form.

 Basically, the procedure of cleansing the data manually is known as data munging.

 Data munging is the practice of preparing data sets for reporting and analysis.

 Data munging is a fundamental step in data science, and there are various tools and libraries available in
programming languages like Python and R that aid in this process.

 Some popular libraries for data munging include Pandas, NumPy, and scikit-learn in Python, and dplyr in R.
Data Munging

Stage 1: Data Discovery

 Everything begins with a defined goal, and the data
analysis journey isn't an exception.
 Data discovery is the first stage of data munging, where
data analysts define the data's purpose and how to achieve it
through data analytics.
 The goal is to identify the potential uses and requirements
of data.
Data Munging

Stage 2: Data Structuring


 Once the requirements are identified
and outlined, the next stage is
structuring raw data to make it
machine-readable.
 Structured data has a well-defined
schema and follows a consistent
layout.
 Think of data neatly organized in rows
and columns available in spreadsheets
and relational databases.
Data Munging

Stage 3: Data Cleansing

 Once the data is organized into a standardized
format, the next step is data cleansing.
 This stage addresses a range of data quality
issues, ranging from missing values to duplicate
datasets.
 The process involves detecting and correcting
this erroneous data to avoid information gaps.
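A minimal pandas sketch of typical cleansing steps (the file and column names are hypothetical):

import pandas as pd

df = pd.read_csv("raw_data.csv")  # hypothetical input

df = df.drop_duplicates()                         # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())  # fill missing numeric values
df = df.dropna(subset=["customer_id"])            # drop rows missing a required key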
Data Munging
Stage 4: Data Enrichment
 Data enrichment is the process of filling in
missing details by referring to other data
sources.
 It’s a process that involves appending one or
multiple data sets from different sources to
generate a holistic view of information.
 For example, the raw data might contain partial
customer addresses.
 Data enrichment lets you fill in all address
fields by looking up the missing values
elsewhere, such as in the database or a postal
records lookup.
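A minimal sketch of enrichment via a join against a lookup table (the frames are made up):

import pandas as pd

# made-up frames: orders with only a zip code, and a postal lookup table
orders = pd.DataFrame({"zip": ["10001", "94105"], "amount": [20, 35]})
postal = pd.DataFrame({"zip": ["10001", "94105"],
                       "city": ["New York", "San Francisco"]})

enriched = orders.merge(postal, on="zip", how="left")  # append the city field
print(enriched)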
Data Munging
Stage 5: Data Validation
 Finally, it’s time to ensure that all data
values are logically consistent.
 Validating the accuracy, completeness, and
reliability of data is imperative to the data
munging process.
 Data validation also involves some deeper
checks, such as ensuring that all values are
compatible with the specified data type.
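A minimal sketch of such checks in pandas (the frame and rules are made up):

import pandas as pd

df = pd.DataFrame({"age": [25, 41, 33], "email": ["a@x.com", "b@y.com", None]})

assert pd.api.types.is_numeric_dtype(df["age"])  # values match the declared type
print((df["age"] >= 0).all())                    # logical consistency: no negative ages
print(df["email"].notna().all())                 # completeness: required field present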
Data Munging in R

“R” is an open source software package directed
at analyzing and visualizing data, but with the
power of the language, and available packages,
it also provides a powerful means of
slicing/dicing the data to get it into a form for
analysis.
Data Munging in R

 In R Programming the following ways are oriented with data munging process:
 apply() Family
 aggregate()
 dplyr package
 plyr package
 In the apply() collection of R, the most basic function is apply().
 Apart from that, there exist lapply(), sapply(), and tapply().
 The entire apply() collection can be considered a substitute for loops.
Data Munging in R

 In R, the aggregate() function is used to combine or aggregate the input data frame by applying a
function to each column of a sub-data frame.

 The plyr package is used for splitting, applying, and combining data.

 plyr is a set of tools for splitting up huge or big data to create
homogeneous pieces, applying a function to each piece, and finally combining all the
resultant values.

 The dplyr package can be considered a grammar of data manipulation, providing a
consistent set of verbs that help solve the most common data manipulation challenges.
Data Munging

Issues with Data Munging


Data munging processes sometimes present issues such as:
 Resource overheads
 Data loss
 Flexibility
 Process errors
Data Munging

Benefits of Data Munging


 Data Quality Improvement
 Enhanced Analysis
 Dealing with Missing Data
 Standardization
 Feature Engineering
 Data Integration
 Reduced Processing Time
 Improved Visualization
 Increased Reproducibility
Data Manipulation
What is Data Manipulation?
 Data manipulation is the process of changing information to make it more
organized and readable.

Data manipulation provides an organization with many advantages, including:


 Consistent data: It can be structured, read, and better understood by providing data in
a consistent format.
 Projecting data: It is paramount for organizations to be able to use historical data to
project the future and to provide more in-depth analysis, especially when it comes to
finances.
 Overall, converting, updating, deleting, and incorporating data into a database means
you can do more with the data.
Data Manipulation

For example:

 R provides a library called dplyr which consists of many built-in methods to manipulate the data. So to use
the data manipulation functions, we first need to import the dplyr package with the library(dplyr) line of code.

 Some of these manipulation functions are:


 filter(),

 distinct(),

 arrange(),

 select(),

 rename().
Data Manipulation

 filter(): The filter() function is used to produce the subset of the data that satisfies the condition specified
in the filter() method.

 distinct(): The distinct() method removes duplicate rows from a data frame, either entirely or based on the
specified columns.

 arrange() method : In R, the arrange() method is used to order the rows based on a specified column.

 select() method : The select() method is used to extract the required columns as a table by specifying the
required column names in select() method.

 rename() method : The rename() function is used to change the column names.
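For readers working in Python, pandas has rough analogs of these dplyr verbs; a minimal sketch (the frame is made up):

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Ann"], "score": [90, 75, 90]})

print(df.query("score > 80"))                 # filter(): rows meeting a condition
print(df.drop_duplicates())                   # distinct(): remove duplicate rows
print(df.sort_values("score"))                # arrange(): order rows by a column
print(df[["name"]])                           # select(): keep required columns
print(df.rename(columns={"score": "marks"}))  # rename(): change column names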
Data Scaling

 Scaling is a technique to standardize the independent features present in the data in a fixed range.

 It is performed during the data pre-processing to handle highly varying magnitudes or values or units.

 Data scaling brings data points that are far from each other closer together, in order to
increase algorithm effectiveness and speed up machine learning processing.

 Two popular data scaling methods are normalization and standardization.

 Standardization or Z-score normalization refers to centering the data points of a
feature on the mean of all data points, with a unit standard deviation.


Data Scaling

 Normalization is the process of adjusting all measured values from different
scales into one scale.

 Rescaling data is multiplying each member of a data set by a constant term k;
that is to say, transforming each number x to f(x), where

f(x) = kx, with k and x both real numbers.

 Rescaling will change the spread of your data as well as the position of your data
points.
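A minimal sketch of the two popular methods using scikit-learn (the data is made up):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [20.0]])  # made-up single-feature data

print(MinMaxScaler().fit_transform(X))    # normalization: rescale to [0, 1]
print(StandardScaler().fit_transform(X))  # standardization: zero mean, unit std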
Dimensionality Reduction
What is Dimensionality Reduction?
The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.

 Dimensionality reduction is the process of reducing the number of features (or dimensions) in a
dataset while retaining as much of the meaningful information as possible.

 This can be done for a variety of reasons, such as to reduce the complexity of a model, to
improve the performance of a learning algorithm, or to make it easier to visualize the data.
Benefits of applying Dimensionality Reduction

 Some benefits of applying a dimensionality reduction technique to a given dataset are given below:

 By reducing the dimensions of the features, the space required to store the dataset also gets
reduced.
 Less computation and training time is required for reduced dimensions of features.
 Reduced dimensions of features of the dataset help in visualizing the data quickly.
 It removes the redundant features (if present) by taking care of multicollinearity.
Disadvantages of dimensionality Reduction

 There are also some disadvantages of applying
dimensionality reduction, which are given below:

 Some data may be lost due to dimensionality reduction.


 In the PCA dimensionality reduction technique, the number of principal
components to retain is sometimes unknown.
Approaches of Dimension Reduction
 There are two ways to apply the dimension reduction technique, which are given below:
 Feature Selection
 Feature selection is the process of selecting the subset of the relevant features and leaving out the
irrelevant features present in a dataset to build a model of high accuracy. In other words, it is a
way of selecting the optimal features from the input dataset.
Three methods are used for the feature selection:
1. Filter Methods

 In this method, the dataset is filtered, and a subset that contains only the
relevant features is taken. Some common techniques of the filter method are
(a sketch using the chi-square test follows the list):

 Correlation
 Chi-Square Test
 ANOVA
 Information Gain, etc.
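A minimal scikit-learn sketch of a filter method, scoring features with the chi-square test:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)  # keep the 2 best-scoring features
print(X_new.shape)  # (150, 2)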
2. Wrapper Methods

 The wrapper method has the same goal as the filter method, but it uses a machine learning
model for its evaluation. In this method, some features are fed to the ML model and the
performance is evaluated. The performance decides whether to add or remove those features to
increase the accuracy of the model. This method is more accurate than the filter method
but more complex to run. Some common techniques of wrapper methods are
(a sketch of forward selection follows the list):

 Forward Selection
 Backward Selection
 Bi-directional Elimination
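A minimal sketch of forward selection using scikit-learn's SequentialFeatureSelector (the estimator choice is arbitrary):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",  # add features one at a time
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected features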
3. Embedded Methods:
Embedded methods check the different training iterations of the machine
learning model and evaluate the importance of each feature (a sketch using
LASSO follows the list below). Some common techniques of embedded methods are:

 LASSO
 Elastic Net
 Ridge Regression, etc.
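A minimal sketch of an embedded method: LASSO's L1 penalty drives some coefficients to exactly zero, implicitly discarding those features.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
model = Lasso(alpha=1.0).fit(X, y)
print(model.coef_)  # zero coefficients mark discarded features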
Feature Extraction:
 Feature extraction is the process of transforming the space containing
many dimensions into a space with fewer dimensions. This approach is
useful when we want to keep the whole information but use fewer
resources while processing the information; a sketch using PCA follows the list below.
 Some common feature extraction techniques are:
 Principal Component Analysis
 Linear Discriminant Analysis
 Kernel PCA
 Quadratic Discriminant Analysis
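A minimal scikit-learn sketch of the most common technique, Principal Component Analysis:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)         # project 4 features onto 2 components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)            # (150, 2)
print(pca.explained_variance_ratio_)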
Thank you
