Ch-2 Python Libraries For ML

iml

Uploaded by

yashbhimani24082004

Introduction to Machine Learning

Ch – 2 Python Libraries for Machine Learning
❖ Table of Contents
2.1 NumPy
❖ Creating Array: array()
❖ Accessing Array: by referring to its index number
❖ Stacking & Splitting: stack(), array_split()
❖ Maths Functions: add(), subtract(), multiply(), divide(), power(), mod()
❖ Statistics Functions: amin(), amax(), mean(), median(), std(), var(), average(), ptp()

NumPy
❖ NumPy (Numerical Python) is an open-source Python library that is used in almost
every field of science and engineering.

❖ It is the universal standard for working with numerical data in Python, and it is at the core
of the scientific Python and PyData ecosystems.

❖ NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays.

Features of NumPy
NumPy has various features including these important ones:
❖ A powerful N-dimensional array object
❖ Sophisticated (broadcasting) functions
❖ Tools for integrating C/C++ and Fortran code
❖ Useful linear algebra, Fourier transform, and random number capabilities

Installation Process NumPy:

array() transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into
three-dimensional arrays, and so on.
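
A minimal sketch of how the nesting depth of the input decides the number of dimensions. The variable names and values are only illustrative:

```python
import numpy as np

# A flat sequence gives a 1-D array.
a1 = np.array([1, 2, 3])

# A sequence of sequences gives a 2-D array (2 rows, 3 columns).
a2 = np.array([[1, 2, 3], [4, 5, 6]])

# A sequence of sequences of sequences gives a 3-D array.
a3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(a1.ndim, a1.shape)  # 1 (3,)
print(a2.ndim, a2.shape)  # 2 (2, 3)
print(a3.ndim, a3.shape)  # 3 (2, 2, 2)
```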

* Sometimes we want to create an array but do not yet know the exact numbers to put in it. NumPy has
special functions to help with that:

* zeros : The zeros() function creates an array filled with zeros. It is like a container that starts out
holding nothing but zeros.

* ones : The ones() function creates an array filled with ones. It is like a container that starts out
holding nothing but ones.

* empty : The empty() function creates an array without setting any initial values. It is like a container
that has some random stuff inside, and we are not sure what it is. The content depends on the state of the
computer's memory, so assign values before reading from the array.
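
The three functions can be sketched as follows. The shapes and the fill value 7.5 are arbitrary choices for illustration:

```python
import numpy as np

z = np.zeros((2, 3))        # 2x3 array filled with 0.0
o = np.ones(4, dtype=int)   # four integer ones
e = np.empty((2, 2))        # uninitialized: the values are arbitrary

print(z)
print(o)        # [1 1 1 1]
print(e.shape)  # (2, 2)

# Assign values before reading from an array made with empty().
e[:] = 7.5
print(e)
```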

arange(starting_number, ending_number, delta) :

In NumPy, you can create a regularly spaced array using the numpy.arange() function. This function generates a
sequence of numbers within a specified range with a specified step size. The basic syntax for numpy.arange() is
as follows :

numpy.arange([start, ]stop, [step, ]dtype=None)

Where:
• start (optional) : The start of the sequence (inclusive). If not provided, it defaults to 0.
• stop : The end of the sequence (exclusive). The generated sequence goes up to, but does not include, this value.
• step (optional) : The step size between the numbers in the sequence. If not provided, it defaults to 1.
• dtype (optional) : The data type of the elements in the resulting array. If not provided, it is determined
automatically from the inputs.

❑ When arange is used with floating-point arguments, it is generally not possible to predict the number of
elements obtained, due to finite floating-point precision. For this reason, it is usually better to use the
function linspace, which receives the number of elements we want as an argument instead of the step.
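
A short sketch of linspace, where the element count is stated directly. The range and counts are arbitrary:

```python
import numpy as np

# Ask for exactly 6 evenly spaced points from 0 to 1, both ends included.
pts = np.linspace(0, 1, 6)
print(pts)       # [0.  0.2 0.4 0.6 0.8 1. ]
print(len(pts))  # 6

# endpoint=False leaves out the stop value, as arange does.
half_open = np.linspace(0, 1, 5, endpoint=False)
print(len(half_open))  # 5
```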

Example of arange :
import numpy as np

# Create an array from 0 to 9 with a step of 1
arr = np.arange(10)
print(arr) # Output: [0 1 2 3 4 5 6 7 8 9]

# Create an array from 2 to 10 with a step of 2
arr = np.arange(2, 11, 2)
print(arr) # Output: [ 2 4 6 8 10]

# Create an array from 0 to 1 with a step of 0.2
arr = np.arange(0, 1.2, 0.2)
print(arr) # Output: [0. 0.2 0.4 0.6 0.8 1. ]

❑ NumPy also allows you to access a range of elements from an array using slicing. Slicing is done using the
colon (:) notation within square brackets []. The syntax for slicing is [start : stop : step], where start is the
starting index, stop is the stopping index (exclusive), and step is the interval between elements.
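
The slicing syntax can be sketched on a small 1-D array and a 2-D array. The arrays here are made up for illustration:

```python
import numpy as np

arr = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

print(arr[2:7])    # [2 3 4 5 6]
print(arr[:4])     # [0 1 2 3]
print(arr[::3])    # [0 3 6 9]
print(arr[::-1])   # [9 8 7 6 5 4 3 2 1 0]

# In 2-D, give one slice per axis: rows first, then columns.
grid = np.array([[0, 1, 2, 3],
                 [4, 5, 6, 7],
                 [8, 9, 10, 11]])
sub = grid[1:, :2]  # rows 1 and 2, columns 0 and 1
print(sub.tolist())  # [[4, 5], [8, 9]]
```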

Explain reshape() :

In NumPy, the reshape() function is used to change the shape (dimensions) of an existing array while preserving
the total number of elements. Reshaping is a common operation when you want to convert a one-dimensional
array into a multi-dimensional array or change the dimensions of a multi-dimensional array. Where possible, the
reshape() function returns a new view of the original data with the specified shape; otherwise it returns a copy.
The basic syntax for the reshape() function is as follows:
numpy.reshape(a, newshape, order='C')
Where:
a: The array to be reshaped.
newshape: A tuple or list specifying the new shape (dimensions) of the array. The total number of elements in the
original array must match the total number of elements in the new shape.
order (optional): Specifies the memory layout order of the elements in the reshaped array. It can be 'C' for C-style
(row-major) or 'F' for Fortran-style (column-major). The default is 'C'.

import numpy as np

# Create a 1D array
arr1d = np.array([1, 2, 3, 4, 5, 6])

# Reshape the 1D array into a 2D array with 2 rows and 3 columns
arr2d = arr1d.reshape((2, 3))
print(arr2d)
Output:
[[1 2 3]
 [4 5 6]]

import numpy as np

# Create a 1D array with 12 elements
arr1d = np.arange(12)

# Reshape the 1D array into a 3D array with dimensions (2, 3, 2)
arr3d = arr1d.reshape((2, 3, 2))
print(arr3d)
Output:
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]]
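
One consequence of reshape() returning a view is worth a small sketch: writing through the reshaped array changes the original. The example also uses -1 in the shape, which asks NumPy to work out that dimension from the total size. The values are illustrative:

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)            # a view here: no data is copied

print(np.shares_memory(a, b))  # True

b[0, 0] = 99                   # writing through the view...
print(a[0])                    # ...changes the original: 99

# -1 lets NumPy infer the missing dimension (6 elements / 3 rows = 2 columns).
c = a.reshape(3, -1)
print(c.shape)                 # (3, 2)
```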

Explain Array Splitting:
In NumPy, you can split an array into smaller sub-arrays using various functions, depending on your requirements.
Here are some common ways to split arrays:

1. numpy.split(): This function splits an array into multiple equal-sized sub-arrays along a specified axis.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

sub_arrays = np.split(arr, 3) # Split into 3 equal parts
print(sub_arrays)
Output: [array([1, 2]), array([3, 4]), array([5, 6])]

2. numpy.array_split(): Similar to numpy.split(), but allows for uneven splits. You specify the number of parts
you want, and the parts may differ in size.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

sub_arrays = np.array_split(arr, 3) # Split into 3 parts
print(sub_arrays)
Output: [array([1, 2, 3]), array([4, 5]), array([6, 7])]
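
The difference between the two functions shows up when the array does not divide evenly. A small sketch, with an illustrative 7-element array:

```python
import numpy as np

arr = np.arange(1, 8)  # 7 elements: [1 2 3 4 5 6 7]

# np.split insists on equal parts and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_failed = False
except ValueError as exc:
    split_failed = True
    print("np.split failed:", exc)

# np.array_split accepts the same request and makes the parts uneven.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])  # [[1, 2, 3], [4, 5], [6, 7]]

# np.split also accepts explicit cut positions instead of a count.
left, mid, right = np.split(arr, [2, 5])
print(left.tolist(), mid.tolist(), right.tolist())  # [1, 2] [3, 4, 5] [6, 7]
```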

3. numpy.hsplit() and numpy.vsplit(): These functions split arrays horizontally and vertically, respectively. They
are useful for 2D arrays.
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

horizontal_split = np.hsplit(arr2d, 3) # Split into 3 columns
print(horizontal_split)
Output: [array([[1], [4], [7]]), array([[2], [5], [8]]), array([[3], [6], [9]])]

vertical_split = np.vsplit(arr2d, 3) # Split into 3 rows
print(vertical_split)
Output: [array([[1, 2, 3]]), array([[4, 5, 6]]), array([[7, 8, 9]])]

2.2 PANDAS
• Pandas is mostly used for data analysis tasks in Python. NumPy is mostly used for working with numerical values, as it
makes it easy to apply mathematical functions. Pandas works well with numeric, alphabetic and heterogeneous
types of data at the same time.
• Pandas is an open-source Python library used for data manipulation, analysis and visualization. It provides easy-to-use
data structures and data analysis tools for Python, making it a popular choice for working with data in scientific
computing, finance, data analysis and machine learning.
• The key data structures in Pandas are the Series and the DataFrame. A Series is a one-dimensional labelled array that can
hold any data type. A DataFrame is a two-dimensional labelled data structure with columns of potentially different
types. In addition to these data structures, Pandas provides functions for reading and writing data to and from various
file formats, including CSV, Excel, SQL databases and more.
• Pandas also offers powerful tools for data cleaning, aggregation, filtering and transformation, as well as statistical and
time series analysis. Its integration with other popular data analysis libraries in Python, such as NumPy and Matplotlib,
makes it a powerful tool for data analysis and visualization.

2.2.1 Features of Pandas
Pandas is a popular open-source data manipulation and analysis library for Python. It provides data structures and functions
for working with structured data, making it an essential tool for data scientists, analysts, and developers. Here are some of
the key features of Pandas:
1. Data Structures:
• Series: A one-dimensional labeled array capable of holding data of various types.
• DataFrame: A two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes
(rows and columns).
2. Data I/O:
Pandas can read data from various file formats, including CSV, Excel, SQL databases, JSON, and more, and write data back to
these formats.
3. Data Alignment:
Pandas can automatically align data from multiple data sources based on index labels. This makes it easy to work with data
from different sources and combine them.

4. Missing Data Handling:
Pandas provides tools to handle missing data, either by filling in missing values with appropriate default values or by
dropping missing data points.
5. Data Filtering and Selection:
You can use Pandas to select, filter, and slice data based on various conditions and criteria, making it easy to extract the
information you need.
6. Data Transformation:
Pandas supports various data transformation operations, such as merging, joining, reshaping, and pivoting data, which are
essential for data cleaning and analysis.
7. Grouping and Aggregation:
Pandas allows you to group data based on one or more columns and perform aggregation functions like sum, mean, count,
etc., on these groups.

8. Time Series and Date Functionality:
Pandas has built-in support for working with time series data, including date and time manipulation, resampling, and rolling
statistics.
9. Data Visualization:
Pandas offers a plot() method built on Matplotlib for quick charts, and it can be easily integrated with popular data
visualization libraries like Matplotlib and Seaborn for creating more elaborate charts and plots.
10. High Performance:
Pandas is designed to handle large datasets efficiently, and it leverages low-level libraries like NumPy for high-speed data
operations.
11. Flexibility:
You can work with a wide range of data types within Pandas, including numeric, string, categorical, and datetime data,
making it suitable for various data analysis tasks.

12. Customization:
Pandas provides options for customizing data structures and operations, allowing users to adapt the library to their specific
needs.
13. Interoperability:
Pandas can be easily integrated with other data analysis and machine learning libraries in the Python ecosystem, such as
NumPy, Scikit-Learn, and TensorFlow.
14. Community and Ecosystem:
Pandas has a large and active community, which means there are plenty of resources, tutorials, and third-party extensions
available to enhance its functionality.
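
Three of the features listed above (data alignment, missing data handling, and grouping with aggregation) can be sketched in a few lines. The Series, the column names "city" and "sales", and all values are hypothetical:

```python
import pandas as pd

# Data alignment: values are matched by index label, not by position.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
total = s1 + s2   # labels 'a' and 'd' have no partner, so they become NaN
print(total)

# Missing data handling: replace the NaN values with a default.
filled = total.fillna(0)
print(filled)

# Grouping and aggregation on a small made-up table.
df = pd.DataFrame({
    "city": ["X", "Y", "X", "Y"],
    "sales": [100, 200, 300, 400],
})
by_city = df.groupby("city")["sales"].sum()
print(by_city)
```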

Advantages of Pandas

• Fast and efficient for manipulating and analyzing data.

• Data from different file objects can be loaded.

• Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data.

• Size mutability : columns can be inserted and deleted from DataFrame and higher dimensional objects.

• Data set merging and joining.

• Flexible reshaping and pivoting of data sets.

• Provides time-series functionality.

• Powerful group by functionality for performing split-apply-combine operations on data sets.

2.2.2 Creating Series
❑ To create a Series in Pandas, you can use the pd.Series() constructor.
import pandas as pd

# Create a Series from a list
data_list = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data_list)
print(series_from_list)

Output :
0    1
1    2
2    3
3    4
4    5
dtype: int64

# Create a Series from a NumPy array
import numpy as np
data_array = np.array([10, 20, 30, 40, 50])
series_from_array = pd.Series(data_array)
print(series_from_array)

Output :
0    10
1    20
2    30
3    40
4    50
dtype: int64

import pandas as pd

# Create a Series from a dictionary
data_dict = {'A': 100, 'B': 200, 'C': 300, 'D': 400}
series_from_dict = pd.Series(data_dict)
print(series_from_dict)

Output :
A    100
B    200
C    300
D    400
dtype: int64

# Create a Series with custom index
data_values = [0.5, 0.7, 0.2, 0.1]
custom_index = ['First', 'Second', 'Third', 'Fourth']
series_with_custom_index = pd.Series(data_values, index=custom_index)
print(series_with_custom_index)

Output :
First     0.5
Second    0.7
Third     0.2
Fourth    0.1
dtype: float64
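
Once a Series has labels, elements can be read by label or by position. A brief sketch reusing the custom-index values above (the variable name s is just for illustration):

```python
import pandas as pd

s = pd.Series([0.5, 0.7, 0.2, 0.1],
              index=['First', 'Second', 'Third', 'Fourth'])

print(s['Second'])   # access by label: 0.7
print(s.iloc[0])     # access by position: 0.5

# A boolean condition keeps only the matching entries.
large = s[s > 0.3]
print(large.index.tolist())  # ['First', 'Second']
```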

2.2.3 Creating DataFrames
❑ To create a DataFrame in Pandas, you can use the pd.DataFrame() constructor.
1. From a Dictionary of Lists/Arrays:
You can create a DataFrame by passing a dictionary where each key represents a column label, and the associated value is a
list or array containing the column's data.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)
print(df)

Output :
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

2. From a List of Dictionaries:
You can create a DataFrame from a list of dictionaries where each dictionary represents a row of data, and the keys within
each dictionary are the column labels.
import pandas as pd
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
    {'Name': 'David', 'Age': 40, 'City': 'Houston'}
]

df = pd.DataFrame(data)
print(df)

Output :
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
3. Pandas Read CSV File:
• A simple way to store big data sets is to use CSV files (comma separated files).
• CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
• In our examples we will be using a CSV file called '[Link]'.

import pandas as pd

df = pd.read_csv('[Link]')
print(df.to_string())

Output :
    Duration  Pulse  Maxpulse  Calories
0         60    110       130     409.1
1         60    117       145     479.0
2         60    103       135     340.0
3         45    109       175     282.4
4         45    117       148     406.0
5         60    102       127     300.5
6         60    110       136     374.0
7         45    104       134     253.3
8         30    109       133     195.1
9         60     98       124     269.0
10        60    103       147       ...
2.2.4 Cleaning Empty Cells
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will not have a big impact on the result.
By default, the dropna() method returns a new DataFrame, and will not change the original.

import pandas as pd

df = pd.read_csv('[Link]')
new_df = df.dropna()
print(new_df.to_string())

Output :
    Duration  Pulse  Maxpulse  Calories
0         60    110       130     409.1
1         60    117       145     479.0
2         60    103       135     340.0
3         45    109       175     282.4
4         45    117       148     406.0
5         60    102       127     300.5
6         60    110       136     374.0
7         45    104       134     253.3
8         30    109       133     195.1
9         60     98       124     269.0
10        60    103       147       ...
Replace Empty Values
Another way of dealing with empty cells is to insert a new value instead.
This way you do not have to delete entire rows just because of some empty cells.
The fillna() method allows us to replace empty cells with a value:

import pandas as pd

df = pd.read_csv('[Link]')
df.fillna(130, inplace = True)
print(df.to_string())

Output :
    Duration  Pulse  Maxpulse  Calories
0         60    110       130     409.1
1         60    117       145     479.0
2         60    103       135     340.0
3         45    109       175     282.4
4         45    117       148     406.0
5         60    102       127     300.5
6         60    110       136     374.0
7         45    104       134     253.3
8         30    109       133     195.1
9         60     98       124     269.0
10        60    103       147       ...
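
Because the CSV examples need a file on disk, here is a self-contained sketch of dropna() and fillna() on a small in-memory DataFrame. The three rows are made up for illustration, with one deliberately empty cell:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, np.nan, 195.1],   # one empty cell
})

# dropna() returns a new DataFrame and leaves the original untouched.
dropped = df.dropna()
print(len(df), len(dropped))   # 3 2

# fillna() without inplace=True also returns a new DataFrame.
filled = df.fillna(130)
print(filled["Calories"].tolist())   # [409.1, 130.0, 195.1]

# A common variation: fill one column with that column's own mean.
mean_filled = df["Calories"].fillna(df["Calories"].mean())
print(mean_filled.tolist())
```

Filling with a single constant affects every column, so filling per column, as in the last step, is often the safer choice when columns have different scales.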

Cleaning Wrong Data

To clean wrong or incorrect data in a Pandas DataFrame, you can apply various techniques based on the
specific data issues you are facing. Here are a few common approaches :
1. Replacing specific values :
If you have specific wrong values that need to be replaced, you can use the replace() method to substitute them with
correct values.
For example :
import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : [1, 2, -999, 4], 'B' : [5, -999, 7, 8]}) # create a dataframe
cleaned_df = df.replace(-999, np.nan)
print(cleaned_df)

Output :
     A    B
0  1.0  5.0
1  2.0  NaN
2  NaN  7.0
3  4.0  8.0

2. Removing Outliers :
If your data contains outliers that need to be cleaned, you can filter them out based on specific conditions.
For example :
import pandas as pd

# Remove outliers outside the range of [mean - 3*std, mean + 3*std]
df = pd.DataFrame({'A' : [1, 2, 99, 4, 99, 5, 25, 12, 6800, -30, 50]}) # create a dataframe
print("mean =", df['A'].mean(), "std_dev =", df['A'].std())
cleaned_df = df[(df['A'] >= df['A'].mean() - 3*df['A'].std()) &
                (df['A'] <= df['A'].mean() + 3*df['A'].std())]
print(cleaned_df)

Output :
mean = 642.45454545454 std_dev = 2042.6353254380165
     A
0    1
1    2
2   99
3    4
4   99
5    5
6   25
7   12
9  -30
10  50

3. Removing Duplicates :
To remove duplicate rows from a DataFrame using Pandas, you can use the drop_duplicates() method.
For example :
import pandas as pd

df = pd.DataFrame({'A' : [1, 2, 3, 2, 1], 'B' : ['a', 'b', 'c', 'b', 'a']})

# Remove the duplicates based on all columns
cleaned_df = df.drop_duplicates()
print(cleaned_df)

Output :
   A  B
0  1  a
1  2  b
2  3  c
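
drop_duplicates() can also compare only some columns and choose which copy to keep. A minimal sketch using its subset and keep parameters; the data is a made-up variation in which row 3 differs from row 1 in column B:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 2, 1],
                   'B': ['a', 'b', 'c', 'x', 'a']})

# duplicated() marks rows that repeat an earlier row in every column.
flags = df.duplicated()
print(flags.tolist())         # [False, False, False, False, True]

# Compare on column A only, keeping the first occurrence of each value.
first_by_a = df.drop_duplicates(subset=['A'])
print(first_by_a.index.tolist())   # [0, 1, 2]

# keep='last' keeps the final occurrence instead.
last_by_a = df.drop_duplicates(subset=['A'], keep='last')
print(last_by_a.index.tolist())    # [2, 3, 4]
```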

Pandas Plotting

These functions provide valuable tools for analyzing data and computing various
statistical measures. By using these functions in combination with NumPy's array
operations, you can perform a wide range of statistical calculations efficiently.

Plotting :

Pandas uses the plot() method to create diagrams. We can use Pyplot, a submodule of
the Matplotlib library, to visualize the diagram on the screen.

import sys
import matplotlib
matplotlib.use('Agg')
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('[Link]')
df.plot()
plt.show()
#Two lines to make our compiler able to draw:
plt.savefig(sys.stdout.buffer)
sys.stdout.flush()

Pandas Scatter Diagram

Scatter Plot
Specify that you want a scatter plot with the kind argument:

kind = 'scatter'

A scatter plot needs an x- and a y-axis.

In the example below we will use "Duration" for the x-axis and "Calories" for the y-axis.

Include the x and y arguments like this:

x = 'Duration', y = 'Calories'


#Three lines to make our compiler able to draw:
import sys
import matplotlib
matplotlib.use('Agg')
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('[Link]')
df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()
#Two lines to make our compiler able to draw:
plt.savefig(sys.stdout.buffer)
sys.stdout.flush()

Pandas Histogram Diagram

Use the kind argument to specify that you want a histogram:

kind = 'hist'

A histogram needs only one column.

A histogram shows us the frequency of each interval, e.g. how many workouts lasted between 50
and 60 minutes?

In the example below we will use the "Duration" column to create the histogram:


#Three lines to make our compiler able to draw:
import sys
import matplotlib
matplotlib.use('Agg')
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('[Link]')
df["Duration"].plot(kind = 'hist')
plt.show()
#Two lines to make our compiler able to draw:
plt.savefig(sys.stdout.buffer)
sys.stdout.flush()
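
The plotting examples above depend on a CSV file. As a self-contained sketch, the scatter plot and the histogram can also be drawn from an in-memory DataFrame. The five rows are sample values for illustration, and the explicit ax= arguments and the in-memory PNG buffer are choices made for this sketch, not part of the original examples:

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample values for illustration only.
df = pd.DataFrame({
    "Duration": [60, 60, 45, 30, 45],
    "Calories": [409.1, 479.0, 282.4, 195.1, 406.0],
})

# One figure with two plotting areas, passed explicitly through ax=.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
df.plot(kind="scatter", x="Duration", y="Calories", ax=axes[0])
df["Duration"].plot(kind="hist", ax=axes[1])

# Render to an in-memory PNG instead of a file or a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
print(len(buf.getvalue()) > 0)  # True
```

Passing ax= keeps each diagram in its own plotting area, which avoids a later plot being drawn over an earlier one.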


Difference between Pandas and NumPy

               Pandas                           NumPy
Purpose        Data manipulation and analysis   Numerical computations
Data Type      DataFrame (tabular data)         ndarray (multidimensional arrays)
Functionality  Powerful indexing and slicing    Mathematical and logical operations
Flexibility    Supports heterogeneous data      Supports homogeneous data
Performance    Slower for large datasets        Faster for numerical computations
Dependencies   Built on top of NumPy            Independent of Pandas
MATPLOTLIB

The human mind takes in a visual representation of data more readily than text, so things are easier to
understand when they are visualized. Representing data as a graph lets us analyze it more efficiently
and make decisions based on that analysis. Before learning Matplotlib, we need to understand data
visualization and why it is important.

FEATURES OF MATPLOTLIB
Matplotlib is a popular data visualization library in python that provides a wide range of tools for creating
static, animated, and interactive visualizations. It is highly customizable and allows you to create variety of
plots, charts, and graphs. Here are some of the key features of Matplotlib:

❖ Plotting Functions: Matplotlib offers a comprehensive set of plotting functions to create various types
of visualizations, including line plots, scatter plots, bar plots, histograms, pie charts, box plots, and
many more. These functions provide the flexibility to customize the appearance and style of the plots.

❖ Support for Multiple Backends: Matplotlib supports multiple rendering backends, which allows you to
display plots in different environments. It can render plots as interactive GUI windows, as static
images in different file formats (PNG, JPEG, PDF, etc.), or embedded within web applications.
❖ Object-Oriented Interface: Matplotlib provides an object-oriented interface that allows you to have
fine-grained control over the elements of a plot. You can create Figure and Axes objects to define the
layout and positioning of plots, and manipulate individual plot components such as lines, markers,
labels, and legends.
❖ Customization Options: Matplotlib offers extensive customization options to tailor the appearance of
your plots. You can modify colors, line styles, markers, fonts, gridlines, and other visual elements.
Additionally, you can add titles, labels, annotations, and legends to enhance the readability of the
plots.

❖ Support for Multiple Plotting Styles: Matplotlib supports different plotting styles and themes, allowing
you to change the overall look and feel of your plots. It provides a set of predefined styles, such as
classic, seaborn, ggplot, and more. You can also create and customize your own styles.
❖ 3D Plotting: Matplotlib includes functionality for creating 3D visualizations. It allows you to plot three-
dimensional data, such as surface plots, wireframes, scatter plots, and contour plots. These capabilities
are useful for visualizing data in three dimensions and understanding complex relationships.
❖ Subplotting: Matplotlib enables you to create multiple plots within a single figure using subplotting.
You can divide the figure into a grid of subplots and plot different data or views side by side. This
feature is helpful for comparing data, creating multi-panel visualizations, or building complex layouts.

❖ Animation: Matplotlib supports creating animated plots and visualizations. It provides an animation module that allows you to
animate changes in your data over time. You can create dynamic visualizations such as line animations, scatter plot animations, or
custom animations based on your specific requirements.
❖ Integration with Pandas and NumPy: Matplotlib integrates well with other libraries, such as Pandas and NumPy, making it easy to
plot data stored in these data structures. It can directly plot data from Pandas DataFrames or NumPy arrays, allowing for seamless
integration into data analysis workflows.

These are some of the key features of Matplotlib that make it a versatile and powerful library for data visualization in Python. It is
widely used in various fields such as data analysis, scientific research, machine learning and data exploration.
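
Several of the features above (the object-oriented interface, customization options, subplotting, and rendering to a static image) can be sketched together. The data, colors, and the choice of the non-interactive Agg backend with an in-memory buffer are assumptions made for this illustration:

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, runs without a display
import matplotlib.pyplot as plt

x = [10, 20, 30, 40]
y = [20, 25, 35, 55]

# Object-oriented interface: one Figure holding a 1x2 grid of Axes.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Customization: color, line style, marker, and a legend entry.
axes[0].plot(x, y, color="green", linestyle="--", marker="o", label="line")
axes[0].set_title("Line plot")
axes[0].legend()

# A different plot type in the second subplot.
axes[1].bar(x, y, width=5)
axes[1].set_title("Bar plot")

fig.tight_layout()

# Render to a static PNG image held in memory.
buf = io.BytesIO()
fig.savefig(buf, format="png")
print(len(fig.axes))  # 2
```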

Pyplot is a Matplotlib module that provides a MATLAB-like interface. Matplotlib is designed to be as usable as
MATLAB, with the ability to use Python and the advantage of being free and open-source. Each pyplot function
makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a
plotting area, decorates the plot with labels, etc. The plots available through Pyplot include Line Plot,
Histogram, Scatter, 3D Plot, Image, Contour, and Polar.
After this brief introduction to Matplotlib and Pyplot, let's see how to create a simple plot.

import matplotlib.pyplot as plt

# initializing the data
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]

# plotting the data
plt.plot(x, y)
plt.show()
Adding Title
The title() method in the matplotlib module is used to set the title of the visualization. Its arguments control how the title is
displayed.

Syntax:
matplotlib.pyplot.title(label, fontdict=None, loc='center', pad=None, **kwargs)

import matplotlib.pyplot as plt

# initializing the data
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]

# plotting the data
plt.plot(x, y)

# Adding title to the plot
plt.title("Linear graph")

plt.show()
Adding X Label and Y Label
In layman's terms, the X label and the Y label are the titles given to the X-axis and Y-axis respectively. These can be
added to the graph by using the xlabel() and ylabel() methods.
Syntax:
matplotlib.pyplot.xlabel(xlabel, fontdict=None, labelpad=None, **kwargs)
matplotlib.pyplot.ylabel(ylabel, fontdict=None, labelpad=None, **kwargs)

import matplotlib.pyplot as plt

# initializing the data
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]
# plotting the data
plt.plot(x, y)
# Adding title to the plot
plt.title("Linear graph", fontsize=25, color="green")
# Adding label on the y-axis
plt.ylabel('Y-Axis')
# Adding label on the x-axis
plt.xlabel('X-Axis')
plt.show()
Arrow annotation
Arrow annotations are similar to text annotations, but they include an arrow that points from one location to
another. These annotations are useful for highlighting specific relationships or connections between data
points. Arrow annotations can be created using the plt.annotate() or ax.annotate() functions.

import matplotlib
import matplotlib.pyplot as plt

plt.plot([1,2,3,4],[1,4,2,3], label='Series 1')
plt.plot([1,2,3,4],[4,3,1,3], label='Series 2')

plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Plot Title')

plt.text(3,4, 'Example annotation', fontsize = 12 , color='red')

plt.annotate('Reversal',xy=(3,2), xytext=(2,1.5),arrowprops=dict(facecolor = 'black',arrowstyle = '->'))
plt.annotate('',xy=(3,1), xytext=(2.4,1.5),arrowprops=dict(facecolor = 'red',arrowstyle = '->'))

plt.show()
Arrow annotation - Circle
Circle annotations are used to draw a circle or an ellipse on the plot. They are typically employed to mark specific data points or
regions. Circle annotations can be created using the patches.Circle() and ax.add_patch() functions.

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
plt.plot([1,2,3,4],[1,4,2,3], label='Series 1')
plt.plot([1,2,3,4],[4,3,1,3], label='Series 2')

plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Plot Title')

plt.text(3,4, 'Example annotation', fontsize = 12 , color='red')

plt.annotate('Reversal',xy=(3,2), xytext=(2,1.5),arrowprops=dict(facecolor = 'black',arrowstyle = '->'))
plt.annotate('',xy=(3,1), xytext=(2.4,1.5),arrowprops=dict(facecolor = 'red',arrowstyle = '->'))
circle = patches.Circle((2,4), 0.4 , facecolor='green', alpha = 0.3)
plt.gca().add_patch(circle)

plt.show()
Arrow annotation - Rectangular
Rectangle annotations are used to draw a rectangular shape on the plot. They are typically employed to
emphasize specific regions. Rectangle annotations can be created using the patches.Rectangle() and ax.add_patch()
functions.

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
plt.plot([1,2,3,4],[1,4,2,3], label='Series 1')
plt.plot([1,2,3,4],[4,3,1,3], label='Series 2')

plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Plot Title')
plt.text(3,4, 'Example annotation', fontsize = 12 , color='red')
plt.annotate('Reversal',xy=(3,2), xytext=(2,1.5),arrowprops=dict(facecolor = 'black',arrowstyle = '->'))
plt.annotate('',xy=(3,1), xytext=(2.4,1.5),arrowprops=dict(facecolor = 'red',arrowstyle = '->'))
circle = patches.Circle((2,4), 0.4 , facecolor='green', alpha = 0.3)
plt.gca().add_patch(circle)
rect = patches.Rectangle((1,4.5),3, 0.7 , facecolor='red', alpha = 0.3)
plt.gca().add_patch(rect)
plt.show()
No ratings yet
Engineering Students: Control Valves
18 pages
INGLES 11deg
No ratings yet
INGLES 11deg
3 pages
Amy Daniels Resume
No ratings yet
Amy Daniels Resume
2 pages
Introduction To Human Movement Analysis: Jason Friedman
No ratings yet
Introduction To Human Movement Analysis: Jason Friedman
33 pages
ES3 Notes
No ratings yet
ES3 Notes
1 page
Das 2012 Tea Smallholdings in Assam
No ratings yet
Das 2012 Tea Smallholdings in Assam
4 pages
Containment Basins and Stands 2
No ratings yet
Containment Basins and Stands 2
2 pages
8.6.4 Coated Macadams
No ratings yet
8.6.4 Coated Macadams
4 pages
CNC Turning Optimization for EN16 Steel
No ratings yet
CNC Turning Optimization for EN16 Steel
5 pages
Common Errors in Investment M...
No ratings yet
Common Errors in Investment M...
6 pages
Access Center
No ratings yet
Access Center
4 pages
3.corpora Design
No ratings yet
3.corpora Design
4 pages
Drager 8000 SC Incubator Test
No ratings yet
Drager 8000 SC Incubator Test
2 pages
Test Bank For Principles of Economics 4th Edition
No ratings yet
Test Bank For Principles of Economics 4th Edition
29 pages
Ajeel Field Well Design Example
No ratings yet
Ajeel Field Well Design Example
5 pages
Artificial Intelligence MCQs
No ratings yet
Artificial Intelligence MCQs
9 pages
8086 Procedures Macros Interrupts
No ratings yet
8086 Procedures Macros Interrupts
13 pages
DB12 50 PDF
No ratings yet
DB12 50 PDF
2 pages
Astronaut Ice Cream, Anyone
No ratings yet
Astronaut Ice Cream, Anyone
6 pages
Service Manual: H2.0UT, H2.5UT, H3.0UT, H3.5UT
83% (6)
Service Manual: H2.0UT, H2.5UT, H3.0UT, H3.5UT
100 pages
Module 2 17ec54 Source Coding Notes
No ratings yet
Module 2 17ec54 Source Coding Notes
36 pages
Grade 10 Arts Week 3
No ratings yet
Grade 10 Arts Week 3
20 pages
Pharmaceutical Calculations L2
No ratings yet
Pharmaceutical Calculations L2
17 pages
Reading
No ratings yet
Reading
2 pages
RPS Technology For Instructional Media
0% (1)
RPS Technology For Instructional Media
11 pages
Knowledge Management in BHU Libraries
No ratings yet
Knowledge Management in BHU Libraries
9 pages

Ch-2 Python Libraries For ML

Uploaded by

Ch-2 Python Libraries For ML

Uploaded by

Introduction to Machine Learning

❖NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays.

numpy.arange([start, ]stop, [step, ]dtype=None)

# Create an array from 0 to 9 with a step of 1

# Create an array from 2 to 10 with a step of 2

arr = np.arange(2, 11, 2)

print(arr) # Output: [ 2  4  6  8 10]

# Create an array from 0 to 1 with a step of 0.2

arr = np.arange(0, 1.2, 0.2)

print(arr) # Output: [0.  0.2 0.4 0.6 0.8 1. ]
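The arange() calls above can be collected into one runnable sketch. The code for the first case (0 to 9) is not preserved on the slide, so np.arange(10) is an assumption. np.linspace() is not part of the original examples; it is shown only as a common alternative when the step is fractional.

```python
import numpy as np

# Integers 0..9; the stop value is excluded (assumed form of the first example)
a = np.arange(10)

# Even numbers 2..10: stop is 11 so that 10 is included
b = np.arange(2, 11, 2)

# Fractional step: the number of elements depends on floating-point rounding
c = np.arange(0, 1.2, 0.2)

# linspace takes the number of points instead of a step and includes the endpoint
d = np.linspace(0, 1, 6)

print(a)
print(b)
print(c)
print(d)
```

With a fractional step, stating the number of points with linspace() avoids surprises about whether the last value is included.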

# Create a 1D array with 12 elements

arr = np.array([1, 2, 3, 4, 5, 6])

arr = np.array([1, 2, 3, 4, 5, 6, 7])

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

vertical_split = np.array_split(arr2d, 3) # Split into 3 rows
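A minimal sketch of how array_split() behaves on the three arrays above. The slide's outputs are not preserved, so the variable names and the column-wise split (axis=1) are illustrative assumptions.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
# 6 elements divide evenly into 3 parts of 2
even_parts = np.array_split(arr, 3)

arr7 = np.array([1, 2, 3, 4, 5, 6, 7])
# 7 elements do not divide evenly; the leading parts get the extra elements
uneven_parts = np.array_split(arr7, 3)

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Default axis=0 splits along rows; axis=1 splits along columns
row_parts = np.array_split(arr2d, 3)
col_parts = np.array_split(arr2d, 3, axis=1)

print([p.tolist() for p in even_parts])    # [[1, 2], [3, 4], [5, 6]]
print([p.tolist() for p in uneven_parts])  # [[1, 2, 3], [4, 5], [6, 7]]
print([p.tolist() for p in row_parts])     # [[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]]]
print([p.tolist() for p in col_parts])     # [[[1], [4], [7]], [[2], [5], [8]], [[3], [6], [9]]]
```

Unlike np.split(), array_split() does not raise an error when the array cannot be divided into equal parts.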

• Fast and efficient for manipulating and analyzing data.

• Data from different file objects can be loaded.

• Data set merging and joining.

• Flexible reshaping and pivoting of data sets.

• Provides time-series functionality.

• Powerful group by functionality for performing split-apply-combine operations on data sets.
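Three of the features listed above (merging, pivoting and group-by) can be sketched on a small example. The tables, column names and values are hypothetical and invented for illustration.

```python
import pandas as pd

# Hypothetical tables, invented for illustration
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "amount": [10, 20, 5, 15],
})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Pune", "Surat"]})

# Merge: attach each store's city to its sales rows
merged = sales.merge(stores, on="store", how="left")

# Group by: split rows by city, apply sum, combine into one Series
totals = merged.groupby("city")["amount"].sum()

# Pivot: one row per city, one column per store
pivoted = merged.pivot_table(index="city", columns="store",
                             values="amount", aggfunc="sum")

print(totals)
print(pivoted)
```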

# Create a Series from a NumPy array

import pandas as pd
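The rest of this example is not preserved on the slide. A minimal sketch of creating a Series from a NumPy array might look like this; the values and index labels are assumptions.

```python
import numpy as np
import pandas as pd

data = np.array([10, 20, 30])

# Without an index argument, the Series gets the default labels 0, 1, 2
s = pd.Series(data)

# Custom labels can be supplied through the index argument
labelled = pd.Series(data, index=["a", "b", "c"])

print(s)
print(labelled["b"])  # 20
```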

2.2.3 Creating DataFrames

Cleaning Wrong Data


df = pd.DataFrame({'A': [1, 2, 3, 2, 1], 'B': ['a', 'b', 'c', 'b', 'a']})
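The slide text around this DataFrame is not preserved. Because the fourth and fifth rows repeat earlier rows, it is assumed here that the example illustrates removing duplicate rows; the sketch below shows that under this assumption.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 1], "B": ["a", "b", "c", "b", "a"]})

# True for every row that repeats an earlier row
dup_mask = df.duplicated()

# Keeps the first occurrence of each row and drops the repeats
cleaned = df.drop_duplicates()

print(dup_mask.tolist())  # [False, False, False, True, True]
print(cleaned)
```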

statistical measures. By using these functions in combination with NumPy’s array

operations, you can perform a wide range of statistical calculations efficiently.
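As a short illustration of these statistical functions, the sketch below applies them to one small array. The data and weights are invented for illustration. np.min() and np.max() are used here; they compute the same results as amin() and amax().

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.min(data), np.max(data))  # 2 9
print(np.ptp(data))                # 7   (peak to peak: max - min)
print(np.mean(data))               # 5.0
print(np.median(data))             # 4.5
print(np.std(data))                # 2.0 (population standard deviation)
print(np.var(data))                # 4.0

# average() accepts weights; here the last value counts three times
weighted = np.average(data, weights=[1, 1, 1, 1, 1, 1, 1, 3])
print(weighted)
```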

the Matplotlib library to visualize the diagram on the screen.

Pandas Scatter Diagram

A scatter plot needs an x- and a y-axis.

Include the x and y arguments like this:
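A minimal sketch of a scatter diagram drawn from a DataFrame. The column names and values are hypothetical, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Duration": [30, 45, 60, 60, 90],
    "Calories": [180, 260, 340, 360, 520],
})

# kind selects the plot type; x and y name the columns for each axis
ax = df.plot(kind="scatter", x="Duration", y="Calories")

plt.savefig("scatter.png")  # in an interactive session, use plt.show() instead
```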


Pandas Histogram Diagram

Use the kind argument to specify that you want a histogram (kind = 'hist'):

A histogram needs only one column.
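A minimal sketch of a histogram of a single column. The column name and values are hypothetical, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({"Duration": [30, 45, 45, 60, 60, 60, 75, 90]})

# Select one column, then ask for a histogram of its values
ax = df["Duration"].plot(kind="hist")

plt.savefig("hist.png")  # in an interactive session, use plt.show() instead
```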


#Three lines to make our compiler able to draw:

Difference between Pandas and NumPy

Aspect         | Pandas                         | NumPy
Purpose        | Data manipulation and analysis | Numerical computations
Data type      | DataFrame (tabular data)       | ndarray (multidimensional array)
Functionality  | Powerful indexing and slicing  | Mathematical and logical operations
Flexibility    | Supports heterogeneous data    | Supports homogeneous data
Performance    | Slower for large datasets      | Faster for numerical computations
Dependencies   | Built on top of NumPy          | Independent of Pandas
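The "Flexibility" and "Dependencies" rows can be illustrated with a short sketch. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# A DataFrame can hold a different data type in each column
df = pd.DataFrame({"name": ["x", "y"], "score": [1.5, 2.5]})
print(df.dtypes)

# An ndarray has one data type: mixing int and float gives float64
arr = np.array([1, 2.5, 3])
print(arr.dtype)  # float64

# Pandas stores numeric columns as NumPy arrays and can hand them back
values = df["score"].to_numpy()
print(type(values).__name__, values.mean())  # ndarray 2.0
```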

import matplotlib.pyplot as plt

# plotting the data

import matplotlib.pyplot as plt

# initializing the data

# plotting the data

# Adding title to the plot
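Only the comments of this example are preserved. A minimal sketch that follows the same three steps might look like this; the data values and the title text are assumptions, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# initializing the data (hypothetical values)
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]

# plotting the data
plt.figure()
plt.plot(x, y)

# Adding title to the plot
plt.title("Linear graph")

plt.savefig("line.png")  # in an interactive session, use plt.show() instead
```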

# initializing the data

plt.plot([1, 2, 3, 4], [1, 4, 2, 3], label='Series 1')

plt.text(3, 4, 'Example annotation', fontsize=12, color='red')
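The two calls above can be combined into one runnable sketch. The plt.legend() call is an assumption: a label only appears on the plot once a legend is drawn. The Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

plt.figure()

# label names the line so that a legend can refer to it
plt.plot([1, 2, 3, 4], [1, 4, 2, 3], label="Series 1")

# text(x, y, s) places the string s at data coordinates (x, y)
plt.text(3, 4, "Example annotation", fontsize=12, color="red")

# Draw the legend box that shows the label
plt.legend()

plt.savefig("annotated.png")  # in an interactive session, use plt.show() instead
```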
