Unit 5-2
LEARNING OBJECTIVES
After reading this lesson, students should be able to understand:
• The concept of arrays and how they differ from lists.
• How multi-dimensional arrays can be created using NumPy built-in functions.
• How to access individual elements of an array using the indexing operation.
• How to assign values to individual elements of an array using the indexing operation.
• How to access a subset of an array using the slicing operation.
• How to assign values to a subset of an array using the slicing operation.
• How to perform basic mathematical and linear algebra operations using NumPy.
• Why arrays are preferred over lists.
INTRODUCTION
NumPy, short for Numerical Python, is a fundamental library for
scientific and numerical computing in Python. It provides support for
large, multi-dimensional arrays and matrices, along with a collection of
mathematical functions to operate on these arrays efficiently.
Some key features and functionalities of the NumPy library are:
• Multi-dimensional Arrays
• Vectorized Operations
• Mathematical Functions
• Array Manipulation
• Broadcasting
• Random Number Generation
• Integration with other Libraries
NUMPY ARRAY FUNCTIONS
One of the key features of NumPy is its array object. NumPy allows
creating one-dimensional, two-dimensional and higher-dimensional
homogeneous arrays, called n-d arrays (ndarray objects). An n-d array is a
fast and flexible container for large datasets in Python. Arrays make it
possible to perform mathematical operations on whole blocks of data using
the same syntax that is used for operations between scalar values (such
as int and float values).
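A minimal sketch of this idea, with arbitrary example values: the arithmetic is written once, exactly as it would be for two numbers, and NumPy applies it to every element without an explicit loop.

```python
import numpy as np

# A small 2 x 3 array of floats (arbitrary example values)
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

# The same syntax used for scalars works on the whole array at once
scaled = data * 10       # every element multiplied by 10
doubled = data + data    # element-wise addition

print(scaled)
print(doubled)
```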
Various functions that can be used to create arrays in NumPy are as
follows:
• array( ): Converts an input sequence (such as a list, tuple, array or any other sequence type) to an n-d array. By default, it creates a copy of the data.
• asarray( ): Like array( ), converts an input sequence (such as a list, tuple, array or any other sequence type) to an n-d array. However, it does not create a copy if the input is already an array (of the required data type).
• arange( ): Like the built-in range( ) function, but returns an n-d array (instead of a range object) of evenly spaced values from the start value up to, but not including, the end value given as arguments.
• ones( ): Creates an array of all 1s with the given shape and dtype.
• zeros( ): Creates an array of all 0s with the given shape and dtype.
• full( ): Creates an array of the given shape and dtype in which every value is the "fill value" passed as an argument.
• linspace( ): Creates an array containing a fixed number of evenly spaced values between the specified start and end values.
• tile( ): Creates a new array by repeating an existing array a given number of times.
• randint( ): Creates an array of random integers within a given range. It belongs to NumPy's random module and is called as np.random.randint( ).
Here, the text written after the '#' symbol is a comment. That is how
single-line comments are written in Python. Python has no separate
multiline comment syntax; triple-quoted strings are commonly used for
this purpose.
data1 is a list containing elements of both int and float types. The
np.array( ) function accepts 'data1' as its argument and converts it
into an array. On printing the arr1 variable, the output will be as follows:
On printing the variables ‘arr2’ and ‘arr3’, the result will be:
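A minimal sketch of the arr1 example; the original contents of data1 are not shown, so the list below is an assumed mix of int and float values.

```python
import numpy as np

data1 = [6, 7.5, 8, 0, 1]   # assumed example: ints mixed with one float
arr1 = np.array(data1)      # converts (and copies) the list into an n-d array
print(arr1)
print(arr1.dtype)           # one common type for all elements
```

Because an array is homogeneous, NumPy upcasts the integers to floats, so every element of arr1 is a float64.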
np.asarray( ):
As defined in the previous section, the np.asarray( ) function converts
an input sequence (such as a list, tuple, array or any other sequence type)
to an n-d array.
Call the np.asarray( ) function and pass the previously created array
‘arr3’ as the argument.
Print the newly created array ‘arr4’ as:
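A minimal sketch of the difference between np.asarray( ) and np.array( ). The array arr3 below is an assumed stand-in for the earlier arr3, and arr_copy is a hypothetical name used only for comparison.

```python
import numpy as np

arr3 = np.array([1, 2, 3, 4])   # assumed stand-in for the earlier arr3

arr4 = np.asarray(arr3)      # input is already an array: no copy is made
arr_copy = np.array(arr3)    # np.array( ) copies by default

print(arr4 is arr3)                        # True: the very same object
print(np.shares_memory(arr_copy, arr3))    # False: the copy has its own memory

arr3[0] = 100
print(arr4[0], arr_copy[0])   # the change is visible through arr4 only
```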
np.arange( ):
The np.arange( ) function creates an n-d array (instead of a range object)
of evenly spaced values from the start value up to, but not including, the
end value given as arguments.
Besides the start and end values, the function also accepts a step value
as an argument:
np.arange(start, end, step)
These three arguments signify:
• start: It specifies the value of the first element of the array.
• end: It specifies the value at which the sequence stops. The end
value itself is never included, so with the default step of 1 the last
value is end - 1.
• step: It specifies the gap between consecutive values, i.e., the
difference between the next value and the current value of the array.
Of these three arguments, only the end argument is mandatory. The other
two arguments, start and step, are optional. If you do not specify values
for start and step, they take their default values, 0 and 1, respectively.
The example below creates an array 'arr6' using the np.arange( )
function with only the mandatory argument 'end'.
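A minimal sketch of np.arange( ) with one, two and three arguments; arr7 and arr8 are hypothetical names added for comparison.

```python
import numpy as np

arr6 = np.arange(10)         # only end given: start defaults to 0, step to 1
arr7 = np.arange(2, 10)      # start=2; the end value 10 is excluded
arr8 = np.arange(2, 10, 3)   # step=3: 2, 5, 8 (the next value, 11, passes the end)

print(arr6)   # [0 1 2 3 4 5 6 7 8 9]
print(arr7)   # [2 3 4 5 6 7 8 9]
print(arr8)   # [2 5 8]
```

arr8 shows that "end - 1" holds only for a step of 1: with a step of 3, the last value is 8, the largest value below 10 that the steps reach.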
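Functions such as np.ones( ), np.zeros( ) and np.empty( ) create arrays of
data type float64 by default, whereas np.arange( ) infers the data type from
its arguments (integer arguments give an integer array). If you want a
different data type, pass the dtype argument while creating the array.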
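A minimal sketch of the default dtype and the dtype argument, assuming a 2 x 3 shape for the ones array arr10 (the original shape is not shown); arr11 is a hypothetical name.

```python
import numpy as np

arr10 = np.ones((2, 3))              # float64 by default
arr11 = np.ones((2, 3), dtype=int)   # dtype argument: integer ones instead

print(arr10.dtype)   # float64
print(arr11.dtype)   # an integer type (its exact width depends on the platform)

# np.arange( ) infers the type from its arguments
print(np.arange(5).dtype, np.arange(5.0).dtype)
```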
np.ones_like( ):
It accepts an existing array (for example, one created using the np.ones( )
function) as input and produces an array of ones with the same shape and dtype.
The example below creates an ones array ‘arr12’ by passing an
existing ones array ‘arr10’ as argument.
np.zeros_like( ):
It accepts an existing array (for example, one created using the np.zeros( )
function) as input and produces an array of zeros with the same shape and
dtype. Here, 'arr15' is a zeros array created using 'arr13'; it automatically
copies the shape and data type of arr13.
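A minimal sketch of np.ones_like( ) and np.zeros_like( ). The shapes of arr10 and arr13 are assumptions, and arr16 is a hypothetical name showing that any array can serve as the template.

```python
import numpy as np

arr10 = np.ones((2, 3))                 # existing ones array (float64)
arr12 = np.ones_like(arr10)             # ones with shape (2, 3), dtype float64

arr13 = np.zeros((2, 4), dtype=int)     # assumed existing zeros array
arr15 = np.zeros_like(arr13)            # zeros with the shape and dtype of arr13

# The *_like functions accept any array, not only ones made by ones( )/zeros( )
arr16 = np.ones_like(np.arange(6))

print(arr12.shape, arr12.dtype)
print(arr15.shape, arr15.dtype)
print(arr16)
```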
np.empty_like( ):
It accepts an existing array (for example, one created using the np.empty( )
function) as input and produces a new, uninitialized array of the same shape
and dtype. Its elements hold whatever values happen to be in memory, so
they look random.
The statement below creates and prints an array similar to 'arr17' (a
one-dimensional array having 10 elements).
The np.empty_like( ) function can also be used to create an array
similar to any other array (one not created using the np.empty( )
function). For example, the statement below creates and prints an array
similar to 'arr14' (a zeros array having 4 rows and 4 columns).
Instead of arbitrary values, all values of the array are now 30.
It is also possible to change a specific value of the array. For
example, to change the first value from 30 to 40, the following
statement can be used:
Array indexes always start from 0, so the first value has index 0.
Using the indexing operation, the value 40 can be assigned to index 0.
The output of the above print statement will be:
Only the first value changes to 40; the rest of the values remain 30.
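A minimal sketch of the np.empty_like( ) steps described above. The names arr18 and arr19 are hypothetical, and fill( ) is assumed as the way the values were set to 30, since the original statement is not shown.

```python
import numpy as np

arr17 = np.empty(10)            # 10 uninitialized values (contents are arbitrary)
arr18 = np.empty_like(arr17)    # same shape and dtype, also uninitialized

arr14 = np.zeros((4, 4))
arr19 = np.empty_like(arr14)    # works for any array: 4 x 4, float64

arr18.fill(30)    # replace the arbitrary contents with 30 everywhere
arr18[0] = 40     # indexes start at 0, so only the first value changes
print(arr18)      # first value 40.0, the other nine 30.0
```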
Indexing is discussed in detail in the upcoming section
4.6.
np.full( ):
Creates an array of the given shape and dtype having all the values
specified as “fill value” in the argument.
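This statement creates and prints a one-dimensional array 'arr21'
having 10 elements, all equal to 7. Here, the np.full( ) function accepts
two arguments: the first is the shape of the array (a single number gives
the number of elements of a one-dimensional array, while a tuple such as
(3, 4) gives a multi-dimensional shape) and the second is the fill value.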
np.full_like( ):
It accepts an existing array (for example, one created using the np.full( )
function) and a fill value as input, and produces a new array of the same
shape and dtype filled with that value.
The statement below creates and prints an array 'arr23' having the same
shape and data type as array 'arr21' but with the fill value 15.
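A minimal sketch of np.full( ) and np.full_like( ) using the values described above; arr22 is a hypothetical extra example of a multi-dimensional shape.

```python
import numpy as np

arr21 = np.full(10, 7)            # shape 10, fill value 7 (integer dtype)
arr22 = np.full((2, 3), 1.5)      # a tuple gives a multi-dimensional shape
arr23 = np.full_like(arr21, 15)   # shape and dtype of arr21, filled with 15

print(arr21)   # [7 7 7 7 7 7 7 7 7 7]
print(arr22)
print(arr23)   # [15 15 15 15 15 15 15 15 15 15]
```

Because np.full_like( ) keeps the integer dtype of arr21, a fractional fill value such as 15.7 would be truncated to 15.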
np.eye( ):
Creates an n x n identity matrix (1s on the main diagonal and 0s
elsewhere).
The statements below create and print an array 'arr25', an identity
matrix whose values are all floats.
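A minimal sketch, assuming a 3 x 3 size for arr25 (the original size is not shown).

```python
import numpy as np

arr25 = np.eye(3)   # 3 x 3 identity matrix, float64 by default
print(arr25)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```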
np.random.random( ):
Creates an array of random numbers. Functions such as random( ) are
defined in NumPy's random module (np.random) rather than in the main
NumPy namespace. Hence, to call such a function, the name of the random
module needs to be attached, as in np.random.random( ).
The statements below create and print a two-dimensional array of 3
rows and 4 columns containing random numbers between 0 and 1 (0
included, 1 excluded).
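A minimal sketch using np.random.random( ), which takes the shape as a tuple; np.random.rand( ) is shown as an equivalent that takes the dimensions as separate arguments. arr26 and arr27 are hypothetical names.

```python
import numpy as np

arr26 = np.random.random((3, 4))   # 3 rows, 4 columns, values in [0.0, 1.0)
print(arr26)

# np.random.rand takes the dimensions as separate arguments instead of a tuple
arr27 = np.random.rand(3, 4)
print(arr27.shape)   # (3, 4)
```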
In the output, the array elements 0, 1, 2 will be repeated 3 times.
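This output matches np.tile( ) from the function table. A minimal sketch, assuming the array being repeated is [0, 1, 2]; base and arr28 are hypothetical names.

```python
import numpy as np

base = np.array([0, 1, 2])
arr28 = np.tile(base, 3)   # repeat the whole array 3 times
print(arr28)               # [0 1 2 0 1 2 0 1 2]
```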
np.linspace( ):
Creates an array with a fixed number of evenly spaced values between the
specified start and end values.
The np.linspace( ) function accepts three arguments. The first argument
specifies the start value, the second specifies the end value, and the
third specifies the total number of elements of the array to be created.
This statement creates a one-dimensional array with 15 as the start
value and 18 as the end value, containing 25 elements. By default, the
end value is included. The step between consecutive values is
determined automatically by np.linspace( ): (18 - 15) / (25 - 1) = 0.125.
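A minimal sketch of the np.linspace( ) call described above; arr30 and step are hypothetical names.

```python
import numpy as np

arr30 = np.linspace(15, 18, 25)   # 25 evenly spaced values; 18 is included
print(arr30[:3])                  # first three values: 15.0, 15.125, 15.25

step = arr30[1] - arr30[0]        # (18 - 15) / (25 - 1) = 0.125
print(step)
```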
Help Function
help( ) is a very useful Python function for getting information about any
built-in function, module or object. To find out how a function should be
called (i.e., the syntax of the function, the number of parameters and the
kind of values the parameters accept), pass the name of the function as
the argument to help( ).
For example, if a user is not aware of what the np.ones( ) function does
and how to call it, the help( ) function can be called as:
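A minimal sketch of calling help( ) on np.ones( ); it prints the call signature followed by the full documentation.

```python
import numpy as np

help(np.ones)   # note: pass the function itself, without calling it
```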
It is often necessary to inspect (check) the structure of an array,
especially when working with large arrays. Some NumPy array attributes
that help in checking the structure of an array are:
(i) shape: gives the shape of the array, i.e., the size of each
dimension (for a two-dimensional array, the number of rows and columns).
(ii) dtype: gives the data type of the array elements.
(iii) ndim: gives the number of dimensions of the array.
(iv) itemsize: gives the number of bytes that each element of the array
occupies in memory.
For example, consider a large array of size 1000 x 300. Initialize
this array with random values as:
Now, the structure of the array can be inspected using the shape, dtype,
ndim and itemsize attributes as follows:
The output of these statements will be:
The first line of the output tells that the array contains 1000 rows and
300 columns.
The second line gives the data type of the elements of the array, i.e.,
float64. Here, 64 is the number of bits required to store each float
value in memory.
The third line gives the number of dimensions. As the array is
two-dimensional, the output is 2.
The last line of the output gives the number of bytes required in
memory to store each element of the array: 8, since 64 bits = 8 bytes.
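A minimal sketch of the inspection described above; big_array is a hypothetical name for the 1000 x 300 array.

```python
import numpy as np

big_array = np.random.random((1000, 300))   # 1000 x 300 random floats

print(big_array.shape)      # (1000, 300)
print(big_array.dtype)      # float64
print(big_array.ndim)       # 2
print(big_array.itemsize)   # 8 (64 bits = 8 bytes per element)
```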
The first element is in the first row and first column, so its index is
(0, 0). The second element is in the first row but the second column, so
its index is (0, 1), and so on.
Slicing:
Sometimes, it is required to access specific elements of the array, i.e.,
a subset of the array, or to assign the same value to more than one index
position. This operation is called slicing the array. Slicing is similar to
indexing, with the only difference that slicing operates on a range of
index values instead of a single one.
Let's understand indexing and slicing with the help of some examples.
Consider the following one-dimensional array created using the
np.arange( ) function:
To access an element of the array, its index is written inside square
brackets.
The third element of the array array_1d can be accessed as:
As indexes start from 0, the index of the third element is 2,
and the output will be:
In slicing, we can specify a start index, an end index and a step value as:
array_name[start index: end index: step value]
However, the element at the end index is never included in the output;
the slice stops just before the end index.
During slicing, all these values (start index, end index and step value)
are optional. If the start index is not given, it is taken as the first
index, i.e., 0. If the end index is not given, the slice runs to the end of
the array (equivalent to an end index of last index + 1). If the step value
is not given, it is taken as 1. So, with all defaults, the slicing operation
is written as:
array_name[::]
To slice from the third element onwards in array 'array_1d':
Here, the end index is taken as last index + 1, i.e., 9 + 1 = 10, and the
output will be:
Here, the start index is taken as 0, and the output will be:
To slice from the first index to the last index with an index increment
of 2 in array 'array_1d':
Here, the end index is taken as last index + 1, i.e., 9 + 1 = 10, and the
output will be:
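A minimal sketch of the indexing and slicing steps above, assuming array_1d was created as np.arange(10) (consistent with the last index 9 used in the text). The slice with the omitted start index assumes an end index of 5, since the original statement is not shown.

```python
import numpy as np

array_1d = np.arange(10)   # assumed: [0 1 2 3 4 5 6 7 8 9], last index 9

print(array_1d[2])      # third element -> 2
print(array_1d[2:])     # third element to the end -> [2 3 4 5 6 7 8 9]
print(array_1d[:5])     # start defaults to 0, stops before index 5 -> [0 1 2 3 4]
print(array_1d[::2])    # whole array, every second element -> [0 2 4 6 8]
print(array_1d[::])     # all defaults: the whole array

# A slice can also receive one value at several positions at once
array_1d[5:8] = 12
print(array_1d)         # [ 0  1  2  3  4 12 12 12  8  9]
```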
Indexing and slicing in a one-dimensional array are similar to indexing
and slicing in a list. However, when these operations are performed on
two- or higher-dimensional arrays, index values for both rows and
columns need to be specified.
Consider the following two-dimensional array 'array_2d' created
using a nested list:
To access the third row and second column, indexing can be done as:
Here, the row index and the column index are given, separated by a
comma, and the output will be:
Just as a slice of a one-dimensional array can have a start index, an end
index and a step value, so can each dimension of a slice in a two- or
higher-dimensional array.
In the statement above, only a colon (:) is specified for the columns,
which means all columns are accessed (from index 0 to the end) but only
for the second row, and the output will be:
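A minimal sketch of two-dimensional indexing and slicing; the 3 x 3 contents of array_2d are an assumption, since the original nested list is not shown.

```python
import numpy as np

# Assumed example: a 3 x 3 array built from a nested list
array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print(array_2d[2, 1])     # third row, second column -> 8
print(array_2d[1, :])     # second row, all columns -> [4 5 6]
print(array_2d[:, 1])     # all rows, second column -> [2 5 8]
print(array_2d[:2, 1:])   # first two rows, columns from index 1 onwards
```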
The values are taken from array_1d one by one, and the square of each
is calculated and printed. The output will be:
This loop accesses the rows of array_2d one by one and prints them,
and the output will be:
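A minimal sketch of both loops, assuming small example arrays; squares is a hypothetical name for the loop-free equivalent.

```python
import numpy as np

array_1d = np.arange(5)
for value in array_1d:        # one element at a time
    print(value ** 2)         # 0, 1, 4, 9, 16 (one per line)

array_2d = np.array([[1, 2, 3], [4, 5, 6]])
for row in array_2d:          # iterating over a 2-D array yields its rows
    print(row)                # [1 2 3] then [4 5 6]

# The vectorized equivalent of the first loop needs no loop at all
squares = array_1d ** 2
```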
If you are not sure about one of the dimensions (for example, the second
dimension), -1 can be specified for it and NumPy automatically calculates
it from the other dimension and the total number of elements.
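A minimal sketch of reshape( ) with -1, using a hypothetical 12-element array.

```python
import numpy as np

arr = np.arange(12)
reshaped = arr.reshape(3, -1)   # 12 elements / 3 rows -> NumPy infers 4 columns
print(reshaped.shape)           # (3, 4)

# -1 only works when the division is exact
try:
    arr.reshape(5, -1)          # 12 is not divisible by 5
except ValueError as err:
    print("ValueError:", err)
```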
Combining Arrays:
Arrays can be combined horizontally as well as vertically. This operation
is called stacking arrays. To combine arrays horizontally, the np.hstack( )
function is used, and to combine arrays vertically, the np.vstack( )
function is used.
For horizontal stacking, the number of rows must be the same, while for
vertical stacking, the number of columns must be the same.
Consider the following arrays array_1, array_2 and array_3, created
using the np.arange( ) function:
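A minimal sketch of horizontal and vertical stacking. The shapes of array_1, array_2 and array_3 are assumptions chosen so that each rule can be seen; the original arrays are not shown.

```python
import numpy as np

array_1 = np.arange(6).reshape(2, 3)       # 2 x 3
array_2 = np.arange(6, 12).reshape(2, 3)   # 2 x 3
array_3 = np.arange(12, 16).reshape(2, 2)  # 2 x 2

h = np.hstack((array_1, array_3))   # same number of rows (2) -> shape (2, 5)
v = np.vstack((array_1, array_2))   # same number of columns (3) -> shape (4, 3)
print(h)
print(v)

# array_1 and array_3 have different column counts, so vertical stacking fails
mismatch_error = None
try:
    np.vstack((array_1, array_3))
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```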
The function is applied to each element of the array, and the output
of the print statements will be:
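The functions used in the original example are not shown. As a minimal sketch, np.sqrt( ) and np.exp( ) below are assumed examples of NumPy universal functions, which work element by element.

```python
import numpy as np

arr = np.array([1.0, 4.0, 9.0, 16.0])
roots = np.sqrt(arr)                     # square root of each element
powers = np.exp(np.array([0.0, 1.0]))    # e**0 = 1 and e**1 ≈ 2.718

print(roots)    # [1. 2. 3. 4.]
print(powers)
```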
Each element of mat1 is added to the element at the same index in mat2,
and the output of element-wise matrix addition will be:
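A minimal sketch with assumed 2 x 2 values for mat1 and mat2. Element-wise multiplication (*) and the matrix product (@) are included for contrast, since they are easy to confuse, along with the transpose and the determinant.

```python
import numpy as np

mat1 = np.array([[1, 2], [3, 4]])   # assumed example matrices
mat2 = np.array([[5, 6], [7, 8]])

added = mat1 + mat2           # element-wise addition -> [[6, 8], [10, 12]]
elementwise = mat1 * mat2     # element-wise product  -> [[5, 12], [21, 32]]
product = mat1 @ mat2         # matrix product        -> [[19, 22], [43, 50]]
transposed = mat1.T           # transpose             -> [[1, 3], [2, 4]]
det = np.linalg.det(mat1)     # determinant: 1*4 - 2*3 = -2 (as a float, ≈ -2.0)

print(added, elementwise, product, transposed, det, sep="\n")
```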
The eigenvalues and eigenvectors of the matrix can be calculated using
the np.linalg.eig( ) function as:
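A minimal sketch using an assumed diagonal matrix, whose eigenvalues are simply its diagonal entries, so the result is easy to check.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])    # assumed example matrix

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # the eigenvalues 2 and 3
print(eigenvectors)   # each column is an eigenvector

# Check A @ v == lambda * v (column i pairs with eigenvalues[i])
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))   # True
```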
SELF-ASSESSMENT QUESTIONS
Q1. Create the following list having multiples of 3:
list1 = [3, 6, 9, 12, 15, 18, 21, 24]
Convert this list into a one-dimensional array.
Q2. Create a one-dimensional array of random integers from 21 to
40. The array should contain 10 elements.
Q3. Create a two-dimensional array of size 5 x 7. All the values of
the array should be the same. You can pick any value of your
choice.
Q4. Convert the one-dimensional array created in question 1 into a
two-dimensional array of size 2 x 4. Can the array be converted
into a two-dimensional array of any other size?
Q5. Create an identity matrix of size 5 x 5.
Q6. Create a one-dimensional array having 20 elements using the
empty( ) function. Assign the value '20' to the elements in the
first half of the array and the value '45' to the elements in the
second half of the array.
Q7. Create a two-dimensional array of size 5 x 7 and print the following:
print the value of the third row and fifth column.
print all the values of the fifth row.
print all the values of the second column.
print all the rows and the first two columns.
print the last three rows and all the columns.
Q8. Create a matrix of size 4 x 3 and calculate the transpose of this matrix.
Q9. Create a matrix of size 4 x 4 and find its determinant, eigenvalues
and eigenvectors.
Q10. Create two two-dimensional arrays of size 4 x 3 and 3 x 3. Merge
these arrays vertically.
UNIT - 5
WORKING WITH PANDAS
INTRODUCTION
Python Pandas is a powerful open-source library for data manipulation,
analysis and visualization. It is built on top of the NumPy library and
provides high-level data structures and functions to work efficiently with
structured data, such as tabular data, time series and heterogeneous data.
Some key features and functionalities of the Pandas library are:
• DataFrame
• Data Manipulation
• Data Input/Output
• Time Series Analysis
• Handling Missing Data
• Data Visualization
• Integration with other Libraries
This statement creates a Pandas Series 's' containing a sequence of integer
values. On printing the variable s, the output will be:
To access only the index values of a Pandas Series, the index property can
be used:
Note that the index list must contain the same number of elements as
the data in the Series; otherwise, Pandas raises an error.
The output of the above statement will be:
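A minimal sketch covering the steps above with assumed values; s2 is a hypothetical name for the Series with custom index labels.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3])   # default integer index 0..3
print(s)
print(s.index)                 # RangeIndex(start=0, stop=4, step=1)

# Custom index labels: exactly one label per value
s2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
print(s2['a'])                 # -5

# A length mismatch between data and index raises an error
try:
    pd.Series([4, 7, -5, 3], index=['a', 'b'])
except ValueError as err:
    print("ValueError:", err)
```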
Usually we work with a Series as part of a DataFrame, so let's start the
discussion of DataFrames.
The statements written above create a dictionary 'data', convert it into a
DataFrame, assign it to the variable 'df' and print it. The output of the
above 'df' statement will be:
As with a Series, a numeric index is also created automatically for a
DataFrame.
To arrange the columns in a specific order in the DataFrame, the columns
argument can be used. It specifies the order in which the columns are
displayed in the DataFrame.
To set distinct values in the address column for all the rows at once, a
sequence (such as a list) with one value per row can be assigned to the column:
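A minimal sketch of these DataFrame steps. The contents of data and the address values are made up for illustration, since the original dictionary is not shown.

```python
import pandas as pd

# Made-up example data
data = {'name': ['Asha', 'Ravi', 'Meena'],
        'age': [21, 23, 22],
        'city': ['Delhi', 'Pune', 'Agra']}

df = pd.DataFrame(data)     # index 0, 1, 2 is created automatically
print(df)

# The columns argument fixes the column order
df = pd.DataFrame(data, columns=['age', 'name', 'city'])

# One value per row: the list length must equal the number of rows
df['address'] = ['A-12', 'B-7', 'C-3']
print(df)
```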
REINDEXING
Pandas provides the reindex( ) method, which creates a new object with the
data conformed to a new index. Index labels that were not present in the
original object get missing values (NaN).
Consider the data dictionary containing the name, age and occupation of
the employees of an organization.
This creates a new DataFrame object df2, and the output of the df2
statement will be:
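A minimal sketch of reindex( ) with assumed employee data and hypothetical index labels e1 to e4.

```python
import pandas as pd

data = {'name': ['Amit', 'Neha', 'Ravi'],
        'age': [34, 28, 45],
        'occupation': ['Engineer', 'Analyst', 'Manager']}
df = pd.DataFrame(data, index=['e1', 'e2', 'e3'])

# reindex( ) returns a NEW DataFrame arranged by the given labels;
# the label 'e4' did not exist before, so its row is all NaN
df2 = df.reindex(['e3', 'e1', 'e2', 'e4'])
print(df2)
```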
The list marks has been converted into a Pandas Series with the names as
the index:
However, when slicing with character (label) index values, the row at the
end index label is included in the output. Consider the following statement
to extract the rows from the second to the fourth:
or
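A minimal sketch of the two slicing styles, with assumed marks and names.

```python
import pandas as pd

marks = pd.Series([78, 85, 62, 90, 71],
                  index=['Anil', 'Bina', 'Chetan', 'Divya', 'Esha'])

by_label = marks['Bina':'Divya']   # label slicing: the end label 'Divya' IS included
by_position = marks.iloc[1:4]      # positional slicing: position 4 is excluded
print(by_label)
print(by_position)                 # the same three rows
```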
To access a specific cell value, specify the column name and use the
values property with the row position:
Here, the statement accesses the value in the second row of the name
column, and the output will be:
Selection with loc and iloc
DataFrames also provide the special indexing operators loc and iloc.
These operators allow selecting a subset of rows and columns from a
DataFrame using NumPy-like notation: loc uses axis labels, while iloc
uses integer positions.
The syntax of the loc and iloc indexing operators is:
df.loc[row range, column range]
df.iloc[row range, column range]
Here, df is the name of the DataFrame.
To access a specific row with all columns, specify the character index
label (in the case of loc) or the integer position (in the case of iloc) for
the row. A colon (:) in the column position includes all the columns in
the output.
Using loc indexing operator:
or
or
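A minimal sketch of the values property, loc and iloc, using an assumed DataFrame with character index labels.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Amit', 'Neha', 'Ravi', 'Sara'],
                   'age': [34, 28, 45, 31]},
                  index=['a', 'b', 'c', 'd'])

# A single cell: column name, then the row position in the underlying array
print(df['name'].values[1])      # Neha

print(df.loc['b', :])            # row labelled 'b', all columns
print(df.iloc[1, :])             # the same row, by integer position 1
print(df.loc['b':'c', 'name'])   # label range: 'c' is included
print(df.iloc[1:3, 0])           # position range: 3 is excluded (same two rows)
```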
FUNCTION APPLICATION
An important operation frequently used with a DataFrame is applying a
function to a specific row or a specific column. This can be done using
the DataFrame's apply( ) method.
Consider the DataFrame new_df that stores the marks of the
students:
Now, apply the function f to the column marks using the apply( ) method.
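The function f is not shown in the original. As a minimal sketch, f is assumed to add 5 grace marks, and the marks themselves are made-up values.

```python
import pandas as pd

new_df = pd.DataFrame({'name': ['Anil', 'Bina', 'Chetan'],
                       'marks': [72, 85, 64]})

def f(x):
    """Hypothetical function: add 5 grace marks to one value."""
    return x + 5

new_marks = new_df['marks'].apply(f)   # f is called once per value in the column
print(new_marks.tolist())              # [77, 90, 69]

# DataFrame.apply passes each column (or each row with axis=1) to the function
spread = new_df[['marks']].apply(lambda col: col.max() - col.min())
print(spread)                          # range of the marks column: 21
```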
Let's reindex the DataFrame by making the column 'name' the index. For
this, the set_index( ) method can be used.
This sorts the column names in ascending order, and the output will be:
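A minimal sketch with assumed data. The statement that sorts the column names is not shown in the original; sort_index(axis=1) is assumed here as one way to do it.

```python
import pandas as pd

new_df = pd.DataFrame({'name': ['Chetan', 'Anil', 'Bina'],
                       'marks': [64, 72, 85],
                       'grade': ['C', 'B', 'A']})

indexed = new_df.set_index('name')    # the 'name' values become the row labels
print(indexed)

by_col = indexed.sort_index(axis=1)   # sort the column names in ascending order
print(list(by_col.columns))           # ['grade', 'marks']
```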
The output of the above statement will be the minimum value present in
each column.
To calculate the minimum value of each row, the axis argument needs to
be used:
The count( ) method does not count NaN values; hence, the output is 4 for
both columns A and B.
The output presents the count of values, the number of unique values,
the top (most frequent) value and the frequency of the top value.
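A minimal sketch of min( ), count( ) and the summary described above, with assumed data. The summary with count, unique, top and freq is what describe( ) returns for non-numeric data, so describe( ) is assumed to be the method used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3.0, np.nan, 7.0, 1.0, 5.0],
                   'B': [2.0, 8.0, np.nan, 4.0, 6.0]})   # one NaN per column

col_min = df.min()          # minimum of each column (A: 1.0, B: 2.0)
row_min = df.min(axis=1)    # minimum of each row; NaN values are skipped
counts = df.count()         # non-NaN values per column (A: 4, B: 4)
print(col_min, row_min, counts, sep="\n")

cities = pd.Series(['Delhi', 'Pune', 'Delhi', 'Agra'])
summary = cities.describe()   # count, unique, top and freq for non-numeric data
print(summary)
```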
Once you specify the name of a sheet using the sheet_name argument,
the read_excel( ) method reads that specific sheet and converts it into a
DataFrame.
The output of the above statements will be:
Read the sheet 'Student' from the Excel file 'Data1' and convert it into
the DataFrame df_stud:
This removes all the rows having one or more NaN values, and the output
will be:
To remove only those rows where all the values are NaN, use the how
argument:
And the output will be:
It is also possible to delete columns having NaN values by using the
axis argument. To demonstrate this, let's add a new column to the
DataFrame df_miss that contains only NaN values.
To fill NaN values and save the change in the DataFrame itself, the
inplace argument needs to be used. This permanently saves the filled
values in the DataFrame.
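A minimal sketch of the dropna( ) and fillna( ) steps, with assumed values for df_miss; the intermediate result names are hypothetical.

```python
import numpy as np
import pandas as pd

df_miss = pd.DataFrame({'A': [1.0, np.nan, np.nan, 4.0],
                        'B': [5.0, 6.0, np.nan, 8.0]})

no_nan_rows = df_miss.dropna()            # drops rows 1 and 2 (any NaN)
not_all_nan = df_miss.dropna(how='all')   # drops only row 2 (all NaN)
print(no_nan_rows)
print(not_all_nan)

df_miss['C'] = np.nan                           # new column with only NaN values
without_c = df_miss.dropna(axis=1, how='all')   # axis=1 works on columns: drops C
print(without_c)
# With the default how='any', dropna(axis=1) would drop A and B as well

df_miss.fillna(0, inplace=True)   # fills NaN and saves the change in df_miss
print(df_miss)
```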
DATA TRANSFORMATION
Data transformation involves changing data into the form required for
analysis. It includes removing duplicate values, replacing values with
specific values, changing the form of data using mapping and dividing
data into groups called bins.
Removing Duplicates
To remove duplicate rows, Pandas provides the drop_duplicates( ) method.
Before removing duplicates, it is possible to check whether duplicates
exist by calling the duplicated( ) method. By default, both methods work
on the complete DataFrame, i.e., a row is treated as a duplicate only if its
values in all columns match an earlier row. To check for duplicates in a
particular column only, pass the column name as the argument of the
method.
Consider the DataFrame df_duplicate created from the dictionary dict1.
The DataFrame contains the names and marks of students.
The output will be a Series indicating that the last two rows are duplicates.
Call the drop_duplicates( ) method to drop the duplicate rows from the
DataFrame df_duplicate:
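A minimal sketch with assumed contents for dict1, arranged so that the last two rows repeat earlier rows.

```python
import pandas as pd

dict1 = {'name': ['Anil', 'Bina', 'Dev', 'Anil', 'Bina'],
         'marks': [72, 85, 72, 72, 85]}     # rows 3 and 4 repeat rows 0 and 1
df_duplicate = pd.DataFrame(dict1)

row_dups = df_duplicate.duplicated()        # False, False, False, True, True
cleaned = df_duplicate.drop_duplicates()    # keeps the first occurrence of each row
print(row_dups)
print(cleaned)

# Checking one column only: Dev's 72 now also counts as a repeat
mark_dups = df_duplicate.duplicated('marks')   # False, False, True, True, True
print(mark_dups)
```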
Suppose you want to round off the marks of the students such that the
marks of those who scored less than 25 are replaced with 20.
To replace multiple values, the replace( ) method can be called as:
Multiple values can be passed as a list in the argument, and the output
will be:
Suppose you want to replace all the values less than 23 with 20 and the
values equal to or greater than 23 with 25. For such a scenario, where
different replacements are required for different values, the replace( )
method can be called as:
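A minimal sketch of both calls with assumed marks. The second call passes two lists, so each value in the first list is replaced by the value at the same position in the second list.

```python
import pandas as pd

marks = pd.Series([21, 24, 22, 28, 23])   # assumed marks

# Several values, one replacement: all marks below 25 become 20
rounded = marks.replace([21, 22, 23, 24], 20)
print(rounded.tolist())   # [20, 20, 20, 28, 20]

# Different replacements: below 23 -> 20, 23 or above -> 25
banded = marks.replace([21, 22, 23, 24, 28], [20, 20, 25, 25, 25])
print(banded.tolist())    # [20, 25, 20, 25, 25]
```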
Here, a parenthesis means that the value is excluded and a square
bracket means that the value is included.
If you want, you can count the values falling in each bin using the
value_counts( ) method:
To change which end of each interval is included and which is excluded,
the right argument needs to be used.
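A minimal sketch of binning with pd.cut( ), using assumed ages and bin edges; cats and cats_left are hypothetical names.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]   # assumed data
bins = [18, 25, 35, 60, 100]

cats = pd.cut(ages, bins)   # intervals like (18, 25]: 18 excluded, 25 included
right_counts = pd.Series(cats).value_counts().sort_index()
print(right_counts)         # 5, 3, 3, 1 values per bin

cats_left = pd.cut(ages, bins, right=False)   # [18, 25): 18 included, 25 excluded
left_counts = pd.Series(cats_left).value_counts().sort_index()
print(left_counts)          # 4, 4, 3, 1 values per bin
```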
Suppose you want to select all orders and observe the Sales of the
customer segment Corporate. Since the customer segment details are
present in the DataFrame customer_df, customer_df needs to be merged
with market_df.
This merging can be done as:
The merging operation is similar to applying joins in databases. There are
several types of merge, including inner, outer, left and right. The type of
merge is specified using the how argument. As specified previously,
merging is done using a column common to both DataFrames. This
common column is specified using the on argument.
The output of the above call to the merge( ) method will be:
Now the orders made by the customers from the Corporate segment can
be selected as a subset:
Or
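A minimal sketch with small, assumed versions of market_df and customer_df. The column names Ord_id, Cust_id, Sales and Customer_Segment are hypothetical, since the original DataFrames are not shown.

```python
import pandas as pd

market_df = pd.DataFrame({'Ord_id': ['O1', 'O2', 'O3', 'O4'],
                          'Cust_id': ['C1', 'C2', 'C1', 'C3'],
                          'Sales': [250.0, 120.5, 80.0, 410.0]})
customer_df = pd.DataFrame({'Cust_id': ['C1', 'C2', 'C3'],
                            'Customer_Segment': ['Corporate', 'Consumer', 'Corporate']})

# Inner merge on the common column Cust_id
df_merged = pd.merge(market_df, customer_df, how='inner', on='Cust_id')

# Subset: orders from the Corporate segment only
corporate = df_merged[df_merged['Customer_Segment'] == 'Corporate']
print(corporate[['Ord_id', 'Cust_id', 'Sales']])
```

df_merged.loc[df_merged['Customer_Segment'] == 'Corporate'] selects the same subset.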
SELF-ASSESSMENT QUESTIONS
Q1 Create a Series that stores the salaries of 10 employees. Set the
names of the employees as the index labels in the Series.
Q2 Create a DataFrame that stores information about 5 library
books. The DataFrame should contain Book Name, Price and
Publisher.
Add a new column ‘Author Name’ to the DataFrame and fill
the column with appropriate distinct values.
Q3 Remove the second row of the DataFrame created in question
2. After removing second row, remove the column ‘Publisher’
from the DataFrame.
Q4 Create a DataFrame that stores 10 numbers and their cubes in
separate columns. Use lambda expression to multiply each
cube value with 5 and display the result.
Q5 Sort the DataFrame created in question 2 in descending order
of book name. Also set the name of the author as index in the
DataFrame.
Q6 Create a DataFrame that stores the marks of 5 different subjects
(Physics, Mathematics, Chemistry, English and Economics) for
10 students. The subject names should be the column name and
the student id should be the index label.
Calculate the descriptive statistics for the DataFrame created.
Also, compute the correlation between marks scored in Physics
and Mathematics to find relationship between them.
Q7 Create one Excel file containing details about the weather
(date, temperature) of two cities (Delhi and Bombay) in two
different sheets. The weather details should be of 10 different
dates.
Read the data from the sheet that has details about the weather of
Bombay and add a column ‘Rainy’ to it. The value of the
column should be either Yes or No.
Update the information in the Excel file as well.
Q8 Create a DataFrame containing the ages of 5 family members
from 10 different families. The family member number should
be the column name and the family number should be the
index label. Fill some values as NaN.
Perform the following operations on the DataFrame:
(i) Remove the rows having more than two NaN values.
(ii) Replace all NaN values with a constant value ‘25’.
(iii) Replace NaN values of each column with some distinct
value.
(iv) Replace all NaN values with the mean value of the
DataFrame.
Q9 Check whether there are any duplicate values in the DataFrame
created in question 8 and, if any exist, remove them.
Q10 Create bins of the ages of column 1 from the DataFrame
created in question 8 and count the number of values in each
bin.
UNIT – 5
DATA VISUALIZATION (MATPLOTLIB)
INTRODUCTION
Data cleansing and normalization are essential processes in data analytics
that help in improving the quality and consistency of data. Data
Cleansing, also known as data cleaning or data scrubbing, refers to the
process of identifying and correcting or removing errors, inconsistencies
and inaccuracies from a dataset. The goal of data cleansing is to
ensure that the data is accurate, complete and reliable for analysis.
Some common tasks involved in data cleansing include:
• Removing duplicate records: Identifying and eliminating
duplicate entries within a dataset to avoid skewing analysis
results.
• Handling missing values: Dealing with missing or null values by
either imputing them using statistical techniques or removing the
incomplete records.
• Correcting inconsistencies: Resolving discrepancies and
inconsistencies in data by standardizing formats, resolving
conflicts and ensuring data integrity.
• Validating data: Verifying the accuracy of data by checking for
outliers, logical errors or data that falls outside the expected range.
• Formatting data: Converting data into a consistent format, such
as converting dates to a standard format or ensuring consistent
units of measurement.
By performing data cleansing, data analysts can enhance the quality
of data and minimize the potential impact of errors on subsequent
analysis or modelling.
TYPES OF CHARTS/GRAPHS
Matplotlib can create various types of plots and visualizations. It
provides a wide range of options for customizing and enhancing the
appearance of plots. Different types of plots that can be created using
matplotlib are:
1. Line Plot: It is the most basic type of plot. It displays data points
connected via straight lines. It is generally used to represent the
trend or progression of data over time.
2. Scatter Plot: This plot displays information or individual data
points as markers in two-dimensional plane. It is useful for
displaying or visualizing the relationship between two variables.
With the help of scatter plots, data analysts can identify patterns
and clusters in the data.
3. Bar Plot: These plots are useful in representing categorical data.
They represent data as rectangular bars where the length of each
bar corresponds to the value of the data. Bar plots are generally
used for comparing different categories or groups.
4. Pie Chart: These are circular plots divided into sections, where
each section represents a category or group. The area of the
section corresponds to the proportion or percentage of data
belonging to that category or group.
5. Histogram: These plots are generally used to visualize the
distribution of numerical data. Histograms divide the range of
data into bins (intervals) and display the frequency or count of
data points falling into each interval.
6. Box Plot: This plot is also known as box and whisker plot. It
displays the distribution of numerical data through quartiles. Box
plot also provides information about the median, range and
potential outliers in the data.
7. Heatmap: Heatmaps use intensity of colors to represent values in
a two-dimensional array. They are generally used to visualize
correlations and relationships between variables.
label is used here to name the individual lines when the graph contains
multiple lines. If the graph contains a single line, a label is not required.
To display the labels in the graph after assigning names using the
label argument of the plot( ) method, the legend( ) method must be
called as:
The loc argument of the legend( ) method defines the position where
the labels are displayed inside the graph. Commonly used values of loc
are 1 (upper right), 2 (upper left), 3 (lower left) and 4 (lower right);
Matplotlib also accepts other codes and location strings such as 'best'
or 'upper right'.
To display the plot, the show( ) method is executed as:
The output of the above statements will be the following line plot:
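A minimal sketch of a two-line plot with assumed data; the label strings are hypothetical.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]    # assumed data for the first line
y2 = [1, 3, 2, 5, 4]     # assumed data for the second line

line1, = plt.plot(x, y1, label='Section A')   # label names each line
line2, = plt.plot(x, y2, label='Section B')
leg = plt.legend(loc=2)   # 2 = upper left; loc='upper left' is equivalent
plt.show()
```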
CREATING SCATTER PLOT
Scatter plots are useful for displaying or visualizing the relationship
between two variables. To create a scatter plot, consider the following
data points:
To plot these data points, the scatter( ) method of the pyplot module is
used as:
The mandatory arguments here are the data point variables. The other
arguments in the statement serve the following purposes:
The argument 'c' specifies the color of the markers, while 's' specifies
the size of the markers. The argument 'alpha' specifies the transparency
(opacity) of the markers; its range is from 0 (fully transparent) to 1
(fully opaque). The 'marker' argument specifies the shape of the
markers. Commonly used values are ^ (triangle), o (circle, the default
value), s (square) and * (star).
Apart from these arguments, the size of the figure can also be
controlled using the figure( ) method. Its figsize argument accepts the
width and height of the figure, in inches, as a tuple.
The output of the above statements will be the following scatter plot:
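A minimal sketch of the scatter( ) and figure( ) calls with assumed data points and argument values.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9]            # assumed data points
y = [99, 86, 87, 88, 100, 86, 103, 87]

plt.figure(figsize=(8, 5))               # width 8 inches, height 5 inches
points = plt.scatter(x, y, c='red', s=60, alpha=0.5, marker='^')
plt.show()
```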
CREATING BAR GRAPH
Bar graphs (plots) are useful for representing categorical data. Consider
the following data points:
To plot these data points, the bar( ) method of the pyplot module is used as:
The 'color' argument specifies the color of the bars, and the 'width'
argument specifies how wide the bars are in the graph.
The title( ) method is used to specify a title for the graph.
The xlabel( ) and ylabel( ) methods are used to specify the labels for the
x-axis and y-axis.
The output of the above statements will be the following bar plot:
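A minimal sketch with assumed categories and values.

```python
import matplotlib.pyplot as plt

subjects = ['Maths', 'Science', 'English', 'History']   # assumed categories
scores = [82, 75, 68, 90]                                # assumed values

bars = plt.bar(subjects, scores, color='green', width=0.5)
plt.title('Average score by subject')
plt.xlabel('Subject')
plt.ylabel('Score')
plt.show()
```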
CREATING PIE CHART
Pie charts are circular plots divided into sections, where each section
represents a category or group. Consider the following lists 'x',
'activities' and 'cols'.
List 'x' represents the data, i.e., the number of persons who have
performed each activity, 'activities' represents the names of the
activities performed and 'cols' represents the color of each section.
To plot these data points, the pie( ) method of the pyplot module is used as:
Here, labels specifies the labels of the sections of the chart, colors
specifies the color of each section, startangle specifies the angle (in
degrees, measured counterclockwise from the positive x-axis) at which
the first section starts, explode pulls sections out from the centre (a
value of 0.1 for the first two activities means these two sections are
pulled out a bit), and autopct specifies that the percentage share of each
section is calculated automatically and displayed to two decimal places.
To specify the title of the plot:
Call the legend( ) method to display the labels in the chart, with the
location value equal to 4 (lower right).
The output of the above statements will be the following pie chart:
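A minimal sketch of the pie chart. The values in x, activities and cols are assumptions, and autopct='%.2f%%' is assumed as the format string that gives two decimal places.

```python
import matplotlib.pyplot as plt

x = [5, 3, 2, 10]                             # assumed counts (total 20)
activities = ['Sleep', 'Eat', 'Work', 'Play']
cols = ['c', 'm', 'r', 'b']

patches, texts, autotexts = plt.pie(
    x, labels=activities, colors=cols, startangle=90,
    explode=(0.1, 0.1, 0, 0),    # pull the first two sections out a bit
    autopct='%.2f%%')            # percentage with two decimal places
plt.title('Activities')
plt.legend(loc=4)                # 4 = lower right
plt.show()
```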
CREATING HISTOGRAM
Histograms are generally used to visualize the distribution of numerical
data. Histograms divide the range of data into bins (intervals) and display
the frequency or count of data points falling into each interval.
Consider the following data points:
To plot these data points, hist( ) method of pyplot module is used as:
The color argument here specifies the color of the histogram.
To display the histogram:
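A minimal sketch with assumed marks; bins=5 is an assumed choice (Matplotlib uses 10 bins if bins is not given).

```python
import matplotlib.pyplot as plt

marks = [35, 42, 48, 51, 55, 58, 61, 64, 67, 72, 75, 81, 88, 93]   # assumed data

# hist( ) returns the count in each bin, the bin edges and the drawn bars
counts, edges, patches = plt.hist(marks, bins=5, color='orange')
plt.show()
```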
The same data points can be used to create two different types of plots on
the same x-axis and y-axis. Let's create a bar plot and a line plot together
using the above data points.
Use the bar( ) and plot( ) methods to plot the bar and line plots:
The output of the above statements will be the following bar-line plot:
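A minimal sketch of a bar plot and a line plot on the same axes, with assumed monthly values.

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr']   # assumed data points
sales = [120, 150, 90, 170]

bars = plt.bar(months, sales, color='lightblue', label='Sales (bar)')
line, = plt.plot(months, sales, color='red', marker='o', label='Sales (line)')
plt.legend()
plt.show()
```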
CREATING FIGURES AND SUBPLOTS
Sometimes it is required to create multiple plots in the same figure.
Matplotlib supports the concept of figures and subplots, which can be
used to create multiple subplots inside the same figure.
To create multiple plots in the same figure, the subplot( ) method of the
pyplot module is used. The syntax of the subplot( ) method is as follows:
plt.subplot(nrows, ncols, index)
where nrows specifies the number of rows, ncols specifies the number of
columns, and index specifies the position of the current subplot in the
grid, counted from 1, left to right and then top to bottom.
Consider an example of creating a figure having 4 subplots:
Initialize the data points using the linspace( ) method of the NumPy library:
For subplot 2:
For subplot 3:
For subplot 4:
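A minimal sketch of a 2 x 2 grid of subplots. The data and the plot type in each subplot are assumptions, since the original statements are not shown.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)   # 50 evenly spaced points (assumed data)

fig = plt.figure(figsize=(8, 6))
plt.subplot(2, 2, 1)                # 2 rows, 2 columns, subplot 1 (top left)
plt.plot(x, np.sin(x))
plt.subplot(2, 2, 2)                # subplot 2 (top right)
plt.plot(x, np.cos(x))
plt.subplot(2, 2, 3)                # subplot 3 (bottom left)
plt.scatter(x, np.sin(x), s=5)
plt.subplot(2, 2, 4)                # subplot 4 (bottom right)
plt.plot(x, x ** 2)
plt.tight_layout()                  # keep labels of neighbouring subplots apart
plt.show()
```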
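The output of the above statements will be the following box plot:
Here, the orange line in the middle represents the median value, and the
box spans the first quartile (Q1) to the third quartile (Q3). The two
horizontal lines at the bottom and top (the whisker caps) mark the most
extreme data values that lie within 1.5 times the interquartile range
(Q3 - Q1) from the box; when there are no outliers, these are the minimum
and maximum values. Outliers are drawn as separate points below or above
these lines.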
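A minimal sketch of a box plot with assumed data containing one extreme value; result is a hypothetical name for the dictionary of drawn parts that boxplot( ) returns.

```python
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # assumed data with one extreme value

result = plt.boxplot(data)   # dict of drawn parts: boxes, medians, whiskers, caps, fliers
plt.show()

print(result['medians'][0].get_ydata())   # median line at 5.5
print(result['caps'][1].get_ydata())      # upper cap at 9, the largest non-outlier
print(result['fliers'][0].get_ydata())    # 50 is drawn as an outlier
```

Here Q1 = 3.25 and Q3 = 7.75, so values above 7.75 + 1.5 x 4.5 = 14.5 are outliers, which is why 50 is drawn separately and the upper cap stops at 9.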
SELF-ASSESSMENT QUESTIONS
Q1 Consider the attendance and marks of 10 students for two
different class sections. Here, attendance will be represented
on x-axis and marks will be represented on y-axis, for each
class section.
Perform the following operation:
(i) Draw and display the Line plot for both class sections
in one graph.
(ii) Draw and display the Scatter plot for both class
sections in one graph.
Q2 Consider the ages of 10 persons as x-axis data points and the
duration (in hours) spent watching T.V. as y-axis data
points.
Draw and display the bar plot for these data points.
Q3 Consider the following data points that represent the number
of employees in each department: X = [9, 7, 5, 10, 8, 15].
The names of the departments are given as: Dept = ['HR',
'Material', 'Accounts', 'Finance', 'Testing', 'Development'].
Define a distinct color for each department.
Draw and display the Pie chart to show the percentage of
employees working in each department.
Q4 Consider the marks of 10 students ranging from 30 to 100.
Draw and display the histogram for the data points.
Q5 Combine the Line plot and Scatter plot created in question 1.
Draw and display the combined Line-Scatter plot.