DVP First Module
Code: BCS358D
Course: Data Visualization with Python
Credits: 1
CIE: 50 Marks
L:T:P - 1:0:0
SEE: 50 Marks
SEE Hour: 1
Total Marks: 100
Module-1
1. Why Is Data Visualization Important?
2. Why Do Modern Businesses Need Data Visualization?
3. The Future of Data Visualization
4. How Data Visualization Is Used for Business Decision-Making
5. Introducing Data Visualization Techniques
Module-2
• Data Gathering and Cleaning
• Cleaning Data
• Checking for Missing Values
• Handling the Missing Values
• Reading and Cleaning CSV Data
• Merging and Integrating Data
Module-3
Data Exploring and Analysis
Data Visualization
Humans can understand data better through pictures rather than by reading
numbers in rows and columns.
Data visualization organizes extracted information into logical, meaningful
parts and helps users avoid information overload by keeping things simple,
relevant, and clear.
There are many ways in which visualizations help a business to improve its
decision-making.
• Faster Responses
With massive amounts of data collected daily through social networks and
company systems, putting clear interpretations of that data into the hands of
managers and decision-makers lets them identify issues quickly and improve
response times.
• Simplicity
It is impossible to make efficient decisions based on large amounts of raw
data.
• Easier Pattern Visualization
Some libraries are bundled with Python, while others must be downloaded and
installed separately.
For instance, you can install Matplotlib using pip as follows:
python -m pip install -U pip setuptools
python -m pip install matplotlib
Popular Libraries for Data Visualization in Python
The Python language provides numerous data visualization libraries for
plotting data.
The most widely used data visualization libraries include Pygal, Altair, VisPy,
PyQtGraph, Matplotlib, Bokeh, Seaborn, Plotly, and ggplot.
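As a quick check before plotting, the sketch below reports which of the listed libraries are available in the current environment. The import names in LIBRARIES are assumptions based on the usual package names, and installed_libraries is a hypothetical helper, not part of any library.

```python
import importlib.util

# Import names assumed for the libraries listed above.
LIBRARIES = ["pygal", "altair", "vispy", "pyqtgraph", "matplotlib",
             "bokeh", "seaborn", "plotly", "ggplot"]


def installed_libraries(names):
    """Map each import name to True if Python can locate the package."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_libraries(LIBRARIES)
for name, found in status.items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```

Any library reported as missing can then be installed with pip in the same way as Matplotlib.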
Tabular Data and Data Formats
# Dimensions
print("\nSeries Dimensions: ", s.ndim)
• size
• The pandas.Series.size attribute returns the number of
elements in the Pandas Series.
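A minimal sketch of the ndim and size attributes on a small Series. The values in s are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
s = pd.Series([10, 20, 30, 40])

# Dimensions: a Series is always one-dimensional.
print("Series Dimensions:", s.ndim)

# Size: the number of elements in the Series.
print("Series Size:", s.size)
```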
Syntax:
import pandas as pd
data = [['Ahmed', 35], ['Ali', 17], ['Omar', 25]]
DataFrame1 = pd.DataFrame(data, columns=['Name', 'Age'])
print(DataFrame1)
To retrieve the rows of a data frame from index 1 through the last row:
DataFrame1[1:]
We can create a data frame using a dictionary.
import pandas as pd
print (dataframe2)
We can select only the first two rows in a data frame.
dataframe2[:2]
We can select only the Name column of a data frame.
dataframe2['Name']
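The dictionary-based creation and the selections above can be sketched end to end as follows. The dictionary contents and the names people and frame are assumptions that reuse the Name and Age values from the earlier list example.

```python
import pandas as pd

# Each key becomes a column label; each list holds that column's values.
people = {"Name": ["Ahmed", "Ali", "Omar"], "Age": [35, 17, 25]}
frame = pd.DataFrame(people)

first_two = frame[:2]      # rows at positions 0 and 1
from_second = frame[1:]    # rows from position 1 to the end
names = frame["Name"]      # a single column, returned as a Series

print(first_two)
print(from_second)
print(names)
```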
#create a variable with integer value.
a=100
print("The type of variable having value", a, " is ", type(a))
print(thislist)
List Items
List items are ordered, changeable, and allow duplicate values.
List items are indexed: the first item has index [0], the second item has
index [1], and so on.
dictionary
Dict = {"Name": "Gayle", "Age": 25}
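A short sketch of the list and dictionary behaviour described above. The fruit names in thislist are made-up sample values.

```python
# Lists are ordered, changeable, and allow duplicate values.
thislist = ["apple", "banana", "cherry", "apple"]
print(thislist[0])        # first item, index 0
print(thislist[1])        # second item, index 1

thislist[1] = "mango"     # items can be changed in place
print(thislist)

# A dictionary stores key-value pairs.
Dict = {"Name": "Gayle", "Age": 25}
print(Dict["Name"])       # look up a value by its key
Dict["Age"] = 26          # values can be updated through the key
print(Dict)
```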
PANDAS
Python's pandas library is used for data analysis.
import pandas as pd
import is a keyword used to load packages and libraries; here the pandas
library is made available under the alias pd.
Two main data structures
1. Series
2. DataFrame
• A Series is a one-dimensional labeled array capable of
holding any data type (integers, strings, floating
point numbers, Python objects, etc.).
• We can easily convert a list, tuple, or dictionary into a Series using
the pd.Series() constructor.
• The row labels of series are called the index. A Series cannot contain
multiple columns.
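It has the following parameters:
1. data: The values to store, such as a list, tuple, or dictionary.
2. index: The labels for the values. Labels are usually unique, although pandas
does not require it. The index must be of the same length as data.
3. dtype: It refers to the data type of the Series.
4. copy: It is used for copying the data.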
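The parameters can be illustrated with a small sketch. The values, the labels, and the names marks, source, and copied are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# data, index, and dtype given explicitly.
marks = pd.Series(data=[85, 92, 78], index=["x", "y", "z"], dtype="float64")
print(marks)

# copy=True makes the Series independent of the array it was built from.
source = np.array([1, 2, 3])
copied = pd.Series(source, copy=True)
source[0] = 99            # changing the array does not affect the copy
print(copied)
```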
In Python, we are used to working with lists as such:
num = [1, 7, 2]
num = [1, 7, 2]
n = pd.Series(num)
print(n)
We can also create a Series from a dict.
If a dictionary is passed as input and the index is not specified, the
dictionary keys are used to construct the index. Current pandas versions keep
the keys in insertion order; older versions sorted them.
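A minimal sketch of building a Series from a dictionary, with and without an explicit index. The dictionary contents and the names values_by_key, s1, and s2 are hypothetical.

```python
import pandas as pd

values_by_key = {"c": 30, "a": 10, "b": 20}

# No index given: the dictionary keys become the index labels.
s1 = pd.Series(values_by_key)
print(s1)

# Index given: values are looked up by label; labels that are not
# keys of the dictionary (here "d") are filled with NaN.
s2 = pd.Series(values_by_key, index=["a", "b", "d"])
print(s2)
```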
seriesA + seriesB
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100],index = ['z', 'y', 'a', 'c', 'e'])
seriesA + seriesB
import pandas as pd
data1=[1,2,3,4,5,6]
data2=[2,3,4,5,6,7]
a=pd.Series(data1)
b=pd.Series(data2)
print(a.add(b))
seriesA.add(seriesB, fill_value=0)
seriesA - seriesB
seriesA.sub(seriesB, fill_value=1000)
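The operations above align the two Series on their index labels before computing. The sketch below reuses seriesA and seriesB and shows how fill_value changes the result for labels present in only one Series; the names plain_sum, filled_sum, and filled_diff are introduced here for illustration.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# Plain +: labels found in only one Series (b, d, y, z) give NaN.
plain_sum = seriesA + seriesB
print(plain_sum)          # a: -9, c: -47, e: 105, the rest NaN

# add with fill_value=0: a missing value is treated as 0.
filled_sum = seriesA.add(seriesB, fill_value=0)
print(filled_sum)         # b: 2, d: 4, y: 20, z: 10

# sub with fill_value=1000: a missing value is treated as 1000.
filled_diff = seriesA.sub(seriesB, fill_value=1000)
print(filled_diff)        # b: 2 - 1000 = -998, y: 1000 - 20 = 980
```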
DataFrame:
• pandas.DataFrame(data=None, index=None, columns=None,
dtype=None, copy=False)
dtype: data type of each column
• Heterogeneous data
• Size Mutable
• Data Mutable
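The three properties can be seen in a small sketch. The column names and values are assumptions that reuse the Name and Marks example from this module; the Grade column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anu", "Sia"], "Marks": [19, 25]})

# Heterogeneous data: each column has its own data type.
print(df.dtypes)

# Size mutable: columns can be added after creation.
df["Grade"] = ["B", "A"]
print(df.shape)

# Data mutable: individual values can be changed in place.
df.loc[0, "Marks"] = 21
print(df)
```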
Creating DataFrame using list
import pandas as pd
D = pd.DataFrame([[10,20], [30,40]])
print(D)
Creating DataFrame with row index and column label
import pandas as pd
data = [[10,20],[30,40]]
D = pd.DataFrame(data, columns = ['col1', 'col2'], index = ['row1', 'row2'])
print(D)
Creating DataFrame using dictionary
import pandas as pd
data = {'Name': ['Anu', 'Sia'],'Marks':[19,25]}
D = pd.DataFrame(data, index = [1,2])
Creating DataFrame from dictionary of Series
import pandas as pd
d = {'one': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd']),
     'two': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Creating DataFrame from list of dictionaries
import pandas as pd
data = [{'b': 2, 'c':3}, {'a': 10, 'b': 20, 'c': 30}]
df = pd.DataFrame(data, index =['first', 'second'])
print(df)
Writing DataFrame to csv file
import pandas as pd
data = {'Name': ['Anu', 'Sia'],'Marks':[19,25]}
D = pd.DataFrame(data, index = [1,2])
print(D)
D.to_csv(r"E:\stu.csv")  # raw string so the backslash is not read as an escape
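To confirm that the file was written correctly, the data can be read back with pd.read_csv. The sketch below writes to a temporary folder instead of the E: drive so that it runs on any system; the temporary path and the name restored are assumptions made for the example.

```python
import os
import tempfile

import pandas as pd

data = {'Name': ['Anu', 'Sia'], 'Marks': [19, 25]}
D = pd.DataFrame(data, index=[1, 2])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "stu.csv")
    D.to_csv(path)                              # the row index is written as the first column
    restored = pd.read_csv(path, index_col=0)   # use that first column as the index again

print(restored)
```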
DataFrame attributes
• index
• columns
• axes
• dtypes
• size
• shape
• ndim
• empty
• T
• values
index
A DataFrame has two kinds of index: the row index and the column index.
The index attribute displays the row labels of a data frame object.
Row labels can be integers (0, 1, 2, 3, …) or names.
Syntax: dataframe_name.index
columns
• This attribute is used to fetch the label values for columns present in a
particular data frame.
• Syntax: dataframe_name.columns
import pandas as pd
# Dataset
data = {
'Student': ["Amit", "John", "Jacob", "David", "Steve"],
'Rank': [1, 4, 3, 5, 2],
'Marks': [95, 70, 80, 60, 90]
}
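Using this dataset, the attributes listed earlier can be inspected one by one. This is a minimal sketch; the name df is an assumption, and the comments describe what each attribute returns rather than the exact printed form.

```python
import pandas as pd

data = {
    'Student': ["Amit", "John", "Jacob", "David", "Steve"],
    'Rank': [1, 4, 3, 5, 2],
    'Marks': [95, 70, 80, 60, 90]
}
df = pd.DataFrame(data)

print(df.index)     # row labels (0 to 4 by default)
print(df.columns)   # column labels
print(df.axes)      # list holding the row index and the column index
print(df.dtypes)    # data type of each column
print(df.size)      # number of elements: 5 rows x 3 columns = 15
print(df.shape)     # (rows, columns) = (5, 3)
print(df.ndim)      # number of dimensions: 2
print(df.empty)     # False, because the frame holds data
print(df.T)         # transpose: rows and columns swapped
print(df.values)    # the data as a NumPy array
```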
The pandas.DataFrame.head()