The document discusses different data visualization techniques in Python including scatter plots, histograms, and bar plots. It covers the basics of each plot, how and when to use them, and provides code examples to generate each type of plot using Matplotlib.
Part I
In this lecture
We will learn how to create basic plots using the matplotlib library
• Scatter plot
• Histogram
• Bar plot
Python for Data Science 2
Data Visualization
• Data visualization allows us to quickly interpret the data and adjust different variables to see their effect
• Technology is increasingly making it easier for us to do so
Why visualize data?
o Observe the patterns
o Identify extreme values that could be anomalies
o Easy interpretation
Popular plotting libraries in Python
Python offers multiple graphing libraries that offer diverse features
• matplotlib
o to create 2D graphs and plots
• pandas visualization
o easy to use interface, built on Matplotlib
• seaborn
o provides a high-level interface for drawing attractive and informative statistical graphics
• ggplot
o based on R’s ggplot2, uses Grammar of Graphics
• plotly
o can create interactive plots
Matplotlib
• Matplotlib is a 2D plotting library which produces good quality figures
• Although it has its origins in emulating the MATLAB graphics commands, it is independent of MATLAB
• It makes heavy use of NumPy and other extension code to provide good performance even for large arrays
Scatter plot
What is a scatter plot?
• A scatter plot is a set of points that represents the values obtained for two different variables, plotted on horizontal and vertical axes
When to use scatter plots?
• Scatter plots are used to convey the relationship between two numerical variables
• Scatter plots are sometimes called correlation plots because they show how two variables are correlated
Importing data into Spyder
Importing necessary libraries
‘pandas’ library to work with dataframes
‘numpy’ library to do numerical operations
‘matplotlib’ library to do visualization
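The three imports listed above might look like the following. The aliases pd, np and plt are the common community conventions and are an assumption here, since the original slide's code is not available.

```python
import pandas as pd              # 'pandas' library to work with dataframes
import numpy as np               # 'numpy' library to do numerical operations
import matplotlib.pyplot as plt  # 'matplotlib' library to do visualization
```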
Importing data into Spyder
Importing data
Removing missing values from the dataframe
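A minimal sketch of these two steps follows. The slides do not name the data file or its columns, so the in-memory CSV, the column names (Price, Age, KM, FuelType), the "??" missing-value marker and the variable name cars_data are all illustrative assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the lecture's CSV file: the real file name and
# columns are not given in the slides. "??" and the empty field are missing values.
csv_text = """Price,Age,KM,FuelType
13500,23,46986,Diesel
13750,,72937,Diesel
13950,24,??,Petrol
14950,26,48000,Petrol
12950,32,61000,CNG
"""

# Importing data: with a real file this would be pd.read_csv("<path to file>", ...)
cars_data = pd.read_csv(io.StringIO(csv_text), na_values=["??"])

# Removing missing values: drop every row that has at least one missing value
cars_data.dropna(axis=0, inplace=True)

print(cars_data.shape)
```

Dropping rows is the simplest way to handle missing values; it is reasonable when only a small share of the rows is affected.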
Scatter plot
x and y: the values plotted on the horizontal and vertical axes
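A scatter plot of price against age could be sketched as below. The data values, the column names Age and Price, and the colour are made up for illustration; only plt.scatter(x, y) itself is implied by the slide.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only (not the lecture's dataset)
cars_data = pd.DataFrame({
    "Age":   [10, 20, 30, 40, 50, 60, 70, 80],
    "Price": [21000, 18500, 16000, 13500, 11500, 10000, 9000, 8000],
})

# x = age of the car, y = price of the car
points = plt.scatter(cars_data["Age"], cars_data["Price"], c="red")
plt.title("Scatter plot of Price vs Age of the cars")
plt.xlabel("Age")
plt.ylabel("Price")

# In Spyder or a notebook, call plt.show() here to display the figure
```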
Scatter plot
The price of the car decreases as the age of the car increases
Histogram
What is a histogram?
• It is a graphical representation of data using bars of different heights
• A histogram groups numbers into ranges, and the height of each bar depicts the frequency of each range or bin
When to use histograms?
• To represent the frequency distribution of numerical variables
Histogram
Histogram with default arguments
x: the values to be grouped into bins
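A histogram with default arguments might be drawn as follows. The KM values and the column name are invented for illustration; by default Matplotlib groups the data into 10 equal-width bins.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up kilometre readings for illustration only
cars_data = pd.DataFrame({
    "KM": [12000, 35000, 48000, 52000, 58000, 61000, 64000, 70000,
           73000, 79000, 85000, 91000, 98000, 120000, 160000, 210000],
})

# plt.hist returns the bar heights, the bin edges and the drawn bars
counts, bin_edges, patches = plt.hist(cars_data["KM"])
plt.title("Histogram of Kilometre")
plt.xlabel("Kilometre")
plt.ylabel("Frequency")

# In Spyder or a notebook, call plt.show() here to display the figure
```

Passing the optional bins argument to plt.hist changes how many ranges the values are grouped into.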
Histogram
The frequency distribution of kilometres shows that most of the cars have travelled between 50000 and 100000 km, and only a few cars have travelled a greater distance
Bar plot
What is a bar plot?
• A bar plot presents categorical data with rectangular bars whose lengths are proportional to the counts that they represent
When to use a bar plot?
• To represent the frequency distribution of categorical variables
• A bar diagram makes it easy to compare sets of data between different groups
Bar plot
x: the x coordinates of the bars
height: the height of the bars
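One way to build such a bar plot is to count the categories first and pass the positions and counts to plt.bar. The fuel-type values, the column name FuelType and the colours below are illustrative assumptions, not the lecture's data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up categorical data for illustration only
cars_data = pd.DataFrame({
    "FuelType": ["Petrol"] * 6 + ["Diesel"] * 3 + ["CNG"] * 1,
})

# Count how many cars fall into each category (sorted from most to least frequent)
fuel_counts = cars_data["FuelType"].value_counts()

# x: one position per category; height: the count for that category
index = np.arange(len(fuel_counts))
bars = plt.bar(index, fuel_counts.values, color=["red", "blue", "cyan"])
plt.title("Bar plot of fuel types")
plt.xlabel("Fuel types")
plt.ylabel("Frequency")

# In Spyder or a notebook, call plt.show() here to display the figure
```

With numeric positions on the x axis, the ticks show 0, 1, 2 rather than the category names.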
Bar plot
Frequency distribution of fuel type
Bar plot
x: the x coordinates of the bars
height: the height of the bars
Set the labels of the xticks
Set the location of the xticks
Bar plot
Bar plot of fuel type shows that most of the cars have petrol as fuel type
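The labelled version of the bar plot can be sketched with plt.xticks, which takes the tick locations followed by the tick labels. The counts, category names, colours and rotation below are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up counts per fuel type for illustration only
fuel_types = ["Petrol", "Diesel", "CNG"]
counts = [6, 3, 1]

# x: one position per category; height: the count for that category
index = np.arange(len(fuel_types))
bars = plt.bar(index, counts, color=["red", "blue", "cyan"])
plt.title("Bar plot of fuel types")
plt.xlabel("Fuel types")
plt.ylabel("Frequency")

# First argument: location of the xticks; second argument: labels of the xticks
plt.xticks(index, fuel_types, rotation=90)

# In Spyder or a notebook, call plt.show() here to display the figure
```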
Summary
We have learnt how to create basic plots using the matplotlib library