Sessions 9 and 10: Data Visualization
BGU
Data Visualization
• Figures 2.3 and 2.4 show histograms of Bollywood movie budgets (in crores of rupees; 1 crore = 10 million) and box-office collections, respectively, based on data for 149 Bollywood movies (Data file: Bollywood Data.Xls).
Fig. 2.3. Histogram of Bollywood movie budget
Fig. 2.4. Histogram of Bollywood movie box-office collection
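The frequencies behind a histogram such as Figure 2.3 are simply counts of observations per bin, and the shape of the distribution can also be summarised by a single number, the moment coefficient of skewness (positive for a right-skewed distribution). The sketch below is a minimal illustration, assuming equal-width bins of 25 crores and a handful of hypothetical budget values rather than the actual contents of Bollywood Data.Xls; the function names are illustrative only.

```python
def histogram_counts(values, bin_width, upper):
    """Count non-negative values in equal-width bins [0, w), [w, 2w), ... up to `upper`."""
    n_bins = int(upper // bin_width)
    counts = [0] * n_bins
    for v in values:
        idx = int(v // bin_width)
        # Values at or above `upper` are placed in the last bin.
        counts[min(idx, n_bins - 1)] += 1
    return counts


def moment_skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5; a value > 0 indicates right skew."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical budgets in crores of rupees (illustrative, not the real data set)
budgets = [12, 18, 25, 30, 35, 40, 45, 60, 85, 150]
for i, c in enumerate(histogram_counts(budgets, bin_width=25, upper=150)):
    print(f"{i * 25:>3}-{(i + 1) * 25:<3} crores | {'#' * c}")
print("skewness:", round(moment_skewness(budgets), 2))
```

With these values most observations fall in the lower bins and a single large budget stretches the right tail, so the skewness coefficient comes out positive.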
• From Figure 2.3, we can infer that the budget for a large proportion of movies is less than 50 crores and that the distribution is right-skewed (that is, it has a long tail on the right-hand side).
• Figure 2.4 also shows an outlier: a box-office collection of more than 700 crores (the movie PK, starring Aamir Khan and directed by Rajkumar Hirani).
• Cumulative histograms are called ogive curves.
• The ogive curve for Bollywood box-office collection is shown in Figure 2.5.
Fig. 2.5. Ogive curve for box-office collection
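An ogive plots each bin's upper edge against the running total of frequencies up to that edge. The sketch below is a minimal illustration, assuming hypothetical bin edges of width 100 crores and made-up counts; `ogive_points` is an illustrative helper, not part of any library.

```python
from itertools import accumulate


def ogive_points(bin_edges, counts):
    """Return (edge, cumulative count) pairs, starting at (first edge, 0)."""
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("need exactly one more bin edge than counts")
    # Running total of frequencies; the leading 0 anchors the curve at the first edge.
    cumulative = [0] + list(accumulate(counts))
    return list(zip(bin_edges, cumulative))


edges = [0, 100, 200, 300, 400]   # hypothetical bin edges in crores
counts = [6, 3, 2, 1]             # hypothetical frequencies per bin
points = ogive_points(edges, counts)
print(points)  # → [(0, 0), (100, 6), (200, 9), (300, 11), (400, 12)]
```

Dividing each cumulative count by the total (here 12) gives the cumulative percentage form of the ogive.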
• A pie chart is a circular chart, used mainly for categorical data, that displays the proportion of each category in the data set.
• The pie chart for movie genre, based on the Bollywood movie data set, is shown in Figure 2.8.
• A pie chart represents the proportion (percentage) of each category as a sector of the circle.
Fig. 2.8. Pie chart for movie genre.
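Each sector's angle is its category's proportion multiplied by 360 degrees. The sketch below is a minimal illustration using a short list of hypothetical genre labels (not the actual Bollywood data set); `pie_sectors` is an illustrative helper.

```python
from collections import Counter


def pie_sectors(labels):
    """Map each category to (proportion, sector angle in degrees)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (c / total, 360 * c / total) for cat, c in counts.items()}


# Hypothetical genre labels, for illustration only
genres = ["Drama", "Comedy", "Drama", "Thriller", "Drama", "Comedy", "Action", "Drama"]
for genre, (share, angle) in pie_sectors(genres).items():
    print(f"{genre:<9} {share:6.1%} {angle:6.1f} deg")
```

Here Drama accounts for 4 of 8 movies, so it receives half the circle (180 degrees), and the sector angles always sum to 360 degrees.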
Scatter Plot
• A scatter plot plots two variables against each other and helps data scientists see whether there is any relationship between them.
• The relationship could be linear or non-linear.
• A scatter plot is also useful for assessing the strength of the relationship and for spotting outliers in the data.
• Figure 2.9 shows a scatter plot between movie budget and box-office collection (in crores of rupees), plotted using the data set in file Bollywood Data.xlsx.
Fig. 2.9. Scatter plot between movie budget and box-office collection.
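The visual impression of strength from a scatter plot is often complemented by the Pearson correlation coefficient r, which ranges from -1 to +1 and measures only the strength of a linear relationship (a rank-based measure such as Spearman's correlation is a common alternative for monotonic non-linear patterns). The sketch below is a minimal illustration with hypothetical (budget, box-office) pairs, not values from Bollywood Data.xlsx.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient: strength of a linear relationship."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical budgets and box-office collections in crores, for illustration only
budget = [10, 20, 30, 40, 50]
box_office = [15, 35, 50, 70, 120]
print(round(pearson_r(budget, box_office), 3))
```

A value close to +1, as here, indicates a strong positive linear association; a single extreme point, like an outlier visible in a scatter plot, can shift r considerably.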
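Box Plot (or Box and Whisker Plot)
• A box plot is constructed from the quartiles and the minimum and maximum values: the box spans the interquartile range (IQR) from the first quartile Q1 to the third quartile Q3, with a line at the median, and the whiskers extend to the minimum and maximum.
• The box plot for the data in Table 2.4 is shown in Figure 2.11.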
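The five values a min/max box plot is drawn from (minimum, Q1, median, Q3, maximum) can be computed directly. The sketch below is a minimal illustration on a hypothetical sample, not the Table 2.4 data, and it assumes the quartile convention of Python's `statistics.quantiles` (default method "exclusive"); textbooks and spreadsheet tools may compute Q1 and Q3 slightly differently. It also includes Tukey's 1.5 × IQR fences, a widely used rule that this section does not state, for flagging outliers such as the 700-crore box-office collection mentioned earlier. The helper names are illustrative.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a min/max box plot."""
    # statistics.quantiles uses method='exclusive' by default; other tools may differ.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def tukey_outliers(values, k=1.5):
    """Values outside the fences [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical sample
print(five_number_summary(data))  # → (3, 6.0, 12.0, 16.0, 21)
print(tukey_outliers([20, 35, 40, 55, 60, 75, 90, 110, 130, 720]))  # → [720]
```

Passing `method="inclusive"` to `statistics.quantiles` gives the alternative convention if results need to match another tool.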