Lab 10

This lab guide outlines objectives for using Pandas and Matplotlib for data visualization in a data analysis course. Students are instructed to create an Excel file, import its data into a Python DataFrame, and perform various data manipulations and visualizations, including sorting, filtering by music genre, and creating pie and bar charts. The guide also includes specific instructions for plotting relationships and distributions related to music data sourced from Kaggle.

Uploaded by

imamaliazizov
Copyright
© All Rights Reserved
CS125 Introduction to Data Analysis for Social Sciences

Lab Guide 10

Objectives: Pandas – data visualization with matplotlib and pandas.

1. Open Excel and create the file exactly as shown.

2. Save the file to your Desktop with the name: music.xlsx.

3. Using the data from the file music.xlsx, create a Python script (Lab10_yourname.py) that does the
following:
a. Import the data in the file into a DataFrame, songs.
b. Sort the DataFrame by danceability.
c. Add a new column to songs, duration_sec, which stores the duration of each song in seconds (hint:
the duration column stores duration in milliseconds. There are 1000 milliseconds in a second).
d. Select the records with the music_genre of Rock and store as a new DataFrame, rock.
e. Select the records with the music_genre of Hip-Hop and store as a new DataFrame, hh.
f. Select the records with the music_genre of Alternative and store as a new DataFrame,
alternative.
g. Open a new Figure1 window and create the pie and bar charts shown in the figure below.
i. The pie chart compares the total number of Hip-Hop songs, Rock songs and Alternative
songs. The colors used are purple, red, and brown.
ii. The bar charts show the average (mean) popularity of Hip-Hop and Rock compared to the
mean popularity of all music_genres.

h. Create the plots shown in Figure2 below. The plots compare ‘danceability’ vs. ‘energy’ for
Alternative and Hip-Hop music.

i. Create a histogram, in the same Figure window, that shows the distribution of the song durations
in seconds.
Hint: when it plots the histogram, the hist function calculates the number of values in each bin and
the edges of the bins. It returns these values in a tuple, where the second element in the tuple is
the array of bin edges (the lower bound of each bin, followed by the upper bound of the last bin).
You can save the return values and set the ticks on the axis using this value:
hist_data = plt.hist( … )
plt.xticks( hist_data[1] )
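
The hint above can be tried on its own before it is applied to the real data. The following is a minimal sketch, not the expected solution: the durations are made-up values rather than values from music.xlsx, bins=4 is an arbitrary choice, and the Agg backend is selected only so the sketch runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to see the window
import matplotlib.pyplot as plt

# Made-up song durations in seconds, for illustration only (not from music.xlsx).
duration_sec = [181.0, 203.5, 214.2, 229.9, 240.0, 251.3, 266.8, 298.4]

plt.figure()
# hist returns a tuple (counts, bin_edges, patches).
hist_data = plt.hist(duration_sec, bins=4, edgecolor="black")
counts = hist_data[0]      # number of values in each bin
bin_edges = hist_data[1]   # one more entry than counts: includes the last upper edge

# Place one tick on every bin edge, as the hint suggests.
plt.xticks(bin_edges)
plt.xlabel("duration (seconds)")
plt.ylabel("number of songs")
```

With 4 bins, bin_edges holds 5 values, so every bar in the plot is bounded by a labelled tick on both sides.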

Data source: Kaggle - Prediction of Music Genre
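
Steps a through f, and the numbers needed for the charts in step g, might be sketched as follows. This is only an illustration under stated assumptions: the rows are invented and stand in for the contents of music.xlsx, the column names (danceability, energy, duration, popularity, music_genre) are taken from the step descriptions, and in the real script the DataFrame would come from reading the Excel file instead of being typed in.

```python
import pandas as pd

# Stand-in for step a. In the real script this would be read from music.xlsx,
# e.g. with pd.read_excel; the rows below are invented for illustration.
songs = pd.DataFrame({
    "danceability": [0.65, 0.42, 0.81, 0.55, 0.73, 0.38],
    "energy": [0.70, 0.88, 0.60, 0.75, 0.52, 0.91],
    "duration": [215000, 187500, 242000, 201000, 230500, 264000],  # milliseconds
    "popularity": [60, 48, 72, 55, 66, 41],
    "music_genre": ["Rock", "Alternative", "Hip-Hop",
                    "Rock", "Hip-Hop", "Alternative"],
})

# Step b: sort by danceability (ascending is the default order).
songs = songs.sort_values("danceability")

# Step c: 1000 milliseconds in a second.
songs["duration_sec"] = songs["duration"] / 1000

# Steps d-f: one DataFrame per genre, selected with a boolean mask.
rock = songs[songs["music_genre"] == "Rock"]
hh = songs[songs["music_genre"] == "Hip-Hop"]
alternative = songs[songs["music_genre"] == "Alternative"]

# Numbers behind step g: song counts for the pie chart, mean popularity for the bars.
genre_counts = [len(hh), len(rock), len(alternative)]
mean_popularity = [hh["popularity"].mean(),
                   rock["popularity"].mean(),
                   songs["popularity"].mean()]
```

From here, genre_counts could be passed to plt.pie with colors=["purple", "red", "brown"] (assuming the colors follow the order in which the genres are listed), and mean_popularity to plt.bar.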
