Lab Numpy Pandas Matplot

The document outlines a lab practice for a programming course focused on using Python libraries NumPy, Pandas, and Matplotlib for data analysis. It includes submission instructions, an introduction to a dataset of top Spotify tracks from 2000-2019, and eight tasks that guide students through importing data, performing statistical analysis, and visualizing correlations. Students are required to follow academic honesty guidelines and submit their work in a specified format by the deadline.

Uploaded by

Yến Lê

Lab Practice: NumPy, Pandas, and Matplotlib
BANA3020 Introduction to Programming with Python Fall 2024

Lab Practice Submission Instructions:

• This is an individual lab practice and will typically be assigned in the laboratory (computer lab). You can
use your personal computer but all quizzes and practical exams will be performed with a lab computer.
• Your program should work correctly on all inputs. If there are any specifications about how the program
should be written (or how the output should appear), those specifications should be followed.
• Your code and functions/modules should be appropriately commented. However, try to avoid making
your code overly busy (e.g., do not include a comment on every line).
• Variables and functions should have meaningful names, and code should be organized into
functions/methods where appropriate.
• Academic honesty is required in all work you submit to be graded. You should NOT copy or share your
code with other students to avoid plagiarism issues.
• Use the template provided to prepare your solutions.
• You should upload your .py file(s) to Canvas by the deadline.
• Submit a separate .py file for each Lab problem with the following naming format: Lab12_Q1.py. Note:
If you are working in Jupyter Notebook, you need to download/convert it to a Python .py file for
submission.
• Late submission of lab practice without an approved extension will incur penalties.

Lab Practice Numpy Pandas Matplotlib Page 1

Introduction to Exploring a Data Set with Python
In the lecture you were introduced to NumPy, Pandas, and Matplotlib. These make up the essential
toolkit for data analysis in Python. In this lab, you will learn how to use these tools to work with a
data set. This lab contains 8 small tasks that aim to give you a tutorial on how to use these powerful Python
libraries.
We will be working with the Top Hits Spotify from 2000-2019 data set from Kaggle. The CSV file for
the data set is provided on Canvas. The description of the data set provided on the site is as follows:

Context:
This dataset contains audio statistics of the top 2000 tracks on Spotify from 2000-2019. The data
contains about 18 columns, each describing the track and its qualities.

Columns that we will use:

• song: Name of the Track.
• duration_ms: Duration of the track in milliseconds.
• year: Release Year of the track.
• popularity: The higher the value the more popular the song is.
• danceability: Danceability describes how suitable a track is for dancing.
• energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity
and activity.
• loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across
the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality
of a sound that is the primary psychological correlate of physical strength (amplitude).
• acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0
represents high confidence the track is acoustic.
• tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical
terminology, tempo is the speed or pace of a given piece and derives directly from the average beat
duration.

Lab Practice Numpy Pandas Matplotlib Page 2

Installing Pandas
Import the pandas library using the following code:

import pandas as pd

If your Anaconda environment doesn’t have Pandas installed, please follow this guide: Installing Pandas for
Anaconda.

Task 1 - Import the data set

In this task, you need to download the songs_normalize.csv file from Canvas, place it in the same directory
(folder) as Python lab file and import it into Python with Pandas. If you want to see what the data set looks
like, you can open it in Microsoft Excel or Google Sheets.

To read a .csv (comma-separated values) file with Pandas, use the pd.read_csv(path) function. This
function takes a string representing the file path and opens the file as a Pandas DataFrame object.

The file path already given in the template notebook is ./songs_normalize.csv, which means the file is
read from the current directory. The dot (.) symbol represents the current directory.

You should store the imported data frame in a variable called dataset or df.
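
As a minimal sketch of the import step: the real file is not available here, so the snippet below reads a tiny made-up CSV from memory (the rows are hypothetical placeholders, not values from the data set). In the lab you would pass the path ./songs_normalize.csv to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for songs_normalize.csv (made-up rows, not real data).
sample_csv = io.StringIO(
    "song,duration_ms,year\n"
    "Song A,200000,2001\n"
    "Song B,250000,2010\n"
)

# In the lab: df = pd.read_csv("./songs_normalize.csv")
df = pd.read_csv(sample_csv)

print(type(df))   # a pandas DataFrame
print(df.shape)   # (number of rows, number of columns)
```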

Task 2 - Preview the data frame

Let’s see what the data set looks like when imported into Python! We can quickly preview the first 5 rows of
the data set along with the header (column names) using just 1 line of code.

Use the dataframe.head() method provided by Pandas to do this.
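
A short sketch of the preview step, using a small hypothetical data frame in place of the real data set (the song names and years are placeholders):

```python
import pandas as pd

# Hypothetical data frame with 8 rows (placeholder values, not real data).
df = pd.DataFrame({
    "song": [f"Song {i}" for i in range(8)],
    "year": [2000 + i for i in range(8)],
})

# head() returns the first 5 rows by default, together with the column names.
preview = df.head()
print(preview)
```

Passing a number, e.g. df.head(3), changes how many rows are returned.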

Task 3 - Descriptive statistics

Usually when investigating a new data set, we would like to quickly look at the basic descriptive statistics of
each variable (column) in our data set.

Pandas has a convenient built-in method for a data frame to do this called dataframe.describe(). Call
this method to see the output.
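
The call can be sketched as follows on a small hypothetical data frame (placeholder values, not real data). By default, describe() summarizes only the numeric columns, so the song column is left out of the result.

```python
import pandas as pd

# Hypothetical data frame (placeholder values, not real data).
df = pd.DataFrame({
    "song": ["Song A", "Song B", "Song C", "Song D"],
    "duration_ms": [180000, 200000, 220000, 240000],
    "popularity": [50, 60, 70, 80],
})

# One row per statistic: count, mean, std, min, quartiles, and max.
summary = df.describe()
print(summary)
```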

Task 4 - Milliseconds to seconds

From Task 1 we could see that the duration is currently stored in milliseconds, which is a bit cumbersome
for us to read. Create a new column to store the duration in seconds instead.

To retrieve a single column of a Pandas data frame (returned as a Series, the array-like column type in
Pandas), you can use syntax similar to a Python dictionary: dataframe['column_name'].

Use NumPy to calculate the new duration in seconds and in minutes. To perform element-wise operations on
arrays with NumPy, simply use the array as a term in your mathematical expression. E.g. array / 5 will
divide each element in the array by 5.

Store these 2 new arrays as 2 new columns in your dataframe by using dictionary-like syntax, with the
new column names duration_sec and duration_min for seconds and minutes, respectively. Adding a new
column is as simple as: dataframe['new_column'] = my_array.
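
A minimal sketch of the conversion, assuming a hypothetical data frame with made-up durations; only the column names duration_ms, duration_sec and duration_min come from the lab text.

```python
import pandas as pd

# Hypothetical data frame (placeholder values, not real data).
df = pd.DataFrame({
    "song": ["Song A", "Song B", "Song C"],
    "duration_ms": [180000, 240000, 210000],
})

# Convert the column to a NumPy array, then divide element-wise.
duration_ms = df["duration_ms"].to_numpy()
df["duration_sec"] = duration_ms / 1000    # milliseconds -> seconds
df["duration_min"] = duration_ms / 60000   # milliseconds -> minutes

print(df)
```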

Task 5 - Duration statistics

We also want to see some basic statistics about the duration, such as its mean (average), longest duration
(max) and shortest duration (min). Use np.mean(array), np.max(array), np.min(array) to get these
values and print them.

Next, find the range of our song durations (the difference between the longest and the
shortest duration).
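
These statistics can be sketched on a small hypothetical array of durations in seconds (the values are placeholders chosen for easy arithmetic, not real data):

```python
import numpy as np

# Hypothetical durations in seconds (placeholder values, not real data).
duration_sec = np.array([180.0, 240.0, 210.0, 300.0])

mean_duration = np.mean(duration_sec)
longest = np.max(duration_sec)
shortest = np.min(duration_sec)

# Range: difference between the longest and the shortest duration.
duration_range = longest - shortest

print("mean:", mean_duration)
print("max:", longest)
print("min:", shortest)
print("range:", duration_range)
```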

Finally, find the percentage of songs that have a duration longer than the average value. To do this you
will need to use the np.where function.
indices = np.where(condition)
This function returns a tuple containing arrays of the indices of the elements in your array that satisfy the
given condition. For a 2-D array, the tuple has 2 arrays, corresponding to the row indices and column indices.
For a 1-D array, the tuple contains only 1 array. For example:
negatives = np.where(a < 0)
returns negatives = ([...],), a tuple containing a single array that holds the indices of the elements in
a that are negative. To access this array, use negatives[0].

You should use where to find the indices of durations greater than the average. The percentage of songs that
have duration greater than average is the length of this array divided by the length of duration_sec times 100.
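
The percentage calculation can be sketched as follows, again on a hypothetical array of durations (placeholder values, not real data):

```python
import numpy as np

# Hypothetical durations in seconds (placeholder values, not real data).
duration_sec = np.array([180.0, 240.0, 210.0, 300.0])
average = np.mean(duration_sec)

# np.where returns a tuple; for a 1-D array it holds a single index array.
indices = np.where(duration_sec > average)
longer_than_average = indices[0]

percentage = len(longer_than_average) / len(duration_sec) * 100
print(f"{percentage}% of songs are longer than average")
```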

Next, find the names of the songs that have durations over the average. First, convert your column
to a NumPy array using the column.to_numpy() method. Then you can pass the result of where() directly
as an index to a NumPy array to retrieve the elements in the array that satisfy the condition. For example, if
you use:
a[np.where(a < 0)]
It will return an array of numbers in a that are smaller than zero.
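
Applied to song names, the same indexing idea might look like the sketch below. The data frame is hypothetical (placeholder names and durations); the condition is evaluated on the durations and the resulting indices are used to select from the names array.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame (placeholder values, not real data).
df = pd.DataFrame({
    "song": ["Song A", "Song B", "Song C", "Song D"],
    "duration_sec": [180.0, 240.0, 210.0, 300.0],
})

names = df["song"].to_numpy()
durations = df["duration_sec"].to_numpy()

# Indices come from the durations; they are then used to index the names.
long_songs = names[np.where(durations > np.mean(durations))]
print(long_songs)
```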

Task 6 - Pearson correlation

NumPy also provides other useful statistical tools, such as computing the correlation between two
variables. The Pearson correlation coefficient measures the linear association between variables. In NumPy,
the corrcoef(x,y) function gives a Pearson correlation matrix:

$$
\begin{bmatrix}
\mathrm{corr}(x,x) & \mathrm{corr}(x,y) \\
\mathrm{corr}(y,x) & \mathrm{corr}(y,y)
\end{bmatrix}
$$

We want the correlation between the variables x and y, so we choose either the [0,1] element or the
[1,0] element (corr(x,y) and corr(y,x) are the same). The correlation lies in the range [-1,1], with 0 meaning
no linear correlation, 1 meaning perfect positive correlation, and -1 meaning perfect negative correlation.

Refer to Page 2 for an explanation of what each variable means. In our case, let's look at the correlation
between some pairs of variables. In this task you need to write the code to calculate:

• Correlation between energy and tempo and print it.

• Correlation between energy and loudness and print it.

• Correlation between energy and acousticness and print it.
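
One of these correlations can be sketched as follows. The energy and loudness values are hypothetical placeholders, so the printed coefficient says nothing about the real data set; the point is only how to pick the off-diagonal element from the matrix returned by np.corrcoef.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame (placeholder values, not real data).
df = pd.DataFrame({
    "energy": [0.3, 0.5, 0.7, 0.9],
    "loudness": [-12.0, -8.5, -6.0, -4.0],
})

energy = df["energy"].to_numpy()
loudness = df["loudness"].to_numpy()

# 2x2 matrix; the off-diagonal elements hold corr(energy, loudness).
corr_matrix = np.corrcoef(energy, loudness)
energy_loudness_corr = corr_matrix[0, 1]

print("Correlation between energy and loudness:", energy_loudness_corr)
```

The other pairs follow the same pattern with different column names.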

Task 7 - Finding unexpected entries

Our data set is titled Top Hits Spotify 2000-2019, so we would expect the songs included to be within this
time range. However, there are songs in this data set that are outside of this range. Retrieve the list of names
of these songs. Use where to find and print the names of songs whose year value is less than 2000 and of songs
whose year value is greater than 2019.
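
A minimal sketch of this filter, on a hypothetical data frame with made-up names and years (not real data):

```python
import numpy as np
import pandas as pd

# Hypothetical data frame (placeholder values, not real data).
df = pd.DataFrame({
    "song": ["Song A", "Song B", "Song C", "Song D"],
    "year": [1999, 2005, 2020, 2019],
})

names = df["song"].to_numpy()
years = df["year"].to_numpy()

# Index the names with the positions where the year is out of range.
before_2000 = names[np.where(years < 2000)]
after_2019 = names[np.where(years > 2019)]

print("Released before 2000:", before_2000)
print("Released after 2019:", after_2019)
```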

Task 8 - Plotting correlations

Use matplotlib to visualize the correlations between our variables using scatter plots. You can use
plt.scatter(xdata,ydata) for this. You will need to write the code to plot:

• energy vs. loudness.

• energy vs. acousticness.

• energy vs. danceability.

• tempo vs. popularity.

• speechiness vs. popularity.

To do this you just need to create an array xdata = your column for x and ydata = your column for
y, and provide them as arguments for plt.scatter. Optionally, if you want to make your plot look a bit
nicer you can look into using cmap to set a colourmap, and c to assign a dimension to map the colours. For
example you can use:
plt.scatter(xdata,ydata,cmap="plasma",c=xdata)
This will assign the x dimension to the plasma colour map. Refer to this page for a list of available colourmaps:
Matplotlib Colormap.
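
One of the plots can be sketched as follows. The data values are hypothetical placeholders, and the Agg backend is selected only as an assumption so that the sketch runs without a display; in a notebook or on a lab computer you can drop that line and call plt.show() at the end. The axis labels, title and colour bar are optional additions, not requirements from the task.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data frame (placeholder values, not real data).
df = pd.DataFrame({
    "energy": [0.3, 0.5, 0.7, 0.9],
    "loudness": [-12.0, -8.5, -6.0, -4.0],
})

xdata = df["energy"]
ydata = df["loudness"]

# Colour each point by its x value using the plasma colourmap.
points = plt.scatter(xdata, ydata, cmap="plasma", c=xdata)
plt.xlabel("energy")
plt.ylabel("loudness")
plt.title("energy vs. loudness")
plt.colorbar(points, label="energy")

# plt.show() would display the figure in an interactive session.
```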
