DAV Project

Uploaded by

sahabaheer860


COMPUTER SCIENCE 1

DATA ANALYSIS AND

VISUALIZATION

Deen Dayal Upadhyaya College

(University of Delhi)
Sector-3, Dwarka · New Delhi-110078

Submitted to: Submitted By:

Mr. Raj Kumar Sharma Peeyush Verma
Mrs. Megha Bansal Kunal Sharma
Professor Yash Kumar
Roll no.-22HCS4178
B.SC CS(H)


Spotify-2023 Analysis
Kaggle

Clean the given Data

import pandas as pd

# Load the dataset

df_spotify = pd.read_csv(r'C:\Users\preml\Desktop\3rd sem\DAV\spotify-2023.csv', encoding='ISO-8859-1')

# Check for missing values

missing_values = df_spotify.isnull().sum()

# Check for duplicate rows

duplicate_rows = df_spotify.duplicated().sum()

# Output the findings

print('Missing values in each column:\n', missing_values)
print('\nNumber of duplicate rows:', duplicate_rows)

Missing values in each column:

track_name               0
artist(s)_name           0
artist_count             0
released_year            0
released_month           0
released_day             0
in_spotify_playlists     0
in_spotify_charts        0
streams                  0
in_apple_playlists       0
in_apple_charts          0
in_deezer_playlists      0
in_deezer_charts         0
in_shazam_charts        50
bpm                      0
key                     95
mode                     0
danceability_%           0
valence_%                0
energy_%                 0
acousticness_%           0
instrumentalness_%       0
liveness_%               0
speechiness_%            0
dtype: int64

Number of duplicate rows: 0

Result:

The in_shazam_charts column has 50 missing values and the key column has 95; every other column
is complete, and there are no duplicate rows in the dataset. These two columns need to be handled
before any analysis that relies on them.
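
The output above reports missing values in in_shazam_charts (50) and key (95). One possible way to handle them is sketched below. The small frame is made-up sample data standing in for the real file, and both fill rules (labelling a missing key as "Unknown", treating a missing Shazam chart value as 0) are assumptions for illustration, not choices made in this project.

```python
import pandas as pd

# Made-up sample standing in for the real dataset (values are illustrative only).
sample = pd.DataFrame({
    "track_name": ["A", "B", "C", "D"],
    "key": ["C#", None, "F", None],
    "in_shazam_charts": [12, None, 0, 3],
})

cleaned = sample.copy()

# Assumption: a missing key is labelled "Unknown" instead of dropping the row.
cleaned["key"] = cleaned["key"].fillna("Unknown")

# Assumption: a missing Shazam chart value means the track did not chart (0).
cleaned["in_shazam_charts"] = cleaned["in_shazam_charts"].fillna(0).astype(int)

# Count the missing values that are left after cleaning.
remaining_missing = int(cleaned.isnull().sum().sum())
print("Missing values after cleaning:", remaining_missing)
```

Filling keeps every row available for the later plots; dropping the rows with dropna() would be the alternative if the filled values could distort an analysis.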

1.0.1 Visualization for artist name and their number of tracks

[13]: import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset

spotify_data = pd.read_csv(r'C:\Users\preml\Desktop\3rd sem\DAV\spotify-2023.csv', encoding='ISO-8859-1')

# Visualization for artist name and their number of tracks:
# count unique track names per artist entry and keep the 10 largest

artist_tracks = spotify_data.groupby('artist(s)_name')['track_name'].nunique().sort_values(ascending=False).head(10)

plt.figure(figsize=(14, 7))
sns.barplot(x=artist_tracks.values, y=artist_tracks.index, hue=artist_tracks.index, palette='muted')

plt.title('Top 10 Artists with the Most Tracks')

plt.xlabel('Number of Tracks')
plt.ylabel('Artist(s) Name')
plt.show()


Result:

The bar chart shows the 10 artist entries with the most unique tracks in the dataset. Because the
grouping key is the full artist(s)_name string, a collaboration is counted as its own entry rather
than being credited to each participating artist. The number of tracks indicates an artist's
productivity only; other factors, such as streaming numbers or listener engagement, are needed to
gauge an artist's overall impact and popularity.
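
The ranking groups on the full artist(s)_name string, so a collaboration forms its own group. A minimal sketch of counting tracks per individual artist follows. It assumes collaborators are separated by commas in that column, and the sample rows and artist names are made up for illustration.

```python
import pandas as pd

# Made-up sample; "Artist A" and "Artist B" are placeholder names.
sample = pd.DataFrame({
    "track_name": ["T1", "T2", "T3", "T4"],
    "artist(s)_name": ["Artist A", "Artist A, Artist B", "Artist B", "Artist A"],
})

# Assumption: collaborators are separated by commas in 'artist(s)_name'.
per_artist = (
    sample.assign(artist=sample["artist(s)_name"].str.split(","))
    .explode("artist")                                     # one row per (track, artist)
    .assign(artist=lambda d: d["artist"].str.strip())      # remove spaces around names
    .groupby("artist")["track_name"]
    .nunique()
    .sort_values(ascending=False)
)
print(per_artist)
```

With this approach a track with two credited artists counts once for each of them, which changes the ranking whenever an artist appears mostly in collaborations.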

1.0.2 The histogram for the distribution of ‘danceability’

[14]: # the histogram for the distribution of 'danceability'

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset

df_spotify = pd.read_csv(r'C:\Users\preml\Desktop\3rd sem\DAV\spotify-2023.csv', encoding='ISO-8859-1')

# Create a histogram for the distribution of 'danceability_%' across the tracks

plt.figure(figsize=(12, 6))
sns.histplot(df_spotify['danceability_%'], bins=30, kde=True, color='skyblue')
plt.title('Distribution of Danceability Percentage')
plt.xlabel('Danceability (%)')
plt.ylabel('Frequency')
plt.show()


Result

This histogram shows the distribution of danceability_% across all tracks, with a kernel density
estimate (KDE) overlaid as a smooth curve. It gives a quantitative overview of how danceable the
music in the dataset tends to be and shows how widely danceability varies between tracks. This
analysis can serve as a foundation for further exploration into the relationship between
danceability and other musical attributes or listener preferences.
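
To put numbers behind what the histogram shows, the same column can be binned and summarised directly. The sketch below uses a short made-up series in place of the real danceability_% column, and the bin edges are an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up danceability percentages standing in for the real column.
danceability = pd.Series(
    [23, 45, 51, 58, 62, 67, 70, 71, 77, 80, 85, 92], name="danceability_%"
)

# Hypothetical bin edges; each interval is open on the left, closed on the right.
bin_edges = [20, 40, 60, 80, 100]
bin_counts = pd.cut(danceability, bins=bin_edges).value_counts().sort_index()

# Central tendency of the same values.
median_danceability = danceability.median()
mean_danceability = danceability.mean()

print(bin_counts)
print("median:", median_danceability, "mean:", round(mean_danceability, 2))
```

Comparing the mean with the median is a quick check for skew: when the two differ noticeably, the distribution has a longer tail on one side.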

1.0.3 Distribution of energy percentages for the tracks

[16]: import pandas as pd
import matplotlib.pyplot as plt

# Load the data

spotify_data = pd.read_csv(r'C:\Users\preml\Desktop\3rd sem\DAV\spotify-2023.csv', encoding='ISO-8859-1')

# Plotting the distribution of the 'energy_%' column

plt.figure(figsize=(10, 6))
plt.hist(spotify_data['energy_%'], bins=30, color='lightgreen', alpha=0.7)
plt.title('Distribution of Energy')
plt.xlabel('Energy (%)')
plt.ylabel('Frequency')
plt.show()


Result

The histogram provides a quantitative overview of the energy levels present in the music tracks.
This analysis can offer insights into the overall vibe or intensity of the music dataset, aiding in
further exploration or comparison with other musical attributes.

[5]: import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset

df_spotify = pd.read_csv(r'C:\Users\preml\Desktop\3rd sem\DAV\spotify-2023.csv', encoding='ISO-8859-1')

# Average danceability of the tracks released in each year

avg_danceability_per_year = df_spotify.groupby('released_year')['danceability_%'].mean()

# Let's visualize the average danceability per year

plt.figure(figsize=(10, 6))
avg_danceability_per_year.plot(kind='line', marker='o', color='orange')
plt.title('Average Danceability per Year')
plt.xlabel('Year')
plt.ylabel('Average Danceability')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()


Result

In summary, the visualization offers a chronological perspective on how average danceability has
evolved over the years. This analysis can be instrumental for music analysts, researchers, or
enthusiasts aiming to understand temporal patterns in musical attributes and their potential
correlations with broader cultural or industry shifts.
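
A yearly average is only as reliable as the number of tracks behind it, so it can help to compute the track count next to the mean. The sketch below uses made-up rows, and the min_tracks threshold is a hypothetical value chosen for illustration.

```python
import pandas as pd

# Made-up sample standing in for the real dataset.
sample = pd.DataFrame({
    "released_year": [1999, 2021, 2021, 2022, 2022, 2022],
    "danceability_%": [40, 60, 70, 50, 65, 80],
})

# Mean danceability and number of tracks for each release year.
per_year = sample.groupby("released_year")["danceability_%"].agg(["mean", "count"])

# Hypothetical threshold: ignore years represented by fewer than 2 tracks.
min_tracks = 2
reliable_years = per_year[per_year["count"] >= min_tracks]

print(per_year)
print(reliable_years)
```

If a release year is represented by a single track, its "average" is just that track's value, and plotting it alongside well-populated years can exaggerate the apparent trend.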

The line chart below shows the total streams of the tracks released in each year, giving a visual
representation of how streaming volume varies with release year.
[6]: import pandas as pd

df_spotify['streams'] = pd.to_numeric(df_spotify['streams'], errors='coerce')

# Now let's try to plot the total streams per year again
import matplotlib.pyplot as plt

# Group by 'released_year' and sum the streams

streams_per_year = df_spotify.groupby('released_year')['streams'].sum()

plt.figure(figsize=(14, 7))
streams_per_year.plot(kind='line', marker='o', color='purple')
plt.title('Total Streams per Year')
plt.xlabel('Year')
plt.ylabel('Total Streams')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()


Result

The chart shows the total stream count of the tracks released in each year, including both the
general trend and individual outlier years. Because streams are summed by release year rather than
by the year in which they were played, the chart describes which release years account for the
most streams, not how listening volume changed over time. With that caveat, such insights can help
stakeholders in the music industry understand consumption patterns and make informed decisions
related to content promotion, artist collaborations, and platform strategies.
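
The conversion with errors='coerce' silently turns any value that is not a number into NaN, and sum() then skips it. A minimal sketch of checking how many values are affected before trusting the totals is shown below; the series is made-up sample data, not values from the real streams column.

```python
import pandas as pd

# Made-up raw values; one entry is deliberately not a number.
raw_streams = pd.Series(["100", "250", "bad value", "50"], name="streams")

# Same conversion as in the cell above: unparseable entries become NaN.
numeric_streams = pd.to_numeric(raw_streams, errors="coerce")

# Entries that were present in the raw data but could not be parsed.
unparsed = raw_streams[numeric_streams.isna() & raw_streams.notna()]

# sum() ignores NaN, so the unparsed entry contributes nothing to the total.
total_streams = numeric_streams.sum()

print("Unparsed entries:", unparsed.tolist())
print("Total streams:", total_streams)
```

Listing the unparsed entries makes it possible to decide whether they should be corrected, dropped, or left out of the yearly totals.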

1.0.4 Distribution of music releases over the years

This visualization helps to understand the distribution of music releases over the years included
in the dataset.

[15]: # Plotting the distribution of the 'released_year' column

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
spotify_data = pd.read_csv(r'C:\Users\preml\Desktop\3rd sem\DAV\spotify-2023.csv', encoding='ISO-8859-1')

plt.figure(figsize=(10, 6))
# Count the tracks per release year and plot the years in chronological order
spotify_data['released_year'].value_counts().sort_index().plot(kind='bar', color='skyblue')

plt.title('Number of Tracks Released Each Year')

plt.xlabel('Year')
plt.ylabel('Count')
plt.show()


Result

[7]: import seaborn as sns

import pandas as pd
import matplotlib.pyplot as plt

# Set up the design of the plots

sns.set(style='whitegrid')

# Plotting the relationship between danceability, valence, and energy

plt.figure(figsize=(10, 6))
sns.scatterplot(data=spotify_data,
                x='danceability_%', y='valence_%',
                size='energy_%', hue='energy_%',
                palette='coolwarm', sizes=(20, 200))
plt.title('Relationship between Danceability, Valence, and Energy')
plt.xlabel('Danceability (%)')
plt.ylabel('Valence (%)')
plt.legend(title='Energy (%)', loc='upper right')
plt.grid(True)
plt.show()

Result:

This scatter plot shows how danceability and valence relate to each other across the songs in the
dataset. The size and color of each point encode the track's energy level, adding a third
dimension to the view of the music characteristics.
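
A scatter plot shows the relationship visually; a correlation matrix puts a number on it. The sketch below computes Pearson correlations for the three attributes on a small made-up frame, so the resulting values say nothing about the real dataset.

```python
import pandas as pd

# Made-up sample with the same column names as the dataset.
sample = pd.DataFrame({
    "danceability_%": [40, 55, 60, 70, 85],
    "valence_%": [30, 50, 45, 65, 80],
    "energy_%": [70, 60, 80, 65, 75],
})

# Pairwise Pearson correlation between the three attributes.
correlations = sample.corr(method="pearson")

print(correlations.round(2))
```

Values close to 1 or -1 indicate a strong linear relationship and values near 0 a weak one. Pearson correlation only captures linear association, so it complements the scatter plot rather than replacing it.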
