Document From Gr7

This document summarizes an exploratory data analysis project on a YouTube dislike dataset: 1. The author imports libraries and reads in a YouTube dislike dataset, retrieving the top and bottom 5 records. 2. An analysis of the dataset is conducted, looking at variables like video title, channel information, publish date, views, likes, dislikes, comments, tags and video descriptions. 3. Visualizations are created of the data using libraries like matplotlib and seaborn to analyze trends and relationships in the YouTube video metrics.

Uploaded by

Gnaneshwar Rao
Python Project 03 - Exploratory Data Analysis on YouTube data

Submitted by: Sriram T.S

1. Import required libraries and read the provided dataset (youtube_dislike_dataset.csv) and retrieve top 5 and bottom 5 records.

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:

df = pd.read_csv('youtube_dislike_dataset.csv')

In [3]:

df.head()
Out[3]:

   video_id     title                                       channel_id                channel_title
0  --0bCF-iK2E  Jadon Sancho Magical Skills & Goals         UC6UL29enLNe4mqwTfAyeNuw  Bundesliga
1  --14w5SOEUs  Migos - Avalanche (Official Video)          UCGIelM2Dj3zza3xyV3pL3WQ  MigosVEVO
2  --40TEbZ9Is  Supporting Actress in a Comedy: 73rd Emmys  UClBKH8yZRcM4AsRjDVEdjMg  Television Academy
3  --4tfbSyYDE  JO1'YOUNG (JO1 ver.)' PERFORMANCE VIDEO     UCsmXiDP8S40uBeJYxvyulmA  JO1
4  --DKkzWVh-E  Why Retaining Walls Collapse                UCMOqf8ab-42UUQIdVoKwjlQ  Practical Engineering

   published_at         view_count  likes   dislikes  comment_count
0  2021-07-01 10:00:00  1048888     19515   226       1319
1  2021-06-10 16:00:00  15352638    359277  7479      18729
2  2021-09-20 01:03:32  925281      11212   401       831
3  2021-03-03 10:00:17  2641597     39131   441       3745
4  2021-12-07 13:00:00  715724      32887   367       1067

   tags                                                description
0  football soccer ftbol alemn Bundesliga season ...   Enjoy the best skills and goals from Jadon San...
1  Migos Avalanche Quality Control Music/Motown R...   Watch the official video for Migos - "Aval...
2                                                      Hannah Waddingham wins the Emmy for Supporting...
3  PRODUCE101JAPAN JO1 TheSTAR STA...                  JO1'YOUNG (JO1 ver.)' PERFORMANCE VIDEO\n\n---...
4  retaining wall New Jersey highway Direct Conne...   One of the most important (and innocuous) part...

   comments
0  Respect to Dortmund fans, must be sad losing hi...
1  Migos just makes me want to live my live to th...
2  Hannah's energy bursts through any screen. Wel...
3  youngVer>< REN is really PERFECT. It's not ju...
4  Keep up with all my projects here: https://pr...

In [4]:

df.tail()

Out[4]:

       video_id     title                                               channel_id                channel_title
37417  zzd4ydafGR0  Lil Tjay - Calling My Phone (feat. 6LACK) [Off...   UCEB4a5o_6KfjxHwNMnmj54Q  Lil Tjay
37418  zziBybeSAtw  PELICANS at LAKERS | FULL GAME HIGHLIGHTS | Ja...   UCWJ2lWNubArHWmf3FIHbfcQ  NBA
37419  zzk09ESX7e0  [MV] (MAMAMOO) - Where Are We Now                   UCuhAUMLzJxlP1W7mEk0_6lA  MAMAMOO
37420  zzmQEb0Em5I  FELLIPE ESCUDERO- Master Podcast #12                UC8NjnNWMsRqq11NYvHAQb1g  Master Podcast
37421  zzxPZwaA-8w  Gareth Bale brace secures dramatic come back on...  UCEg25rdRZXg32iwai6N6l0w  Tottenham Hotspur

       published_at         view_count  likes    dislikes  comment_count
37417  2021-02-12 05:03:49  120408275   2180780  35871     81360
37418  2021-01-16 05:39:05  2841917     20759    1049      2624
37419  2021-06-02 09:00:10  13346678    720854   4426      90616
37420  2020-10-20 20:59:30  252057      19198    1234      1471
37421  2021-05-23 21:00:31  2252090     34063    868       2004

       tags                                                description
37417  Lil Tjay Steady Calling My Phone Calling My Ph...   Official video for "Calling My Phone" by Lil T...
37418  NBA G League Basketball game-0022000187 Lakers...   PELICANS at LAKERS | FULL GAME HIGHLIGHTS | Ja...
37419  MAMAMOO WAW MAMAMOO WAW Where Are We Now...         [MV] (MAMAMOO) - Where Are We Now\n\nInstagra...
37420  master masterpodcast lord lord vinheteiro z z ...   DOCTOR HAIR\nhttps://www.thedoctorhair.com/?fb...
37421  Spurs Tottenham Hotspur Tottenham Leicester ...     Two minute highlights from Tottenham Hotspur's...

       comments
37417  'DESTINED 2 WIN' OUT NOW !! https://liltjay.ln...
37418  Montrezl Harrell is going crazy with the rebou...
37419  I honestly do not know why this song hit so ha...
37420  Foi um prazer passar esta tarde com vocs debat...
37421  Thank you Kane for everything you have given t...

2. Check the info of the dataframe and write your inferences on data types and shape of the dataset.

In [8]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37422 entries, 0 to 37421
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 video_id 37422 non-null object
1 title 37422 non-null object
2 channel_id 37422 non-null object
3 channel_title 37422 non-null object
4 published_at 37422 non-null object
5 view_count 37422 non-null int64
6 likes 37422 non-null int64
7 dislikes 37422 non-null int64
8 comment_count 37422 non-null int64
9 tags 37422 non-null object
10 description 37422 non-null object
11 comments 37264 non-null object
dtypes: int64(4), object(8)
memory usage: 3.4+ MB

In [10]:

df.shape

Out[10]:
(37422, 12)

Based on info and shape, the following inferences can be made:

1. There are 12 attributes in total: 8 are of object datatype and 4 are of int64 type.
2. The comments column is the only column with null values; it has 158 such rows (37422 - 37264).
3. The total number of rows is 37422 and the total number of columns is 12.
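
The inferences above can also be derived programmatically instead of being read off the printed output. The following is a minimal sketch on a small made-up frame (`toy` is hypothetical and not part of the dataset); the same calls should work unchanged on df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset, for illustration only.
toy = pd.DataFrame({
    "video_id": ["a1", "b2", "c3", "d4"],
    "view_count": [100, 250, 40, 900],
    "comments": ["nice", np.nan, "great video", np.nan],
})

n_rows, n_cols = toy.shape                      # number of rows and columns
dtype_counts = toy.dtypes.value_counts()        # how many columns share each dtype
null_counts = toy.isna().sum()                  # null values per column
cols_with_nulls = null_counts[null_counts > 0]  # keep only columns that have nulls

print(n_rows, n_cols)
print(dtype_counts)
print(cols_with_nulls)
```

Filtering `null_counts` this way makes the "only column with null values" claim checkable at a glance, even on a frame with many columns.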

3. Check for the percentage of the missing values and drop or impute them.

In [16]:

(df.isnull().sum()/len(df))*100

Out[16]:

video_id 0.000000
title 0.000000
channel_id 0.000000
channel_title 0.000000
published_at 0.000000
view_count 0.000000
likes 0.000000
dislikes 0.000000
comment_count 0.000000
tags 0.000000
description 0.000000
comments 0.422212
dtype: float64

Since only about 0.42% of the values in the comments column are null, these rows can be imputed rather than dropped. Since this is a non-numerical column, I have chosen to impute it with a custom string.

In [21]:

df.comments = df.comments.fillna("This video has no comments")

Now checking to see whether the null values have been imputed:

In [25]:

df.isnull().sum()

Out[25]:

video_id 0
title 0
channel_id 0
channel_title 0
published_at 0
view_count 0
likes 0
dislikes 0
comment_count 0
tags 0
description 0
comments 0
dtype: int64
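
For comparison, here is a minimal sketch of the two options named in the task, imputing and dropping, on a made-up frame (`toy` is hypothetical). It also shows that `isna().mean()` gives the same percentage as dividing the null count by the length.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset, for illustration only.
toy = pd.DataFrame({
    "video_id": ["a1", "b2", "c3", "d4", "e5"],
    "comments": ["nice", np.nan, "great video", "first", "lol"],
})

# Fraction of nulls per column, expressed as a percentage.
missing_pct = toy.isna().mean() * 100

# Option A: impute with a placeholder string (the approach taken above).
imputed = toy.assign(comments=toy["comments"].fillna("This video has no comments"))

# Option B: drop the rows where comments is null.
dropped = toy.dropna(subset=["comments"])

print(missing_pct)
print(len(imputed), len(dropped))
```

Imputing keeps every row, at the cost of the placeholder becoming the most frequent value in the column; dropping keeps the column clean but loses the other fields of those rows.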

4. Check the statistical summary of both numerical and categorical columns and write your inferences.

In [28]:

df.describe()

Out[28]:

         view_count         likes      dislikes  comment_count
count  3.742200e+04  3.742200e+04  3.742200e+04   3.742200e+04
mean   5.697838e+06  1.668147e+05  4.989862e+03   9.924930e+03
std    2.426622e+07  5.375670e+05  3.070824e+04   1.171003e+05
min    2.036800e+04  0.000000e+00  0.000000e+00   0.000000e+00
25%    5.122970e+05  1.323350e+04  2.810000e+02   9.000000e+02
50%    1.319078e+06  4.233050e+04  7.960000e+02   2.328000e+03
75%    3.670231e+06  1.304698e+05  2.461750e+03   6.184000e+03
max    1.322797e+09  3.183768e+07  2.397733e+06   1.607103e+07

In [69]:
df.describe(include = 'O').T

Out[69]:

                 count  unique  top                         freq
video_id         37422  37422   --0bCF-iK2E                 1
title            37422  37113   www                         21
channel_id       37422  10961   UCNAf1k0yIjyGu3k9BwAg3lg    533
channel_title    37422  10883   Sky Sports Football         533
tags             37422  28799                               3817
description      37422  35630                               589
comments         37422  37265   This video has no comments  158
published_month  37422  12      Oct                         4991

Based on the above results, the following can be inferred:

1. All of the numerical columns are skewed: in each one the mean is well above the median (for example, view_count has a mean of about 5.70 million against a median of about 1.32 million).
2. The view_count, likes, dislikes and comment_count columns are all positively skewed.
3. Based on the min and max values of each numerical column, the range of these columns is large, which suggests there could be outliers, but we cannot be certain without a boxplot.
4. The most frequently appearing channel is Sky Sports Football, with a frequency of 533. Also, more videos were published in October than in any other month, with a frequency of 4991.
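
The skewness and outlier inferences can be backed with numbers. The sketch below uses a made-up column (`toy` is hypothetical) and the common 1.5 x IQR fence, which is an assumption here and not a rule taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for one numerical column, for illustration only.
toy = pd.DataFrame({"view_count": [10, 12, 15, 18, 20, 25, 30, 5000]})
col = toy["view_count"]

skewness = col.skew()              # > 0 indicates a long right tail
mean_above_median = col.mean() > col.median()

# 1.5 * IQR fences, the same rule a boxplot uses for its whiskers.
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = toy[(col < lower_fence) | (col > upper_fence)]

print(skewness, mean_above_median)
print(outliers)
```

A positive `skew()` together with a mean above the median is consistent with positive skew, and the fence check gives an outlier count without drawing the boxplot.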

5. Convert datatype of column published_at from object to pandas datetime.

In [36]:

df['published_at'].dtype

Out[36]:

dtype('O')

In [42]:

df['published_at'] = pd.to_datetime(df['published_at'])

In [44]:

df['published_at']

Out[44]:

0 2021-07-01 10:00:00
1 2021-06-10 16:00:00
2 2021-09-20 01:03:32
3 2021-03-03 10:00:17
4 2021-12-07 13:00:00
...
37417 2021-02-12 05:03:49
37418 2021-01-16 05:39:05
37419 2021-06-02 09:00:10
37420 2020-10-20 20:59:30
37421 2021-05-23 21:00:31
Name: published_at, Length: 37422, dtype: datetime64[ns]

6. Create a new column as 'published_month' using the column published_at (display the months only)

In [45]:

df2 = df.copy() ##Creating copy for safety

In [56]:

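df['published_month'] = df['published_at'].dt.month ##storing the month number (1-12) in a new column
df['published_at'].dt.month ##displaying the extracted month numbers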

Out[56]:

0 7
1 6
2 9
3 3
4 12
..
37417 2
37418 1
37419 6
37420 10
37421 5
Name: published_at, Length: 37422, dtype: int64

In [58]:

df['published_month'] ##checking

Out[58]:

0 7
1 6
2 9
3 3
4 12
..
37417 2
37418 1
37419 6
37420 10
37421 5
Name: published_month, Length: 37422, dtype: int64

In [59]:

df ##checking to see if it added to the dataframe

Out[59]:
       video_id     title                                               ...  comments                                            published_month
0      --0bCF-iK2E  Jadon Sancho Magical Skills & Goals                 ...  Respect to Dortmund fans, must be sad losing hi...  7
1      --14w5SOEUs  Migos - Avalanche (Official Video)                  ...  Migos just makes me want to live my live to th...   6
2      --40TEbZ9Is  Supporting Actress in a Comedy: 73rd Emmys          ...  Hannah's energy bursts through any screen. Wel...   9
3      --4tfbSyYDE  JO1'YOUNG (JO1 ver.)' PERFORMANCE VIDEO             ...  youngVer>< REN is really PERFECT. It's not ju...    3
4      --DKkzWVh-E  Why Retaining Walls Collapse                        ...  Keep up with all my projects here: https://pr...    12
...    ...          ...                                                 ...  ...                                                 ...
37417  zzd4ydafGR0  Lil Tjay - Calling My Phone (feat. 6LACK) [Off...   ...  'DESTINED 2 WIN' OUT NOW !! https://liltjay.ln...   2
37418  zziBybeSAtw  PELICANS at LAKERS | FULL GAME HIGHLIGHTS | Ja...   ...  Montrezl Harrell is going crazy with the rebou...   1
37419  zzk09ESX7e0  [MV] (MAMAMOO) - Where Are We Now                   ...  I honestly do not know why this song hit so ha...   6
37420  zzmQEb0Em5I  FELLIPE ESCUDERO- Master Podcast #12                ...  Foi um prazer passar esta tarde com vocs debat...   10
37421  zzxPZwaA-8w  Gareth Bale brace secures dramatic come back on...  ...  Thank you Kane for everything you have given t...   5

37422 rows × 13 columns

7. Replace the numbers in the column published_month as names of the months, i.e., 1 as 'Jan', 2 as 'Feb' and so on.

In [62]:

## Created a dictionary mapping month numbers to month names.
dict_n_m = {1 : 'Jan', 2 : 'Feb', 3 : 'Mar', 4 : 'Apr', 5 : 'May', 6 : 'Jun',
            7 : 'Jul', 8 : 'Aug', 9 : 'Sep', 10 : 'Oct', 11 : 'Nov', 12 : 'Dec'}
## Used a lambda function inside apply to convert the numbers to months via the dictionary.
df['published_month'] = df['published_month'].apply(lambda x : dict_n_m[x])
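
As an alternative to `apply` with a lambda, `Series.map` accepts a dictionary directly. The sketch below is a variation, not the notebook's own method; it uses a made-up Series (`months`, hypothetical) holding the month numbers of the first five rows.

```python
import pandas as pd

# Same number-to-name mapping as in the cell above.
month_names = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
               7: "Jul", 8: "Aug", 9: "Sep", 10: "Oct", 11: "Nov", 12: "Dec"}

# Hypothetical stand-in for df['published_month'] before conversion.
months = pd.Series([7, 6, 9, 3, 12], name="published_month")

# map looks each value up in the dictionary.
named = months.map(month_names)

print(named.tolist())
```

One behavioural difference worth knowing: with a plain dictionary, `map` turns a value that has no key into NaN, whereas the lambda lookup `dict_n_m[x]` raises a KeyError.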

In [64]:

df2 = df.copy() ##Updating df2 for safety then printing df to see the changes
df

Out[64]:

       video_id     title                                               ...  comments                                            published_month
0      --0bCF-iK2E  Jadon Sancho Magical Skills & Goals                 ...  Respect to Dortmund fans, must be sad losing hi...  Jul
1      --14w5SOEUs  Migos - Avalanche (Official Video)                  ...  Migos just makes me want to live my live to th...   Jun
2      --40TEbZ9Is  Supporting Actress in a Comedy: 73rd Emmys          ...  Hannah's energy bursts through any screen. Wel...   Sep
3      --4tfbSyYDE  JO1'YOUNG (JO1 ver.)' PERFORMANCE VIDEO             ...  youngVer>< REN is really PERFECT. It's not ju...    Mar
4      --DKkzWVh-E  Why Retaining Walls Collapse                        ...  Keep up with all my projects here: https://pr...    Dec
...    ...          ...                                                 ...  ...                                                 ...
37417  zzd4ydafGR0  Lil Tjay - Calling My Phone (feat. 6LACK) [Off...   ...  'DESTINED 2 WIN' OUT NOW !! https://liltjay.ln...   Feb
37418  zziBybeSAtw  PELICANS at LAKERS | FULL GAME HIGHLIGHTS | Ja...   ...  Montrezl Harrell is going crazy with the rebou...   Jan
37419  zzk09ESX7e0  [MV] (MAMAMOO) - Where Are We Now                   ...  I honestly do not know why this song hit so ha...   Jun
37420  zzmQEb0Em5I  FELLIPE ESCUDERO- Master Podcast #12                ...  Foi um prazer passar esta tarde com vocs debat...   Oct
37421  zzxPZwaA-8w  Gareth Bale brace secures dramatic come back on...  ...  Thank you Kane for everything you have given t...   May

37422 rows × 13 columns

8. Find the number of videos published each month and arrange the months in a decreasing order based on the video count.

In [78]:

df.groupby(['published_month'])['video_id'].count().sort_values(ascending = False)

Out[78]:

published_month
Oct 4991
Sep 4880
Nov 4851
Aug 4262
Dec 3072
Jul 2340
Jun 2316
Mar 2258
Feb 2137
Apr 2126
Jan 2108
May 2081
Name: video_id, dtype: int64
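
The same ranking can be obtained with `value_counts`, which sorts by count in descending order by default. The sketch below uses a made-up frame (`toy`, hypothetical) and also shows how to view the counts in calendar order.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, for illustration only.
toy = pd.DataFrame({"published_month": ["Oct", "Jan", "Oct", "Sep", "Jan", "Oct"]})

# Counts per month, largest first.
by_count = toy["published_month"].value_counts()

# Same counts in calendar order; months absent from the data get 0.
month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
by_calendar = by_count.reindex(month_order, fill_value=0)

print(by_count)
print(by_calendar)
```

The calendar-ordered view makes seasonal patterns easier to spot, such as the concentration of videos in August to November seen in the output above.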

9. Find the count of unique video_id, channel_id and channel_title.

In [83]:

print(df['video_id'].nunique())
print(df['channel_id'].nunique())
print(df['channel_title'].nunique())

37422
10961
10883

10. Find the top 10 channel names having the highest number of videos in the dataset and the bottom 10 having lowest number of videos.

In [86]:

## Displaying the top 10 most frequently appearing channels.
df.groupby(['channel_title'])['video_id'].count().sort_values(ascending = False).head(10)

Out[86]:

channel_title
Sky Sports Football 533
The United Stand 301
BT Sport 246
NBA 209
NFL 162
WWE 122
SSSniperWolf 99
SSundee 98
FORMULA 1 87
NHL 86
Name: video_id, dtype: int64

In [91]:

## Arranging in ascending order and using head.
df.groupby(['channel_title'])['video_id'].count().sort_values().head(10)

Out[91]:

channel_title
SilverName 1
Mini Muka 1
Mini Ladd 1
MindYourLogic 1
Mind Body Tonic With Dr Sita 1
Mimi Ar 1
Millyz 1
Milkair 1
Milissa Grande 1
MikuruSong 1
Name: video_id, dtype: int64
In [92]:

## Arranging in descending order and using tail.
df.groupby(['channel_title'])['video_id'].count().sort_values(ascending = False).tail(10)

Out[92]:

channel_title
Karchez 1
Karate Combat 1
Kaptain Kuba 1
Kanye West 1
Kannur kitchen 1
Kannada Cinema 1
KanalD 1
Kanak News 1
Kamille Ramos 1
zoom 1
Name: video_id, dtype: int64

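Since more than one channel has the minimum count of 1, the bottom 10 is not unique: which ten channels appear depends on how the tied rows happen to be ordered, so the two approaches above give different answers.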
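
One way to make the tie explicit is to list every channel that shares the minimum count instead of taking an arbitrary ten. A minimal sketch on a made-up frame (`toy`, hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, for illustration only.
toy = pd.DataFrame({
    "channel_title": ["A", "A", "A", "B", "B", "C", "D", "E"],
    "video_id": ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"],
})

# Number of videos per channel.
counts = toy.groupby("channel_title")["video_id"].count()

# All channels tied at the lowest count.
lowest = counts.min()
tied_at_min = counts[counts == lowest]

print(lowest, len(tied_at_min))
print(tied_at_min)
```

Reporting `len(tied_at_min)` alongside the minimum shows how many channels qualify as "lowest", which is more informative than any particular ten of them.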

11. Find the title of the video which has the maximum number of likes and the title of the video having minimum likes and write your inferences.

In [111]:

df[df['likes']==df['likes'].max()].title
Out[111]:

26143 BTS () 'Dynamite' Official MV


Name: title, dtype: object

In [109]:

df[df['likes']==df['likes'].min()].title

Out[109]:

18654 Kim Kardashian's Must-See Moments on "Saturday...


Name: title, dtype: object

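From the above, the most-liked video in the dataset is a music video (BTS 'Dynamite' Official MV), which suggests that music videos tend to get a high number of likes compared to other forms of videos on YouTube. A single maximum cannot confirm this on its own, and nothing in this output measures how entertaining a video is.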
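
An alternative to the boolean mask is `idxmax` / `idxmin`, which return the index label of the extreme value. The sketch below uses a made-up frame (`toy`, hypothetical).

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, for illustration only.
toy = pd.DataFrame({
    "title": ["clip one", "clip two", "clip three"],
    "likes": [120, 0, 4500],
})

# Row with the most likes and row with the fewest likes.
most_liked = toy.loc[toy["likes"].idxmax(), ["title", "likes"]]
least_liked = toy.loc[toy["likes"].idxmin(), ["title", "likes"]]

# The mask form used above returns every row tied at the minimum.
all_least_liked = toy[toy["likes"] == toy["likes"].min()]

print(most_liked["title"], least_liked["title"])
```

Note that `idxmax` and `idxmin` return only the first matching row, so the mask form is the safer choice when ties are possible.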

12. Find the title of the video which has the maximum number of dislikes and the title of the video having minimum dislikes and write your inferences.

In [119]:

df[df['dislikes']==df['dislikes'].max()].title
Out[119]:

13591 Cuties | Official Trailer | Netflix


Name: title, dtype: object

In [117]:

df[df['dislikes']==df['dislikes'].min()].title

Out[117]:

18654 Kim Kardashian's Must-See Moments on "Saturday...


Name: title, dtype: object

From the above, it can be inferred that the Netflix trailer for "Cuties" was not liked by people in general. The video with the minimum dislikes (row 18654) is the same video that has the minimum likes.

13. Does the number of views have any effect on how many people disliked the video? Support your answer with a metric and a plot.

In [120]:

## numeric_only=True restricts the correlation to the numeric columns and avoids the FutureWarning.
dataplot = sns.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)

In [121]:

sns.scatterplot(data=df, x="view_count", y="dislikes")

Out[121]:

<Axes: xlabel='view_count', ylabel='dislikes'>

From the above heatmap, it can be inferred that there is a positive correlation of 0.68 between view_count and dislikes. This means that dislikes generally increase as the view count increases, although the relationship is not perfectly linear. A correlation on its own does not show that views cause dislikes, only that the two move together. This can also be verified from the scatter plot between the 2 variables.
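
Because both columns are heavily skewed, a rank-based correlation can complement the Pearson value shown in the heatmap (`DataFrame.corr` uses Pearson by default). The sketch below is an addition, not part of the original analysis; it uses made-up numbers (`toy`, hypothetical) and computes Spearman's coefficient as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical data with one extreme view count, for illustration only.
toy = pd.DataFrame({
    "view_count": [100, 400, 900, 1600, 2500, 1_000_000],
    "dislikes": [1, 2, 3, 4, 5, 6],
})

# Pearson: measures linear association, sensitive to extreme values.
pearson = toy["view_count"].corr(toy["dislikes"])

# Spearman: Pearson correlation computed on the ranks of each column.
spearman = toy["view_count"].rank().corr(toy["dislikes"].rank())

print(round(pearson, 2), round(spearman, 2))
```

In this toy data the relationship is perfectly monotonic, so the rank correlation is 1 while the single extreme view count pulls Pearson below it. A gap like this would be a hint that outliers are affecting the linear metric.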

14. Display all the information about the videos that were published in January, and mention the count of videos that were published in January.

In [124]:
df[df['published_month']=='Jan']

Out[124]:

       video_id     title                                               channel_id                channel_title
27     -2Gwm7QfBnE  Q&A With Naisha                                     UCYwNMbogQFzMccPSuy-pPWg  Mian Twins
48     -4sfXSHSxzA  SURPRISING BRENT WITH HIS TIKTOK CRUSH!!            UCPpATKqmMV-CNRNWYaDUwiA  Alexa Rivera
95     -AJD1Fc5rpQ  WE ARE HAVING A BABY! | finding out i'm pregna...   UCVsTboAhpnuL6j-tDePvNwQ  Tess Christine
103    -AuJiwjsmWk  Do Ugly Foods Taste Worse? Taste Test               UCzpCc5n9hqiVC7HhPwcIKEg  Good Mythical MORE
182    -JhqO2KWr5U  Schlatt gets fit                                    UCWZp4y1jqBuvLtiyxSs_ZBw  Big guy
...    ...          ...                                                 ...                       ...
37300  zmzFL5bG-jc  DEVINE MON PERSONNAGE AVANT AKINATOR! (c'est ...    UCIlr3byh6wmXgcPx_Tm9Ocw  Piwerre
37329  zpzjex7qwrA  Lampard Sacked Within Days Rorys Misery | Chel...   UCkD-ZOixI0a9FjIExDsHsbg  The Kick Off
37345  zqyv-B6mnBM  Lil Wayne - Ain't Got Time (Audio)                  UCO9zJy7HWrIS3ojB4Lr7Yqw  Lil Wayne
37383  zwfu1-24T7Q  PRADA Cup Day 1 | Full Race Replay | PRADA Cup...   UCo15ZYO_XDRU9LI30OPtxAg  America's Cup
37418  zziBybeSAtw  PELICANS at LAKERS | FULL GAME HIGHLIGHTS | Ja...   UCWJ2lWNubArHWmf3FIHbfcQ  NBA

       published_at         view_count  likes   dislikes  comment_count  published_month
27     2021-01-21 00:05:47  872372      38626   239       621            Jan
48     2021-01-16 21:40:04  6504784     262477  5779      7907           Jan
95     2021-01-03 21:53:48  533084      38965   119       1650           Jan
103    2021-01-19 11:00:01  1057077     22526   531       773            Jan
182    2021-01-24 22:50:57  1724965     119431  325       1578           Jan
...    ...                  ...         ...     ...       ...            ...
37300  2021-01-16 16:12:19  670357      54462   832       1249           Jan
37329  2021-01-03 20:13:49  428646      12060   296       1505           Jan
37345  2021-01-21 05:00:10  2238244     58925   2365      5539           Jan
37383  2021-01-15 04:07:55  317382      2008    83        192            Jan
37418  2021-01-16 05:39:05  2841917     20759   1049      2624           Jan

       tags                                                description
27                                                         Hey Guys !!! this has been the most requested v...
48                                                         Thank you guys for watching and don't forget t...
95                                                         Okay I needed a moment to collect my thoughts ...
103    gmm good mythical morning rhettandlink rhett a...   Today, we're doing a blind taste test to deter...
182    jschlatt big guy jschlatt highlights schlatt j...   #jschlatt #schlatt #bigguy #short
...    ...                                                 ...
37300  Piwerre frere de michou crouton among us devin...   Discord Piwerre : https://discord.gg/QBduPgAA...
37329  Premier league Chelsea chelsea 1-3 Man City Ch...   The Kick Off watched Manchester City destroy C...
37345  lil wayne weezy weezy wednesday wayne carter y...   Official audio for Lil Wayne "Ain't Got Time",...
37383  America's Cup Americas Cup AC36 AC75 Presented...   The opening day of the PRADA Cup in Auckland, ...
37418  NBA G League Basketball game-0022000187 Lakers...   PELICANS at LAKERS | FULL GAME HIGHLIGHTS | Ja...

       comments
27     I feel like Nate and Aishas personality match ...
48     He had no idea! Thank you guys so much for wat...
95     I am so happy to tell you that I am pregnant!!...
103    "there's nothing wrong with it being bent"\nI ...
182    Schlatt is single handedly wiping out all the ...
...    ...
37300  Mdr michou quand c'est pas ses tournage il fou...
37329  True Its like a fighter who Geor die But I thou...
37345  RIP Juice Wrld, wrote that on two cups, pour o...
37383  Incredible how these boats evolve in a short t...
37418  Montrezl Harrell is going crazy with the rebou...

2108 rows × 13 columns

In [125]:
df[df['published_month']=='Jan'].count()

Out[125]:

video_id 2108
title 2108
channel_id 2108
channel_title 2108
published_at 2108
view_count 2108
likes 2108
dislikes 2108
comment_count 2108
tags 2108
description 2108
comments 2108
published_month 2108
dtype: int64

From the above, the count of videos published in January is 2108.
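
If only the single number is needed, it can be obtained without the per-column `count()` table. A minimal sketch on a made-up frame (`toy`, hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, for illustration only.
toy = pd.DataFrame({
    "video_id": ["v1", "v2", "v3", "v4"],
    "published_month": ["Jan", "Feb", "Jan", "Oct"],
})

# Boolean mask marking the January rows.
is_jan = toy["published_month"] == "Jan"

jan_videos = toy[is_jan]        # all information about the January videos
jan_count = int(is_jan.sum())   # True counts as 1, so the sum is the row count

print(jan_count, len(jan_videos))
```

Unlike `count()`, which reports non-null values per column, summing the mask or taking `len` of the filtered frame always gives the number of rows.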
