NoteGPT_Advanced SQL Project _ Netflix Data Analysis Using SQL (Guided) - Portfolio Series #4_10 - Datasets

Copyright
© All Rights Reserved

00:00:00

Hello everyone, welcome back to our YouTube channel, Zero Analyst. This is project four of our 10-part SQL series. Thank you so much for showing so much love for my last three projects. So let's look at the project we will build today: by the end of this video, you will publish a project like this one in your GitHub repository, and I'm going to show you how step by step. Let's see the tasks you will do in this project. This is a data set of Netflix movies and TV shows, and we will

00:00:20
download this data set from Kaggle; I'm going to show you where to get it. You can see we have added a nice logo to our project, and I'm going to show you at the end how to add it. You can write an overview of the project, so here you can read the project overview, and here is the project objective. Then there is the data set: to get it, you just need to click on this link, which will bring you to the

00:00:43
Kaggle website, where you can get this data set. This is the data set I used in this project, and the one you'll be using. Now let's see what else you will do. Once you download the data set from kaggle.com, we will first explore the data in Excel, then set up the database, create the table, and import the data into PostgreSQL. Once the data is imported, we will start exploring the data set and solve 15 business problems. These are the 15 business problems that

00:01:07
you will solve on this data set; you can see the problem statements here. I have included medium- to advanced-level questions in this project, and you can see all the questions we will be solving. At the end of this video, I'm going to show you how to publish your complete project like this, with the data set and your SQL queries. You can see my SQL queries here: if I click on this link, it takes me to

00:01:31
this business problem statement, and if I click on my solutions, it takes me to another SQL file where I have written out all the problems I solved in this project. You will publish your project the same way, which I'll show you at the end of this video. So, without further delay, let's start. Let's go to kaggle.com and download this data set. You can see I'm on the kaggle.com website, and this is the data

00:01:53
set we are talking about. The data set is named Netflix Movies and TV Shows, and I'm going to add this link to the video description as well. First, just click the Download button; once you click it, you can see the data set starts downloading. You can see where it was downloaded: it is a ZIP file, so I will unzip it and copy the file to the desktop. Now you can see I have copied the file to the
00:02:19
desktop, so let me open the file in Excel. First we will explore the data set to see what it contains, and then we will set up a database, create a table, and import this data into PostgreSQL. If you're using MySQL, I would recommend using PostgreSQL instead. If you do not know how to set up PostgreSQL on your machine, I have added a video in the video description that you can follow step by step to install

00:02:45
PostgreSQL on your system. You can use almost the same syntax with PostgreSQL that you have been using in MySQL. Now let's look at the data set. These seem to be the show IDs, like a movie ID or a series ID, so this appears to be the ID column. The last row is 8,808, so we have roughly 8,800 rows of data in this data set. These are the columns: first we have type, and in type we

00:03:16
have movies and TV shows, where the TV shows are web series. Then we have title: this is the title column, this is the director column, and this is the cast column. If you double-click on the title column, you can see that some titles are really long; some of the text here is very long. We will import the data as it is. We are not going to clean it; we will import it as it is and then try to solve the business problems. So let's look at the director column.

00:03:45
This is the director column. Again, I can double-click on it to see whether we have multiple values. You can see the director column also contains very long text. The next column is cast, and it also contains really long text. One thing you will notice is that there are multiple cast members, with the names separated by commas. Then we have the

00:04:16
country column, which shows the countries where the web series or movie was released. Again, some cells contain multiple country names. You can double-click on it to auto-fit the column in Excel, and you can see that some cells contain multiple countries. Let me make it bigger: here you can see it has United States, Ghana, and other countries where this

00:04:44
movie or web series was released. Next is the date_added column, and then the release_year column, which is the year the movie or web series was released. This is the rating column, and this is the duration column, which shows how long the web series or movie is. Then we have something called listed_in, which is basically the genre, and then we have description. The description is again
00:05:15
really long, because it describes the film or web series. That is everything we have here. Now let's go ahead and create the table and insert this data set into PostgreSQL. First we need to set up the database. I'm going to close everything for now and relaunch the file, because I need all these column names to create a table that imports the data as it is. Now if you look at the

00:05:45
record count, we have 8,808 rows in total; you can see it at the end. The first row is the header, so that is 8,807 records, and after importing the data we will verify that we have all of them. Let's first set up the database. I'm going to use PostgreSQL for this project. You can use MySQL, but there is a chance you may run into issues while importing the data, so I recommend PostgreSQL for this project. You can check my

00:06:13
video in the video description on how to install PostgreSQL on your computer. Let's start. I'm going to launch the application and close everything I have open at the moment. I already have a database set up, but I'm going to set up a new one. The tool I'm using is called pgAdmin 4, a PostgreSQL management tool similar to MySQL Workbench. I'm going to click on this server

00:06:38
and from the server I'm going to click on PostgreSQL 16; you may have PostgreSQL 16 as well. Go to Databases and click the expand arrow; once you expand it, you will see all the available databases. Right-click on it, click Create > Database, and give it any name. I'm going to call it Netflix DB1, but you can use any other name. For me, I'm going to use Netflix DB1

00:07:04
because I already have a Netflix DB. You don't have to do anything else; just click Save. Once you click Save, you will see the database has been created. If you don't see it, right-click and refresh, and the database will appear. For us, this is the database Netflix DB1. Now let's open the Query Tool, where we will write our SQL queries: select the database and click this play icon, or you can use the

00:07:28
keyboard shortcut as well. Once you click it, the Query Tool will open. Without a database selected, the Query Tool option looks like this and you cannot click it, so select the database first, open the Query Tool, and it will open in front of you. Here we will write our queries and then start importing the data. Let me zoom in a little and hide everything for now. First, we want to create our table. If you
00:07:53
want to verify that we don't have any tables yet, go to Netflix DB1 > Schemas > public > Tables. You can see we don't have any tables at the moment; once we create the table, you will see it here. To verify you are on the right database, look here: we are inside Netflix DB1, and this is postgres,

00:08:19
which is my user, and this is the server name. First, we will create a table. I'm going to start by writing a comment called Netflix project; you can use any name. Then I write CREATE TABLE, and for the table name, I'm going to call it netflix. I need to define the columns inside the parentheses, and I have copied all the column names from the data set, which

00:08:52
is from my Excel file. The first column is show_id, then type, title, director, cast, country, date_added, release_year, rating, duration, listed_in, and description. These are our columns. Each column should be separated by a comma, so first of all let me just

00:09:19
select all these columns and indent them one level. Now, for each column we need to give a data type, which is very important. We can work out the data types in Excel. The first column is show_id, and its data type seems to be characters: it contains both text and numbers, so we need to treat it as characters, which is going to be VARCHAR

00:09:47
in PostgreSQL. If you double-click on it, you can see the maximum length of this text. It seems to be under 10, so we'll define it as five. To verify it, look at the largest value, s8807, which is five characters. I could define it as 10 or seven; I'm going to write VARCHAR with a limit of five, or six. That is fine. This is my first column, which is called

00:10:14
show_id, the data type is VARCHAR, and I'm setting the limit to six. If you define a smaller limit, you will get an error when importing the data. Now the next one, type: again you can double-click on it, and it looks similar, so we can define it as VARCHAR(10). The name type is an SQL keyword, which is why it is highlighted in red, so
00:10:42
you should change the name to something else. For now I'm going to keep it as type, but I will rename a column like this later. VARCHAR(10) is fine. Now let's look at title. The title seems to be very long, which we already saw earlier; as you can see, it can be quite long. To verify it, I can click one of the cells, for example this one, and type

00:11:13
MAX, then LEN, and select this column, and that shows the maximum length of the title. You can see it's 104, so the longest title is 104 characters. We used the simple MAX(LEN()) functions in Excel. For the title, I'll define the data type larger than 104, so I'm going to use 150. This is how you define the data types. For director, I need to check again. First let me delete this and put back whatever I had

00:11:49
here. Now for the director, we again check with MAX(LEN()): I type MAX, then LEN, and select this column. There are a lot of empty values, so selecting the column that way isn't the right option. Instead, I'll write the range by hand, D2:D8808, which I think is the range. If I close the formula, you see 208, so the

00:12:25
maximum length for director is 208, so I'm going to define director as 208. Now let's check cast. Cast is going to be long text as well, so I will check it outside the table, here. Let me write cast here, use MAX(LEN()) again, and select the cast column, which starts at E2 and ends at E

00:13:05
8808. That's how I select everything, and the result is 771, which is very long, so I'm going to use VARCHAR with plenty of headroom: 1,000. Now let's check country. If we double-click on the country column, you will see it also contains long text, so I'll check it the same way. The country column is F, so I just change the E to F. It's 123, so I'm going to define country as 150. Country seems to be fine now.
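The MAX(LEN()) check done in Excel above can also be scripted. A minimal Python sketch, assuming the data is read as CSV text with a header row; the sample rows and the helper name `max_lengths` are hypothetical placeholders, not values from the Kaggle file:

```python
import csv
import io
from typing import Dict

# Hypothetical sample rows in the same comma-separated layout as the
# downloaded file; the values are placeholders, not real Netflix records.
SAMPLE_CSV = """show_id,type,title,country
s1,Movie,Example Title,United States
s2,TV Show,A Much Longer Example Title,"India, United States"
s3,Movie,Short,
"""


def max_lengths(csv_text: str) -> Dict[str, int]:
    """Return the longest value per column, like Excel's MAX(LEN(range))."""
    reader = csv.DictReader(io.StringIO(csv_text))
    longest = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # Empty cells (which become NULLs after import) count as length 0.
            longest[name] = max(longest[name], len(value or ""))
    return longest


widths = max_lengths(SAMPLE_CSV)
print(widths)  # {'show_id': 2, 'type': 7, 'title': 27, 'country': 20}
```

To use it on the real download, you could pass in the contents of the extracted CSV instead of `SAMPLE_CSV`. Python's `len()` counts characters, which is also what PostgreSQL's `VARCHAR(n)` limit counts, so the numbers are directly comparable; as in the video, leave some headroom above the maximum.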

00:13:43
Let's check date_added. Usually this would be a date data type, but you can see the data is in a text format with a different shape: the month name, then the day, a comma, and then the year. Rather than relying on the database to parse this format during import, we'll define it as VARCHAR for now. The limit is going to be somewhere around 50, so I'm going to define date_added as 50. So that's going
00:14:09
to be 50. Now release_year seems to be an integer, so I will define it as INT. The rating again looks like VARCHAR, so I'll define it as VARCHAR(10); it should be under 10. For duration, again it seems to be VARCHAR, so I'll define it as 10 or 15. Now listed_in, which is the genre: I was going to define it as 50, but let's use VARCHAR(25) and see whether we get an error for this listed_in column.
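Since listed_in is deliberately sized below its real maximum, the import is expected to fail. A small Python sketch of a pre-import check that flags any value longer than its declared VARCHAR limit, which is the condition PostgreSQL rejects. The limits dictionary, the helper `find_overflows`, and the sample rows are hypothetical, and only a few columns are shown:

```python
import csv
import io
from typing import Dict, List, Tuple

# Declared VARCHAR limits for a few columns from the sizing above;
# listed_in is deliberately too small, as in the video.
VARCHAR_LIMITS = {"show_id": 6, "title": 150, "listed_in": 25}

# Hypothetical sample rows; the values are placeholders, not real records.
SAMPLE_CSV = """show_id,title,listed_in
s1,Example Title,Documentaries
s2,Another Example,"International TV Shows, TV Dramas, TV Mysteries"
"""


def find_overflows(csv_text: str, limits: Dict[str, int]) -> List[Tuple[int, str, int]]:
    """List (line, column, length) for values longer than their VARCHAR limit.

    Lines are counted with the header as line 1; this assumes no value
    spans several physical lines.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):
        for column, limit in limits.items():
            value = row.get(column) or ""
            if len(value) > limit:
                problems.append((line_no, column, len(value)))
    return problems


print(find_overflows(SAMPLE_CSV, VARCHAR_LIMITS))  # [(3, 'listed_in', 47)]
```

Raising the listed_in limit (for example to 100) empties the list for this sample.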

00:14:48
For description, I will check again; it's column L, so I just change the F to L. The description shows 250, so I will define it as 250. That is fine. Now, I intentionally kept listed_in lower than the range we have, so we may get an error. I have defined it as only 25 characters; let's see if we get an error and how to fix it. Now I'm going to run this

00:15:28
query, which is going to create the table. First let's fix this syntax error: I entered CREATE TABLE twice, so let me delete one. Now the syntax is right, so let's run it. Now we're getting an error for cast, because CAST is a reserved SQL keyword, so I'm going to rename the column to casts by adding an s. Let's execute this again, and you can see it ran successfully, which means the table is now

00:15:57
created. You can right-click and refresh, and you will see the netflix table. Just make sure you are inside the Netflix DB1 database. To verify it, go to Databases; if you don't see your database, right-click and refresh, and it will appear. Expand it, go to Schemas > public > Tables, and you will see the table there. Just make sure you

00:16:24
right-click and refresh, and you will see the table has been created, which is netflix. Now we are going to import the data into this table. To import it, right-click on the table and select the Import/Export option. Make sure you select Import, then go to Options and make sure Header is enabled, because our data has a header row. We don't have to touch anything else. Then, in General, select the file

00:16:51
path, or click the folder icon and select your file. We downloaded the data set to the desktop, so I'm going to select it from the desktop and press OK. Now you see I'm getting an error; it seems the import failed for some reason. Let's check: we can click this icon to see the reason. Start reading from the end: it says value too long for type character varying(25). We defined one column as 25, and for that column the value is too long. If
00:17:19
you want to see more, you can read this line: the context says COPY netflix, line 3, column listed_in. The column name is listed_in, and it is trying to import this long text, but the text is more than 25 characters. So we first need to increase the length of the listed_in column, and then we will be able to import the data. To do that, I can go back here and either write an ALTER command or I can

00:17:50
just modify this one: instead of 25, I'm going to use 55. Or, better, let me check the maximum length for listed_in here. It's column K, so let me check the maximum length to make sure we don't get the error again: 79. So we can define it as 150 or 100; listed_in will be 100. Now I need to drop the table completely and create it again. You can write DROP TABLE

00:18:27
IF EXISTS, which deletes the table. Make sure you add the semicolon after this command. This line deletes the table and this line creates it, and you can run both in one go: just click the play icon, and you can see it executed. Now the table has been created. Make sure you right-click and refresh; now you can see we have a new table, and this table has the correct limit. If you want to verify it, you can just run SELECT * FROM

00:18:57
netflix, and you will be able to see the table with its data types and limits. You can see that listed_in now has a limit of 100 characters. This time, right-click on the table again and refresh it just to make sure, then right-click and choose Import. In General, check that the file is correctly selected, make sure Header is enabled, and click OK. You will see two popups, one saying the process started and one saying the process completed, which means

00:19:28
the process has completed. Now let's look at the data. To see it, select this query and click the play icon, or use the shortcut; for me it's F5, but for you it may be something else. Let me minimize everything else here. You can see the table is now ready, so we can explore the data set. The first column is show_id, then we have type,

00:19:57
which is Movie or TV Show. Then we have title and the director name. We can see some null values in director; we will keep them for now, because maybe some movies have no director name, or it was deleted. In cast you can see a lot of nulls as well, which is again fine, because it's possible some of the cast names were deleted. Here you can see the cast members are separated by commas, so in this column we
00:20:25
have multiple cast members: these are the actors who performed in the movie or web series. Then we have the country column, which has single countries and multiple countries separated by commas. Then we have date_added, where the date is in the same format as before: month, then day, then year. Then we have release_year, rating, and duration. One thing you will notice is that for the

00:20:54
movies we have the duration in minutes, but for the web series we don't have a duration in time; instead we have seasons, like 1 Season or 2 Seasons. That is fine. How do you check this? The first record shows 90 min and the second one is a web series, so check their types in the type column: the first one is a movie, the second one is a web series, and so are all of these down to here. You will see all of them have

00:21:20
the number of seasons; you can see they all have seasons. So each web series has seasons, and each film has minutes as its duration. Then we have listed_in, which is the genre column; this data is also separated by commas, so the same film can fall into multiple genres. Then we have description, which has information about the film, like a short summary of the story, as you

00:21:50
can see here. All right, that is the data set. Now let's check the total count to verify that we imported all the data correctly. To do that, I'll copy the query, add a line here, and use COUNT(*) AS total_content, which returns the total number of content items. Let me run this: the total content shows as 8,807, so we have imported

00:22:31
all the records correctly; you can see it here. So this is the schema we created. Now let's see how many different types of content we have. To do that, I can take this query that selects all the columns and look at the distinct types instead. We want to see how many different types of content we have on Netflix, so we can simply write DISTINCT type. This is going to show us the

00:23:00
distinct types of content: we have Movie and TV Show, as you can see here. That is one finding. We'll do some more analysis and then solve the 15 business problems. Let me select everything from netflix again. You can explore the data yourself, and you may come up with other problems: you can look at the different titles, the distinct directors, how many nulls there are, and so on. All these columns are there for you to explore
00:23:30
and come up with your own queries and business problems. I have explored this data set and come up with 15 business problems, which we will solve and include in this project. Let me first paste those 15 problems here. These are the 15 business problems I'm providing to you; you can include and solve these 15 business problems, or come up with more of your own. So let

00:23:56
me write a heading here: 15 business problems. This is going to be our main analysis, and these are the problems we will solve. You can see I have added 15 business problems, so let's go through them and solve them one by one. Question number one: count the number of movies and the number of TV shows. First, let me turn this into comment text, and now I can write my solution here, which I can copy and paste into my GitHub repository at a

00:24:34
later point. Now we need to find the total number of films, that is movies, and TV shows, that is web series. Before solving this, we can take another look at the data set with SELECT * FROM netflix, our table name, and then build our logic. We have a column called type; we need to find each type and its count. That means we can do a GROUP BY

00:25:01
on the type column and do a COUNT on show_id, or COUNT(*). That is how we will find how many movies and how many TV shows we have. So I will select type as the first column and then COUNT (oops, let me just type it here), either COUNT(*) or COUNT(show_id); the star counts the total records for each type. Then we need GROUP BY type. And this one I can

00:25:36
save it as total_content. That's problem number one; let's execute it. Now you can see the different types and the total content for each: we have 6,131 movies and 2,676 TV shows. That is the solution to problem number one. Now let's look at problem number two: find the most common rating for movies and TV shows. We have two categories, Movie and TV Show, and we need to find which rating is given most often to this

00:26:10
movie category, and which rating is given most often to TV shows. That is what we need to find. Looking at the rating column, it doesn't seem to be a number, so MIN or MAX won't help here; it is text. So for each category we need to find which text value appears the most times. For that, I'm simply going to

00:26:40
use a simple approach: GROUP BY along with window functions. I'm going to select type first and then the second column, rating. Let me first run this query: you can see types and ratings, meaning different ratings are given within each type. Now, for each category, I need to find out which rating is given the most. We have two categories: one is

00:27:10
Movie and the other is TV Show, and we want the rating given most often to each. If I just do a GROUP BY on type, remove the separate rating column, and put rating inside MAX instead, this returns the maximum rating for each category. You will get a result, but it's not going to be correct here, because the rating is a

00:27:44
string, and MAX on a string compares values alphabetically rather than counting how often they appear. You can see that for Movie we get this rating as the MAX, and for TV Show we get this one, as if it were the rating given to most of the shows, but that is not correct, as I'm going to show you now. Because this is text, MAX returns the alphabetically last value, not the most common one. To solve that, we can group by

00:28:09
this type, and within each type group by rating, and then check the frequency: for each type and each rating, how many times it was given. So here I'm going to use COUNT(*), or you can use COUNT(show_id), and group by type and rating. I write GROUP BY 1, 2, where 1 means the first column, type, and 2 means the second column, rating, and then I'm

00:28:37
selecting the count. This gives each type, its different ratings, and the total count; for example, for TV Show this rating was given 174 times. Here you can add ORDER BY 3 DESC, and you will see the highest count at the top, which was given to movies, then this one, and so on. So we kind of have the result. What we need now is, for each type, the one rating that was given the

00:29:10
most times. For that, if I first order by type and then by the count, which is column three, I get each type with its ratings organized and the most frequent rating on top. You can see that for movies the top rating is TV-MA, which was given 2,062 times. That is what I need. If you go to the second section, TV Show, the rating given most often there is also TV-MA.
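The difference between MAX on text and counting frequencies is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for PostgreSQL, and a two-column table of hypothetical rows (not the real data) chosen so that the alphabetically largest rating is not the most common one:

```python
import sqlite3

# Hypothetical (type, rating) pairs; in both groups the alphabetically
# greatest rating is NOT the most frequent one.
ROWS = [
    ("Movie", "R"), ("Movie", "R"), ("Movie", "R"), ("Movie", "TV-14"),
    ("TV Show", "TV-MA"), ("TV Show", "TV-MA"), ("TV Show", "TV-Y7"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflix (type TEXT, rating TEXT)")
conn.executemany("INSERT INTO netflix (type, rating) VALUES (?, ?)", ROWS)

# MAX() on a text column returns the alphabetically greatest value.
max_rows = conn.execute(
    "SELECT type, MAX(rating) FROM netflix GROUP BY type ORDER BY type"
).fetchall()

# Counting each (type, rating) pair gives the frequency we actually need.
count_rows = conn.execute(
    """
    SELECT type, rating, COUNT(*) AS total
    FROM netflix
    GROUP BY type, rating
    ORDER BY type, total DESC
    """
).fetchall()

print(max_rows)    # [('Movie', 'TV-14'), ('TV Show', 'TV-Y7')]
print(count_rows)  # [('Movie', 'R', 3), ('Movie', 'TV-14', 1), ('TV Show', 'TV-MA', 2), ('TV Show', 'TV-Y7', 1)]
```

The grouped counts have the same shape as the video's `GROUP BY 1, 2` query ordered by type and count; here the positions are written out as column names.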

00:29:44
So for both, the same rating was given the most. Now I can simply use a window function to select that result. For that, we'll use the RANK window function with OVER. Inside OVER, we say PARTITION BY type, because we want to find the top rating within each type, and we need to ORDER BY the count in descending order.

00:30:21
order. I'll alias this column as ranking and execute the query. For each type, the row with the highest count should get ranking 1, and for the second type the top row gets ranking 1 again; that is what RANK does. I can drop the outer ORDER BY for now, because the ordering is already done inside

00:30:59
this window function. Let me execute it again. Now, for movies, the rating '84 min' has ranking 1 with a count of just 1. We forgot DESC on the count inside OVER, and that's why the ranking is wrong. After adding DESC, TV-MA gets ranking 1 for movies, with 2,062 occurrences, and for TV shows the top rating is TV-MA as well.

00:31:34
So this is the rating given most often for both TV shows and movies. Now we can use a subquery to select just these two rows. I write SELECT type, and then the rating that was given most often. Where does it come from? From the query below, which I wrap in a subquery with the alias T1.

00:32:05
Now this query acts as a table called T1. From T1 I select the type and the rating, and I use the ranking column as a filter: WHERE ranking = 1. We cannot use this column directly in the WHERE clause of the inner query, because window functions are evaluated after WHERE; that's why we wrap the query in a subquery and filter in the outer query. Let me tidy up the query.
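Here is a runnable sketch of the same pattern, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer) and a handful of hypothetical rows. One layout choice differs from the video: the count is computed in an inner subquery before RANK is applied, a form both databases accept.

```python
import sqlite3

# In-memory table with hypothetical rows; column names follow the transcript.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflix (show_id TEXT, type TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("s1", "Movie", "TV-MA"), ("s2", "Movie", "TV-MA"), ("s3", "Movie", "PG"),
        ("s4", "TV Show", "TV-MA"), ("s5", "TV Show", "TV-MA"), ("s6", "TV Show", "TV-14"),
    ],
)

# Count per (type, rating), rank within each type by count, keep rank 1.
# Without DESC, the least frequent rating would get rank 1 instead.
query = """
SELECT type, rating
FROM (
    SELECT type, rating, cnt,
           RANK() OVER (PARTITION BY type ORDER BY cnt DESC) AS ranking
    FROM (
        SELECT type, rating, COUNT(*) AS cnt
        FROM netflix
        GROUP BY type, rating
    ) AS counts
) AS t1
WHERE ranking = 1
ORDER BY type
"""
top_ratings = conn.execute(query).fetchall()
print(top_ratings)  # [('Movie', 'TV-MA'), ('TV Show', 'TV-MA')]
```

Because RANK gives tied rows the same rank, a type whose top two ratings have equal counts would return both rows; ROW_NUMBER would keep exactly one.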

00:32:37
Now let's execute it, and we get each type with the rating it received most often: TV-MA for movies and TV-MA for TV shows. That was question number two. Question number three says: list all movies released in a specific year, here 2020. So we need to list all the movies released in 2020; that's the task. You can simply

00:33:06
go ahead and write SELECT * FROM netflix. Before solving any problem, I like to look at the data first and then decide which approach to apply, and you can do the same yourself. The question says: list all movies released in a specific year. It is very important to read the question several times. Until you

00:33:29
understand the question, keep looking at the table and rereading it; once you understand what you need to find, you are good to go. We need all the movies released in 2020, so I select everything, filter the data to 2020, and then filter it to movies only. That means I need two conditions in the WHERE clause. So I'm selecting

00:33:57
everything from netflix WHERE, and I use the type column to filter: type = 'Movie'. Make sure you spell it properly, because the comparison is case-sensitive. This gives us all the movies, 6,131, as we checked earlier. Now I need one more condition that keeps only 2020. For that we have a year column, release_year.

00:34:28
You can see it here: release_year. We filter this column to equal 2020, and since it's a number, you write 2020 without quotes. Both conditions must be true, because each row has to be a movie and also has to be released in 2020, so I combine them with AND. Now we have the details of all the movies released in 2020; they are all

00:35:00
movies, as you can see, and the release year is 2020. A simple WHERE clause with two conditions solves question number three.
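The case-sensitivity warning is easy to check. This sketch again uses Python's sqlite3 module in place of PostgreSQL, with made-up rows; with SQLite's default collation, = on text is also case-sensitive, so 'movie' matches nothing while 'Movie' does.

```python
import sqlite3

# Hypothetical rows; column names follow the transcript.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflix (title TEXT, type TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [("A", "Movie", 2020), ("B", "Movie", 2019), ("C", "TV Show", 2020), ("D", "Movie", 2020)],
)


def titles(type_value, year):
    # Both conditions must hold, hence AND; parameters avoid hand-built SQL strings.
    rows = conn.execute(
        "SELECT title FROM netflix WHERE type = ? AND release_year = ? ORDER BY title",
        (type_value, year),
    ).fetchall()
    return [title for (title,) in rows]


print(titles("Movie", 2020))  # ['A', 'D']
print(titles("movie", 2020))  # [] because the text comparison is case-sensitive
```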
Let's see question number four now. It says: find the top five countries with the most content on Netflix. Let's look at the table first, so I write select

00:35:28
everything from netflix. We need the top five countries with the most content, that is, the countries where the most titles were released. For that we can group by the country column and, for each country, count show_id to see how many titles belong to it. That's very simple. So I select the column called

00:36:00
country, apply COUNT to show_id (COUNT(*) would also work), alias it as total_content, and add GROUP BY 1 so that I see each country and its total content. Here you can see that for this combination of three countries the count is 1, for this one it's 13, and NULL collects the rows with no country. So

00:36:32
it almost solves the problem, but some rows hold a combination of several countries, and each combination is counted as a separate group. There may be more titles released only in Australia, besides this one released in Australia, the United Kingdom and Canada. So we first need to fix this: one column holds multiple country names, so we need to split them and then count, for each country,

00:37:00
what the total is. We can separate the countries at the comma. If a title was released in Australia, the United Kingdom and Canada, it should count once for each of these three countries. I'm going to use a function called STRING_TO_ARRAY to solve this. Let me first

00:37:27
show you the function. I'll select just the country column from netflix. Wherever a row lists more than one country, the names are separated by commas, so I can use the comma as the delimiter to split the column. For that,

00:38:00
you simply write STRING_TO_ARRAY. If you know Python, an array is similar to a list: a sequence of values separated by commas. STRING_TO_ARRAY takes two arguments: the column name and the delimiter, which goes inside quotes. Here our delimiter is a comma. I'll alias the result as new_country, so the query creates one array per row.
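One detail is worth checking before relying on the split. If the stored lists put a space after each comma (an assumption about the data, as in the hypothetical value below), splitting on ',' alone keeps that space, and ' Canada' would form a different group from 'Canada'. A short Python illustration:

```python
# A hypothetical country value; the real column may differ in spacing.
country = "Australia, United Kingdom, Canada"

# Splitting on "," alone, like STRING_TO_ARRAY(country, ','), keeps the space after each comma.
raw_parts = country.split(",")
print(raw_parts)  # ['Australia', ' United Kingdom', ' Canada']

# Trimming each piece makes ' Canada' and 'Canada' the same value.
clean_parts = [part.strip() for part in raw_parts]
print(clean_parts)  # ['Australia', 'United Kingdom', 'Canada']
```

In SQL, the corresponding fix would be to trim each element after unnesting, for example with TRIM.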

00:38:37
If you run this query, you will see the arrays. Each row still has a single array holding multiple values, shown in double quotes. So this still doesn't solve our problem: we still have 8,807 country records. Now I'm going to use one more function, UNNEST, which creates one row per array element, so each country

00:39:09
is added one after another. I wrap the expression in UNNEST and run it. Now we have 10,019 country records in total. They are not distinct; this is the total frequency. We use the same functions in the final query, and it works like this: first,

00:39:43
STRING_TO_ARRAY converts the country string into an array at each comma, and then UNNEST expands that array into rows. So instead of grouping directly by the country column, I group by this new_country expression. For each row, the query first splits the country based on the

00:40:12
comma; then I use COUNT and GROUP BY 1. Let me run it. Now we have 197 distinct countries, and you can verify that each country appears only once. Because I split the data into an array at the commas and unnested it into single records, the GROUP BY puts each country into one

00:40:41
group and counts the total number of titles released in that country. These string and array functions are very important, so practice them until you are familiar with them. Now we need the top five countries with the most content, so let's add ORDER BY 2 DESC, meaning the second

00:41:08
column in descending order, and LIMIT 5 to keep the top five countries with the most content. The United States ranks first and India second. These are the top five countries where the most titles were released.
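The whole country count can be sketched in Python with hypothetical rows: split each value at commas, trim the pieces, skip missing countries, count, and keep the five largest counts. It mirrors STRING_TO_ARRAY, UNNEST, GROUP BY 1, ORDER BY 2 DESC and LIMIT 5, but it is an illustration rather than the video's query.

```python
from collections import Counter

# Hypothetical country values; None stands for a missing (NULL) country.
countries = [
    "United States", "India", "United States, India", None,
    "Australia, United Kingdom, Canada", "India", "United Kingdom", "United States",
]

counter = Counter()
for value in countries:
    if value is None:
        # In PostgreSQL, UNNEST of a NULL array yields no rows, so NULL countries drop out.
        continue
    for name in value.split(","):    # STRING_TO_ARRAY(country, ',')
        counter[name.strip()] += 1   # one row per element (UNNEST), trimmed, then counted

top_five = counter.most_common(5)    # ORDER BY 2 DESC LIMIT 5
print(top_five)
```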
Now let's see question number five. It says: identify the longest movie. We can start with SELECT * FROM netflix, which shows us

00:41:41
all the details. Look at the duration column: we need the maximum duration, but this is a text column, so we should first convert it into a number and then use MAX on it. So I write SELECT * FROM netflix WHERE, with conditions on duration. But since we need the longest movie, I'm first going to

00:42:13
filter with type = 'Movie', because we only want movies. Second, I want the duration to be the longest one, so I write duration = and then a query that selects the maximum duration: SELECT MAX(duration) FROM netflix. This works as a subquery. If I execute it on its own, the maximum duration

00:42:50
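it returns is '99 min', so the outer query compares each duration with '99 min', and running the whole query gives us the movies with that duration. Be careful, though: duration is still text here, so MAX compares strings character by character, and '99 min' ranks above a value like '312 min'. The result is the alphabetically largest duration, not necessarily the longest movie; extracting the number first and casting it, the technique used in question eight, gives a true numeric comparison.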
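A tiny Python sketch with hypothetical durations shows the pitfall: comparing the text values picks '99 min' over '312 min', while comparing the number before the space picks the longer film.

```python
# Hypothetical movie durations stored as text, like the duration column.
durations = ["90 min", "312 min", "99 min", "45 min"]

# MAX on text compares character by character: '9' > '3', so "99 min" wins.
text_max = max(durations)

# Taking the number before the space (like SPLIT_PART(duration, ' ', 1)) and
# converting it gives a numeric comparison instead.
numeric_max = max(durations, key=lambda d: int(d.split(" ")[0]))

print(text_max)     # 99 min
print(numeric_max)  # 312 min
```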
Question number six says: find the content added in the last five years. We can use the date_added column for this. Let's go

00:43:23
ahead with SELECT *, since I need all the content, FROM netflix WHERE, using the date_added column. I need to select the last five years. date_added looks like a date, but its data type shows as character, so we first need to convert it into a date and then select the last five years. Alternatively, we could extract the year and compare that.

00:43:59
Either way works; I'm going to convert it into a proper date first. Then I use CURRENT_DATE - INTERVAL '5 years'. If you run this expression on its own, it returns the date exactly five years ago, which here falls in September 2019. So this expression gives the date five years back, and

00:44:37
here I say that the converted date must be greater than or equal to that date. This is how we select the content added to Netflix in the last five years. First, though, I have to convert date_added into an actual date. Let me show you how by adding an extra column to the output: you can see the new date_added column here.

00:45:08
Here I can simply use TO_DATE to convert this text into a date. I pass the column, date_added, and the format, which goes inside quotes. Looking at the data, it has the month name, then the day, then a comma, then the year. So the format is 'Month DD, YYYY', where Month means the month name in

00:45:41
text, DD is the day number, and YYYY is the year. This converts the text into an actual date. I missed a comma and a parenthesis, so let me fix those and run it again. Now the data type of the column created with TO_DATE is date, and it shows the actual date. I can compare this column with the date five

00:46:09
years before the current date. So instead of the raw date_added column, I use this TO_DATE expression in the WHERE clause. It converts the value into a date, and we keep the rows where the content was added on or after that date in 2019 we checked earlier. Let me tidy the query. Now we have all the content added in the last five years, and you can verify it;

00:46:39
you can see titles added in 2021, for example. That is how we solve question number six.
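The same filter as a Python sketch, for illustration only: the date strings are hypothetical, and "today" is fixed to an arbitrary date so the result is reproducible. The cutoff helper moves 29 February back to 28 February, which matches how PostgreSQL clamps the day when subtracting a year interval from a leap day.

```python
from datetime import date, datetime


def parse_date_added(text):
    # Same layout as TO_DATE(date_added, 'Month DD, YYYY'), e.g. "September 25, 2021".
    # strip() drops stray leading or trailing spaces before parsing.
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


def five_years_before(today):
    # Rough equivalent of CURRENT_DATE - INTERVAL '5 years';
    # 29 February falls back to 28 February when the target year is not a leap year.
    try:
        return today.replace(year=today.year - 5)
    except ValueError:
        return today.replace(year=today.year - 5, day=28)


def added_in_last_five_years(values, today):
    # Keep values whose parsed date is on or after the cutoff (>=, as in the query).
    cutoff = five_years_before(today)
    return [v for v in values if parse_date_added(v) >= cutoff]


# Hypothetical date_added strings and an arbitrary fixed "today".
sample = ["September 25, 2021", "August 4, 2017", " December 1, 2019"]
recent = added_in_last_five_years(sample, date(2024, 9, 6))
print(recent)  # ['September 25, 2021', ' December 1, 2019']
```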
Now let's see question number seven. It says: find all the movies and TV shows by the director Rajiv Chilaka. To solve it, let's first have a quick look at the dataset, so we write select

00:47:10
everything from netflix. We have a director column, so we're going to use that. Let's find it: this is country, this is cast, and this is the director column. In this column we need to find all the movies or TV shows directed by Rajiv Chilaka. To

00:47:45
solve that, you can simply use a WHERE clause: WHERE director = 'Rajiv Chilaka'. This checks for any title directed by him, and you can see there are 19 titles. This is partly correct, but we have seen that some director values contain several names joined together, because the title was directed by

00:48:25
multiple directors, and this query does not return those records. Let me show you in the director column: this row has two directors. If a title is directed by Rajiv and someone else, the equality check won't match it. For this, you can use the LIKE operator with a percent sign at the start and at the end, which selects all

00:48:55
the titles directed by this director, even when there are multiple directors, because we're now looking for a pattern in the director column: if this name exists, I want that record. The percent signs at the beginning and the end mean the name can appear at the start, in the middle or at the end of the list. Now we have 22 records, so let's see if we have got any

00:49:21
new records. This record was not selected earlier: before we had 19 records, and now we have 22. For example, the title Mighty Raju was directed by all of these directors. The equality condition did not select it; if I rerun it, we are back to 19 records, and that title is gone because it's

00:49:55
simply looking for this exact director name, and if any other director name is also there, the row is not selected. That's why we use LIKE, which looks for the pattern in the director column and selects the record if the name appears anywhere in it. However, LIKE is case-sensitive: if the name were written in lowercase, it would not be selected, because LIKE expects the name in proper case,

00:50:22
with a capital R for the first name and a capital C for the last name. To fix that, you can use ILIKE, which is case-insensitive: if there are records by this director with a lowercase r or a lowercase c, ILIKE still selects them. So it's better to use ILIKE here instead of plain LIKE.
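The difference between =, LIKE '%...%' and ILIKE can be approximated in Python with hypothetical director strings: equality, substring containment, and substring containment after lowercasing. The lowercase comparison only approximates ILIKE, and the sketch ignores LIKE's other wildcard, the underscore.

```python
# Hypothetical director values; the second title has two directors.
directors = [
    "Rajiv Chilaka",
    "Rajiv Chilaka, Another Director",
    "rajiv chilaka",
    None,  # a missing director: all three SQL conditions evaluate to NULL, so no match
    "Someone Else",
]
name = "Rajiv Chilaka"

# director = 'Rajiv Chilaka'       -> exact match only
equals = [d for d in directors if d == name]

# director LIKE '%Rajiv Chilaka%'  -> substring match, case-sensitive
like = [d for d in directors if d is not None and name in d]

# director ILIKE '%Rajiv Chilaka%' -> substring match ignoring case (approximated with lower())
ilike = [d for d in directors if d is not None and name.lower() in d.lower()]

print(len(equals), len(like), len(ilike))  # 1 2 3
```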

00:50:52
Next, question number eight says: list all TV shows with more than five seasons. Again we start with SELECT * FROM netflix. We need all the TV shows with more than five seasons, and the seasons are stored in the column called duration. So you might simply write WHERE duration > 5 seasons. The thing is that in this

00:51:30
duration column we also have minutes, and the values are text, so a greater-than or less-than comparison would not compare them as numbers. How do we select the shows with more than five seasons? There is a consistent pattern here. First, we filter the data to TV shows only, because we don't want movies: WHERE type = 'TV Show'. With the type set to TV Show, I

00:52:09
have only the TV shows; there are 2,676 of them. Now I use the duration column. It follows a pattern: the number first, then the word Season or Seasons. We need shows with more than five seasons, so we can split the number, such as 6, out of this column and compare it. How do we do that? I'm again going to show you how to use

00:52:38
a split function. Let me add a new column based on the duration column. Just like Text to Columns in Excel, there is a similar function here, called SPLIT_PART. Let me first show you how it works. I can just

00:53:07
write SPLIT_PART and pass my column name. It takes three arguments: the column name; the delimiter, which for us is a space; and the position of the part you want: 1, 2, 3 and so on. 1 means the part before the first space. I'll alias it as season. Let's run it: all the TV shows are selected, and now you

00:53:46
can see the season values 2, 1, 1 and so on in the last column; SPLIT_PART has taken the text before the space. Let me explain this with an example. Say we have some text: Apple, then banana, then Cherry. Suppose you only need the first word, meaning everything before the

00:54:19
first space, or only the second word, between the first and second spaces. Let's say I need everything before the first space. For that I use SPLIT_PART, pass my column name, then the delimiter, which is a space, and then which part I need: the first, second or third. I need the first, so I pass 1, and it's

00:54:50
going to return Apple, as you can see. This is the logic we're using: SPLIT_PART takes the first piece of the value. We do the same thing on duration, which gives us the number of seasons. Since we have already filtered the data to TV shows, we simply select all the TV shows where this season number is greater than five.
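A small Python imitation of SPLIT_PART, assuming positive positions only (newer PostgreSQL versions also accept negative positions, which this sketch does not handle), reproduces the Apple example and the season filter. The last line shows why the cast to a number matters: as text, '10' is not greater than '5'.

```python
def split_part(text, delimiter, n):
    # Sketch of SPLIT_PART(text, delimiter, n) for n >= 1: 1-based position,
    # and an empty string when n is past the last piece.
    if n < 1:
        raise ValueError("this sketch only supports positions >= 1")
    parts = text.split(delimiter)
    return parts[n - 1] if n <= len(parts) else ""


print(split_part("Apple banana Cherry", " ", 1))        # Apple
print(repr(split_part("Apple banana Cherry", " ", 4)))  # ''

# Hypothetical TV-show durations.
durations = ["2 Seasons", "1 Season", "6 Seasons", "10 Seasons"]

# Convert the first piece to a number before comparing, like ::numeric in the query.
more_than_five = [d for d in durations if int(split_part(d, " ", 1)) > 5]
print(more_than_five)  # ['6 Seasons', '10 Seasons']

# Compared as text, "10" is not greater than "5" ('1' sorts before '5').
print("10" > "5")  # False
```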

00:55:22
Let's go ahead and run it. I need SELECT * because I want all the records, only for TV shows with more than five seasons, so I combine both conditions with AND: type = 'TV Show' AND SPLIT_PART(duration, ' ', 1) > 5. I don't need the alias in the WHERE clause, so let me remove it and run it. SPLIT_PART returns a text

00:56:12
data type, so we need to convert the result into a number, using numeric, before comparing it with the greater-than operator. Now all the durations in the output are more than five seasons, which we can verify. So we used SPLIT_PART to extract the number, and because it was in text format, we converted the number into

00:56:41
numeric and used the greater-than sign. That is how we solve question number eight. If you're still watching the video, I really appreciate your time. Please subscribe to my YouTube channel if you have not subscribed yet, and thank you so much for all your love and support. Let's continue with question number nine. It says: count the number of content items in each genre. So for each genre we need to

00:57:04
find the number of content items; that's the question. Again we look at the table with SELECT * FROM netflix. Do we have a genre column? Yes: the genre column is listed_in. We are going to use this column to find each genre and its number of titles. Some titles belong to multiple genres, so first we need to split

00:57:38
out each genre, then group by genre and count how many titles fall into each one. For that we could split the values into separate columns, or we can use the same STRING_TO_ARRAY function and then UNNEST to turn the array into single rows, and then group by the result. That is what I am going to do. I select the column called listed_in

00:58:09
and then show_id, because these two columns are all I need to solve this problem. You can see I have 8,807 records. Now each genre should be split out from here. For that I can use either a split function or the array function STRING_TO_ARRAY. STRING_TO_ARRAY is really easy: you just pass the column name and the

00:58:41
delimiter, which we know is a comma. With SPLIT_PART, you would have to request each position separately: the first genre, the second genre, the third genre, but we do not know how many genres each title has. So STRING_TO_ARRAY is the better choice here. Let me run the query: for each listed_in value it has created an array. This is the first array, and inside

00:59:12
it you can see multiple values in double quotes. Now I'm going to use the UNNEST function, which gives each genre from the comma-separated list its own row. Let me wrap the column in UNNEST and run it. The total number of records has increased: earlier it was 8,807, now it is about 19,000. What is happening? It takes the first record,

00:59:45
the first title with its show_id, and its first genre; if there's a second genre, it creates another record with the same show_id. For s2, International TV Shows is the first genre, followed by two more, so UNNEST has created three rows for that show, one per genre. That means I can group by the genre and count the show IDs.
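The row explosion can be seen in a short Python sketch with hypothetical (show_id, listed_in) pairs: a title with three genres becomes three rows sharing one show_id, so the exploded row count is the sum of the genre counts. That is why the total grows from 8,807 rows to about 19,000.

```python
from collections import Counter

# Hypothetical (show_id, listed_in) pairs.
rows = [
    ("s1", "Documentaries"),
    ("s2", "International TV Shows, TV Dramas, TV Mysteries"),
    ("s3", "International TV Shows, TV Dramas"),
]

# STRING_TO_ARRAY + UNNEST: one output row per (show_id, genre) pair.
# strip() removes the space that follows each comma.
exploded = [
    (show_id, genre.strip())
    for show_id, listed_in in rows
    for genre in listed_in.split(",")
]
print(len(rows), "->", len(exploded))  # 3 -> 6

# GROUP BY genre with COUNT(show_id).
genre_counts = Counter(genre for _, genre in exploded)
print(genre_counts.most_common())
```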

01:00:14
So the problem is solved. I move this expression to the top, alias it as genre, apply COUNT to show_id, and add GROUP BY 1 to see the results. Now I have each genre and the number of titles in it, and I alias the count as total_content. That is how we

01:00:47
have solved question number nine. Question number 10 says: find each year and the average number of content released by India on Netflix, and return the top five years with the highest average release. So we need the content released in India. Again we look at the data first with SELECT * FROM netflix. In the data we have a column called

01:01:17
country, which we can filter. First we select the data for India: WHERE country = 'India'. This selects all the content released in India, 972 records in total. Now we want to select
01:01:56
each year. We could use release_year, or the date the content was added. For now I'm going to use date_added, because some titles were released in the past, maybe in the 1990s, and are only now being added to Netflix. So I'll use date_added: how much content is added to Netflix on

01:02:26
a yearly basis, that is, on average how much content from India Netflix adds each year. So I select the date_added column, and since I want it as a year, I first need to convert it. This

01:03:01
is a text value right now, so I need to convert it into a date first. I write TO_DATE and define the format, which we already did earlier: the month name, then DD, then a comma, then YYYY, that is, 'Month DD, YYYY'. Now this is a date, and I'll leave everything else as it is for now.
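A minimal PostgreSQL sketch of this conversion step, assuming the project's `netflix` table keeps `date_added` as text in a form such as 'September 25, 2021'. The alias `added_date` is only illustrative.

```sql
-- Convert the text column date_added into a real DATE value.
-- 'Month' matches the full month name, DD the day, YYYY the four-digit year.
SELECT
    date_added,
    TO_DATE(date_added, 'Month DD, YYYY') AS added_date
FROM netflix
WHERE country = 'India';
```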
Next I need to extract the year from this date column; then I can GROUP BY each year and find the average. For that, I can simply go

01:03:38
ahead and wrap this column in EXTRACT(YEAR FROM ...) and alias it as year. Then I write GROUP BY 1 so that I can have COUNT(*) for each year. That means for each year I have the total number of titles that were added, so I get each year and the amount of content added in it. Let's go ahead and check this. I had a typo in the EXTRACT, so I correct it to

01:04:19
EXTRACT(YEAR FROM TO_DATE(date_added, 'Month DD, YYYY')), and now you can see the years available for India and the number of titles Netflix added in each of them. Now we need the total content so that we can find the average for each year. The total for India is 972, and for each year we want to compare that year's content against it. For example, for 2018 we have 333, so this is going to be the

01:04:48
average for this year, and for every other year the average is going to be different. To solve this, we're going to use a subquery. This COUNT(*) gives me the yearly amount of content; now I need the total content released in India. For that I can simply use a subquery, and inside it I write SELECT COUNT(*) FROM netflix WHERE country = 'India'.

01:05:25
This returns the total content released in India, which you can see is 972, while the other expression returns the yearly content we have already seen. Now I just need to divide the yearly content by this total and multiply by 100, and that is how I get the average. I'll alias it as average content per year.
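For reference, a sketch of where the question 10 query ends up once the numeric casts and ROUND from the next steps are in place. It assumes the same `netflix` table and date format; the aliases `year`, `yearly_content` and `avg_content_per_year` are illustrative, and the ORDER BY (highest share first) is an addition of this sketch.

```sql
-- Share of India's Netflix titles added in each year, as a percentage.
SELECT
    EXTRACT(YEAR FROM TO_DATE(date_added, 'Month DD, YYYY')) AS year,
    COUNT(*) AS yearly_content,
    ROUND(
        -- Cast both counts to numeric; bigint / bigint would be integer
        -- division, which truncates the fraction to 0.
        COUNT(*)::numeric
        / (SELECT COUNT(*) FROM netflix WHERE country = 'India')::numeric
        * 100,
        2
    ) AS avg_content_per_year
FROM netflix
WHERE country = 'India'
GROUP BY 1
ORDER BY avg_content_per_year DESC;
```

Titles with no `date_added` end up in a NULL year row; adding `AND date_added IS NOT NULL` to the WHERE clause drops it.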
Let's execute it now. The average content per year is 0 at the moment. The reason is that we are dividing one integer count by another, so we are

01:05:59
getting integer division, which throws away the fractional part. So I need to cast this count to numeric, and I also cast whatever comes back from the subquery to numeric; then the division works. Now you can see that 2018 had the highest amount of content from India, 333 titles, which is 34% of the total, and this year has 203, so roughly 20%. Now we can simply use the ROUND function to round the result, so you can write

01:06:32
ROUND( ... , 2), with the expression on the next line, to round to two decimal places, and I'll alias the count as yearly content. Now we have a clean report: we take the total content and divide the yearly content by it, using a subquery, as you can see here. So that was question number 10. Now let's see question number 11: list all the movies that are documentaries. Let's look at the data again with SELECT * FROM netflix. So now we

01:07:11
will look at the data first. We have a genre column, listed_in, and one of its genres is Documentaries, so we need to find all the titles that fall into the Documentaries genre. For that I can simply write SELECT * FROM netflix WHERE listed_in, and use the LIKE operator here as well, with what I need to find: documentaries. Now, to make sure it matches whether it

01:07:50
is starting with a capital or a lowercase letter, I can use ILIKE. I had also missed the WHERE condition, so now I'm saying WHERE listed_in ILIKE '%documentaries%', and we get 869 records. If I switch from ILIKE back to LIKE, let's see how many records we get: none. That's because the actual spelling starts with a capital D, while I'm typing a lowercase d; with ILIKE it still works completely fine. So you can see that we have 869

01:08:27
records in total. One more thing you can verify: even if a title has several genres, it is still selected as long as Documentaries is one of them. Let me look at listed_in: the second film here is listed in two genres, Documentaries and International Movies, and we still get it because we are using LIKE with percent signs on both sides.
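A sketch of the question 11 query as described, assuming the `netflix` table with its `listed_in` genre column. ILIKE is PostgreSQL's case-insensitive LIKE, and the `%` on both sides lets Documentaries match anywhere in a multi-genre list.

```sql
-- Titles whose genre list contains "Documentaries", in any letter case.
SELECT *
FROM netflix
WHERE listed_in ILIKE '%documentaries%';
```

The question asks for movies; to leave out TV shows you could add `AND type = 'Movie'`, assuming the `type` column holds the Kaggle values 'Movie' and 'TV Show'.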
So that is how we get all the records: wherever it finds Documentaries, it is

01:08:59
going to return the record. So that was question number 11. Now let's see question number 12, which says find all the content without a director. It's very simple: we have a director column, so we can check whether it is NULL. We select everything, the same as before, and add WHERE director IS NULL, because all the content without a director means that for the director, we

01:09:36
need NULL. We can just run it, and these are all the titles where the director name is missing: 2,634 of them. You can verify that the director column is NULL for them. So that is how we solved question number 12.
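A sketch of the question 12 query, assuming missing directors were imported as NULL rather than as empty strings. Note that `director = NULL` would return nothing, because any comparison with NULL yields NULL, not true.

```sql
-- Titles with no director recorded.
-- IS NULL is the correct test; "= NULL" never evaluates to true.
SELECT *
FROM netflix
WHERE director IS NULL;
```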
Now let's see question number 13. It says find how many movies actor Salman Khan appeared in over the last 10 years. We can solve this one using a simple

01:10:07
WHERE condition. I'll write SELECT * FROM netflix. We have a cast column, and on it I'm going to use a simple WHERE condition to see how many films have Salman Khan in the cast. For that I write WHERE cast ILIKE, with percent signs around Salman Khan, so that even if some titles have Salman Khan written in

01:10:43
lowercase or uppercase, ILIKE deals with it. Now let's see how many films we have. We don't get any, so let's check: I missed a percent sign here. Now you can see we have 20 titles with Salman Khan. Next, we only need the data for the last 10 years, that is, content released in the last 10 years. For that we can filter by date_added, or we can filter by release_year, because in

01:11:23
case you want to check by release year, you need to filter by release_year. That's going to be really easy, because it holds integers. So I write AND release_year > the current year minus 10. For the current year you can use the CURRENT_DATE function with EXTRACT(YEAR FROM CURRENT_DATE), which returns the year of today's date, and then subtract 10. So CURRENT_DATE returns today's date, and we are

01:11:57
extracting the year and subtracting 10; that's how we get the year from 10 years ago. Then we compare it with release_year: release_year must be greater than that value. That is how we select the data only for the last 10 years.
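A sketch of the question 13 query under two assumptions: the cast list is stored in a column named `casts` (CAST is a reserved word in PostgreSQL, so a column literally called cast has to be written as "cast"), and the ten-year window is counted from the day the query runs, so the number of rows can change over time.

```sql
-- Salman Khan titles released within the last 10 years.
-- casts is an assumed column name; use "cast" (quoted) if that is your column.
SELECT *
FROM netflix
WHERE casts ILIKE '%Salman Khan%'
  AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10;
```

Replace `SELECT *` with `SELECT COUNT(*)` if you only need the number of titles.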
We get two records. Let's verify them: the first was released in 2015 and the second in 2019. So these are the two films we have for the actor Salman Khan, and you can see the film names here. All right,
so let's see

01:12:26
now question number 14 and question number 15. First, let's solve question number 14. It says find the top 10 actors who appeared in the highest number of movies produced in India. For this I'm going to select everything from netflix again. We need a simple GROUP BY together with a WHERE condition to find the top 10 actors who appeared in the most films produced in India. Now, if you look at the cast column, each film has multiple cast members,

01:13:03
right? So we first need to split out each cast member and duplicate each film per cast member, creating one row for every film and cast name pair. Let's say a film has 10 cast members: we need to create 10 records for that film, and the same for film two. Then we can GROUP BY the actor to find each actor and the total number of films they have done. To solve this we can again use the same function, STRING_TO_ARRAY. Just to show

01:13:31
you, I'm going to print show_id and the cast column, and I'm also going to use STRING_TO_ARRAY. It takes two arguments: the column name, which is cast, and the delimiter, which is a comma. Then we just need to wrap it in UNNEST. Let's go ahead and run it now.
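A sketch of this intermediate step, again assuming the cast column is named `casts`. Because the names are separated by a comma and a space, every name after the first keeps a leading space; wrapping the UNNEST expression in TRIM would remove it.

```sql
-- One output row per cast member: show_id and the full cast string repeat,
-- while actor holds a single name from the comma-separated list.
SELECT
    show_id,
    casts,
    UNNEST(STRING_TO_ARRAY(casts, ',')) AS actor
FROM netflix;
```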
Now you can see we have 64,000 records, because the cast lists have been split: it has taken the first cast member, then the second, then the third

01:14:15
cast member from the same film, and everything else stays the same. For show ID s2, it has extracted all the cast members and put them in a new column. Now we can simply GROUP BY this column and COUNT, and that is how we see the total number of movies each actor appeared in. So first, let me delete show_id and the cast column, and I'll alias the new column as actors. Now I

01:14:47
need each actor and the total count, as total_content like before. So let's go ahead and write GROUP BY 1, which is my first column, so that I have each actor and the number of movies they have appeared in. Now you can see we are getting 39,000 different actors. Each film has multiple actors, but that doesn't mean each actor has only one film, right? That's the

01:15:19
reason the count drops. Let's say there are only two films, made by 20 actors in total. Some actors may have worked in both films and some in only one, but we will have 20 unique records, one per actor. That is what's happening here. Now let's see what else we need to do: we need to find the top 10 actors who appeared in the highest number of movies produced in India.
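A sketch of the complete question 14 query, including the ORDER BY, India filter and LIMIT 10 that are added in the next steps. It assumes the `casts` column name and departs from the walkthrough in one way: it trims each split name, so that the same actor is not counted twice, once with and once without a leading space. The subquery alias `india_actors` is illustrative.

```sql
-- Top 10 actors by number of titles whose country list mentions India
-- (country can hold several comma-separated countries).
SELECT
    actor,
    COUNT(*) AS total_content
FROM (
    -- Split each cast list into one row per actor and trim the
    -- leading space left after each comma.
    SELECT TRIM(UNNEST(STRING_TO_ARRAY(casts, ','))) AS actor
    FROM netflix
    WHERE country ILIKE '%India%'
) AS india_actors
GROUP BY actor
ORDER BY total_content DESC
LIMIT 10;
```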
Now we can simply

01:15:48
go ahead and add ORDER BY 2, the second column, so that each actor is sorted by their total number of titles, in descending order. Let's run it again. You can see Anupam here with 39, and this is Rupa. Now, one thing: we need to filter the data for India only. Right now we are getting all the records, and we only want the data for India, so we can

01:16:25
write WHERE country ILIKE '%India%'. Now, can any of you tell me why I'm using ILIKE here and not country = 'India'? Write your answer in the comment box. Let's run it: now we have 4,000 records. We just need the top 10, so I'm going to add

01:17:00
LIMIT 10 here. Now we have the top 10 actors and the number of films each of them has done, as you can see here. All right, that is how we solved question number 14. Now let's see question number 15. It's a little more difficult: categorize the content based on the presence of the keywords 'kill' and 'violence' in the description field. Label the content containing these keywords as 'Bad' and all other content as 'Good', and count how many

01:17:34
items fall into each category. So in each film's description we need to find out whether it contains the keyword kill or violence. If it contains either one, we categorize that title as bad; otherwise we categorize it as good, whether it's a film or a TV show. Then we can count how many titles are bad and how many are good. So that is what

01:18:03
we will do. Let's first look at the data with SELECT * FROM netflix; running this query returns 8,807 rows. In the description column we need to look for kill or violence. If you just want to check, you can add a WHERE condition with ILIKE and look for kill. Let's see whether any films have

01:18:38
a description containing kill. I need to name the column, description, so: WHERE description ILIKE '%kill%'. We're looking for kill in the description, and we get 304 records. Let's verify: 'to protect his family from powerful drug...' and '...is killed'. You can see this is what it has returned.
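A sketch of this keyword check, assuming the `netflix` table with `title` and `description` columns. As the rows examined next show, `'%kill%'` also matches words such as "skill". The second query is an optional, stricter variant that the video does not use: PostgreSQL's `~*` operator performs a case-insensitive regular-expression match, and `\m` matches only at the start of a word (this assumes the default `standard_conforming_strings = on`, so the backslash reaches the regex engine as written).

```sql
-- Descriptions that mention "kill" anywhere, in any letter case.
-- '%kill%' also matches words that merely contain it, such as "skill".
SELECT title, description
FROM netflix
WHERE description ILIKE '%kill%';

-- Optional stricter variant (not used in the video):
-- ~* is a case-insensitive regex match and \m matches at the start of a word,
-- so "kill", "killed" and "killer" match, but "skill" does not.
SELECT title, description
FROM netflix
WHERE description ~* '\mkill';
```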
Let's look at this one: it contains 'skill', which is why this record is selected, even though it is not really a correct match. But if

01:19:11
you look here, we again have 'killed', and here we have 'killer'; these are similar words. The 'skill' row is returned because we used the percent signs, which match kill anywhere in the text. Without the percent signs it would no longer match 'skill', but it would also miss every description that contains kill inside a sentence, and those are exactly what we need to find. So that's one thing, and this is the

01:19:40
first condition. Second, I want to check one more condition, which is violence: besides kill, I look for violence as well. Let's run it: if a film's description has violence or kill, I want to select that film. Let's check: this one has kill, this one has violence, this one has skill, this one has skill too, this one has violence, and you can see this one has

01:20:16
violence. These are the rows we want to categorize. So I'm going to use the same approach in a CASE statement to create a new column. I write CASE WHEN, and what I want to look for is description ILIKE kill. Why am I using ILIKE and not LIKE? Let me know in the comment box. So I'm saying WHEN description ILIKE, and in quotes, a

01:20:48
percent sign, kill, and another percent sign, OR description ILIKE '%violence%', THEN I want to call it bad content. ELSE I want to call it good content. Every CASE needs to close with END, and you can give the column a name: category. That's it. The logic is simple: if the description contains kill or violence, the title is categorized as bad content; otherwise it is categorized as good content.
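A sketch of the CASE expression just described, assuming the `netflix` table and its `description` column. A title whose description is NULL falls through to ELSE, because ILIKE on NULL yields NULL rather than true.

```sql
-- Label every title by keywords found in its description.
SELECT
    *,
    CASE
        WHEN description ILIKE '%kill%'
          OR description ILIKE '%violence%' THEN 'Bad Content'
        ELSE 'Good Content'
    END AS category
FROM netflix;
```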

01:21:41
So let's run this query; it will create a new column. I missed a comma here, so let's run it again. It creates a new column at the end, as you can see. This film is good, the second one is good, and this one is bad. Let's verify it: here we can already see 'violent', which is similar. Let's check further: we also have

01:22:13
'skilled' and 'violent', so this is a bad film. Now we want to GROUP BY this category so that we can count each one. To do that I can use a CTE here: WITH new_table AS. The CTE syntax is very simple: you start with WITH, give it a name (any name you like), write AS, and put your whole query inside the parentheses. What happens is that SQL runs this query first and creates a table called

01:22:45
new_table. The CTE works like a temporary table that only exists for this query, so I can select whatever I need from it. I need the category column, because I want each category and its count as total_content, and I'm getting all of this FROM the CTE called new_table. Then I just GROUP BY 1, the category, so that I get the total count for each category, good or bad. Let's run it now.
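A sketch of the full question 15 query with the CTE, using the `new_table` name from the walkthrough; the alias `total_content` is illustrative.

```sql
-- Count how many titles fall into each keyword-based category.
WITH new_table AS (
    SELECT
        *,
        CASE
            WHEN description ILIKE '%kill%'
              OR description ILIKE '%violence%' THEN 'Bad Content'
            ELSE 'Good Content'
        END AS category
    FROM netflix
)
SELECT
    category,
    COUNT(*) AS total_content
FROM new_table
GROUP BY 1;
```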

01:23:21
Now you can see we have 8,465 good titles and 342 bad titles. Based on the keywords, we created these groups, grouped by category, and counted the total content. This is how we have solved all 15 problems. Based on your own analysis you can solve some more problems and include them in your GitHub repository as well. Now let's see how to publish this project. First I'm going to save the queries as a solutions file on the

01:23:51
desktop: I'll name it solution and save it on the desktop. Now let's go to GitHub and create a new repository, for which we will also need a README file. To get to GitHub, just search for GitHub on Google. If you haven't signed up, click this icon and sign up, for example with your Google account. Once you sign up, you can set up your profile: click the option called Profile,

01:24:24
and you can set up your profile. What I want to show you is that you go to Repositories, click New, and give it a name. I'll call it Netflix SQL project, a very simple name; I'll say project two because I already have project one. It's important to make sure it is public, because if it is private no one else can see it. Also add a README file, because we will use it to explain the project. Then click Create

01:24:52
repository. That creates the new repository, which you can see here. Now we want to upload all the files and folders related to this project, so click Upload files and upload everything: I'm uploading the solutions file and the dataset. Once everything is uploaded, you also need the README file. You can write the README yourself,

01:25:21
where you describe every detail of the project. For example, you can write 'Netflix Movies and TV Shows Data Analysis using SQL' with a single # in front; one # means a top-level heading. Save it and you'll see it rendered as a heading. Now let's say I want to add an image. For that I write the Markdown image syntax: an exclamation mark, square brackets, and then parentheses, and inside

01:26:00
the parentheses I put the image link. In the square brackets I'll just use a name, Netflix logo. To get the image link, I first need to upload the image to this repository, or we can use an image hosted anywhere else. So I click Upload and pick the image from my Netflix folder; I think I have it in Downloads. I'm including this image and everything else in the project resources, so you just need to upload it. Then I just need its link.

01:26:29
So let me copy the logo's link; you can see it's copied. I'll go back to the README, this one, and click Edit. Inside the parentheses I add the link, click Save, and you can see the project README is ready with the logo. Next I need the objective, so I use a double hash, ## Objective, and save it. You can see the Objective heading. This is how you can add

01:27:07
all the content here. Now I'm going to use the README file I already have: you can just copy and paste it, then add your questions and write your solutions. Let me paste this README and click Commit, and the project is ready. You can see the project has a nice heading; you can change the name if you want something else, and rewrite the Overview, Objective, and Dataset sections. Here you can see the schema, and this dataset link

01:27:39
you can change if you want. It takes you to kaggle.com, where we got the dataset, as you can see here. Now you can see the questions we have solved. You can edit any of them by clicking Edit, and make sure that whatever SQL queries you paste go inside a SQL code block: it starts with backticks followed by sql and ends with backticks, and between them you put your

01:28:11
queries. There should be three backticks, followed by the word sql. A double hash is a second-level heading and a triple hash is a third-level heading. This is how you can add all the questions. At the end I have added some links about the author, which you can remove yourself; you can add your email address, your portfolio link, and any other information

01:28:39
that you want to showcase. That's it for this video. Thank you so much for watching until the end. You can get all the project resources and the dataset from the video description, and if you run into any issues while completing this project, let me know in the comment box. Have a good day, see you tomorrow, take care, bye-bye.
