0% found this document useful (0 votes)
4 views

SQL

This document outlines a comprehensive SQL course designed for data analytics, covering everything from basic concepts to advanced techniques. The course includes practical exercises, a capstone project analyzing data science job postings, and resources for learners, all offered for free. It emphasizes the importance of SQL in the data science industry and provides insights into relational and non-relational databases.

Uploaded by

marklevi.zoom
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

SQL

This document outlines a comprehensive SQL course designed for data analytics, covering everything from basic concepts to advanced techniques. The course includes practical exercises, a capstone project analyzing data science job postings, and resources for learners, all offered for free. It emphasizes the importance of SQL in the data science industry and provides insights into relational and non-relational databases.

Uploaded by

marklevi.zoom
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 127

00:00

data nerds welcome to this full course tutorial on how I use SQL for data analytics this
is the course I wish I would have had when I first started learning this tool as it's a super easy
to learn skill when taught properly and I feel like you can strongly master the basics by the
end of this video now this tool is one of the most highly sought after skills in the data science
Industry don't believe me let's check out the data for skills required in data analyst jobs SQL
is the number one skill

00:25

in almost half of job postings for data Engineers it's still number one and it's more than
in half for data scientists it's the second most important skill right after python now SQL
stands for structured query language and it can be spoken either SQL or SQL if I needed to
perform analysis on my computer of a data set in a database sqls the programming language
I'll be using to extract out any insights and this skill is something that I've been working on
refining for years all the way from my first job as a data analyst in Corporate

00:55

America to even my last role with Mr Beast squl is everywhere so what will'll be
covering in this course well I've broken it up into three chapters we'll be starting at the very
Basics and then building way up to an advanced section building our very own Capstone
project for the basics chapter we're going to start at the very beginning including breaking
down important Concepts behind SQL databases from there we're going to jump head first
into actually practicing SQL queries focusing on the basics of

01:22

common keywords simple analysis and even more advanced topics like joints now
don't worry if you have no coding or analytical experience as this chapter is designed for you
now after we get the basics down we're going to then jump into Advanced Techniques we're
going to walk through creating and setting up your very own database locally on your own
computer we're then going to move into more complex analysis using things like CTE and
subqueries now you're easily going to forget these basic and advanced concepts unless you
actually

01:51

get into implementing what you learned and the best way to learn something is to
build something and that's what we'll be doing in the Final Chapter we're going to be
working to solve a real world problem problem analyzing the top skills and jobs in the data
science Industry we'll be using the data set powering my app data nerd. Tech that provides
insights on data science job postings and by the end of it you'll have your own custom
project that you'll be able to showcase on GitHub for the world to see now I'm a firm
believer in

02:18

open sourcing education so this course is completely free and all the resources that
you need for it are included so I'm going to be providing you locations to actually run your
SQL queries along with the data sets when you build your project but I'll be honest YouTube
isn't paying the bills like it used to so I have an option for those that want to actually support
the course while also rewarding you for doing this for those using the link below to
contribute I'm going to give you some extra perks that

02:47

will help you learn SQL even faster you're going to have access to multiple practice
problems within each section they break down the problems into easy medium and hard
and even provide the solutions for completing the problems which will reinforce all the skills
that we're learning in the video additionally I'm providing all the course notes for this video
they're going to provide even more details and resources for the sections along with
breaking down the queries in the videos a lot further and

03:12
then once you've completed the course I'm going to award you a certificate of
completion now one quick shout out before we jump into the first chapter on Basics
explaining what is SQL I want to give a shout out to Kelly Adams She is the brains behind the
majority of this content in this course she's a full-time data analyst that I've been following
on LinkedIn for years now and I was beyond ecstatic whenever she said she'd team up with
me to build this course I'll be linking Kelly's LinkedIn and also

03:37

portfolio below the like button so be sure to check that out all right with that let's
actually jump into the basics chapter all right welcome to this chapter on the basics and for
this we're going to be focusing on the very fundamentals in this section and also the next so
in this section we'll be breaking down what exactly is SQL and then after that I'm going to
give you access to all the different data sets so you can work the problems right alongside
me now what I'm showing here are the course notes and for those that

04:06

have contributed to the course you'll have access to go right alongside me and see all
the different notes as I'm walking you through this so with that let's actually dive into
understanding what is SQL and for this we need to understand two major Concepts behind
this the first concept is around where you'll be writing and executing your SQL queries and
the other main concept is around databases where our data is actually stored so let's dive
into this first a database is a collection of data and these things were made to hold

04:37

massives amount of data an Excel file Can Only Hold around a million rows of data but
a database can easily hold millions of Excel files so if you do the math that's a lot of rows of
data now about those databases there's two major types of databases relational and non-
relational relational databases include things like tables very structured data non-relational
databases are for unstructured data we're going to dive into each in a little bit but first let's
go back to where we're actually running and executing SQL now whenever
05:08

we write out SQL code in order to request data from a database this is called a query
and this SQL code can do a whole a heck lock more that just request data a common
acronym associated with what it can do is called crud and it stands for create read update
and delete with SQL keywords like create an insert into you can add new records then with
keyword like select you can retrieve specific data you need next with the keyword update
you can modify existing records and then finally one the scarest of alls if you need to

05:39

remove records you can use delete now these are all commands we're going to be
covering throughout the basic and advanced section so it's nothing you need to commit to
memory right now so now we understand the basics of querying and also a database but
where the heck are these databases even stored there's a couple major options you can
either run it locally on your computer or you can use a server now for this course we're
going to be running the database locally it's going to save a lot of costs and also I feel you
learn a lot by

06:06

running databases locally before you jump into something more complicated like a
server local computers are also used during development and testing so you may find
yourself from time to time actually downloading a database on your computer so you don't
mess with the real one so where a real world databas is typically located and you have two
major options one is a company server and that's something that's actually hosted internally
and you have that the company full control of This Server this option

06:33

is nicknamed on Prem now the other options like Cloud providers do this for you
companies like AWS Google cloud and aure are known for this and frankly for a price they'll
manage all the different headaches of managing your own server so this is conveniently
called serverless and that doesn't mean there are no servers it just means you don't have to
have the headaches so now that we understand the databases can be stored either on your
computer or on a server what are the different databases we can
07:00

store in these locations well there are two major types relational and non-relational
databases relational databases store data in rows and columns this is very much used in
transactional applications say I had a database of job postings that companies go to daily in
order to transact with an update of what jobs are available you could have one table to keep
track of all the different jobs and then you could have an Associated table to keep track of all
the different different companies and their information there would then be

07:33

things like IDs within the two in order for you to relate the different tables hence
relational databases this course will be primarily focusing on using SQL to interact with
relational databases but the other type of databases are non-relational databases commonly
known as nosql but nosql doesn't stand for not SQL it stands for not only SQL these type of
databases support a host of different options including unstructured data so not only can it
include more structured data like column and row

08:06

based data but also you can have other types like key value payers graph-based and
even document based these are using your everyday applications to track your connections
on LinkedIn or to manage your docs in Google doc so what are some popular database
options well if we navigate to the stack Overflow developer survey from 2023 over 76,000
respondents voted on what is their database of choice if we look at the top five options four
of these are relational databases and only one is a non-relational database and for this

08:39

we're going to be focusing on the number one and three option the first one is
pronounced postgress you don't have to pronounce the SQL at the end the other one per of
the Creator is called SQ light but I'm pretty lazy so I'm just going to call it SQL light so we
covered everything we needed to know for databases but where the heck are we going to be
actually writing and running these queries well there's a few few popular options for this
and it's commonly done in an editor whether that's from a database provider a cloud

09:06

provider or even just a plain old code editor let's talk about database provided editors
first for popular options like postgress and MySQL they'll actually provide you an application
to use on your computer to access their databases inside of you'll be able to track status of
your different tables and actually look at the progress of it and you can even do something
as simple as writing SQL queries now if you're using something like a cloud provider like
Google cloud or Azure they're also going to give you the option inside of a

09:35

web browser in order to interact with your database I keep a lot of my data on Google
cloud and it allows you to monitor all the different tables that you have along with an
interface to go in and query the data now the third option is the one that we're actually
going to be using but I want you to be aware of the other two and this is a code editor
during the second and third chapters of this course we're going to be focusing on using a
popular option VSS code which allows you to not only manage all your different SQL files
that

10:02

we'll be using for this but it will also allow us to run these queries for the beginner
section we're going to be using a code editor right inside of your browser this bad boy is
called sqlite viz and it's completely free to use this is a popular open- Source option that
allows you to run queries right inside your browser and in the next video we're going to talk
about more about what the data and databases we're using inside of things like sqlite viz
and vs code all right so now you should have a basic

10:32
understanding of how and where SQL queries are run and then what and how
databases are used in order to store data so with that let's actually dive into seeing where
you're going to be playing with all this data all right see you in the next one all right welcome
to the section on an intro to a course where we're going to be focusing on understanding
what data sets we're going to be using for this and how we're going to be running those data
sets but before we even talk about any of these data sets we need to understand what
problem

10:59

we're going to be solving for the entirety of this course you're going to be taking the
perspective of a job Seeker with the goal of identifying what are some of the highest paying
jobs along with what are the most optimal skills to learn so in order to solve this problem
you need a data set that has this type of data in it well luckily I've been collecting this type of
data in my app data nerd. te and inside of this app it Aggregates job postings in the data
science Industry in order for you to see things like what are the top

11:30

skills for data analyst additionally it also includes pay requirement for the different jobs
along with detailing how these skills are paid based on a certain job title so the data set that
we're using for this course is from that app and specifically it's for data science job postings
in 2023 here is an ER or entity relationship diagram about the different tables in this data set
we have four major tables one fact table con containing all our different job postings then
two Dimension tables that

12:04

contain key information about the skills required for those jobs and then finally a
dimension table around the different companies that are posting these jobs so what the
heck are these fact Dimension tables you probably never heard of fact tables contain the
core data for the analysis they measure and record actual events so in our case the different
job postings because of this there's typically a high volume of records and then there's
usually some sort of foreign key to associate it to a dimension table the dimension tables
12:33

describe attributes or dimensions of the data so in our case skills or company data
these are really important in supporting filtering and grouping different sets of data they
usually have fewer rows of data but generally are more descriptive anyway going back to the
ERD the job postings fact table has a job ID column in it and this is a unique ID to all the
different job postings this ID is inside the skills job deal table as a foreign key and we have
this in a separate table because there can be multiple skill IDs

13:06

associated with the job ID and conversely this is associated to the skills dim table which
actually has the list and the type of skills that are associated to the job back to that job
postings fact table we also have a company ID column and this column is associated to the
company dim table which in this table has key information about the companies such as its
name and its URL that you can access anyway that's all theoretical talk let's actually dive into
actually looking at the different data we're going to be

13:37

using for this and for this you can navigate to this URL right here and it's could provide
access via that tool SQL light viz in order to analyze and visualize our SQL queries we're going
to dive more into this tool in the next section but for now understand that I'm running a
query up at the top and the results are appearing below anyway this is the job postings f
back table so for this analysis that we're going to be doing I'm going to be analyzing it from
the perspective of a data analyst because I'm a data analyst and that's

14:07

what I'm interested in but you actually have different job titles available to use here's a
list of all the different options you can not only look at data analysts but also data scientists
or data Engineers if you wanted to you can explore senior roles or even roles that are slightly
related such as business analyst machine learning engineer software engineer and Cloud
engineer so feel free during this course to modify any of my queries in order to adapt it to
what job you care more about now the
14:35

data set includes postings from around the world I'm primarily going to be searching it
for remote jobs but you can adapt it to any location that you find fit and the last thing to app
about is it also includes a lot of companies from diverse perspectives not only tech
companies but also Commerce companies now there is one other data set used in this
course and we're only going to use it from time to time specifically we need it for our
arithmetic operations this this is a fictitious data set Based

15:02

on data science job invoices there's only one table inside of it and it's conveniently
named invoices fact we'll go over more of this data set in a future section so we covered
what data we're going to be using for this but what database are we going to be using to
actually host all this data well as mentioned in the last video we're going to be focusing on
postgress and SQL light this beginner chapter will all be based on SQL light it's a lightweight
file-based database and it's ideal for small to medium apps with zero

15:33

configuration it's so small in fact that it's actually running right inside your browser
whenever you access this tool SQL like Vis for the second and third chapters in this we'll be
using postgress it's an advanced open-source relational database suited for large
applications and it supports a lot more complex queries so because it's open source you'll be
able to download this database on your computer for free and use it now I do want to call
out there are some slight syntax differences between the two and syntax is the set of

16:05

rules and structures you have to follow while programming for the majority of course
commands like this are going to work in both these different databases it will only come to
very Nuance situations where these two databases are going to have differences in it but
really we're not going to find a lot now that we laid all this ground work let's actually jump
into doing all these different queries for those that purchase the course notes whenever you
navigate to a section it's not going to only showcase what databases you're

16:30

using for this but also we'll have an overview of what we're covering for each of the
Min sections within the chapter you'll have some basic notes on the topic along with the
query and the expected results when we get to the practice problems within the chapter it
will have similarly not only the data set but then also the questions and the solution all right
so with that enough me yapping let's dive into it see you in the next one all right welcome to
this basic section where we have this mini

16:56

section on the basics we're going to focus on a handful of keywords that I use on a
routine basis and I think you need to have committed to memory in order to know how to
use and so we'll be actually implementing it and running queries with these commands to
learn more about them first up it's actually instead of one keyword we're going to focus on
three and this is going to be saying select star from and a database table select is a keyword
that identifies a column or columns we want to connect to and from identifies the

17:26

table or tables we want to connect to let me show you what I mean so routinely I'll
need access to some sort of database for my job I'll have to reach out to something like a
database administrator and they'll give me access anyway select and from is how I'm going
to verify that I have access to the database so in this case let's say I got this email and it says
I have now access to this job 2023 and it has all these tables in it I want to go now verify that
I have access to it so let's verify we have access to this and you're going

17:53

to navigate over to the URL that I have linked right here and this is going to be our little
workspace that we're be working in in order to verify access now when you pop this open
you should have two separate Windows one an upper window where you're going to write
the query and then a lower window below this that is going to display all the different
results that you have we can see here that this query is already executed I can open up this
left side pane right here and actually look into the database

18:22

that we have access to in this which is jobs 2023 then all of the tables within that
database so these four tables and then you can further break it down or see what's in it by
expanding this and then this displays the columns within that table so I'm going to go ahead
and close that out cuz I don't want to really look at it and that first statement should already
appear on your window right here you can go ahead and run it again if you want by either
pressing run SQL query and it'll go through and fetch results

18:55

looks like there's 33,500 available you can also as a quick shortcut press controll enter
and it will also run the query now anytime I get access to the database this is one of the first
commands that I'm running in order to see all the different columns that I have access to
and to ensure actually that I have access to that table so in this case job posting is fact if I
wanted to I could also see another table in this data set I can put in company dim and press
control enter and see that hey yeah I also have access to this one

19:30

now these commands must be written in this order you would think you would want to
say from a table select these columns um because that would be the order but if you tried to
actually run this control enter it's going to give you this error syntax it has to be in that
specific order of Select and from and I'm going to be sprinkling also best practices in through
this so you'll notice the select and from in this case are all capital letters these keywords are
not case are not case sensitive so in this case I could have lowercase and

20:07
also uppercase mixed in with each other they're still going to work the values for things
like the table name are case sensitive depending on the database we're using SQL light in
this case so if I were to use a Capital C it's still going to work but whenever we start using
postgress in the basic or in the advaned section it is going to be case sensitive so it's best
practice to just leave this lowercase as it was intended in general these uppercase keywords
and then lowercase of things like the column

20:39

names or table names makes it easier to read and so that's what you want to do
especially whenever you have to get into troubleshooting later you want to be able to easily
read it and able to edit it now in this example we selected all columns but there's going to
be cases where you don't want to select all columns not argue in most cases you don't want
to select all columns it's very resource intensive for the server that's hosting this database to
provide all those columns so you'd want to actually fine-tune the specific columns

21:09

you want to use so in this case let's say I wanted only the two columns of job title short
and job location instead of using this asterisk which is used to select all the columns instead I
would just specify the columns so I'd say job title short and also job location I'm going to put
a comma between each one of those to say that hey we're moving on to the next one this
case when I press control enter executing the query we can see we have now these two
columns shown now moving in some best practice for this one I

21:46

like to have it to where whenever I'm selecting multiple columns I put it on separate
lines to make it just more readable now with specifying these column names from a table
they can actually be specified by saying the table name itself and then using this dot
operator to then showcase the column after it I can do this actually for both of the v's here
because they're both within that same table and whenever I run this query going to get the
same results now this is going to be more important later on
22:20

whenever we're combining multiple tables and you actually need to specify where this
column is coming from within a table but in this case because we're only using one table I'm
going to say it's not necessary and I'm going to go ahead and remove it but I just want you to
be aware of it for the time being and show you that it still works all right so we already
understand that quering all the different Columns of data set can be intensive for a Ser ex
actually for really big data sets another thing

22:49

that we want to do besides limiting The Columns is actually limiting the amount of
rows and this can be done via the limit statement we can specify any number of values after
this and this will specify the number of rows we want it to return right now these results are
returning around 33,000 job postings back so in this case let's say I only want to provide
back five I'd put limit five and that limit five needs to come after the select and from
statements and it's the very last statement you'll ever be writing inside

23:25

of a SQL statement I'll press contrl enter and now we have five rows retrieved we can
also see that for the five rows retrieved it only took 026 seconds running this without that
limit we can see that it takes around three times longer to get all those different rows now
we're working with a relatively small data set so you're like look this is only milliseconds yes
whenever we start getting into millions or even billions of rows this is going to add up and
limit is going to save you some time one note on best practices so I talked

23:59

about indenting these lines to make it more readable typical best practice is to have
anywhere between two and four spaces personally for me I just hit Tab and it automatically
inserts four spaces into here I could also do multiple different tabs and if I were to press
controll enter in this case it's still going to execute whenever your SQL statement is sent over
to the database itself so effectively all that indentation is removed and in this case like it's
one line although this is hard for a human to read the database itself
24:33

can obviously go through it and understand what it needs to be next up is the distinct
keyword and this one is going to follow select in order to select a distinct amount of values
within the rows of a column this is going to be a very resource intense type of calculation
because it has to go through all the values in a colum column and then Aggregate and find
what are the distinct values so let's take for example this job title short column right here I'm
going to go ahead and reset this to include all the different

25:07

values if I scroll down it I can see a lot of the values in here such as data scientist are
repetitive and then data engineer I've seen multiple times that analyst so let's say I want to
get all the unique values from this in this case I'll remove that other column and then specify
distinct and from here press control enter to run it now I also didn't have that limit
command remember so now we have 10 rows retrieved and these are all the different
values or unique values from this job title short

25:41

column now this isn't only limited to categorical data like in this case let's say I wanted
to get the unique values from the average salary column now I could do the same thing of
Select distinct salary year average column from this database pressing control enter and
then from here I now have all those different unique values within it looks like there's over
2700 unique values now you may notice I have two queries within here and you can see that
each one of these queries I've added a semicolon at

26:14

the end the semicolon sometimes you'll see me use it sometimes you won't it's used to
symbolize that this is the end of the sequel statement that you want to execute this case we
have two separate and distinct uh SQL statements uh no pun intended with that and in this
case with our editor we can only show one result of a SQL statement mainly that last one in
this window right here when we get to the Advance section and we start using a different
code editor you'll see how you can actually run all these different
26:47

queries and have them populate in different tabs anyway for the time being anytime
you're running queries I like to just keep it one in there and whether you have a semicolon
or not it's up to you I'm going to leave it semicolon free next is the wear statement and this
is used in cases when we want to filter out particular data we already talked about
specifically now you can do things like select columns or limit the amount now we can go
down even further and actually filter into what we need all right going

27:16

back to a query where we have the four Columns of interest that we're looking at from
our job postings fact table and we can see we have a lot of different values from different job
titles short so we can use the wear statement to filter down to something more I'm more
interested in such as data analyst the wear statement will be always directly after the from
statement and we want to specify a condition we want to specify that that the job title short
is equal to data analyst now we can go into

27:51

either other condition such as greater than less than all that we're going to keep it
really simple for now now the main thing to note right is that data analyst is in quotes in this
case because it's a string character and it's immediately following that column whether
there's spaces in here or not that doesn't really matter I just do this for readability pressing
control enter we can see that we've now filtered in on this data and it only has data analysts
so previously we had around 30,000 rows now we have less than 10,000

28:23

rows and we're not just limited to categorical or that text data we could also do it for
numerical data so in the case of that salary yearly average column we could say hey we want
to have everything that's greater than $90,000 and then now on scrolling through it we're
can see that we do we have around 16,000 values that meet this condition now now that
we're adding all of these more complex conditions this adds a great use case of now adding
comments comments are denoted by these two dashes or tax if you're from the

28:59

military before something that follows it so anything to the right of these two dashes is
ignored no matter where those two dashes are placed and this is the standard practice of
documenting your code and if you do something complex keeping track of it typically a
comment would come at the beginning of a query to specify what's going on in here and in
this case it's right at the front executing this query you can see that it ignored it you can
even put put it after a SQL command as you can see here it's

29:31

grayed out so that means that basically showing to you that it's going to be ignored
and so in this case when pressing control enter still executed and now I have a
documentation for why I maybe filtered out this certain subset of data the other common
use case of this is if you're debugging or troubleshooting say I didn't want to care about this
column or this column right here I can then put those two dashes in front of it whenever I
run the query it only Returns the columns I don't have have the dashes in front of

29:59

it now the other comment besides a single line comment it's a multi-line comment and
it's denoted by a forward SL Asis and then an asteris forward slash that forward or backs
slash just forward slash we're good anyway this is a multi-line comment and it's commonly
used whenever you have to use well multiple lines but when you have to be more robust in
your description on what you're trying to convey let's say I wanted to convey to whoever's
going to be using this query why do I keep on using these four Columns of Interest

30:28

over and over again well I can leave a note section in here that says hey these are going
to be the most common you're going to see throughout these queries and here's the reason
why well they provide the most comprehensive view of the data and there's the common
factors that most people are looking at um we also don't want to call all the columns
because it will just take up too much processing anyway if we go and run this query we can
see that hey it is in fact shown and just to show the point if I

30:56

were to remove these op ators right here and try to actually try to run this query it
would give me an error near the note because it's going to try to execute this and things it's
SQL syntax so I just fix that up real quick all right and the last keyword that we're going to
cover before we get into some practice is order by as you guessed it this is used to specify a
column and order it by that value so let's go ahead and use this order buy is going to come
almost directly last the only thing that

31:27

would come after order buy is a limit keyword in this case let's say we want to actually
Aggregate and be able to see the salary from lowest to high pressing control enter it's going
to go ahead and execute now it's going to notice here that we have null values first and
that's because null means that there is nothing there for that value so it's even less than if
you will Zero but if we scrolled over some pages that I've done here we can see that now yes
when we have values inside of here it is

32:03

going from that low to high those values are slowly rising and in an ordered case now
this is lowest to highest which actually you if you were write ASC and go ahead and execute
this it's doing the same thing here it's doing it in ascending order but now we don't need to
necessarily write uh ASC um especially as it's sort of repetitive but if you do want it in
descending order from highest to lowest you would then specify DEC and executing this
query we're now getting it from the highest to lowest

32:40
this would be really good in this case we want to see what is the highest salary we can
now see it 650,000 now you may be curious what order should you be writing these
commands in well here's a little convenient little cheat sheet on what you should be
following for this follow it goes select from where Group by having order bu and limit now I
don't have these commands memorized I even asked chat gbt for pneumonic on how to
memorize this and it said super fast works great beneath the surface and has

33:11

outstanding balance and Leadership and I going to use that mainly we're going to go
with trial and error that's how I've memorized this over the years you're going to make
mistakes get out of order but you'll understand what the error message is and you'll adjust
from there all right now that we cover all of those Basics keywords now it's your time to go
in and actually practice this I would play around with all those different keywords that we
have also exploring all those different tables that you have

33:36

available right now now for those that purchase the course notes and certificates you
have some specific practice problems available to you um there's about five for this section
right now we'll be adding to this um and in it you'll have things like the solution and also
results to make sure that you're on track and actually following along and doing it correctly
all right good luck with these see you in the next one all right let's get into this section now
on comparison and also logical operators and you've been

34:09

exposed to this previously when we use that wear keyword before to filter for job titles
of data analyst we used an equal operator this is called a comparison operator we're going
to go over all the different types for this these type of operators are used after the wear or
even the having Glock clause in this section we're going to be focusing on the wear Clause
we'll move on to the having as we get more advanced now in addition to this also we'll be
using the logical operators which allows even more advanced functionality to
34:42

fine-tune how you want to filter data since we already understand the fundamentals of
the equal operator we're going to move on to the not equal operator and we can us use this
symbol of basically these two Pacman try need each other or just the keyword not and
specifies that it's not equal to All Right Moving in back into the data let's say we're get this
data right here and we want to filter it further let's say I have some Insider information that
says those jobs that we get from the job platform AI Tech jobs.net is unreliable

35:16

and I don't want to use it that's not necessarily true we're just giving it an example
here well I conclude that wear keyword along with job via and that comparison operator of
not equal to providing it what statement I want I meet on and then press control enter I can
now see that they are removed now we can also use that not operator and that's used
directly after the where keyword where not job via uh AI jobs.net this is sort of confusing but
this in this case is going to rep return all the job Vias of this so in this case if we

35:50

wanted to not use it I'd put that equal comparison operator in there run it and then we
can see that it now has this like like this so not is a way to do basically opposite of what we
want to do now besides something like the equal or not equal to operator for those that
have numerical columns we can use things like greater than or less than to actually look
inside of there and find a value and find things that meet that condition so in this case I'm
going to filter for salaries that are going to be

36:20

only greater than 50,000 it's also going to remove these null values whenever I do this
pressing control enter and see that it is in Factor move in case like this I would include
something like an order buy for that salary year average and I did this out of order uh
because I don't have these memorized and putting this in after the wear statement and
actually running it it runs correctly and now we can see okay now they are ordered in this
order and it's starting at that 50,000 in this case now you're
36:50

not just limited to greater than or equal to also you have things like less than or equal
to I'm assume you have familiar RT with how this actually works so we're going to skip an
example on that one and we're going to move into logical operators now starting with the
first logical operator and this is great in conditions where we want to meet multiple filter
conditions so the example shown if I want to meet a certain salary and job title I can now
use the and operator to combine this in this case it's only going to show

37:18

conditions where both of those conditions met are equal to true so it has to have a job
title of data analyst and it has to have a salary greater than 100,000 so let's actually test this
query out by plugging that in for that wear and specifying those two conditions I'm also
going to leave in that order bu after this so we can read it more easily pressing control enter
we now have it now remember it's greater than 100,000 not greater than or equal to so we
don't have any 100,000 values in this and

37:51

scrolling through we can see that like we expect it Returns the conditions that we're
trying to meet for this now conversely to that and logical operator we also have the or
logical operator and this would be used in a condition where we wanted to see if we meet
any of the conditions so as the example shows it could be either data analyst or the salary is
greater than 100,000 so we could have something like a business analyst that gets greater
than 100,000 or we could have something like a DAT analyst that technically has less than

38:24

100,000 replacing that and with an or and then still keep that order by so we can get
through this quickly I'm going to press controll enter and as we can see we have nothing but
data analyst positions for the null values and for values that are greater than $100,000 in
salary we have everything that's not necessarily a data analyst it could be a data analyst and
here's one at 225,000 so an or condition I'll be honest I don't use or as much as the and
logical operator now let's say I wanted to search for salaries between 100,000 and
39:01

200,000 technically I could use that and operator combined with those logical
operators to say hey I want something that's $100,000 or greater than $100,000 and less
than $200,000 but there's actually a better way of doing this and that's using the between
logical operator it's still going to be used within that wear statement in this case it's much
more readable and much more concise we can say for a given column we want between
60,000 and 990,000 in this case and this is not just limited to numerical data

39:35

you can also technically use this for Text data or even dates so I've updated that
previous query now to have that between statement specifying the values and using that
and statement to join it I'm still doing that order bu to make it easy to look at pressing
control enter bam we have everything between including that of which is 100,000 and
200,000 now the last logical oper to cover is in and this I find is more common in Text data
so the example shown job location if I wanted to search maybe Boston

40:08

Massachusetts and also anywhere I could do this with an instatement and then I would
enclose all those values whether it's two or even more so let's say in our case we wanted to
look for both data analyst jobs and also data engineer jobs written currently it's a little bit
too robust in this too much wording as we're repetitive with using job title short and then
combining it with that or statement it does however if we went ahead and execute it we can
see that yep data engineers and data analysts are

40:39

only in this but instead I'm going to modify this now to include this in logical operator
and then have data analyst and data Engineers within parentheses separated by a comma
and this is also great because now let's say I wanted to add something else like a data
scientist I can just add that in I don't have to type in the keyword again pressing control
enter you can see now we have data Engineers data analysts and data scientist all right now
it's your turn to try out these comparison and logical operators for those that have

41:10

purchased the course notes and certificates you have a host of different practice
problems you can work through and test your skills for that all right see you in the next one
all right let's actually combine everything we've learned from that basic section and the
comparison logical operator section into a more advanced query we're going to do this with
a practice problem that I feel would be applicable if I was actually job searching so let's say
I'm looking for roles I work as a data analyst and I

41:41

could technically also work as a business analyst so I want to look for both of these
rules and for this though I have some conditions for data analyst I only want to search for
jobs that are greater than 100,000 and I know from some market research business analyst
pays even less so I want to look for those jobs that are greater than 70,000 now in addition
to this I also only want to include locations in we'll say that I'm located in Boston
Massachusetts or I want any remote works I'll also include

42:10

any of those that include the location of anywhere which are remote jobs for this we're
going to continue to include those four main columns that we previously did so starting out
with our core query we have our select statement of the four Columns of Interest we're
want to focus on and then our from statement whenever I'm breaking a these queries down
I like to actually iterate through it so the first thing I'm going to look for and actually query
down on is the location CU that seemed like the

42:36

easiest I want to get Boston and I want to get anywhere so I add this wear statement
for this job location and executing the query control enter we can see now okay we're
limited only anywhere in Boston all right the next easiest thing to move on to is instead of
just getting both that analyst and business analyst let's actually go down even further I want
to get data analysts that are greater than 100,000 so I also have to besides meeting the job
locations I have to meet this new condition now I'm

43:05

going to be putting both the job title and then also the salary condition within this so I
can use parentheses to say hey I want you to meet all of this and verify this before moving
on to verifying the and statement and what do I mean by that well you remember by our
order of operations uh from math the parentheses are going to go first so we're going to
meet the condition first of is a de analyst and is it greater than $100,000 are both these
commission conditions met it's true and then in the

43:40

next case it's going to check the job locations whether it's in Boston Massachusetts or
anywhere so the parentheses help control this order now remember we also want to meet
this condition of business analysts and salary year average greater than 8 $80,000 so I can
remove this comment here now I could put an or statement after this so we me this data
analyst or this one but now we have this statement we're going to meet and job location and
this statement and then it's going to be an or for this one so if I were to

44:15

execute this because of this or statement now I have this business analyst right here
and it's in D Texas and it doesn't meet our condition because I actually wanted it to be either
in Boston or anywhere although it does meet our salary condition anyway I'm going to fix
this by including this or statement now between the data analyst and the business analyst in
its own parentheses to further specify hey I want you meet these two end conditions or
these two end conditions then from there once that's met then verify the

44:47
job location let's check this out and now we have everything we want looks like I have
a typo here I did 880,000 earlier I meant to say 70,000 from our instructions we had 61 oh
now we have 64 jobs available anyway this me you the condition of those data analyst or
even business analysts anywhere with our salary conditions the key thing to understand
from this is you have to iterate through this query I wouldn't never expect you to just write
this out all in one shot and then try and test it you're going to run into so many errors

45:21

iteration is the key to success for this all right in this section we're going to be covering
wild cards and wild cards are used to substitute one or more characters within a string we
can use this by one using this keyword of like along with some special operators like the
percentage sign and underscore oh and all this is used within the wear Clause let's jump into
some examples and the first wild card to cover is the percent sign which in this case shown
here we're searching for analyst and this percent sign symbolizes zero one or

46:01

more characters so let's say I add some white space or some words before analyst or
after it would account for that and only look for those words of analyst all right so previously
we've been focusing on a job title short column and this is just a shorten version when we
actually look at these job postings more in detail we get to it further we'll see that the actual
job title and a list in these is a lot longer and sometimes there's a lot of fluff and necessary
data in it but it could be a condition

46:30

for this where we want to actually filter down on something so in this case we have
this business data analyst and also data analyst let's say we actually wanted to filter down
for anything that says analyst within the job title column well adding this wear statement
and then specifying the column using that like keyword and then I can put that as
percentage signs before and after running this now every in this job title has analyst within it
now it's important to understand right I would want it
47:04

before and after let's say I removed it to the ending in that case it's not going to find
anything with values after the analyst so if I run this I'm only going to find analyst on the end
of all those statements and I want to add that back because I actually want to see all the
different analyst rules now another way this could be used is let's say in this case right here
where I have this it's classified as a data analyst but the job title is business data analyst I
could put something instead let's say I

47:34

wanted to search for business analyst roles I could put in business percent sign and
then oops that's not a percent sign and then analyst running this now I'll get things like that
will meet the business data analyst that may be categorized as dat analyst maybe
categorized as a business analyst but it picks it up now the percent sign symbolizes 01 or
more characters let's say I just wanted to represent a single space in between something in
that case I would use an underscore going back to in our example instead of having this

48:09

percent sign here I'll put an underscore in this case I only want to find those that
directly say business analyst so I'll expect to see this one in there whenever I run this query
pressing control enter yep 343 I can see that anything has business space analyst I meet this
along with anything on the left and right using the that percentage sign all right now it's your
turn to go ahead and try this like keyword along with testing out those different Wild Card
operators of the percent sign and underscore those that paid for the

48:39

course certificate and notes you have some practice problems you can work through all
those that don't feel free to just try out and see how this actually tests out for these
different wild cards all right see you in the next one all right this short section is on aliases
and and I can relate to this cuz sometimes I want to be a different person and sometimes
even column names want to be different names this can be very helpful especially when
working with other people to pass off data with these really convoluted column names you
49:10

can rename it before giving it to them so let's say I want to use this table this core table
that we've been querying from the get-go in a presentation and I don't want to have this as
you can see a lot of it starts with job which is sort of redundant and then it's just the
nameing conventions are just really weird well at least weird for somebody that's not
familiar with this database somebody that's not familiar with this database just wants the
quick information about what it is real quick so I can go through and add these as

49:38

statements after these column titles right here and then they're now rename this
running this query right now pressing control enter I can see all of them are renamed now as
or alas can also be used not only for column names but it can also be used for tables and
frequently you will see whenever people post abbreviations for tables they'll use a single
letter or multiple letters at the start of the table so in this case I'm renaming it JPC and then
if I wanted to although not required in this case I could put it at the front of

50:12

these different tables right here and when I execute this query still going to work now
we haven't gotten into joining multiple tables and whenever you do this is a very common
practice so I wanted to make sure that you're aware of it before we get to that so we're
going to be doing that now the one last thing to note on this is sometimes you'll read other
people's syntax especially when it comes to tables they won't necessarily put this as
keyword in there instead they're just going to have a space in

50:41

between the two and it's still whenever I go to execute this it's going to go ahead and
work you can also do this for these keywords in here but this I argue makes it hard to read
but you will frequently see in this case here you will see it done without that as key keyword
all right now it's your turn to try out aliases I have a practice problem for you to go and try
out feel free to experiment around not only with those column names but also those table
names all right see you the next one all right let's now combine what we
51:13

just learned with wild cards and also that alias in order to solve a problem I want to
continue to build on that last problem that we're looking for data analyst or business analyst
roles but also we're looking for those that are not senior additionally we'll make aliases for
the columns but they don't make sense now before I attack a SQL problem like this I like to
break it down further into what are the actual steps that I need to look for for this so what's
common among this is or what's not common among this is we want to look

51:44

for data or business in it so that's going to be using basically an or statement we do


want to look for the word analyst in there and then we don't want to include things like
senior so let's jump in the and try it out all right so here's the core query that we're going to
be working with I'm going to go ahead and start with Alias this first I just rename the
location as location and salary is salary so let's just start attacking each one of these filters
one at a time and once again we're going to be needing to use that

52:16

wear keyword the first one we're going to be looking at is we're going to look at only
job tells that include either data or business so I built this or statement now that captures
that of matching the data or business and anytime we want to iterate through this to make
sure that we're on the right track press control enter it looks like okay we got Data Business
data Data Business okay so it looks like we're on the right track now we want to include
anything that would be basically data analyst or business

52:46

analyst so we want to meet this or condition and then basically the analyst is an and
condition so what I'm going to do is actually wrap this in parentheses cuz this is is the or
state we want to meet and then from there I'm going to go in and put in the one about the
analyst all right so now we have the analyst one executing this one looks like now I have
anything that has business or data along with analyst in it all right so finally this last one
don't include any Jes with senior followed by the character so this would

53:23

be another and statement here and we're going to be still searching job title and for
this one we don't want to use that is like senior we want to match basically a not condition
and so for this we can put not right before like to say not like this um do we have any seniors
in here oh we got this one right here so let's see if this one's still here after we refresh this
query and Yep looks like it was taken away so we're meeting all those conditions all right all
right we got a query now let's jump into learning some

53:59

more about operations all right let's get now into operations or more specifically
arithmetic operations we'll be using operators for this like addition subtraction and even
things like modulus and for this we're going to be using the invoice database which is located
at this URL as a reminder this is a fictitious data set on data science invo voices throughout
the year of 2023 we're using it for this because this these columns that it has in it are really
good for doing arithmetic operations on

54:36

now we have a column specifically on hourly rate assigned to a specific data nerd in
this data set and let's say that our accounting department is actually trying to figure out
whether they should raise these rates or lower these rates so we need to provide them with
a data set with these updated rates well we can use something like subtraction ction or even
something like addition in order to increase that rate we can use this within the select
statement so this is the core query we're going to be working

55:07

with in this case and we want to see what would be the hourly rate if we say dropped
it by $5 or if we raised it by $5 so to make sure I'm doing this operation correct first I keep
that core one that we want to change in here still and I'm going to rename that as rate
original then from there all I'm doing is I have this hours rate and a minus5 and that's going
to go through in every single column subtract five and then I'm renaming this as the rate
drop so let's actually execute this so we can see now

55:41

that it's going through and it's executed this for a rate drop subtracting all these
different columns by five conversely I can add this last statement here for the rate hike of
adding $5 and we have that now of now this band or whatever we have this whole table
made we can send this all to accounting for them to Crunch their numbers now I
demonstrated this operation use inside of the select statement but we're not just limited to
that it can be used in a whole host of keywords including things like the wear

56:12

order by and group by that we've used previously and even more that we're going to
use in the future so let's actually look at a case of how we can use this even outside of that
select statement and for this we're going to be using the multiplication operator figuring out
basically a total pay using hour spent times an hourly rate so going back to that core query
that we have let's say that accounting comes back now and they say hey we want to only
have the projects that are going to cost a total

56:42

of $1,000 and more after the rate hike and this project total after the hike is going to
be equal to that rate hike times those hours spent we don't have hours spent right now so
I'm actually add add that here so we can see it pring control enter got now the hour spent
we're going to do it times the rate hike I'm also going to remove this rate drop because we're
not concerned with that right now okay so we want a filter for this so we're going to make a
wear statement and we want this where the

57:15
rate hike times that hour spent is going to be meet this condition of greater than 1,000
okay executing this query looks like it's going through usually I like to double check
something like this whenever I'm doing a calculation so I'm going to go ahead and add this
up here where I have this rate hike of hours rate plus 5 times hour spent as the project total
go ahead and execute this to actually see what it is and scrolling over can see that hey yep
everything here looks like it's greater than a

57:47

th000 now as a best practice I want to go ahead since it's defined here replace it down
here and then now executing it we still have the same thing and it's more concise and easier
to read all right next up is the modulus operator and this operator is used to return the
remainder available after a division specifically let's say we have a number of hours say
something like 10 if we wanted to find out how many hours past eight that they worked we
could take 10 and do the modulus of eight and this

58:23

would give us two that's showing that somebody worked worked 2 hours past this so
let's say that accounting comes back to us and they want to do some analysis mainly on
projects that take between one in 2 days they want to find all the projects that don't
necessarily exactly end within 8 hours or 16 hours so the modulus so the first thing I want to
do is actually visualize this and see what this is going to look like so pressing control enter I
have now this hour spent modulus 8 and assigned to this extra

58:54

hours scrolling over a little bit just to double check my work we can see that at
whenever it's 14 we would expect 8 goes into 14 once and then it has a remainder of six so
accting wants to analyze these where these conditions are true or where the modulus is not
necessarily zero so from here I can just put this wear statement into parenthesis at an end
and then from there Define when extra hours are basically not equal to zero or greater than
zero and we got it all right now it's your turn to give
59:29

it a try we have some practice problems for those that purchase course certificate and
notes you can go ahead and go and try out these different operations within the query all
right that see you in the next one let's now get into aggregation functions and after using
those arithmetic operators previously probably notice that yeah it's nice to be able to do this
arithmetic operation across a row but what happens if we want to do it down meaning on
we want to sum a row or even count up an entire row on how many

01:00:01

values it has in it well this is where aggregation functions come in we have things like
sum count average Min and Max now I commonly use these within things like the select
statement but they can also be used with keywords like group by and also having for group
by we can actually aggregate depending on what value is selected a certain subset in order
to sum it up or even average it or maybe we can also use things like the having function
which we've been talking about from the get-go and finally

01:00:30

getting introduced to which allows us to actually filter data by an aggregation for this
portion we're going to be using the job postings data set specifically we're going to focus a
lot of time on this salary column and doing a lot of aggregation methods to it and the first
aggregation method to talk about is sum and this is a function that allows us to inside the
parentheses next to sum place things like a column name and from there it will Aggregate
and actually sum up all those values so if I want to sum up

01:01:02

all those average yearly salaries I could do this and it also give it an alias of salary sum
pressing control enter you can see we have over $2 billion worth of salary and job postings
here next aggregation method is count and similarly to sum you're going to place either a
column value or in this case you can always do something like a specializ operator the
asterisk or Star to actually select all the different columns so in our case say we wanted to
see not only what is the sum of all the salaries but we also wanted to see what
01:01:38

was the count of all the different rows I could do a new line sending in count specifying
that star symbol and then put from there as count rows from here pressing control enter it's
going to aggregate the same now I have the salary on the left and then the count of all the
RADS on the right now I can also use count in conjunction with another keyword distinct in
order to filter down on distinct values let's say in our case we want to get a distinct count of
all the different job title short values

01:02:11

that we have in here say count then add distinct and then from there the actual job
title short and then I probably name it something appropriately like job type Turtle pressing
control enter we can see that we have 10 different job types now besides sum and count the
other major types of aggregation functions are average Min and Max and as you expect they
work all the same we're placing the column value that we want to average inside the
parenthesis and then for there it's going to provide either

01:02:42

average Min or Max so I could use that average column across that salary year average
column and from there we're going to get this pressing control enter sign an Al an alias of
salary average we can see that the average salary inside this column right now is around
123,000 now I can take this a step further adding we on to this and then from there
specifying the job title short column of data analyst and from here pressing control enter we
can see that oh unfortunately that analyst the average salary is little bit less than

01:03:17

the real average of uh only $93,000 and if I want to see the spread of the data analyst
salaries I could add in those Min and Max functions in order actually go through and do that
and there we see we have a Min of 25,000 and that Max of 650,000 now we saw how that
average salary went from 125,000 down to whenever we filtered it just for data analyst
down to around 90,000 so we want to dive deeper into this well this is a great way of using
the group by keyword and this allows us to specify a column of interest in this
01:03:51

case we're going to do the job title short in order to now filter down further and in our
case actually be able to see all those different job titles and see where the disparity is so I'm
going to get rid of this where statement in that case and I'm going to add Group by
specifying job title short column now I can run this but um it does group it but I don't know
what these groupings are so we need to add that into here so I add it in at the top as Jobs
pressing control enter we now have this in and we

01:04:25

can now see the aage average men and Max I'm going to throw in a quick order bu in
order to group this by the average salary and as expected with that average being around
125,000 we can see that machine learning Engineers are middle data analysts have some of
the lowest whereas senior data scientists and Senior Engineers have some of the highest
now another popular keyword besides Group by that need have inol belt is having and
having allows us to use an aggregation method to then filter by unfortunately this one my
complaints

01:04:57

were sequel where you should be able to do this inside of the wear keyword but it
doesn't allow it so let's say we want to only analyze this further but for job postings that
have basically a certain amount of values if I insert into here I want to find out how many
actual counts of these different jobs they have first so we're going to add that into here so
I've added this job count column in here to basically total up how many there are in there all
right so actually looking into this list further we see we have

01:05:24

quite a bit for some of these but then something like Cloud engineer we don't really
have a lot of values in it and it may be skewing the data so let's say we want to exclude
anything that has less than a job count of 100 well we can insert a having keyword and this
needs to go between group buy and Order buy and I can specify it in this case hey we want
to do this for the count of job ties shorts that are greater than 100 pressing control enter
that cloud engineer disappears is no longer one of

01:05:55

our problems now you may be like luk can't we just use a we using this job count cuz
it's no longer using this aggregation function well removing this having function right here
and then keeping in that we when I actually go to run this I get this error there's a misuse of
aggregate count you can't use it inside of wear it's still trying to do this aggregation inside
the wear that's what we have to use having all right now it's your turn to dive in and try
some of these practice problems we have for

01:06:21

aggregation functions for those that purchase the course certificates and notes a lot of
different ones to test out and try and go through all right that see you in the next one let's
get into a practice problem combining what we've learned previously with the aggregation
functions and also with those arithmetic operations specifically we're going to be going back
to that previous problem where I was talking about the accounting department asking for us
to provide numbers around what would happen if we

01:06:51

would increase the hourly rate by $5 and so for for this we're going to calculate not
only the total earnings for a project but then also a theoretical situation of what would
happen if we increased it by $5 so previously we were taking that hourly rate displaying it as
that rate original and then also doing the additional 5 to show that rake hike right now I also
have that project ID right there conveniently that we're going to be breaking it down further
by so let's first go at tackling that first

01:07:21
problem of calculate the total earnings per project which we're going to be using our
spent times hours rate here I'm going to first start by just adding in that hour spent because I
want to keep track of it to make sure I do the calculations right from there I'm going to use
the sum function in order to sum up the hour spent times that hourly rate I'm going to
rename this project original cost from here I'm going to iterate through this so I'm going to
add this up right now now there's a mistake

01:07:49

in my SQL query if you haven't caught it yet as you can see we should have multiple
different project IDs right we're trying to sum it based on that project idea that's at least
that's what we want so we need to put a group ey in there and now I'm specifying project ID
doing this bam I have this for all the different project IDs and doing some rough math that
you always should do whenever doing any of these queries to make sure it's doing correct
looks like it's doing that hour spent times rate

01:08:13

original to get that project original cost now I add in this for that projection of what it's
going to be if we were to increase that hourly rate by $5 and putting it within that sum
function and then labeling it as a project projected cost now with this we can actually see it
but there's a lot of data here I'm going to go ahead and actually cling this up to actually
make it more visible and control enter now we have all of the different comparisons right
here in here all right now it's your turn to give it a

01:08:50

try this short section is going to be going over null values and we previously
encountered n n values whenever we were looking at the salary column of our job posting
data set and we filter filtered ascending so from low to high those null values appeared first
a null is a field with no value and that differs from something like a value where it's zero
because that does have a value in it although it is zero or when we're using some sort of
string character maybe in it that it contains like a space that
01:09:24

still would not be a null value because something is actually there taking up bytes of
data we're going to be using this with the wear and having caused in order to filter out and
look at this type of data so going back to that previous example where we're actually pulling
those common columns that we have then ordering it by that salary we can see that said
null values are appearing first so we can use that we keyword in order to filter this so I can
specify that the salary yearly average is null pressing control enter yep it's

01:09:54

still null or I can change this to more specifically is not null all right told you this section
was short on null values now I do have a couple practice problems for you now to go in and
try it out all right see you next one all right we're coming almost to the end of this basic
section on SQL queries specifically we're going to be focusing on this one on joins now these
are the four most common types of joins and we're going to be diving into each one of these
separately talking about what their use cases are and an example of

01:10:30

each so if you recall back from the beginning of this video when we first introduce that
job posting data set well the majority of the time in this section we've been primarily focus
on quering this job postings fact table right here and there's actually other tables associated
with this in the database and right now you have access to all these different tables that we
just spoke about inside this job 2023 database which I go ahead here and I actually just
showcase each of the different ones quering into it one quick

01:11:03

note you may be wondering why in general would you even have these different tables
well in the case of that skills dimensional table we can see we used a skill ID to relate it and
we have a skills column and a type column we could technically have this all of the skills and
type inside the jobs fact table but this is going to be very repetitive also in the case that we
have the SK skills of job table or sorry skills job dim table it allows us to be able to Aggregate
and put in more than one skill so there's multiple reasons
01:11:40

why you would actually have tables external to another table so the first join and by far
the most popular join that I find myself using is a left join and what this is going to be doing
here is what it's trying to Showcase with A and B are two separate tables and in it for this it's
going to whenever we use this left join it will return all of the contents of table a and then
whatever we're matching A and B on it's only going to return the contents from B that it
matches a on so going back to quering

01:12:12

that job postings fact table I have on here the job ID job title short and then Company
ID so we have a lot of different job title shorts and I want to see what the company ID or
what is that company it is if I want over to that company dim table as we're quering it here I
could see what it is but it's over here I want to combine it with that fact table so the first
thing I'm going to do is add this keyword of left join and then specify the table company dim
have a little typo right here and we're going

01:12:43

to then give it an alias I don't have to type this dim every single time now in addition to
this we need to specify how we're going to be actually connecting these two tables we need
to use the company ID from this job posting facts table to the company ID of the company
dim table so putting that in there and then going ahead and executing it they're now
connected but I haven't really brought anything over so now I can add that column from
companies now remember this is why I previously talked about you can put the table name
in

01:13:18

front of a column and this is very important to do in order to make sure you're not
confusing different column names between the two because you can see from here
companies also has a company ID name but more importantly we want to get that company
name so I have it here included as name from here I'm going to go ahead and press control
enter to load this all in and so now we see we've now used this left join in order to join that
company name along with those job titles this company ID isn't really necessary at this point
nor

01:13:52

did you need to actually do it from the get-go I was just doing it for illustrative
purposes I'm going to go ahead and run this and actually remove it and then also going in
and go in to renaming those column titles okay pretty cool left joins now for those that
would like a more illustrative view of how this is being done that job postings fact is that
table a that we're joining we're keeping all rows in it so it's that yellow and then we have
that company's DM which is that table B in our case we have all the different

01:14:19

companies available here so whenever we put these two together we then have that
company name right available inside of a results table all right and as you guessed it if
there's a left join there's probably a right join and this is going to be somewhat completely
opposite we're going to be basically for the B table we're going to say hey we want to match
all the contents of this and keep all the contents of this but anything that matches from that
a table we want to join it now similarly we're

01:14:47

going to be connecting backwards we're going to be connecting the company's table to


the job posting fact table the problem is though is there's more records within the job
posting fact than there are in the company's table so only a subset of data is going to be
returned so going back to that illustrative example from before of merging those job posting
facts to the company table if we're using a right join in this case it's going to do a very similar
thing now the thing to note is the company's

01:15:16

dim table only has one occurrence of Netflix or meta Experian whatever it may be and
there's multiple in the job posting facts what's going to actually happen is when we get a
results table there's whenever we join this up we may have that company appear more than
once now so in this case I can go ahead and just replace that left one with a right press
control enter we'll see that we have right now 33585 rows whenever I do it again this one's a
little bit longer takes a little bit more longer time to

01:15:47

actually compile and we have it back with all those previous roles so you may be like
Luke what's the purpose of this right join then if it's basically the exact opposite thing of this
left join well let's say for example in this case if we had more companies listed in that
company dim table than we did in our fact table itself well in this case we could spot those
irregularities by allowing us to now do this right join and ensure we are actually providing all
the key details of that company table so

01:16:19

although not necessarily common all the time it does come up from time to time so
you should be aware of it all next up is inter jooin what this is going to do is when we have
an A and B table whenever we go to combine them it's only going to return the contents
that appear in both so if there's something in a that's not in b not going to appear in a and
not going to appear in B so let's say there's a scenario where I want to look at jobs but I only
want to look at jobs where a skill is available this is

01:16:49

where inner join is going to come in so this is a fictitious making of our data set but we
have the job postings fact table along with the job IDs and let's say that this skill job dim
table is connected to it and let's also say it's an order by that job ID there aren't any
necessarily skills associated with this job ID of three and four so we don't want to necessarily
see that now this combination of these two tables is a single inner join in itself but as we
talked about earlier there's actually

01:17:21
another table so we need another inner join in order to do the same thing of filtering
for only skills that are available in this case whenever we look at these skill IDs we can see
that 1 three and two are used 1 two and three which correlate to python R and SQL so we're
not going to return anything that has Scala or Java in this case so let's actually build this
query out working from that left to right the first thing we want to do is an injoin on that skill
jobs Dimension table which we're going

01:17:50

to be connecting on that job ID so I specify in join specifying the table table of skill jobs
demm I've shortened that table or changed the name to skills to job and then specify what I
want to meet it on I want to meet it on that job ID of both of those pressing control enter
making sure that it works still working and in now in this case we're seeing that in the case
of this job Ida machine learning engineer they have quite a bit and we're actually whenever
we get into job ID number one we skip

01:18:22

over it's no longer included because apparently it does have any skills associated with
it so we're already seeing good that we're meaning on what we need to do and if I want to
actually see all of those different values or those skill IDs that I see I can see okay that's why
the machine learning engineer has so many and then moving on to the business data analyst
but we're not done with this because we also want to make sure that we have everything
only including those skills inside of here that are relevant to the Javas

01:18:49

table so we want to also do an inner join with this table so I'm going to go ahead and
add in that inter join specifying the skills dim as a skills table and then specifying that hey I'm
matching the skill ID of these tables I'll also go ahead and add the skill now pressing control
enter to go ahead and execute it bam we have now the machine learning engineer with a lot
of different skills that it's requiring um business de analyst and so on so inner join along with
that left join are two of the most popular type typ of joins
01:19:25

that I'm going to be doing on a frequent basis depending on whether I need to one
return all the contents of a table or two for the inner join only return those where it meets
in both those tables the last join to talk about is a full outer join basically if you're going to
have two tables whenever we join these together it's going to combine them together no
matter whether there are matching ones on the a table or the B table frankly as a data
analyst I never have a need for this and don't really

01:19:54

have an ex example use case for his go through and I think it's sort of a waste of your
time so we're not going to really focus on this but mainly we're going to just have it in the
back of your mind to know that this is available all right now it's your turn to give it a try if
you've purchased the course notes and certificate you have some practice problems for you
to go in and jump in and try out with a inner join and a left join all right bet see you the next
one as we're coming to the end of this

01:20:21

basic section there's a concept that you need to be aware of of especially as you
become more advanced in SQL and that is the order of execution of SQL queries now
whenever we send a SQL query via this flow path right here the statement itself into a
database it goes through a few different processes before it then goes and actually is
executed this execution is broken down into multiple steps once this is done within the
database itself these results come back to you now if you notice we have this

01:20:54

step first of the this parser now I don't want this to be confused with an earlier concept
that we talked about that handles in the earlier steps of this in between the parser and
Optimizer and that is the order to write commands as we've talked about multiple times you
have to go or you have to write this query syntax in a certain order you can't have a where
statement at the very beginning has to follow that from statement once our SQL query
we've built has meet these conditions and has gone through that Optimizer it then gets into
01:21:25

into the execution phase and that's where the order of this execution right here is
comes into importance and it follows this general order that we're following right here now
you don't need to have this order memorized mainly I just remind you of this because it's
going to be important one day down the road when you're writing a SQL query well this
order is actually very important because from the perspective of the database this ensures
that it's processed efficiently and logically right now we're working with a

01:21:53

relatively small database and every time you quer it notice it probably returns it in less
than a second but if you get it to billions of rows of data this can take seconds upon minutes
and by understanding this order we could potentially early in this phase filter down our data
in order to speed up this query so just keep this tool in your tool belt whenever you're
approaching a very complex query which we're going to be doing in the advanced section
and also within the portfolio project if we

01:22:21

need to speed up a query we're going to start here by looking at at the order of these
and working earlier up in this sequence to filter down and make our data as small as
possible to speed up those queries all right that see you the next one all right let's wrap up
this section on the basics by doing a practice problem and for this we're going to be focusing
on the joins in order to build a more complicated query using left joint specifically I'm going
to find something I feel that's pretty interesting for a given skill I want to

01:22:58

find the number of job postings for the skill itself and also the average salary as
expected we're going to be using that left join to combine our skills table to our job postings
table also have a few hints along the way go help us build upon this process let's jump into it
first thing I'm going to do is query the skills table or the skills dim table rename a skills and
then from there just get back all the different skills that we have available so right now we
have 225 skills let's build upon it further
01:23:30

after now that we've gotten the skills from the skill gy table I want to count how many
job postings mention each skill from the skills to job to gym table so the first thing I'm going
to do is add in this left join in order to go in and combine that skills gy uh skills job dim table
which I've renamed skills to job linking it on that skill ID now going to that table we can see
have a skill ID and the job ID so with this we can get a count of the jobs in there okay and we
have this back but we now we only have

01:24:03

it for one python because now right we're using an aggregation function so we need to
group it so I'm going to group it by the skills column and then we have this along with those
number of job postings all right now following along my plan of action we've now have the
skill names the number of job postings I want to now calc calate the average Yer salary for
the job posting associated with each skill so we need to bring in another table in specifically
back to that jobs fact table in order to get that salary that we need so once

01:24:38

again I do another left join bring in that job posting facts importing it in as job postings
linking it on that job ID making sure that it still executes it's running fine so now we have
access to that average year salary so I can now go ahead and add in the aggregation function
for that salary year average I renamed it average salary for skill and we'll go ahead and enter
and I forgot a comma so we'll go ahead and add that and press control enter and Bam we
got it all right now anytime I get any type of

01:25:15

results like this um we pretty much have gone through our entire path um but anytime
we get um something like this we want to order it by something I find the most for me I
want to see it ordered by salary and for this I'm going to specify for the average salary for
skill I'm going to put it in descending order and Bam there we have it actually opening this
up all the way this is actually pretty insightful you get to see a lot of these skills especially
around web development are more higher paying and

01:25:46

then more basic skills like something like Microsoft list vb.net get out of here webx are
uh lower paying all right sweet now it's your turn to give it a try and after this this wraps up
the entire basic section and now we're going to be moving into more advanced concepts so
take a second to reflect on what you've learned so far you really have come a long way I
mean look at this we just wrote a 12 line query and hopefully if you're keeping up with this
you understand everything that's going on in

01:26:15

this that's pretty impressive all right with that see you in the advanced section then
nerds welcome to the advanced section of this course and this portion will be on about what
the advaned sections on and more specifically how we're going to be setting up a database
locally on your computer using postgress now since we're going to be have this database
locally you're going to be able to do a lot more with this specifically going be able to do
things like manipulate it creating altering dropping tables doing a whole

01:26:46

host of things and actually getting into the core of the power of SQL now also in this
section we're going to be obviously covering a little bit more advanced topics such such as
how to handle case Expressions subqueries CTS and even unions so it's a whole host of
things that I use from time to time and it's who of you to know so previously in the beginner
section we were using inside your browser sqlite viz and this actually loaded inside your web
browser the SQL database that you were then writing these queries to and executing

01:27:18

and then getting those results back now this tool is great especially when you're
practicing and learning SQL in order get it up and running quick without any setup but this is
not how you're actually going to be interacting with it in the real world so I wanted to
provide a scenario similar to how you're going to be doing it in a real world scenario so
instead of using this app we're going to be using popular editor option VSS code and VSS
code allows you to track all your different SQL files that you're using and then also connect

01:27:50

to a host of different SQL databases but we'll get to vs code in a little b as the more
important thing that we need to actually get installed is the database itself and we're going
to be using postgress with this which is an easy to download program which about to go
through and able to get it running locally on your machine now as a quick refresher why are
we using postgress for this well looking at the 2023 stack Overflow survey the results from
this concluded that postgress is one of the most popular options among developers

01:28:23

now there's a lot of other popular options in the datalux community especially things
like MySQL SQL light like you've already used and SQL server and the concepts that you're
learning in this course regarding specifically with the SQL syntax can be applied to these
different databases so even if you don't use the number one popular option it's still going to
be able to have and use those skills in other databases also last data point I promise before
we get into the downloading with postest itself

01:28:52

it is topping the charts at the most admired and also the most desired SQL database
technology that most admired data point is the proportion of users that use postgress and
want to continue to use it whereas that desired metric or the blue is the proportion of
respondents who want to use this technology and in both of these cases it exceeds all of its
competitors and so that's why I think it's a great database for you to get into all right let's get
into downloading postest and for this you don't have to have a lot of computer

01:29:25
it doesn't take up a lot at all and so you're going to navigate to this URL right here or
feel free to just Google postgress and it will take you right here anyway I'm going to click
download and from here we're going to select our operating system of choice I have a Mac
right now but I've verified that the installation instructions are exactly the same for a
Windows user so you're going to follow the exact same steps now I'm going to download the
installer now this has a few different options available and also I'm not sure why you

01:29:52

had to select the operating systems at the beginning cuz you're just selecting it here
anyway we're going to go with the most current version so 16.2 I've done this previously on
older versions like 10 so don't worry if you have an older version this still should still work so
navigate over to your download folder you should have some sort of installer package like
this and just go ahead and click on it it may prompt you if you're sure you want to open from
the internet yeah go ahead and open it now we're going to walk through

01:30:16

the setup process for a lot of these things we going to be keeping it it's default such as
this location right here and as far as installing all these different packages I do want them so
we're going to continue once again default for the data location next is password and this is
very important that you remember this password because this is going to have to be used in
order to access the database and even start it up so if you need to write down what you're
going to have for your password for the

01:30:41

port number don't change it if you want to change it from this 5432 make sure you
write it down as well for the local keep default I don't know what this is okay setup's not
ready to install so let's get to installing it all right looks like it's installed and it's asked if it
wants to launch stack Builder which can install additional modules to help you out we're not
going to need this right now so I'm not going to launch it up and I'm going unclick that check
mark So now let's launch postgress
01:31:07

and anytime you're wanting to actually query this database that we're going to be
building on here you need to do this so for some reason you restart your computer and
bring it back up you need to restart postgress anyway going to navigate into here this folder
postgress SQL 16 and go into to PG admin there's the admin dashboard of how you're going
to control it and it's loading up so I have it up and running over on the left hand side this
little pane is going to have all your different servers and

01:31:36

database and all the configuration with it and then right hand side sort of like think of
it like your editor where you can do a bunch of configurations anyway I'm going to go ahead
and click this open and I have an older edition of postgress well databases at least on my
system we're not going to worry about that if you only have one you should see that 16 we'll
go ahead and open it up this is where you have to have your password memorized so I'm
going to go ahead and put it in I'm also going to

01:32:01

click save password and then go Fed so now that I've done that it's going ahead and
started up my databases I can see this in here by navigating into databases and right now
there's only one these are the contents of the database there's only one database in there
and it's conveniently titled postgress anyway now Beyond this inside of PG admin is basically
beyond the scope of this course as you can come in here and actually run SQL queries to
thus interact with your database but then what happens if you want to use another

01:32:33

about database such as like MySQL or SQL Server you're not going to use PG admin for
this so we're not going to be focused on this for the remainder of this course just understand
that you do have to launch PG admin in order to get your database running with that let's
get into getting the code editor so we can actually run some SQL queries all right in this
section we're going to get into installing vs code which going to be our code editor in order
to run our SQL queries now what is a code editor and this is a location
01:33:04

that you can go and organize any type of code you may have in our case we're going to
have these SQL queries written out inside of a SQL file and so from there we want to keep it
inside of a code Eder now another term to keep in mind is integrated development
environment also known as idees Ides are basically a text editor on steroids it's debatable of
what exactly VSS code is technically it's a code editor that we're going to be using but
because of the additions we're going to add to it it functions like an IDE nonetheless the

01:33:35

terms are going to come up and I want you to be aware of it so once again if we go
over to stack Overflow to see what are the popular options among developers right now
Visual Studio code also spoken vs code is by far the most popular option on the market so
that's frankly why the reason why we're using it and I just frankly like it in general now VSS
code is a code editor for a multitude of languages but there are editors that you can use that
are specific to SQL and I want you to be aware of them in case

01:34:06

down the road you want to decide to use it specifically down here if we scroll down
data grip that's one from the team over at jet Brains it's a very popular option although paid
and another popular option that didn't make that list there but I know about it from the DAT
analytics Community is De Beaver this is is a free cross-platform database tool for developers
and it supports things like postgress an app like this I would say is actually more powerful
when it comes to running SQL queries as you can

01:34:36

do a lot more functionality with it in regards to SQL but then if I want to run other
different languages such as python or R I can't do it inside of this that's why as me as a
python user I stick with vs code anyway let's get into downloading and setting up visual
studio code so navigate to this URL right here where you're going to go and then download it
you can download it for either Mac or Windows I would get the actual stable version and
actually get it this installation process is a lot simpler as once you unzip that file that
01:35:08

it gives you in like your downloads folder you'll have the app available from there you'll
take it and you drop it into your applications folder from there actually navigate into the
applications folder and then start this bad boy up so getting into a quick overview of how
VSS code works over on the left hand side we have our activity bar and I have a few extra
icons right here don't worry about that too much but overall I find that I'm using this right
here this explore and this is going to display all my

01:35:36

different files on the Le hand side I can also do things like search through my files and
then even add extensions which we're going to get to in a second so let's actually create this
project folder that we're going to be working within so I'm going navigate back here to the
explore and click open folder I'm going to navigate to wherever I want this project folder to
be I keep mine within a developer folder and I'm going to add this new folder of where we're
going to title it I'm title it SQL project data job analysis I use

01:36:03

underscore in between this it's just for coding purposes you don't have to if you want
to but I like to now with this folder selected I'm going to select open now that we're inside of
this folder right here titled SQL project. jobs I can access or add a file by directly just
selecting this icon of adding a file and then naming it appropriately so I named this one test.
SQL and that's going to create a SQL file now we have the file located on the right hand side
and this is where the editor portion of

01:36:36

it is so I just wrote some SQL code just to Showcase that you can do this and it's
possible you may have the option down here or you should have the option down here to
select your language mode depending on what language you're using you'll want it to
actually check this if you don't happen to see this down here come down to this Bottom bar
down here right click it and then from there you should be able to select editor language
anyway we want SQL for this CU we want to check it to make sure that it's

01:37:05

correct if I were to select something like python it's well it's not really checking
anything but right here it's going to throw some errors now because it understands with this
yellow and this red under statement that this isn't correct python so I'm going to select SQL
and those errors go away anyway we can write SQL now in here but the problem is we don't
have it connected to our actual database so we need to set it up to where VSS code inside of
our editor now connects to this database that is running on our computer so

01:37:38

navigate over to the activity bar and select extensions for this you're going to go into
the search bar and look for SQL tools doesn't have any spaces in it so that's why it took me a
little bit to find anyway we're going to go on ahead and select this this whenever we click
install is installing an extension to thus supercharge this code editor of vs code to allow to
connect to SQL so we're going to go ahead and click install now if you read the front print of
this it has that to use SQL tools you'll also

01:38:10

need to install the appropriate driver extension for your database a driver is now used
to connect this extension to a database going back to the search bar I'm going to type in SQL
tools and then also postgress SQL and pressing control enter we have it right here this is
called SQL tools postgress SQL cockroach driver so this installs uh the driver for all these
different things and click install so I can go ahead and close out of this no longer need
extensions going to explore so SQL tools should be installed on here

01:38:42

right now unfortunately if I zoom out by pressing uh command back I can actually see
it it's located right here it's pretty important for me so I'm going to just put it right up there
anyway I'm going to zoom back in and some other things are going to hide anyway click on
SQL tools so it should have add new connection so we're going to be adding a connection of
this postgress this initial database in here anytime you want to add a new database you have
to use this or set up this add new connection which also can be done bya

01:39:13

this icon right here so I'm selecting add new connection selecting that it's postgress
this is just the core postgress database so I'll use the connection name of postgress I'll keep
all the default settings the way they are for database I'll keep it postgress for username I'll
also name it postgress it's going to be really easy for the password we're going to be using
SQL tools the driver that we've installed before that way to verify the credentials scrolling on
down I want to say test connection and it says hey this

01:39:43

extension wants to sign in using the driver credentials I will allow it and then from here
now we need to enter in that password from before that we access postgress with enter
again in press enter boom so successfully connected I'm going to save connection now all
right so closing out this now this postrest database is not the database you can see by this
database icon right here um is not what we're actually going to be using for this course this
is just a core one there I don't really want to mess with

01:40:13

it let's actually create our own database that we're going to be using for this course so
if we come up here selecting on the database itself and select new SQL file I can then use
this to run a query on that database now this SQL file is connected to that postest database
and I know this because if I come down here to this bar down here I can see that it says
postest and this is because of the SQL tools if you don't have this rightclick it and make sure
that SQL tools extensions is uh has a check mark next to it in order to show

01:40:46
it so that's the shows what database you're connected to that you're about to run a
query on that if you P this run on active connection it's going to do it side note real quick you
may see during this this press command I to ask GitHub co-pilot chat to do something start
typing dismiss just ignore that this is my GitHub co-pilot an AI coding assistant that I use to
actually write SQL queries but that's for a whole another video just ignore that for the time
being I may use it from time to time if I do I will explain it anyway

01:41:15

I'm going to enter in this SQL command right here of create database SQL course and
this is going to create a new database named SQL course so I can select run on active
connection or I can highlight whatever SQL code I want to run right click it and then from
there go into actually running the selected query which is conveniently on Mac the shortcut
command e command D you have to press it twice anyway I'm just going to highlight it press
command e it's going to say hey command e was pressed waiting

01:41:49

for second key of chord press command e again anytime you run a query this SQL tools
extension is going to have a popup on the right hand side if you're running a query asking for
a table back or some results back it's going to display it here for us we just created a
database so we're not going to actually see anything anyway I'm going to close this out now
this new database is created I can go back to PG admin I can come up here and I can actually
see that's created by right clicking postgress SQL

01:42:22

16 and click cing refresh now we have two we have postgress and we have that SQL
courses I need to go ahead and click that right now it was gray out and next out that means
it wasn't running so now it is running and we have this we're able to connect to it so we're
going to go ahead now and create a new connection once again we want to connect to this
going to go through the same process I'm going to name it SQL course as far as the database
name itself the name is SQL course for the username I'm going to
01:42:55

keep it the same of postgress then from here once again I'm going to test connection
it's asking it can it use the driver connections yes enter in the password successfully
connected I want to save the connection all right so I'm going to close out of this I also don't
need this create database anymore so I'm going to close out of this I don't want to save it
and then now we have this when we click it it's available down and here's the database itself
now notice now that I've clicked on it down at the

01:43:26

bottom I have SQL course selected and I could navigate between the two we're going
to leave it on SQL course for basically the remainder of this video all right we now have vs
code completely set up for us I don't really care about this test. SQL file anymore I'm going
to go ahead and delete it you also may see this vs code file this is like your configuration all
this kind of crap in there don't worry too much about what's in there and how it gets
updated just keep it there so this concludes the inst

01:43:53

on vs code we're actually be moving into actually understanding more about creating
deleting and dropping tables before that we need to cover some data types because it's sort
of a prerequisite to understand creating tables so that see you in the next one all right in this
short little section we're going to cover data types and it's really important to understand
data types because we're about to be setting up and creating tables when to recreate those
tables we have to specify the data type for each of the columns

01:44:28

now if you recall previously whenever we were working with our job posting data set I
didn't call it out specifically but the columns were already set up to handle a specific data
type the job ID was only a numeric integer so in this case it was specifi as int or integer the
job title column had strings in it and so that was characterized under varar job work from
home was a false zero value or True Value which is one so therefore this was a Boolean job
posted date was timestamp and then salary year
01:45:02

average could have decimals in it so it's not an INT it's going to be numeric anyway why
is this important well these are necessary when setting it up in order to have data Integrity
within your database now if we were to go and add data to a database if it did not meet the
conditions of being an integer to be inserted into something like the job ID then we wouldn't
be able to insert it this is really good at just having a first line of defense of having clean data
now additionally because of these

01:45:36

characterizations of these data types it also makes these SQL databases a lot more
efficient in processing queries it doesn't have to guess what the data type is inside of a
column for integers it automatically knows and it can process this data a lot more efficiently
now if I navigate over to the postgress documentation they have a whole host of data types
that you can look at we're only going to just focus on a few for us and what we'll be working
with only really think you need to focus on these

01:46:04

top eight that I use on a common basis there's only a couple details I want to discuss
about this so the first two int and numeric int is for an integer numeric is for a decimal so in
the case of this we would specify in parentheses Precision which is the number before the
decimal place and then scale the number after the decimal place that we're going to have in
this next is text and then varar I'm just calling it varar I don't know what other people call it
anyway text has an unlimited length for the

01:46:34

amount of variables that we can put within it or the amount of string characters we
can put in it I don't like to have just this type of Freedom so we're going to use varar mostly
with this n variable where inside of it you typically specify a value such as something like 255
255 characters that it's going to allow for the maximum length the next is bul and this is
going to accept either true false or null remember our case we had zero or one this is
effectively true or false and then finally those around dates and
01:47:02

times we have a date one where it's just date alone time stamp where it's a date and a
time and then finally time stamp with time zone where we can go ahead and specify even
further what time zone we're in all right so that covers the data types and this is important
because as you can see from the SQL query right here whenever we go to create our tables
we're going to need to go ahead and specify the data type that we want to make a column
this is going to be very important like we said for data security

01:47:33

and with that I'll see you in the next one all right let's now get into manipulating tables
specifically we're going to be creating modifying and even deleting tables all within that core
database that we created titled SQL courses now there's four main ways we can manipulate
at a table first one's pretty easy just create a table we're creating it from stratch next is insert
into so once we have a table actually inserting in Columns of data into it next is altering a
table we can add additional columns even remove them

01:48:09

change types and then finally just deleting a table whatsoever anytime you're doing
any of these type of statements you need to be very careful and double check what you're
about to do twice because once you do it you're not able to NE necessarily just click undo so
picking up from where we left off last in the course first inside of PG admin make sure that
your database is still running if it's not you just navigate into it and it'll start right up and
here we have SQL courses the other thing to verify is inside of VSS

01:48:39

code navigate over to SQL tools verify that that database is still listed there so it is SQL
course and also that's clicked on so that way it's selected down here on the bottom for
where we're going to be running all these queries or about to do all right so I'm going to
come up here and we're going to create a new SQL file in order to run our queries on for me
I notice this run on active connections may disappear or may stay up there don't worry
about too much about that the first statement we're going to

01:49:06

look at is actually creating a table the Syntax for this is relatively simple we're going to
be using Create table along with that table name and then enclosed in within parentheses is
the column names along with that data type and you can list as many column names as
necessary to specify of the table here's our situation we're going to create a table called job
applied and this is a table basically that we can track all the different jobs we've applied to in
the past when I've been applying for

01:49:33

jobs I've made a table similar to this usually in an Excel spreadsheet but we're going to
do this in Excel and this includes things like maybe a resume I'm using cover letter a contact
person at a company and then all those different companies I'm applying to so we're going
to create this table called job applied and then I'm going to put those in closing parentheses
around it with a semicolon at the end to signify the end of the statement from there we're
going to be putting everything that we need to

01:49:58

inside of it so anytime you have any type of database you usually want some sort of ID
number so we're going to start with the Java ID number specify that it's an integer we'll also
be specifying other portions of the column specifically application sent date custom resume
whether we've modified it or not we can include what version it is the resume of the file
name a cover letter whether it's sent or not and then the file name of that cover letter and
then just a status in General on what is the job search going on with that

01:50:26

specific job ID now that we have the SQL query generated let's actually navigate into
SQL tools and then we're going to look down this first I want to show if I navigate into the
SQL courses if I go down to this drop down schemas and then the tables there's nothing here
right now so if we create this table it's going to go into there now for some strange reason
this green icon indicating the active connection switched back to postgress I can also see it
down here um so I'm going to go ahead and just press that right here to

01:50:55

select it and now I can verify it's connected by this down here at the bottom also of the
SQL courses all right so let's run this query to create that table we can either do this of run
on a connections selecting that or you can highlight it all rightclick it do run select a query or
finally my favorite just pressing command D and then command a and then anytime you run
a query it's going to pop up on the right hand side right here promise you you'll have stuff
here when you actually start querying for data

01:51:24

back right now we don't so we're just going to close this out so right now there's still
nothing here inside these tables what you need to do is come to this icon right here and you
need to refresh them and now magically what appeared was the job applied table and I can
navigate into it and I can see all those different values that we created right here were the
same ones from this create table so if I wanted to I could just do a select all from java
applied select it all command e command e run it

01:51:53

and whenever we get a return back it says that there's no data available now that we
have this empty table we need to actually insert into it the data we need and we do this with
the statement of insert into and then from there include the table name after this we have a
parentheses and we enclose all the different column names that we want to include for
inserting the data now if we're using every single column name inside this table it's not
necessary to include this it just makes the assumption that you're using every

01:52:22
single one however you have to make sure that whenever you get to this values
portion that where you're going to be specifying the different values put in that you do have
the same order as the table or you're going R into issues so just for Best Practices I like to
include both the column names and then no matter what whether it's all values or not and
then the values themselves so in vs code I'm going to put in that insert into along with that
table name I want to insert into now I'm going to

01:52:48

specify all the different column names that i' specified previously next I'm going to
specify value and then start a parentheses to put all those different values in now here's one
entry notice it's enclosed inside a parentheses and then we have a one with a comma and
then the next value just's date custom resume resume file name cover letter sent and cover
letter file name and then finally status now if I wanted to do more than one entry I'm going
to just put a comma and then insert more in so now in this

01:53:16

case I have two three four five different entries put into this all right so let's now
actually insert this into our table and I'm going to go ahead and just select it all and then
from there command e and then command e again and boom get another popup not
necessarily anything there but let's actually verify we inserted into this table so I have this
select star from jaob applied to basically query it all make sure I did this pressing command e
command e it's popping up here I'm actually going to move it over so we can

01:53:47

see it fully on this screen so I have the jav ID the application date resume resume file
name cover letter cover L file if we used it or not and then the status all right so sweet we
now have created a table from scratch and inserted data into it let's not get into altering a
table we're going to be doing this on this jav appli table the statement we're going to use for
this is Alter table and then the table name then depending on what you want to do you're
going to put one of these four different
01:54:15

options that we're going to select from we can either add a column rename a column
alter a column or basically change its data type and then drop a column all right so we're
going to start off with that keyword of alter table and then specify job applied and we're
going to be creating a table of contact names for whatever company we're reaching out to
so I have add contact and then specify that it's going to be a data type of far car and allowing
only 50 characters just to verify opening SQL tools opening Java applied we can see

01:54:46

that there's nothing there currently for this all right so I almost select it all command e
command e and whenever we go back to it we should see after refreshing it now we have
this contact appear but now you may be like Luke we quer that table there's going to be
nothing in it and you be right so let's actually look into it command D command D moving it
over so we can see it fully so we have all this information and then for the contact name it's
all null now this is a bonus command we get to cover

01:55:17

and that's update update allows us to modify existing data within a table for this we
use the keyword of update specifying the name set the column name that we want to a
certain value where we meet a condition and usually this condition is you specify a value
within one of the rows so we'll start with that update job applied next we'll do set and then
we're going to be doing this to that contact column name and adding erck Bachman and
then finally where we're going to be just using that index column

01:55:50

of jaob ID so we want to do this for the Java ID of one now we have multiple rows to fill
in so we're going to do multiple statements in this case so I'm going to select this all
command e and then command e again and it's going to go in and update it going to close
out of this and now using that select star from java applied running this as well I can see that
inside of the contact it actually inserted all those different contact names into it so now that
we put these names into this column called Contact come to the realization
01:56:23

that this name is not necessarily appropriate instead I want to rename it from contact
to contact name this is where the rename column statement comes in after specifying alter
table we're going to go in and specify the original column name to the new one so I'll start
with alter table job applied and then rename column contact to contact name and then
quering this table to actually inspect the contents of it we can see now that contact was
changed to contact name now sometimes times now that I think of it names can

01:56:54

be quite long and in this case we set that contact name actually Let me refresh this
right here this contact name to 50 characters in length well I want to change this data type
now from varar to text where text doesn't have a character limit that needs to be specified
for it you can put any amount of characters into it so I use this alter column statement
specifying the column name and then type to then specify the new data type so once again
start with that alter table and then specify the name and then alter column

01:57:24

column name and finally specifying the type as text so running this command pressing
command D command D we have a success message and I find the easiest way just to check
this data type is come back into here and we'll refresh this and now we see that it is text
now I do want to call out there are certain limitations in changing this data so in this contact
name example it has string characters in it if I wanted to actually change this data type to int
and then try and to actually execute it by

01:57:53

pressing command e command e I'm going to get this column contact name cannot be
cast automatically to type integer because it has a string in there it's not going to be able to
make any of those that are letters into an integer so it's not going to be able to do this so
really it's important that you get this correct on the first time whenever building your table
for the first time and now let's say this column of contact names we decided we're not going
to go ahead forward with it because instead
01:58:17

we're going to be using LinkedIn profile information to gather this information and
we're going to be creating separate table for this so we don't need this contact name column
anymore in this case we can use drop column which is the simplest and also probably the
most dangerous in that you just specify the column name and remove it so opening up SQL
tools I can see that refreshing it that we have that contact name right now is a text so
entering in that keyword of drop column and then specifying that

01:58:43

contact name I can then see how we can remove it by running this query command e
command e okay refreshing this we can see that contact name no more and it's gone all
right last command of dropping the tables let's say that we're now tired of having to go into
SQL and insert our records using all these different commands we're going to just use a
spreadsheet software instead which I probably should use anyway anyway we can go in now
and specify drop table and then specifying the name itself be extremely careful anytime
you're doing

01:59:13

these drop columns or drop tables as this is very much permanent so specifying drop
table and then job applied let's go ahead and Drop It Like It's Hot bam we ran this refreshing
this query right now where job is applied is right here it's still there got to wait a second okay
it took about a minute for me on my computer but refreshing this now we can see that
nothing is here sometimes especially if you're going to be doing large changes to databases
or dropping them or removing them there's a lot of code that goes on in the

01:59:44

background besides these three simple keywords right here that we're able to do this
with so you got to wait a little bit of time all right now it's your turn get give it a try feel free
to walk through this example that I just took you through making up any relative values that
you would rather use instead all right with that see you in the next one all right let's now get
into loading the database that we're going to be using for this continuation of the advanced
section and we'll be also be using it as well in the portfolio

02:00:12

project section this data is going to be of the same schema using the same tables and
column names that we were using previously in SQL light F however this data set is a lot
more robust and includes a lot more details from 2023 and for those that are maybe a little
lazy and want to continue just stay inside of sqlite viz and enter queries from time to time
you can continue to do so but I can't guarantee you that any of the queries that we go
forward with will work inside of here I highly recommend

02:00:43

you just follow along what we're doing so how are we're going to do this all well three
simple steps first we're going to download all the data which is in CSV files and also SQL files
from there we're going to move into VSS code and create the tables using some SQL files
that I'm going to give you and finally now that we have all these empty tables we need to
load the data all into it that'll be our final step so let's jump into it so if you navigate to the
URL in the screen right here you're going to be directed to this Google

02:01:10

Drive where it has a couple of contents inside of it first is a folder with all the different
CSV files in it and these are conveniently named after the four different tables we're going to
be creating inside of our database and you can even peek inside of there and see where the
contents are overall it's just Comm of separate variables inside of it all right so back in the
main folder the other main file in there is the SQL load folder and it has the SQL files that
we're going to be using to not only

02:01:39
create our tables but also modify our tables pecking into it we can see that this create
tables one it's pretty long so now we're going to just download these now you can download
these folders individually or I actually just zip them into their own little zip file to make this
quicker for you to download so you can do either or the one issue with the zip file is you
can't scan it for viruses so if you're not comfortable with it just download the other two files
exact same thing all right so navigate back inside of a VSS code we're

02:02:08

going to be now adding those folders navigate into that project folder that you created
from previously you may have it open if not open a new window and it should pop under
recent and you can just select it there now I'm going to go ahead and uh select the two files
that we've downloaded I unzipped them and then I have them right here I'm going to take
them and I'm going to actually just drop them right inside of here it says do you want to
copy the folders or add the folders to your workspace I'm just

02:02:32

going to select copy folders over and it's going to place them right in I did add folders
workspace previously and it did something funky I want it inside my project itself so that's
why I did the copy over now since we added a copy just going to do some cleanup you don't
need this ZIP file or this original folder that I have right here I'm just going to select it and
delete it and remove it from my uh remove it all right so the first step is done now let's
actually get into creating these tables for the

02:02:58

database so back of vs code I'm going to navigate into this SQL load folder and I have
this one here already on create database if you haven't done this already you can go ahead
and execute this in order to run and ex and create this database that we're going to need for
this but let me just verify real quick that we have this set up so I'm going to click on SQL
course I logged out recently so it just asked me to verify my credentials and enter in the
password and then I see I have the SQL course database already so I don't need
02:03:26

to do this create database now as a reminder again make sure that we maintain this
SQL course connection established for this is where we're wanting to create all these tables
I'm going go ahead and close out of these so navigating back into the explore back into the
create tables in this one is four different SQL queries for creating tables we have one for
creating the company dim one for creating the skills dim one for job postings fact and then
the fourth one for skills job dim these four commands are nothing different than

02:04:00

what we learned in the last section for creating table the only difference now is we
added a keyword in here for specifying when we're using a primary key and then when we're
using uh something like a foreign key so in this case right here as a quick refresher on
primary and foreign keys so if we go back to that core database schema that we have job
post postings fact so we have this job ID in there and that's unique to this column or this
data set right here that is the primary key in job postings fact however in something

02:04:33

like skills job dim the job ID is now the foreign key conversely skills dim is the main
table when it comes to skills so that skill ID is unique value in there so that's the primary key
in this one and in skills job dim it's the foreign key so back inside of our SQL file we can see
that the job ID itself is that primary key in the job postings fact table and then in that skills
job dim table it is a foreign key then we also have to specify references specifying what table
is the primary key in along with that value we expected to

02:05:11

be for that primary key conversely that skills dim table has that skill ID as the primary
key and then the skills job dim has that skill ID Associated as a boring key and we also have
that company dim table which has that primary key and then Associates it into that job
posting fact table as a foreign key right there's two major last statements I want to go over
first is the owner we go through and actually establish the owner as postgress for this
because we've already set up our connections already
02:05:42

within vs code using postgress we want to make sure it uses this and then finally in
order to speed up performance we have this create index and we're not going to be using
this at all don't worry about it it just speeds up the queries a little bit more and it basically
specifies the different foreign keys and helps with actually aggregating or creating as an
index to speed up the query performance so now that everybody's all comfortable and
knows each other let's actually get into executing this query I'm going to go

02:06:09

ahead and select it all and then from there verify that SQL courses is in fact the
connection that I'm connected to and then from there press command D command D now
this is going to take some time like previously in order for these tables to show up navigating
over to SQL courses schemas public and then tables oh actually looks like they popped in
already and just do an inspection everything everything's looking good conveniently we can
also see based on this that like job ID has that primary key which is has this gold icon for a

02:06:43

key and then Company ID has this foreign key of this silver one all right we're almost
done we've now downloaded all the files created tables now we need to load this down add
it in to the tables that we just created so navigate back into our files looking at the explore
menu we're going to go into this one now of three of modifying tables this is going to be a
new SQL command that we haven't seen before and it consists of three things in order to
copy the contents into a certain table based on a certain

02:07:13

file so this starts first with the keyword copy and then we're specifying the table we
want to copy data into next is from we're going to then specify the file path location which
you're going to have to update all these for your file path location and I will too actually and
this is going to specify where the CSV is living and then finally we need to specify that we're
copying into here this CSV file and I'm going to specify this with delimiter and then
specifying what it's using to separate all the
02:07:45

different variables in there which is a comma and then the keyword CSV header header
is going to specify there is a header or basically column names at the top row of this data set
okay so we need to update the file location for all four of these so I'm going to navigate back
into the explore into the CSV files themsel we're going to start with that company DM what
you can do is you can rightclick this and then select copy path now inside of these
parentheses here I'm going to go ahead and just

02:08:18

paste this into here and then I'm going to do this for all the remaining being very
careful that we're using whatever the name of the CSV file is for the file path itself all right so
I've gone through and actually updated them all to make sure that they match I'm going to
go ahead and actually save it by pressing command s all right so now it's ready last chance to
verify that you have all those correct CSV files underneath the correct table names select it
all command e command D and this is going to take a little bit

02:08:57

of time to load all this data into it all right so that was about a minute for me for this to
all load I'm going to go ahead once again anytime I do any of these there's nothing really
appearing here so I want to verify that I got this data into it I inserted it in correctly I'm going
to just create this simple statement real quick of selecting all columns from job posting fact a
lot of data in there I don't want to query everything just yet so so I'm just going to limit this
to 100 values selecting it

02:09:25

all pressing command D command D moving this window over here it looks like all of
the data loaded into it so now we're cooking feel free to also go through and check any
other tables as well verifying that they all loaded but based on this other one I'm pretty
confident that all the other ones are loaded just fine as well all right we did it just created
our entire database from scratch using CSV files but then loaded in using some SQL queries
and it's all lited up all right next we're going to be jumping

02:09:56

into some date functions and uh some more advanced features so with that see you in
the next one all right in this section we're going to be going over dates and also times this is
a very critical component as an analyst be able to understand how to manipulate dates and
times because on depending where in the world you're working on data from you may have
to convert it now for for this section we're going to be working inside that table job postings
fact and specifically we're going to be working with a column

02:10:28

of job posted date this not only has a date but it also has a Time associated with it so
it's a timestamp value and this value in each one of these rows it correlates to the date and
time that a job was posted so for this we're going to be focusing on three main keywords or
operators in order to handle dates the first is how to cast timestamps as a date next is how
can we work with time zones and convert to different time zones and then finally we're
going to work with my favorite extract being able

02:10:58

to pull out things like year month out of a date all right the first thing we're going to
look at is how to cast different values or types to a different data type so in the case of our
posted date this is a timestamp and we want to if we wanted to cast it as a date we would
use this double colon which allows us to specify the data type we want to cast something to
this is typically used within the select statement in order to assign it to maybe a column
name and we haven't seen this before which you can

02:11:31

actually run queries without actually using a from statement to select a database so in
this case I can say just select a string value in this case I'm selecting the string value of a date
and I'm going to press command e command e to run it so right now it provided this query
of basically the string inside of it but instead I can actually cast it by using this double colon
and then from there specifying the data type date from here I'm going to just go ahead and
command e command e and run it and now

02:12:03

it has this as a date now to show this for values we can actually see inside of here that
we've actually converted to a different data type I did some other examples converting this
one two three string to an integer true as a string to booing and then 3.14 string to real in
this case we can actually see the conversion the iner it has this gray box around it Boolean it
actually has the green around it to signify that it's a true value and then same with the float
the gray around it so let's actually see

02:12:32

this in action we're going to get back into actually running queries on our database and
remember be sure that you have it set up through SQL tools that you're connected to that
SQL course that it's actually active and you have SQL courses as the connection down here at
the bottom so I'm going to go ahead and run this query and for it we can see that we have
the title the location and then the date and with this date we also have this time stamp
associated with it because it is in fact a time stamp value

02:13:00

let's say we only needed that date value from this column and we don't really care
about that time stamp well I can convert this data type of timestamp to date and we do this
by specifying that double call-in and then the data type of date running this all together
pressing command D command D we can now see that we have the title location and then
date it automatically cleaned it up for us and removed that time all right next up is at time
zone a keyword in order to convert timestamps to as you guessed it

02:13:34
different time zones now it can be used with timestamp data whether it has a time
zone specified or not as we refresh it from our data type section timestamp alone includes
things like the date and time whereas time stamp with time zone includes all that and then
includes either a plus or minus to adjust the time zone based on where it is now if we go
back in to see the data that we imported by specifically going to that job posting fact table
CSV that was imported into our table and then scroll

02:14:06

over to see the job posted date column I have this convenient coloring scheme so you
can actually see it but anyway here's one of the dates right here right it's just a uh where the
time stamps it's a date and then a time there is no time stamp on the end into this so the
data in our database does not include time zone information so here I have that similar
query from before where we're looking at title location but also that date time we're going
to go ahead and actually investigate what the column

02:14:34

looks like and similar to before that date time includes things like date time and we
want to pay attention to these top values right here these aren't going to change from query
to query on the top 10 values actually I'll just leave it to five we're going to leave these up
here and actually pay particular attention to these as we go through this example to
understand this more and still the same ones up here but remember that 1746 now if our
data came as a time stamp with time zone whenever we use this at time

02:15:05

zone keyword we'd only have to specify at once so you'd specify the column name add
time zone and then from there specify the time zone you want to go to in this case we're
showing Eastern Standard time now in our situation is different so it makes it a little bit more
complicated because we don't have time zone information we need to First specify the time
zone that this value actually is by saying at time zone and then from there use at time zone
again to specify the time zone we want to go to in this example we're showing from
02:15:37

UTC converting it to Eastern Standard Time so going back to that previous query that
date time these values here I know from actually collecting the data this value is UTC so I'm
going to first start by specifying at time zone and then specifying UTC and then we're going
to do it again to go to the time we want to go to let's go to Eastern Standard Time so now I
have at time zone EST we can see it fully here now Eastern Standard Time is 5 hours prior to
UTC so whenever I run this I should expect it

02:16:14

to adjust this appropriately and Bam we went from that 1746 to 1246 so 5 hours prior
now if you Google the postrest documentation on different time zones you can see that they
have a whole host of time zone basically every single time zone available for you to use here
in our case we were converting that UTC which is at 0 to that Eastern Standard Time which is
that -5 the last keyboard we're going to be looking at is extract and this is used to extract
things out of the date such as the year month or

02:16:51

even day this is used like a function within the select statement so I would do
something like select and from there specify extract function and then inside of the
parentheses specify what I want to get from the column of interest in this case I want to get
the month from the column name and then I'm just renaming it as column month so going
back to that previous query we have the title location and then also date time let say we
want to extract the month out of that job posting date I would first

02:17:20

start with that extract function from there i' would specify what the value I want from
it is then specify the keyword from and then finally the actual column of interest and then
we'll name this one date month selecting it all and then running it we can see we have from
this the month values from that date time so 977 43 I could even take it a step further and
add something like the year to this and from here command e command D we can see from
this we got the year into this column now you may be like
02:17:55

Luke what is this actually useful for well when we actually use this something like this
in combination with something like the group by function I could do larger Trend analysis
with SQL specifically let's say we want to look at how job postings are trending from month
to month so let's start with a simple query and build on it further I want to first start by just
getting things like the job ID and then the month from each of the job posted date columns
running this query to double check that it's working I can see that

02:18:28

I'm getting it right here now I want to aggregate it so I want to do a count of these
different job IDs for each month so I'll start by putting a count around job ID and then from
there add a group bu to then specify we're going to group bu this new month column
running this all command D command D we're getting these first five values now we're down
to 12 so I'm just going to go remove ahead this uh limit column and then run this again to
actually see all of them so bam it's shown us all the different

02:19:03

months and then all the different values now personally I only really care about data
analyst roles so I'm just going to take it even a step further and then use a wear Clause
specifying the job title short of data analyst and then just for a little cherry on top we're
going to do an order bu and then in this case we want to obviously do the count so just not
to be repetitive I'm going to rename this job posted count and then specify this down here
of job posted count okay running this all command e command D we can see

02:19:39

that this is ordered from low to high and I obviously don't like that so we're going to
just change this to descending and then rerun this query again command e command D and
then from this we can see that there looks like there's a trend that earlier in the year so
January feary March have higher job posting counts specifically with January having some of
the highest and then later in the year like December November and September are lower on
the list and this pretty much tracks with what I would expect all right now it's your
02:20:10

turn to give it a try I have a few practice problems aligned with not only using the
different datetime functions that we just went over but also aggregating it with some
previous functions that we used in order to look at things like salary and counts of jobs all
right with that see you in the next one all right in this section we're going to get into a
practice problem using what we just learned on the dates and also we learned previously on
create tables specifically we want to create tables for each month of these job

02:20:46

postings so I want all of the data for say January in its own individual table a little bit of
foreshadowing here we're going to be using all these different months in upcoming practice
problems when we go over even more advanced operations so anytime I'm doing something
like this I want to just start the very Basics so let's start very basic first with the query to
actually connect to the database so we'll start with this of Select star from job posting fact
and then I want to speed up these queries for the time being so I'm

02:21:17

going to just put a limit statement on the time being to speed it up all right so we have
all our information so the next thing we want to focus on is this job posted date remember
we want to make tables for every single month or at least the first three months January
febrary March so let's start by filtering this job posted date column and only get values that
have January in it so in this case I can use that wear statement and then we're going to
specify extract and we're going to use the function we want to remember we want

02:21:49

to specify that we're gonna be using month from this job posted date now we need to
specify what this condition actually meets of extract month from job date in our case when
it equals one so I'm going go ahead and select this all here and press command e command
e again and scrolling over to that job posted date column I can now see that we have
nothing but January dates in this column okay we go ahead and close this x I don't need
need this anymore now we're trying to create a table for this job

02:22:25

post to date of only January and remember if we go back to our SQL tools extension on
the left hand side and actually go into our tables folder we can see we have these four
folders right now so we want to create a table an additional one inside of here so first I'm
going to remove this limit statement and once again make sure that it runs it's going to take
a little bit longer to run and Bam we looks like we have around 92 th000 values the next
thing I want to do is now use a statement of create table we're going to specify the

02:23:00

name of this table so in their case January jobs and then we'll use the as alas to assign
this and just to format this better to make it look more appropriate I'm going tab this over
bam so that's how we'd want to do this and I'll put a little uh semicolon at the end of this so
we have this for January we also need this for February and March I'm going to show you a
little trick that I do this so I'm going to copy all this by pressing command C and then if you
go into any AI assistant in this case we're using chat GPT specifically

02:23:32

the chatbot for this course I can then specify in there to make this for other months so
I specify make this query for all months in the year and then from there paste in that SQL
query let's see what it does boom okay so in this case Chad gbt went through and did this all
anytime you find very rep competive work you need to jump into chat chv to do this now
remember we only need these three months right here and I have gone through and verified
that it does in fact use the right syntax chat BT coming

02:24:01

wrong sometimes so make sure you're always double checking it anyway I'm going to
copy it by pressing command C and then going back in here and pressing command V I also
went in and added some indentation all right let's create all these different Tables by
pressing command e command e okay so it looks like once again we got this blank screen
that the tables were made I can come over here and I'm I'm going to press refresh inside of
SQL tools and Bam there it is January February and March jobs are now inside of here so we

02:24:29

created tables using the extract function now I always like to double check my work so
I'm going to run this select statement right here where we select job posted date from
March jobs running this pressing command e command D we can see that all these different
jobs in here looks like we're around 64,000 different values and they're all for Mar so I'm
pretty confident that all the other values should be correct all right it's your turn to give it a
try we need to get these tables built inside of

02:24:58

your database cuz like I said we will be using this in some future examples with that
see you in the next one all right in this section we're going over case expressions and this is
very common where if we want to create a column based on a condition we can do this
through this case now if you have any experience with something like spreadsheets or even
python on this is very similar to an if statement where an if a statement usually have some
sort of condition whether you're trying to test whether it meets it true or false and

02:25:29

then from there it assigns it a value whether it is in fact true or false so let's go over the
basic syntax on how to use this and commonly it's used within a select statement and that's
we're going to show here but you can use it in a whole host of other things such as where or
even Group by to list an expression to satisfy by case we start out with case and then we end
it with end and at the end you can also use an alias so we can have in this case as column
description so that's we're going to

02:25:57
name that column as whatever we build with this now this is going to go in logical
order so for the first one of when column name equals value one we're going to check
whether it does meet that and if it does then we'll provide it this value which will go inside
that column description column from there we'll move to that next line of when column
name equals value two and then description for column 2 and if it doesn't meet any of those
two conditions it will meet the else condition of then other and that will be what it will be

02:26:27

assigned so let's start simple with this query where we're just going to look at the job
title short column and also the job location column so let's look at a condition where we
would maybe want to reclassify where a job is located at and for this we need to look at the
job location column so I'm going to go ahead and run this query by pressing command D
command D and inside of this job location column we can see we have things like either like
a state city or even a country City and then we even

02:26:57

have things for like remote jobs where we specify that the job location is anywhere so
for the scenario I have three different conditions I want to look at I want to create a new
column and let's say I'm job searching I'm located in New York and I want to label things like
anywhere jobs as remote if they are in New York I want to label them as as local and then
otherwise I just want to label it as onsite that you have to be in that location so I'm just
going to disregard them so with any case statement I'm going to start with that

02:27:30

case and then I'm also have the end and the Alias we're going to use for this one is
location category now let's go for that first statement of for anywhere we want to label it as
remote so when job location is equal to anywhere I'm going to go ahead and move this over
a little bit get this out of the way then we'll label it as remote next when job location equals
New York New York then I want it to label it as local and then finally with everything else
because this goes in logical order and satisfies
02:28:04

whether it meets it I'm going to then specify an else statement of onsite all right let's
go ahead and run this all pressing command e command e okay I got an error right here and
that's because we don't have a comma after job location because it's starting that new
column so make sure you have that in there all right command D command D expand this
out a little bit all right now we can see the anywhere rows are now labeled as remote the
New York is located as local and then everything else is on site so

02:28:35

this is great that know we have this done but from an analysis standpoint I want to
dive into this further I want to analyze how many jobs I have I can apply to specifically the
local ones and the remote ones also look at the onsite as well so we can use something like
the group by function in order to aggregate all these different values so I'm going to start by
how we're going to aggregate this and that's going to be using a count and I'm going to use
that job ID column making sure to include a comma

02:29:03

also with this count job ID I'm just going to label this appropriate as number of jobs so
let's first start by just starting simple of doing a group ey and we're going to be doing this
using that location category let's actually run this query to see what we have so far
command D command D all right a lot more onsite verse remote verse local not bad for local
8,000 jobs in New York anyway if you recall I specifically care about the data analyst jobs so
we'll specify a filter for where and we'll TI

02:29:35

Go with the job title short equal to data analysis running this again seeing how little we
declined went from 8,000 jobs to 3,000 jobs local and down to 13,000 what was it previously
69,000 okay now I normally put an order Buy in here but it looks like for some reason it
automatically sorted it from highest to lowest so I'll consider this good enough for now
anyway we wanted to I could Dow dive deeper into these 3,000 local jobs that I could then
apply to all right with that it's your turn to give it a
02:30:10

try with case expressions for those that bought the course certificate and notes you
now have a practice problem specifically that goes into bucketing different jobs specifically
around salaries so assigning different values for each all right with that see you in the next
one all right in this section we're going to be going over both subqueries and CTE and the
concept behind both of these are that we're going to create basically temporary tables
inside of our SQL query that we're performing and then

02:30:43

perform and Analysis on this temporary table this is very useful whenever we're
getting into more and more complex queries that we need to be able to do as we want to
break it up into sections and subqueries and CTE allow us to do this now in order to show
the power of subqueries and CTE if you recall back from one of our previous practice
problems of creating tables for each of the different months we use this statement of create
table and then underneath it for it we specify that we wanted to only extract out those
values

02:31:17

where the job posting date fell in January in this case for the January jobs and then
from there it created a table inside of our database this table is sort of like permanent it's
there now the only way you get rid of it as we drop it let's tackle subqueries first because
they're used for simpler queries let's say in this case we wanted to create a temporary table
of January jobs well a subquery is a query as you guessed it inside of another query so in this
case we have a select star statement and then from and then Within

02:31:49

paren es we're specifying that subquery in our case we want to select only the jobs
where the job posting month is January and then we rename this table as January jobs
navigating back over to VSS code to actually see this in operation I can select this all press
command e command e and when we we scroll over to that job posted date we can see that
this is all the January so we did that select star on our sub query now the other popular way
to create a temporary table are common table expressions or

02:32:26

spoken as CTE and they can be used in even more locations such as select insert
update or even delete with this one CTE are defined first using the with statement and then
you're saying with this new table name and then the Alias as and then from there within a
parentheses we're then specifying the entire query that we want to run to put in this new
table called January jobs from there we can then run the next query of Select star from
January jobs all right to show this actually in vs code let's actually run this one so

02:33:04

command e command D and once again rolling over to that job posting date to make
sure that it did it correctly and yeah we can see all of them are from January for this so let's
jump into some harder practice problems for each of these first one we're going to focus on
is subquery remember that's a query with inside another query and we can use it in things
like select from where or having we're going to be doing an example where using it inside of
the wear Clause because it's in this parentheses think of order of operations

02:33:33

the inside the parentheses will be executed first and then everything around it will be
operated second so let's say I wanted to get a list of companies that are offering jobs that
don't have any requirements for a degree currently in our column of job no mentioned we
have a true or false value and this says whether a degree is going to be required or
mentioned in the job posting and it's located inside the job posting fact table so let's actually
go ahead and just look at it real quick so right now I'm pulling company IDs and

02:34:05

then the job no degree mentioned and for this all these are going to be true now if we
remember back from our diagram of the table going on here we have that job postings fact
table itself that has whether a degree is mentioned or not and we have that Company ID
however we don't have the company name in this that's in a separate table so that's why
actually in this case the subquery is going to be so powerful because we're can to run a
subquery to get the jobs that have the associated Company ID for no degree

02:34:36

mention and then from there filter inside of the company dim table so let's make this
into a subquery and we're going to start with very simple first we're going to add a select
statement and then we'll say company name which is the column name that we want to look
at naming it as name for the table we're going to be selecting that company name from we
want to use that company dim and now this is we're going to enter in that subquery we're
going to specify where that Company ID is in this subquery

02:35:08

itself now if you recall whenever we ran just this I'm going to just select this and run
the query again command D we're getting back in this a company ID and then whether job
note agree mention is true in this case realistically this column is not even necessary so I'm
actually going to clean it out and we're going to go ahead and run this again command D
command e and same numbers 1 n 10 yeah this is what we saw previously here so I know
it's pulling the right things so we're trying to say hey where

02:35:37

the company ID is within this table so within this table we're going to only want to
return those company names that are associated with it so let's run this all pressing
command e command e uh company name does not exist oops I have this backwards it's
actually name as company name my bad okay we'll run this again command D command D
and Bam we can see all these different things I just want to show this further that we are
actually getting the right data so I'm going to insert in that company ID

02:36:10
and I'm going to do some clean up real quick anyway let's run this again command e
command e okay we can see in here 1 3 4 6 7 8 9 so these are all the ones that are accepting
it and I realize now whenever we look at this one we're not seeing those other numbers and
I think this is just by an order by an issue so I'm going to fix this real quick by although it
doesn't really matter it's just inside of here putting an order bu and then specifying Company
ID then from there just running this subquery inside of here pressing command

02:36:41

e command D okay we can see all the one the same one so 1 3 4 6 and then navigating
over to the other one see 1 3 4 6 so this is correlating and checking out right and this is how
you actually should go through troubleshooting it let's wrap up this section with a final
example on using CTE or comment table Expressions this is used similar to a subquery to
create a temporary result set which in this video previously I may have been referring to it as
a temporary table that was a mistake it's a temporary result set they're two

02:37:14

separate things anyway the results of this temporary result set can be then used in
things like Cas select statement insert update or even delete this only exists during the
execution of the query and it's defined using a with statement before it then defining the
table and then using as and then in closing all of the in parenthesis how we want to create
this temporary result set we can then call this within a query below it so let's actually work a
problem to see how CDs are used for this we're going to be

02:37:44

finding the companies with the most job openings now we need to break this up into
two parts and that's why CTS are perfect for this because first we get the need to get the
total number of job postings per Company ID which is located inside of our fact table of job
postings fact but then once we have this total number we then need to combine it with the
company name which was in the company dim column so we're going to start by building
our query first for that first bullet of getting the total
02:38:13

number of job postings per Company ID so for any of these queries I want to just start
small we're going to be selecting the company ID and looking at the what comes back from
this so we can see that it has multiple different IDs in so now we need to go into actually
aggregating it for this we're going to be using the count function and we can just put in there
the asteris symbol is that's going to count the number of rows but anytime we're do an
aggregation function we need to specify the group by on how we're

02:38:38

going to be grouping it and in this case we want to group it by what's given in that
table right there of that company ID all right let's run this one okay so now we have these
counts so we can actually just go back and verify for zero we had four right here and it looks
like we have four returning so this is the core statement that we're going to be using inside
of our CTE so we can go ahead and create that now so I Define this using a wi statement and
then give it a table name of company job count using the Alias as and then inside

02:39:10

parentheses all the query that we want to do and then just to get started and making
sure that this works properly I'm just going to do a simple select from statement to actually
Define this and this will just query this temporary result set that we've defined up here and
running it we have the same results that we had before now it's just through that temporary
result set so just refresher on the schema we have that job postings fact data that is
connected to our company dim table using a company ID

02:39:41

so we needed to use a join method in order to combine these two tables together to
combine these two tables we're going to be using a left join and for this we want to use our
a table make sure we have everything from it as the company dim table because maybe
there may be some companies that don't necessarily have job postings that we aggregated
from the B table so we want everything to be listed there so way if there isn't there's a zero
associated with it that there's no job postings so as you guessed it B is going to be the
02:40:11

fact table that we're going to be combining to this so let's start simple with this basic
query down here and we're just going to be looking at first that comp company dim table
and so just running this query right here command D command D we can see all the
different names from it so now that we have this let's actually move into joining this I'm
going to do a left join specify that temporary result set and then what we're going to match
these two left joins on which is the company ID from each table

02:40:40

we're not going to necessarily get any different results with this but I'm going to go
ahead and exate execute this entire query to make sure it's actually working properly okay
it's working properly now if you remember we want to get the total number of job postings
per Company ID and have it basically associated with a company name so this is from the
company dim table and I'm going to rename this as company name so I want this value from
this count star statement right now I don't have an Al

02:41:05

should have done that before so we're just going to name this as total jobs and then
we want to have it appear in this new query that we have here so I'm going to then Define it
okay let's actually run this entire query Now command e command e bam and now we have
the company names along with their total number of jobs right now remember we want to
get at the highest who who has the most so we need to now do an order buy and I'll just add
this here at the bottom order bu total jobs in descending

02:41:38

order let's now rerun this query boom all right now we got it and it looks like umgo is
one of the highest amount along with other popular companies like city capital 1 Walmart
and centure so yeah all right it's now your turn to give it a try and I have for those that
purchas the course certificat notes I have multiple different practice problems that you can
go to three for subqueries and three also for CTE if you're still not feeling confident on
actually tackling these problems on your own just stand by because in the next
02:42:10

section we're going to do uh go into another problem of using CTE and a pretty fun
example so feel free to even hold off until after that section all right with that see you in the
next one all right let's get into a practice problem of how to use CTS even further than the
previous examples for this we have a pretty exciting example that Kelly came up with and
this problem was inspired by my app. nerd. te that Aggregates job posting skills depending
on a different job title you want to look at so if I wanted to look at

02:42:45

something like data analyst we could see what are the top results here anyway Kelly
came with this problem find the count of the number of remote job postings per skill so
we're going to be focusing specifically on this cuz say in my case I'm a data analyst looking
for remote jobs and that's what I care about most we'll display the top five skills by their
demand and then include other attributes like skill ID name and the count of job postings
now how are we going to tackle this well remember the

02:43:12

job postings fact table has all of our different job postings but it doesn't have
necessarily all the different skills in here instead we have to use that skills job dimm table to
get the all the different correlated jobs to skills and then our final skills dimensional table
that includes the name of those skills so the first thing we're going to do is build a CTE that
basically collects the number of job postings per skill so we're going to have to do some sort
of join between our job posting fact table and our skills

02:43:43

job dim table once we have this temporary result set we can then take this a step
further and then combine it with our skills dim table to actually give us our final results that
have the skill name in it for all the joins within this portion we are trying to get a count of
jobs that actually exist we don't really necessarily care about if there's values that don't exist
so for this we're going to find that inner join is the best method to use for this so let's start
with the very Basics and then build upon there for the CTE I just

02:44:15

want to look at first the skill ID column of the skill from the skills to job dim table which
we're going to just conveniently rename skills to job command e command e all right so
we're already seeing repeats in here I'm going to go ahead and just actually showcase the
job ID column as well put a common in there run this query again so we can actually see this
better because I don't think it's like showing fully what we want to see all right this one's
much better this one has so for the job ID

02:44:44

let's say zero which is one of the job IDs it has the skill ID Associated of it of zero and
one so there's multiple skills associated with this job so let's go ahead and actually join this
on our job postings fact table since that we know that they're correlated through that job ID
column so the first thing we do is specify in join after that from statement soing the table
and we're going to rename it just simply as job postings from there we're going to specify
how we're going to connect this and we're connecting it on the job ID I

02:45:12

want to make sure this works so I'm going to just do command e command D and job
ID is ambiguous so I need to specify right because it's both of the skills to job and the job
postings doesn't really matter which one we specify we're going to just specify this one right
here and then run this query again all right so the query is running there's nothing
necessarily new in here that we've included but if you recall we want to look at or filter for
the jobs that have work from home as true or remote jobs so

02:45:45

the first thing I do is just add to the select statement to actually see it shown in here
and it looks like we got a lot of false results I'm sure we'll eventually get some true oh we got
a True Result right here so let's actually now Define that where to meet this portion of the
question of remote jobs and we want to specify where job postings of job work from home
is equal to True ring this query again all right so they're all true in this case we don't need
this column anymore listed here so I'm going

02:46:15

to go ahead and delete it I just showcase it to make sure that we were building the
query correctly now getting back to that core problem we want to get the count of the
number of remote job postings per skill so we have this skill ID already we need to now get a
count and we're going to be doing that count star method we'll sign this at an alas of skill
count anytime we do an aggregation well we need to do a group by and we're wanting to
group it by obviously that skill ID now since we're combining by this skill ID and do this

02:46:48

aggregation this job ID column is no longer useful and actually we'll throw off our
aggregation so we're going to go ahead and get rid of it and we're going to go ahead and run
this query all right so now we're seeing that skill ID and skill count I can see here that skill
count these this zero and one Whatever skills they are associated with the IDS are this high
right here at 40,000 so we pretty much have our CTE built so let's go ahead and build that CT
out using that wi statement defining it as remote job skills and then using

02:47:18

the alien operator to put that all inside of there so now that we've combined that job
postings fact with that skills job dim table and we have this in a common table with a skill ID
for it we can now go further and do another inner join with this CTE to that skills dim table
so the first thing I'm going to do is just make sure that this CTE works by just doing a select
star of this remote job skills running this query I can see that it works so let's move into
actually doing that inner join with the skills dim table I specify

02:47:53
inner join and then the table assigning the Alias of skill and we're going to connect
both of these on that skill ID column and now that it connected we need to specify the
columns we want for this so the first we'll just keep it simple with only the skill ID but
obviously next we want the skill name with that defined we need to now have the most
important value skill count so let's go ahead and see if this entire query works then looks like
I have an error here I referred to the table in join as skill

02:48:25

when I reference it up here as skill so we're just going to go ahead and add Nest that
whoopsies and we going to select this whole statement again and run it again all right so
now we have that skill ID skill name and then the skill count couple things we are left to do
now is actually order this and then remember we only need the top five results so we'll add
an order by statement right here specifying that we want to order it by that skill count in
descending order and then we only want the top five results so I'm just going

02:48:53

to throw in a limit statement right here selecting all this query and then pressing
command e command D bam now we got it and we can see what were those 40,000 values
now it was obviously Python and then SQL followed next by AWS Azure and Spark now I'm
going to throw one simple caveat to this question because frankly I really care about data
analyst jobs so I'm going to filter our CTE further for only data analyst jobs JS using within
that and within the wear statement and and specifically where

02:49:25

that job title short equals data analyst all right let's go ahead and run this all to see
what it looks like for data analyst and Bam these are more of results that I would expect for
data analyst CU well that's the query anyway we got SQL Excel and then python Tableau and
powerbi so regardless of what you're filtering for it goes to show the importance of SQL this
is the top skill and so I think it's beoo of you that you've spent the time to learn this all right
for those that purchased the course certificates and notes feel free
02:49:55

to go ahead if you haven't already to work those practice problems working through
those example problems for CTE and also subqueries and once you're done with that we'll
be moving into the last major topic of this Advanced section focusing on unions with that
see you in the next one all right welcome to this last major topic we'll be covering in the
advanced section on unions this is very important for combining tables it's directly if you will
opposite of how we're doing joins so joins are used in the case

02:50:31

whenever we want to combine tables that maybe relate on a single value such as in
the case of combining like the job posting fact table with the company dim table we're going
to combine this on the company ID column now if you remember previously we created
three taable for those job postings in January February and March I'm moving this over here
so we can actually see it better the January table has the same columns as that February
table along with that March table so in this case if we wanted

02:51:06

to combine these basically rowwise we could use a union operator to do this so the
first we're going to cover is Union and it comines the results from two or more or select
statements you would use this operator by first defining a statement such as select column
name from table one and then specifying Union and then underneath it specifying the next
table you want to do it select column name from table two and now in order to Union this
there is one specific condition this has to meet they have to have the same number amount
of

02:51:40

columns in this case we're only doing one column here but in our January February
March jobs they can have a multitude of columns they just all have to match and they all
have to be the same data type and the last thing to note is it gets rid of all duplicate rows
whenever you combine this unlike the next operator Union all so let's start with this simple
query I want to look into seeing all the different job titles and Company IDs and job locations
for the month of January these are the main although they're not all the
02:52:11

columns are the main attributes that I care about I'm going go ahead and run this
query just make sure that it runs perfectly fine and we can see here here all the different
results that we're seeing from it so now let's actually join this with our February jobs table
first I'm going to specify Union then I'm going to do another select statement for the
February jobs and I'm making sure that I have the exact same columns listed as the January
jobs right now from January jobs we have around 92,000

02:52:39

jobs if we run this all together we can see now we have over 107,000 jobs so let's just
take this a step further and now add in all of the March jobs so we have now January
February March and we're specifying all the different columns how many jobs are we going
to have return and for this one now we're up to 143,000 jobs so that's the union keyword
moving on to Union all it's pretty much used in the same exact way we're still using two or
more select statements in order to combine different tables and they need to have
whenever

02:53:15

we're doing this the same amount of columns and the same data type now the thing
about the keyword all of Union all is that it returns all rows even duplicates and Kelly and I
have talked about this further and we find that we use mostly this one in our jobs as we
typically want to get all the data back to make sure we're looking at everything so going back
to that previous query that we built here all this takes is adding in all to both of these now
remember the previous query we got around 143,000 results so let's see what

02:53:47

we're going to get back with this Union all we should be getting back more values and
we do looks like 220,000 so quite a bit more than this almost 880,000 more values or
880,000 duplicates if you will all right now it's your turn to give it a try we have a few
practice problems built in order to implement both the union and the union all if you're still
not comfortable with either of these I do have a practice problem coming up where I'm
using Union also in combination with some other things like subqueries and

02:54:19

CTE so you can feel free to hold off until after that practice problem if you're not
exactly comfortable yet with unions all right with that see you in the next one all right let's
dive into the last practice problem of this Advanced section before we actually dive into our
project for this I want to do an analysis of the job postings from the first quarter that have
salary greater than $70,000 that's like my target range right now so we have those tables
already on January February March we're

02:54:53

going to use our Union operators to combine them all and then we're going to be using
it within a subquery to then analyze it allowing us to thus filter it for those jobs greater than
7,000 and get a snapshot of the jobs we actually want so anytime we're building a subquery
or CTE I like to just tackle it first for this we're going to be selecting all the different columns
from each of these different months tables so I'm going to go ahead and just run right here
to show what it looks like okay and

02:55:21

we can see that it has all the different columns associated with it so the next thing we
need to do is specify that Union and we're going to use all because we don't want to remove
any duplicates and then once again specify the next select statement for the February jobs
and then our final Union all statement so we can combine this finally with our March jobs
let's actually combine this all and see how many jobs we actually have here all right and it
looks like we have around 200 200 12,000 which checks with what we did

02:55:52
last time so let's go ahead and start building this into a subquery so I'm must do select
star and then from and then build this Union all portion into the subquery now because we
defined this table I'm going to give it an alias of quarter 1 job postings let's go ahead and run
this again make sure that it's working just properly okay sweet still returning all the different
jobs all right I don't want all of these different columns here so instead what I'm going to do
is I'm going to define the four main Columns of interest for

02:56:26

this we're going to be doing job tile short job location job via and the job posted date
specifically only the date and then remember the last portion of this question is that we
want to find the job postings that have an average yearly salary greater than 70,000 so with
all of this I need to add a filter on it using a wear statement we're going to specify all this for
greater than 70 ,000 going ahead and running this all command e command e boom we now
have all this information I'm going to modify

02:56:56

this further mainly two things one I want to see the salary and two I really only care
about that iist jobs so the first thing I add is quarter one job postings and then I'm going to
specify that salary year average and then inside of our weer statement I'm going to use that
and keyboard and specify where the job title short is equal to data analyst also one other
bonus I'm just going to throw in an order bu so that way we have this listed for the salaries
from highest to lowest okay let's go ahead

02:57:30

and run this bad boy bam now we have all these different jobs and I can see that I
need to apply for y combinator and it's a remote job that pays $650,000 now I went ahead
and removed this table name prior to to these column names some of you may run into
problems if you don't include this but I didn't run into problems with it so I believe shorter
and brevity is better in all the cases if I go ahead and remove that all for all those different
cases make this look a little bit more concise running
02:58:01

this query I still get those same results all right now it's your turn we're going wrap up
this final problem so we can finally moved into that portfolio project we're building with that
see you in the next one then nerds welcome to the project section of this course so you've
already learned so much in the basics and also Advanced section with using SQL queries but
now it's actually time to put all that knowledge to the test by building a Capstone project
now there's a few goals with this project we're going to be

02:58:33

working with that same data set that we built inside of the advanced section and
exploring it further and your goal for this is twofold you're going to be looking to analyze
what are some of the most top paying roles and also skills but there's going to be a list catch
to this similar to the previous queries you don't have to necessarily limit this search and
follow exactly like I do searching for data analyst remote jobs you can actually fine-tune it to
your job of choice Additionally you can change it to things like the location

02:59:03

you are within the world you should need to research into the data itself and find a
close enough location or locations so for this there's going to be a few different deliverables
that you're going to be able to Showcase as experience with the work that you've done I'm
going to show it here within the project SQL folder that you're going to be building and I
have five files here for the five different queries we're actually going to be diving into and
finding further insights about not only roles but also skills for top

02:59:31

paying jobs additionally with this we're going to be building out your own personal
read me file and this is basically a descriptor of the project itself and it's going to go in and
actually showcase all the different analysis you did and besides just analysis we can also go
as far to show show some of the results that we find from it and we're going to be using
everything that we find in order to draw some conclusions and help you out in finding out
what are the most optimal roles and skills you should be pursuing
03:00:00

so this project can not only be rewarding for those looking to learn SQL but also maybe
if you're job searching or looking for promotion you may able to find insights within this data
set and use it in your real life now all this work you did is great but if you just keep it on your
computer and don't tell anybody about it does really no good so we're going to be working
on this also by showcasing this to things like GitHub and Linkedin both of these sections
using GitHub and Git along with LinkedIn

03:00:27

are completely optional so whenever we get to these one of these sections I'll call it
out and let you know that hey this section we're working on building a repository you don't
need to pay attention to it if you don't want to but I highly recommend that you do anyway
with GitHub we're going to be actually able to Showcase not only all the different SQL that
we have which are all these different files right here but we'll be able to Showcase mainly
that readme that we built to Showcase all the

03:00:52

different work that we did and our findings and you may be wondering what questions
are we actually going to be exploring for this well they all revolve around top paying jobs and
also skills and so we'll be building on each one of these with each of the queries we do and
finally ending up with probably the most exciting one trying to identify the most optimal skill
something that's a high demand and also high paying we're going to dive much further into
this after the next section we're going to be actually

03:01:20

creating our project repository to thus upload into GitHub one quick note for those
that bought the course notes and certificates inside of your notes you're going to have
access to the questions and all the different problems that we're trying to solve with it in
addition to this we're going to break it down even further inside of here by including things
like the reasoning so what's going on behind each step of the query and why we're doing
each step along with the final query itself and what expected results you should be
03:01:51

finding all right in the next video we're going to be diving into actually setting up your
GitHub account and setting up the project repository so we can get this thing started to host
online so we after we get it completely built we'll be able to put it all there like I said at the
beginning this portion of the GitHub integration is completely optional but you learn a highly
valuable skill of git so I do recommend that you do it all right with that I'll see you in the next
one so what is a repository well it's a

03:02:26

personal library for your project where you can keep manage and record every change
if you know of track changes in Microsoft Word this is very similar to that so how does this
version control work well there's a special folder inside of your project that we're going to be
installing that's going to track all the changes that go through it here I have another data
science project inside of my finder window and it has all the different contents that are
necessarily visible to me but in this one there's actually some hidden files

03:02:58

there's this one for vs code we don't really care about we care about this one right
here onget and that's how we're going to be maintaining our Version Control inside of this
folder it tracks all the different changes that we're going to be doing to our project exploring
the contents of this is sort of out of the scope and probably above my PR grade but it's just
important to understand the purpose of this folder so on our local computer ourself we have
this project or if you will a working directory and then as we're working on

03:03:27

it and collecting new documents it's going to go into a staging area and once we have
something we want to save or commit we can then do that and it will save to our local
repository which is in that dogit file now all of this is on our local machine or our computer
locally but let's say we wanted to collaborate with others or even share this like we're going
to be sharing with this project we could then host it to something like GitHub in this case it
will then host a copy of all of the different content that we have locally

03:03:59

on our machine so in that case where I showed you my previous project that located
here on my machine I can also go up to GitHub and this is where I'm actually hosting that
same content at this is would be my remote repository so let's actually get into doing this for
our project and there's a few steps we're going to be walking through for this the first if you
haven't done it already we're going to go through and actually install git git is the most
popular version control system so we're going to be using it for this next we're

03:04:28

going to be diving into setting up a GitHub account if you don't have one already and
it's pretty simple once we have that we can now get into actually initializing a repository
within our project so basically we're going to be creating that dogit folder inside of our
project and then now that we have this local repository we're going to want to push this
repository into GitHub and thus we're going to have a remote repository for the world to see
so let's get into download and git don't be afraid if you're unsure if you have git

03:05:00

installed or not as reinstalling it's not going to cause any issues anyway navigate to this
URL that's on the screen right here or just simply Google get download and it should
navigate you to this page from here select the operating system of choice that has yours and
download it for Windows it's pretty simple as you're going to walk through a similar
download process that you did like postgress and it'll take you through all the steps of the
process and you'll have it installed for Mac users like me it's a little bit more

03:05:31

complicated you're going to want to open up your terminal and then from there insert
this Brew install git and you're going to press enter and execute it now if you don't have
home brew you're going to navigate to the link here that it has and and once again it's very
simple you're going to just copy this command that it gives you right here for installing home
brew and then inside of your terminal paste it and run this command then once you have
home brew installed you run this other command of

03:06:00

Brew install G anyway I promise you that's the last time you'll see the command line
for this course now that gets installed we need to move into creating your GitHub account if
you don't have one already because you need an account in order to host your repository
here's my GitHub account right here has information about me and all my different Social
Links and then from there it has all the different projects I've worked on here along with um
more of them in this one right here so you're going to navigate over to

03:06:27

github.com signup and it will walk you through the setup process to set up your
account inside of GitHub once you have this account set up I'd go in and actually edit your
profile putting a picture in and then updating all the relevant information that you want to
include on your profile all right so we have git installed on our computer and we have have
our GitHub account now we need to move into actually creating that local repository and
then pushing it to GitHub for our remote repository now there's

03:06:58

four major Ops you can do for this we're going to only be going through the vs code
option which I feel is the simplest for those that buy the course notes and certificate I have
detailed instructions for the second and third option as well if you want to do that all right so
let's get into creating that repository using vs code so inside of here you should have
navigated or have your project open for me the project is SQL project data jobs analysis I'm
going go ahead over to the activity bar and

03:07:28
select Source control and for this we have two different options the first is just initialize
a repository so basically create that local repository that dogit folder for you then to have
locally but this doesn't push to GitHub so we actually want to do the second option of
publish to GitHub as it not only initializes a repository locally but also it pushes all of your
different content that you have here to GitHub now I've already logged into GitHub so
whenever you click this button it may prompt you to go through the login steps

03:08:03

to set up your GitHub account to be associated with your vs code profile right here
walk through that process and then you'll get to the next step that I'm going to be at so I'll
click publish to GitHub it's going to ask do I want to publish to a private repository or a public
Repository I want this to be available for the world to see so I'm going to do public then it's
going to ask me which files you should include in this repository this is very important that
you get this step right as I had to repeat this

03:08:29

earlier I'm going I made a mistake so for one we don't care about these dot files right
here these are hidden files that have no effect on showing your work so I'm going to go
ahead and uncheck them additionally these are the other folders I have right now I have that
SQL load that you had that CSV files of all the different CSV file data that we Import in our
database and then I created another folder for advaned SQL that had all those different
problems that we work so far in the advance section this SQL files folder is

03:08:59

extremely large and so because of that GitHub isn't going to allow you to upload all
that data to it Additionally you don't really need to be keeping all that stuff on a GitHub
anyway to show your work so I'm going to go ahead and actually uncheck it so for you you
should only have one folder and maybe two actually created this point that you want to
leave checked you're going to go ahead and click okay it should give you this message of
successfully published this project to GitHub I'm going to go
03:09:28

ahead and open on GitHub and Bam here we have it inside of GitHub we have our
folder that I have for advanced SQL of all our different files that I created our SQL load for
the files to create our database itself and then this dog ignore file and I'm going make this a
little bit bigger if you recall these are the same files that we basically unchecked from before
they were put into thisg ignore file hence they're not being tracked in git and because
they're not being tracked in git they're not getting

03:09:59

uploaded to GitHub so we just did all four of the steps necessary to get git and GitHub
set up for this project and just to Showcase this is our project that we have right here I'm
going to go ahead and show our hidden folders that we have inside of here such as that get
ignore and also get so we can actually see that we're tracking all the changes inside of here
don't check don't touch any of these files right here but I just wanted to Showcase where all
these are and that they're actually in here in the

03:10:27

project now when I navigate back into VSS code I'm going to see that yes it has that
new do ignore file then you're going to also notice that these two folders here are grayed
out and that's because they're no longer being tracked by Source control and because
anytime we do any changes inside of them they're not going to be tracked so they're gray it
out to let you know so we're going to walk through two different ways to actually Implement
changes inside of your working directory and local repository and then have it updated

03:10:57

inside of your remote repository so the first thing we're going to be looking at are
pushing changes and this is basically uploading your local content that's been adapted or
updated and then pushing it up to GitHub so I'm going to create a folder called project and
then SQL this is the folder or directory we're going to be using to store all the different
queries that we're going to be working on during this project section inside of this project
SQL I'll go ahead and create a file and I'll name this
03:11:27

we'll keep it simple right now query 1sq now after creating it it should open up here on
the right and you're going to notice that these contents are green and if we navigate over to
our source control tab we can see that it's actually tracking that it not only it sees that we
have created this new folder but also this new file anyway I'm going to just just come inside
of here um I'm going to put in a comment delete this later and I'm going to save this I'm
going close out of this file so let's go ahead and now push these

03:11:58

changes so inside of source control I'm going to come up here and give this commit
that we're going to be doing a name specifically I'm going to just say name this push
example because we're doing a push example from there we're going to go into actually
committing it so we'll press commit and now these changes have been updated inside of our
local repository and these are stage changes if you will so now we want to get these changes
that are on our local repository up to the remote repository so we're going to then sync
changes so I

03:12:31

can see that there's nothing here in changes staged anymore everything should be UPS
to date let's go into GitHub and see if it was updated so inside of GitHub I'm just going to
refresh this page to see if it has and now we have this folder of project SQL which has our
query inside of it saying like this layer so that was example of how we can push changes
from local repository up to our remote repository now let's actually look at an example of
pulling changes we're going to make an alteration on our

03:13:00

remote repository specifically inside of GitHub and then from there pull it onto our
local machine to be in our local repository so you can also make changes inside of GitHub
itself in this case right here it's saying hey you want to add a readme and we do want to
eventually create a readme so we can just do it here if we wanted to so I'm going to select
add a readme and so I'm going to put a message in here to update the contents of this later
and now we're going to commit these changes to our

03:13:30

remote repository we have our commit message right here of creating a read me then
you should have your email that you're using for this when to commit it directly to the main
branch so I'll commit changes all right so now we have this read me here inside of here and I
can see the read me down at the bottom D update contents to this later now back on my
local machine inside of vs code this file is not there that's because we haven't done a pull yet
to get that information or get that new file here so

03:13:59

we can do that by going over to Source control and then selecting these three ellipses
right here and then going under pull push and specifically selecting pull now when I navigate
over to the explore menu we now have this read me file right here which the two do update
content toist layer all right so now we have our repository all set up not only locally but also
remotely and we're now ready to go ahead forward with actually doing our queries and
getting everything done for building our project now at

03:14:29

this point you know everything that you need to know for git however you'd like to
learn more I have this video right here where I go into further detail on how I use git as a
data analyst and break it down even further so feel free to check that video out I'll link it
with all the other resources I'll be providing for this course all right with that I'll see you in
the next one where we jumping into the queries all right so let's move into that first
question that we're going to be solving using a SQL query in order to

03:15:01

understand what are the top paying jobs for my role in my case I'm a using data analyst
as a reminder we're going to be walking through five separate questions and these all really
build on each other so we're going be gaining insights from each and then by the end we're
going to have a more holistic picture of what roles and what skills we should be targeting so
if you haven't done so already create that folder where we putting all this I created this one
called projector SQL and then from there

03:15:28

start a new file I named this one query 1 initially I'm going to actually go in and rename
it specifically I'm going to give it that one to make sure it's up at the top whenever we
actually save these files sequentially and then also name it top paying jobs cuz that's what
we're trying to solve so at the top of my file I'm going ahead and put what question we're
going to be solving and then also breaking it down into what are some deliverables I want
from this so the deliverables are I want to identify the

03:15:56

top 10 highest paying data analyst roles that are available remotely on top of this I
want to focus on job postings with a salary so I want to remove anything that doesn't include
a salary because obviously we don't know if it's top paying or not and finally I just put in
there why we're doing this overall it's going to offer insights into our final problem question
and answer we're going to get into of finding the most optimal skills and most optimal roles
to be pursuing as a data analyst so let's

03:16:26

start building our base query and we're going to start with these six columns first they
target things like the job ID the title location what type of job job it is whether it's contract
or whether it's full-time the salary what we hear about and then actually a job posting date
additionally we want to do this from the most important table that we've been doing is that
job posting fact table to make sure that we did this right as always as we're inating along I'll
be running these queries to see

03:16:55
that they're running correctly so pressing command D command D and then selecting
what actual database you're going to be using for this if you have to uh navigate out of this
I'm going to go through and set it back up and boom we're getting these results for the
different job tiles job locations job schedule type job posted date okay this looking good so
moving into that first bullet point we want to identify data analyst roles and those that are
available remotely so for this we'll be using a where keyword and the

03:17:26

first thing I'm going to specify is job title short equal to data analyst and then we want
to do an and statement because we also want to get those that are available remotely so we
have to specify a location so for location we'll specify it as anywhere so going to go ahead
and run this Now command D command D and boom looks like we're getting all these data
analysts r rolls anywhere and then oh we're getting a lot of null values for that salary year
average column so that's probably what we're

03:17:53

going to want to tackle next which is conveniently our second bullet point that we
need to do we need to remove all those null values so we're going to do this add to this sare
statement more by adding an and statement and then adding that salary year average equal
to or sorry is not null so let's try this again command D make sure that we're working and
yep we removed all those null values all right so the last things for this are we need to one
we want to get the top 10 so how do we do that well

03:18:25

first we know we want to do a limit statement we're going to limit it to the top 10 and
the other thing is we actually need an order buy and for this we'll do the salary year average
column again and remember this is initially or by default it's going to do ascending so we
want to do descending running this query we're getting all the different job and then they're
going from high to low and if you remember craw from earlier we had one that was around
650,000 and that was here now I'm noticing with this I
03:18:58

actually I want to know one other thing uh with this and that's company because this is
really great that it tells us these type of roles but I'm actually curious that not just these are
remote and it looks like that most of these are all full-time but also what company so we're
going to take this one a little bit of a step further I'm going go ahead and close this out I'm
going to add after that from statement I'm going to add a left join and I'm going to specify
that we want to connect this to that

03:19:24

company dim table that has the company name so and both of these are connected on
that Company ID column so I list the the table name in front of the column for both of these
and then set them equal to each other to get that left join so now I can come in include a
comma that column of the company name is name and we'll actually give it an alias uh to
make sure it's not sort of like confusing of company name okay let's go ahead and run this
bad boy and moving this over here so we can see it a little better we have all the

03:19:58

different things oh and now we get to see the different companies and so we have
things like meta so like a tech company AT&T also kind of a tech company Pinterest a tech
company it looks like some other Healthcare type companies all right so this is a great start
to our first problem we've identified the different jobs that we have and we've also been
able to see what could we expect at some of the top paying roles so now I have a good
frame of mind of what we should be targeting for and when you're working through this Feel
Free as

03:20:27

I mentioned before to modify this to your need whether you don't have to do data
analyst you can do something data scientist data engineers and you can change up to that
job location to where you are as we have locations from around the world inside of this data
set all right for that see you in the next one all right so let's get into the second query that
we're going to be focusing on and it's going to really be building a lot on the previous query
that we just did in analyzing the top paying jobs for
03:20:59

a data analyst specifically here's the results of that query again and I feel like it's still
lacking I know I added that company name into it to dive into it deeper but I don't feel like
that really answers the question that I'm truly trying to find out of what skills are also inil
important so that's what we're going to be doing with this diving in to find out what are the
top skills inside of each of these roles that are the main drivers behind why this salary yearly
average value is so high since

03:21:31

we're going to be building on this further I'm going to go ahead and copy this query on
over into our new SQL file that're going to doing for this and I'm going to create a new file
started with two and I'm going to name it top paying job skills and also make get a SQL file
press enter and then I'm going to actually go ahead and paste that query in and additionally
I added in what question we're going to be answering this which are around what skills
required for the top hang roles we need to be really doing two main things one

03:22:03

using that top 10 highest paying roles that we identified previously query and
somehow combine it in and join it with all of our skills now this because we've previously
built a query this is a perfect opportunity for something something like a subquery or CTE
since this is a little bit more complex we're going to be going with a CTE for this once again
let's make sure that this query Works while running it I'm going to move it over here and we
can see from this there's a few columns we can actually remove from this specifically I

03:22:34

don't really care about that job location or job schedule type and the job post to date
isn't really providing much value either I'm going to go ahead and clean that up and move
those out of here so let's start by making this into a CTE we do that with that with keyword
and we'll specify this as the top paying jobs result set and we'll put the parentheses to
enclose all of this in I'm actually going to select it all tab it over to make it have a nice little
formatting and then put some closing parentheses right there make sure the CT

03:23:09

is working properly I'm going to just go ahead and do a select all from this topang jobs
query right here or top paying jobs result set and I'm going to go ahead and run it to make
sure that it's running properly and looks like it's good if you're call from our diagram of the
actual database itself we're going to need to connect two tables of this so not only the skills
job dim table but also the skills dim because that's actually what's going to have the names
inside of it but what join method are we going to be using to

03:23:41

join this to our job postings fact table well remember from a join section there there's
typically only two joins that we're going to do it's going to be a left join or an inner join now
in this case that a table is going to be our job postings fact table and we really only care
about skills associated with a salary so if there's a job that doesn't have any skills we don't
really care about that too much so left join is not going to be applicable in this case instead
we're going to be going with

03:24:13

inner join so I'm going to start with an inner join on the skills job dim specifying for this
one we want to connect the job ID from our top paying jobs result set with the skills job dim
job ID additionally we want to do an in join on our skills dim table connecting both of these
on their skill ID column now right now I have a select star for all the different columns we're
just going to go ahead and run this to make sure that it works pressing command D
command D and then moving this over here

03:24:43

looks like we were able to join this all on but we have a a few columns in here that we
don't really care about so we're going to remove those so first for that top paying jobs result
set that we have here I want all of the columns from that and I could go ahead and write out
all of these columns but instead I'm just going to do this of dot top paying jobs and then do a
DOT and Then star and this is going to select all the columns from that table additionally I
want the skills from this skills dim table okay

03:25:15

let's go ahead and run this all right so it looks like the quer is working I do want to do
one last thing before we dive in and that's make sure that we actually organize this by the
salary so I'm going to start looking at those first so we're going to put a last statement of
order by and specify that we're going to do this for salary year and the same thing putting in
in that descending order this order buy may or may not be necessary depending on how it
queries it may come already in an ordered fashion but you should just do

03:25:47

this for best practices in case that this does happen to you and you want to get in the
order and it doesn't anyway let's select all this and actually run this query and start analyzing
it all right so this is pretty interesting I took a peek through all these different jobs and the
skills that are requiring now overall there's a lot of postings and we're going to actually use
some other tools to analyze this real quick but from what I'm seeing actually looking at it
there's a very much a commonality of seeing SQL and python in

03:26:15

almost every single one of these rules but with all these results with how much it is
you'd really be better off using it in your analytical tool of choice could be something like
Excel python whatever and you could do this by extracting this data out by Saving results
and you can save the results as a CSV so I'm saving it inside my desktop and exporting it out
anyway this portion is not required but I went ahead and gave chbt this CSV and told it these
are the top 10 diis roles I found in job postings in 2023 can you analyze the

03:26:49
skill column and display the insights and after I used some python code to go in and
actually analyze it it found that that sqls leading and eight of the different roles python is in
seven and Tableau is in six he even went a step further and had it make this visualization
which shows a lot better visually what is going on here with all this data that we just
collected anyway I'm going to go ahead and copy this Insight right here on the skill
breakdown for that and actually go in and paste it down

03:27:20

here underneath our query that we did additionally if we want to use these results
later I'm going to go ahead also and click this icon here to open the results as a Json file I'm
going select all of this by pressing command a and then coming underneath here making
sure I'm inside that comment right there actually pasting all these different results so if we
wanted to we can go back later and verify what results I had from this query anyway just be
clear this chat gbt portion completely optional I just wanted to dive into it

03:27:50

further and pasting the results completely optional the main portion is we wanted to
make sure that we have this query right here as this was solving what we were trying to find
all right now it's your turn to give it a try I'm curious to see from your perspective especially
if you're using something different from data analyst or a remote location how those skills
are different well with that see you in the next one all right in this portion we're going to be
diving into what are the most in demand skills particularly for me I'm be

03:28:22

looking at data analyst now if we look back to what question are be doing for this
we've already answered the first two already we're now on to the third and this one actually
we answered previously and well you can actually go ahead and use those results previously
or I'm going to actually work at a different method specifically in problem number seven we
found the count of the number number of remote job postings per skill and whenever you
run this query command e command e we found that for data analyst and also for work for
home
03:28:55

these were the top skills with surprise surprise SQL and also python topping this list
anyway if you have these query results saved you can go ahead probably just copy and paste
this then move on to this next section of query 4 but I'm going to go ahead now and work
this problem slightly different is this one right here how we have a written is a little bit
longer than I'd like it for a query to be I want it to be short and simple as possible because
we get into bigger data sets it's going to take more

03:29:25

time to run these type so let's try to rewrite this so I'm going to start by creating a new
file I'm going to name it top demanded skills and dsql all right so for this one we're going to
be focusing on what are the most in demand skills and to do this we only have a few steps
for this we're going to be joining our job posting tables to our skill tables like we previously
did we're going to reuse some code from that one and then from there we're going to limit it
down to those top five skills and for

03:29:52

this one I want to focus on all job postings not just those remote ones really just see if
there's any difference anyway I'm going to go ahead and open up that last one that we did
previously and I'm going to go ahead and copy this on over and then mainly pasting in that
inner join this is really a lot to type in so I'm going to I'm going to just streamline this a little
bit so we'll do it a simple select star just to start out from this and we'll do from the job
postings fact table and we're going to be doing once

03:30:25

again that injin to that skills table to the skills job dim table and then finally to the skills
dim table let's just go ahead and run this and see if this actually works and I already made
my first there as I didn't rename the table right here it I still had top paying jobs uh for that
result set anyway let's actually try to run this again hopefully it works all right this is running
for a little bit too long of a time and I understand that cuz we're combining all these different
tables together we're going to
03:30:53

be limiting anyway these uh to find the top five jobs anyway so I'm going to go ahead
and just throw this limit five statement here to speed up this query I was waiting about a
minute and it wasn't uh loading all right lot quicker whenever I do that limit five in there
don't do what I just did there and it looks like it's combining all these different tables I can
scroll the way to end and see that the skills got combined into this so remember what we're
trying to do we're trying to find what are the

03:31:19

most in demand skills basically we're going to find a aggregation of the sum of skills so
we're going to need to provide a count in order to find out how many SQL entries are how
many python Excel and so on so first thing I'm going to specify is that skills column because I
want to return that and then we're going to do that aggregation method we're going to be
doing a count for this let's use the job ID for this and I'll just put in sales job dim. job ID and
we'll put this as the demand count now

03:31:53

remember anytime we do some sort of aggregation we have to do a group buy so for


this we'll do the group buy and we obviously want to group buy the skills let's go ahead and
run this to see where we're at right this make sure it's running properly okay this is looks like
it's tabulating right now right now we're not having any type of sorting so that's what we're
going to need to fix next but we can at least see that we're getting those demand counts of
those different skills looks like they're

03:32:17

actually doing this in alphabetical order so let's go ahead and put that order by in there
now so order by and we're going to order it by the demand count run this again command e
command e and obviously we got to do it descending run this again all right so now we're
getting all the skills right now and it looks like sequels topping the list at around 380,000
python very close by now remember if we go back to our original question we want to find
out what are the most in demand skills for data analyst so we need to modify

03:32:51

this and insert in a we statement to filter by this so let's use that weer keyword to filter
this by we're going to be using that job title short column and then specifying that we want
to filter it by data analyst so selecting it all and running it command e command e boom now
we got it and it looks like we have SQL Excel python tableau and powerbi Topping the list
now I am curious to see how it compares to if it was only remote jobs so I'm going to add I'm
take this a little bit step further just to compare

03:33:26

it I'm going to add the and statement for the wear specify job work from home as true
ring this command e command e we can see when we compare these we still have so on the
left hand side we have those that table that is not uh that is that allows any type of work
whereas the one on the right is only allowing remote work or work from home and the
general trend is the same with SQL Excel python Tableau powerbi and it looks like it's very
much in the same proportion so there's not a lot of difference between

03:33:58

the two all right now it's your turn to give it a try feel free to once again modify this for
your specific role but also you don't have to stick to this thing where I just did remote you
can do your specific location or really anything particular to you that you want to filter down
further on with that see you in the next one all right so let's get into the second last skill of
analyzing what are the top skills based on salary now this is all in progression to learning
what is the most optimal skill if you recall earlier

03:34:32

we've already analyzed to find what were the most highest paying jobs with one at
around 650,000 for all those high paying jobs that we found we then dived into the skills
themselves and we found popular tools like SQL and python are common in a lot of these
rules and so we dived in further to look at how popular they were not just among the top
paying jobs but also all jobs and there was still a similar trend of that SQL and Python and
even things like Excel Tableau and powerbi for data analyst so

03:35:02

our ultimate goal which will be in the next query of what is the most optimal skill
looking at what is not only high paying but also highly in demand we need to actually answer
that question of what is a high-paying skill so for this query we're going to be looking at what
is the average salary associated with a skill specifically fine tuning this for me for data analyst
positions and we want to make sure this data actually does include salary data we're going
to exclude any null values and then finally

03:35:33

we're going to do this initially regardless of any location but then I'm also going to
modify it to me for my remote case where I want to look for remote jobs and what would I
expect it to be so back at vs code we're going to start with a new blank SQL document that
we we working from and have the problem outlined up at the top so where are we going to
start with this one so this one's very similar to the last query in fact because we need the
names of the skills from that skills dim table and also we need the salary data from the

03:36:02

job postings fact table and previously we did a count of values of this column but now
we just need to do an aggregation method basically doing the average of these average
salaries so navigating back to our third query that we did I'm actually go ahead and copy all
of this as a lot of this is going to be reusable and paste that in right here so if you remember
from this one we were doing the count of the skills I'm going to go ahead and actually just
execute it to show this was doing account of the

03:36:32
skills basically we want a very similar thing but we want the average salary right here
so for the time being I'm going to remove this and we have the from job postings fact our in
Joins where data analyst is true remember we're going to maintain this to look regardless
location so I'm just going to put a comment in here right now and move this and right before
this so we'll just move it out of the way for right now but we can put it back whenever we
want to analyze it for emote jobs from there we

03:37:00

have group by skills and then an order bu we'll have to actually update this as well um
I'm going to actually change this limit to a little bit heftier value of just 25 so like I said the lot
of this core query is already good enough and you could feel free to start over again with it
but I really don't like to repeat myself so the first thing I'm going to do with modifying this
query is previously we had the count up here I'm going to add in an average function to
actually average that salary and I'll

03:37:29

specify the column of salary year average now because we have an aggregation as
always we need to group by the appropriate thing in this case we do want to group by the
skills so we'll keep this as is and remember from our core things we want to focus on roles
with specified salary so I'm going to add an and keyword right here and then for this I'm
going to remove any null values that are in that salary column and also help speed up this
query a lot faster so I specify salary year average is not null and the last thing to do is

03:38:02

we have this order bu here that needs to be filled in so I'm going to go ahead and just
put this well we need to assign an alias for this so I'm going to do an as and name this
average salary pretty original and then whenever we go down here to the order by we'll
order it by that average salary okay let's actually run this and see if it works fingers cross oh
and it works okay and okay like usual I messed up and uh the order by I did it in ascending
because I did the default but we at least have all the
03:38:36

values here okay and it looks like it's working correctly also I'm noticing these decimal
places I don't really there's a little bit too much data for for me so first thing to fix the order
bu I'm going to change this to descending run this again and then let's clean up this salary so
there's a function we didn't talk about yet and that is the round function and we can put
that outside that average function and with round we need to actually specify not only the
column or the aggregation of interest

03:39:10

but then also after this we have to use a comma and we have to specify the amount of
digits want to round to I don't really care about any of these digits um so I'm just going to
make this zero you could make it two for like two c uh for um down to the 100th place
anyway let's go ahead and run this make sure this works okay we have all our values now all
right so this is the top 25 and looking through it unfortunately Python and SQL aren't
topping the list but it looks like it's more focused specifically for data analyst on web

03:39:47

development tools so like things like terraform um and then more Niche tools like data
robot so this does make sense anyway just out of curiosity I want to see how this Compares
I'm going to close out these other windows I want to see how this Compares for remote
work and I'm going to remove this comment right here and then run this section again and
looking at this for remote jobs there's a lot more familiar names that I'm seeing on here such
is Jupiter that's like a notebook where you can run

03:40:20

Python and probably the most surprising of all or coolest of all is postgress at least
made the top 25 anyway similar to what we did in query 2 because this is a lot of text Data
here a lot of analysis of what is actually going on or what all these different tools are used
for I actually went up here and select this floppy disc right here and had copy results as Json
to clipboard and I've been having an ongoing conversation with gbt got a lot of the results
from this and I just said hey here are the top
03:40:49

paying skills for D analyst the top 25 can you provide some quick insights into some
Trends into these top paying jobs and then providing all this Json value anyway it provided
three insights into what are these top paying roles consisting of for data analyst specifically
big data and ml skills are a high priority next up comes software development and
deployment and then finally cloud computing so a lot of specialized skills that require not
only SQL but also python anyway I went ahead and copi and pasted this into underneath

03:41:21

this query to basically break down what we found out in this query all right next one
we're going to be diving into what is the most optimal skill so you haven't done this one
already do it for you make sure you're adapting it to your need and then we'll be diving into
the final query which I'm super excited to get into all right see you in the next one all right
finally moving into what is the most optimal skill I should be focusing on as a data analyst or
remote jobs now if you recall from our third

03:41:52

query we found out what is the demand for certain skills granted we only limited to
Five results shown but basically we can get all the different results or the demands for the
skills in the last query we analyzed the skills from a different perspective and found what
was the average salary for each of those skills for a data analyst so honestly the easiest
solution to basically build on the code that we've already built is to use a CTE basically a CTE
for query 4 a CTE for query 3 and then combine these two results together

03:42:24

on something like the skill ID so inside VSS code I'm going to create a new file called
optimal skills give it that number CL five the fifth query and up at the top of the file I put in
what we're actually trying to solve for what are the most optimal skills to learn and we're
going to be focus on high demand which we've already calculated and then High salaries
which is already calculated as well in query 4 so the first thing I'm going to do is going to go
back and get the query from query 3 I'm just going to copy that all right
03:42:53

here and then paste that in remember we're going to want to put this inside of a CTE
we're going to be putting both three and four inside of a CTE so I'm going to title this with
skills demand as and then start that CTE right there and then from there tab this over
remove that semicolon and close the parentheses Okay so so I have the skills demand
coming in its own result set using the CTE the next thing I want to bring in is the number 4
query that we have so I'm going to copy this one also and paste it

03:43:28

over here I'll name this one average salary for the temporary result set and then start a
parenthesis here tab this over remove that parenthesis and then from there add in another
closing parentheses right here okay so we have our two CTE located right here so I'm going
go ahead and close out of this and close this side window we have what we need for this we
now need to move into actually combining these and if you look at it we could technically
combine it on the skills because each of those skills

03:44:04

are unique if you will but that's not really best practice we need to typically whenever
you want to combine anything you want to combine it on the actual um the key at itself the
either primary or foreign key so we're going to be using the skills ID for this so I'm going to
specify skill ID here and then also skill ID here as well now since we're going to be combining
these I don't want to limit this to five and I don't want to limit this one to 25 I want to
combine all of them that way they do the agation also I want to speed

03:44:36

up this query I don't care about an order by so I'm going to remove that in this case
and similarly I'm going to remove that as well here and then the last thing I'm going to
modify is the group bu so like I said previously we're going to be connecting these two tables
on the skill ID we can do groupy skills it's just not bre practice so we're going to change that
to the group ey of skill ID same thing for the groupy down here on skill ID so that way we
know for sure whenever we do this that we're

03:45:06

actually aggregating it correctly the last minor thing to note is for my things we're
going to concentrate on remote positions with specified salary so previously for our query 3
we were looking at all the jobs and not necessarily those that have or are missing a salary
value so I'm going to select this from query 4 selecting salary year average is not null and I'm
going to also insert it up here as well anyway now we have both of these CTE built we need
to actually move into combining them so we're going to move

03:45:41

into our select statement we're going to first identify the skill ID we have this in both so
we're going to just specify the first one of skill demand. skill ID next we need to identify the
skills so we'll do similarly with this specifying the skill then we need to bring in from query 3
that demand count along with that average salary from query 4 so now when you move into
combining these we'll use a from statement specifying the skills demand table or temporary
result set and then doing an inner join because we only

03:46:17

care about what exists in both of these tables and we're doing this with the average
salary temporary result set and both of these are combined using the skills ID so this has the
bulk of what we need already let's go ahead and run this because it's already been too long
already that I could have made a mistake and not caught it so command D command D and I
did run into an error and it resolves around this with statement right here so anytime we're
doing multiple CTE you actually group these together so this with keyword transfers

03:46:52

this into a temporary result set of the skills demand and then you put a comma and
then similarly we do this average salary so we're D two CTS right here so going ahead and
running this command e command e and when I added in those skill IDs it was ambiguous
sometimes you'll notice in if you bought the quz notes and certificates I'll just include all the
different table names in front of these and that's just so you resolve this ambiguity and don't
run into these different errors like I'm running into

03:47:20

right now so let's try this one more time to see actually if this works pressing command
e command D and spoke too soon again I didn't get all the skill IDs again so we got another
error and saying skills must appear in the group by Clause so I'm going go ahead actually we
already have skills up here with our count and that one should be fine so I'm going to
remove it here let's see if if this works all right so I've been troubleshooting this skills must
appear in the group ey and I know it can be done and anyway I come to

03:47:54

realize that compared to our I got to close out this so we can actually see it so in our
first CT We Have Skills ID and also skills we want that skills to be in there right because we're
going to be using this and actually displaying this in our final table and I don't want to do
another left joint down the road anyway I was specifying the wrong table earlier it should be
skills job dim and in this case I specified now skills di to clear up that ambiguity and so now
we should actually get this to work command

03:48:25

D command D okay and finally hopefully the last error I'm now getting this relation
average does not exist I'm like I know I can do average in here I got a dang typo right here on
the inner join and I specify average space salary so wasn't necessarily working oh my gosh
this is on me right now one other typo I don't have skills properly in there all right so we're
finally through that after quite a bit of troubleshooting but honestly that's what I find myself
doing from time to time with SLE queries especially when

03:49:03
combining it sometimes it looks like it's easy just to combine two queries and put it
into a final one and you run up to a lot of hiccups along the way so I wanted to include that
to show that sometimes you're going to have to go through these troubleshoo shooting
steps with this anyway we have our results back I'm going to move this over to this window
and we can see from it we have our skills our demand count and then the average salary
associated with it right now it's not really in a particular

03:49:30

order so we need to clean this up it looks like it's ordered by skill ID so I'm going go
down to the bottom of the query after the inter join and put in an order by and really I care
about demand first so we're going to put in demand and count put in this descending order
and also you can put more than one things to order by so in case ever they have the same
value we'll go to the second value to thus order that one so in this case we're going to do the
average salary and then also do this in descending order and finally we're going

03:50:03

to limit this similar to last time of just 25 values so I'm going to go ahead and run this
query now to get this all right so here we have it moving on over here here we have our
results and right now it's ordered by that demand count but then we can still see based on
what we know about some of the higher salaries for skills upwards of 150 to 200,000 we're
seeing these high demand skills are around 100,000 so I also want to see this this table
ordered from the average salary perspective first so I'm

03:50:38

going move this around with average salary up before the demand count and then run
this aggregation method again command e command e and this time as we can see
whenever we do it from this perspective we are getting these higher salaries right here but
this demand count is so low for these higher salary jobs based on what I'm seeing from that
previous table I'm wondering if we could do a wear Clause to limitate by the demand count
to make sure that we still have these high-paying jobs in there and
03:51:11

ordering it similarly but have a demand count grade than 10 so I'm going to add in this
of where and then specify the demand count greater than 10 okay I'm going to go ahead and
run this new query to see what it looks like for this one all right and I'm liking this one a lot
better it's a lot more represen istic of what we would expect because there's actually values
quite AET bit of job data values for each of these so just analyzing this roughly doing a scan
through it I can see that once again

03:51:45

we're seeing that one Cloud tools and specifically cloud-based databases are some of
the highest in here and then down here near the middle of this list of only 25 we're seeing
uh programming languages like Python and R now although this was a great way to
demonstrate the use of CTE I would recommend actually making this more concise and so
that's what I did here I went through and actually rewrote this entire query keeping it uh
pretty concise and all it basically did was actually combine in all those select statements into
one

03:52:21

that were previously spread out among those two CTS and I still had the same wear
group eyes and ordering the only thing I'll add is that little last caveat at the end where I
wanted to have the demand count greater than 10 which is just an arbitrary number you
can't put an aggregation method inside of a we so I had to do a having keyword here to
actually do that anyway running this all I can do command e command e and comparing it to
our previous table there's our previous tier table on the left and then our new table it's the

03:52:53

exact same results anyway mainly just show you this that there's multiple ways to
actually go through and analyze and what can be done and what can't be done all right next
thing we're doing is going to be actually packaging up all of our project that we have right
here generating a readme and then actually going out and sharing this project on How I
would go about doing this all right with that I'll see in the next one all right so now it's time
to actually showcase all the different work we did and doing this by uploading this
03:53:26

into GitHub but before we can do that we actually need to streamline and put all of
our analysis into one document so inside of vs code right now I have a few different folders
you may not have this Advanced SQL that is all the problems that we worked dur in the
advanced SQL section but you should at least have this project SQL folder which has all your
different SQL files that we worked on previously along with that SQL load table on how we
actually loaded the data into the database so let's actually

03:53:55

upload this all on GitHub and see how it looks right now and going into on the activity
bar on Source control see that it has all of our new files I renamed one of my files why it's
scratched out here and I give it this commit message of upload SQL files and then press
commit and then I want to sync changes navigating into GitHub can see my projects located
right here at SQL project. job analysis and this right here is what everybody's going to be
seeing when they first get to the project and overall there's not a lot

03:54:25

here as you can see that read me that created there's nothing really detailing it and
you have to navigate into all these different folders to even see what is going on here like
how do you even navigate it so here's an example I want to work off of I have a course on
chat GPT for data analytics and then you go through and use chat GPT to generate code
basically gener a project anyway in this project here I have for the read me it details all the
different analysis we do for that project there going through

03:54:55

describing it all and describing all the different code that's actually located inside the
repository so that's what we're going to be doing is building out a readme in order to handle
this so for this readme file I want it structured in a logical manner to actually go through and
so somebody can read it and actually follow along in all the analysis we did so we're going to
start with simple things like an introduction a background and then tools I used from there
we're going to move into the analysis the

03:55:20

analysis will be the bulk of the session of actually capturing all five of those different
queries that we analyzed and then finally we're going to wrap it up with what you learned or
what I learned during the analysis and then any final conclusions that we came to or Drew as
far as insights so back inside of vs code let's actually get to work I have the read me here
open and I'm going to drop in all those different sections that we're going to be working on
for this now conveniently inside of here

03:55:49

this is this is a text version right here so you can insert all this kind of text right there
but we're going to want to open on the right hand side this open preview and this allows us
to see what it's going to look like in the markdown version or the final version whenever
we're actually inside of giip Hub specifically if I navigate back to that chat gbt one and I
navigate here I can see like hey it's all formatted stuff but if I actually go inside of the read
me itself and look at the code basically

03:56:18

it's just a bunch of text and so there's certain format that you need to do in order to
get things like headers here or links or whatnot let's go over a few brief ones that you should
know about as we're building this out so the first thing is headers I actually all of these need
to be headers and what do I mean by that I'm put a little pound sign right here in a space
whenever you use that that is a header one I can also do two pound signs to make a header
two or header three whatnot for each of these

03:56:49

I'm going to make them individual header ones all right so I have all the different
headers now I'm going to be going through and actually filling this out let's start with that
introduction section first so I make this intro of dive into the data job market focusing on
data analyst roles this project explores top paying jobs and demand skills and where high
demand meets High salary in data analytics and then I give it this of SQL queries check them
out more here so need to insert a link so for this I want to provide a basically a

03:57:18

clickable link so I know that it's going to be going to this Advanced sorry it's going to be
going to this project SQL folder when they navigate to it I want them to see this so I'm going
to start with some brackets and then from there actually Define the folder that we're going
to and that's project SQL folder and that's just what's going to be actual viewable then from
there I'm going to put a parentheses next to it as you can notice right here well let me close
this out right here as you can

03:57:43

notice it's now again clickable but it's not going anywhere so we want to put some sort
of Link inside of it so I press forward slash and then inside of here conveniently all the
different folders pop up that I want to link to link to I put project SQL and now it should be
good so whenever I click this it should want to navigate me over as you're seeing here but
whenever I have this in GitHub it will actually navigate them over to that folder next up is
background and I'm not going to bore you by reading through all this but

03:58:14

basically I went through summarized what the background for the course was also put
a link to the course and then from there defined the five questions that I wanted to be diving
into that we did in our final project analysis next up is tools I used and I wanted to
summarize four or five main tools of SQL postgress vs code and then also git and GitHub and
I put a little disclaimer across each I'm noticing right now so you can use something like a Tac
or a dash to get it in bullet point within here but I'm

03:58:47
still having a hard time actually reading what are these different tools I have I can
actually bold these different values by putting double asteris before and after any value it
then goes ahead and bold cases it all right next up is the analysis we're going to start first
with just putting a blanket statement of what we're actually doing here that we're aiming at
investigating specific aspects this is can to be broken up into multiple sections so the first
one I used a header three and then specified

03:59:13

hey it's number one one one that we did of top paying data l jobs by the way we're not
going to go through all five we're going to do one of these from there I'm going to put a
quick summary in and so for this one it was just hey to identify the highest paying roles and
then what I filtered it by and everything like that now I want to showcase my code from this
query so that way they don't necessarily have to go back to this file right here and actually
see it instead I'm going to copy this code right here I'm going to

03:59:39

close this panel we're get a little smoed and I'm going to put it into here the problem is
if you look at it it looks like a hot mess over here so what we can actually do is format this as
a code block so for this I'm going to put three back ticks one at the beginning and then one
at the end and so if you look inside of here now we can see that this is all formatted
correctly and it actually looks like SQL code additionally it's good practice inside of here to
specify what language this is and so inside of this markdown file

04:00:11

right here it actually if you notice that it form formatted it for what it is here and
similarly it's doing that here as well with this color highlighting so making it a lot easier to
read so definitely recommend doing that following this clo boock I wanted to top this all off
with the findings so I put in here hey here's the breakdown of the top. UN list jobs in 2023
and then had three different insights of the results that we had found earlier now if you
recall from earlier during that second
04:00:38

query when we dived into it I had chat gbt visualized the data that was provided well I
also had Chad gbt visualize the results from this as well to showcase the different salaries
associated with the top 10 jobs anyway I want to show some of these visualizations inside of
here to better convey some of the analysis that we did here so anytime we need to include
these type of images inside of marktown we actually have to include the images inside of
the project file itself so I'm going to create a new folder called

04:01:13

assets now this option of including an image inside of here is completely optional
whether you want to do it or not but it shows even more functionality of how to use
markdown anyway I can take that folder or take that file itself and then go ahead and drop it
into here and so we can see it's right here in top hang roles I'm going to go ahead and then
go and select copy relative path now in order to display an image in markdown you need to
use this nomenclature here where you're going to have an

04:01:45

exclamation point and then brackets around some sort of alternate text for this I'm
going to name this top paying rules next we need to do that URL so I copied that relative
path I'm going to go ahead and paste it in you notice whenever I do that now it's accessing it
right here from our folder so then I just end this off with a little comment at the end in italics
that hey this is the bar graph visualizing these results and also that chat GPT generated this
graph for my SQL query results all right

04:02:14

so now I'm going to repeat this for all five of those other areas as well all right so I've
gone through and put all this in this took me about uh like 30 minutes to go through and do
right so here's what I did very similar to that first section as I continue the same layout of
just outlining it and then providing an a quick snapshot of what the breakdown is and then if
I had a graph I included it now when I got down here into queries 3 4 and 5 I found
visualization were less uh helpful for this so I started inserting tables did
04:02:48

that for both four and then also for five proving that final analysis now tables
themselves are pretty easy to put into markdown you can put it in with this format right here
I found that actually just copying and pasting the results from here in vs code into something
like Chachi BT and having to get format for me saves a lot of time so that's what I did for a lot
of this so last two sections that I'm going to include the one is what I learned this really
going to be dependent on your situation where you are in your Learning

04:03:21

Journey I just put in these three examples of hey we worked on querying data
aggregation and then also just getting into actually analyzing so I named it analytically
wizardy all right the last section to move into is the conclusions and I'm going to break this
into two separate sections first is insights and I'm going to just capture all the different ins
sites that we had up here from before and captured into just five main different value points
that I found from this analysis and then finally I'm going to have some closing

04:03:55

thoughts for this section I'd really be looking at what you actually took away holistically
from this project for me just working on this alone really built up my sequel skills although
I'm already you know pretty confident in those skills I feel actually going through and
teaching you through this I learned a lot myself so so I put a lot of that in the closing
thoughts here anyway so this read me has everything we need inside of it I'm going to go
ahead and save it and then I'm going to upload it into GitHub

04:04:24

for this it's going to be committing those two images that I had in the readme along
with the changes to the readme so now when I actually go to this project here I actually
have a fullblown layout of all the different work we did and all different analysis here I'm
pretty blown away this is a lot of work we put into this you should be super proud of this the
other thing I'd suggest is actually pinning this repository to your profile so you can come up
here to customize your pins and I'm going to come and select this SQL
04:04:55

project data analysis and save those pins and then from here if I wanted to I can drag it
around and even put it up at the top all right now that we built this entire file and this
readme we need to go forward with actually showcasing it so that's what we'll be doing in
the next section of actually getting this onto something like your LinkedIn to show case as
experience all right with that I'll see you in the next one de nerds congratulations on making
it to the end of the course been nothing short of your hard work and there's a

04:05:28

couple steps I'd recommend taking further now and actually showcase your work so
it's been great that you put this work on GitHub but now you need to get the word out and I
recommend doing this via LinkedIn there's three main things you can do for this one add the
certificate of completion for those that purchased course no certificates which it's not too
late to do that you can add this to your profile another thing you can do is with your GitHub
project now on the internet you can also showcase

04:05:53

this project on your LinkedIn profile and the last and third thing that we'll cover is a
social media post so after you complete the end of course survey I will send you an email via
an Automation and you'll receive the certificate of completion with this email you can go
ahead and then download that certificate to your computer after that you should never over
to LinkedIn specifically to your profile and in it make sure that you have your certificate
section and also your project section enabled within

04:06:22

your profile by going underneath the recommended and adding these two for the
certificate you just navigate to that section click the plus icon and then from there fill it out
most of the information here is self-explanatory for the issuing organization you'll be able to
put in my name of Luke barus for the skills I would list these five core skills of SQL postgress
and SQL light now if you did the GitHub portion of this you can also put in git and GitHub
finally it has an option to add media and you can go through and actually

04:06:51

upload that certificate that you had from previously from there click apply this
obviously in my certificate so I'm not going to save it now for sharing your project you're
going to navigate down to the project section and then click similarly that add icon now this
one's also pretty self-explanatory you're going to put in that project name a short little
description I have about what I did and what I investigated and what I found similarly I put in
the same skills that I had in the certificate

04:07:13

section and if you obviously you did the project and you listed a GitHub you can list get
in GitHub when it gets to add media that's where you want to actually go in and add the link
to your GitHub repository and this way it can be actually directed them right to it from there
put that start and end date everything else can pretty much be left blank and for this one I'm
going to go ahead since I actually did do this project I'm going to go ahead and save it so
now that we've both showcased our certificate and also this project the

04:07:40

last thing we have to do is just make a social media post you prompt you either after
the certificate or even this project to maybe start a post now for this post feel free to tag
both me and also Kelly the co-creator of this course in your post I love seeing everybody's
progress and specifically diving in and actually checking out your different projects so I'm
looking forward to seeing that all right once again congratulations on wrapping up this pretty
hefty course on SQL I feel like it captures everything that you need to

04:08:07

know to dive in and be confident in your skills no matter what the field you're working
on in data science is for what to learn next you probably learn from the data that python is
another popular skill and I'll have a tutorial for that coming real soon so stay tuned but until
then I find that chat gbt speeds up my process a lot in data analytics so if you're curious
about that check out this video right here also for those that didn't buy the course notes
certificate still not too late check out this link

04:08:37

right here with that I'll see you in the next one

You might also like