
Session 2 - Methodologies and Workflows in Data

Uploaded by

PALAK SHARMA
Copyright
© All Rights Reserved

Session 2 - Methodologies and Workflows in Data Analytics

Alright, let’s dive in! Welcome to the second session of our Data Analytics Fundamentals series. This
is actually our second class, and I know some of you might be joining for the first time. No worries,
you’ll be able to keep up, but I do suggest checking out the first session if you can. Today, we’re
gonna focus on methodologies and workflows in data analytics. A lot of folks ask me why we even
talk about this stuff since it can sound a bit dry.

So, the theory is that when it comes to talking about methodology or workflow in a company, a lot of
folks don’t realize how much we rely on tools. Like, many people think that if they just learn Tableau,
they’ll land a good job. That’s kinda true, but it’s not the whole picture. The same goes for Power BI
or any ETL tool—yeah, knowing those can help you get a solid job because the industry definitely
wants people who know their stuff.

You might think that just learning certain tools is enough, but that's not the case. Sure, you could
land a job, but you won’t really know what to do once you’re there. The methodologies and
workflows we talked about really define what your job will be like. So, if you get a job as a data
analyst, whether you’re just starting out or you’re already experienced, these are the things that will
influence your everyday work right from the start.

Let's dive into this project until the very end, and it's super important that you really grasp these
concepts. It's not just about learning something, passing an exam, and getting a grade—it's way more
than that. Understanding these ideas is crucial because you'll apply them in real life when you start
working. So, we'll go over what those concepts are. I'm just giving you a quick overview, and here's
what we’ll cover today: we’ll kick things off with methodologies in data analytics, and there’s
something called...

Data analytics workflow is super important, and there's a big difference between data and insights.
This is actually my favorite part! A lot of folks think data and insights are the same. Just because you
have a ton of data doesn't mean you automatically get insights. Honestly, I think “insight” is one of
the most confusing terms out there. People tend to label anything as an insight, but that’s not always
true, right? Then we’ll talk about turning messy data into actionable insights, and that’s the last
topic.

...understand this, but as of now we are just scratching the surface of this topic. These stages include understanding the business, collecting data, preparing data, building models, evaluating, and deploying the results. The easy way to see it is to take any job. Right now we are specifically talking about data analysis, but take something which is not related to data analytics. Let's say you are working as a driver: you are an Uber driver.

Now, if you're an Uber driver, you can just get into a car, turn on the Uber app, and get some rides. Everything works. But if you want to be a really good driver, you follow certain procedures. For example, at the beginning of the day you clean your car, because you want your customers to feel that your car is clean. Inside the car you probably use some air freshener. Then you set a schedule: you say, every day I'm going to start at 9:00 and end at 5:00. While you're driving, you also try to figure out how many customers you're getting from a particular area, how many long rides and how many short rides you're getting from that area, and what the busiest time of the day is. None of this is required to be an Uber driver; you can just take your car and get some rides. But if you really want to be a good driver, you apply these methodologies.

So this is what you call a methodology: a series of repeatable steps, so that you can evaluate and understand what you're doing. Maybe you are an Uber driver and last month your revenue was very low, and you want to know why. Was it because you didn't get many trips? Was it because the customers didn't like your car and gave poor ratings? Was it because of the weather, that it rained and you didn't get any customers? If you want to understand and improve on any of this, you use a methodology.
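
To make the Uber example concrete, here is a minimal Python sketch of the kind of repeatable check a driver could run on a trip log. The trip records, the area names, and the `summarize_trips` helper are all made up for illustration; nothing here comes from Uber or from the session material.

```python
from collections import Counter, defaultdict

# Hypothetical trip log: (pickup_area, start_hour, fare). All values are made up.
trips = [
    ("Airport", 9, 450.0),
    ("Downtown", 9, 120.0),
    ("Airport", 17, 520.0),
    ("Downtown", 13, 90.0),
    ("Airport", 9, 480.0),
]


def summarize_trips(trips):
    """Return trip counts per area, revenue per area, and the busiest start hour."""
    trips_per_area = Counter(area for area, _, _ in trips)
    revenue_per_area = defaultdict(float)
    for area, _, fare in trips:
        revenue_per_area[area] += fare
    hour_counts = Counter(hour for _, hour, _ in trips)
    busiest_hour = hour_counts.most_common(1)[0][0]
    return trips_per_area, dict(revenue_per_area), busiest_hour


trips_per_area, revenue_per_area, busiest_hour = summarize_trips(trips)
print(trips_per_area, revenue_per_area, busiest_hour)
```

Running the same summary every month is what makes it a methodology rather than a one-off look at the numbers: a drop in revenue can then be traced to fewer trips, a different area mix, or a shift in the busy hours.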

Even in data analytics, we use methodologies, and those were basically the steps we use in a data analytics methodology. Now, I know that these are just names; they're broad names that don't mean much on their own. Interestingly, we have a piece of software to help here. This is a PowerPoint presentation, and in a PowerPoint presentation you can only read; you cannot do anything else. What if I want to communicate this with you in a more effective way? I don't know if you have heard about it, but there is something called a mind map.

There is software for this, and we will share with you how you can download it. It's called XMind. It's really great software, even for you. Say you want to communicate some idea to somebody: you are working in a company and somebody asks, "Can you look at our company's performance for the last year and present it?" You could come up with a PowerPoint, or many other things, but what I would really use is mind mapping software, and that's XMind. I'll show you how this works. Here you can see we have created several visualizations, and you can zoom into each of them. For example, this one is what we are discussing right now. If you look at my slide, these are the different stages in a data analytics methodology: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Now, if I want to talk more about this, how do I do it? For example, what is business understanding, the first step? I can write it in the PPT, but a better way is to come here. As you can see, there are six phases, one, two, three, four, five, six, and I can expand each of them. There is business understanding; I can just expand it, and you see there is a popup. So let's first understand what these steps are, and then I will give you a real-world example so you understand it.

The first step is business understanding. This is your understanding of the business, and this phase is related to something we call domain expertise. I'll give you an example. Let's say one of the airlines, IndiGo, says that they're losing a lot of customers. You cannot just walk into IndiGo and say, "Okay, I'll figure out why it is happening." You need to have an understanding of the aviation industry and the business: how tickets are booked, what their revenue share is, and so many other things. So business understanding is really critical. I have personally worked with General Electric. GE is one of the largest manufacturers of commercial aircraft engines, the engines you have on flights. I'd say around 80% of those engines are made by GE; they are one of the leaders, though there are other companies as well, like Rolls-Royce. When we work in the aviation industry, we get trained for about six months on the parameters and many other things related to aviation, and only then does your business understanding come into the picture. If they tell you a problem and you want to really understand it, you need domain expertise. That's the first step, business understanding.

The second step is data understanding. This is similar to what we discussed in the last class: you are collecting the data and getting familiar with it. Sticking with the aviation industry, the business understanding could be something like this: the airline is complaining that there is a lot of turbulence. You know turbulence, right? You feel like you're going to die, but you don't really die; everything goes shaky. So the question is, why is it happening? It may be happening for many reasons, but can we withstand it? Are our engines good at handling turbulence? In the case of GE, we were collecting data from sensors. Each of these engines is equipped with a lot of sensors, so we collect data from the sensors and we understand the data: at this particular time, what did the sensor report? Something like that.

The next phase is data preparation. In the aviation example, when we collect this sensor data there will be a lot of faulty data, missing data, and data which is not in the proper format, so we prepare the data. Data preparation basically means you follow certain activities to convert the data into a proper format. That's step number three.

Step number four is what we call modeling. I think in the last class people were very curious about machine learning. Modeling is, how do I say, part of your machine learning. In this phase, modeling techniques are selected and applied. For example, in the aviation industry there will be a lot of machine learning models which predict what will happen. Let's say you collected some data from these engines; you feed it into a machine learning model and then you ask the model, "What do you see?"

The next step is evaluation. This is where we test the models to ensure they generalize to unseen data. As an example, let's say you're working for GE. I don't know if this is too complex an example, but let's say you already have a machine learning model, and this model says everything is going to be fine. Now let's say two days back one aircraft reported an emergency landing. What happens? You collect the data, feed it into the model, and evaluate it. The model gets some new data which was previously not available, so now the model says, if these conditions are present, then the aircraft may not function properly, or it may have to make an emergency landing, or it will become faulty. So you evaluate it.

Finally, you deploy it. Deployment is where you actually deploy the machine learning model and make it work. So this is an ongoing process.

If I have to explain all this in one shot: first you understand the business problem. I'll give you a simpler example, but in my aviation example the business problem would be something like, are my engines susceptible to turbulence? Data understanding is where you collect the data from sensors and make sure it is the correct data. Data preparation is where you clean the data and make sure it is ready. In modeling, you train a machine learning model using this data. In evaluation you test the model, and in deployment you deploy the model.

Now, what is the point of doing all this? People used to ask me. Imagine there is very bad weather, as an example. The wind speed is something, the precipitation is something, and the flight had to make an emergency landing. Once you collect the data, prepare it, and train your machine learning model, next time you can look at the weather, feed in the data, and the model can predict that if you fly during this weather, there is a chance the aircraft might get affected. That's where deployment and continuous evaluation come into the picture. I don't know if I made it too complicated, but this is an ongoing process. Any business will follow these steps, not only the aviation industry. That's why we have a very simple example. The aviation example is good, but let's take an even simpler one.

Look here: a pizza shop wants to sell more pizza. Let's say we are working for a pizza shop, maybe a very big chain, something like Pizza Hut, and Pizza Hut wants to sell more pizzas. That's the business understanding, or the problem we want to solve. Data understanding is where you initially collect the data and get familiar with it: how many pizzas are sold on a particular day, how many are sold on a weekend, what time of day they are sold, and so on.

The next phase is data preparation. Here the pizza shop organizes the data in a way that shows which pizzas are popular at different times of the day. For instance, you might find that chicken pizzas are very popular on weekend evenings. So in data understanding you just collect the data; in data preparation you sort it out and make sure the data is presented properly.

Then you have the modeling. In our pizza example, the pizza shop uses this dataset to predict which pizzas will sell best at different times. In modeling we train a machine learning model. We feed it all the data, all the pizzas we sold in, say, the last 10 years, and we tell the model, "These are all the pizzas we sold, the times, how many people ate them, everything we have. Learn from this." The model learns from this and predicts that, on a Sunday at this time, most people are going to buy a Margherita pizza, something like that.

Then there is evaluation. The machine learning model has predicted something, but you want to test it. The pizza shop tests these predictions by comparing them to actual sales. This can be done in a variety of ways. One thing we do is ask the model to predict something which happened in the past. For example, we know how many pizzas were sold last Sunday. We have that data, but we don't share it with the model; we ask it to predict. If we compare the prediction with the actual data, we come to know whether the model is good or bad and what its accuracy is. Or you can ask it to predict the next weekend and be ready for that. For an example like pizza that's okay, but for crucial things you can't take that risk.

Then there is the deployment phase. Deployment is where you actually make a decision as to when and how many pizzas need to be sold. I think this gives you an idea. And this tool is really good: even when you are working and you want to create or present something, you can use this XMind software. It's free, and you can create your own charts; there are many different layouts and hierarchies you can create.

If I go back here, this is the example we were discussing. In business understanding, you want to sell pizza. In data understanding, you collect the data: how many pizzas are sold, and when. In data preparation, we figure out the popular pizzas. In modeling, we create a machine learning model and feed it this data. In evaluation, we ask the model to predict something which already happened in the past, maybe how many pizzas were sold on one particular day last month. If the model predicts with, let's say, 90% accuracy, you are good. And in deployment you actually deploy the model in your business. This is an ongoing process, because you keep on selling pizzas and the trend might change. Maybe in summer people eat less pizza and in winter they eat more, I don't know. Those trends keep changing, and you can't predict everything. That means these steps, 1, 2, 3, 4, 5, 6, and again 1, 2, 3, 4, 5, 6, have to keep repeating; only then does it make sense. So that's the methodology we have.

One very popular standard, or methodology, that we use is called CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining. Why is this important? As the name says, it is cross-industry, so this methodology can be used in any industry: healthcare, finance, rocket science, whatever you want. It's not limited to one type of industry. And it is a standard process, which means it follows certain repeatable steps. Data mining is basically the process of discovering patterns, trends, and correlations in data in general. CRISP-DM is very popular across industries. The pizza shop example that I just gave you follows CRISP-DM; in fact, we created this content following CRISP-DM.

When you start working, it's not like you got a job as a data analyst and you say, "Okay, give me the data, I'm going to crunch the data, I'm going to kill the data." It's not like that. First you understand the business, then you figure out what problem you want to solve. So you need to have a methodology in place. I work as a consultant for a lot of companies, and I have seen many of them go out of business because they don't follow a methodology. They have clients, they have data, they have everything, but they don't know a methodology to implement. It doesn't have to be CRISP-DM; it can be any methodology.
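
The evaluation step from the pizza example, holding back a day the model has not seen and comparing the prediction with what actually sold, can be sketched in a few lines of Python. The sales figures are invented, and `predict_next` is only a naive stand-in (an average of recent Sundays) for whatever machine learning model a real shop would train.

```python
# Hypothetical Sunday sales of Margherita pizzas, oldest first. Numbers are made up.
sunday_sales = [120, 130, 125, 135, 140]


def predict_next(history):
    """Naive stand-in for a trained model: the mean of the last three observations."""
    recent = history[-3:]
    return sum(recent) / len(recent)


# Hold out the most recent Sunday: the "model" never sees it.
train, actual = sunday_sales[:-1], sunday_sales[-1]
predicted = predict_next(train)

# Percentage error of the prediction against what actually happened.
error_pct = abs(predicted - actual) / actual * 100
print(f"predicted={predicted:.0f}, actual={actual}, error={error_pct:.1f}%")
```

Because the held-out Sunday already happened, the comparison costs nothing. As new Sundays arrive, the same check is repeated, which is the "1 to 6, then 1 to 6 again" loop described above.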

Now we come to something called the analyst mental model. What is this? If you are going to be a data analyst, you need to keep a mental model. What do I mean by a mental model? When you start working, you need to look for three things. One is the difference between data and insights. Second is transforming ugly data into actionable insights. Third is communicating data insights. These are the three most important things for a data analyst. It's a mental model, which means it is not something you advertise or talk to people about. It is inside your head, but in your day-to-day work, when you're implementing something, you need to keep it with you. That's the data analyst mental model. Let's have a look at each of the three.

The most crucial thing, I feel, is the difference between data and insight. A lot of people are confused here, so I'll give you some examples. Data is a collection of facts and figures. You may be collecting data in the form of numbers, dates, strings, and many other formats. An insight is the result of analyzing this data to understand patterns and trends. For example, let's say you are collecting cricket data. It will be things like the name of the player, centuries, half-centuries, runs scored, and so on, so your data will be just numbers and string values. The insight will be finding a correlation, or how consistent a player is, something like that.

Now, if you look at the structure of the data, in a lot of cases it will be unorganized and unstructured. Images and audio files are very good examples of unstructured data. Let's say you're collecting 1 million images. On its own this data is practically useless: if I give you 1 million images on your laptop right now, what can you do with them? Nothing. But what kind of insight can I get from this data? I can probably use a deep learning technique to understand whether the images contain humans or animals and so on. That's what companies like Facebook do. Facebook will have 1 million images, and from them it can figure out your face and tag you. So 1 million images is just unstructured data, but tagging your face is the insight it is getting.

The other thing is that your raw data will often be in the form of Excel or some other spreadsheet. Let's say you are working for a company where all the data is in Microsoft Excel, and there are millions of lines of data. This is just data. How do you get insights from it? The best way is to use charts, dashboards, and graphs; basically, visualization. Visualization is the easiest way to get insights from this data. So that's another example, and there are several more. What I'm trying to explain is that raw data is one thing and insights are another.

Also, very interestingly, here is something that happens. Let's say you are getting some data from a college, and this college says, "We are placing all of our students with high salaries." So you get the placement data. Maybe the highest package is one crore, and then there are packages of 30 lakh, 20 lakh, 12 lakh. What people normally do is take this data, create a bar chart, and say, "My insight is that the highest package is one crore." That's not an insight; it's just an observation. Anybody can tell you that. In a cricket match, anybody can tell you who scored the most runs; that's not an insight either, just an observation. In my college example, why somebody got one crore while somebody else got 10 lakh, that is an insight. Why was there such a difference? What is the average salary people are getting? That's an insight. People do this all the time: they just create a bar chart and say, "We have a lot of insights for you." I don't think that's the right way to go about it.
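
The difference between an observation and a step towards an insight in the college example can be shown with a small Python sketch. The placement records are invented, and the internship flag is a hypothetical attribute added only so there is something to compare; the session does not say what actually explains the salary gap.

```python
from statistics import mean, median

# Hypothetical placement records: (package in lakh rupees, did an internship?).
# One crore is written as 100 lakh. All values are made up.
placements = [(100, True), (30, True), (20, True), (12, False), (10, False), (8, False)]

packages = [package for package, _ in placements]

# Observation: anyone can read this off a bar chart.
highest = max(packages)

# Steps towards an insight: how typical is the top package, and what separates the groups?
average = mean(packages)
typical = median(packages)
with_internship = mean(p for p, interned in placements if interned)
without_internship = mean(p for p, interned in placements if not interned)

print(highest, average, typical, with_internship, without_internship)
```

In this toy data the median sits well below the mean, so the one-crore package is an outlier rather than a typical outcome, and the group comparison suggests a question worth investigating. A difference between groups is a lead to follow up, not proof of a cause.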

Again, that's my personal opinion. Now let's talk about transforming ugly data into actionable insights. There are several case studies related to this, and we are going to share them with you. I may discuss one, or explain some of the steps, but we have curated at least two to three case studies on this particular topic, and we will share them with you.

Let me go to my mind map. The advantage is that you can just zoom in like this. This is another type of mind map, and these are the different steps in transforming ugly data into actionable insights. The first step is, again, understanding the business needs. This is where you identify the key business objective, the problem to solve, the question to answer. An example I would personally like to share is from the e-commerce industry. When you open Flipkart or Amazon, you have what is called cash on delivery. Cash on delivery is very attractive for the customer. If I want to buy something, I can choose cash on delivery, which is nice because I don't have to pay anything up front; I pay once I get the product. But it is a huge problem for the company. Let's say you are Flipkart and somebody ordered a phone with cash on delivery. You get the phone, pack it, and ship it, and once it reaches the destination the customer can say, "I want to cancel." So you're losing a lot of money, because you have to do all these steps, and the customer can cancel the order at any point. Let's say this is the business problem you want to solve: I want to minimize cancellations of cash on delivery orders.

The next step is, naturally, collecting the right data. Since you are the owner of the e-commerce company, this might be easy. You will probably collect all the previous orders which went for cash on delivery, and you also collect the non-cash-on-delivery orders. You do a comparison and take the right data you want.

Then comes cleaning the data, which we already discussed. You might remove any duplicate orders. Some of the orders may be missing some values, and some might contain errors. It's rare, but let's say somebody ordered an iPhone 14 and the price is showing 10,000. You know that is incorrect; somehow the system made a mistake, so you need to correct it. An iPhone is never 10,000 rupees. And some of the order data has missing values, say the price is missing. You need to handle all of that in cleaning the data. So it's not just that you collect all the data; you need to clean it as well.
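
The three cleaning problems mentioned above (duplicate orders, an implausible price, a missing price) can be sketched in plain Python as follows. The order records, the field names, and the price floor in `MIN_PLAUSIBLE_PRICE` are assumptions for illustration, not real Flipkart or Amazon data.

```python
# Hypothetical order records. Field names, values, and the price floor are assumptions.
orders = [
    {"order_id": 1, "item": "iPhone 14", "price": 70000, "cod": True},
    {"order_id": 1, "item": "iPhone 14", "price": 70000, "cod": True},   # duplicate
    {"order_id": 2, "item": "iPhone 14", "price": 10000, "cod": True},   # implausible price
    {"order_id": 3, "item": "iPhone 14", "price": None, "cod": False},   # missing price
    {"order_id": 4, "item": "iPhone 14", "price": 69000, "cod": False},
]

MIN_PLAUSIBLE_PRICE = {"iPhone 14": 40000}  # assumed sanity floor per item, in rupees


def clean_orders(orders):
    """Drop duplicate order IDs and set aside rows whose price is missing or implausible."""
    seen, clean, needs_review = set(), [], []
    for order in orders:
        if order["order_id"] in seen:
            continue  # keep only the first copy of each order
        seen.add(order["order_id"])
        price = order["price"]
        floor = MIN_PLAUSIBLE_PRICE.get(order["item"], 0)
        if price is None or price < floor:
            needs_review.append(order)
        else:
            clean.append(order)
    return clean, needs_review


clean, needs_review = clean_orders(orders)
print(len(clean), "clean orders,", len(needs_review), "to review")
```

Suspicious rows are set aside for review rather than silently corrected, since the right fix (look up the true price, drop the row, ask the source system) depends on the business.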

Then you go on to analyzing the data. You remember we talked about descriptive, predictive, and prescriptive analysis techniques. This is where you try to find out why people are canceling their orders. Maybe they are canceling because of a delivery delay, or maybe they are just window shopping. You go through all these phases: what happened, how it happened, and so on.

Then you go on to visualizing the data. Here you can use a tool like Tableau or Power BI and create a dashboard or a chart, so you can explain this to somebody who doesn't understand the technical jargon: "See, this is what is happening. Due to poor service in this area, 33% of customers are canceling their orders." Something like that.

Now you come to making data-driven decisions. You get the insights and you make a decision. This is a project I have worked on, where we had to find out why people were canceling cash on delivery orders. We figured out there were many reasons, but then you need to make a decision. How do you do that? You cannot remove cash on delivery, for example. If you remove it from the website, people are not going to like it, because it is a very attractive option. So what you do is create a machine learning model that predicts the chance that somebody will cancel an order. Let's say a new customer placed an order, and the model predicts that there is a 90% chance this customer will cancel. Then you call the customer: there is an automated system which calls and asks the customer to press one if they want to cancel, or two if they don't. That's how you address the problem. So you use a machine learning model to support the data-driven decision.
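
The decision rule described above, calling the customer when the predicted chance of cancellation is high, might look like this in Python. The probabilities are hard-coded stand-ins for the output of a trained model, and both the order IDs and the 0.9 cut-off are assumptions; a real project would tune the threshold against the cost of a call and the cost of a cancelled shipment.

```python
# Hypothetical model outputs: order ID -> predicted probability of cancellation.
predicted_cancel_prob = {"A101": 0.92, "A102": 0.15, "A103": 0.60}

CALL_THRESHOLD = 0.9  # assumed cut-off, not a value from the session


def orders_to_confirm(probabilities, threshold=CALL_THRESHOLD):
    """Return the order IDs risky enough to trigger an automated confirmation call."""
    return sorted(oid for oid, prob in probabilities.items() if prob >= threshold)


to_call = orders_to_confirm(predicted_cancel_prob)
print(to_call)
```

Lowering the threshold, for example `orders_to_confirm(predicted_cancel_prob, 0.5)`, catches more likely cancellations at the price of more calls to customers who would have kept their order.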

Next is measuring the outcome: you measure the outcome after implementing the decision. Now you have the machine learning model which predicts whether a customer will cancel the order or not, so you run this for one month and see whether the cancellations are declining. You also use something called metrics and KPIs. KPIs are key performance indicators, and they are related to the business.

Another example I can give you: let's say you are working for Reliance Trends, where you go shopping for clothes. The marketing team says, "We are spending 10 lakh rupees on advertisements every month, but sales are still low. We want you to figure out why." Here one of the KPIs will be what we call footfall. Footfall is how many people are visiting your store. When you enter the store, the security guard or the watchman makes a note: one person entered the store. That's the KPI. Using it, you can say, "You are spending 10 lakh rupees on advertisements, and the number of people is definitely increasing. Last month only 500 people came to the store; this month it is a thousand people because of the offer. But for some reason they're not buying." So we are able to attract customers, but we are not able to retain them. That's how you measure; that's what you call a KPI.
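
The footfall example can be turned into a small calculation. The footfall numbers (500 and 1,000) are from the example above; the purchase counts are invented, and "conversion rate" is a label added here for the share of visitors who buy, not a term used in the session.

```python
# Monthly figures for one store. Footfall follows the example; purchase counts are invented.
months = {
    "last_month": {"footfall": 500, "purchases": 150},
    "this_month": {"footfall": 1000, "purchases": 160},
}


def conversion_rate(footfall, purchases):
    """Share of visitors who bought something (0.0 if nobody visited)."""
    return purchases / footfall if footfall else 0.0


rates = {name: conversion_rate(m["footfall"], m["purchases"]) for name, m in months.items()}

for name, rate in rates.items():
    print(f"{name}: footfall={months[name]['footfall']}, conversion={rate:.0%}")
```

With these made-up numbers footfall doubles while the conversion rate roughly halves, which matches the point being made: the advertising brings people in, but something inside the store is stopping them from buying.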

Then there is iterating and optimizing, where you continuously analyze and change your strategy. Last but not least is overcoming challenges, which is always a very difficult step. For example, you need to seek feedback, collaborate with team members, improve your data literacy skills, and use suitable tools. There will be a lot of challenges when you work on such a project, and sometimes your knowledge, data literacy skills, and feedback can be the problem.

I'll give you an example. You know Big Bazaar, the retail chain. When Big Bazaar first opened in India, they realized that some of the customers were spending a lot of money. Normally, when you go to Big Bazaar, how much will you spend? I spend maybe a thousand rupees if I'm buying something normal, or maybe 5,000, maybe 10,000 rupees. What the data showed was that some customers were spending 30 lakh rupees, 50 lakh rupees at Big Bazaar. How can you spend 30 lakh? I think today you could: you buy that crazy Sony TV. The other day I saw a Sony TV worth 20 lakh rupees. So today you could, but back then Big Bazaar thought this was an error in the system. It was not an error. What was happening was that small-time shop owners were coming to Big Bazaar, buying in bulk, and selling in their own shops. That's where you understand what's happening within your business. This is a challenge, and you can't overcome it unless you understand what's happening; nothing else will help you figure it out. So there are a lot of challenges.

to give you an idea. So what we discussed just now is transforming ugly data into actionable
insights. That means you first understand the business needs, then you collect the data, clean,
analyze, visualize, all these steps. It's the same thing that I have explained just now. And it's
very important to communicate data insights. This is again the same thing that we were discussing,
but communicating data insights is very, very important. I will give you an example for this,
maybe later.

Communicating data insights basically means you start with understanding the business context.
Same thing, right? For any methodology, any approach, you start with the business context and
speak the business language. See why this is important: when we start working, and I think most of
you can relate to this, your customer or your stakeholder does not really care about your
technical skill, because they hired you to do a job. You may have so many degrees and so many
certifications; it doesn't matter if you cannot solve the problem or communicate effectively. So
communicating the insights is very, very important. To give you a simple example for this:
Walmart, I think back in 2007, got into data analytics. They started a data analytics project and
started working on it, but the problem was their engineers were not able to
give any insights, or they were not able to communicate them. Walmart already had a lot of IT guys
working on the data, and they had no clue. They were able to figure out some trends and patterns,
but they were not really able to communicate the data insights. So what happened? Walmart hired an
external consultant. They paid some money, got an external consultant, and this guy figured out
the famous Walmart beer and diaper story. You can actually read about this: Big Data Big World,
the beer and diaper story, beer and nappies.

I mean, what is the relationship between beer and diapers? I don't think there is any relation,
right? But this external consultant analyzed the data and figured out that on Friday evenings
people buy a lot of beer and diapers together. Seriously, I don't know why. Probably it's the
weekend, they want their babies to be happy so they buy diapers, and they want to chill so they
buy beer. I have no idea why it is happening, but this guy was able to figure out this
correlation, and he was able to communicate and present it to the company in a very effective way.
So what Walmart did: they started an offer on Fridays where if you buy beer and diapers together,
you get like 10 percent off. You can read it, I'm not making this up. And this increased their
sales. So this is the impact of communication, that's what I'm saying. It's not just theory that
I'm talking about; communicating your insights is very important.

When you communicate, you normally avoid technical jargon. For example, you don't say, okay, the
regression model returned a coefficient. This guy will probably get fired if he says something
like that in front of the customer, because the customer has no idea what a regression coefficient
is. So you have to avoid technical jargon. Tell them, okay, the sales are low because of this,
something like that. And visualization is very, very important when you want to communicate. I
don't have to
explain this to you, I believe. I think in my Tableau class I already discussed this, but how many
of you have read the Game of Thrones books, the original books? I think there are like three or
four books. Seriously, I have not read them, because each book is some 600 pages and you will
sleep if you read it. If you're a true Game of Thrones fan, maybe you will read them. But how many
of you have watched the Game of Thrones series? A lot of you. So that's visualization, right? The
data is in the books, but when you visualize it, it's more interesting; anybody can understand.
The same happens in the industry. You can throw a set of spreadsheets at anybody and they won't
understand anything, but if you create a chart, a graph, nice colors, people are going to love it.
And tell a story. This is not just theory again; there is a very important phase we call
storytelling.

I can give you an example. I have created a story, maybe I can show you. See, this is a story I
created. As you can see, this visualization is about the price of houses in different areas. These
bar charts represent the streets and the price of houses. This one is talking about total assets
in an area, then this is talking about the sale price, first price and last price for houses, and
the next chart is talking about different areas. You can see I have added some story lines here:
the high-cost areas are these, average square feet is in this area, more days to sell is this
area, etc.
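If you want to see where story lines like these can come from, here is a small sketch in plain Python rather than Tableau. The area codes are the ones from the housing example, but every price and day count, and the names `summarize` and `story_lines`, are invented for illustration. It is not the actual data set behind the story.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sale records: (area, sale_price, days_to_sell).
# Area codes echo the example; every number is invented.
sales = [
    ("M", 120_000, 30), ("M", 130_000, 40),
    ("RA", 100_000, 60), ("RA", 110_000, 80),
    ("C", 300_000, 45), ("C", 320_000, 55),
]


def summarize(records):
    """Group records by area and average the price and days on market."""
    by_area = defaultdict(list)
    for area, price, days in records:
        by_area[area].append((price, days))
    return {
        area: {
            "avg_price": mean(p for p, _ in rows),
            "avg_days": mean(d for _, d in rows),
        }
        for area, rows in by_area.items()
    }


def story_lines(summary):
    """Turn the summary into short captions to pin next to the charts."""
    costliest = max(summary, key=lambda a: summary[a]["avg_price"])
    slowest = max(summary, key=lambda a: summary[a]["avg_days"])
    return [
        f"Highest-cost area: {costliest}",
        f"Most days to sell: {slowest}",
    ]


summary = summarize(sales)
captions = story_lines(summary)
for caption in captions:
    print(caption)
```

The point is that each caption is one insight taken from one aggregation, which is roughly what a story line on a chart is.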

So basically you combine your insights into what we call a story and communicate this with your
customer or anybody. Finally, you come up with the insights. This is the insight that I got from
the data set we had, which was basically houses sold in an area: M is the best area for the medium
class, with average tax. RA is lesser price than M area. HS is the area for a little better than
medium class. K is also a high area. C is the costly area. There is no regular pattern each year
in terms of price increase; however, there is a seasonal pattern in sales: June to September are
the sale months.

This is how you present a story. You will have charts, and each chart will have some insight. This
particular bar chart is saying that yearly, the M and HS areas are selling more, and only five
months of 1993 cross the sales of the other three years. You can create this story in Tableau.
This is just a simple example; you can have more complex stories. So when you are going to your
customer, or when you are going for a presentation, you create a story. It's not like you just
blabber whatever comes to your mind. Also, you cannot just show them the data; they don't
understand Excel a lot and all. Even if you create visualizations, they're going to ask: you
create a bar chart, they'll ask what this bar chart is all about; or you created a pie chart,
they'll ask what this pie chart is all about. So you have to come up with a story. Storytelling is
a very important skill in communication. Coming back to my original PPT: speak
the business language, visualize the data, tell a story, focus on actionable insights. You may
have a lot of insights, but you concentrate on those insights which are actionable. For example,
one of the insights you came up with might be that we have to increase the price for this
particular product; now probably the team will say that we cannot increase it. Or you will say
that the marketing team needs to spend $1 million. That's your insight, but that's not going to
work, right? It's not like everything is going to work. So your insights should be actionable;
somebody should be able to do something about them. Practice active listening; that's a very
common thing. Listen to your clients' questions, understand what they want. Seek feedback: when
you implement something, always seek feedback, and keep developing these things continuously. So
communicating your insights is very important. And this is what we call the data analytics life
cycle: all the things which we have discussed, in probably one
slide: business understanding, data collection, data cleaning, extracting insights, communicating,
choosing the right model, and monitoring the outcome. Let me give you an example so that you can
understand the data analytics life cycle. The first thing is business understanding. Naturally,
you need to understand what problem you are trying to solve. A real-life example: let's say you
are a school principal and you notice that student attendance has been dropping. You want to
understand why the attendance is dropping. So that's your business understanding, the first step.

Second is data collection. You want to collect the data related to attendance, so you collect
probably school attendance records, students' health records, academic performance, even weather
data. There can be many reasons, so you collect whatever data you can get. In data cleaning, now
you have all the data, right?
So you probably want to remove any errors or inconsistencies. For example, one student says, I was
marked absent but I was present on that day. It's a mistake, or the teacher forgot to mark him
present, something like that. In data cleaning you handle all of these.

In interpreting and extracting the insights, you start looking for patterns. You realize that the
attendance is dropping when it rains, or you realize that students with lower grades are more
likely to be absent. So two things you figured out: if it rains a lot there are a lot of absences,
and students with low marks tend to be absent. Now you communicate the results. You have to share
them, so maybe you create a presentation, a chart or something, connect rainfall with attendance,
and show them.

And then you choose the right model. Now you have to use statistics and machine learning to
predict future attendance. It's a simple example, you probably don't use machine learning in a
school, but let's say that's the use case. You feed in all this data and you ask the model: next
month, what is the average number of students who will be present or absent? That's choosing the
right model. And then monitoring the outcome: you have implemented some changes, say you provided
some extra raincoats to students, or you called some of the parents and asked why they are not
sending their children, and then you start monitoring the performance. So this
is what we call the data analytics life cycle, basically. And now, to quickly recap, if you look
at the whole session we had today, this is what we have discussed: we start by understanding the
business context, speak the business language, visualize, tell a story, focus on actionable
insights, practice active listening, seek feedback, and keep up continuous learning and
improvement.

And these are the topics we have for the next session, the third session. For example,
introduction to data quality. Data quality is a very interesting topic: what do you really mean by
quality of data, and how do you ensure that you have data quality, etc. And there is something
called root cause analysis, RCA. Whenever there is any problem in any industry, we do RCA, root
cause analysis. And then organizational skills. These topics will extend beyond this; these are
just the headlines. So that's all we have for today.
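As a take-home for the school attendance example from the life cycle discussion, here is a minimal sketch of the cleaning, insight and "model" steps in Python. All the records are made up, the class size of 40 is an assumption, and the "model" is deliberately just a historical average, not real machine learning.

```python
from statistics import mean

# Hypothetical daily records for one class of 40 students:
# (rained_that_day, students_present). All numbers are invented.
days = [
    (False, 38), (False, 37), (True, 30), (False, 39),
    (True, 28), (True, 31), (False, 36), (True, 29),
    (False, 55),  # impossible count, a data-entry error to clean out
]


def clean(records, class_size=40):
    """Data cleaning: drop rows with impossible attendance counts."""
    return [(rain, n) for rain, n in records if 0 <= n <= class_size]


def average_by_weather(records):
    """Extracting insights: average attendance on rainy vs dry days."""
    rainy = [n for rain, n in records if rain]
    dry = [n for rain, n in records if not rain]
    return {"rainy": mean(rainy), "dry": mean(dry)}


def predict(averages, rain_expected):
    """A deliberately naive 'model': predict the historical average."""
    return averages["rainy" if rain_expected else "dry"]


averages = average_by_weather(clean(days))
print(averages)
print(predict(averages, rain_expected=True))
```

Monitoring the outcome would mean re-running the same comparison after the changes, for example after handing out raincoats, and checking whether the rainy-day average moves closer to the dry-day one.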
