The Data Science Guide
Data Science
CAREER GUIDE
The Data Scientist
Some years ago, a completely new position appeared on the horizon –
was no ‘data’ on it. Quite quickly the laws of economics started governing
the recruitment and the scarcity of supply drove up the price of data
scientists. According to Glassdoor, since 2016, data science has been the
In this guide, we show you the different career paths you can take if you
This table summarizes the career
will find detailed information about the positions and the courses that you may pursue in order to land a job in the industry.
Medicine
Probability
Microsoft Excel
Statistics
Python
SQL
Tableau
Advanced Statistics
SQL + Tableau
Machine Learning
Opportunities
We at 365 Data Science have prepared this guide for three different data science careers that you may want to pursue.
Data science presents many opportunities for people who are quantitatively inclined, some more than others. Either way, one can easily imagine how some data science is, or will be, required for positions like:
✓ Marketing Analyst;
✓ Business Analyst;
✓ Data Analyst;
✓ BI Analyst;
✓ Data Scientist
The BI analyst has two defining traits: they work with in-house data and have a
business orientation. These also define the two main skillsets needed: data and
business.
To be more specific, let’s say you have been tasked with preparing a report about
how long computers have been on in the office (uptime). If you are the first to
ever do this, you will need to plan your data journey, design your metrics, gather
the data, and eventually analyze it. You will be expected to visualize it in a
manager-friendly way and tell the story of office computer uptime.
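To make the uptime example a little more concrete, here is a minimal Python sketch of the "design your metrics" and "analyze" steps. The log format, the computer names, and the timestamps are all hypothetical; in practice the data would come from whatever source you planned in your data journey.

```python
from datetime import datetime

# Hypothetical power log: (computer, switched on, switched off).
SESSIONS = [
    ("pc-01", "2024-03-04 08:55", "2024-03-04 17:10"),
    ("pc-01", "2024-03-05 09:05", "2024-03-05 12:35"),
    ("pc-02", "2024-03-04 07:30", "2024-03-04 19:30"),
]

FMT = "%Y-%m-%d %H:%M"


def uptime_hours(sessions):
    """Return the total number of hours each computer was switched on."""
    totals = {}
    for computer, on, off in sessions:
        delta = datetime.strptime(off, FMT) - datetime.strptime(on, FMT)
        totals[computer] = totals.get(computer, 0.0) + delta.total_seconds() / 3600
    return totals


report = uptime_hours(SESSIONS)
for computer, hours in sorted(report.items()):
    print(f"{computer}: {hours:.2f} h")
```

The metric here is simply total hours per machine; a real report might instead use average daily uptime or the share of uptime outside office hours.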
Becoming a BI analyst combines the worlds of business and data. All the skills
involved are easily transferable into other business or data science positions.
What is the required expertise for a
Business Intelligence Analyst?
The Business Intelligence Analyst
The following list encompasses the main competencies that you may be
expected to have when entering a company. While it is highly recommended
that you are proficient in all of them, responsibilities vary from employer to
employer. Two BI Analysts could be asked to perform completely different
activities, even in the same department. This is a product of the specialization
of labor that is observed in the current economy.
No matter the particular job, you will be required to have at least conceptual
knowledge of these activities.
Expertise of a BI Analyst
1. Data
Data-related responsibilities could be arranged in four main categories: Gather, Maintain, Design, and Analyze.
04 Processing and Analysis
2. Business
Business-related responsibilities could also be arranged in four categories: Identify, Evaluate, Report & Recommend, and Optimize.
03 Prepare financial and operational reports. Visualize using rich and dynamic dashboards. Tell a story and recommend
04 Optimize staffing levels, personnel responsibilities; automate processes
SQL + Tableau
1. Intro to Data and Data Science
A good way to start exploring a
position in data science is with a
comprehensive introductory guide
such as this one. An even better
alternative is to take a course which
aims to summarize, organize, and
explain all data science buzzwords,
terms, tools, approaches, and
techniques. Only after you have seen
the bigger picture can you put the
pieces of the puzzle together and dive
into studying data science.
The image on the right can help us gain an idea
of the relationship between different fields in
data science. Moreover, it provides examples of
real-world activities related to each data science
field.
Figure labels: Identify; Perform and report; Customize statistical tests; Visualize your data
3. Statistics
At the workplace, a BI Analyst is expected to understand the root of various problems. She should be able to rapidly identify possible reasons for both under- and overperformance of certain metrics.
While business judgement is needed, data-driven decisions are formed through statistical tests.
Figure labels: Basic concepts, Distributions, Central limit theorem, Confidence intervals, Margin of error, Statistical significance, p-value, ANOVA testing, Correlation, Regression analysis
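Two of the topics named above, the margin of error and the confidence interval, can be illustrated with a short Python sketch. The sales figures are made up, and the interval uses the normal approximation; for a sample this small a t-distribution would usually be the more careful choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical daily sales of one product (illustrative numbers only).
daily_sales = [102, 98, 110, 95, 105, 99, 101, 107, 96, 103]


def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the sample mean."""
    n = len(sample)
    centre = mean(sample)
    # z is about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(n)  # the margin of error
    return centre - margin, centre + margin, margin


low, high, margin = confidence_interval(daily_sales)
print(f"mean = {mean(daily_sales):.1f}, margin of error = {margin:.2f}")
print(f"95% confidence interval: [{low:.2f}, {high:.2f}]")
```

If a metric later lands well outside such an interval, that is a hint of under- or overperformance worth investigating rather than proof of it.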
4. SQL
SQL is a domain-specific language that
serves the niche of relational database
management. It is mandatory for
anyone employed in data science to be
able to work with databases and SQL is
the way to go. There are different
platforms for SQL, such as Oracle,
MySQL, and Microsoft SQL Server.
While they have their own peculiarities,
the underlying language is virtually the
same.
At the workplace, one often needs information from the database. There are two options: extract it on your own, or contact the IT team. When you are the BI Analyst, you usually need all data at all times and don’t want to depend on another person. Apart from utility, it is also the responsibility of a BI Analyst to interact with a database and pull whatever is needed for her data-driven decision.
Figure labels: Export, Transform, Load, Design
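As a rough illustration of pulling data on your own, the sketch below runs a small aggregate query from Python against an in-memory SQLite database. The `sales` table and its rows are hypothetical; against Oracle, MySQL, or SQL Server the connection code would differ, but the query itself would look much the same.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', 'Chair', 250.0),
        ('North', 'Desk',  400.0),
        ('South', 'Chair', 250.0),
        ('South', 'Chair', 225.0);
""")

# Orders and revenue per region, highest revenue first.
query = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC;
"""
rows = conn.execute(query).fetchall()
for region, orders, revenue in rows:
    print(region, orders, revenue)
conn.close()
```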
5. Tableau
The best description of Tableau comes
from its creators: ‘Tableau can help
anyone see and understand their data’. It
has been the leading visualization software in
the business intelligence and data
analytics field in recent years.
Whenever you see beautifully visualized
data, chances are that Tableau has
something to do with it.
Visualize data with customized tools for just about any purpose. Report by sales, location, focus group, and much more.
Analyze KPIs with fresh eyes after seeing what your data actually means and present it in the most engaging way.
Plan your data journey. Professional visualization implies thinking your problem through from data collection to the axes of your final plots.
Once you connect your database with Tableau, you can write queries to extract information directly in the interface. Additionally, you can combine several databases.
Raw data is rarely suitable for visualization right away. More often you must use some degree of preprocessing to design the metrics you’ll later visualize.
It is up to you to create the visualizations intended in the beginning. Tableau provides a seamless experience both for single plots and professional dashboards.
FAQ at interviews
1. Describe the different parts of an SQL query.
2. What is the difference between INNER JOIN and OUTER JOIN?
3. You have a table with the columns Cust_ID, Order_Date, Order_ID, Tran_Amt. How would
you select the top 100 customers with the highest spend over a year-long period?
4. If you were stuck on a desert island with a database that contained all the
knowledge ever created, but you only had 10 SQL statements that you could ever
use, what would they be?
5. What is the difference between DELETE and TRUNCATE? What is the difference
between UNION and UNION ALL? What is the difference between a WHERE
clause and a HAVING clause?
6. The conversion rate for a specific chair is 0.5% for the first 50,000 shoppers that look
at it. The price of the chair is $250. Our company makes 27% profit on the sale. The
next 50,000 shoppers will get a 10% discount. What is the conversion rate we must
achieve to receive the same profits as before?
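One possible way to approach question 3 is sketched below in Python with SQLite. The table name `transactions`, the sample rows, and the chosen year are assumptions made for illustration; the column names come from the question. `LIMIT` is the SQLite and MySQL spelling, while SQL Server uses `TOP`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        Cust_ID INTEGER, Order_Date TEXT, Order_ID INTEGER, Tran_Amt REAL
    );
    INSERT INTO transactions VALUES
        (1, '2023-02-10', 101, 120.0),
        (1, '2023-07-01', 102,  80.0),
        (2, '2023-03-15', 103, 500.0),
        (3, '2022-12-31', 104, 900.0),  -- outside the chosen year
        (3, '2023-05-05', 105,  50.0);
""")

# WHERE filters individual rows before grouping; a HAVING clause would
# filter the grouped totals instead.
TOP_CUSTOMERS = """
    SELECT Cust_ID, SUM(Tran_Amt) AS total_spend
    FROM transactions
    WHERE Order_Date >= ? AND Order_Date < ?
    GROUP BY Cust_ID
    ORDER BY total_spend DESC
    LIMIT ?;
"""
top = conn.execute(TOP_CUSTOMERS, ("2023-01-01", "2024-01-01", 100)).fetchall()
for cust_id, total_spend in top:
    print(cust_id, total_spend)
conn.close()
```

An interviewer may also want to hear how you would handle ties at position 100 and whether "a year" means a calendar year or a rolling 12 months.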
7. What experience do you have with Tableau? Our BI team is brand new and is
under-financed. We have no standard procedures or training and everything
is ad-hoc. How would you go about this situation?
8. You get X amount of views on a website, Y amount of people click on the ad,
then Z amount of people enter their names after, where X, Y and Z are given.
How much does it cost to acquire a customer? What’s the conversion rate?
Would it make sense to run the campaign, comparing the value of customer
acquisition to the revenue gained from conversions?
9. You have been asked to send an e-mail campaign to customers that have
made a purchase on Amazon.com in the past but not recently. How would
you go about the process? What query would you use?
10. Our firm is going to send 2 different catalogs to its customers. One of the
catalogs costs 50 cents to make and is 50 pages long. The conversion rate for
the catalog is 5% and each customer brings in 315 dollars. The second catalog
costs 95 cents to make, is 100 pages long and each customer brings in 300
dollars from it. The profit margin is 30%. What should the conversion rate for
the second catalog be to make at least the same amount of profit as the first
one? After you find the conversion rate for the second one, there is a second
part of the problem. Wayfair is planning to make a new catalog which is going
to cost 10 cents more than the 100-page one. The more expensive catalog is
going to be sent out to 20% of the customers while the remaining 80% are
going to get the 100-page one. Assume the same 30% profit margin and $300
profit from each customer. What should the conversion rate for the new
catalog be in order to receive the same profit at the end?
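The first part of question 10 comes down to a short piece of arithmetic, sketched here in Python. It rests on one reading of the question, which is an assumption: the 30% margin applies to the revenue a converted customer brings in, and the production cost is paid for every catalog sent. Other readings would give other numbers.

```python
def profit_per_catalog(conversion_rate, revenue_per_customer, margin, cost):
    """Expected profit for one catalog sent, under the stated assumptions."""
    return conversion_rate * revenue_per_customer * margin - cost


def required_conversion(target_profit, revenue_per_customer, margin, cost):
    """Conversion rate needed to reach target_profit per catalog sent."""
    return (target_profit + cost) / (revenue_per_customer * margin)


# First catalog: 5% conversion, $315 per customer, 30% margin, $0.50 cost.
first = profit_per_catalog(0.05, 315, 0.30, 0.50)

# Second catalog: $300 per customer, 30% margin, $0.95 cost.
needed = required_conversion(first, 300, 0.30, 0.95)

print(f"profit per first catalog: ${first:.3f}")
print(f"required conversion rate for the second catalog: {needed:.2%}")
```

Under this reading the first catalog earns about $4.23 per copy sent, so the second one needs a conversion rate of about 5.75% to match it.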
The Data Analyst
The Data Analyst Track
In recent years, the data science department has been the fastest-growing one in many
companies. The data analyst is the building block of a data science team. While more and
more individuals learn about the position, many still do not understand the nature of
the work or simply don’t have the skills to perform the job of the Data Analyst.
The Data Analyst is similar to the BI Analyst. While a BI analyst performs technical
analyses based on large datasets, the data analyst creates and runs complicated
statistical models to not only extract insights but also predict outcomes. Ideally, the
Data Analyst has deep statistical knowledge and superior programming skills; this
makes her much more capable than the BI Analyst of working with big data. However,
less business knowledge is needed to be a Data Analyst – it’s actually all about the
data.
The main functions of the Data Analyst are gathering data, creating and running models,
trend analysis, testing, visualizing, making recommendations, and storytelling.
The Day of the Data Analyst
There are three major activities for a Data Analyst: data cleansing & management, programming &
Let’s say that you are a Data Analyst and you are asked to create a model which identifies units that
are likely to become faulty. How would you typically approach the task? First, you would get your
data, or design a way in which you can gather data in a given timeframe. Then you would create a
model that fits the observed dependencies (e.g. the more an item is used, the more likely it is to
break down). Once there, you’ll test your models and achieve a certain level of accuracy. Finally, you
would gather up the findings and create a presentation tailored to your audience. This usually
means manager-friendly and light on data and methodology, and you will explain what you found
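The faulty-units example can be sketched in a few lines of Python. What follows is only one possible approach, a tiny logistic regression fitted by gradient descent, and the usage and failure records are invented for illustration. In a real project you would more likely reach for a statistics or machine learning library and evaluate on data the model has not seen.

```python
from math import exp

# Hypothetical records: (thousands of hours in use, 1 if the unit failed).
units = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 1),
         (3.0, 0), (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1)]


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(data, learning_rate=0.1, epochs=5000):
    """Fit P(failure) = sigmoid(w * usage + b) by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_logistic(units)


def failure_probability(usage):
    """Estimated probability that a unit with this much usage is faulty."""
    return sigmoid(w * usage + b)


# Share of units classified correctly with a 0.5 threshold
# (measured on the training data itself, so it is optimistic).
accuracy = sum((failure_probability(x) >= 0.5) == bool(y) for x, y in units) / len(units)

print(f"P(failure) at 1,000 h: {failure_probability(1.0):.2f}")
print(f"P(failure) at 4,500 h: {failure_probability(4.5):.2f}")
print(f"training accuracy: {accuracy:.0%}")
```

The positive weight `w` is the model's version of the observed dependency: the more an item is used, the more likely it is to break down.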
Being a Data Analyst equals swimming in data. The more projects a Data Analyst has been through,
the deeper her understanding of data analytics, and of predictive modelling in particular, will be and the
We have prepared a summary of the required skills for a Data Analyst, based
on the responsibilities that employers assign to the position.
The following list comprises the main competencies that you may be
expected to have when entering a company. While it is highly recommended
that you are proficient in all of them, responsibilities vary from employer to
employer. Two Data Analysts, even if they are sitting side by side, may be
asked to perform completely different tasks. This is a product of the
specialization of labor that is observed in the current economy.
No matter the particular job, you will be required to have at least conceptual
knowledge of these activities.
Expertise of a Data Analyst
Data cleansing
Programming
Analysis of the data
Presentation of findings
Filter, profile, assess third party, acquire new, control the quality of data
Continually process raw data, identify weak spots
Design, implement, map, ETL, test, optimize, move across platforms, extract in different formats
Sophisticated statistical models
Visualize and tell the story of the data
Figure labels: Identify; Perform and report; Customized statistical tests; Visualize your data
3. Statistics
At the workplace, a Data Analyst is
Figure labels: testing, Margin of error, Central limit theorem, Regression analysis, Confidence intervals
4. SQL
SQL is a domain-specific language that
serves the niche of relational database
management. It is mandatory for
anyone employed in data science to be
able to work with databases and SQL is
the way to go. There are different
platforms for SQL, such as Oracle,
MySQL, and Microsoft SQL Server.
While they have their own peculiarities,
the underlying language is virtually the
same.
At the workplace, one often needs information from the database. There are two options: extract it on your own, or contact the IT team. When you are the Data Analyst, you usually need all data at all times and don’t want to depend on another person. Apart from utility, it is also the responsibility of a Data Analyst to interact with a database and pull whatever is needed for her data-driven decision.
Figure labels: Export, Transform, Load, Design
5. R
R is a programming language specifically
designed for statistical analysis, data
manipulation, and graphics.
BIG DATA
R is designed to handle extremely
big data sets, usually gathered by
Basic statistics lays the foundation of the field and focuses on frequentist inference. Advanced statistics builds upon it, entering multi-dimensional spaces, through knowledge of mathematical methods, transformations and distributions. Moreover, more complex means of analysis are introduced, such as regression, classification, clustering and factoring. Finally, Bayesian inference and decision theory allow the Data Analyst to solve problems of dynamic and/or behavioral nature.
Figure labels: Transform; Advanced topics: Bayesian inference, Regression, Decision theory, Factor analysis, Classification, Cluster analysis
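To give a flavor of Bayesian inference, here is a small Python sketch of a single application of Bayes' rule. The scenario and all of the rates are hypothetical: a sensor raises an alarm for 90% of faulty units and for 5% of healthy ones, and 2% of all units are faulty.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(faulty | alarm) from Bayes' rule."""
    evidence = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / evidence


# Belief that a unit is faulty after one alarm.
after_one_alarm = posterior(prior=0.02, hit_rate=0.90, false_alarm_rate=0.05)

# The posterior becomes the prior when a second, independent alarm arrives.
after_two_alarms = posterior(after_one_alarm, 0.90, 0.05)

print(f"after one alarm:  {after_one_alarm:.1%}")
print(f"after two alarms: {after_two_alarms:.1%}")
```

A single alarm leaves the unit more likely healthy than faulty, because faulty units are rare to begin with; the belief only becomes strong once evidence accumulates. Treating the two alarms as independent is itself an assumption.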
FAQ at interviews
1. If you have a 10x10x10 cube, what is the outside surface area?
2. You have 10 bags with 10 stones each. One of the bags is lighter than the
others. Using a digital scale, how would you figure out which one it is with
just one weighing?
3. What is the sum of numbers from 1 to 100?
4. A snail falls down a well 50ft deep. Each day it climbs up 3ft and each night
falls down 1ft. How many days does it take it to get out?
5. How many SUVs are in the parking lot downstairs?
6. What is the difference between UNION and UNION ALL? What is the
difference between DELETE and TRUNCATE? How would you find the median
value for a given column?
7. Identify the issues in this Excel spreadsheet.
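Two parts of question 6 are easy to try out yourself. The sketch below uses Python with SQLite and two hypothetical one-column tables, `a` and `b`. SQLite has no built-in median function, so the median query relies on a common `LIMIT`/`OFFSET` idiom; other database systems offer their own functions or need a different workaround.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (3), (4);
""")

# UNION removes duplicate rows; UNION ALL keeps every row.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()

# Median of a.x: take the middle row (odd count) or the two middle rows
# (even count) of the sorted column and average them.
MEDIAN = """
    SELECT AVG(x) FROM (
        SELECT x FROM a ORDER BY x
        LIMIT 2 - (SELECT COUNT(*) FROM a) % 2
        OFFSET (SELECT (COUNT(*) - 1) / 2 FROM a)
    );
"""
median = conn.execute(MEDIAN).fetchone()[0]

print(union_rows)
print(union_all_rows)
print(median)
conn.close()
```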
The Data Scientist is at the top of the data science ladder. However, describing her job gives
everyone a headache. In fact, the Data Scientist has such a slippery definition that if you
look in five places, you will find five different definitions of what a Data Scientist is. For us at
365 Data Science, a Data Scientist is a person who has a broad range of knowledge in
multiple disciplines, while specializing in one or two. She understands the business processes
of a company, including marketing, strategy and sales, but also engineering and product
development. Nonetheless, where she truly shines is machine learning and statistics.
Main responsibilities of the Data Scientist are gathering data, structuring databases, creating
and running models & analyses; strategy, marketing, product placement, pricing, making
recommendations, and telling the story of the data.
The Day of the Data Scientist
‘A data scientist is a better statistician and economist than most programmers, a better programmer and
economist than most statisticians, and a better statistician and programmer than most economists’
An example: Think about a hotel chain. One of the most vital activities for them is revenue management.
There are two important considerations. First, some days are much more important for a hotel than
others. Second, customers are willing to pay several times higher prices depending on time of the year
and location. A data scientist can apply different statistical methods together with domain expertise to
identify the most important days. For the pricing part things are different. With machine learning in the
data scientist’s toolbox, she can predict with high accuracy the highest price customers are willing to
pay for a particular date and hotel. Best part? The whole process can be performed in real time and
completely automated.
Given that the Data Scientist swims in data, and data is rarely clean, she faces challenges at every
turn. But the more projects she goes through, the deeper her understanding of the business and machine
learning becomes, and the more valuable she is to any employer (or client).
What is the required expertise for a
Data Scientist?
The Data Scientist
We have prepared a summary of the required skills for a Data Scientist, based
The following list comprises the main competencies that you may be asked
Any two Data Scientists are different. This is because each one of them is
No matter the particular job, you will be required to have at least conceptual
Advanced Statistics
01 Develop statistical models based on internal
and external variables, analyze using predictive
multidimensional analysis
Machine learning
02 Research, design, and execute algorithms for
numbers, text, emotions, images, decisions
and many more
Storytelling
03 Make a story out of the data & the machine
learning outcomes and tell it to people who
do not have technical knowledge
How should you approach a Data
Scientist career?
Landing a Data Scientist job depends on these skills
Machine learning Tableau, and SQL, Tableau, and Python will give a
Figure labels: Identify; Perform and report; Customized statistical tests; Visualize your data
3. Statistics
At the workplace, a Data Scientist is
Figure labels: testing, Margin of error, Central limit theorem, Regression analysis, Confidence intervals
4. Tableau
The best description of Tableau comes
from its creators: ‘Tableau can help
anyone see and understand their data’. It
has been the leading visualization software in
the business intelligence and data
analytics fields in recent years.
Whenever you see beautifully visualized
data, chances are that Tableau has
something to do with it.
Visualize data with customized tools for just about any purpose. Report by sales, location, focus group, and much more.
Analyze KPIs with fresh eyes after seeing what your data actually means and present it in the most engaging way.
BIG DATA
R is designed to handle extremely
big data sets, usually gathered by
Basic statistics lays the foundation of the field and focuses on frequentist inference. Advanced statistics builds upon it, entering multidimensional spaces, through knowledge of mathematical methods, transformations and distributions. Moreover, more complex means of analysis are introduced, such as regression, classification, clustering and factoring. Finally, Bayesian inference and decision theory allow the Data Scientist to solve problems of dynamic and/or behavioral nature.
Figure labels: Transform; Advanced topics: Bayesian inference, Regression, Decision theory, Factor analysis, Classification, Cluster analysis
9. Machine learning
Machine learning is often confused with
artificial intelligence. In reality, machine
learning is a revolutionary approach to
developing AI programs, but is not the AI
itself. One of the definitions of machine
learning is: ‘extracting knowledge from
data’. In fact, machine learning is closely
related to data mining and statistics. In the
context of data science, the Data Scientist
will be looking for ways to analyze the data
using machine learning algorithms, in order
to solve problems that are too complex or
incomprehensibly big for the human brain
to process.
Machine learning is a relatively new field that is constantly evolving. In order to create and run machine learning algorithms, one needs solid statistical knowledge and programming skills. In the field of data science, most often, machine learning is divided into three subsets: supervised, unsupervised, and reinforcement machine learning. Each of them is based on different traditional statistical methods, and thus has different strong sides and shortcomings.
Supervised
In supervised ML, the algorithm’s goal is to find the best way to perform the task given by the researcher. It ‘learns’ what the approach is (mathematically, finds the perfect fitting function for the problem).
Common methods:
• Regression
• Classification
Unsupervised
In unsupervised ML, the algorithm’s goal is to reach a result, which is unknown to the researcher. Once an output is given, the data scientist is expected to interpret what the program has done.
Common methods:
• Clustering
Reinforcement*
In reinforcement ML, the goal of the algorithm is to maximize its reward. It is inspired by human behavior and the way people change their actions according to incentives, such as getting a reward or avoiding punishment.
Common methods:
• Decision process
• Reward system
*The literature on the topic divides machine learning into supervised and unsupervised. In AI frameworks, reinforcement is typically considered a subset of supervised and/or unsupervised. However, in the
field of data science, it is common to divide it in a distinct subset due to the nature of the methods used. That is also the classification that we have adopted.
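The difference between the supervised and unsupervised settings can be shown with two toy examples in plain Python. Both data sets are invented, and the two functions are deliberately minimal sketches (a least-squares line and a two-cluster k-means on one-dimensional data), not production implementations. Reinforcement learning is not covered here.

```python
def fit_line(xs, ys):
    """Supervised: learn y = slope * x + intercept from labelled examples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def two_means(values, iterations=20):
    """Unsupervised: split unlabelled values into two clusters (k-means, k=2)."""
    centres = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


# Supervised: the researcher supplies the answers (ys) for every input (xs).
slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])

# Unsupervised: no answers are given; the algorithm proposes a grouping
# that the data scientist then has to interpret.
centres, groups = two_means([1.0, 1.2, 0.8, 5.0, 5.3, 4.7])

print(slope, intercept)
print(centres, groups)
```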
10. SQL + Tableau
SQL and Tableau are two
indispensable skills for a Data Scientist.
What truly distinguishes an analyst from
his/her peers is interdisciplinary
knowledge, breadth, and ability to
combine expertise from different
domains. One of the most impressive
ways you can differentiate yourself from
other analysts is the ability to work with
data from the very source and then
present it through beautiful, meaningful
and professional visualizations.
Plan your data journey. Professional visualization implies thinking your problem through from data collection to the axes of your final plots.
Once you connect your database with Tableau, you can write queries to extract information directly in the interface. Additionally, you can combine several databases.
Raw data is rarely suitable for visualization right away. More often you must use some degree of preprocessing to design the metrics you’ll later visualize.
It is up to you to create the visualizations intended in the beginning. Tableau provides a seamless experience both for single plots and professional dashboards.
11. SQL + Tableau + Python
SQL, Tableau, and Python
are all indispensable skills for a data
scientist. While traditionally different
activities are performed with each tool,
Tableau developers have worked hard on
integrating them. This advancement has
brought about new ways to present data.
Interactive dashboards and real-time
maps, created by machine learning
algorithms, coded in Python, with SQL
data, all visualized in Tableau. The
ultimate data science experience!
in the data scientist arsenal. In the image on the right you can see a beautiful dashboard with text, Tableau
The dashboard user inputs data and gets a prediction. You can input Minutes listened, Price paid, Review, Completion rate, and Support requests for a given customer.