Different Roles in Data Science
It is imperative that you understand how these diverse data science roles and functions differ from each other and how they come together to solve a business problem. A complete data science setup consists of roles such as data engineers, business analysts, data scientists and data analysts, who work together to solve business problems using a data-driven approach. Although their skills and responsibilities might look similar, that is rarely the case.
As mentioned, a complete data science setup consists of data engineers, business analysts, data scientists and data analysts. Let us understand the roles and responsibilities of each of these professionals.
Data Engineer
A data engineer is responsible for creating robust pipelines that make sure the team of analysts and scientists gets the right and relevant data to work on in any data analytics setup. A data engineer is the first line of any analytics project, making sure that the right data is extracted from the right sources using the right tools. Once the data is captured, the next step of the process is the analysis.
Business Analyst
A key player in the analysis process is the business analyst. These professionals have traditionally helped organisations build strategies based on insights drawn from their current business, their experience so far and their gut feel. However, with the advent of new technologies and the increase in data, organisations can now leverage their data and draw meaningful conclusions from it to study their current business and also plan for their future.
Once they devise technical solutions and set up the evaluation metrics designed for the problem, it is also their task to explain all the insights drawn to higher management. This requires good visualisation, communication and data storytelling skills. They, of course, also need a strong hold on the relevant business processes.
Data Analyst
A data analyst is often asked to work on business problems involving complex data. It is therefore imperative that they not only know how to implement the relevant techniques well but also understand which techniques should be used for which kind of business problem. They perform rigorous analysis and draw conclusions that not only help the organisation solve the business problem but also shape the data-driven policies it adopts in the near future.
Data Scientist
Like any other scientist, a data scientist has a fair number of tools in their arsenal, which they use based on the situation. For example, if the problem is obvious enough, it can be solved using a simple visualisation built in a BI tool such as Tableau, or even in MS Excel to some extent. As the problem becomes more complex, the tools used to crack it become more sophisticated. They might range from plain vanilla logistic regression to an ensemble of neural networks built to utilise data coming from an NLP pipeline.
What this means is that to be a successful data science professional in any of the roles mentioned earlier, you need depth of knowledge and must align your skill set with the specific responsibilities you shoulder. We understand that choosing the right profession in data science is not an easy task.
Data science and analytics, as you would know by now, is not a single monolith. Like any other industry, it is the work of a team: a team with a diverse set of skills and experiences that complement each other in the best possible way to accomplish the entire project.
Data science is an emerging field, and various professionals come together to solve business problems in any data science setup. Among these professionals are data analysts and people working in the business intelligence stream.
• Diagnostic Analytics: This field solves business problems by answering the question ‘Why has it happened?’
• Predictive Analytics: This field solves business problems by answering the question ‘What is likely to happen in the future?’
• Prescriptive Analytics: This field solves business problems by answering the question ‘What should be done based on the current situation?’
They deal primarily with looking keenly at the data, performing open-ended analysis, improving the quality of the data, carrying out rigorous analysis on the datasets gathered, and drawing meaningful conclusions and insights. Business intelligence refers to the complete set of strategies that an organisation uses to support its analysis of data.
To summarise, data analytics refers to leveraging data and transforming it into useful information. Data analytics and business intelligence are both highly valued professions tasked with enhancing decision-making by leveraging historical data and meeting the business needs of the organisation through carefully devised data-driven solutions or strategies. It is therefore paramount that data analysts have good programming skills to work with messy data, fix data quality issues and derive meaningful insights.
These professionals also need strong communication and visualisation skills, which help them not only communicate well with technical stakeholders but also help non-technical stakeholders visualise the insights derived from their organisation’s data.
The data analytics specialisation is geared towards learners who have good programming skills or are at least willing to put in the practice to develop them. A data analyst is often tasked with performing open-ended analysis of the data in addition to data cleaning and exploration. These tasks require good programming knowledge and an exploratory mindset to understand the nuances of the data, validate hypotheses and generate interesting insights. Knowledge of the functional domain and a keen business sense help the data analyst a great deal in understanding the data and making sense of any anomalies or interesting insights they may come across.
Recall the earlier discussion on data analytics, which gave an overall introduction to the analytics landscape and some fundamental terms associated with it.
• Descriptive Analytics: This field explores business problems by answering the question ‘What has happened?’
• Diagnostic Analytics: This field explores business problems by answering the question ‘Why has it happened?’
• Predictive Analytics: This field explores business problems by answering the question ‘What is likely to happen in the future?’
• Prescriptive Analytics: This field explores business problems by answering the question ‘What should be done based on the current situation?’
• Making sure that the data science solution is valid and consistent with existing business knowledge
Based on the techniques used, the analysis performed and the insights drawn, business analytics can help organisations devise better strategies for their growth.
Business analysts act as a bridge between the business and technical teams. An adept business analyst must be able to understand the core requirements of business leaders, who may or may not be very tech-savvy. They need to apply business principles and solve the business problem by first translating it into a suitable data problem and defining the evaluation metrics. They then need to explain the resulting insights to leaders who may not be very tech-savvy.
The insights drawn from these analyses should also be communicated well to both technical and non-technical audiences, and for this, business analysts need good data storytelling skills.
We are currently living in the digital era. One of the first examples of digitisation was the advent of email. With email, message sharing became convenient and time-saving, which is why we still use it for both personal and professional communication. Almost all the services around us have become digital, from booking tickets to transferring money to another bank account.
Ever wondered how businesses store, process and analyse this huge amount of data every day, hour, minute and second? Data engineering holds the key!
Today we generate about 2.5 quintillion bytes of data every day, and it is now mainstream for businesses to make data-driven decisions.
While we know the role of data scientists, let's answer the question: where do data engineers fit in?
Data engineers are the first members of the team to handle the data. They build and tune the systems that feed data into the analytics funnel. The most common task performed by a data engineer is building data pipelines, which involves sourcing data from multiple sources, followed by combining, cleaning, formatting and storing it in one or more tables for the analysis or model building performed by the data consumers further down the funnel.
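To make this concrete, here is a minimal sketch of such a pipeline in Python using pandas. The file names, columns and cleaning rules are hypothetical; a production pipeline would typically use the more robust tooling discussed later in this section.

```python
import pandas as pd

# Extract: read raw data from two hypothetical source files
orders = pd.read_csv("orders.csv")        # e.g. order_id, customer_id, amount, order_date
customers = pd.read_csv("customers.csv")  # e.g. customer_id, region

# Combine: join the two sources on a common key
df = orders.merge(customers, on="customer_id", how="left")

# Clean and format: drop incomplete rows, fix types
df = df.dropna(subset=["amount"])
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# Load: store the prepared table for analysts and data scientists downstream
df.to_csv("curated_orders.csv", index=False)
```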
Although the job profile of a data engineer is not as well known as that of a data scientist, both of these opportunities are growing at a fast pace. The role of a data engineer is one of the most prevalent roles in the data industry. It has been estimated that for every single data scientist, an organisation needs at least three data engineers to feed in the data. This number tends to increase with the complexity of the big data problem an organisation is trying to solve.
Please note that with the increasing adoption of big data technologies, job titles such as 'Big Data Engineer' and 'Data Engineer' are used almost interchangeably in the industry.
Additional Reading
The links below provide additional reading on data engineering and the data engineer's job role.
Big Data Engineers: Myths vs. Reality | Myth Debunked | upGrad
What is data engineering? What does a data engineer do? What are the common responsibilities of
a data engineer?
Big Data, for better or worse: 90% of world's data generated over last two years
The skills of a data engineer can be broadly described as a combination of software engineering and data handling skills. The preferred background for a data engineering candidate is:
• A software engineering or computer science background, or relevant data experience (for example, ETL developers, database administrators, data analysts or data warehouse engineers) with a firm grip on an OOP language
Programming Languages
The candidate must be proficient in an object-oriented programming language such as Java, Scala or Python. Since data engineers often deal with databases, they are also required to be skilled in advanced SQL.
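To make 'advanced SQL' slightly more concrete, here is a small sketch using Python's built-in sqlite3 module. The table, columns and figures are invented for illustration, window functions require a reasonably recent SQLite version, and in practice a data engineer would run similar queries against a production database or warehouse rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "2021-01", 100.0), ("North", "2021-02", 120.0),
     ("South", "2021-01", 80.0), ("South", "2021-02", 95.0)],
)

# A window function: running revenue per region, ordered by month
query = """
SELECT region, month, revenue,
       SUM(revenue) OVER (PARTITION BY region ORDER BY month) AS running_revenue
FROM sales
ORDER BY region, month
"""
for row in conn.execute(query):
    print(row)
```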
While the skill set and the underlying technology vary from organisation to organisation, let's list the skills that often appear in a data engineering JD.
• Understanding of distributed systems and the Hadoop ecosystem
Distributed systems are at the core of any big data framework. The Hadoop ecosystem is one of the most widely used big data frameworks in the industry. Hence, an understanding of these systems is a core skill for data engineers.
As we have discussed already, data engineers often have to deal with databases. Hence, knowledge of concepts such as data management, entity-relationship diagrams, normalisation and dimensional modelling is also included as a requirement in the JD.
A large part of the data that we are generating today is unstructured and cannot be handled by traditional database systems. This is where NoSQL comes into the picture. NoSQL stands for Not only SQL, and these databases are non-relational databases that can handle big data efficiently in certain use cases. Some of the most commonly used NoSQL databases are Apache HBase, DynamoDB, MongoDB and Cassandra.
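As a small illustration of how a document-oriented NoSQL store is used, the sketch below uses the pymongo client against MongoDB. It assumes a MongoDB server running locally on the default port; the database, collection and field names are invented.

```python
from pymongo import MongoClient

# Assumes a MongoDB server is running locally on the default port
client = MongoClient("mongodb://localhost:27017")
db = client["retail"]

# Documents in the same collection need not share a fixed schema
db.events.insert_one({"user_id": 42, "type": "click", "page": "/home"})
db.events.insert_one({"user_id": 42, "type": "purchase", "amount": 199.0})

# Query only the purchase events for this user
for doc in db.events.find({"user_id": 42, "type": "purchase"}):
    print(doc)
```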
We cannot adequately describe a data engineer’s job without talking about data processing, be it
cleaning, transforming, combining or formatting data. Apache Spark, Apache Flink and MapReduce
are some of the commonly used tools for this purpose.
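The snippet below is a minimal sketch of such processing with PySpark; the input path, column names and cleaning rules are assumptions made purely for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean-transform").getOrCreate()

# Read raw data (hypothetical path and schema)
df = spark.read.csv("raw/transactions.csv", header=True, inferSchema=True)

# Clean: drop rows with missing amounts, normalise a text column
df = (df.dropna(subset=["amount"])
        .withColumn("country", F.upper(F.trim(F.col("country")))))

# Transform: aggregate spend per country and store the result
summary = df.groupBy("country").agg(F.sum("amount").alias("total_amount"))
summary.write.mode("overwrite").parquet("curated/spend_by_country")
```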
ETL stands for 'Extract', 'Transform' and 'Load'. It combines these three processes to bring data from multiple systems into a single system and prepare it for analysis. This single system is referred to as a data warehouse. Some of the commonly used ETL and data warehousing tools include Apache Sqoop, Apache Flume, Apache Hive, Apache Impala, Amazon Redshift and Informatica.
While ETL defines the framework in which data is processed and analysed in batches as per the use case, there are certain use cases where we may need to process and analyse data in real time. Tools such as Apache Kafka, Flink, Spark, Storm and Amazon Kinesis are used for handling data in real time.
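To give a flavour of real-time handling, here is a hedged sketch using the kafka-python client. It assumes a Kafka broker running on localhost and a hypothetical 'clicks' topic carrying JSON events.

```python
import json
from kafka import KafkaConsumer

# Subscribe to a hypothetical topic of clickstream events
consumer = KafkaConsumer(
    "clicks",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Process each event as it arrives, e.g. count clicks per page
counts = {}
for message in consumer:
    event = message.value
    counts[event.get("page")] = counts.get(event.get("page"), 0) + 1
```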
• Architectural projections and design
We have already discussed ETL and real-time pipelines separately. However, in a real-life industry
scenario, a big data solution would require both ETL and real-time pipelines to work in sync. This is
why a data engineer must also have an understanding of most of the big data frameworks,
platforms, libraries and other resources in terms of their features and use cases so that they are
able to design a complete data pipeline using various tools.
• Automation
In a big data architecture, several tools are required to work in sync to perform a particular task. The
primary responsibility of a data engineer is to design, develop and run these systems. Workflow
management platforms such as Apache Airflow, Apache Oozie, Azkaban and Luigi help data
engineers to automate data pipelines.
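Below is a minimal sketch of such an automated pipeline expressed as an Apache Airflow DAG. The DAG id, task names and callables are placeholders, and details such as import paths and scheduling options vary across Airflow versions.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull data from source systems
    pass

def transform():
    # Placeholder: clean and combine the extracted data
    pass

def load():
    # Placeholder: write the curated data to the warehouse
    pass

with DAG(
    dag_id="daily_sales_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    # Run the three steps in order, once per day
    t1 >> t2 >> t3
```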
As data engineers work closely with data scientists, analytics and machine learning skills are also required for a data engineering profile. Many big data tools offer languages/libraries to analyse big data. For example, Apache Hive provides an SQL-like interface to query big data, and Apache Spark provides a set of libraries to perform exploratory data analysis and machine learning.
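As an illustration of analysing big data with Spark's libraries, the sketch below fits a simple logistic regression with Spark MLlib; the dataset, column names and label are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("churn-model").getOrCreate()
df = spark.read.parquet("curated/customers")  # hypothetical features plus a 0/1 'churned' label

# Assemble numeric columns into the single feature vector that MLlib expects
assembler = VectorAssembler(inputCols=["tenure", "monthly_spend"], outputCol="features")
train = assembler.transform(df)

# Fit a plain logistic regression on the distributed DataFrame
model = LogisticRegression(featuresCol="features", labelCol="churned").fit(train)
print(model.coefficients)
```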
Career Landscape
Now, let’s take a quick look at the career landscape of a data engineer.
• They specialise in a set of big data tools. They spend their day tuning and developing various parts of a data pipeline.
• They specialise in the overall big data ecosystem. They design and deploy data pipelines into production environments.
• Apart from expertise in the big data technology stack, they have the project and people management skills required to lead a team of data engineers.
• They have a firm grasp of big data systems, and they design optimised and cost-effective solutions for data engineering problems.
Principal Data/Big Data/Solution Architect
• They provide end-to-end big data solutions for big data problems at an organisational level.
Technical Director
• They manage the entire vertical and align it with the business requirements.
Data science is a vast and evolving field, so it is natural that it is filled with buzzwords. With the exponential growth in data collection technologies, one data type that has come to the forefront is text or, in simpler words, language. It should come as no surprise that one of the most common tasks of a data scientist is to deal with text, hence the need for this specialisation, Natural Language Processing. Speech and text are the most common methods by which humans exchange information. And if we want to move to a future where virtual assistants serve us the way Jarvis serves Iron Man, it is imperative that we are able to communicate with machines the same way we communicate with fellow humans.
Natural Language Processing is how we attempt to do this. If you have ever tried learning a new language, you know how difficult it is. Replicating the same in a computer is one of the most challenging problems of our era. It is, in fact, so important that Alan Turing made conversing in natural language the centrepiece of his test for intelligence.
So if you are interested in trying to decode the myth and the legend that is human language, this is a field waiting for you. Now that you understand what Natural Language Processing is and why it is considered a worthy challenge, you need to understand the key tasks performed by an NLP engineer.
A sound knowledge of fundamental statistics and common machine learning algorithms is a must.
Now let’s take a look at the key skills mentioned in a typical JD:
• Use effective text representations to transform natural language into useful features (a minimal sketch follows this list)
• Find and implement the right algorithms and tools for NLP tasks
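As a small example of the first point, the sketch below turns raw text into TF-IDF features using scikit-learn. The toy corpus is invented, real NLP work would involve far more preprocessing, and get_feature_names_out assumes scikit-learn 1.0 or later.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the delivery was late and the product was damaged",
    "great product, fast delivery",
    "the support team resolved my issue quickly",
]

# Turn raw text into a sparse TF-IDF feature matrix
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(corpus)

print(X.shape)                                 # (3 documents, vocabulary size)
print(vectorizer.get_feature_names_out()[:10]) # a few of the learned terms
```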
As discussed, the NLP track focuses exclusively on learning about the various techniques and machine learning models that can process textual information. Due to the inherent challenge involved in processing natural languages, an NLP engineer requires good programming skills to write efficient code and use novel machine learning techniques. NLP is a promising field that is constantly evolving, with new and improved techniques being created at an unimaginable speed. It is, therefore, essential that professionals in this field constantly keep themselves updated on the latest trends. If the domain of language understanding excites you, the NLP track is a good one to specialise in.
Data science is a vast and evolving field, so it is natural that it is filled with buzzwords, and one of the most common buzzwords in the current climate is deep learning. Most of the information that human beings consume is in the form of images. Vision is one of our most important senses, and what we see and analyse every second is unstructured data. Think about the countless Google images, YouTube videos, CCTV feeds, X-ray images, and so on.
Eyes are widely considered to be the most powerful sensory organs, and vision is considered by many to be the single most important reason for the advancement of our species. If we really want to build an intelligent system, it is imperative that it can use vision to gather information about its surroundings and make relevant decisions.
To achieve this, no better system exists than the human brain. The brain, with its complex network of billions of neurons, is perfectly suited to dealing with the barrage of information we gather every moment. To transfer this ability to a machine, we try to mimic the human brain, and hence the name, neural networks.
As you learnt, we can consider a DL engineer as a specialisation of the data scientist role. Hence, a
sound knowledge of fundamental statistics and common machine learning algorithms is a must.
Now let’s take a look at the key skills mentioned in a typical JD:
• Find and implement the right algorithms, tools and frameworks for DL tasks
• Build deep learning and computer vision applications with the help of neural networks such as CNNs and RNNs for complex problems like object detection, recognition, tracking and classification (see the sketch at the end of this section)
A DL engineer is primarily responsible for using neural networks to perform machine learning operations on visual data.
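To make the CNN point above a little more concrete, here is a minimal sketch of an image classification network in Keras; the input size, layer sizes and number of classes are arbitrary placeholders rather than a recommended architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small convolutional network for, say, 64x64 RGB images and 10 classes
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # with a real labelled image dataset
```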