Chapter 2 - Key Roles and Responsibilities - Updated

The document discusses the roles of data analyst, data engineer, and data scientist. It provides details on the typical skill sets, responsibilities, and day-to-day activities of each role. Data analysts focus on data acquisition, handling, processing, and statistical analysis/interpretation. Data engineers require intermediate programming skills to build algorithms and have expertise in statistics, machine learning models, and ensuring data quality. Data scientists need expertise in data, statistics, programming for machine learning/deep learning, and strategic planning for data analytics.

Uploaded by

ferran fang

•Data Analyst

Most entry-level professionals interested in a data-related job start off as data
analysts. Qualifying for this role is relatively simple: all you need is a bachelor's
degree and good statistical knowledge. Strong technical skills are a plus and can
give you an edge over most other applicants. Beyond this, companies expect you
to understand data handling, modeling, and reporting techniques, along with a
strong understanding of the business.
•Data Engineer
A data engineer either holds a master's degree in a data-related field or has
gathered a good amount of experience as a data analyst. A data engineer needs a
strong technical background with the ability to create and integrate APIs. They also
need to understand data pipelining and performance optimization.
•Data Scientist
A data scientist analyzes and interprets complex digital data. While there are
several ways to get into a data scientist's role, the most seamless one is to acquire
enough experience and learn the various data scientist skills. These skills include
advanced statistical analysis, a thorough understanding of machine learning, data
conditioning, etc.
For a better understanding of these professionals, let’s
dive deeper and understand their required skill-sets.

Skill-Sets
The below table illustrates the different skill sets
required for Data Analyst, Data Engineer and Data
Scientist:
Data Analyst vs Data Engineer vs Data Scientist Skill Sets
Data Analyst | Data Engineer | Data Scientist
Data Warehousing | Data Warehousing & ETL | Statistical & Analytical skills
Adobe & Google Analytics | Advanced programming knowledge | Data Mining
Programming knowledge | Hadoop-based Analytics | Machine Learning & Deep learning principles
Scripting & Statistical skills | In-depth knowledge of SQL/database | In-depth programming knowledge (SAS/R/Python coding)
Reporting & data visualization | Data architecture & pipelining | Hadoop-based analytics
SQL/database knowledge | Machine learning concept knowledge | Data optimization
Spread-Sheet knowledge | Scripting, reporting & data visualization | Decision making and soft skills
As the table above shows, a data analyst's primary skill set revolves around data
acquisition, handling, and processing. A data engineer, on the other hand, requires
an intermediate-level understanding of programming to build thorough algorithms,
along with a mastery of statistics and math. Finally, a data scientist needs to be
a master of both worlds: data, stats, and math, along with in-depth programming
knowledge for machine learning and deep learning.

Now that we understand the skill sets needed to become a data analyst, data
engineer, or data scientist, let us compare the typical roles and responsibilities of
these professionals in their day-to-day work.
Roles And Responsibilities
The roles and responsibilities of a data analyst, data engineer, and data scientist
are quite similar, as you can see from their skill sets. Refer to the table below for
details:
Data Analyst | Data Engineer | Data Scientist
Pre-processing and data gathering | Develop, test & maintain architectures | Responsible for developing Operational Models
Emphasis on representing data via reporting and visualization | Understand programming and its complexity | Carry out data analytics and optimization using machine learning & deep learning
Responsible for statistical analysis & data interpretation | Deploy ML & statistical models | Involved in strategic planning for data analytics
Ensures data acquisition & maintenance | Building pipelines for various ETL operations | Integrate data & perform ad-hoc analysis
Optimize Statistical Efficiency & Quality | Ensures data accuracy and flexibility | Fill in the gap between the stakeholders and customer
• Data analyst. The data analyst role implies proper data collection and
interpretation activities. An analyst ensures that collected data is relevant and
exhaustive while also interpreting the analytics results. Some companies, like
IBM or HP, also require data analysts to have visualization skills to convert
alienating numbers into tangible insights through graphics.
Preferred skills: R, Python, JavaScript, C/C++, SQL
• Business analyst. A business analyst basically realizes a CAO’s functions but on
the operational level. This implies converting business expectations into data
analysis. If your core data scientist lacks domain expertise, a business analyst
bridges this gulf.
Preferred skills: data visualization, business intelligence, SQL
Business Analytic Team
IBM ICE (Innovation Centre for Education)
• Data scientist (not a data science unicorn). What does a data scientist do?
Assuming you aren't hunting unicorns, a data scientist is a person who solves
business tasks using machine learning and data mining techniques. If this is too
fuzzy, the role can be narrowed down to data preparation and cleaning with
further model training and evaluation.
Preferred skills: R, SAS, Python, Matlab, SQL, noSQL, Hive, Pig, Hadoop, Spark
• To avoid confusion and make the search for a data scientist less overwhelming,
their job is often divided into two roles: machine learning engineer and data
journalist.
• A machine learning engineer combines software engineering and modeling skills
by determining which model to use and what data should be used for each model.
Probability and statistics are also their forte. Everything that goes into training,
monitoring, and maintaining a model is the ML engineer's job.
Preferred skills: R, Python, Scala, Julia, Java
• Data journalists help make sense of data output by putting it in the right context.
They're also tasked with articulating business problems and shaping analytics
results into compelling stories. Though required to have coding and statistics
experience, they should be able to present the idea to stakeholders and represent
the data team with those unfamiliar with statistics.
Preferred skills: SQL, Python, R, Scala, Carto, D3, QGIS, Tableau


• Data architect. This role is critical for working with large amounts of data (you
guessed it, Big Data). However, if you don't solely rely on MLaaS cloud platforms,
this role is critical to warehouse the data, define database architecture, centralize
data, and ensure integrity across different sources. For large distributed systems
and big datasets, the architect is also in charge of performance.
• Preferred skills: SQL, noSQL, XML, Hive, Pig, Hadoop, Spark
• Data engineer. Engineers implement, test, and maintain infrastructural
components that data architects design. Realistically, the role of an engineer and
the role of an architect can be combined in one person. The set of skills is very
close.
• Preferred skills: SQL, noSQL, Hive, Pig, Matlab, SAS, Python, Java, Ruby, C++, Perl
• Application/data visualization engineer. Basically, this role is only necessary for a
specialized data science model. In other cases, software engineers come from IT
units to deliver data science results in applications that end-users face. And it's
very likely that an application engineer or other developers from front-end units
will oversee end-user data visualization.
• Preferred skills: programming, JavaScript (for visualization), SQL, noSQL.
How to integrate a data science team into your company

• As a data science team grows along with the company's needs, it requires
creating a whole new department that needs to be organized, controlled,
monitored, and managed. This huge organizational shift suggests that the new
group should have established roles and responsibilities, all in relation to other
projects and facilities. So, how do you integrate data scientists into your
company?
• We'll base the key types on Accenture's classification and expand on the team
structure ideas further.
Decentralized
• This is the least coordinated option where analytics efforts are used
sporadically across the organization and resources are allocated
within each group’s function. This often happens in companies when
data science expertise has appeared organically. Business units, like
product teams, or functional units at some point recognize their
internal need for analytics. They start hiring data scientists or analysts
to meet this demand. Sometimes a data scientist may be the only
person in a cross-functional product team with data analysis
expertise.
• The decentralized model works best for companies with no
intention of becoming a data-driven company. It may also
be applied in the early stages of data science activities for the short-
term progress of demo projects that leverage advanced analytics.
• This model has a number of drawbacks.
• It often leads to siloed efforts, a lack of analytics
standardization, and, you guessed it, decentralized reporting.
• The hiring process is an issue. When managers hire a data scientist
for their team, it’s a challenge for them to hold a proper interview.
They clearly understand, say, a typical software engineer’s roles,
responsibilities, and skills, while being unfamiliar with those of a
data scientist. So, putting it all together is a challenge for them.
• Managing a data scientist career path is also problematic. While
team managers are totally clear on how to promote a software
engineer, further steps for data scientists may raise questions. The
same problem haunts building an individual development plan.
• Lower quality standards and underestimated best practices are
often the case. Data scientists need to learn from other, more
experienced data scientists who can mentor them. As this model
provides no such option, data scientists may end up left on their own.
This usually means best practices do not improve, which in turn
reduces data quality and the quality of the product as a whole.
Functional Model
• Here most analytics specialists work in one functional department where
analytics is most relevant, often marketing or supply chain. This option also
entails little to no coordination, and expertise isn't used strategically
enterprise-wide.
• The functional approach is best suited for organizations that are just embarking on the
analytics road. They have no need to analyze data from every single point, and
consequently, there are not enough analytical processes to justify a separate,
centralized data science team for the whole organization.
• The drawbacks of the functional model stem from its concentration in a single department.
• Detachment from company-wide problems. The approach entails that analytical
activities are mostly focused on functional needs rather than on enterprise-wide
needs. Such unawareness may result in analytics being isolated and out of
context.
• Weak cohesion due to the absence of a data manager. As an analytical team here is
placed under a particular business unit, it submits reports directly to the head of this
unit. In this way, there may not be a direct data science manager who understands the
specifics of their team.
Consulting

In this structure, analytics specialists work together as one group, but their role
within the organization is consulting, meaning that different departments can
"hire" them for specific tasks. This, of course, means that there's almost no
resource allocation: a specialist is either available or not.
Consultancy Model
• The consultancy model is best suited for small and medium-sized businesses (SMBs) with sporadic, small- to medium-scale data
science tasks. As all DS team members report to one DS team manager, managing such a DS
team becomes easier and cheaper for an SMB.
• However, there are always some pitfalls.
• First of all, poor data quality can become a fundamental flaw of the model. As data scientists can’t adhere
to their best practices for every task, they have to sacrifice quality to business needs that demand quick
solutions.
• Also, there’s the low-motivation trap. As data scientists are not fully involved in product building and
decision-making, they have little to no interest in the outcome.
• A serious drawback of the consulting model is uncertainty. Deadlines are not clear because data scientists are not
closely familiar with the data sources and the context in which they arise. Long-term and complex projects are
hard to take on, because achieving great results sometimes requires specialists to work on the same set of
problems for years.
• The prioritization method is also unclear. It’s still hard to identify how a data science manager prioritizes
and allocates tasks for data scientists and what objectives to favor first.
Centralized Model
This structure finally allows you to use analytics in strategic tasks: one data
science team serves the whole organization in a variety of projects. Not only
does it provide a DS team with long-term funding and better resource
management, but it also encourages career growth. The only pitfall here is the
danger of transforming an analytics function into a supporting one.
• One of the best use cases for creating a centralized team is when both the demand for analytics and
the number of analysts are rapidly increasing, requiring the urgent allocation of these resources.
By introducing a centralized approach, a company indicates that it considers data a strategic asset
and is ready to build an analytics department equal to sales or marketing.
• As always, there are some pitfalls in the model.
• There’s a high chance of becoming isolated and facing the disconnect between a data analytics
team and business lines. As the data analytics team doesn’t participate in regular activities of
actual business value units, they might not be closely familiar with the latter’s needs and pains.
This may lead to the narrow relevance of recommendations that can be left unused and ignored.
• This leads to challenges in meaningful cooperation with a product team. Once the analytics group
has found a way to tackle a problem, it suggests a solution to a product team. The biggest
problem is that this solution may not fit into a product roadmap. And, conflict may appear. The
only way out here is to create a team that would assess, design, and implement the suggested
solution. This alternative, however, takes much effort, time, and money.
• Sometimes, you may find that a centralized model is described as the Center of Excellence. And
it’s okay, there are always unique scenarios. But we’ll stick to the Accenture classification, since it
seems more detailed, and draw a difference between the centralized model and the center of
excellence.
Other Model
Center of Excellence (CoE)
• If you pick this option, you’ll still keep the centralized
approach with a single coordination center, but data
scientists will be allocated to different units in the
organization. This is the most balanced structure –
analytics activities are highly coordinated, but experts
won’t be removed from business units.
• Due to its well-balanced interactions, the approach is
being increasingly adopted, especially in enterprise-scale
organizations. It works best for companies with a
corporate strategy and a thoroughly developed data
roadmap.
• However, even such a deeply data-focused approach has
its drawbacks.
• While this approach is balanced, there’s no single
centralized group that would focus on enterprise-level
problems. Each analytical group would be solving
problems inside their units.
• Another drawback is that there’s no innovation unit, a
group of specialists that primarily focus on state-of-the-art
solutions and long-term data initiatives rather than day-to-
day needs.
Federated
• This model is relevant when there’s an increasingly high demand
for analytics talent across the company. Here, you employ a
SWAT team of sorts – an analytics group that works from a
central point and addresses complex cross-functional tasks. The
rest of the data scientists are distributed as in the Center of
Excellence model. Basically, the federated model combines the
coordination and decentralization approach of the CoE model but
keeps this central avant-garde unit.
• The federated model is best adopted in companies where
analytics processes and tasks have a systemic nature and need
day-to-day updates. This approach can serve both enterprise-
scale objectives like enterprise dashboard design and function-
tailored analytics with different types of modeling.
• While it seems that the federated model is perfect, there are still
some drawbacks.

• Expenses for talent acquisition and retention. As this model suggests a separate specialist for each product team plus
central data management, it may cost you a pretty penny. Thus, the approach in its pure form isn't the best choice for
companies in their earliest stages of analytics adoption.
• Cross-functionality may create an environment prone to conflict. There may be no power parity between team lead positions, which can
cause late deliveries or questionable results due to constant conflicts between unit team leads and CoE management.
Democratic
• This model is an additional way to think of data culture. The democratic
model entails everyone in your organization having access to data via BI tools
or data portals. This means that it can be combined with any other model
described above. You can have a federated approach with a CoE and analytics
specialists inside each department and at the same time expose BI tools to
everyone interested in using data for their duties, which is great in terms of
fostering data culture.
• Product team members like product and engineering managers, designers,
and engineers access the data directly without involving data scientists.
• What are the drawbacks?
• The company that integrates such a model usually invests a lot into data
science infrastructure, tooling, and training.
• You simply need more people to avoid tales of a data engineer being occupied
with tweaking a BI dashboard for another sales representative, instead of
doing actual data engineering work.
Which Model Is the Best?
Remember that your model may change and evolve depending
on your business needs: While today you may be content with
data scientists residing in their functional units, tomorrow a
Center of Excellence can become a necessity.
The critical thing to be aware of

• If you ask AltexSoft’s data science experts what the current state of AI/ML across
industries is, they will likely point out two main issues: 1. Business executives still
need to be convinced that a reasonable ROI of ML investments exists. 2. If they are
convinced and understand the value proposition and market demand, they may lack
technical skills and resources to make products a reality.
• These barriers are mostly due to digital culture in organizations. Efficient data
processes challenge C-level executives to embrace horizontal decision-making.
Frontline managers with access to analytics have more operational freedom to 
make data-driven decisions, while top-level management oversees a strategy. This
reduces management effort and eventually mitigates “gut-feeling-decision” risks.
Basically, the cultural shift defines the end success of building a data-driven business.
As McKinsey argues, setting a culture is probably the hardest part, while the rest is
manageable.
