Chapter 2 - Key Roles and Responsibilities - Updated
Most entry-level professionals interested in a data-related job start off as Data
Analysts. The entry requirements for this role are modest: a bachelor’s degree and
good statistical knowledge. Strong technical skills are a plus and can give you an
edge over other applicants. Beyond this, companies expect you to understand data
handling, modeling and reporting techniques, along with a strong understanding of
the business.
•Data Engineer
A Data Engineer typically either holds a master’s degree in a data-related field or
has gathered a good amount of experience as a Data Analyst. A Data Engineer needs
a strong technical background with the ability to create and integrate APIs. They
also need to understand data pipelining and performance optimization.
•Data Scientist
A Data Scientist analyses and interprets complex digital data. While there are
several ways to get into a data scientist’s role, the most seamless one is to
acquire enough experience and learn the relevant data science skills. These skills
include advanced statistical analysis, a thorough understanding of machine
learning, and data conditioning.
For a better understanding of these professionals, let’s
dive deeper and understand their required skill-sets.
Skill-Sets
The table below illustrates the different skill sets
required for a Data Analyst, Data Engineer and Data
Scientist:
Data Analyst vs Data Engineer vs Data Scientist Skill Sets
Data Analyst             | Data Engineer                             | Data Scientist
Data Warehousing         | Data Warehousing & ETL                    | Statistical & Analytical skills
Adobe & Google Analytics | Advanced programming knowledge            | Data Mining
Programming knowledge    | Hadoop-based Analytics                    | Machine Learning & Deep Learning principles
Spreadsheet knowledge    | Scripting, reporting & data visualization | Decision making and soft skills
As outlined above, a data analyst’s primary skill set revolves around data
acquisition, handling, and processing. A data engineer, on the other hand, requires
an intermediate-level understanding of programming to build thorough algorithms,
along with a mastery of statistics and math. Finally, a data scientist needs to be
a master of both worlds: data, stats, and math, along with in-depth programming
knowledge for Machine Learning and Deep Learning.
Now that we understand the skill sets needed to become a data analyst, data
engineer or data scientist, let’s compare the typical day-to-day roles and
responsibilities of these professionals.
Roles And Responsibilities
The roles and responsibilities of a data analyst, data engineer and data scientist
are quite similar, as their skill sets suggest. Refer to the table below for
details:
Data Analyst                                                  | Data Engineer                                | Data Scientist
Pre-processing and data gathering                             | Develop, test & maintain architectures       | Responsible for developing Operational Models
Emphasis on representing data via reporting and visualization | Understand programming and its complexity    | Carry out data analytics and optimization using machine learning & deep learning
Ensures data acquisition & maintenance                        | Building pipelines for various ETL operations | Integrate data & perform ad-hoc analysis
Optimize Statistical Efficiency & Quality                     | Ensures data accuracy and flexibility        | Fill in the gap between the stakeholders and customer
• Data analyst. The data analyst role implies proper data collection and
interpretation activities. An analyst ensures that collected data is relevant and
exhaustive while also interpreting the analytics results. Some companies, like
IBM or HP, also require data analysts to have visualization skills to convert
alienating numbers into tangible insights through graphics.
Preferred skills: R, Python, JavaScript, C/C++, SQL
• Business analyst. A business analyst essentially performs the functions of a CAO
(chief analytics officer), but on the operational level. This means converting
business expectations into data analysis. If your core data scientist lacks domain
expertise, a business analyst bridges this gulf.
Preferred skills: data visualization, business intelligence, SQL
Business Analytic Team
IBM ICE (Innovation Centre for Education)
• Data scientist. [...] and data mining techniques. If this is too fuzzy, the role
can be narrowed down to data preparation and cleaning with further model training
and evaluation.
Preferred skills: R, SAS, Python, Matlab, SQL, NoSQL, Hive, Pig, Hadoop, Spark
To avoid confusion and make the search for a data scientist less overwhelming,
their job is often divided into two roles: machine learning engineer and data
journalist.
• Machine learning engineer. A machine learning engineer combines software
engineering and modeling skills by determining which model to use and [...]
Preferred skills: R, Python, Scala, Julia, Java
• Data journalist. Data journalists help make sense of data output by putting it
in the right context. They’re also tasked with articulating business problems and
shaping analytics results into compelling stories. Though required to have coding
and statistics experience, they should be able to present the idea to stakeholders
and represent the data team to those unfamiliar with statistics.
Preferred skills: SQL, Python, R, Scala, Carto, [...]
• Expenses for talent acquisition and retention. As this model calls for a separate specialist for each product team plus
central data management, it may cost you a pretty penny. Thus, the approach in its pure form isn’t the best choice for
companies in their earliest stages of analytics adoption.
• Cross-functionality may create an environment of conflict. Without power parity between all team lead positions, constant
conflicts between unit team leads and CoE management can cause late deliveries or questionable results.
Democratic
• This model is an additional way to think of data culture. The democratic
model entails everyone in your organization having access to data via BI tools
or data portals. This means that it can be combined with any other model
described above. You can have a federated approach with CoE and analytics
specialists inside each department and at the same time expose BI tools to
everyone interested in using data for their duties – which is great in terms of
fostering data culture.
• Product team members like product and engineering managers, designers,
and engineers access the data directly without involving data scientists.
• What are the drawbacks?
• The company that integrates such a model usually invests a lot into data
science infrastructure, tooling, and training.
• You simply need more people to avoid tales of a data engineer being occupied
with tweaking a BI dashboard for another sales representative, instead of
doing actual data engineering work.
Which Model Is the Best?
Remember that your model may change and evolve depending
on your business needs: while today you may be content with
data scientists residing in their functional units, tomorrow a
Center of Excellence can become a necessity.
The critical thing to be aware of
• If you ask AltexSoft’s data science experts what the current state of AI/ML across
industries is, they will likely point out two main issues: 1. Business executives still
need to be convinced that a reasonable ROI of ML investments exists. 2. If they are
convinced and understand the value proposition and market demand, they may lack
technical skills and resources to make products a reality.
• These barriers are mostly due to digital culture in organizations. Efficient data
processes challenge C-level executives to embrace horizontal decision-making.
Frontline managers with access to analytics have more operational freedom to
make data-driven decisions, while top-level management oversees a strategy. This
reduces management effort and eventually mitigates “gut-feeling-decision” risks.
Basically, the cultural shift defines the end success of building a data-driven business.
As McKinsey argues, setting a culture is probably the hardest part, while the rest is
manageable.