Data Engineering vs Data Science

The document compares Data Engineering and Data Science, highlighting their distinct roles, tools, and focuses. Data Engineers build and maintain data pipelines and infrastructure, while Data Scientists analyze data to extract insights and build predictive models. The document also discusses the industries each profession serves, data quality evaluation, and various technical skills required for both roles.

Uploaded by rajapraneesh
Copyright © All Rights Reserved

Data Engineering Vs Data Science

Comparison

Data
• Data Engineering: Build and maintain data pipelines.
• Data Science: Analyze data to identify trends and patterns.

Processing
• Data Engineering: Focuses on building and maintaining the infrastructure that manages and processes data.
• Data Science: Involves analyzing data to extract insights and build predictive models.

Tools
• Data Engineering: Data engineers use various tools to manage and process data, including Hadoop for distributed storage and processing, Spark for big data analytics, SQL for database management, and ETL tools for data integration.
• Data Science: TensorFlow, Python, Apache Hadoop, Power BI.
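The pipeline and ETL entries above can be illustrated with a minimal extract-transform-load sketch. It assumes Python's standard library only (`csv` and an in-memory `sqlite3` database) as a stand-in for Hadoop, Spark, or a dedicated ETL tool; the sample data, the `sales` table, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for an upstream source such as an exported file.
RAW_CSV = """order_id,region,amount
1,north,120.50
2,south,
3,North,80.00
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw CSV rows into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: normalize region names and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records instead of loading them
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(row["amount"]))
        )
    return cleaned


def load(records: list[tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text: str) -> sqlite3.Connection:
    """Run extract -> transform -> load into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline(RAW_CSV)
    for region, total in conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
        print(region, total)
```

Keeping each stage as a separate function mirrors how pipeline tools split work into steps, so a stage (for example, the transform rules) can be changed or tested without touching the others.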
Industries
• Data Engineering: Data engineers are needed in various industries, including technology, finance, healthcare, retail, and manufacturing.
• Data Science: Data science mainly focuses on healthcare, retail and e-commerce applications, business, and marketing.

Data Quality
• Data Engineering: Data quality is evaluated along several dimensions. Key dimensions include accuracy, completeness, consistency, timeliness, validity, and uniqueness.
• Data Science: Quality is also evaluated along several dimensions. Key measures include accuracy, precision, and recall, which are typically computed on a model's predictions.

Focus
• Data Engineering: Data engineers focus on continuously improving data consumption techniques.
• Data Science: Data science focuses on a forward-looking, predictive view of data.
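The data quality row above mixes two kinds of measurement: data engineering dimensions computed on the records themselves, and data science metrics computed on a model's predictions against known labels. The sketch below shows a few of each in plain Python. The records, the field names, and the `looks_like_email` rule are hypothetical; consistency and timeliness are not shown because they depend on domain-specific rules.

```python
# Hypothetical customer records used only for illustration.
RECORDS = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example"},
]


def completeness(records, field):
    """Fraction of records in which `field` is not missing (None)."""
    return sum(r[field] is not None for r in records) / len(records)


def uniqueness(records, field):
    """Fraction of values of `field` that are distinct (1.0 means no duplicates)."""
    values = [r[field] for r in records]
    return len(set(values)) / len(values)


def looks_like_email(value):
    """Hypothetical validity rule: non-empty local part, an '@', and a dot in the domain."""
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and "." in domain


def validity(records, field, rule):
    """Fraction of non-missing values of `field` that pass `rule`."""
    present = [r[field] for r in records if r[field] is not None]
    return sum(rule(v) for v in present) / len(present) if present else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    print("email completeness:", completeness(RECORDS, "email"))  # 2 of 3 present
    print("id uniqueness:", uniqueness(RECORDS, "id"))  # id 2 appears twice
    print("email validity:", validity(RECORDS, "email", looks_like_email))
    print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In the metrics example, the model finds two of the three true positives (recall 2/3), and two of its three positive predictions are correct (precision 2/3).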
Roles
• Data Engineering: A data engineer's role is to prepare data in an appropriate format. Data engineers work on the back end and use optimized machine learning algorithms to maintain data and make it available in the most appropriate manner.
• Data Science: A data scientist's role is to apply supervised and unsupervised learning to data, for example to classify data or perform regression. Data scientists make heavy use of neural networks and machine learning for continuous regression analysis.

Protocols
• Data Engineering: Data encryption/decryption.
• Data Science: Data encryption/decryption.

Data Visualization
• Data Engineering: Matplotlib, NumPy, pandas.
• Data Science: Matplotlib, Seaborn, or Tableau.

What a data engineer does not do
• Data Engineering: Does not directly build ML models or create reports or dashboards.
• Data Science: Directly builds ML models and creates reports or dashboards.
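The roles row mentions regression as a core data science task. As a small illustration of a supervised regression model, here is an ordinary least-squares fit for a single input variable in plain Python. In practice a library (the deck lists TensorFlow) would usually be used; the function names and sample data here are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one feature."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero would make the slope undefined.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Co-variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Predict y for a new x using a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    model = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(model)  # (1.0, 2.0): the points lie exactly on y = 1 + 2x
    print(predict(model, 5))  # 11.0
```

The model is "supervised" because it learns from paired inputs and known outputs, then predicts outputs for new inputs.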
Review Questions
1) ------------ is the process of organizing, managing, and analyzing large amounts of data.
a) Data Science
b) Data Mining
c) Data Engineering
d) Data Pipeline
2) Data engineering is an interconnection between big data and ---------.
a) Data Science
b) Data Mining
c) NLP
d) Neural Networks
3) In data engineering, data is classified into ----- types.
a) 6
b) 3
c) 4
d) 2
4) What kind of data does not follow any definite schema or set of rules?
a) Structured
b) Unstructured
c) Semi-structured
d) Quasi-structured
5) What is another name for dark data?
a) Structured
b) Unstructured
c) Semi-structured
d) Quasi-structured
6) Which of the following is not an example of structured data?
a) SQL Databases
b) Chat Messages
c) Spreadsheets
d) Tables
7) Semi-structured data is commonly called --------.
a) SQL Databases
b) NoSQL data
c) Dark data
d) Table Format
8) Email headers are an example of which type of data?
a) Structured
b) Unstructured
c) Semi-structured
d) Quasi-structured
9) ---------- data can be represented in the form of a tree structure.
a) XML
b) JSON
c) YAML
10) What is another name for YAML?
11) What is data wrangling?
12) Expand the abbreviation ETL.
13) Snowflake is an example of a data warehouse. True or False?
14) Data transformation is classified into ------- and ----------.
15) The process of improving the quality of data and simplifying it is called ---------.
a) Data Wrangling
b) Data Preprocessing
c) Data Pipeline
d) Data Augmentation
16) ----- is the process of removing noise and correcting inconsistencies in data.
a) Data Wrangling
b) Data Preprocessing
c) Data Cleaning
d) Data Augmentation
17) Which one of the following is not a data cleaning task?
a) Filling in missing values
b) Calculations with missing data
c) Cleaning and filling missing data
d) Merge data
18) What is the outcome of data integration?
a) Single unified view of data
b) Calculations with missing data
c) Cleaning and filling missing data
d) Merge data and Sort data
19) Which type of data transformation converts your data with the help of a mathematical formula?
a) Syntactic Transformation
b) Semantic Transformation
c) Log Transform
d) Sigmoid function
20) The --------- function is used to transform your data, both syntactically and semantically.
a) ReLU function
b) Activation function
c) Log function
d) Sigmoid function
21) --------- is the process of deriving new features from existing ones.
a) Data Reduction
b) Data Augmentation
c) Data Pipeline
d) Data Summarization
22) In the 1960s and 1970s, data was primarily stored on ---------------.
a) CD-ROM
b) Magnetic Tapes
c) Disk
d) IKS
23) -------- is an open-source framework that was developed to store and process massive datasets.
a) Apache Hadoop
b) R
c) Java
d) XML
24) Who invented the relational model?
a) Dr E. F. Codd
b) Dr R.D. Hussian
c) Dr Patel
d) Dr Xiang
25) What is the special database system invented by IBM called?
a) IKS
b) IMS
c) UMIS
d) DSQL
• Programming: Proficiency in languages like Python, SQL, Java, and Scala.
• Database Management: Expertise in relational (SQL) and NoSQL databases, including
various database systems like MySQL, PostgreSQL, and MongoDB.
• Data Warehousing: Knowledge of data warehousing solutions, including cloud-based
platforms like BigQuery, Snowflake, and Amazon Redshift.
• Data Integration and ETL: Understanding of data extraction, transformation, and loading
(ETL) processes.
• Data Modeling and Schema Design: Ability to design and implement effective data models
and schemas.
• Big Data Tools and Technologies: Experience with tools like Hadoop, Apache Spark, and
Kafka.
• Cloud Computing: Proficiency in cloud platforms and services (e.g., AWS, Azure, Google
Cloud).
• Data Analysis and Interpretation: Ability to analyze data, identify patterns, and draw
meaningful insights.
• Communication and Collaboration: Strong communication and collaboration skills to work
effectively with data scientists, business analysts, and other stakeholders.
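The data modeling and schema design skill can be illustrated with a small normalized schema. This sketch assumes SQLite through Python's built-in `sqlite3` module as a stand-in for systems such as MySQL or PostgreSQL; the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: customer details are stored once in `customers`,
# and each order refers to its customer by key instead of repeating those details.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT UNIQUE
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL CHECK (amount >= 0)
);
"""


def build_database() -> sqlite3.Connection:
    """Create the schema in memory and insert a few sample rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Asha", "asha@example.com"), (2, "Ben", "ben@example.com")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
    )
    conn.commit()
    return conn


def total_per_customer(conn: sqlite3.Connection):
    """Join the two tables to recover a per-customer view of order totals."""
    query = """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(total_per_customer(build_database()))
```

The constraints (`PRIMARY KEY`, `UNIQUE`, `REFERENCES`, `CHECK`) let the database itself reject bad rows, such as an order for a customer that does not exist, rather than relying on every pipeline step to check.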
