Data Science LVC Session 2
Data Engineering & Exploration
• ETL Tools: ETL (extract, transform, load) tools move data between systems.
They extract data from source systems, apply rules that "transform" it into a form
better suited to analysis, and load the result into a target system such as a data warehouse.
• SQL: Structured Query Language (SQL) is the standard language
for querying relational databases.
• Python: Python is a general-purpose programming language. Data
engineers often use Python for ETL tasks.
• Cloud Data Storage: Managed storage services such as Amazon S3, Azure Data Lake
Storage (ADLS), and Google Cloud Storage.
• Query Engines: Query engines execute queries against stored data and return
the results. Data engineers may work with engines such as Apache Spark and
Apache Flink.
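The ETL, SQL, and Python entries above fit together in a small pipeline. The sketch below is a minimal illustration under stated assumptions, not a production ETL job: the CSV contents, the `orders` table, and its columns are hypothetical, and an in-memory SQLite database (Python's built-in `sqlite3` module) stands in for a real warehouse or cloud store.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export from some upstream system.
SOURCE_CSV = """order_id,customer,amount,currency
1, alice ,10.50,usd
2,Bob,n/a,USD
3,carol,7.25,usd
"""


def extract(text):
    """Extract: read raw rows (as dicts of strings) from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: parse amounts, normalize text fields, drop unparseable rows."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number (e.g. "n/a")
        out.append((
            int(row["order_id"]),
            row["customer"].strip().title(),   # " alice " -> "Alice"
            amount,
            row["currency"].strip().upper(),   # "usd" -> "USD"
        ))
    return out


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


# An in-memory SQLite database stands in for the real target system (assumption).
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)

# Plain SQL query against the loaded table.
total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE currency = 'USD'"
).fetchone()[0]
print(total)  # 17.75
```

Keeping extract, transform, and load as separate functions mirrors the three ETL stages, so each stage can be swapped out (for example, reading from cloud storage instead of a string) without touching the others.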
The data preparation step involves a few essential tasks that everyone working with data should explore:
• Data Cleaning: Identifying errors in the data, such as missing, duplicated, or invalid values, and
correcting them.
• Feature Selection: Identifying the input variables that are most important or relevant to the
model.
• Data Transforms: Converting raw data into a format better suited to the model, for example by
scaling numeric values or encoding categories.
• Feature Engineering: Feature engineering involves deriving new variables from the available dataset.
• Dimensionality Reduction: Reducing the number of input features (dimensions) while retaining as
much of the relevant information as possible.
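The five preparation tasks above can be sketched end to end on a toy dataset. This is a minimal, standard-library-only illustration under stated assumptions, not a production pipeline: the records and field names (`height_m`, `weight_kg`, `survey_year`) are hypothetical, feature selection uses a simple zero-variance filter, the transform is min-max scaling, and dimensionality reduction is a hand-rolled principal component analysis (PCA) that projects two columns onto one. In practice, libraries such as pandas and scikit-learn (for example `sklearn.decomposition.PCA`) are the usual tools.

```python
import math
import statistics

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"height_m": 1.70, "weight_kg": 65.0, "survey_year": 2023},
    {"height_m": 1.70, "weight_kg": 65.0, "survey_year": 2023},  # exact duplicate
    {"height_m": None, "weight_kg": 80.0, "survey_year": 2023},  # missing value
    {"height_m": 1.80, "weight_kg": 90.0, "survey_year": 2023},
    {"height_m": 1.60, "weight_kg": 50.0, "survey_year": 2023},
]


def clean(rows):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen, out = set(), []
    for row in rows:
        if any(v is None for v in row.values()):
            continue
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        out.append(dict(row))  # copy so the raw records are not mutated
    return out


def engineer(rows):
    """Feature engineering: derive BMI = weight / height^2 from existing columns."""
    for row in rows:
        row["bmi"] = row["weight_kg"] / row["height_m"] ** 2
    return rows


def select_features(rows, candidates):
    """Feature selection (simple filter): keep numeric columns whose values vary."""
    return [c for c in candidates if statistics.pvariance([r[c] for r in rows]) > 0]


def min_max_scale(rows, columns):
    """Data transform: rescale each column to [0, 1] (assumes max > min)."""
    for c in columns:
        lo = min(r[c] for r in rows)
        hi = max(r[c] for r in rows)
        for r in rows:
            r[c] = (r[c] - lo) / (hi - lo)
    return rows


def pca_2d_to_1d(points):
    """Dimensionality reduction: project 2-D points onto their first principal component."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / n  # var(x)
    c = sum(y * y for _, y in centered) / n  # var(y)
    b = sum(x * y for x, y in centered) / n  # cov(x, y)
    # Largest eigenvalue of the 2x2 covariance matrix [[a, b], [b, c]].
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector; fall back to an axis when the columns are uncorrelated.
    vx, vy = (lam - c, b) if b != 0 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [x * vx + y * vy for x, y in centered]


rows = clean(RAW)                     # 3 rows remain
rows = engineer(rows)                 # adds "bmi"
kept = select_features(rows, ["height_m", "weight_kg", "bmi", "survey_year"])
rows = min_max_scale(rows, kept)      # "survey_year" was dropped as constant
scores = pca_2d_to_1d([(r["height_m"], r["weight_kg"]) for r in rows])
print(len(rows), kept)
```

The order matters: cleaning runs first so that missing values and duplicates do not distort later statistics, and scaling runs after selection so that only columns with a nonzero range are divided by `hi - lo`.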