Data Science LVC Session 2

Uploaded by

jakkisharmila
Data Science

Data Representation and Modelling


• The main objective of machine learning is to build models that understand data and find underlying patterns.
• In order to do so, it is very important to feed the data in a way that is interpretable by the computer.
• To feed the data into a model, it must be represented as a table or a matrix of the required dimensions. Converting
your data into the correct tabular form is one of the first steps before pre-processing can properly begin.

Data Represented in a Table


• Data should be arranged in a two-dimensional space made up of rows and columns.
• This type of data structure makes it easy to understand the data and pinpoint any problems.
• An example of some raw data stored as a CSV (comma separated values) file is shown here:

• The representation of the data in a table is as follows:
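
The original figures are not available, so the following is a minimal sketch of the same idea: raw CSV text parsed into rows and columns and printed as a table. The column names and values (`name`, `age`, `city`) and the helper names `csv_to_table` and `format_table` are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical raw CSV text; the column names and values are made up.
RAW_CSV = """name,age,city
Asha,34,Chennai
Ravi,29,Hyderabad
Meera,41,Pune
"""

def csv_to_table(text):
    """Parse CSV text into a header list and a list of row lists."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows

def format_table(header, rows):
    """Render the header and rows as an aligned plain-text table."""
    # zip(header, *rows) walks the data column by column.
    widths = [max(len(cell) for cell in column) for column in zip(header, *rows)]

    def fmt(cells):
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths))

    lines = [fmt(header), "-+-".join("-" * width for width in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)

header, rows = csv_to_table(RAW_CSV)
print(format_table(header, rows))
```

Each CSV line becomes one row and each comma-separated field becomes one column, which is the two-dimensional form a model expects as input.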

• Data modelling is the process of creating a visual representation of either a whole information system
or parts of it to communicate connections between data points and structures.
• The goal is to illustrate the types of data used and stored within the system, the relationships among
these data types, the ways the data can be grouped and organized and its formats and attributes.

• Data modelling process


1. Identify the entities.
2. Identify key properties of each entity.
3. Identify relationships among entities.
4. Map attributes to entities completely.
5. Assign keys as needed, and decide on a degree of normalization that balances the need to reduce
redundancy with performance requirements.
6. Finalize and validate the data model.
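
The steps above can be sketched in code. This is only an illustration under assumed names: the entities `Customer` and `Order`, their attributes, and the `DataModel` container are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field

# Steps 1-2: hypothetical entities and their key properties.
@dataclass
class Customer:
    customer_id: int   # step 5: primary key
    name: str
    email: str

@dataclass
class Order:
    order_id: int      # step 5: primary key
    customer_id: int   # step 3: relationship (foreign key to Customer)
    total: float

@dataclass
class DataModel:
    customers: dict = field(default_factory=dict)  # keyed by customer_id
    orders: dict = field(default_factory=dict)     # keyed by order_id

    def add_customer(self, customer):
        if customer.customer_id in self.customers:
            raise ValueError("duplicate customer_id")
        self.customers[customer.customer_id] = customer

    def add_order(self, order):
        # Step 6: validate that every order points to a known customer.
        if order.customer_id not in self.customers:
            raise ValueError("unknown customer_id")
        if order.order_id in self.orders:
            raise ValueError("duplicate order_id")
        self.orders[order.order_id] = order

    def orders_for(self, customer_id):
        """Follow the relationship from a customer to their orders."""
        return [o for o in self.orders.values() if o.customer_id == customer_id]

model = DataModel()
model.add_customer(Customer(1, "Asha", "asha@example.com"))
model.add_order(Order(10, 1, 250.0))
print(model.orders_for(1))
```

Storing the customer only once and referring to it by key from each order is a small instance of the normalization trade-off mentioned in step 5.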
Data Engineering & Exploration
• Data engineering is the process of designing and building systems that let
people collect and analyze raw data from multiple sources and formats.
• These systems empower people to find practical applications of the data, which
businesses can use to thrive.

Data Engineering Tools and Skills


• Data engineers use many different tools to work with data.
• They use a specialized skill set to create end-to-end data pipelines that move
data from source systems to target destinations.
• Data engineers work with a variety of tools and technologies, including:

• ETL Tools: ETL (extract, transform, load) tools move data between systems.
They access data, then apply rules to “transform” the data through steps that
make it more suitable for analysis.
• SQL: Structured Query Language (SQL) is the standard language
for querying relational databases.
• Python: Python is a general-purpose programming language. Data
engineers may choose to use Python for ETL tasks.
• Cloud Data Storage: Including Amazon S3, Azure Data Lake
Storage (ADLS), Google Cloud Storage, etc.
• Query Engines: Engines run queries against data to return
answers. Data engineers may work with engines like Spark, Flink,
and others.
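
As a rough illustration of how these tools fit together, the sketch below runs a tiny extract, transform, load pipeline in Python and then queries the result with SQL. It assumes an in-memory SQLite database as the target system; the source data, the `orders` table, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; names and values are made up.
SOURCE_CSV = """order_id,customer,amount
1,asha,120.50
2,ravi,
3,asha,80.00
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # rule: skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned

def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# Query the loaded data with SQL.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

A production pipeline would read from real source systems and write to cloud storage or a warehouse, but the three stages keep the same shape.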

Data Preparation & Learning
• Data preparation is the process of gathering, combining, cleaning, and transforming raw data so that
it can be used to make accurate predictions in machine learning projects.
• Data preparation is also known as "data pre-processing," "data wrangling," "data cleaning," and
"feature engineering."
• It is the stage of the machine learning lifecycle that comes after data collection.
• Data preparation is specific to the data, the objectives of the project, and the algorithms that will be
used for data modelling.

Prerequisites for Data Preparation

A few essential tasks come up whenever you work with data in the data preparation step. These
are as follows:

• Data Cleaning: This task involves identifying errors in the data and correcting them.
• Feature Selection: We need to identify the most important or relevant input data variables for the
model.
• Data Transforms: Data transformation involves converting raw data into a format suitable for the
model.
• Feature Engineering: Feature engineering involves deriving new variables from the available dataset.
• Dimensionality Reduction: The dimensionality reduction process involves projecting high-dimensional
data onto fewer features while preserving as much of the information as possible.
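
Four of the tasks above can be sketched on a toy dataset. The records, field names, and function names below are hypothetical, and each function shows only one simple way to carry out its task; dimensionality reduction is left out because it usually relies on a numerical library.

```python
# Hypothetical records; field names and values are made up.
records = [
    {"height_m": 1.70, "weight_kg": 65.0, "country": "IN"},
    {"height_m": 1.80, "weight_kg": None, "country": "IN"},  # missing value
    {"height_m": 1.60, "weight_kg": 50.0, "country": "IN"},
    {"height_m": 1.90, "weight_kg": 95.0, "country": "IN"},
]

def clean(rows):
    """Data cleaning: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def engineer(rows):
    """Feature engineering: derive body-mass index from height and weight."""
    return [{**r, "bmi": r["weight_kg"] / r["height_m"] ** 2} for r in rows]

def select(rows):
    """Feature selection: drop columns that hold a single constant value."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]

def min_max_scale(rows):
    """Data transform: rescale every column to the range [0, 1].

    Assumes all remaining columns are numeric and non-constant,
    which select() guarantees for this toy dataset.
    """
    scaled = [dict(r) for r in rows]
    for k in rows[0]:
        lo = min(r[k] for r in rows)
        hi = max(r[k] for r in rows)
        for r in scaled:
            r[k] = (r[k] - lo) / (hi - lo)
    return scaled

prepared = min_max_scale(select(engineer(clean(records))))
for row in prepared:
    print(row)
```

The order matters here: cleaning runs first so that later steps never see a missing value, and selection runs before scaling so that a constant column cannot cause a division by zero.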

Data Product Creation

Information Distillation
• In machine learning, information distillation (more commonly called knowledge distillation) refers to
the process of transferring knowledge from a large model to a smaller one.
• While huge models (such as very deep neural networks or ensembles of multiple models) have larger
knowledge capacity than small models, this capacity may not be utilized to its full potential.
• As illustrated in the figure below, knowledge distillation involves a small “student” model learning to
mimic a large “teacher” model and using the teacher’s knowledge to achieve similar or superior
accuracy.
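
The slides do not say how the student is trained to mimic the teacher. One common formulation, assumed here, softens both models' outputs with a temperature and penalizes the student by the KL divergence between the two distributions. The logits and the function names below are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]       # hypothetical teacher logits for three classes
good_student = [3.8, 1.1, 0.4]  # close to the teacher
poor_student = [0.5, 4.0, 1.0]  # prefers a different class

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pushes the student toward the teacher's behaviour. In practice this term is often combined with the ordinary loss on the true labels.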

Data Visualization
• Data visualization is the graphical representation of quantitative information and data using visual
elements such as graphs, charts, and maps.
• Data visualization converts large and small data sets into visuals, which are easier for humans to
understand and process.
• Data visualization tools provide accessible ways to spot outliers, patterns, and trends in the
data.
• In the world of Big Data, data visualization tools and technologies are essential for analyzing vast
amounts of information.
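
As a small illustration of turning numbers into a visual, the sketch below draws a text bar chart of value counts using only the standard library. The `scores` data and the `bar_chart` helper are hypothetical; a real project would more likely use a plotting library.

```python
from collections import Counter

def bar_chart(values, width=40, symbol="#"):
    """Return a text bar chart with one bar per distinct value."""
    counts = Counter(values)
    largest = max(counts.values())
    label_width = max(len(str(label)) for label in counts)
    lines = []
    for label in sorted(counts):
        # Scale each bar relative to the most frequent value.
        length = max(1, counts[label] * width // largest)
        lines.append(f"{str(label).ljust(label_width)} | {symbol * length} {counts[label]}")
    return "\n".join(lines)

# Hypothetical survey scores; the single 12 is an outlier.
scores = [3, 4, 4, 5, 5, 5, 5, 4, 3, 5, 4, 5, 12]
print(bar_chart(scores, width=12))
```

Even in this crude form, the most common value and the isolated outlier are easier to spot in the chart than in the raw list.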

Thank You
