Data Preprocessing

By Aqsa Afzal
Data Pre-Processing
• Data preprocessing involves cleaning and transforming raw data to
prepare it for machine learning algorithms.
• Essential for ensuring data quality and improving model performance.
Steps of Data Preprocessing
Data Collection
• Gathering data from various sources such as databases, files, APIs, or web
scraping.
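As a minimal sketch of the collection step, the snippet below loads two small sources with Pandas and joins them. The CSV text, the JSON payload, and the column names (id, age, city, income) are hypothetical stand-ins for a real file and a real API response.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "id,age,city\n1,34,Lahore\n2,29,Karachi\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON payload standing in for an API response body.
api_payload = '[{"id": 1, "income": 52000}, {"id": 2, "income": 61000}]'
df_api = pd.DataFrame(json.loads(api_payload))

# Combine the two sources on their shared key column.
collected = df_csv.merge(df_api, on="id", how="left")
print(collected)
```

With a real file, pd.read_csv accepts a path in place of the in-memory buffer used here.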
Data Wrangling
• Cleaning: Handling missing values, removing duplicates, correcting
inconsistencies.
• Transformation: Restructuring, converting data types, standardizing units.
• Enrichment: Adding derived features, merging datasets.
• Validation: Verifying data quality, checking for outliers.
• Aggregation: Combining datasets, summarizing data.
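The wrangling steps above can be sketched with Pandas as follows. The table, its column names, the median fill strategy, and the plausible-range check are illustrative assumptions, not part of the original slides.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one duplicate row, string-typed numbers, missing values.
raw = pd.DataFrame({
    "customer": ["a", "b", "b", "c", "d"],
    "height_cm": ["170", "165", "165", None, "180"],
    "weight_kg": [70.0, 60.0, 60.0, 82.0, np.nan],
})

# Cleaning: remove exact duplicate rows.
clean = raw.drop_duplicates().copy()

# Transformation: convert the string column to a numeric type.
clean["height_cm"] = pd.to_numeric(clean["height_cm"])

# Cleaning: fill missing values with the column median (one of several options).
for col in ["height_cm", "weight_kg"]:
    clean[col] = clean[col].fillna(clean[col].median())

# Enrichment: add a derived feature (body mass index).
clean["bmi"] = clean["weight_kg"] / (clean["height_cm"] / 100) ** 2

# Validation: no missing values remain and heights fall in an assumed plausible range.
assert clean.isna().sum().sum() == 0
assert clean["height_cm"].between(100, 250).all()

# Aggregation: summarize the numeric columns.
summary = clean[["height_cm", "weight_kg", "bmi"]].agg(["mean", "min", "max"])
print(summary)
```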
Data Visualization
• Histograms: Distribution of numerical data.
• Scatter plots: Relationship between two numerical variables.
• Box plots: Distribution and variability of data.
• Bar charts: Comparison of categorical variables.
• Heatmaps: Correlation matrix visualization.
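The five chart types listed above could be drawn with Matplotlib roughly as shown here. The data is synthetic and the column names (x, y, group) are assumptions made only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: two correlated numeric columns and one categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=200),
    "group": rng.choice(["A", "B", "C"], size=200),
})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=200)
groups = ["A", "B", "C"]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
axes = axes.ravel()

# Histogram: distribution of one numerical column.
axes[0].hist(df["x"], bins=20)
axes[0].set_title("Histogram of x")

# Scatter plot: relationship between two numerical columns.
axes[1].scatter(df["x"], df["y"], s=8)
axes[1].set_title("x vs. y")

# Box plot: distribution and variability of y within each group.
axes[2].boxplot([df.loc[df["group"] == g, "y"].to_numpy() for g in groups])
axes[2].set_xticks([1, 2, 3])
axes[2].set_xticklabels(groups)
axes[2].set_title("y by group")

# Bar chart: number of rows per category.
counts = df["group"].value_counts().sort_index()
axes[3].bar(list(counts.index), counts.to_numpy())
axes[3].set_title("Rows per group")

# Heatmap: correlation matrix of the numeric columns.
corr = df[["x", "y"]].corr()
im = axes[4].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[4].set_xticks(range(len(corr)))
axes[4].set_xticklabels(corr.columns)
axes[4].set_yticks(range(len(corr)))
axes[4].set_yticklabels(corr.index)
axes[4].set_title("Correlation matrix")
fig.colorbar(im, ax=axes[4])

axes[5].axis("off")  # unused panel
fig.tight_layout()
# Call fig.savefig(<path>) or, in an interactive session, plt.show() to view the result.
```

Seaborn offers higher-level equivalents of most of these plots; plain Matplotlib is used here only to keep the sketch to one plotting dependency.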
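Data Reduction
• Dimensionality reduction: projecting the data onto fewer dimensions, for example with principal component analysis (PCA) in Scikit-learn.
• Feature selection: keeping only the most informative features, using Scikit-learn or other relevant libraries.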
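A short sketch of both reduction approaches with Scikit-learn follows. The choice of the bundled Iris dataset, PCA, SelectKBest with the ANOVA F-test, and two output features are all illustrative assumptions; the slides do not name a specific technique.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Dimensionality reduction: standardize, then project onto 2 principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("PCA output shape:", X_pca.shape)
print("Variance explained:", pca.explained_variance_ratio_.sum())

# Feature selection: keep the 2 original features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print("Selected feature mask:", selector.get_support())
```

PCA builds new features as combinations of the originals, while feature selection keeps a subset of the original columns unchanged, which is often easier to interpret.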
Data Augmentation
• Artificially increasing the size and diversity of the training data by applying
transformations such as rotation, translation, and flipping.
• Improves model generalization and robustness, especially in
computer vision tasks.
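To make the transformations concrete, here is a minimal sketch that uses plain NumPy on a tiny 4x4 array standing in for an image. It is a swapped-in illustration rather than the TensorFlow or Keras tooling, and the helper names translate_right and augment are hypothetical.

```python
import numpy as np

# A tiny 4x4 "image" with distinct pixel values so each transform is easy to see.
image = np.arange(16, dtype=np.uint8).reshape(4, 4)


def translate_right(img, shift):
    """Shift the image right by `shift` pixels, padding the left edge with zeros."""
    out = np.zeros_like(img)
    out[:, shift:] = img[:, : img.shape[1] - shift]
    return out


def augment(img):
    """Return several transformed copies of one image (hypothetical helper)."""
    return {
        "flip_horizontal": np.fliplr(img),   # mirror left to right
        "flip_vertical": np.flipud(img),     # mirror top to bottom
        "rotate_90": np.rot90(img),          # rotate 90 degrees counterclockwise
        "translate_right_1": translate_right(img, 1),
    }


augmented = augment(image)
for name, result in augmented.items():
    print(name)
    print(result)
```

In practice each transformed copy keeps the label of the original image, so one labeled example yields several training examples.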
Libraries for Data Preprocessing, Visualization, and Augmentation
• Pandas: Data cleaning, manipulation, and basic feature engineering.
• Matplotlib and Seaborn: Python libraries for data visualization.
• Scikit-learn: Provides tools for data preprocessing including feature
scaling, encoding, and dimensionality reduction.
• TensorFlow or Keras: Deep learning libraries that include utilities for
data augmentation, especially for image data.
• NumPy: Used for numerical operations and handling missing values.
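As one way these libraries fit together, the sketch below marks a missing value with NumPy, holds the table in Pandas, and uses Scikit-learn for imputation, feature scaling, and encoding. The columns (age, city) and the mean imputation strategy are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical table: np.nan marks the missing numeric value.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "city": ["Lahore", "Karachi", "Lahore", "Multan"],
})

# Numeric column: fill missing values with the mean, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Categorical column: one-hot encode. sparse_threshold=0 forces a dense array.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,
)

features = preprocess.fit_transform(df)
print(features)
```

The result has one scaled numeric column followed by one indicator column per city.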
Task 2

DATA PRE-PROCESSING
Due by next
