3. Data Preprocessing
By Aqsa Afzal
Data Pre-Processing
• Data preprocessing involves cleaning and transforming raw data to
prepare it for machine learning algorithms.
• It is essential for ensuring data quality and improving model performance.
Steps of Data Preprocessing
Data Collection
• Gathering data from various sources such as databases, files, APIs, or web
scraping.
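As a minimal sketch of collecting data from two of the source types listed above (a file and a database), the example below uses only the Python standard library. The CSV text, the `people` table, and all values are made up for illustration; a real project would read an actual file and connect to an actual database.

```python
import csv
import io
import sqlite3

# Source 1: a CSV file (an in-memory string stands in for a real file here).
csv_text = "id,height_cm\n1,170\n2,165\n"
file_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Source 2: a database (an in-memory SQLite table with made-up rows).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute("CREATE TABLE people (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Lahore"), (2, "Karachi")],
)
db_rows = [dict(r) for r in conn.execute("SELECT id, city FROM people ORDER BY id")]
conn.close()

print(file_rows)  # CSV values arrive as strings and still need type conversion
print(db_rows)
```

Note that the CSV reader returns every value as a string, which is one reason a wrangling step usually follows collection.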
Data Wrangling
• Cleaning: Handling missing values, removing duplicates, correcting
inconsistencies.
• Transformation: Restructuring, converting data types, standardizing units.
• Enrichment: Adding derived features, merging datasets.
• Validation: Verifying data quality, checking for outliers.
• Aggregation: Combining datasets, summarizing data.
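The five wrangling steps above can be sketched on a tiny, made-up set of records. This example uses plain Python so it runs without extra packages; in practice a library such as pandas is commonly used for the same steps. The field names, the median fill strategy, and the 50 to 250 cm plausibility range are assumptions chosen for illustration.

```python
from statistics import mean, median

# Hypothetical raw records: one duplicate, one missing value, mixed units.
raw = [
    {"id": 1, "city": "Lahore", "height": "170", "unit": "cm"},
    {"id": 2, "city": "lahore ", "height": "1.75", "unit": "m"},
    {"id": 2, "city": "lahore ", "height": "1.75", "unit": "m"},  # duplicate
    {"id": 3, "city": "Karachi", "height": None, "unit": "cm"},   # missing
    {"id": 4, "city": "Karachi", "height": "180", "unit": "cm"},
]

# Cleaning: drop duplicate ids and correct inconsistent city spelling.
seen, rows = set(), []
for r in raw:
    if r["id"] not in seen:
        seen.add(r["id"])
        rows.append({**r, "city": r["city"].strip().title()})

# Transformation: convert strings to floats and standardize units to cm.
for r in rows:
    if r["height"] is not None:
        value = float(r["height"])
        r["height"] = value * 100 if r["unit"] == "m" else value
    r["unit"] = "cm"

# Cleaning (continued): fill the missing height with the median of known ones.
known = [r["height"] for r in rows if r["height"] is not None]
fill = median(known)
for r in rows:
    if r["height"] is None:
        r["height"] = fill

# Enrichment: add a derived feature.
for r in rows:
    r["is_tall"] = r["height"] >= 175

# Validation: every height must fall in a plausible range (no wild outliers).
assert all(50 <= r["height"] <= 250 for r in rows)

# Aggregation: mean height per city.
by_city = {}
for r in rows:
    by_city.setdefault(r["city"], []).append(r["height"])
summary = {city: mean(heights) for city, heights in by_city.items()}
print(summary)  # {'Lahore': 172.5, 'Karachi': 177.5}
```

The order matters here: units are standardized before the median is computed, otherwise the fill value would mix metres and centimetres.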
Data Visualization
• Histograms: Distribution of numerical data.
• Scatter plots: Relationship between two numerical variables.
• Box plots: Distribution and variability of data.
• Bar charts: Comparison of categorical variables.
• Heatmaps: Correlation matrix visualization.
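Each chart type above draws a specific summary of the data. The sketch below computes those summaries directly, without plotting, to show what each chart represents. A plotting library would normally compute and draw these in one call. The sample values, the 10 cm bin width, and the `pearson` helper are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, quantiles

# Made-up sample data for illustration.
heights = [150, 155, 160, 160, 165, 170, 170, 175, 180, 190]
weights = [50, 54, 58, 60, 63, 68, 70, 74, 80, 90]
cities = ["Lahore", "Karachi", "Lahore", "Lahore", "Karachi"]

# Histogram: count of values falling in each 10 cm bin.
hist = Counter((h // 10) * 10 for h in heights)

# Box plot: the quartiles that define the box and its median line.
q1, q2, q3 = quantiles(heights, n=4)

# Bar chart: one bar height (count) per category.
bars = Counter(cities)


def pearson(x, y):
    """Pearson correlation, the value shown in one cell of a correlation heatmap."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Scatter plot / heatmap: strength of the relationship between two variables.
r = pearson(heights, weights)

print(sorted(hist.items()))
print(q1, q2, q3)
print(dict(bars))
print(round(r, 3))
```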
Data Reduction
DATA PRE-PROCESSING
Due by next