Fall 2024 - Project - CEP

Data Pre-processing with Python

Instructor: Dr. Aamir Arsalan
Course Code: BSE-636
Semester: V
Credit Hours: 3

The course project for Data Pre-processing with Python is designed as a Complex Engineering Problem (CEP).
Project Title:
Design and Optimization of a Data Preprocessing Pipeline for Machine Learning Applications

Project Statement:
In this project, students will design, implement, and evaluate a comprehensive data preprocessing pipeline to prepare
a dataset for machine learning applications. This project addresses the challenges of handling missing values,
removing outliers, and optimizing data transformation techniques to ensure robust model performance. Students will
work with a real-world dataset, applying theoretical knowledge and practical skills to design innovative
preprocessing solutions that balance conflicting requirements such as computational efficiency, data integrity, and
model accuracy.

Objectives:
1. Develop a deep understanding of advanced data preprocessing techniques and their role in machine learning.
2. Equip students with hands-on experience in handling real-world dataset challenges.
3. Foster innovative thinking to balance trade-offs in preprocessing strategies.
4. Enhance problem-solving skills through iterative design, implementation, and evaluation.

Project Phases:
Phase 1: Data Exploration and Problem Framing (Relevant WP: WP2 - Range of Conflicting Requirements)
• Select a real-world dataset from platforms like Kaggle or UCI Machine Learning Repository.
• Identify and document challenges related to missing values, outliers, and feature representation.
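
A first exploration pass for Phase 1 might look like the sketch below. The toy DataFrame and its column names are hypothetical stand-ins for whichever Kaggle or UCI dataset is selected; the point is only to show one way of quantifying missing values, candidate outliers, and categorical cardinality before framing the problem.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset from Kaggle or UCI.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 120],  # 120 looks suspicious
    "income": [46_000, 52_000, 55_000, np.nan, 48_000, 50_000],
    "city": ["Lahore", "Karachi", None, "Lahore", "Islamabad", "Karachi"],
})

# Share of missing values per column.
missing_ratio = df.isna().mean()


def iqr_outlier_count(series: pd.Series) -> int:
    """Count values outside the 1.5 * IQR fences (NaN values are ignored)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    return int(mask.sum())


# Candidate outliers per numeric column.
outlier_counts = df.select_dtypes(include="number").apply(iqr_outlier_count)

# Number of distinct categories per non-numeric column; high cardinality
# makes one-hot encoding more expensive later on.
cardinality = df.select_dtypes(exclude="number").nunique()

print(missing_ratio, outlier_counts, cardinality, sep="\n\n")
```

These summaries can go straight into the Phase 1 documentation as evidence for each challenge identified.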

Phase 2: Feature Engineering and Data Transformation (Relevant WP: WP1 - Depth of Knowledge Required,
WP3 - Depth of Analysis Required)

• Engineer features using techniques like one-hot encoding, label encoding, and feature scaling.
• Implement dimensionality reduction techniques such as PCA to address the curse of dimensionality.
• Justify the selection of transformation methods for the dataset.
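
The Phase 2 techniques can be sketched with Scikit-learn as follows. The data, column names, and the choice of two principal components are illustrative assumptions, not requirements of the project; a real dataset will need its own justification for each choice.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical toy data: three numeric features, one categorical, one target.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0, 175.0],
    "weight_kg": [50.0, 58.0, 69.0, 80.0, 92.0, 74.0],
    "shoe_size": [36.0, 38.0, 41.0, 43.0, 46.0, 42.0],
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "label": ["no", "no", "yes", "yes", "yes", "no"],
})

# One-hot encoding: one binary column per category of a nominal feature.
encoder = OneHotEncoder(handle_unknown="ignore")
color_onehot = encoder.fit_transform(df[["color"]]).toarray()

# Label encoding: map target classes to integers (classes are sorted, so
# "no" becomes 0 and "yes" becomes 1).
y = LabelEncoder().fit_transform(df["label"])

# Feature scaling: zero mean and unit variance per numeric column.
numeric_cols = ["height_cm", "weight_kg", "shoe_size"]
scaled = StandardScaler().fit_transform(df[numeric_cols])

# PCA on the scaled features; keeping 2 components is an arbitrary choice here.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

# Final feature matrix: principal components plus one-hot columns.
X = np.hstack([components, color_onehot])
print(X.shape, pca.explained_variance_ratio_)
```

Scaling before PCA matters because PCA is variance-based, so an unscaled feature with large units would dominate the components.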

Phase 3: Handling Missing and Noisy Data (Relevant WP: WP2 - Range of Conflicting Requirements)
• Apply multiple imputation techniques (e.g., KNN, iterative imputer) for missing data.
• Identify and remove outliers using Z-score and IQR methods.
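
A minimal sketch of the Phase 3 steps, assuming a synthetic two-column dataset with one injected outlier and five injected missing values. The thresholds (|z| > 3 and 1.5 × IQR) are common defaults rather than values prescribed by the project, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer

# Hypothetical synthetic data: y is roughly 2 * x, with injected problems.
x = np.linspace(40.0, 60.0, 200)
data = pd.DataFrame({"x": x, "y": 2.0 * x + np.sin(np.arange(200))})
data.loc[0, "y"] = 500.0         # one injected outlier
data.loc[100:104, "y"] = np.nan  # five injected missing values

# Two imputation strategies applied to the same data.
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=5).fit_transform(data), columns=data.columns
)
iter_filled = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(data), columns=data.columns
)


def zscore_mask(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where the absolute z-score exceeds the threshold."""
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold


def iqr_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where the value lies outside the k * IQR fences."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


z_flagged = zscore_mask(knn_filled["y"])
iqr_flagged = iqr_mask(knn_filled["y"])

# Drop rows flagged by either method.
cleaned = knn_filled[~(z_flagged | iqr_flagged)]
print(len(knn_filled), "->", len(cleaned))
```

Here imputation runs before outlier removal to follow the order of the bullets above. Reversing the order is an equally valid design choice, since extreme values can distort the imputed estimates, and the trade-off is worth discussing in the report.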

Phase 4: Preprocessing Pipeline Design (Relevant WPs: WP1, WP3)


• Design a pipeline using Python libraries (e.g., Pandas, Scikit-learn).
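
One possible shape for such a pipeline is sketched below using Scikit-learn's Pipeline and ColumnTransformer. The column names, the median and most-frequent imputation strategies, and the toy data are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data with missing values in every column.
raw = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0, 37.0],
    "income": [46_000.0, 52_000.0, 55_000.0, np.nan, 48_000.0, 50_000.0],
    "city": ["Lahore", "Karachi", np.nan, "Lahore", "Islamabad", "Karachi"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric branch: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical branch: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Route each column group through its own branch; sparse_threshold=0.0
# forces a dense NumPy array as output.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0.0,
)

X_ready = preprocessor.fit_transform(raw)
print(X_ready.shape)
```

Wrapping the steps in a single object means the same fitted transformations can be applied to unseen data with `preprocessor.transform(...)`, which helps avoid leaking test-set statistics into training.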

Phase 5: Model Performance and Preprocessing Impact Analysis (Relevant WP: WP3 - Depth of Analysis
Required)
• Evaluate the performance of machine learning models trained on preprocessed data.
• Compare results across multiple preprocessing strategies.
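
The comparison in Phase 5 could be organized as in the sketch below, which cross-validates one model under three preprocessing strategies. The synthetic dataset, the 10% missing rate, the logistic regression model, and the strategy names are all assumptions chosen to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic classification data with about 10% of values removed.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan

# Each strategy is a full pipeline, so preprocessing is refit inside every
# cross-validation fold and never sees the held-out data.
strategies = {
    "mean_impute": make_pipeline(
        SimpleImputer(strategy="mean"),
        LogisticRegression(max_iter=1000),
    ),
    "median_impute_scaled": make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    ),
    "knn_impute_scaled": make_pipeline(
        KNNImputer(n_neighbors=5),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    ),
}

# Mean 5-fold accuracy per strategy.
results = {
    name: cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    for name, pipe in strategies.items()
}

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Accuracy is only one possible metric; fit time per strategy could be recorded alongside it to cover the computational-efficiency side of the trade-off.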

Mapping to Complex Engineering Problems (CEPs)


1. WP1: Depth of Knowledge Required
• Requires knowledge of data transformation, feature engineering, and advanced imputation
techniques.
2. WP2: Range of Conflicting Requirements
• Balances trade-offs between preprocessing complexity, computational efficiency, and dataset-specific requirements.
3. WP3: Depth of Analysis Required
• Involves analyzing the impact of preprocessing choices on model performance and dataset integrity.

Evaluation Criteria:
Category                               Weightage (%)   Mapped WPs
Dataset Selection & Problem Framing    20%             WP2
Feature Engineering                    15%             WP1, WP3
Handling Missing and Noisy Data        20%             WP2
Pipeline Design                        20%             WP1, WP3
Final Report & Presentation            25%

=====================================Ended=======================================

For any query about the project, contact at [email protected]

GOOD LUCK
