IDS Midterm Project - Section (C) Fall 24-25

Introduction to Data Science

Midterm Project

Project 1:
Apply all applicable data preparation steps and calculate descriptive statistics for the given
dataset. In this project, we are going to use a modified version of the Loan Approval Classification
Dataset, which can be downloaded from Teams. The original dataset, along with its description, can be
found at the following link (you may need to log in to download the dataset):
https://www.kaggle.com/datasets/taweilo/loan-approval-classification-data
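As an illustration of the descriptive statistics part, the following is a minimal base R sketch. It runs on a small made-up data frame rather than the actual dataset, and the column names (person_age, person_income, loan_status) and the helper stat_mode are assumptions for illustration that may not match the modified dataset.

```r
# Toy data standing in for the loan dataset; names and values are assumed.
loans <- data.frame(
  person_age    = c(22, 25, 31, 25, 40),
  person_income = c(30000, 45000, 52000, 45000, 80000),
  loan_status   = c(1, 0, 0, 1, 0)
)

# Base R's mode() reports the storage type, not the statistical mode,
# so a small hypothetical helper is defined for the most frequent value.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Central tendency and spread for one numeric attribute.
age_mean   <- mean(loans$person_age)
age_median <- median(loans$person_age)
age_mode   <- stat_mode(loans$person_age)
age_sd     <- sd(loans$person_age)
age_range  <- range(loans$person_age)

# Per-column five-number summaries and the class distribution.
summary(loans)
table(loans$loan_status)
```

The same calls can be repeated for each numeric attribute; for categorical attributes, table() gives the frequency of each level.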

Project Deliverables
• Submit the implemented R program (R file or text file) and the updated text dataset in Teams.
During the VIVA session, bring the implemented program, as we may ask you to execute
it.
• Submit the report in Teams. See the Instructions section below for the report details.
Please bring a printed copy of the submitted report to the VIVA session.

Instructions
• The submission deadline for all deliverables is December 14, 2024 (you must submit the
assignment before 11:59 PM).
• At the beginning of the report, write a short note about the dataset. The dataset
details are available at the link given above.
• For each implemented code segment in the R program, provide the code and its output along
with a description in the report. In the description, write only what is needed
to understand the code and its output.
• Comments are not allowed in the R program.
• The following topics may help you think about the project. Note that the project is not
limited to these topics; they are mentioned only to give an idea of how to proceed with
the project.
o If there are any missing values in the dataset, we should apply all applicable methods
from the available options to handle them.
o We can visualize missing values on a graph.
o We can convert an imbalanced dataset into a balanced dataset.
o We can find and remove duplicate values.
o We can apply filtering methods to filter the data.
o We can convert attributes from numeric to categorical or from categorical to numeric.
o We can apply the normalization method to only one attribute.
o If any invalid data or outliers exist in the dataset, use an appropriate approach to
handle those values.
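
The topics above could be approached roughly as in the following base R sketch. It uses a small made-up data frame, so every column name and value is an assumption for illustration and may not match the modified dataset. Median imputation, the 1.5 * IQR rule, min-max scaling, and random oversampling are each just one possible choice among several. The comments are included here for explanation only; the submitted R program must not contain comments.

```r
# Toy data; column names and values are assumptions for illustration only.
loans <- data.frame(
  person_age    = c(22, 25, NA, 25, 40, 144),
  person_gender = c("male", "female", "female", "female", "male", "male"),
  person_income = c(30000, 45000, 52000, 45000, NA, 60000),
  loan_status   = c(1, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Missing values: count them per column and show the counts on a bar chart.
na_counts <- colSums(is.na(loans))
barplot(na_counts, main = "Missing values per column")

# Option A: discard every row that contains a missing value.
loans_dropped <- na.omit(loans)

# Option B: replace missing values with the column median.
loans_imputed <- loans
for (col in c("person_age", "person_income")) {
  med <- median(loans_imputed[[col]], na.rm = TRUE)
  loans_imputed[[col]][is.na(loans_imputed[[col]])] <- med
}

# Duplicates: keep only the first occurrence of each identical row.
loans_unique <- loans_imputed[!duplicated(loans_imputed), ]

# Outliers: keep ages within 1.5 * IQR of the quartiles.
q <- quantile(loans_unique$person_age, c(0.25, 0.75))
fence <- 1.5 * (q[[2]] - q[[1]])
in_range <- loans_unique$person_age >= q[[1]] - fence &
  loans_unique$person_age <= q[[2]] + fence
loans_clean <- loans_unique[in_range, ]

# Filtering: select rows that satisfy a condition.
high_income <- subset(loans_clean, person_income > 40000)

# Categorical to numeric: map each level to an integer code.
loans_clean$gender_code <- as.integer(
  factor(loans_clean$person_gender, levels = c("female", "male"))
)

# Numeric to categorical: bin ages into labelled groups.
loans_clean$age_group <- cut(
  loans_clean$person_age,
  breaks = c(0, 25, 40, Inf),
  labels = c("young", "adult", "senior")
)

# Normalization of a single attribute: min-max scaling to [0, 1].
inc <- loans_clean$person_income
loans_clean$income_norm <- (inc - min(inc)) / (max(inc) - min(inc))

# Balancing: randomly oversample the minority class until counts match.
class_counts <- table(loans_clean$loan_status)
minority <- as.numeric(names(class_counts)[which.min(class_counts)])
minority_rows <- which(loans_clean$loan_status == minority)
n_extra <- max(class_counts) - min(class_counts)
set.seed(1)
extra <- minority_rows[sample.int(length(minority_rows), n_extra, replace = TRUE)]
loans_balanced <- rbind(loans_clean, loans_clean[extra, ])
```

Options A and B are shown side by side because the handout asks for all applicable missing-value methods; which result is carried forward is a design decision to justify in the report.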

Project 2: Data Preprocessing Steps for Text Data
This project aims to develop a comprehensive and efficient data preprocessing pipeline for text data to
enhance the performance of natural language processing (NLP) models. Preprocessing text data is
a critical step in any NLP project, as it involves cleaning, transforming, and structuring raw text into
a format that models can interpret and learn from effectively. This project will cover a range of
preprocessing techniques tailored to text data, addressing challenges like noise, inconsistency, and
redundancy. You need to collect text data from any source using web scraping and perform the
following data preprocessing steps.

Key Steps in Text Data Preprocessing:

1. Text Cleaning
2. Tokenization
3. Normalization
4. Stop Word Removal
5. Stemming and Lemmatization
6. Handling Contractions
7. Handling Emojis and Emoticons
8. Spell Checking
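
Several of the steps above can be sketched in base R as follows. This is a minimal illustration on one hard-coded string, not a full pipeline. The sample text, the emoticon and contraction lookup tables, and the stop word list are all made up for this example. Web scraping, stemming, lemmatization, and spell checking are left out because they normally rely on add-on packages.

```r
# Sample input standing in for scraped text (made up for illustration).
raw_text <- "<p>I can't believe it's SO good!!! :) Visit https://example.com now</p>"

# Text cleaning: strip HTML tags and URLs.
text <- gsub("<[^>]+>", " ", raw_text)
text <- gsub("https?://[^[:space:]]+", " ", text)

# Emoticons: replace each one with a word before punctuation is removed.
# The lookup table is a tiny hypothetical example.
emoticons <- c(":)" = " happy ", ":(" = " sad ")
for (emo in names(emoticons)) {
  text <- gsub(emo, emoticons[[emo]], text, fixed = TRUE)
}

# Normalization: convert everything to lower case.
text <- tolower(text)

# Contractions: expand them using a small hypothetical lookup table.
contractions <- c("can't" = "cannot", "it's" = "it is")
for (short in names(contractions)) {
  text <- gsub(short, contractions[[short]], text, fixed = TRUE)
}

# Remaining noise: drop anything that is not a letter or a space,
# then collapse repeated spaces and trim the ends.
text <- gsub("[^a-z ]", " ", text)
text <- trimws(gsub(" +", " ", text))

# Tokenization: split on spaces.
tokens <- strsplit(text, " +")[[1]]

# Stop word removal with a short hand-written list.
stop_words <- c("i", "it", "is", "so", "now")
clean_tokens <- tokens[!tokens %in% stop_words]

print(clean_tokens)
```

The order matters here: emoticons and contractions are handled before punctuation is stripped, since removing punctuation first would destroy ":)" and the apostrophes the lookups depend on.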
