CS202 Assignment - 4 - GIKI

The assignment for Ghulam Ishaq Khan Institute (GIKI) focuses on dataset preprocessing and visualization using R, with a submission deadline of December 21, 2024. Students are required to select a dataset, perform data cleaning and exploratory data analysis, and create visualizations to derive insights. Deliverables include R code, relevant visualizations, and a brief report summarizing the dataset and findings, with grading criteria based on code quality, correctness, analysis, and creativity.

Uploaded by

hassandevolper123


Ghulam Ishaq Khan Institute (GIKI)
Assignment # 4

Subject: ICT
Course Code: CS202
Class: BS cys-3rd
Submission Deadline: 21/Dec/2024
Instructor: M Talha Ashfaq
Total Marks: 100

Note (read the notes and instructions first):

CHEATING/COPY CASES and LATE SUBMISSIONS, even 1 minute late, will be graded as STRAIGHT ZERO MARKS.
So be on time and make no excuses.

(Read Carefully)

Dataset Preprocessing and Visualization in R

Objective:

This assignment aims to help students understand the fundamental steps in dataset preprocessing and data
visualization. Students will select a dataset, clean and preprocess the data, explore it using visualizations, and
provide insights. This will help develop skills in data manipulation, cleaning, and visualization, which are
essential tasks in data analysis and machine learning.

Dataset Selection:

Choose any dataset that has both numerical and categorical variables (e.g., from Kaggle, UCI, or any public
dataset repository).

You may consider datasets such as:

● Iris dataset (classification of iris species based on flower features).
● Titanic dataset (predict survival based on passenger features).
● Wine dataset (classification of wine quality).
● Breast Cancer dataset (diagnosis of cancerous or non-cancerous samples).

Note: Ensure the dataset has at least 3 features and a target variable for classification or regression tasks.

Steps to Follow:

1. Data Importing and Preprocessing

Load the Dataset: Import the dataset into R using read.csv() or readr::read_csv().
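As a minimal sketch of the loading step: the example below first writes R's built-in iris data to a temporary CSV so that it runs on its own. In your own script, csv_path (a name chosen here for illustration) would instead point to the file you downloaded.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# In the assignment, csv_path would point to your downloaded dataset instead.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE converts text columns to factors on import.
df <- read.csv(csv_path, stringsAsFactors = TRUE)

# Inspect the result: dimensions, column types, and the first few rows.
dim(df)
str(df)
head(df)
```

readr::read_csv() is called the same way with a file path, but it returns a tibble and leaves text columns as character vectors, so factor conversion would then be a separate step.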

Data Cleaning:

● Missing Values: Identify and handle missing data by either removing rows/columns or imputing missing
values (e.g., using the mean or median for numerical data).
● Duplicate Rows: Check for and remove any duplicate rows using duplicated().
● Categorical Encoding: Convert categorical variables into factors, if necessary.
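The three cleaning steps above can be sketched as follows. The data frame is a made-up toy example (its column names and values are assumptions for illustration only), and median imputation is just one of the options the assignment allows.

```r
# Hypothetical toy data with one missing value and one duplicated row.
df <- data.frame(
  age   = c(22, NA, 35, 35, 41),
  fare  = c(7.25, 71.28, 8.05, 8.05, 13.00),
  class = c("third", "first", "third", "third", "second")
)

# Missing values: count them per column.
colSums(is.na(df))

# Impute the missing numeric value with the column median.
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)

# Duplicate rows: duplicated() flags repeats, so keep only the first occurrence.
df <- df[!duplicated(df), ]

# Categorical encoding: store the text column as a factor.
df$class <- factor(df$class)

str(df)
```

If removing incomplete rows is preferred over imputation, na.omit(df) drops every row that contains at least one NA.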

2. Exploratory Data Analysis (EDA)

Summary Statistics: Display summary statistics of the numerical features (e.g., mean, median, and standard
deviation).
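One possible way to produce these statistics in base R, shown here on the built-in iris data as a stand-in for your own dataset:

```r
# Select only the numeric columns of the data frame.
num_cols <- sapply(iris, is.numeric)

# summary() reports min, quartiles, median, mean, and max per column.
summary(iris[, num_cols])

# summary() omits the standard deviation, so compute it alongside mean and median.
stats <- sapply(iris[, num_cols], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})
round(stats, 2)
```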
Target Variable Analysis: Examine the distribution of the target variable.

● If the target variable is categorical (e.g., yes/no, class1/class2), visualize the count distribution.
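For a categorical target, a count table plus a bar plot is usually enough. The sketch below assumes the target is the Species column of the built-in iris data.

```r
# Count how many rows fall into each class of the target variable.
counts <- table(iris$Species)
print(counts)

# Plot the class counts as bars.
barplot(counts,
        main = "Target class distribution",
        xlab = "Species",
        ylab = "Count")
```

Strongly unequal bars would indicate class imbalance, which is worth mentioning in the report.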

Feature Distribution:

● For numerical features, plot histograms or box plots to understand their distribution.
● For categorical features, use bar plots to show the frequency of each category.
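A short sketch of both cases using base R graphics. It uses the built-in mtcars data, treating mpg as a numerical feature and cyl as a categorical one; that choice is an assumption made only for illustration.

```r
df <- mtcars
df$cyl <- factor(df$cyl)  # treat the cylinder count as a category

# Numerical feature: histogram and box plot of the same column.
h <- hist(df$mpg,
          main = "Distribution of mpg",
          xlab = "Miles per gallon")
boxplot(df$mpg,
        main = "Box plot of mpg",
        ylab = "Miles per gallon")

# Categorical feature: bar plot of category frequencies.
cyl_counts <- table(df$cyl)
barplot(cyl_counts,
        main = "Cars per cylinder count",
        xlab = "Cylinders",
        ylab = "Count")
```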

3. Data Visualization

Correlation Matrix: For numerical features, plot a correlation matrix to understand relationships between
variables.
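A minimal base R sketch of a correlation matrix and a simple heat map of it, again on the numeric columns of iris. Add-on packages such as corrplot are a common alternative for nicer output, but they are not required.

```r
# Pearson correlations between all pairs of numeric features.
num_data <- iris[, sapply(iris, is.numeric)]
corr_mat <- cor(num_data)
round(corr_mat, 2)

# Draw the matrix as a heat map; Rowv/Colv = NA disables clustering and
# dendrograms, and scale = "none" keeps the raw correlation values.
heatmap(corr_mat,
        Rowv = NA, Colv = NA,
        symm = TRUE, scale = "none",
        margins = c(8, 8),
        main = "Correlation matrix")
```

If the data still contains missing values, cor(num_data, use = "complete.obs") restricts the calculation to complete rows.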

Scatter Plots: Visualize relationships between pairs of numerical features.
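As an illustration, the sketch below draws one scatter plot for a single pair of features and a scatter plot matrix for all pairs, colouring points by class. The iris data and the chosen feature pair are assumptions for the example.

```r
# Map each class to an integer so it can be used as a plotting colour.
species_cols <- as.integer(iris$Species)

# Single pair of numerical features.
plot(iris$Petal.Length, iris$Petal.Width,
     col = species_cols, pch = 19,
     xlab = "Petal length", ylab = "Petal width",
     main = "Petal length vs. petal width")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)

# All pairs of numerical features at once.
pairs(iris[, 1:4], col = species_cols, pch = 19,
      main = "Scatter plot matrix")
```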

Boxplots for Target vs. Feature: For numerical features, create boxplots to compare the distribution of the
feature across different classes of the target variable.
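In base R this comparison can be written with the formula interface of boxplot(), as sketched here for one feature of the iris data:

```r
# feature ~ target draws one box per class of the target variable.
bp <- boxplot(Petal.Length ~ Species, data = iris,
              main = "Petal length by species",
              xlab = "Species", ylab = "Petal length")

# The same comparison in numbers: median of the feature within each class.
class_medians <- tapply(iris$Petal.Length, iris$Species, median)
class_medians
```

Boxes that barely overlap between classes suggest the feature separates the classes well.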

4. Feature Engineering

Feature Scaling: If your data contains features with very different scales, consider normalizing or standardizing
the features (especially important for models like k-NN).
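Both options can be sketched in a few lines of base R. The helper min_max() is a name introduced here for illustration, not a built-in function.

```r
X <- iris[, 1:4]  # numeric features only

# Standardization: subtract the mean and divide by the standard deviation.
X_std <- as.data.frame(scale(X))

# Normalization (min-max): rescale each column to the range [0, 1].
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
X_norm <- as.data.frame(lapply(X, min_max))

summary(X_std)
summary(X_norm)
```

When a model is trained later, the scaling parameters are normally computed on the training set only and then reused for the test set, so that no information leaks from the test data.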

Create New Features: Based on your understanding of the dataset, you may try creating new features that could
help improve model performance (e.g., ratios or combinations of existing features).
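For example, a ratio and a product of two existing columns could be added as shown below. The new column names Petal.Ratio and Petal.Area are made up for this sketch, and whether such features actually help depends on the dataset.

```r
df <- iris

# Ratio feature: petal length relative to petal width.
df$Petal.Ratio <- df$Petal.Length / df$Petal.Width

# Combination feature: a rough petal "area" as length times width.
df$Petal.Area <- df$Petal.Length * df$Petal.Width

head(df[, c("Petal.Length", "Petal.Width", "Petal.Ratio", "Petal.Area")])
```

Ratio features need a check for zero denominators; iris has none in Petal.Width, but other datasets may.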

5. Model Preparation

Split Data into Train and Test Sets: Split the data into training and testing sets (usually an 80/20 or 70/30 split).
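A simple random 80/20 split can be sketched in base R as follows. The seed value is arbitrary and only makes the split reproducible.

```r
set.seed(42)  # arbitrary seed so the same split is produced on every run

n <- nrow(iris)

# Randomly pick 80% of the row indices for training.
train_idx <- sample(n, size = round(0.8 * n))

train <- iris[train_idx, ]
test  <- iris[-train_idx, ]  # the remaining 20%

nrow(train)
nrow(test)
```

Changing 0.8 to 0.7 gives the 70/30 variant. A purely random split does not guarantee equal class proportions in both sets, so it is worth checking table(train$Species) afterwards.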

6. Documentation and Reporting

Write a brief report that explains your findings and insights from the EDA.

In your report, include the following:

● Dataset Description: Provide the name of the dataset, source, and a brief overview of the features and
target variable.
● Key Findings from EDA:
○ Summary statistics of the numerical features.
○ Distributions of categorical and numerical features.
○ Insights drawn from the correlation matrix or visualizations.
● Data Cleaning: Describe how you handled missing values and duplicates, if any.
● Visualizations: Present the most important visualizations that highlight the key patterns in the data.
● Next Steps: Suggest possible further analyses, improvements, or models that could be applied.

Deliverables:

Code: Submit your R script or R Markdown file that includes:

● Data import and preprocessing steps.
● Visualizations (e.g., histograms, box plots, scatter plots).
● Feature engineering (if done).
● Any necessary comments to explain your code.

Visualizations: Include relevant visualizations that help explain the data (e.g., plots showing distributions or
relationships between features).

Report: Submit a brief report (max 2 pages) with:

● Dataset description.
● Key findings from the EDA.
● Insights from the visualizations.
● Any additional steps for further analysis.

Grading Criteria:

● Code Quality (30%): The code should be clear, efficient, and well-commented.
● Correctness (40%): Proper handling of missing values, duplicates, and correct usage of visualization
techniques.
● Analysis (20%): A thorough and insightful EDA that includes meaningful visualizations and
interpretations.
● Creativity (10%): Innovative approaches to data exploration, feature creation, or visualization.
