
Home Assignment - 03

Project Report Submitted in Partial Fulfilment of the Requirements for the Degree of

Bachelor of Technology (Hons.)


in
Computer Science and Engineering

Submitted by
AKASH KUMAR (Roll No. 2021UGCS040)

Under the Supervision of


Dr. Dilip Kumar

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

National Institute of Technology Jamshedpur

SUMMARY

Part 1: Data Exploration and Visualization


Approach:

• I began by loading the dataset into a pandas DataFrame and exploring its structure. I then calculated
basic descriptive statistics to understand the distributions and central tendencies of the numerical
columns, and used visualizations such as histograms and box plots to examine the data and flag
potential outliers; a sketch of this step follows below.
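
A minimal sketch of this exploration step in Python, assuming the data lives in a CSV file (the file
name data.csv is a placeholder):

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset and inspect its structure
df = pd.read_csv("data.csv")        # placeholder file name
print(df.shape)
df.info()

# Descriptive statistics for the numerical columns
print(df.describe())

# Histograms of every numeric column to examine distributions
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()

# Box plots to highlight potential outliers
df.boxplot(figsize=(10, 6))
plt.show()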

Challenges:

• Identifying and dealing with outliers was a significant challenge, as it required a careful balance
between removing anomalous values and preserving valuable data points.
• Another challenge was ensuring the visualizations provided clear insights, which required choosing
an appropriate plot type for each variable.

Key Learnings:

• Visualization is crucial for uncovering hidden patterns and outliers in the data.
• Descriptive statistics provide a quick summary of data, helping to identify areas that need further
cleaning or transformation.

Part 2: Data Cleaning


Approach:

• Missing values were identified and handled with methods appropriate to each column type, such as
imputation for continuous variables and mode substitution for categorical variables. Duplicate rows
were removed to ensure data integrity, and the outliers identified earlier were either removed or
transformed depending on their impact on the analysis (see the sketch below).
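
A minimal sketch of the cleaning step under the same assumptions; the median/mode imputation and
the percentile clipping shown here are illustrative choices, not the only reasonable ones:

import pandas as pd

df = pd.read_csv("data.csv")        # placeholder file name

# Impute missing values: median for numeric columns, mode for categorical ones
for col in df.columns[df.isna().any()]:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])

# Drop exact duplicate rows to preserve data integrity
df = df.drop_duplicates()

# One way to transform outliers: clip numeric columns to the 1st-99th percentiles
for col in df.select_dtypes("number").columns:
    low, high = df[col].quantile([0.01, 0.99])
    df[col] = df[col].clip(low, high)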

Challenges:

• Handling missing data effectively without introducing bias or losing important information was tricky.
• Dealing with outliers required careful consideration of the impact on the overall dataset.

Key Learnings:

• Data cleaning is a critical step in the data science process, as it directly impacts the quality and
accuracy of the analysis.
• Proper handling of missing values and outliers ensures that the dataset is reliable for further analysis.

Part 3: Data Integration
Approach:

• A secondary dataset was obtained and merged with the primary dataset on a common identifier. The
merged dataset was then checked for consistency, and any duplicates or discrepancies were resolved
to produce a unified, accurate dataset (see the sketch below).
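
A minimal sketch of the merge and consistency checks; the file names and the join key id are
placeholders:

import pandas as pd

primary = pd.read_csv("primary.csv")      # placeholder file names
secondary = pd.read_csv("secondary.csv")

# Merge on a common identifier; indicator=True records where each row came from
merged = primary.merge(secondary, on="id", how="left", indicator=True)

# Flag discrepancies: rows with no match in the secondary dataset
unmatched = (merged["_merge"] == "left_only").sum()
print(f"{unmatched} rows had no match in the secondary dataset")

# Drop the indicator column and any duplicates introduced by the merge
merged = merged.drop(columns="_merge").drop_duplicates()
print(f"primary={len(primary)}, secondary={len(secondary)}, merged={len(merged)}")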

Challenges:

• Finding a suitable secondary dataset that could be merged with the primary one was time-consuming.
• Ensuring consistency between datasets, especially when they originated from different sources,
required thorough checking and validation.

Key Learnings:

• Data integration is essential for creating comprehensive datasets that can provide deeper insights.
• Ensuring consistency across integrated datasets is crucial for maintaining the integrity of the data and
the reliability of the analysis.

Part 4: Data Storage and Retrieval


Approach:

• The cleaned dataset was saved to Google Drive for cloud storage. It was then retrieved and loaded
back into a DataFrame, confirming that the data could be stored and accessed easily in a distributed
environment (see the sketch below).
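
A minimal sketch of the storage and retrieval step, assuming the work was done in Google Colab
(where Drive can be mounted into the file system); the Drive path is a placeholder:

from google.colab import drive
import pandas as pd

drive.mount("/content/drive")

# df stands in for the cleaned dataset produced in Part 2
df = pd.read_csv("data.csv")        # placeholder file name

# Save the cleaned data to Drive
path = "/content/drive/MyDrive/cleaned_data.csv"
df.to_csv(path, index=False)

# Retrieve it later by reading back from the same path
df_restored = pd.read_csv(path)
print(df_restored.shape)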

Challenges:

• Setting up the correct permissions and paths for Google Drive storage took some time, especially
when ensuring the data could be easily accessed later.
• Understanding how to effectively use cloud storage within a Python environment required additional
learning.

Key Learnings:

• Cloud storage provides a flexible and scalable solution for storing large datasets, making it easier to
collaborate and manage data in distributed environments.
• Proper storage and retrieval mechanisms are crucial for ensuring that data can be accessed and
analyzed efficiently.
Contribution to the Overall Data Science Process

Data Exploration:

• Data exploration is the foundation of the data science process. It allows for an initial understanding of
the data, guiding the subsequent steps of cleaning, transformation, and analysis. By identifying
patterns, trends, and outliers, it informs the strategies for data cleaning and preparation.

Data Cleaning:

• Cleaning the data ensures that it is accurate, complete, and free of errors. This step is crucial for
building reliable models and making accurate predictions. Without proper cleaning, the analysis might
be flawed, leading to incorrect conclusions.

Data Integration:

• Integrating data from multiple sources enriches the dataset, providing a more holistic view of the
problem at hand. It enables the combination of different perspectives and variables, leading to more
comprehensive analysis and better-informed decisions.

Data Storage and Retrieval:

• Efficient data storage and retrieval are critical for managing large datasets, especially in collaborative
or cloud-based environments. They ensure that data is securely stored, easily accessible, and
available for sharing among team members or for future analysis.

Final Thoughts
This assignment provided a comprehensive overview of the data science process, from data exploration to
cleaning, integration, and storage. Each step is interlinked, contributing to the overall goal of extracting
meaningful insights from data. By understanding and addressing the challenges in each part, I have gained a
deeper appreciation of the importance of thorough data preparation and management in the data science
workflow.
