A1 Exploratory and Descriptive Data Analysis

Uploaded by

Abhishek agarwal


EE353 / EE769 Introduction to Data Science and Machine Learning

July-Nov 2024, IIT Bombay

Assignment 1: Exploratory and Descriptive Data Analysis

Instructions:

• Perform all analysis in an IPython notebook environment, such as Google Colab.
• For every line or block of code where you copied the base code from somewhere, cite your source with a
source number, such as #[1], #[2], #[3], etc. This should be done even if you modify the base code
substantially.
• At the bottom of your assignment, list the reference for each source number, e.g. “2.
https://stackoverflow.com/questions/17193850/get-column-by-number-in-pandas”, or “3. ChatGPT
prompt: How to get columns by numbers in pandas”, or “1. Discussion with classmate Mukesh Adani Roll
No. 213414”.
• Make copious use of text cells and inline comments in code cells to explain your intent, observations, and
next steps. Without these, your assignment will not be graded, and it will be assumed that you did not
understand your own code.
• Record a video shorter than 10 minutes of you demoing your assignment using a screen recorder and
webcam. Host the video in a shared drive (MAKE SURE TO GIVE ANYONE WITH THE LINK ACCESS. WE
WILL NOT ASK YOU.) Include the link at the top of your ipynb file in a comment, and submit ONLY the
ipynb file.
• Submit your assignment on Moodle before the deadline. After the deadline, there may be a late penalty,
but submit it on Moodle only. Do not email or submit on Teams.
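
The citation and commenting conventions above might look roughly like the following in a code cell. This is only an illustrative sketch: the toy CSV text, the column names, and the reference entry are placeholders, not part of the assignment data.

```python
# Video link: <paste your shared-drive link here>

import csv
import io

# Toy CSV text standing in for a downloaded file (hypothetical data).
raw = "zip,score\n90012,95\n90013,88\n"

# Intent: read the rows as dictionaries keyed by column name.
rows = list(csv.DictReader(io.StringIO(raw)))  #[1] base code adapted from a source

# Intent: convert scores from text to integers so they can be averaged.
scores = [int(row["score"]) for row in rows]  # own code, no citation needed
mean_score = sum(scores) / len(scores)

# Next step: repeat the conversion for the remaining numeric columns.

# References (at the bottom of the notebook):
# 1. <URL, ChatGPT prompt, or classmate discussion that the cited code came from>
```

In a notebook, the intent, observation, and next-step notes would normally sit in text cells, with the inline comments kept for line-level details.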

Questions:

1. Using the City of Los Angeles public data sources, test the hypothesis that the statistics of
   “affordable housing projects” (government housing for low-income people) in a ZIP code are related to
   the health inspection scores of the restaurants in that ZIP code.
   a) Download CSV files from:
      i. https://catalog.data.gov/dataset/restaurant-and-market-health-inspections
      ii. https://catalog.data.gov/dataset/hcidla-affordable-housing-projects-list-2003-to-present
   b) Perform EDA on the two files: [4]
      i. Check whether the data types are as expected; if not, convert them
      ii. Check for missing values, then decide whether to remove those rows or fill in an imputed value
      iii. Check for unexpected entries in certain columns. Correct them if necessary and feasible.
      iv. Plot some graphs to understand the data
   c) Summarize each file by ZIP code using SQL: [2]
      i. Ensure the right type of summarization (sum, mean, max, etc.) for the other variables
   d) Join the files using SQL by ZIP code: [2]
      i. Ensure that the ZIP codes are in compatible formats and lengths
      ii. For each ZIP, get the predictor variable from the housing projects file, and potential predicted
          variables from the health inspections file
   e) Formulate and test the hypothesis: [2]
      i. Formulate a reasonable alternative hypothesis
      ii. Formulate a null hypothesis
      iii. Select an appropriate test and significance level
      iv. Perform the test and decide whether the null hypothesis should be rejected and the alternative
          hypothesis accepted
2. Open-ended: Find some interesting data on the Indian government data portal https://www.data.gov.in,
   perform EDA, derive some insights using graphs, and perform a statistical test for an interesting
   hypothesis. There is no need to use multiple files for this question, unless you want to do the extra
   work for your own learning. [4]
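
As a rough illustration of the summarize, join, and test steps in Question 1, the sketch below uses Python's built-in sqlite3 module on a handful of made-up rows. Everything specific in it is an assumption: the table and column names (inspections, housing, facility_zip, score, zip, units), the toy values, the choice of total units as predictor and mean score as predicted variable, the 0.05 significance level, and the choice of an exact permutation test on Pearson's r. The null hypothesis here is "no linear association between total affordable housing units and mean inspection score per ZIP"; the alternative is that some association exists.

```python
import itertools
import math
import sqlite3
import statistics

# Hypothetical toy rows; the real CSV column names and values will differ.
inspections = [  # (facility_zip, score); one ZIP is in ZIP+4 format on purpose
    ("90011", 90), ("90011-2345", 94), ("90012", 95), ("90013", 88),
    ("90014", 93), ("90015", 91), ("90016", 97), ("90017", 89),
]
housing = [  # (zip, units); ZIP stored as an integer on purpose
    (90011, 120), (90012, 30), (90012, 20), (90013, 200),
    (90014, 60), (90015, 110), (90016, 10), (90018, 75),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (facility_zip TEXT, score INTEGER)")
conn.execute("CREATE TABLE housing (zip INTEGER, units INTEGER)")
conn.executemany("INSERT INTO inspections VALUES (?, ?)", inspections)
conn.executemany("INSERT INTO housing VALUES (?, ?)", housing)

# Summarize each table by 5-digit ZIP (mean for scores, sum for units),
# then inner-join the two summaries on the normalized ZIP.
query = """
WITH insp AS (
    SELECT substr(facility_zip, 1, 5) AS zip5, AVG(score) AS mean_score
    FROM inspections
    GROUP BY substr(facility_zip, 1, 5)
),
hous AS (
    SELECT substr(CAST(zip AS TEXT), 1, 5) AS zip5, SUM(units) AS total_units
    FROM housing
    GROUP BY substr(CAST(zip AS TEXT), 1, 5)
)
SELECT hous.zip5, hous.total_units, insp.mean_score
FROM hous JOIN insp ON insp.zip5 = hous.zip5
ORDER BY hous.zip5
"""
joined = conn.execute(query).fetchall()
conn.close()


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


units = [row[1] for row in joined]    # predictor, from the housing summary
scores = [row[2] for row in joined]   # predicted variable, from inspections
r_obs = pearson_r(units, scores)

# Exact two-sided permutation test: under the null hypothesis every pairing
# of scores with ZIPs is equally likely, so count the pairings whose |r| is
# at least as extreme as the observed one.
perm_rs = [pearson_r(units, p) for p in itertools.permutations(scores)]
p_value = sum(abs(r) >= abs(r_obs) - 1e-12 for r in perm_rs) / len(perm_rs)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"r = {r_obs:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Enumerating every permutation is only feasible for a handful of ZIP codes. With the real data one would more likely sample random permutations or use a library routine such as scipy.stats.pearsonr or scipy.stats.spearmanr. Note also that storing ZIP codes as integers drops leading zeros, which does not affect Los Angeles ZIPs but matters elsewhere.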
