Bda Survey Assignment: Parta - Rollnumbers - Ipynb Parta - Rollnumbers - Ipynb Part A

This document provides instructions for a survey data analysis assignment to be completed in Python notebooks. It involves reading survey response data from a CSV file, cleaning and preparing the data by converting variable types and creating new derived variables. It then asks students to perform exploratory data analysis on the data, including generating frequency tables, time series plots, pair plots, and correlations. For text variables, it involves analyzing word frequencies, identifying top terms, and building word clouds. It emphasizes practical data skills and working with others to analyze open-ended questions.

Uploaded by

Sankeerth Goud

BDA Survey Assignment

Thank you for taking part in the survey. You generated 61 responses, and this is precisely the
starting point for our explorations. Crack this assignment in groups of 4, with the hope that
you will later on develop into a project team. Upload your answers on Moodle.

The purpose of this assignment is to get you going on all of the Python you have picked up.
By now, you should have gone through the MOOC, and learned more ways to handle data.

• The point of this assignment is NOT to go down the rabbit hole of identifying
responses with individuals and feel super about your investigational skills. Want to
play Sherlock? Why not show how good you are at cracking the tougher questions!
• Don’t be daunted by the assignment. The point is NOT to hint at the kind of questions
that you’ll have to solve on an exam (keep that obsession aside for a few weeks
more).

An easy way to administer a survey is through Google Forms. Not only does it summarise
the responses, but it also provides a ready spreadsheet (see BDAResponses.csv). Create two
Python notebooks in your work folder, and name them PartA_RollNumbers.ipynb
and PartB_RollNumbers.ipynb.

Part A

1. Read the data from the spreadsheet into a dataframe called Responses.

2. Figure out how to obtain the following:

a. Header
b. The number of rows and columns
c. Types of the variables – how many of these are correctly identified?
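
A minimal sketch of Questions 1 and 2, assuming pandas is available. So that it runs on its own, it reads a two-row stand-in built from the example rows in Question 5 rather than the real BDAResponses.csv; the variable names other than Responses are illustrative.

```python
import io
import pandas as pd

# Stand-in for BDAResponses.csv (two rows only, for illustration).
# With the real file, pd.read_csv("BDAResponses.csv") would take its place.
csv_text = (
    "Timestamp,Your educational background,Work experience in months - NOT years\n"
    "02/07/20 9:52,Science,0\n"
    "02/07/20 9:56,Engineering,47\n"
)
Responses = pd.read_csv(io.StringIO(csv_text))

header = list(Responses.columns)   # a. header (column names)
n_rows, n_cols = Responses.shape   # b. number of rows and columns
var_types = Responses.dtypes       # c. type pandas inferred for each column

print(header)
print(n_rows, n_cols)
print(var_types)
```

Note that the Timestamp column comes back as text here, not as a datetime, which is the kind of mismatch part (c) is asking about.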

3. Rename the variables in the dataframe as follows:

Variable Name         Question

response_date         Timestamp
education             Your educational background
work_ex               Work experience in months - NOT years
code_ability          How do you rate your ability to program?
languages             What languages have you coded in?
lines_of_code         How many lines of code have you written?
mba_code_reasons      Why MBAs need to code?
worked_with_db        Have you worked with a database?
which_db              What kind of database was that?
why_bda               Why on earth a course in big data analysis?
soc_med_accounts      What social media accounts do you have?
soc_med_challenges    What would your dream career be?
dream_career          What would your dream career be?
life_mission          What did you say your mission was in life?
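
One possible way to do the renaming is a dictionary passed to rename. The sketch below covers only three of the columns on a one-row stand-in frame; the dictionary name new_names is illustrative, and the remaining rows of the table would be added in the same way.

```python
import pandas as pd

# Tiny stand-in frame carrying three of the original question headers.
Responses = pd.DataFrame({
    "Timestamp": ["02/07/20 9:52"],
    "Your educational background": ["Science"],
    "Work experience in months - NOT years": [0],
})

# Map old header -> new variable name; extend with the rest of the table.
new_names = {
    "Timestamp": "response_date",
    "Your educational background": "education",
    "Work experience in months - NOT years": "work_ex",
}
Responses = Responses.rename(columns=new_names)
print(list(Responses.columns))
```

If two columns carry the same question text, a dictionary cannot tell them apart; assigning the full list of new names to Responses.columns in column order is an alternative in that case.
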
As a first step, let’s convert all the variables into the appropriate types. Refer to this link by
Chris Albon to understand how to deal with datetime variables.

4. Convert the remaining variables to categorical/numerical depending on their scale.

Convert all the elaborate answers (e.g. why_bda) into string type variables.
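
The conversions could look roughly like this. The sketch assumes the timestamps are written day/month/two-digit year (the example rows do not settle this), and the why_bda answers are made up.

```python
import pandas as pd

Responses = pd.DataFrame({
    "response_date": ["02/07/20 9:52", "02/07/20 9:56", "02/07/20 10:00"],
    "education": ["Science", "Engineering", "Engineering"],
    "work_ex": ["0", "47", "55"],
    "why_bda": ["made-up answer one", "made-up answer two", None],
})

# Assumption: day/month/year order; swap %d and %m if the file uses month first.
Responses["response_date"] = pd.to_datetime(
    Responses["response_date"], format="%d/%m/%y %H:%M"
)
Responses["education"] = Responses["education"].astype("category")  # nominal scale
Responses["work_ex"] = pd.to_numeric(Responses["work_ex"])          # numerical scale
Responses["why_bda"] = Responses["why_bda"].astype("string")        # free text

print(Responses.dtypes)
```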

5. Examine the response_date variable. Bucket the values into consecutive hourly
interval slots, using a new or “derived” variable called hour_slot. To illustrate, the
top three responses will be slotted as shown below.

response_date     education     work_ex   hour_slot

02/07/20 9:52     Science       0         1
02/07/20 9:56     Engineering   47        1
02/07/20 10:00    Engineering   55        2
…                 …             …         …
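
Since 10:00 already lands in slot 2 in the illustration, the slots appear to follow clock hours rather than 60-minute windows counted from the first response. Under that assumption, a sketch on the three example timestamps:

```python
import pandas as pd

Responses = pd.DataFrame({
    "response_date": pd.to_datetime(
        ["2020-07-02 09:52", "2020-07-02 09:56", "2020-07-02 10:00"]
    ),
})

# Snap each timestamp down to the start of its clock hour.
hour_start = Responses["response_date"].dt.floor("h")
# Whole hours elapsed since the first response's clock hour, numbered from 1.
elapsed = hour_start - hour_start.min()
Responses["hour_slot"] = (elapsed // pd.Timedelta(hours=1)).astype(int) + 1

print(Responses)
```

Because the count is based on elapsed hours, the slots keep running consecutively across midnight instead of restarting each day.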

6. Create a table to roll up the hour slots by frequencies as shown:

hour_slot   count
1           3
2           1
3           1
…           …

How would you interpret the zero count values in the above table?
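
A plain value_counts only lists slots that actually occur, so empty hours would vanish from the table. One way to keep them, sketched on made-up slot numbers (slot 3 is deliberately missing), is to reindex over the full range of slots:

```python
import pandas as pd

# Made-up slot numbers for illustration; no response falls in slot 3.
hour_slot = pd.Series([1, 1, 1, 2, 4], name="hour_slot")

counts = hour_slot.value_counts().sort_index()
# Reindex over every slot from first to last so empty hours show up as 0.
all_slots = range(hour_slot.min(), hour_slot.max() + 1)
slot_table = (
    counts.reindex(all_slots, fill_value=0)
    .rename_axis("hour_slot")
    .reset_index(name="count")
)
print(slot_table)
```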

7. Create a variable called inter_arrival_time to capture successive differences between
responses. Note that the spreadsheet is conveniently sorted in the first column, so you
don’t have to worry about the order of arrivals. Express the values in minutes.
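
A short sketch of the successive differences, using the three example timestamps and assuming response_date has already been converted to a datetime:

```python
import pandas as pd

Responses = pd.DataFrame({
    "response_date": pd.to_datetime(
        ["2020-07-02 09:52", "2020-07-02 09:56", "2020-07-02 10:00"]
    ),
})

# diff() subtracts the previous row; the first response has no predecessor (NaN).
gaps = Responses["response_date"].diff()
Responses["inter_arrival_time"] = gaps.dt.total_seconds() / 60  # in minutes

print(Responses)
```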

8. In addition to the hour_slot from Question 5, create a truncated_hour, which snips off
the minutes field from the response_date value. Next, create a frequency table just
like the one shown in Question 6.
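
One way to snip off the minutes is to floor each timestamp to the hour. The sketch below does that on the three example timestamps; hour_table is an illustrative name.

```python
import pandas as pd

Responses = pd.DataFrame({
    "response_date": pd.to_datetime(
        ["2020-07-02 09:52", "2020-07-02 09:56", "2020-07-02 10:00"]
    ),
})

# floor("h") zeroes the minutes and seconds, keeping the date and the hour.
Responses["truncated_hour"] = Responses["response_date"].dt.floor("h")

# Frequency table: one row per truncated hour that has at least one response.
hour_table = Responses.groupby("truncated_hour").size().reset_index(name="count")
print(hour_table)
```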

9. We’re dealing with times here. Maybe it’s time for a time series question? Get a fancy
hourly time series depicting the response arrival counts across the survey period.
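
A bare-bones version of such a plot, assuming matplotlib is installed. The data are the three example timestamps plus one made-up late response; the styling is left deliberately plain, so the "fancy" part is still yours to do.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Three timestamps from the example table plus one made-up late response.
stamps = pd.to_datetime(
    ["2020-07-02 09:52", "2020-07-02 09:56", "2020-07-02 10:00", "2020-07-02 13:20"]
)
arrivals = pd.Series(1, index=stamps)

# resample("h") creates one bin per clock hour, including the empty ones.
hourly_counts = arrivals.resample("h").sum()

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(hourly_counts.index, hourly_counts.values, marker="o")
ax.set_xlabel("Hour")
ax.set_ylabel("Responses")
ax.set_title("Responses per hour")
fig.autofmt_xdate()  # tilt the time labels so they do not overlap
```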

Much ado about one puny variable? Now that you have developed a taste for what is entailed
with data preparation and exploration, let’s focus our attention on other variables.

10. Obtain a “pairs plot” of all pairs of numerical variables, old and new. Make it pretty.
What observations can you make?

11. Examine the correlations between all pairs of numerical variables, old and new –
figure out how to do this efficiently (Hint: Not by choosing variables one pair at a
time!). What conclusions can you make?
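
As a hint at what "efficiently" could mean, the sketch below computes every pairwise correlation in a single call. The numbers are made up, and it assumes code_ability is stored as a numeric rating.

```python
import pandas as pd

# Made-up values for illustration only.
Responses = pd.DataFrame({
    "work_ex": [0, 47, 55, 12, 30],
    "code_ability": [2, 4, 5, 2, 3],
    "hour_slot": [1, 1, 2, 3, 5],
    "education": ["Science", "Engineering", "Engineering", "Commerce", "Arts"],
})

# Keep the numeric columns only, then compute Pearson correlations for all pairs.
corr_matrix = Responses.select_dtypes("number").corr()
print(corr_matrix.round(2))
```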

12. Check out the survey result link, and reproduce the bar charts and pie charts as
faithfully as possible, with the colour schemes and legends. Obtain a more meaningful
histogram for work_ex – use the seaborn library. Don’t know how to carry this out?
All you have to do is ask (Google).

Part B

For this part, you will use the second Python notebook you have created.

How would you analyse the textual variable values? Discuss this among your friends. This
is clearly the toughest question, one that has no clear-cut answer. Just like the ones you
encounter at the workplace…

13. Use the following link to figure out word frequencies in an answer using Approach
3. Figure out how to eliminate punctuation marks.
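
The linked approaches are not reproduced here, so the following is only one common way to count words, not necessarily Approach 3. It uses the standard library and a made-up answer.

```python
import string
from collections import Counter

answer = "Data, data everywhere: big data is the future!"  # made-up answer

# Delete every ASCII punctuation character, then lower-case and split on whitespace.
strip_punct = str.maketrans("", "", string.punctuation)
words = answer.translate(strip_punct).lower().split()

word_freq = Counter(words)
print(word_freq.most_common(3))
```

string.punctuation covers ASCII marks only, so curly quotes and similar characters pasted into a form would need to be added to the deletion table separately.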

14. For each textual question, identify the top 10 terms across the base of 61 responses.
What do you notice? How would you remedy the problem of “trivial” terms?
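
To see the problem in miniature, the sketch below counts terms across three made-up answers, first as they are and then with a small hand-made stop-word list removed. The function top_terms and the list stop_words are hypothetical; a fuller stop-word list would normally come from an NLP library.

```python
import string
from collections import Counter

# Made-up answers standing in for one free-text column.
answers = [
    "To make sense of the data in the firm.",
    "Because data is the new oil.",
    "To get a job in analytics.",
]
# Hypothetical hand-made stop-word list, for illustration only.
stop_words = {"to", "of", "the", "in", "is", "a", "because"}

strip_punct = str.maketrans("", "", string.punctuation)

def top_terms(texts, n=10, drop=frozenset()):
    """Return the n most frequent words across texts, skipping words in drop."""
    counts = Counter()
    for text in texts:
        tokens = text.translate(strip_punct).lower().split()
        counts.update(t for t in tokens if t not in drop)
    return counts.most_common(n)

raw_top = top_terms(answers)                     # led by words such as "the"
clean_top = top_terms(answers, drop=stop_words)  # content words only
print(raw_top)
print(clean_top)
```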

15. After applying the remedy, correlate the frequencies for the top 10 terms in each
answer with work_ex and code_ability. What conclusions do you make?
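
One reading of this question is to count, for each response, how often each top term occurs, and then correlate those counts with the numeric variable. A sketch under that assumption, on made-up data with only two terms:

```python
import pandas as pd

# Made-up responses for illustration only.
Responses = pd.DataFrame({
    "why_bda": [
        "data is the future",
        "I like data and more data",
        "for a better job",
        "job prospects",
    ],
    "work_ex": [0, 47, 55, 12],
})
top_terms = ["data", "job"]  # in practice, the top 10 terms after the remedy

# One column per term: how often it occurs in each response.
tokens = Responses["why_bda"].str.lower().str.split()
term_counts = pd.DataFrame(
    {term: tokens.apply(lambda words, t=term: words.count(t)) for term in top_terms}
)

# Correlate each term's per-response count with work experience.
term_corr = term_counts.corrwith(Responses["work_ex"])
print(term_counts)
print(term_corr.round(2))
```

With four rows the coefficients mean nothing; the point is only the shape of the computation, which carries over to code_ability by swapping the column.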

16. Figure out how to build word clouds by going through this link. Carry this out for all
questions for which the answers are free form.

17. If you use Docker, you realise that the code for the answer to Q.16 does NOT run the
next time you fire it up. How would you avert this?
