Pandas Exercise

Uploaded by

vesike3421

TASK: Run the following code to read in the "hotel_booking_data.csv" file. Feel free to explore the file a bit before continuing with the rest of the exercise.

In [1]: import pandas as pd

In [2]: hotels = pd.read_csv("C:\\Users\\HP\\Desktop\\Python\\Code\\UNZIP_FOR_NOTEBOOKS_FINAL\\03-Pandas

In [3]: hotels.head()

Out[3]:
          hotel  is_canceled  lead_time  arrival_date_year  arrival_date_month  arrival_date_week_number  ...
0  Resort Hotel            0        342               2015                July                        27  ...
1  Resort Hotel            0        737               2015                July                        27  ...
2  Resort Hotel            0          7               2015                July                        27  ...
3  Resort Hotel            0         13               2015                July                        27  ...
4  Resort Hotel            0         14               2015                July                        27  ...

5 rows × 36 columns

TASK: How many rows are there?

In [4]: # CODE HERE

len(hotels)

Out[4]: 119390

TASK: Is there any missing data? If so, which column has the most missing data?

In [5]: # CODE HERE
hotels.isnull().sum()

Out[5]: hotel 0
is_canceled 0
lead_time 0
arrival_date_year 0
arrival_date_month 0
arrival_date_week_number 0
arrival_date_day_of_month 0
stays_in_weekend_nights 0
stays_in_week_nights 0
adults 0
children 4
babies 0
meal 0
country 488
market_segment 0
distribution_channel 0
is_repeated_guest 0
previous_cancellations 0
previous_bookings_not_canceled 0
reserved_room_type 0
assigned_room_type 0
booking_changes 0
deposit_type 0
agent 16340
company 112593
days_in_waiting_list 0
customer_type 0
adr 0
required_car_parking_spaces 0
total_of_special_requests 0
reservation_status 0
reservation_status_date 0
name 0
email 0
phone-number 0
credit_card 0
dtype: int64

In [6]: print(f"Yes, missing data, company column missing: {hotels['company'].isna().sum()} rows.")

Yes, missing data, company column missing: 112593 rows.
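The column with the most missing values can also be picked out programmatically rather than by reading the full listing. The following is a minimal sketch on a small made-up DataFrame (`toy` and its values are assumptions for illustration, not the hotel data); it uses `idxmax()` on the per-column missing counts.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the hotels DataFrame (values are made up for illustration).
toy = pd.DataFrame({
    "children": [0.0, np.nan, 2.0, 1.0],
    "country": ["PRT", None, "GBR", None],
    "company": [np.nan, np.nan, np.nan, 40.0],
})

missing_counts = toy.isnull().sum()      # missing values per column
worst_column = missing_counts.idxmax()   # label of the column with the most missing values
worst_count = missing_counts.max()       # how many values that column is missing

print(f"Most missing data: {worst_column} ({worst_count} rows)")
```

On the real dataset the same two calls would be applied to `hotels.isnull().sum()`.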

TASK: Drop the "company" column from the dataset.

In [7]: hotels.drop(columns=['company'],inplace=True)

TASK: What are the top 5 most common country codes in the dataset?

In [9]: hotels['country'].value_counts()[:5]

Out[9]: PRT 48590
GBR 12129
FRA 10415
ESP 8568
DEU 7287
Name: country, dtype: int64

TASK: What is the name of the person who paid the highest ADR (average daily rate)? How much was their ADR?

In [10]: # CODE HERE
hotels.sort_values('adr',ascending=False)[['name','adr']].iloc[0]

Out[10]: name Daniel Walter
adr 5400.0
Name: 48515, dtype: object
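Sorting the whole frame works, but the row with the largest value can also be looked up directly. This is a sketch of that alternative on made-up data (the names and rates below are hypothetical), using `idxmax()` to get the index label and `.loc` to fetch the row.

```python
import pandas as pd

# Hypothetical guests and rates, for illustration only.
toy = pd.DataFrame({
    "name": ["Ann Lee", "Bo Chen", "Cy Diaz"],
    "adr": [75.0, 210.5, 99.0],
})

# idxmax() returns the index label of the largest adr; .loc then selects that row.
top_row = toy.loc[toy["adr"].idxmax(), ["name", "adr"]]

print(top_row["name"], top_row["adr"])
```

If several rows tie for the maximum, `idxmax()` returns only the first of them.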

TASK: The adr is the average daily rate for a person's stay at the hotel. What is the mean adr across all the
hotel stays in the dataset?

In [43]: # CODE HERE

round(hotels['adr'].mean(),2)

Out[43]: 101.83

TASK: What is the average (mean) number of nights for a stay across the entire data set? Feel free to round this to 2 decimal places.

In [46]: # CODE HERE

total_night_stay=hotels['stays_in_week_nights']+hotels['stays_in_weekend_nights']

In [47]: round(total_night_stay.mean(),2)

Out[47]: 3.43

TASK: What is the average total cost for a stay in the dataset? Not average daily cost, but total stay cost. (You will need to calculate total cost yourself by using ADR and the week night and weekend night stays.) Feel free to round this to 2 decimal places.

In [49]: # CODE HERE

total_cost=hotels['adr']*total_night_stay

In [52]: round(total_cost.mean(),2)

Out[52]: 357.85
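The total cost has to be computed per stay before averaging, because the mean of a product is generally not the product of the means. A small worked sketch on made-up numbers (the three rows below are assumptions, not rows from the dataset):

```python
import pandas as pd

# Three hypothetical stays.
toy = pd.DataFrame({
    "adr": [100.0, 50.0, 80.0],
    "stays_in_week_nights": [2, 1, 0],
    "stays_in_weekend_nights": [1, 0, 2],
})

# Nights per stay: 3, 1, 2
nights = toy["stays_in_week_nights"] + toy["stays_in_weekend_nights"]

# Cost per stay: 300, 50, 160
total_cost = toy["adr"] * nights

# Mean of the per-stay totals.
mean_total = round(total_cost.mean(), 2)

# Multiplying the two column means gives a different number,
# so it is not a substitute for the per-stay calculation.
naive = round(toy["adr"].mean() * nights.mean(), 2)

print(mean_total, naive)
```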

TASK: What are the names and emails of people who made exactly 5 "Special Requests"?

In [58]: # CODE HERE
hotels[hotels['total_of_special_requests']==5][['name','email']]

Out[58]: name email

7860 Amanda Harper [email protected]

11125 Laura Sanders [email protected]

14596 Tommy Ortiz [email protected]

14921 Gilbert Miller [email protected]

14922 Timothy Torres [email protected]

24630 Jennifer Weaver [email protected]

27288 Crystal Horton [email protected]

27477 Brittney Burke [email protected]

29906 Cynthia Cabrera [email protected]

29949 Sarah Floyd [email protected]

32267 Michelle Villa [email protected]

39027 Nichole Hebert [email protected]

39129 Lindsey Mckenzie [email protected]

39525 Ashley Edwards [email protected]

70114 Christopher Torres [email protected]

78819 Mrs. Tara Sullivan DVM [email protected]

78820 Michaela Brown [email protected]

78822 Kurt Maldonado MD [email protected]

97072 Jason Richardson [email protected]

97099 Terri Hurley [email protected]

97261 Mrs. Caitlin Webb [email protected]

98410 Holly Arroyo [email protected]

98674 Denise Campbell [email protected]

99887 Michael Smith [email protected]

99888 Dr. Trevor Sellers [email protected]

101569 Kayla Murphy [email protected]

102061 Taylor Martinez [email protected]

109511 Charles Wilson [email protected]

109590 Tyler Allison [email protected]

110082 Matthew Bailey [email protected]

110083 Charlotte Acevedo [email protected]

111909 Darrell Brennan [email protected]

111911 Melinda Jensen [email protected]

113915 Terry Arnold [email protected]

114770 Mary Nguyen [email protected]

114909 Lindsay Cuevas [email protected]

116455 Cynthia Hernandez [email protected]

116457 Angela Hawkins [email protected]

118817 Sue Lawson [email protected]

119161 Alyssa Richards [email protected]

TASK: What percentage of hotel stays were classified as "repeat guests"? (Do not base this on the name of the person, but instead on the is_repeated_guest column.)

In [77]: round((hotels['is_repeated_guest']==1).sum()/len(hotels['is_repeated_guest'])*100,2)

Out[77]: 3.19
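Because the flag column holds only 0 and 1, its mean is already the fraction of repeat guests. The sketch below, on a made-up Series, shows that the count-based formula and the mean-based shortcut agree; it assumes the column contains no values other than 0 and 1 and no missing entries.

```python
import pandas as pd

# Hypothetical 0/1 flags standing in for hotels['is_repeated_guest'].
flags = pd.Series([0, 1, 0, 0, 1, 0, 0, 0], name="is_repeated_guest")

# Count-based: number of ones divided by number of rows.
pct_by_count = round((flags == 1).sum() / len(flags) * 100, 2)

# Mean-based: the mean of a 0/1 column is the share of ones.
pct_by_mean = round(flags.mean() * 100, 2)

print(pct_by_count, pct_by_mean)
```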

TASK: What are the top 5 most common last names in the dataset? Bonus: Can you figure this out in one line of pandas code? (For simplicity, treat a title such as MD as a last name; for example, Caroline Conley MD can be said to have the last name MD.)

In [80]: #CODE HERE

first_last_name=hotels['name'].str.split()

In [82]: last_name=first_last_name.str[-1]

In [86]: hotels['name'].apply(lambda name: name.split()[1]).value_counts()[:5]

Out[86]: Smith 2510
Johnson 1998
Williams 1628
Jones 1441
Brown 1433
Name: name, dtype: int64
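Note that `split()[1]` picks the second word of the name, while `str[-1]` picks the last word; the two differ for names that carry a prefix or a trailing title. The sketch below compares them on four names taken from the special-requests output above. The counts in Out[86] were produced with index `[1]`; whether index `[-1]` would change the top five on the full dataset is not shown here.

```python
import pandas as pd

# Four names that appear earlier in this notebook's output.
names = pd.Series([
    "Amanda Harper",
    "Mrs. Tara Sullivan DVM",
    "Kurt Maldonado MD",
    "Michael Smith",
])

tokens = names.str.split()

second_token = tokens.str[1]   # what split()[1] selects
last_token = tokens.str[-1]    # what the task's "treat MD as a last name" rule selects

print(pd.DataFrame({"name": names, "second": second_token, "last": last_token}))
```

A one-line version following the task's rule would be `names.str.split().str[-1].value_counts()[:5]`.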

TASK: What are the names of the people who booked the largest number of children and babies for their stay? (Don't worry if they canceled; only consider the number of people reported at the time of their reservation.)

In [11]: hotels['total_kids']=hotels['babies']+hotels['children']

In [17]: hotels.sort_values('total_kids',ascending=False)[['name','adults','total_kids','babies','childre

Out[17]:
name adults total_kids babies children

328 Jamie Ramirez 2 10.0 0 10.0

46619 Nicholas Parker 2 10.0 10 0.0

78656 Marc Robinson 1 9.0 9 0.0

19718 Mr. Jeffrey Cross 2 3.0 0 3.0

107837 Albert French 2 3.0 2 1.0

... ... ... ... ... ...

119389 Ariana Michael 2 0.0 0 0.0

40600 Craig Campos 2 NaN 0 NaN

40667 David Murphy 2 NaN 0 NaN

40679 Frank Burton 3 NaN 0 NaN

41160 Jerry Roberts 2 NaN 0 NaN

119390 rows × 5 columns
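The NaN rows at the bottom of the table come from the missing values in `children`: adding NaN to a number gives NaN. One possible way to handle this, sketched on made-up data, is `Series.add` with `fill_value=0`, followed by `nlargest` to keep only the top rows. Treating a missing `children` value as zero is an assumption made for this illustration, not something the exercise states.

```python
import numpy as np
import pandas as pd

# Hypothetical bookings; the third row has a missing children count.
toy = pd.DataFrame({
    "name": ["Guest A", "Guest B", "Guest C", "Guest D"],
    "babies": [0, 10, 0, 2],
    "children": [10.0, 0.0, np.nan, 1.0],
})

# fill_value=0 substitutes 0 when exactly one of the two operands is missing.
toy["total_kids"] = toy["babies"].add(toy["children"], fill_value=0)

# Keep only the two largest totals instead of sorting the whole frame.
top = toy.nlargest(2, "total_kids")[["name", "total_kids"]]

print(top)
```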

TASK: What are the top 3 most common area codes in the phone numbers? (The area code is the first 3 digits.)

In [18]: #CODE HERE

area_codes=hotels['phone-number'].str[:3]

In [20]: area_codes.value_counts()[:3]

Out[20]: 799 168
185 167
541 166
Name: phone-number, dtype: int64

TASK: How many arrivals took place between the 1st and the 15th of the month (inclusive of 1 and 15)?
Bonus: Can you do this in one line of pandas code?

In [21]: #CODE HERE

hotels['arrival_date_day_of_month'].apply(lambda day:day in range(1,16)).sum()

Out[21]: 58152
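The same count can be obtained without a Python-level lambda by using `Series.between`, which includes both endpoints by default. A short sketch on made-up day numbers, checking that both approaches agree:

```python
import pandas as pd

# Hypothetical day-of-month values.
days = pd.Series([1, 15, 16, 31, 7, 20])

# Vectorized range check; both 1 and 15 are included.
count_between = days.between(1, 15).sum()

# The lambda-based version used in the cell above, for comparison.
count_apply = days.apply(lambda day: day in range(1, 16)).sum()

print(count_between, count_apply)
```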

HARD BONUS TASK: Create a table of counts for each day of the week that people arrived. (E.g. 5000 arrivals were on a Monday, 3000 were on a Tuesday, etc.)

In [47]: def convert_to_proper(day,month,year):
             return f'{day}-{month}-{year}'

In [50]: import numpy as np

hotels['date']=np.vectorize(convert_to_proper)(hotels['arrival_date_day_of_month'],
                                               hotels['arrival_date_month'],
                                               hotels['arrival_date_year'])

In [52]: date_to_day=hotels['date']

In [53]: date_to_day=pd.to_datetime(date_to_day)

In [55]: date_to_day.dt.day_name().value_counts()

Out[55]: Friday 19631
Thursday 19254
Monday 18171
Saturday 18055
Wednesday 16139
Sunday 14141
Tuesday 13999
Name: date, dtype: int64
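The date strings can also be built with plain Series concatenation and parsed with an explicit format, which avoids `np.vectorize` and leaves nothing for the parser to guess. The sketch below uses three made-up arrival dates; the format string `%d-%B-%Y` assumes full English month names, as in the `arrival_date_month` column, and an English-compatible locale.

```python
import pandas as pd

# Three hypothetical arrivals using the same column names as the dataset.
toy = pd.DataFrame({
    "arrival_date_day_of_month": [1, 8, 6],
    "arrival_date_month": ["July", "July", "August"],
    "arrival_date_year": [2015, 2015, 2016],
})

# Build "day-Month-year" strings column-wise, e.g. "1-July-2015".
date_strings = (
    toy["arrival_date_day_of_month"].astype(str)
    + "-" + toy["arrival_date_month"]
    + "-" + toy["arrival_date_year"].astype(str)
)

# Parse with an explicit format: day, full month name, four-digit year.
arrival = pd.to_datetime(date_strings, format="%d-%B-%Y")

# Count arrivals per weekday name.
day_counts = arrival.dt.day_name().value_counts()

print(day_counts)
```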
