Task2 - Part 2

The document outlines a PySpark task focused on analyzing COVID-19 patient data from a CSV file named PatientInfo. It describes the structure of the dataset, including patient details such as ID, sex, age, and infection case, and provides code snippets for loading the data and performing basic operations like counting released patients. Additionally, it includes tasks for handling null values and modifying the dataset.

PySpark Task 2 - Part 2 (.ipynb, Colab)


In this task we will be using the "[NeurIPS 2020] Data Science for COVID-19 (DS4C)" dataset.

The CSV file that we will be using in this task is PatientInfo.csv.

PatientInfo.csv

patient_id: the ID of the patient

sex: the sex of the patient

age: the age of the patient

country: the country of the patient

province: the province of the patient

city: the city of the patient

infection_case: how the patient was infected

infected_by: the ID of the patient who infected this patient

contact_number: the number of people the patient was in contact with

symptom_onset_date: the date of symptom onset

confirmed_date: the date the case was confirmed

released_date: the date the patient was released

deceased_date: the date the patient died

state: isolated / released / deceased

Import and create SparkSession


from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("COVID-19 Analysis").getOrCreate()

Load the PatientInfo.csv file and show the first 5 rows
df = spark.read.csv("/content/PatientInfo.csv", header=True, inferSchema=True)
df.show(5)


+----------+------+---+-------+--------+-----------+--------------------+
|patient_id|   sex|age|country|province|       city|      infection_case|
+----------+------+---+-------+--------+-----------+--------------------+
|1000000001|  male|50s|  Korea|   Seoul| Gangseo-gu|     overseas inflow|
|1000000002|  male|30s|  Korea|   Seoul|Jungnang-gu|     overseas inflow|
|1000000003|  male|50s|  Korea|   Seoul|  Jongno-gu|contact with patient|
|1000000004|  male|20s|  Korea|   Seoul|    Mapo-gu|     overseas inflow|
|1000000005|female|20s|  Korea|   Seoul|Seongbuk-gu|contact with patient|
+----------+------+---+-------+--------+-----------+--------------------+
only showing top 5 rows (remaining columns truncated in the export)

Display the schema of the dataset


df.printSchema()

root
|-- patient_id: long (nullable = true)
|-- sex: string (nullable = true)
|-- age: string (nullable = true)
|-- country: string (nullable = true)
|-- province: string (nullable = true)
|-- city: string (nullable = true)
|-- infection_case: string (nullable = true)
|-- infected_by: string (nullable = true)
|-- contact_number: string (nullable = true)
|-- symptom_onset_date: string (nullable = true)
|-- confirmed_date: date (nullable = true)
|-- released_date: date (nullable = true)
|-- deceased_date: date (nullable = true)
|-- state: string (nullable = true)

Using the state column

How many people survived (released)?

df.filter(df.state == "released").count()

2929

Bonus Question!

Display the number of null values in each column. We didn't cover how to do this,
but we covered something very similar. Check this link for a hint:
https://sparkbyexamples.com/pyspark/pyspark-find-count-of-null-none-nan-values/
If you get stuck on this, don't worry, just view the solutions.



Fill the nulls in the infected_by column with the string "Unknown"

Use Shift + Tab on the fill function for a hint.

Try to drop the column infection_case



