Lab 05 - PySpark - DataFrame (1)
Lab 05
PySpark - DataFrame
Question 1:
You are given a TSV file, WHO-COVID-19-20210601-213841.tsv, which corresponds to the WHO
Coronavirus (COVID-19) Dashboard.
Create a folder named lab05 in the /content directory of Google Colab, and then copy the
TSV file to /content/lab05/input/
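The folder setup can be sketched in Python as follows. This is a minimal sketch, not the required method: the helper name prepare_input is hypothetical, and it assumes the TSV file has already been uploaded somewhere in the Colab file system (the source path is up to you).

```python
import os
import shutil


def prepare_input(src_path: str, input_dir: str = "/content/lab05/input") -> str:
    """Create the lab input folder (and its parents) and copy the TSV into it.

    prepare_input is a hypothetical helper name used only for illustration.
    Returns the path of the copied file.
    """
    # exist_ok=True makes the call safe to repeat when the folder already exists.
    os.makedirs(input_dir, exist_ok=True)
    # shutil.copy keeps the file name when the destination is a directory.
    return shutil.copy(src_path, input_dir)
```

For example, if the file was uploaded to /content (an assumption), calling prepare_input("/content/WHO-COVID-19-20210601-213841.tsv") would place a copy in /content/lab05/input/.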
Question 2:
Write a PySpark program, located in ASEANCaseCount.py, that uses DataFrames to:
● compute the total number of cumulative cases across ASEAN countries (the South-East
Asia Region in the given data table);
● find the country with the highest number of cumulative cases among ASEAN countries;
● find the 3 countries with the lowest number of cumulative cases among ASEAN
countries.