Assignment 2425

This document outlines an assignment for a Data Programming in Python course, contributing 20% to the overall grade. It includes tasks involving data analysis using Pandas on datasets related to Star Wars characters and Pokemon statistics, with specific questions to be answered and submitted by December 6th. Students are required to upload their Python code in specified formats on Moodle, ensuring proper coding practices and avoiding hard-coded values.

Uploaded by

Jorge Ramos

Data Programming in Python

Assignment
This assignment sheet is assessed and contributes 20% to your overall grade for Data Programming in Python. You can obtain a total
of 20 marks for each sheet. You have to answer all questions on this sheet.
Please upload your answers before Friday, 6th December, 11am (UK time).
To upload your answers, log on to Moodle and go to Data Programming in Python. Under Assessments, there is a link Upload
answers for assignment 3 (due December 6th). Click on this link to upload the file containing your commented Python code. You
can either upload a well-commented and well-structured Python source file (file extension .py) or a Jupyter notebook (file extension
.ipynb). Please do not upload your code in other formats. You may get a file type error when uploading; please ignore it.

Task 1. The data we are going to be using for this task contains information about the Star Wars movies. Use the following
link to load the data in directly:
https://raw.githubusercontent.com/UofGAnalyticsData/DPIP/main/assesment_datasets/assessment3/starwars.csv
You can load this data in Python in multiple ways.
First, we can use read_csv to download the data directly with pandas, as follows:
import pandas as pd
url = "https://raw.githubusercontent.com/UofGAnalyticsData/"\
"DPIP/main/assesment_datasets/assessment3/starwars.csv"
df = pd.read_csv(url)
Note that you may need to pass additional arguments to read_csv.
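Which additional arguments are needed depends on the file. As a minimal sketch on made-up, in-memory data (not the assignment file), sep and na_values are two commonly used ones; the separator and the "unknown" marker below are assumptions chosen purely for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a downloaded file.
# The real assignment data may use a different separator and missing-value marker.
csv_text = "name;height;mass\nA;172;77\nB;unknown;32\nC;96;unknown\n"

# sep sets the column separator; na_values lists extra strings to treat as missing.
toy = pd.read_csv(io.StringIO(csv_text), sep=";", na_values=["unknown"])

# With "unknown" parsed as NaN, height and mass become numeric columns.
print(toy.dtypes)
```

Inspecting df.dtypes and df.head() after loading is a quick way to check whether the defaults were sufficient.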
Alternatively, to download the data in Colab, use the following command (note that this will not work on some non-Colab platforms):
!wget https://raw.githubusercontent.com/UofGAnalyticsData/DPIP/main/assesment_datasets/assessment3/starwars.csv

df = pd.read_csv("starwars.csv")
This assessment tests your ability to use Pandas, so please use Pandas syntax, e.g. query, to do the following.
Do not “hard-code” information you have “manually” obtained from the data. Please make methods generic.
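To illustrate what "generic" can mean here, the sketch below uses a small made-up table (the names and heights are hypothetical, not taken from the assignment data). Values are derived from the data and passed into query with the @ prefix rather than typed in by hand:

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "height": [172, 96, 202, 150],
})

# Compute the threshold from the data instead of hard-coding a number.
threshold = toy["height"].mean()

# query can refer to Python variables with the @ prefix.
tall = toy.query("height > @threshold")

# idxmax returns the row label of the maximum, so the name is looked up, not typed in.
tallest_name = toy.loc[toy["height"].idxmax(), "name"]

print(tall)
print(tallest_name)
```

Code written this way keeps working if the underlying data changes, which is the point of avoiding hard-coded values.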
(a) What is the maximum height and maximum mass of any character? [1 mark]
(b) Find the species of the characters with a birth year greater than 105. [1 mark]
(c) Add a column BMI to the dataset: it should contain the body mass index (10000 × mass/height² when mass is
measured in kg and height is measured in cm). [1 mark]
(d) Find the character with the highest BMI. [1 mark]
(e) Ignoring the character with the highest BMI, construct three scatter plots of height vs mass, one with each of the
plotting libraries we have discussed in the notes (matplotlib, seaborn and pandas).
[2 marks]
(f) Repeat the matplotlib plot from the previous question, and add two new lines, showing the mass and height values
that would result in a BMI of 10 and of 40. Plot the first line in cyan and the second in magenta.
You can compute the equation using the following rearrangement of the expression above:

$$\text{height} = \sqrt{\frac{10000 \times \text{mass}}{\text{BMI}}}$$

Hint: You can use the plot command from matplotlib.


[1 mark]
(g) For every homeworld, find how many characters come from it. [1 mark]
(h) Find the names of all characters which are from the same homeworld as Chewbacca. [1 mark]
(i) For every homeworld, find the proportion of characters which are humans. [1 mark]
(j) For the following homeworlds, Naboo, Tatooine, Alderaan and Kashyyyk, please calculate the number of characters
of each species. Hint: you may want to use groupby and then .size(), but you don’t need to do it this way.
[2 marks]
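The groupby and .size() hint can be sketched on a small made-up table. The homeworld labels X, Y, Z and the rows below are hypothetical; this shows one possible pattern, not the required approach:

```python
import pandas as pd

# Toy data for illustration only; not the assignment dataset.
toy = pd.DataFrame({
    "homeworld": ["X", "X", "Y", "Y", "Y", "Z"],
    "species": ["Human", "Droid", "Human", "Human", "Droid", "Wookiee"],
})

# size() counts the rows in each (homeworld, species) group.
counts = toy.groupby(["homeworld", "species"]).size()

# unstack turns the species level into columns; absent combinations become 0.
table = counts.unstack(fill_value=0)

# The mean of a boolean column within each group is the proportion of True values.
human_share = (toy["species"] == "Human").groupby(toy["homeworld"]).mean()

print(table)
print(human_share)
```

Restricting to a list of homeworlds beforehand can be done with isin, for example toy[toy["homeworld"].isin(["X", "Y"])].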

Task 2. The data we are going to be using for this task contains information about the statistics of Pokemon (from the games,
not the Pokemon cards or Pokemon Go). Use the following link to load the data in directly:
https://raw.githubusercontent.com/UofGAnalyticsData/DPIP/main/assesment_datasets/assessment3/Pokemon.csv
You can use similar commands as above to load the data.
Please use Pandas syntax, e.g. query, to do the following. Do not “hard-code” information you have “manually”
obtained from the data. Please make methods generic.
Note: I am considering any Pokemon with a different name to be a different Pokemon. E.g. Venusaur and VenusaurMega
Venusaur are different Pokemon, despite their number (#) being the same.
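The practical effect of this note is that counts should be based on the name rather than on the number. A minimal sketch on a three-row toy table (the column names "#" and "Name" are assumed to match the file, and the rows are typed in by hand for illustration):

```python
import pandas as pd

# Toy rows for illustration; two of them share the same number (#).
toy = pd.DataFrame({
    "#": [3, 3, 4],
    "Name": ["Venusaur", "VenusaurMega Venusaur", "Charmander"],
})

# Counting distinct numbers merges the two Venusaur forms.
n_by_number = toy["#"].nunique()

# Counting distinct names treats them as different Pokemon, as the note requires.
n_by_name = toy["Name"].nunique()

print(n_by_number, n_by_name)
```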
(a) How many legendary Pokemon are there? [1 mark]
(b) Find the average attributes (Total, HP, Attack, Defence, Sp. Atk, Sp. Def, Speed) for Pokemon belonging to each
primary type (Type 1). [1 mark]
On average, which type (Type 1) has the highest score for each attribute? (Do this in one go rather than separately
for each attribute.) [1 mark]
(c) Create a new column called ‘multi type’ indicating whether the Pokemon has more than one type (i.e. whether Type
2 has an entry), where True should mean a Pokemon with more than one type. [1 mark]
(d) Compute the 6 most common Type 1 values in generation 1 (note that this should be generic and not rely on you computing
them by hand). You may find the isin function (discussed in the next question) and the nlargest method
useful.
For each of the most common types, plot a histogram of the HP of all generation 1 Pokemon of that type.
So that the histograms do not overlap, present them as a plot with 6 subplots (1 for each type) in a 3 by 2
arrangement. [2 marks]
(e) In Pokemon, certain types are either good, bad or average against other types. For grass Pokemon this is the
following:
• Bad against: Flying, Poison, Bug, Steel, Fire, Grass, Dragon
• Good against: Ground, Rock, Water
Grass Pokemon are average against all other types.
Create a table of each Pokemon’s adjusted attack attribute against grass Pokemon based on ‘Type 1’. The adjusted
attack attribute can be calculated on the basis that:
• the attack attribute is doubled if grass Pokemon are bad against that type (e.g. flying)
• halved if they are good against that type (e.g. ground)
• remains the same otherwise (e.g. ice).
Add this to the table under the column title ‘grass attack’. [1 mark]
Example. Say a Pokemon has an attack attribute of 100. If the Pokemon is primarily a Fire Pokemon (Type 1 ==
Fire; in bad against list) then its ‘grass attack’ would be 200. Similarly, if it is primarily a Rock Pokemon (Type 1 ==
Rock; in good against list) then its ‘grass attack’ would be 50. If it is primarily an Ice Pokemon (Type 1 == Ice; not
in either good or bad against lists) then its ‘grass attack’ would be 100.
Which Pokemon has the lowest ‘grass attack’?
Hint: The following may be useful depending on how you choose to answer this question.
• To check if IN a list: x.isin(y)
• To check if NOT IN a list: ~x.isin(y) [1 mark]
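The isin hint and the worked example above (attack 100 for Fire, Rock and Ice) can be reproduced on a three-row toy table. This is a sketch of one possible approach; numpy.select is an assumption of this example rather than something the sheet prescribes, and the variable names are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy rows mirroring the worked example; not the assignment dataset.
toy = pd.DataFrame({
    "Type 1": ["Fire", "Rock", "Ice"],
    "Attack": [100, 100, 100],
})

# Type lists copied from the question text.
bad_against = ["Flying", "Poison", "Bug", "Steel", "Fire", "Grass", "Dragon"]
good_against = ["Ground", "Rock", "Water"]

# isin gives a boolean mask; np.select picks the multiplier for the first matching mask.
multiplier = np.select(
    [toy["Type 1"].isin(bad_against), toy["Type 1"].isin(good_against)],
    [2.0, 0.5],
    default=1.0,
)
toy["grass attack"] = toy["Attack"] * multiplier

# ~ negates the mask: True for types in neither list.
neither = ~toy["Type 1"].isin(bad_against + good_against)

# idxmin returns the row label of the smallest value.
weakest = toy.loc[toy["grass attack"].idxmin()]

print(toy)
```

On these three rows the new column holds 200, 50 and 100, matching the example in the text.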

Total : 20
