11/29/22, 11:12 AM CSCA08H Assignment 3

CSCA08H Assignment 3: Hypertension and Low Income in Toronto Neighbourhoods

Goals of this Assignment

In this assignment, you will practise working with files, building and using dictionaries,
designing functions using the Function Design Recipe, reading documentation, and
writing unit tests.

Background
A commonly-held belief is that an individual's health is largely influenced by the
choices they make. However, there is lots of evidence that health is affected by
systemic factors.

Health researchers often study the relationships between an individual's health
outcomes and factors related to their physical environment, social and economic
situations, and geographic location. Studies such as this one investigate how a
particular health outcome (living with hypertension) is tied to a systemic factor (the
income level of a country).

In this assignment, you will write code to assist with analysing data on the
relationship between hypertension (also known as high blood pressure) and income
levels in Toronto neighbourhoods. The data you will work with is real data; however,
we have simplified it somewhat to make this assignment clearer for you.

A note on math and stats

The data analysis that your code will do will include some statistical analysis that we
have not talked about in the course. You do NOT need to understand the underlying
statistics to complete this assignment. The code you write will do some simple
mathematical operations, like adding up some numbers, or finding ratios using
division. We will use Pearson correlation for the more advanced analysis and you will
use existing functions that we have imported for you.

You will need to take a look at the examples of these functions in order to figure out
what arguments you need to pass to them, and what types of data they return, but
you do not need to understand how they work in any detail.

Correlation is a single coefficient expressing the tendency of one set of data to grow
linearly, in the same or opposite direction, with another set of data. It is computed by
checking whether points that have been paired between the two sets are similarly
greater or less than their respective sets' averages.

For example, to check whether age is correlated with height among the students in
the class, we would need two sets of data: ages (birth dates, which we could express
as, say, number of weeks old for finer granularity) and heights.

The numbers in each set are ordered in the same way, so that each height value
corresponds to the age value for the same student. What is nice about the correlation
metric we are using is that it is normalised to be between -1 and 1, and these values
have a natural interpretation. A value of 1 means that the points make a
straight line: in our example, any given increase in age comes with a
consistent increase in height. Similarly, a value of -1 is the same relationship but with
the direction flipped, where older students would be shorter than younger ones. Finally, a
value of 0 means that there is no consistent increase or decrease in height for a
change in age. We will use this metric to investigate whether low income rates and
hypertension rates tend to increase or decrease together.
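
To make the three reference values concrete, here is a minimal sketch that computes the coefficient by hand from deviations around each set's average. The function name pearson and the class data are made up for this illustration; in the assignment itself you will call an existing library function instead of writing this.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation of the paired values in xs and ys."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviation of every point from its own set's average.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Paired deviations with the same sign push the result towards 1,
    # paired deviations with opposite signs push it towards -1.
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    spread_x = math.sqrt(sum(dx ** 2 for dx in dev_x))
    spread_y = math.sqrt(sum(dy ** 2 for dy in dev_y))
    return covariation / (spread_x * spread_y)


# Hypothetical class data: ages in weeks and heights in cm, same student order.
ages_in_weeks = [950, 980, 1010, 1040]
taller_with_age = [160.0, 162.0, 164.0, 166.0]
shorter_with_age = [166.0, 164.0, 162.0, 160.0]
no_trend = [160.0, 166.0, 166.0, 160.0]

increasing = pearson(ages_in_weeks, taller_with_age)    # close to 1
decreasing = pearson(ages_in_weeks, shorter_with_age)   # close to -1
flat = pearson(ages_in_weeks, no_trend)                 # close to 0
```

Note that both lists must list the students in the same order; shuffling only one of them would change the result.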

If you are a statistics person, keep in mind that the learning goals of the assignment
are about writing code using what we've learned in the course, not about doing a
proper statistical analysis.

Dataset descriptions
Each data file used in this assignment relates to one of the two variables of interest
(i.e., hypertension data or income data). The files are CSV (comma-separated values)
files, where the columns in each line are separated by commas. You can assume there are no
commas anywhere else in the files, other than to separate columns, and that any file
given is in the correct format. The two file types are described below.

Neighbourhood hypertension data files

The first row in a neighbourhood hypertension file contains header information, and the
remaining rows each contain data relating to hypertension prevalence in a particular
Toronto neighbourhood.

Here is a description of the different columns of the dataset. Notice the use of
constants and carefully study the starter file constants.py.

Column index       Description
HT_ID_COL          An ID that uniquely identifies each neighbourhood.
HT_NBH_NAME_COL    The name of the neighbourhood. Neighbourhood names are unique.
HT_20_44_COL       The number of people aged 20 to 44 with hypertension in the
                   neighbourhood.
NBH_20_44_COL      The total number of people aged 20 to 44 in the neighbourhood.
HT_45_64_COL       The number of people aged 45 to 64 with hypertension in the
                   neighbourhood.
NBH_45_64_COL      The total number of people aged 45 to 64 in the neighbourhood.
HT_65_UP_COL       The number of people aged 65 and older with hypertension in the
                   neighbourhood.
NBH_65_UP_COL      The total number of people aged 65 and older in the neighbourhood.
Neighbourhood hypertension dataset

Neighbourhood income data files

The first row in a neighbourhood income data file contains header information, and
the remaining rows each contain data about low income status.

Here is a description of the different columns of the dataset. Notice the use of
constants and carefully study the starter file constants.py.

Column index       Description
LI_ID_COL          An ID that uniquely identifies each neighbourhood.
LI_NBH_NAME_COL    The name of the neighbourhood. Neighbourhood names are unique.
POP_COL            The total population in the neighbourhood.
LI_POP_COL         The number of people in the neighbourhood with low income status.
Neighbourhood income dataset

Neighbourhood names and ids are the same between our hypertension data files and
our low income data files. However, the total population of a neighbourhood can be
different between the two data files, as they were collected at different times.
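
As a rough illustration of how one data row breaks into columns, here is a small sketch. The numeric column positions and the exact row layout are assumptions made for this example (the real values are in constants.py), and the row is built from the sample values for Elms-Old Rexdale rather than copied from a data file.

```python
# Assumed column positions; check the starter file constants.py for the real ones.
LI_ID_COL = 0
LI_NBH_NAME_COL = 1
POP_COL = 2
LI_POP_COL = 3

# An illustrative low income data row (layout assumed, not taken from a file).
line = '5,Elms-Old Rexdale,9460,2315\n'

# Commas only ever separate columns, so a plain split is enough.
columns = line.strip().split(',')
nbh_id = int(columns[LI_ID_COL])
name = columns[LI_NBH_NAME_COL]
population = int(columns[POP_COL])
low_income = int(columns[LI_POP_COL])
```

Every value read from a file starts out as a string, so the numeric columns need an explicit conversion to int.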

The CityData Type

The code you will write for this assignment will build and then use a dictionary that
contains hypertension and low income data about neighbourhoods in a city. This
section describes the format of that dictionary.

Key/value pairs in a CityData dictionary

Each key in a CityData dictionary is a string representing the name of a neighbourhood.
As is necessary for dictionary keys, all neighbourhood names will be unique.

The values in a CityData dictionary are dictionaries containing information about a
neighbourhood. These neighbourhood data dictionaries contain specific keys that label
a neighbourhood's data.

Format of the inner dictionaries

A dictionary that is a value in a dictionary of type CityData has the following key/value
pairs. Notice the use of constants and carefully study the starter file constants.py.

Key           (Type) Value
ID            (int) The id number of this neighbourhood.
TOTAL         (int) The total population of this neighbourhood, as given in the
              low income data file.
LOW_INCOME    (int) The number of people in this neighbourhood who are classified
              as low income.
HT            (list[int]) A list of the hypertension data of this neighbourhood.
              This list will have length exactly 6, and the values will be the
              numbers from columns HT_20_44_COL, NBH_20_44_COL, HT_45_64_COL,
              NBH_45_64_COL, HT_65_UP_COL, and NBH_65_UP_COL, stored at indices
              HT_20_44_IDX, NBH_20_44_IDX, HT_45_64_IDX, NBH_45_64_IDX,
              HT_65_UP_IDX, and NBH_65_UP_IDX of the list, respectively. See
              the section above on neighbourhood hypertension data files.
CityData dictionary

An example CityData dictionary

The following is an example of a CityData dictionary. We have also provided this
dictionary in the starter code file for you to use in your docstring examples and
other testing. Note that we have formatted the dictionary below for easier reading;
you will not see this formatting in your code.
{'West Humber-Clairville': {
     'id': 1,
     'hypertension': [703, 13291, 3741, 9663, 3959, 5176],
     'total': 33230,
     'low_income': 5950},
 'Mount Olive-Silverstone-Jamestown': {
     'id': 2,
     'hypertension': [789, 12906, 3578, 8815, 2927, 3902],
     'total': 32940,
     'low_income': 9690},
 'Thistletown-Beaumond Heights': {
     'id': 3,
     'hypertension': [220, 3631, 1047, 2829, 1349, 1767],
     'total': 10365,
     'low_income': 2005},
 'Rexdale-Kipling': {
     'id': 4,
     'hypertension': [201, 3669, 1134, 3229, 1393, 1854],
     'total': 10540,
     'low_income': 2140},
 'Elms-Old Rexdale': {
     'id': 5,
     'hypertension': [176, 3353, 1040, 2842, 948, 1322],
     'total': 9460,
     'low_income': 2315}}

The sample CityData dictionary above represents hypertension and low income data for
five neighbourhoods: West Humber-Clairville, Mount Olive-Silverstone-Jamestown,
Thistletown-Beaumond Heights, Rexdale-Kipling, and Elms-Old Rexdale.

Let's take a closer look at the data for Elms-Old Rexdale. This neighbourhood is
represented by the key/value pair where the key is 'Elms-Old Rexdale'. The id of this
neighbourhood is 5. The hypertension data for this neighbourhood is as follows: 3353
people are between the ages of 20 and 44, 176 of whom have hypertension. There
are 2842 people between the ages of 45 and 64, 1040 of whom have hypertension,
and there are 1322 people aged 65 and up, 948 of whom have hypertension. The low
income data for this neighbourhood is that 2315 people are classified as low income,
from a total population of 9460 people.

Note that the totals do not match between the low income and the hypertension data
— this is because the low income data was collected before the hypertension data,
and the size of the neighbourhoods changed. For the purposes of this assignment, we
will assume the collection of these two datasets is close enough in time to compare
them to each other. You do not need to do anything about these differing totals, other
than to make sure you are using the correct total when computing rates, as described
later.
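
The walkthrough of Elms-Old Rexdale can be checked with a few lines of code. This is only a sketch: it copies one entry of the sample dictionary and uses the literal keys shown in the example, whereas your own code should use the constants from constants.py.

```python
# One entry copied from the sample CityData dictionary shown above.
sample = {
    'Elms-Old Rexdale': {
        'id': 5,
        'hypertension': [176, 3353, 1040, 2842, 948, 1322],
        'total': 9460,
        'low_income': 2315,
    }
}

nbh = sample['Elms-Old Rexdale']

# The list alternates: people with hypertension, then everyone in that age group.
ht_20_44, nbh_20_44, ht_45_64, nbh_45_64, ht_65_up, nbh_65_up = nbh['hypertension']

people_with_ht = ht_20_44 + ht_45_64 + ht_65_up        # 2164
adults_in_ht_data = nbh_20_44 + nbh_45_64 + nbh_65_up  # 7517

# The two data sources disagree on the neighbourhood's size, as noted above.
totals_differ = adults_in_ht_data != nbh['total']
```

The hypertension data counts 7517 adults while the low income data reports a total of 9460, which is why each rate must use the denominator from its own data file.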

Age standardisation
This section describes the process of age standardisation that we will use in this
assignment to perform a more accurate analysis. Note that we have given you a
function that computes the age standardised rate from the raw rate
(described in Task 3). This section is for your information only; we have
already implemented this for you.

Our dataset will let us calculate the rate of hypertension in each Toronto
neighbourhood. One complicating factor is that different neighbourhoods have
different age demographics. For example, the Henry Farm neighbourhood has a
significantly lower proportion of 65+ residents than Hillcrest Village. And because
people aged 65+ have a higher overall rate of hypertension, this demographic
difference alone would cause us to expect to see a difference in the overall
hypertension rate between these neighbourhoods.

Because we care about the impact of low income status on hypertension rates, we
want to remove the impact of different age demographics between the
neighbourhoods. To do so, we will use a process called age standardisation to
calculate an adjusted hypertension rate that ignores differences in ages. This process
involves the following steps for each neighbourhood:

1. First, we'll calculate the hypertension rate within each of the following age
groups: 20-44, 45-64, and 65+. We'll report these rates as percentages, which
you can think of as being the number of cases of hypertension per 100 people
aged 20-44 / 45-64 / 65 and up.
2. Then, we'll pick one standard population with certain numbers of people in these
age groups. For the purpose of this assignment, we'll use the total Canadian
population from the 1991 census:
   Age Group      Population
   20-44          11,199,830
   45-64           5,365,865
   65+             3,169,970
   Total (20+)    19,735,665
   Population by age group data

3. Then, we'll use the neighbourhood rates to calculate the hypothetical number of
people in the standard population who would have hypertension. For example, if
the rates for neighbourhood X were 20% of 20-44, 30% of 45-64, and 66% of
65+, the total number of people with hypertension in the standard population
would be 2,239,966 + 1,609,760 + 2,092,180 = 5,941,906.
4. Finally, divide this number of people with hypertension by the total size of the
standard population, yielding a final percentage 5,941,906 / 19,735,665 x 100 or
approximately 30%. This percentage is the age standardised rate for the
neighbourhood.
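
The four steps above can be sketched in a few lines, using the example rates for neighbourhood X. This is an illustration of the arithmetic only; the variable names are made up here, and the provided function may organise or round the computation differently.

```python
# Standard population: 1991 Canadian census counts for 20-44, 45-64, and 65+.
STANDARD_POPULATION = [11_199_830, 5_365_865, 3_169_970]

# Step 1: example rates for neighbourhood X, as percentages per age group.
rates_percent = [20.0, 30.0, 66.0]

# Step 3: hypothetical number of people with hypertension in the standard
# population if it had neighbourhood X's rates in every age group.
expected_cases = sum(
    rate / 100 * population
    for rate, population in zip(rates_percent, STANDARD_POPULATION)
)

# Step 4: divide by the size of the standard population to get a percentage.
age_standardised_rate = expected_cases / sum(STANDARD_POPULATION) * 100
```

The result is a little over 30, matching the "approximately 30%" in step 4. The sketch does not round the per-group counts, so its intermediate total differs from 5,941,906 by less than one person.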

If you are interested, you can read more about age standardised rates here.

Required Functions
In the starter code file a3.py, follow the Function Design Recipe to complete the
functions described below.

You will need helper functions (i.e., functions you define yourself to be called in other
functions) for some of the required functions, but likely not for all of them. Helper
functions also require complete docstrings with doctests. We strongly recommend you
also follow any suggestions about helper functions in the table below; we give you
these hints to make your programming task easier.

Some indicators that you should consider writing a new helper function, or using
something you've already written as a helper are:

- Rewriting code to solve a task you have already solved in another function
- Getting a warning from the checker that your function is too long
- Getting a warning from the checker that your function has too many nested
  blocks or too many branches
- Realising that your function can be broken down into smaller sub-problems (with
  a helper function for each)

For each of the functions below, other than the file reading functions in Task 1, write
at least two examples in the docstring. You can use the provided SAMPLE_DATA
dictionary, and you should also create another small CityData dictionary for examples
and testing. If your helper function takes an open file as an argument, you do NOT
need to write any examples in that function's docstring. Otherwise, for any helper
functions you add, write at least two examples in the docstring.
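
As a reminder of what two docstring examples look like, here is a small sketch of a helper written with the Function Design Recipe. The helper get_rate is hypothetical and not part of the starter code; it only shows the shape of the docstring.

```python
import doctest


def get_rate(count: int, total: int) -> float:
    """Return count as a fraction of total.

    Precondition: total > 0

    >>> get_rate(1, 4)
    0.25
    >>> get_rate(3, 3)
    1.0
    """
    return count / total


if __name__ == '__main__':
    # Runs every docstring example in this file and reports any mismatches.
    doctest.testmod()
```

Choosing examples whose results are exact floats (such as 0.25) keeps the expected output easy to write by hand.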

Your functions should not mutate their arguments, unless the description says
that is what they do.

Assumptions

Assume the following about the data:

- All neighbourhood ids and names are unique, and will appear the same in all data
  files. That is, no neighbourhood will have a different id between files, or a
  different name.
- In all tasks except Task 1, the dictionary argument will have both hypertension
  and low income data for every neighbourhood. That is, it will be a valid CityData
  dictionary.
- All float values should be left as is; do not round any of them.

Using Constants

The starter code contains constants in the file constants.py that you should use in your
solution for the list indices and key identifiers for the CityData dictionary as well as the
column numbers for the input files. You may add other constants if you wish, but DO
NOT place them in the file constants.py: instead put them in the a3.py file.
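
Here is a brief sketch of what using the constants looks like in practice. The values assigned below are assumptions inferred from the sample dictionary and the order of the hypertension list; in your solution you would rely on the definitions in constants.py rather than redefining them.

```python
# Assumed values; check the starter file constants.py for the real ones.
HT = 'hypertension'
HT_65_UP_IDX = 4
NBH_65_UP_IDX = 5

# Inner dictionary for Elms-Old Rexdale, copied from the sample data.
nbh_data = {'id': 5, 'hypertension': [176, 3353, 1040, 2842, 948, 1322],
            'total': 9460, 'low_income': 2315}

# The constant names document which number is being read, unlike a bare 4 or 5.
seniors_with_ht = nbh_data[HT][HT_65_UP_IDX]
seniors_total = nbh_data[HT][NBH_65_UP_IDX]
```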

Task 1: Building the data dictionary

In this task, you will write functions that read in files and build the dictionary of
neighbourhood data. You will write two functions — one that adds hypertension data
to a dictionary, and one that adds low income data. You will almost certainly also need
to define one or more helper functions to help you solve this task.

These functions will be used to build a CityData dictionary; however, the dictionary that
is passed to the functions may not yet contain all of the data.

To illustrate this, we have provided two small data files. After passing the same
dictionary to both functions with each of those small files, the dictionary should be a
CityData dictionary that contains the same information as the provided SAMPLE_DATA
dictionary. Using the small hypertension file and an empty dictionary as arguments to
get_hypertension_data, the result should be that the dictionary now contains the
hypertension data as in SAMPLE_DATA, but not the low income data.
{'West Humber-Clairville':
     {'id': 1, 'hypertension': [703, 13291, 3741, 9663, 3959, 5176]},
 'Mount Olive-Silverstone-Jamestown':
     {'id': 2, 'hypertension': [789, 12906, 3578, 8815, 2927, 3902]},
 'Thistletown-Beaumond Heights':
     {'id': 3, 'hypertension': [220, 3631, 1047, 2829, 1349, 1767]},
 'Rexdale-Kipling':
     {'id': 4, 'hypertension': [201, 3669, 1134, 3229, 1393, 1854]},
 'Elms-Old Rexdale':
     {'id': 5, 'hypertension': [176, 3353, 1040, 2842, 948, 1322]}}

Similarly, using the small low income file and an empty dictionary as arguments to
get_low_income_data, the result should be that the dictionary now contains the low income
data as in SAMPLE_DATA, but not the hypertension data.
{'West Humber-Clairville':
     {'id': 1, 'total': 33230, 'low_income': 5950},
 'Mount Olive-Silverstone-Jamestown':
     {'id': 2, 'total': 32940, 'low_income': 9690},
 'Thistletown-Beaumond Heights':
     {'id': 3, 'total': 10365, 'low_income': 2005},
 'Rexdale-Kipling':
     {'id': 4, 'total': 10540, 'low_income': 2140},
 'Elms-Old Rexdale':
     {'id': 5, 'total': 9460, 'low_income': 2315}}

A dictionary is a complete CityData dictionary only once it has been passed to both
functions. See the sample usage at the end of the starter code file for an example of
how both functions are used to build a CityData dictionary.

Note: While this is the first task, it is not necessarily the easiest. If you are stuck
while working on this task, we suggest moving on to other tasks and coming back to
this later.

Recall that TextIO as the parameter type means the file is already open.
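
To illustrate what "already open" means for a TextIO parameter, here is a minimal sketch. The function count_data_rows and the header text are invented for this example; io.StringIO stands in for a file that the caller has opened.

```python
import io
from typing import TextIO


def count_data_rows(data_file: TextIO) -> int:
    """Return the number of non-blank rows after the header in data_file.

    data_file is already open for reading, so this function neither opens
    nor closes it.
    """
    data_file.readline()  # skip the header row
    count = 0
    for line in data_file:
        if line.strip():
            count += 1
    return count


# io.StringIO behaves like an open text file, which is handy for quick checks.
fake_file = io.StringIO('id,name,population,low_income\n'
                        '4,Rexdale-Kipling,10540,2140\n'
                        '5,Elms-Old Rexdale,9460,2315\n')
rows = count_data_rows(fake_file)
```

With a real file, the caller would write something like `with open(filename) as data_file:` and pass data_file to the function inside that block.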

Function name: (Parameter types) -> Return type
Full Description (paraphrase to get a proper docstring description)

get_hypertension_data: (dict, TextIO) -> None

    The first parameter is a dictionary representing hypertension and/or low
    income data for a neighbourhood and the second parameter is a hypertension
    data file that is open for reading. This function should modify the
    dictionary so that it contains the hypertension data in the file.

    If a neighbourhood with data in the file is already in the dictionary then
    its hypertension data should be updated. Otherwise it should be added to
    the dictionary with its hypertension data.

    After this function is called, the dictionary should contain key/value
    pairs whose keys are the names of every neighbourhood in the hypertension
    data file, and whose values are dictionaries which contain at least the
    keys ID and HT for those neighbourhoods. After both functions
    get_hypertension_data and get_low_income_data are called with a dict as the
    first argument, this argument will be a complete CityData type.

get_low_income_data: (dict, TextIO) -> None

    The first parameter is a dictionary representing hypertension and/or low
    income data for a neighbourhood and the second parameter is a low income
    data file that is open for reading. This function should modify the
    dictionary so that it contains the low income data in the file.

    If a neighbourhood with data in the file is already in the dictionary then
    its low income data should be updated. Otherwise it should be added to the
    dictionary with its low income data.

    After this function is called, the dictionary should contain key/value
    pairs whose keys are the names of every neighbourhood in the low income
    data file, and whose values are dictionaries which contain at least the
    keys ID, TOTAL, and LOW_INCOME for those neighbourhoods. After both
    functions get_hypertension_data and get_low_income_data are called with a
    dict as the first argument, this argument will be a complete CityData type.

Functions: Task 1

Task 2: Neighbourhood-level Analysis

Function name: (Parameter types) -> Return type
Full Description (paraphrase to get a proper docstring description)

get_bigger_neighbourhood: (CityData, str, str) -> str

    The first parameter is a CityData dictionary, and the second and third
    parameters are strings representing the names of neighbourhoods. The
    function returns the name of the neighbourhood that has a higher
    population, according to the low income data.

    Assume that the two neighbourhood names are different. If a name is not in
    the dictionary, assume it has a population of 0. If the two neighbourhoods
    are the same size, return the first name (i.e., the leftmost one in the
    parameters list, not alphabetically).

get_high_hypertension_rate: (CityData, float) -> list[tuple[str, float]]

    The first parameter is a CityData dictionary, and the second parameter is a
    number between 0.0 and 1.0 (inclusive) that represents a threshold. This
    function should return a list of tuples representing all neighbourhoods
    with a hypertension rate greater than or equal to the threshold. In each
    tuple, the first item is the neighbourhood name and the second item is the
    hypertension rate in that neighbourhood.

    Compute the overall hypertension rate for a neighbourhood by dividing the
    total number of people with hypertension by the total number of adults in
    the neighbourhood. You may assume that no neighbourhood has 0 population.

    If this function was called with the provided SAMPLE_DATA dictionary and a
    threshold of 0.3 as arguments, then the returned value would be
    [('Thistletown-Beaumond Heights', 0.31797739151574084),
    ('Rexdale-Kipling', 0.3117001828153565)].
    The order of the tuples in the list does not matter.

get_ht_to_low_income_ratios: (CityData) -> dict[str, float]

    The parameter is a CityData dictionary. This function should return a
    dictionary where the keys are the same as in the parameter, and the values
    are the ratio of the hypertension rate to the low income rate for that
    neighbourhood.

    For the denominators for each rate, use the total number of people as given
    in the corresponding data file. That is, for calculating the low income
    rate, use the total population in the neighbourhood from the low-income
    data file; and for the hypertension rate, use the sum of the total people
    in all three age groups in the hypertension data. You may assume that no
    neighbourhood has 0 population.

    For example, if this function was called with the provided SAMPLE_DATA
    dictionary as an argument, then the returned dictionary should include the
    key/value pairs 'West Humber-Clairville': 1.6683148168616895 and
    'Mount Olive-Silverstone-Jamestown': 0.9676885451091314, as well as the
    pairs for the other three neighbourhoods.

    You will find that writing a helper function would be useful here.

calculate_ht_rates_by_age_group: (CityData, str) -> tuple[float, float, float]

    The first parameter is a CityData dictionary, and the second parameter
    represents a neighbourhood name that is a key in the dictionary. The
    function returns a tuple of three values, representing the hypertension
    rate for each of the three age groups in the neighbourhood as a percentage.
    (Note that this is different from the previous two functions, where the
    rate was calculated using the total numbers, and was not expressed as a
    percentage.)

    For example, consider the neighbourhood with the name 'Elms-Old Rexdale' in
    the provided SAMPLE_DATA dictionary. The rate of hypertension for the 20-44
    age group in the neighbourhood is computed by dividing the number of people
    aged 20-44 with hypertension by the total number of people aged 20-44. For
    this neighbourhood, that is 176 / 3353. To get this rate as a percentage,
    we then multiply by 100, for a rate of 5.24903071875932. The rate is
    calculated in the same way for the 45-64 and 65+ age groups. Thus, calling
    this function with the provided SAMPLE_DATA dictionary and the string
    'Elms-Old Rexdale' should return
    (5.24903071875932, 36.593947923997185, 71.70953101361573).

    You may assume that no neighbourhood has a 0 population. Notice that this
    function is used as a helper in the get_age_standardized_ht_rate function
    that we have provided for you.

Functions: Task 2
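
The arithmetic behind the example values in the Task 2 descriptions can be reproduced as follows. This is only a sketch of the calculations for two sample neighbourhoods, not an implementation of the required functions; it uses slicing and literal indices as shorthand, whereas your code should use the index constants.

```python
import math

# Values copied from the sample CityData dictionary in this handout.
ht_data = [703, 13291, 3741, 9663, 3959, 5176]  # West Humber-Clairville
total, low_income = 33230, 5950

# Even positions hold hypertension counts, odd positions hold group totals.
people_with_ht = sum(ht_data[0::2])   # 703 + 3741 + 3959
adults = sum(ht_data[1::2])           # 13291 + 9663 + 5176

ht_rate = people_with_ht / adults       # denominator from the hypertension data
low_income_rate = low_income / total    # denominator from the low income data
ratio = ht_rate / low_income_rate

# Per-age-group rates as percentages, for Elms-Old Rexdale.
elms = [176, 3353, 1040, 2842, 948, 1322]
rates_percent = tuple(elms[i] / elms[i + 1] * 100 for i in range(0, 6, 2))

# Compare floats with a tolerance rather than ==.
ratio_matches = math.isclose(ratio, 1.6683148168616895)
```

Note how the two rates inside the ratio use different denominators, and how the per-age-group rates are multiplied by 100 while the overall rate is not.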

Task 3: Finding the Correlation

Function name: (Parameter types) -> Return type
Full Description (paraphrase to get a proper docstring description)

get_correlation: (CityData) -> float

    The parameter for this function is a CityData dictionary. This function
    returns the correlation between age standardised hypertension rates and low
    income rates across all neighbourhoods.

    To complete this function, you will need to use the correlation function in
    the module statistics. Refer to the documentation for the function to
    determine what arguments to pass to the correlation function, and how to
    use its returned value. You can find the documentation online here or by
    using help(statistics.correlation). Remember that to call help on a
    function from another module, you first need to import the module.

    You will need to use the provided function get_age_standardized_ht_rate as
    a helper to get the age-standardised rate for each neighbourhood.

Functions: Task 3

Task 4: Order by Ratio

Function name: (Parameter types) -> Return type
Full Description (paraphrase to get a proper docstring description)

order_by_ht_rate: (CityData) -> list[str]

    The parameter is a CityData dictionary. This function will return a list of
    the names of the neighbourhoods, ordered from lowest to highest
    age-standardised hypertension rate. We use the age-standardised rate
    because we are comparing across neighbourhoods.

    Assume every neighbourhood has a unique hypertension rate; i.e., that there
    are no ties.

    For example, if this function is called with the CityData dictionary
    provided in the starter code, it will return
    ['Elms-Old Rexdale', 'Rexdale-Kipling', 'Thistletown-Beaumond Heights',
    'West Humber-Clairville', 'Mount Olive-Silverstone-Jamestown']

    There are multiple ways to solve this problem. You may choose to solve this
    problem by writing your own sorting code, but you do not have to do this.
    You can also use list.sort as part of your solution, if you choose.

Functions: Task 4
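
One way list.sort can help with this kind of ordering is sketched below. The neighbourhood names and rates are invented for the example, and this is only one of several reasonable approaches.

```python
# Made-up age-standardised rates for hypothetical neighbourhoods.
rates = {'North Park': 27.1, 'Lakeside': 22.4, 'Old Mill': 25.8}

# Tuples compare item by item, so (rate, name) pairs sort by rate first.
# The rates are assumed to be unique, so the names never decide the order.
pairs = [(rate, name) for name, rate in rates.items()]
pairs.sort()

ordered_names = [name for rate, name in pairs]
```

Here ordered_names lists the neighbourhoods from the lowest rate to the highest.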

Task 5: Required Testing (unittest)

Write and submit a unittest file for the get_bigger_neighbourhood function. We have
provided starter code in the test_a3.py file. We have included one test that you can use
as a template to write your other test methods. For each test method, include a brief
docstring description specifying what is being tested. Do not write examples in the
docstrings. Your set of tests should all pass on correct code, and your tests should be
thorough enough that at least one of them will fail on a buggy version of the function.
There is no required number of tests; we will mark your tests by running them on the
correct code as well as several buggy versions.
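
The general shape of such a test file might look like the sketch below. The stand-in function and the SMALL_DATA dictionary are written only for this illustration so that the example runs on its own; in your test_a3.py you would test the function from a3.py and follow the provided template.

```python
import unittest


def get_bigger_neighbourhood(city_data: dict, nbh1: str, nbh2: str) -> str:
    """Stand-in implementation so the tests below have something to run.

    This is a placeholder written for this sketch, not the starter code.
    """
    pop1 = city_data[nbh1]['total'] if nbh1 in city_data else 0
    pop2 = city_data[nbh2]['total'] if nbh2 in city_data else 0
    return nbh1 if pop1 >= pop2 else nbh2


# Two entries copied from the sample data in this handout.
SMALL_DATA = {
    'Rexdale-Kipling': {'id': 4, 'total': 10540, 'low_income': 2140,
                        'hypertension': [201, 3669, 1134, 3229, 1393, 1854]},
    'Elms-Old Rexdale': {'id': 5, 'total': 9460, 'low_income': 2315,
                         'hypertension': [176, 3353, 1040, 2842, 948, 1322]},
}


class TestGetBiggerNeighbourhood(unittest.TestCase):
    """Test cases for get_bigger_neighbourhood."""

    def test_first_bigger(self) -> None:
        """Test that the first name is returned when it is bigger."""
        actual = get_bigger_neighbourhood(
            SMALL_DATA, 'Rexdale-Kipling', 'Elms-Old Rexdale')
        self.assertEqual(actual, 'Rexdale-Kipling')

    def test_second_bigger(self) -> None:
        """Test that the second name is returned when it is bigger."""
        actual = get_bigger_neighbourhood(
            SMALL_DATA, 'Elms-Old Rexdale', 'Rexdale-Kipling')
        self.assertEqual(actual, 'Rexdale-Kipling')

    def test_name_missing(self) -> None:
        """Test that a name not in the dictionary counts as population 0."""
        actual = get_bigger_neighbourhood(
            SMALL_DATA, 'Nowhere', 'Elms-Old Rexdale')
        self.assertEqual(actual, 'Elms-Old Rexdale')
```

Each test method checks one distinct situation from the function description, which makes it more likely that a buggy version fails at least one of them. The file can be run with `python -m unittest`.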

Files to Download
Download a3.zip which contains starter code (a3.py and test_a3.py), the checker
(a3_checker.py together with the helper file checker.py and folder pyta), and two sizes of
each type of data file.

Marking
These are the aspects of your work that will be marked for Assignment 3:

- Correctness (70%): Your functions should perform as specified. Correctness,
  as measured by our tests, will count for the largest single portion of your marks.
  Once your assignment is submitted, we will run additional tests, not provided in
  the checker. Passing the checker does not mean that your code will earn full
  marks for correctness.
- Testing (15%): Your test suite will be checked by running it on
  incorrect/broken implementations. Your tests should all pass on a correct version
  of the function, and at least one should fail on each of our broken
  implementations.
- Coding style (15%):
    - Make sure that you follow Python style guidelines that we have introduced
      and the Python coding conventions that we have been using throughout the
      semester. Although we don't provide an exhaustive list of style rules, the
      checker tests for style are complete, so if your code passes the checker,
      then it will earn full marks for coding style with one exception: docstrings
      may be evaluated separately. For each occurrence of a PyTA error, a deduction
      of one mark (out of 20) will be applied. For example, if a C0301 (line-too-long)
      error occurs 3 times, then 3 marks will be deducted.
    - If you encounter PyTA error R0915 (too-many-statements), that indicates
      that your function is too long (more than 20 statements long). In that case,
      introduce helper functions to do some of the work, even if the helpers will
      only be called once. Your program should be broken down into functions,
      both to avoid repetitive code and to make the program easier to read.
    - All functions, including helper functions, should have complete docstrings
      including preconditions when you think they are necessary.
    - Also, your variable names and names of your helper functions should be
      meaningful. Your code should be as simple and clear as possible.

What to Hand In
The very last thing you do before submitting should be to run the checker
program one last time.

Otherwise, you could make a small error in your final changes before submitting that
causes your code to receive zero for correctness.

Submit a3.py and test_a3.py on MarkUs by following the instructions on the course
website. Remember that spelling of filenames, including case, counts: your file must
be named exactly as above.

https://www.utsc.utoronto.ca/~atafliovich/csca08/assignments/a3/index.html 12/12

13 pages
Chapter 1: Some Basic Statistical Concepts: 1. The Language of Statistics
No ratings yet
Chapter 1: Some Basic Statistical Concepts: 1. The Language of Statistics
28 pages
Chap 2 Notes
No ratings yet
Chap 2 Notes
5 pages
Linear Regression
No ratings yet
Linear Regression
108 pages
Module 1 - Basic Concepts of Statistics
No ratings yet
Module 1 - Basic Concepts of Statistics
6 pages
Edger: Differential Expression Analysis of Digital Gene Expression Data
No ratings yet
Edger: Differential Expression Analysis of Digital Gene Expression Data
69 pages
Data Sceinces
No ratings yet
Data Sceinces
15 pages
Artificial Intelligence Research: The Utility and
No ratings yet
Artificial Intelligence Research: The Utility and
9 pages
ModLec273 2
No ratings yet
ModLec273 2
84 pages
Measures of Central Tendency
100% (1)
Measures of Central Tendency
48 pages
Machine Learning Project Report
No ratings yet
Machine Learning Project Report
65 pages
Analyze Descriptive Statistics Frequencies: SPSS PC Version 10: Frequency Distributions and Crosstabs
No ratings yet
Analyze Descriptive Statistics Frequencies: SPSS PC Version 10: Frequency Distributions and Crosstabs
3 pages
Assignment2 Stats
No ratings yet
Assignment2 Stats
5 pages
Geographic Data Acquisition
No ratings yet
Geographic Data Acquisition
10 pages
MI U3 Notes
No ratings yet
MI U3 Notes
11 pages
Computer Class 1_multiple regression
No ratings yet
Computer Class 1_multiple regression
24 pages
C++
No ratings yet
C++
44 pages
Making predictions
No ratings yet
Making predictions
13 pages
RSBY - User Guide 19.5.08
No ratings yet
RSBY - User Guide 19.5.08
27 pages
Data - Analysis On Rape Victims
No ratings yet
Data - Analysis On Rape Victims
11 pages
A Language For Fuzzy Statistical Database
No ratings yet
A Language For Fuzzy Statistical Database
16 pages
Descriptive Staticstics: College of Information and Computing Sciences
No ratings yet
Descriptive Staticstics: College of Information and Computing Sciences
28 pages
The Basic of Data Cleaning 1711767651
No ratings yet
The Basic of Data Cleaning 1711767651
64 pages
Prediction
100% (1)
Prediction
10 pages
Chapter - 16 Demographic Survey
No ratings yet
Chapter - 16 Demographic Survey
20 pages
What Is Statistics
No ratings yet
What Is Statistics
6 pages
Final Shokhrukhsora Toshmukhamedova
No ratings yet
Final Shokhrukhsora Toshmukhamedova
10 pages
Descriptive Statistics: Six Sigma Thinking, #3
From Everand
Descriptive Statistics: Six Sigma Thinking, #3
Sumeet Savant
No ratings yet
ITCY481-LAB1
No ratings yet
ITCY481-LAB1
12 pages
Introduction To Embedded System Design
No ratings yet
Introduction To Embedded System Design
22 pages
J-STD-0 1 6-1 99
No ratings yet
J-STD-0 1 6-1 99
6 pages
E-Yantra Robotics Competition E-Yantra+ Caretaker Robot Theme
No ratings yet
E-Yantra Robotics Competition E-Yantra+ Caretaker Robot Theme
7 pages
CSE-4081: Cloud Computing: Course Context and Overview
No ratings yet
CSE-4081: Cloud Computing: Course Context and Overview
2 pages
Nfa With e To Without e
No ratings yet
Nfa With e To Without e
4 pages
User Acceptance Test Plan
No ratings yet
User Acceptance Test Plan
4 pages
NLSS Showcase On 28th Feb 2012: JR Patrick
No ratings yet
NLSS Showcase On 28th Feb 2012: JR Patrick
7 pages
Utsav_New_Resume
No ratings yet
Utsav_New_Resume
2 pages
Angular JS Cheat Sheet
100% (2)
Angular JS Cheat Sheet
4 pages
Mern Stack
No ratings yet
Mern Stack
2 pages
NRSC Risat 1 Data Format 1
No ratings yet
NRSC Risat 1 Data Format 1
64 pages
HP DesignJet T530 Printer Series Datasheet
No ratings yet
HP DesignJet T530 Printer Series Datasheet
2 pages
EAR 4 5k Manual ITSv7r1
No ratings yet
EAR 4 5k Manual ITSv7r1
145 pages
Newfaq
No ratings yet
Newfaq
235 pages
Autodyn 14.0 Ws01 Filled Can Crush
No ratings yet
Autodyn 14.0 Ws01 Filled Can Crush
21 pages
Day 2 PeopleCode
100% (2)
Day 2 PeopleCode
43 pages
Google Apps Script Web Apps_ Comprehensive Guide
No ratings yet
Google Apps Script Web Apps_ Comprehensive Guide
9 pages
Acumatica Role-Based ERP Software For Distribution and Retail-Commerce
No ratings yet
Acumatica Role-Based ERP Software For Distribution and Retail-Commerce
6 pages
Virtualization For Mysql On Vmware®: Best Practices and Performance Guide
No ratings yet
Virtualization For Mysql On Vmware®: Best Practices and Performance Guide
10 pages
Immediate download Autodesk Fusion 360 Surface Design and Sculpting with T Spline Surfaces 6th Edition Sandeep Dogra ebooks 2024
100% (2)
Immediate download Autodesk Fusion 360 Surface Design and Sculpting with T Spline Surfaces 6th Edition Sandeep Dogra ebooks 2024
55 pages
Eset Era 63 Era Admin Enu
No ratings yet
Eset Era 63 Era Admin Enu
265 pages
TMS320C50 Architecture
100% (5)
TMS320C50 Architecture
2 pages
Firewall Installation, Configuration, and Management: Essentials I
No ratings yet
Firewall Installation, Configuration, and Management: Essentials I
1 page
M A C R F D TV (DTV) A O S: Odulation ND Oding Equirements OR Igital Pplications VER Atellite
No ratings yet
M A C R F D TV (DTV) A O S: Odulation ND Oding Equirements OR Igital Pplications VER Atellite
33 pages
Installation and Operation Manual Soundtheory Gullfoss 1.9: Revision 1.1 2021-04-13
No ratings yet
Installation and Operation Manual Soundtheory Gullfoss 1.9: Revision 1.1 2021-04-13
29 pages
BS Software Engineering (Evening) Final Year Projects Proposal Final List.
No ratings yet
BS Software Engineering (Evening) Final Year Projects Proposal Final List.
1 page

CSCA08H Assignment 3

Uploaded by

CSCA08H Assignment 3

Uploaded by

11/29/22, 11:12 AM CSCA08H Assignment 3

CSCA08H Assignment 3: Hypertension and Low

Goals of this Assignment

Health researchers often study the relationships between an individual's health

A note on math and stats

Neighbourhood hypertension data files

Neighbourhood income data files

The CityData Type

Key/value pairs in a CityData dictionary

Each key in a CityData dictionary is a string representing the name of a neighbourhood.

The values in a CityData dictionary are dictionaries containing information about a

Format of the inner dictionaries

Key (Type) Value

An example CityData dictionary

The following is an example of a CityData dictionary. We have also provided this

Assume the following about the data:

Task 1: Building the data dictionary

If a neighbourhood with data in the file is already in the

After this function is called, the dictionary should contain

and get_low_income_data are called with a dict as the first

The first parameter is a dictionary representing hypertension

If a neighbourhood with data in the file is already in the

Task 2: Neighbourhood-level Analysis

get_high_hypertension_rate: The first parameter is a CityData dictionary, and the

Compute the overall hypertension rate for a

neighbourhood. You may assume that no neighbourhood

If this function was called with the provided SAMPLE_DATA

The parameter is a CityData dictionary. This function

For the denominators for each rate, use the total

For example, if this function was called with the

You will find that writing a helper function would be

calculate_ht_rates_by_age_group: The first parameter is a CityData dictionary, and the

For example, consider the neighbourhood with the name

Rexdale' should return (5.24903071875932, 36.593947923997185,

You may assume that no neighbourhood has a 0

Task 3: Finding the Correlation

To complete this function, you will need to use the correlation

You will need to use the provided function

Task 4: Order by Ratio

Assume every neighbourhood has a unique hypertension rate;

For example, if this function is called with the CityData dictionary

Task 5: Required Testing (unittest)

Correctness (70%): Your functions should perform as specified. Correctness,

All functions, including helper functions, should have complete docstrings

You might also like