Applied Statistics & Data Visualisation Assignment Brief 2024-25
Level 7
Weighting within module: This assessment is worth 100% of the overall module mark.
Module Leader/Assessment set by:
Kaveh Kiani, email: [email protected]
Nathan Topping, email: [email protected]
Submission deadline date and time: Friday 6th December 2024, no later than 16:00 (4pm).
1. Submit only a single PDF report that includes all tasks. Name the file
as "your name.pdf" (for example, "John Smith.pdf").
2. The zip file should contain clearly labelled material. Fully working
versions of your R code and Power BI dashboard should be included, with a
clearly written description of each application and its use in a
“Read Me.txt” file. Your dashboard should be shared as a .pbix file.
3. The zip file must be valid and openable.
Dataset 1: Global Population Estimates & Projections
Scenario: You work for an NGO which is exploring global population dynamics. You have
been provided with a dataset of some key demographic indicators, with estimates provided
for the years 1960-2022 and projections for the years 2023-2050. They would like you to
explore the data and develop an interactive single-screen dashboard to allow policymakers
and the public to better understand the data relating to population trends over this
timeframe. The dashboard should present aggregated country-group level data, based on
regional and/or income groupings (a separate Country Groupings file is provided to help
with this). You have been given the freedom to focus your dashboard on a specific aspect
of the data if you wish (e.g., trends in urban vs rural populations).

Dataset 2: Global Economic Outlook
Scenario: You work for a thinktank exploring economic development globally. You have been
provided with data taken from the International Monetary Fund’s (IMF) Global Economic
Outlook which contains a range of indicators across the timeframe 2001-2020. They would
like you to explore the data and develop an interactive single-screen dashboard to allow
policymakers and the public to build an understanding of country and/or country-group
level economic performance. There are a total of 44 indicators, and you have been asked
to select a sub-set of these indicators to present, based on your own data exploration.
You can choose either to present a dashboard which is focussed on showing economic
performance trends over time or a dashboard designed to show an annual snapshot of
economic performance (with the flexibility for the user to select the year).

Dataset 3: Global Trends in HIV
Scenario: You work for a health charity which focuses on issues relating to global health,
and specifically HIV. You have been provided with a dataset of some indicators from the
World Health Organisation’s (WHO) Global Health Observatory (GHO) relating to the
prevalence and incidence of HIV across the timeframe 2000-2022. The indicators are
disaggregated based on sex and age groupings (either 4-level or 2-level groupings). You
should note that some values are missing because not all indicators are disaggregated
across the same dimensions (e.g., “People living with HIV who are on antiretroviral
therapy (%)” is disaggregated by 2-level age groupings only, so is blank for rows relating
to sex or 4-level age groupings.) They would like you to explore the data and develop an
interactive single-screen dashboard to allow policymakers and the public to build an
understanding of country and country-group level trends in HIV over 2000-2022. A separate
Country Groupings file is provided to help with grouping countries by income and/or region.

For this task, you should imagine you are a Data Analyst who has been asked to create a
single-screen interactive dashboard which is suitable for the audience and purpose
outlined in your chosen scenario. The scenarios above are quite broad, and as the Data
Analyst you have been tasked with exploring the data, and if necessary, defining more
specific objectives relating to your scenario. You should create this dashboard in Power BI
and you will need to submit your .pbix file.
Accompanying your dashboard, you should submit a 2,000-word written report which:
Higher scoring solutions will be those that have fully explained and justified the
visualisations used and the overall dashboard solution, using the theory and principles
covered in the lectures and recommended reading. Use of more advanced features and
DAX will also contribute to higher scores if used appropriately and if this is fully
documented in the report.
Dataset 1: Concrete Compressive Strength
Scenario: You work for a construction company. They are testing a range of concrete mixes
and want to better understand how compressive strength relates to the composition of the
concrete.

Dataset 2: Energy Efficiency
Scenario: You work for a low carbon consultancy which advises on energy efficient building
design. They want to understand how the heating and cooling loads on a building vary based
on its design. They have generated a dataset of simulated buildings with various layouts,
orientations, and glazing areas. They want to better understand how heating and cooling
load relates to the building design.

Dataset 3: Wine Quality
Scenario: You work for a wine supplier. For a random sample of wines, they have conducted
a number of physicochemical tests and have also collected quality data based on the
assessment of wine tasting experts. They want to understand if the physicochemical
attributes of the wine can be used to predict wine quality. They also want to understand
whether quality and physicochemical attributes are different for red and white varieties
of wine.

For this task, you should imagine you are a consultant hired to conduct statistical analysis
for your chosen scenario. You are tasked with completing your statistical analysis in R and
providing a full written report of your findings.
• Include initial Exploratory Data Analysis (EDA), for example, calculation of relevant
descriptive statistics and visualisations to help you explore and understand the data.
• Conduct appropriate correlation analysis for the variables and evaluate the results in
the context of your chosen scenario.
• Formulate regression problem(s) relating to your chosen scenario and apply
appropriate regression techniques to the dataset.
• Formulate hypotheses relating to your chosen scenario and use appropriate tests to
test them.
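As a rough illustration only, the four steps above might be sketched in base R as follows. The data frame `mix`, its columns (`cement`, `water`, `strength`) and all values are simulated, hypothetical stand-ins rather than any of the assignment datasets. The choice of Pearson correlation and a multiple linear model is an assumption made for the sketch, not a recommended method.

```r
# Hypothetical stand-in data: simulated, NOT one of the assignment datasets.
set.seed(1)
n <- 100
mix <- data.frame(
  cement = runif(n, 100, 500),  # assumed predictor
  water  = runif(n, 120, 250)   # assumed predictor
)
mix$strength <- 10 + 0.08 * mix$cement - 0.05 * mix$water + rnorm(n, sd = 3)

# 1. Exploratory data analysis: descriptive statistics and a quick histogram.
print(summary(mix))
hist(mix$strength, main = "Simulated strength", xlab = "strength")

# 2. Correlation analysis (cor() uses Pearson correlation by default).
cor_matrix <- cor(mix)
print(round(cor_matrix, 2))

# 3. Regression: a multiple linear model as one possible technique.
fit <- lm(strength ~ cement + water, data = mix)
print(summary(fit)$coefficients)

# 4. Hypothesis test: H0 is that cement and strength are uncorrelated.
cement_test <- cor.test(mix$cement, mix$strength)
print(cement_test$p.value)
```

With real data, the variables, the correlation method and the model form would all need to be chosen and justified from your own exploration.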
You will submit both your R scripts used to conduct this analysis and a 2,500-word report
on your analysis. Your report should:
• Briefly introduce your understanding of the chosen scenario and your task.
• Present all steps of your statistical analysis. This should include screenshots of both
your code and your output as well as any data visualisations. You should also explain
why you have selected the techniques you have used (for example, why the
particular regression techniques you are using are suited to the task).
• Evidence that you have properly tested any assumptions relating to the statistical
techniques you have used.
Important Note: Please ensure that the hypothesis tests you define are different from those
used to check model assumptions, such as tests for normality, heteroscedasticity,
stationarity in time series, etc. The tests for model assumptions are evaluated separately
under the modelling task and will not count towards the marks for the hypothesis testing
section.
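To make the distinction in this note concrete, the sketch below contrasts an assumption check with a scenario hypothesis test in base R. The data frame `wine` and its columns (`type`, `alcohol`, `quality`) are simulated, hypothetical stand-ins, and the particular tests shown (Shapiro-Wilk and a Welch t-test) are assumptions chosen for illustration only.

```r
# Hypothetical stand-in data: simulated, NOT one of the assignment datasets.
set.seed(2)
wine <- data.frame(
  type    = rep(c("red", "white"), each = 60),
  alcohol = rnorm(120, mean = 10.5, sd = 1)
)
wine$quality <- 2 + 0.35 * wine$alcohol +
  ifelse(wine$type == "white", 0.3, 0) + rnorm(120, sd = 0.5)

fit <- lm(quality ~ alcohol, data = wine)

# Assumption check (assessed under the modelling task):
# Shapiro-Wilk test of normality on the model residuals.
assumption_check <- shapiro.test(residuals(fit))

# Hypothesis test (assessed under the hypothesis testing section):
# H0: mean quality is the same for red and white wines.
# t.test() with a formula runs a Welch two-sample t-test by default.
hypothesis_test <- t.test(quality ~ type, data = wine)

print(assumption_check$p.value)
print(hypothesis_test$p.value)
```

The first test only checks whether a model is trustworthy, whereas the second answers a question about the scenario itself.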
Dataset 1: Share Prices
Scenario: You work for a financial investment firm. They want to use time series modelling
to forecast share price movements for a number of companies. For this task you should
select one company from the data provided. You can choose either the open, close, high,
or low values of the share prices for your time series modelling.

Dataset 2: UK Vital Statistics
Scenario: You work for a government statistics agency in the UK. They want to use time
series modelling to forecast numbers of births, deaths, marriages, and divorces in both
England & Wales and for the UK as a whole. You should select one of these time series for
this task.

For this task, you should imagine you are a Data Scientist who has been asked to conduct
time series modelling for your chosen scenario.
You will submit both your R scripts used to conduct this analysis and a 1,500-word report
on your analysis. Your report should:
• Briefly introduce your understanding of the chosen scenario and your task.
• Provide initial exploration of the data, e.g., visualisation of the time series data,
decomposition into trend and seasonal components, etc.
• Present all steps of your time series modelling. This should include screenshots of
both your code and your output as well as any data visualisations. You should also
explain why you have selected the techniques you have used.
• Evidence that you have properly tested any assumptions relating to the time series
modelling techniques you have used.
• Interpret and explain the output generated from your analysis.
• Finally, you should briefly conclude by summarising the main findings from your time
series analysis, including a comparison of the models and a recommendation on
which is better suited to the data.
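As a rough sketch of the workflow listed above, the base R example below explores a series, fits two candidate models, checks residuals and compares forecasts on a hold-out year. The monthly series is simulated and hypothetical, not one of the assignment datasets. The two models (Holt-Winters and a seasonal ARIMA with a fixed, untuned order) and the 12-month hold-out are assumptions made for illustration.

```r
# Hypothetical stand-in data: a simulated monthly series, NOT an assignment dataset.
set.seed(3)
months <- 1:120
values <- 100 + 0.5 * months + 10 * sin(2 * pi * months / 12) + rnorm(120, sd = 3)
y <- ts(values, start = c(2010, 1), frequency = 12)

# Initial exploration: plot, then classical decomposition into
# trend, seasonal and random components.
plot(y)
parts <- decompose(y)

# Hold out the final 12 months so the models are compared on unseen data.
train <- window(y, end = c(2018, 12))
test  <- window(y, start = c(2019, 1))

# Model A: Holt-Winters exponential smoothing (additive seasonality by default).
hw_fit  <- HoltWinters(train)
hw_pred <- predict(hw_fit, n.ahead = 12)

# Model B: seasonal ARIMA(0,1,1)(0,1,1)[12]; the order is assumed, not tuned.
ar_fit  <- arima(train, order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12))
ar_pred <- predict(ar_fit, n.ahead = 12)$pred

# Assumption check: Ljung-Box test for remaining residual autocorrelation
# (fitdf = 2 accounts for the two estimated MA coefficients).
lb <- Box.test(residuals(ar_fit), lag = 12, type = "Ljung-Box", fitdf = 2)
print(lb$p.value)

# Model comparison: root mean squared error on the hold-out year.
rmse <- function(pred, actual) {
  sqrt(mean((as.numeric(pred) - as.numeric(actual))^2))
}
print(c(holt_winters = rmse(hw_pred, test), arima = rmse(ar_pred, test)))
```

For a real series, the frequency, model orders and hold-out length would need to be justified from the data, and a lower hold-out error alone may not be enough to recommend one model.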
Skill I U A D
Communication X
Critical Thinking and Problem Solving X
Data Literacy X
Digital Literacy X
Industry Awareness X
Innovation and Creativity
Proactive Leadership
Reflection and Life-Long Learning
Self-management and Organisation X
Team Working
You are also reminded that you will need to provide citations if
you reference published research, and AI tools cannot be
relied on to provide accurate citations. We would also
emphasise that poor use of AI tools (e.g., copying and pasting
output from ChatGPT that you don’t understand and without
editing) can be clearly identified and will result in a low-quality
submission which will score poorly.
Word count/duration (if applicable): The total word count is 6,000 words across the three tasks.
askUS Services
The University offers a range of support services for students
through askUS including Disability and Inclusion Service,
Wellbeing and Counselling Services.