Applied Statistics & Data Visualisation Assignment Brief 2024-25


Assessment Brief 2024-25

Module title Applied Statistics & Data Visualisation

CRN 61049 / 61050

Level 7

Assessment title Data Visualisation, Statistical Analysis & Time Series Modelling

Weighting within module
This assessment is worth 100% of the overall module mark.

Module Leader/Assessment set by
Kaveh Kiani, email: [email protected]
Nathan Topping, email: [email protected]

Submission deadline date and time
Friday 6th December at 4pm

The submission deadline is 6th December 2024 by no later than 16:00.

For coursework assessments only: students with a Reasonable Adjustment Plan (RAP) or Carer
Support Plan should check their plan to see if an extension to this submission date has
been agreed.

How to submit
Your assessment should be submitted through Blackboard in two parts: first, a single pdf
report of 6,000 words and, second, a zipped file containing the R codes and Power BI
dashboard. Please check that:

1. The report is a single pdf file that includes all tasks. Name the file as
"your name.pdf" (for example, "John Smith.pdf").
2. The zip file contains clearly labelled, fully working versions of the R codes and
Power BI dashboard, with a clearly written description of each application and its use
in a “Read Me.txt” file. Your dashboard should be shared as a .pbix file.
3. The zip file is valid and openable.

Assessment Information/Brief 2024/25


To complete this assessment, you should complete all three tasks outlined below. For
each task, you are provided with a choice of scenarios, and you should select only one of
these scenarios for each task. You will then complete the task for your chosen scenario,
using the appropriate dataset shared on Blackboard. All datasets can be found in the ‘Data
Repository’ folder within the assessment folder. You must use the dataset provided: using
the wrong dataset for the task, or using data sourced elsewhere, will result in
zero marks for the task.

Task 1: Interactive Dashboard Design (35 marks)

For Task 1, you should select one of the following scenarios:

1. Dataset: Global Population Estimates & Projections
Scenario: You work for an NGO which is exploring global population dynamics. You have been
provided with a dataset of some key demographic indicators, with estimates provided for the
years 1960-2022 and projections for the years 2023-2050. They would like you to explore the
data and develop an interactive single-screen dashboard to allow policymakers and the public
to better understand the data relating to population trends over this timeframe. The
dashboard should present aggregated country-group level data, based on regional and/or
income groupings (a separate Country Groupings file is provided to help with this). You have
been given the freedom to focus your dashboard on a specific aspect of the data if you wish
(e.g., trends in urban vs rural populations).

2. Dataset: Global Economic Outlook
Scenario: You work for a thinktank exploring economic development globally. You have been
provided with data taken from the International Monetary Fund’s (IMF) Global Economic
Outlook which contains a range of indicators across the timeframe 2001-2020. They would like
you to explore the data and develop an interactive single-screen dashboard to allow
policymakers and the public to build an understanding of country and/or country-group level
economic performance. There are a total of 44 indicators, and you have been asked to select
a sub-set of these indicators to present, based on your own data exploration. You can choose
either to present a dashboard which is focussed on showing economic performance trends over
time or a dashboard designed to show an annual snapshot of economic performance (with the
flexibility for the user to select the year).

3. Dataset: Global Trends in HIV
Scenario: You work for a health charity which focuses on issues relating to global health,
and specifically HIV. You have been provided with a dataset of some indicators from the
World Health Organisation’s (WHO) Global Health Observatory (GHO) relating to the prevalence
and incidence of HIV across the timeframe 2000-2022. The indicators are disaggregated based
on sex and age groupings (either 4-level or 2-level groupings). You should note that some
values are missing because not all indicators are disaggregated across the same dimensions
(e.g., “People living with HIV who are on antiretroviral therapy (%)” is disaggregated by
2-level age groupings only, so is blank for rows relating to sex or 4-level age groupings.)
They would like you to explore the data and develop an interactive single-screen dashboard
to allow policymakers and the public to build an understanding of country and country-group
level trends in HIV over 2000-2022. A separate Country Groupings file is provided to help
with grouping countries by income and/or region.

For this task, you should imagine you are a Data Analyst who has been asked to create a
single-screen interactive dashboard which is suitable for the audience and purpose
outlined in your chosen scenario. The scenarios above are quite broad, and as the Data
Analyst you have been tasked with exploring the data, and if necessary, defining more
specific objectives relating to your scenario. You should create this dashboard in Power BI
and you will need to submit your .pbix file.

Accompanying your dashboard, you should submit a 2,000-word written report which:

• Briefly summarises your chosen scenario as you understand it.
• Describes any findings of note from your initial exploration of the data and any more
specific objectives you have defined for your dashboard (remember, these must still relate to
your overall scenario).
• Explains and justifies your choice of data visualisations with reference to theory, and
why these are appropriate selections given the objectives of your dashboard.
• Explains and justifies your overall dashboard layout, formatting, and composition,
with reference to the best practices and theory covered in the module.
• Provides a step-by-step overview of how the dashboard was built, including details
of any advanced features used (hierarchies, grouping, binning, forecasting etc) as
well as any DAX formulas used to create calculated fields or measures. In this section
you must include screenshots to help support your explanation.
• You must include a screenshot of your final dashboard solution in your report.
• Finally, you should conclude by providing a brief critical evaluation of your dashboard
solution – i.e., what does it do successfully and what are its limitations?

Higher scoring solutions will be those that have fully explained and justified the
visualisations used and the overall dashboard solution, using the theory and principles
covered in the lectures and recommended reading. Use of more advanced features and
DAX will also contribute to higher scores if used appropriately and if this is fully
documented in the report.

Task 2: Statistical Analysis (45 marks)

For Task 2, you should select one of the following scenarios:

1. Dataset: Concrete Compressive Strength
Scenario: You work for a construction company. They are testing a range of concrete mixes
and want to better understand how compressive strength relates to the composition of the
concrete.

2. Dataset: Energy Efficiency
Scenario: You work for a low carbon consultancy which advises on energy efficient building
design. They want to understand how the heating and cooling load on a building vary based on
its design. They have generated a dataset of simulated buildings with various layouts,
orientations, and glazing areas. They want to better understand how heating and cooling load
relates to the building design.

3. Dataset: Wine Quality
Scenario: You work for a wine supplier. For a random sample of wines, they have conducted a
number of physicochemical tests and have also collected quality data based on the assessment
of wine tasting experts. They want to understand if the physicochemical attributes of the
wine can be used to predict wine quality. They also want to understand whether quality and
physicochemical attributes are different for red and white varieties of wine.

For this task, you should imagine you are a consultant hired to conduct statistical analysis
for your chosen scenario. You are tasked with completing your statistical analysis in R and
providing a full written report of your findings.

Your analysis should:

• Include initial Exploratory Data Analysis (EDA), for example, calculation of relevant
descriptive statistics and visualisations to help you explore and understand the data.
• Conduct appropriate correlation analysis for the variables and evaluate the results in
the context of your chosen scenario.
• Formulate regression problem(s) relating to your chosen scenario and apply
appropriate regression techniques to the dataset.
• Formulate hypotheses relating to your chosen scenario and use appropriate tests to
test them.
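
As a rough illustration of how the four steps above might fit together in an R script, the sketch below uses the mtcars dataset that ships with base R. This is an assumption made purely for demonstration: mtcars is not one of the assignment datasets, and the variables, model and test shown here are hypothetical choices rather than a recommended solution.

```r
# Illustrative only: mtcars (shipped with base R) stands in for the
# assignment dataset, which must be taken from Blackboard instead.
data(mtcars)

# 1. Exploratory data analysis: descriptive statistics and a simple plot
print(summary(mtcars[, c("mpg", "wt", "hp")]))
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "mpg")

# 2. Correlation analysis (Pearson correlation is the default method)
cor_matrix <- cor(mtcars[, c("mpg", "wt", "hp")])
print(round(cor_matrix, 2))

# 3. Regression: multiple linear regression of mpg on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(fit))

# Check one model assumption: normality of residuals (Shapiro-Wilk test)
print(shapiro.test(residuals(fit)))

# 4. Hypothesis test: do automatic (am = 0) and manual (am = 1) cars
#    differ in mean mpg? t.test() runs a Welch two-sample t-test by default.
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)
```

Note that the Shapiro-Wilk check in step 3 is an assumption test, so it sits alongside the regression model rather than serving as one of the hypothesis tests in step 4.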

You will submit both your R scripts used to conduct this analysis and a 2,500-word report
on your analysis. Your report should:

• Briefly introduce your understanding of the chosen scenario and your task.
• Present all steps of your statistical analysis. This should include screenshots of both
your code and your output as well as any data visualisations. You should also explain
why you have chosen to use the techniques you have chosen (for example, why the
particular regression techniques you are using are suited to the task).
• Evidence that you have properly tested any assumptions relating to the statistical
techniques you have used.

• Interpret and explain the output generated from your analysis. As a statistical
consultant, interpreting and explaining your results fully is an essential part of
the brief.
• Finally, you should briefly conclude by summarising the main findings from your
analysis.

Important Note: Please ensure that the hypothesis tests you define are different from those
used to check model assumptions, such as tests for normality, heteroscedasticity,
stationarity in time series, etc. The tests for model assumptions are evaluated separately
under the modelling task and will not count towards the marks for the hypothesis testing
section.

Task 3: Time Series Modelling (20 marks)

For Task 3, you should select one of the following scenarios:

1. Dataset: Share Prices
Scenario: You work for a financial investment firm. They want to use time series modelling
to forecast share price movements for a number of companies. For this task you should select
one company from the data provided. You can choose either the open, close, high, or low
values of the share prices for your time series modelling.

2. Dataset: UK Vital Statistics
Scenario: You work for a government statistics agency in the UK. They want to use time
series modelling to forecast numbers of births, deaths, marriages, and divorces in both
England & Wales and for the UK as a whole. You should select one of these time series for
this task.

For this task, you should imagine you are a Data Scientist who has been asked to conduct
time series modelling for your chosen scenario.
You will submit both your R scripts used to conduct this analysis and a 1,500-word report
on your analysis. Your report should:

• Briefly introduce your understanding of the chosen scenario and your task
• Provide initial exploration of the data, e.g., visualisation of the time series data,
decomposition into trend and seasonal components, etc.
• Present all steps of your time series modelling. This should include screenshots of
both your code and your output as well as any data visualisations. You should also
explain why you have chosen to use the techniques you have chosen.
• Evidence that you have properly tested any assumptions relating to the time series
modelling techniques you have used.
• Interpret and explain the output generated from your analysis.
• Finally, you should briefly conclude by summarising the main findings from your time
series analysis, including a comparison of the models and a recommendation on
which is better suited to the data.
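
A minimal sketch of what such a workflow could look like in R is given below, using only base R functions. It assumes the built-in AirPassengers series as a stand-in for the assignment data, and the two models compared (Holt-Winters and a seasonal ARIMA on the log scale) are hypothetical choices for illustration, not the models you are expected to use.

```r
# Illustrative only: AirPassengers (shipped with base R) stands in for the
# assignment time series, which must be taken from Blackboard instead.
data(AirPassengers)
y <- AirPassengers

# Initial exploration: plot the series and a classical decomposition
plot(y, main = "Monthly series")
components <- decompose(y, type = "multiplicative")
plot(components)

# Hold out the final 12 months to compare forecasts against
train <- window(y, end = c(1959, 12))
test  <- window(y, start = c(1960, 1))

# Model 1: Holt-Winters exponential smoothing with multiplicative seasonality
hw_fit <- HoltWinters(train, seasonal = "multiplicative")
hw_fc  <- predict(hw_fit, n.ahead = 12)

# Model 2: seasonal ARIMA(0,1,1)(0,1,1)[12] fitted to the logged series,
# with forecasts back-transformed to the original scale
arima_fit <- arima(log(train), order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))
arima_fc  <- exp(predict(arima_fit, n.ahead = 12)$pred)

# Assumption check: residuals should resemble white noise (Ljung-Box test)
print(Box.test(residuals(arima_fit), lag = 12, type = "Ljung-Box", fitdf = 2))

# Compare the two models by root mean squared error on the hold-out period
rmse <- function(actual, forecast) {
  sqrt(mean((as.numeric(actual) - as.numeric(forecast))^2))
}
cat("Holt-Winters RMSE:", rmse(test, hw_fc), "\n")
cat("ARIMA RMSE:", rmse(test, arima_fc), "\n")
```

Holding out the last 12 observations is one simple way to support a model comparison and recommendation. Other evaluation schemes may be equally valid if they are justified in the report.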

Assessment Criteria
You have been provided with a rubric as an accompanying document which details the criteria
which will be used for marking and feedback.

You should look at the assessment criteria to find out what we are specifically looking at
during the assessment.

Knowledge and Understanding
Practical, Professional or Subject Specific Skills

Assessed intended learning outcomes
On successful completion of this assessment, you will be able to:

1. Design effective data visualisations, taking into consideration theories of visual
perception
2. Understand the key statistical concepts which underpin data science, including sampling,
regression, time series and hypothesis testing
3. Critically assess the relative strengths and uses of a range of data analysis techniques
(including t-tests, ANOVA, linear regression, multiple regression models, logistic
regression, time series models and categorical data analysis)
4. Assess statistical assumptions and chi-square statistics to detect associations among
categorical variables
5. Apply statistical data analysis using R
6. Use Power BI to design dashboards, reports, and data visualisations

Employability Skills developed / demonstrated
You will develop a range of employability skills sought by employers through each
assessment.

Through this assessment you will have an opportunity to develop and demonstrate the
following employability skills:

Skill I U A D
Communication X
Critical Thinking and Problem Solving X
Data Literacy X
Digital Literacy X
Industry Awareness X
Innovation and Creativity
Proactive Leadership
Reflection and Life-Long Learning
Self-management and Organisation X
Team Working

I = You will have been introduced to this skill
U = You will have developed an understanding of this skill in the context of your subject
A = You will be able to apply this skill in the context of your subject
D = You will have demonstrated an enhanced understanding and application of this skill in a
wider context

Using Artificial Intelligence (AI) Tools
You may use AI tools to support you in completing the tasks and drafting this submission.
However, if you do, you should include the following declaration at the beginning of your
submission, so we know how these tools have been used:

During the preparation and writing of this submission I used [NAME TOOL / SERVICE] in
Section [SECTION(s) TITLE] in order to [REASON]. After using this tool/service, I reviewed
and edited the content as needed and take full responsibility for the content of the
submission.

You are also reminded that you will need to provide citations if you reference published
research, and AI tools cannot be relied on to provide accurate citations. We would also
emphasise that poor use of AI tools (e.g., copying and pasting output from ChatGPT that you
don’t understand and without editing) can be clearly identified and will result in a
low-quality submission which will score poorly.

Word count/ duration (if applicable)
The total word count is 6,000 words across the three tasks.

Feedback arrangements
You can request feedback after the assessment marks have been released.

Academic Integrity and Referencing
Students are expected to learn and demonstrate skills associated with good academic conduct
(academic integrity). Good academic conduct includes the use of clear and correct
referencing of source materials. Here is a link to where you can find out more about the
skills which students need:
Academic integrity & referencing
Referencing

Academic Misconduct is an action which may give you an unfair advantage in your academic
work. This includes plagiarism, asking someone else to write your assessment for you or
taking notes into an exam. The University takes all forms of academic misconduct seriously.

Assessment Information and Support

Support for this Assessment
You can obtain support for this assessment by contacting one of the module leaders.
You can find more information about understanding your assessment brief and assessment tips
for success here.

Assessment Rules and Processes
You can find information about assessment rules and processes in the Assessment Support
module in Blackboard.

Develop your Academic and Digital Skills
Find resources to help you develop your skills here.

Concerns about Studies or Progress
If you have any concerns about your studies, contact your Academic Progress Review
Tutor/Personal Tutor or your Student Progression Administrator (SPA).

askUS Services
The University offers a range of support services for students through askUS including
Disability and Inclusion Service, Wellbeing and Counselling Services.

Personal Mitigating Circumstances (PMCs)
If personal mitigating circumstances (e.g. illness or other personal circumstances) may have
affected your ability to complete this assessment, you can find more information about the
Personal Mitigating Circumstances Procedure here.
Independent advice is available from the Students’ Union Advice Centre about this process:
https://www.salfordstudents.com/advice/centre

Reassessment
If you fail your assessment, and are eligible for reassessment, you will be able to find the
date for resubmission on your module site in Blackboard.

For students with accepted personal mitigating circumstances for absence/non submission,
this will be your replacement assessment attempt.

Your reassessment task will be the same as this assessment brief.

We know that having to undergo a reassessment can be challenging however support is
available. Have a look at all the sources of support outlined earlier in this brief and
refer to the Personal Effectiveness resources.
