PRML Assignment1
Students can apply for an extension to the submission due date for an assessment item
on the grounds of extenuating, evidenced circumstances (specific details are found in the
Assessment Policy and Procedures, Section 9.12). Extensions must be applied for before
the due date. Documentary evidence (e.g., medical certificate) will be expected for an
extension to be granted; however, this will not guarantee that the application will be
successful. The Unit Convener or relevant Discipline Convener will decide whether to
grant an extension and the length of the extension.
If a student chooses to submit their assignment via the Internet from off campus, it is
the responsibility of the student to ensure that Internet access is available. Being
unable to access the Internet at an off-campus location is not grounds for an
extension.
This assessment is a group task and only one submission per group is allowed. Ensure
all resources used are appropriately referenced. You may use a referencing style of your
choice, but you must apply it consistently.
Please submit your draft report to URKUND Student Text-matching Checker for a
similarity/plagiarism check before you submit your final report.
Overview
In this assignment you will be working with real-world data and problems. You will be
required to analyse the climate/weather dataset described in the Datasets section. You are
expected to perform five of the six steps in the data mining process on your chosen dataset.
These five steps are:
• Business understanding
• Data understanding
• Data preparation
• Modelling
• Evaluation
Image: http://www.proglobalbusinesssolutions.com/six-steps-in-crisp-dm-the-standard-data-mining-process
As is the case for most real-world data mining projects, each of the datasets requires
some pre-processing to get the data into a form suitable for mining.
Problem formulation
In this assignment, you are going to work on applying data mining to solve a real-world
business problem. This involves identifying a business in Australia that is impacted by
rainfall. The impact of rainfall on a business could be positive or negative.
• If the impact of rainfall on the business is positive, it creates an opportunity. The aim
is to make recommendations that help the business capitalise on the opportunity
created by rainfall.
• If the impact of rainfall on the business is negative, it poses a risk, and the business
may incur a loss. The aim is to make recommendations that help the business
minimise the loss incurred due to rainfall.
The first step will be to identify a business that is likely to be impacted (positively or
negatively) by rainfall and obtain data on its business operations, such as revenue over a
period of time. Once the revenue data (or other information that helps quantify the impact
of rainfall on the business) is obtained, the weather data for the location closest to the
business can be downloaded from the Bureau of Meteorology website to explore whether there
is a correlation between the revenue of the business and rainfall.
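One way to explore this relationship is to compute a Pearson correlation coefficient between rainfall and revenue over matching periods. The following is a minimal sketch in Python using only the standard library; the function name `pearson`, the variable names, and the numbers are hypothetical illustrations, not real observations or a prescribed method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustration values only (one entry per month).
monthly_rainfall_mm = [10.0, 20.0, 30.0, 40.0, 50.0]
monthly_revenue = [500.0, 450.0, 400.0, 350.0, 300.0]

r = pearson(monthly_rainfall_mm, monthly_revenue)
print(f"Pearson r = {r:.2f}")
```

A value of r close to +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. A correlation on its own does not show that rainfall causes the change in revenue.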
If a correlation (positive or negative) exists between business revenue and rainfall, data
mining may be used to build predictive models that forecast rainfall, and the forecasts can
inform recommendations to improve the business's processes.
You are also allowed to perform all tasks using R or Python if you are proficient in using
those programming languages. You will need to submit your fully commented scripts
along with your final report.
Deliverables
The deliverable for this assignment will be a report with associated datasets in CSV file
format and fully commented scripts (R, Python), if any. The report will be in the style of a
professional document of no more than 12 pages¹, single-spaced A4, Times New Roman
(12 pt), including relevant graphs and output to support your reporting. A suggested
breakdown of the 12 pages is given in the sections below. However, you may apply a
different breakdown as needed. The report will detail your methods and the results of your
modelling. The report will contain the following sections:
¹ Prior approval from the convener is required to exceed the page limit. Exceeding the page
limit without approval may result in a penalty of up to 5%.
Datasets
Weather dataset
This data can be retrieved online for a geographical area of your choice from
http://www.bom.gov.au/climate/data/index.shtml?bookmark=200
Download each of the last 14 months of data (you may have to download these in 14
separate files).
Once the data has been downloaded, create a single spreadsheet by combining the 14
separate files. You will then need to use Excel to add a derived target variable,
RainTomorrow (Hint: its values may be determined by applying a threshold on the
Rainfall variable).
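For groups working in Python rather than Excel, the combining and labelling steps above can be sketched as follows. This is an illustration under stated assumptions, not the required procedure: the column names `Date` and `Rainfall`, the ISO date format, the function names, and the 1.0 mm threshold are all hypothetical, and the sample rows are made up. Real downloaded files may use different headers and will need adapting.

```python
import csv
import io

# Assumption: the brief leaves the threshold open; 1.0 mm is only an example.
RAIN_THRESHOLD_MM = 1.0

def combine_monthly_files(files):
    """Read several CSV file objects with the same header into one sorted list."""
    rows = []
    for f in files:
        rows.extend(csv.DictReader(f))
    # ISO-formatted dates (YYYY-MM-DD) sort chronologically as plain strings.
    rows.sort(key=lambda row: row["Date"])
    return rows

def add_rain_tomorrow(rows, threshold=RAIN_THRESHOLD_MM):
    """Label each day with 'Yes'/'No' based on the next row's rainfall.

    Assumes consecutive rows are consecutive days. The final row is dropped
    because its next-day rainfall is unknown, as is any row whose next-day
    rainfall is blank.
    """
    labelled = []
    for today, tomorrow in zip(rows, rows[1:]):
        next_rain = tomorrow["Rainfall"].strip()
        if not next_rain:
            continue
        label = "Yes" if float(next_rain) > threshold else "No"
        labelled.append({**today, "RainTomorrow": label})
    return labelled

# Made-up sample data standing in for two of the monthly files.
jan = io.StringIO("Date,Rainfall\n2023-01-30,0.0\n2023-01-31,5.2\n")
feb = io.StringIO("Date,Rainfall\n2023-02-01,0.4\n2023-02-02,12.0\n")

labelled = add_rain_tomorrow(combine_monthly_files([feb, jan]))
for row in labelled:
    print(row["Date"], row["Rainfall"], row["RainTomorrow"])
```

With real files, each `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`. Whatever threshold you choose should be stated and justified in the report.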
This historic dataset can be used to build a predictive model to predict whether it will rain
tomorrow. A decision tree is one modelling technique that may be appropriate for this
problem; you will need to choose and apply at least two different (three for PG students)
modelling techniques.
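To make the idea concrete, the sketch below fits a one-level decision tree (a decision stump) on a single numeric feature and compares its accuracy on held-out data with a majority-class baseline. It is a simplified illustration in pure Python, not a full decision tree and not the required method; the feature (afternoon humidity), the function names, and all values are hypothetical. In practice a library implementation, such as scikit-learn's `DecisionTreeClassifier`, would normally be used. The baseline is only a reference point for evaluation.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def fit_stump(values, labels):
    """Fit a one-level decision tree on one numeric feature.

    Tries a threshold midway between each pair of adjacent distinct values
    and keeps the one with the highest training accuracy.
    Returns (threshold, label_if_at_or_below, label_if_above).
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        left_label, right_label = majority_label(left), majority_label(right)
        predicted = [left_label if v <= threshold else right_label for v in values]
        score = accuracy(predicted, labels)
        if best is None or score > best[0]:
            best = (score, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, values):
    """Apply a fitted stump to a list of feature values."""
    threshold, left_label, right_label = stump
    return [left_label if v <= threshold else right_label for v in values]

# Hypothetical values: afternoon humidity (%) and RainTomorrow labels.
# Earlier days are used for training and later days for testing.
train_humidity = [30, 35, 40, 45, 70, 75, 80, 85]
train_labels = ["No", "No", "No", "No", "Yes", "Yes", "Yes", "No"]
test_humidity = [33, 50, 78, 90]
test_labels = ["No", "No", "Yes", "Yes"]

stump = fit_stump(train_humidity, train_labels)
stump_acc = accuracy(predict_stump(stump, test_humidity), test_labels)

baseline = majority_label(train_labels)
baseline_acc = accuracy([baseline] * len(test_labels), test_labels)

print("stump:", stump, "test accuracy:", stump_acc)
print("baseline:", baseline, "test accuracy:", baseline_acc)
```

Evaluating on days that were not used for fitting gives a fairer picture of how a model would perform on future data. Comparing against a simple baseline shows whether the model has learned anything beyond the most frequent outcome.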
Marking rubric for the report
Grade bands: Pass (50% – 64%); Credit (65% – 74%); Distinction (75% – 84%); HD (85% – 100%)

Pass: Little context provided, or context unclear or confusing.
Credit: Context is provided, but links to the goal/hypothesis unclear.
Distinction: Provides (in a few sentences) the context of problem being explored.
HD: Provides (in a few sentences) the context of problem being explored. Clearly identifies why the reader should care about the problem.
Pass: … incomplete.
Credit: … mentioned but unclear or confusing.
Distinction: … of experiment.
HD: … experiment.

Pass: Background information includes unrelated information and/or is insufficient to support goal/hypothesis.
Credit: Background information included but not sufficient to support goal/hypothesis.
Distinction: Background information provides context for experimental question.
HD: Background information giving context to goal included.

Pass: Little organisation of information leading into the goal/hypothesis.
Credit: Organised from broad to specific with respect to the topic.
Distinction: Organised from broad to specific with respect to the topic.
HD: Organised from broad to specific with respect to the topic.

Pass: Provides a general description of the experiment with little detail, making it difficult for a reader to repeat the experiment.
Credit: Provides enough information for someone to infer how to do the experiment, but details may be unclear.
Distinction: Provides enough detail for someone to repeat the experiment from the instructions.
HD: Provides enough detail for someone to repeat the experiment from the instructions.

Pass: Inclusion of too much result. Not written in narrative (paragraph) form.
Credit: Inclusion of some results. Not written in narrative (paragraph) form.
Distinction: Written in narrative (paragraph) form.
HD: Does not include unnecessary detail - written to an audience familiar with the topic. Written in narrative (paragraph) form.
Results (levels: Basic, Competent, Proficient, Advanced)

Basic: Results incompletely presented in narrative form.
Competent: Results are presented in narrative form.
Proficient: Results are presented in narrative form.
Advanced: Results are presented in narrative form.

Basic: Tables and figures present, but not referenced or cited properly.
Competent: Tables and figures cited, but not necessarily in correct order.
Proficient: Tables and figures are cited appropriately and numbered by the order addressed in the text.
Advanced: Tables and figures are cited appropriately and numbered by the order addressed in the text.

Basic: Data mixed with much interpretation or conclusions.
Competent: Data mostly free of interpretation or conclusions.
Proficient: Data is free of interpretation or conclusions.
Advanced: Apparent trends in data are identified.
Poses further questions to continue research or address limitations with current project.

Conclusion (levels: Basic, Competent, Proficient, Advanced)

Basic: Results summarised.
Competent: Results summarised.
Proficient: Results summarised.
Advanced: Results summarised.

Basic: Conclusion not made with respect to goal/hypothesis.
Competent: Conclusion relates to the goal/hypothesis but is lengthy or unclear.
Proficient: Conclusion is clearly related to the goal/hypothesis.
Advanced: Conclusion is succinct and clearly answers the goal/hypothesis.

Basic: May be listed in correct format as per APA.
Competent: May be listed in correct format as per APA.
Proficient: Only those references cited in the text included in reference list.
Advanced: Only references cited in the text included in reference list.

Basic: No references other than the textbook used.
Competent: No references other than the textbook used.
Proficient: Additional references used.
Advanced: References other than the textbook used.
Format and Style

Basic: Some/all of narrative written in present or future tense.
Competent: All narrative written in past tense.
Proficient: All narrative written in past tense.
Advanced: All narrative written in past tense.

Basic: Frequent use of 1st person context.
Competent: Some inconsistencies between 1st and 3rd person context.
Proficient: Consistent use of 1st or 3rd person.
Advanced: Limited use of 1st person.

Basic: Many grammar and spelling errors present.
Competent: Several grammar and spelling errors present.
Proficient: Only a few grammar and spelling errors.
Advanced: Minimal to no errors in grammar and spelling.