Group Assignment


CT099-3-3 BDA Group Assignment


Knowledge Discovery and Big Data Analytics Assignment


Big data applications are being developed and explored by the computer science community. They are characterized by huge data sets collected from sources such as sensor networks, online networks and medical agencies. To deal with the difficulty of analysing such data, research is being conducted on novel algorithms for data mining and knowledge discovery that exploit network properties.

For the assignment, you are asked to explore the application of big data analytics techniques to
the data problem of your choice. You can choose to study one particular data problem, giving
special consideration to the unique properties of the problem domain, and testing one or more
methods on it.

Learning Outcomes

Course Learning Outcome 3: Demonstrate the analytical and visualization methods for
effective storytelling in a given business case.

On completion, students should be able to:
1. Explain and implement the concepts and methods for knowledge discovery.
2. Select, analyse and evaluate the most suitable data mining methods for solving specific problems.
3. Demonstrate the analytical and visualization methods for effective storytelling in each business case selected by the team or assigned by the lecturer.

Assessment

This group case study carries 40% of the total assessment mark. Of the case study mark, 50% is contributed by an individual component and the remaining 50% by group marks. The marking criteria are attached to this assignment.
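As a worked example of the weighting, the sketch below assumes that "40%" is the case study's share of the overall module mark and that the individual and group components are each marked out of 100. Both are assumptions, so confirm the exact scheme with your lecturer; the function name `case_study_contribution` is hypothetical.

```python
def case_study_contribution(individual_mark: float, group_mark: float,
                            weight: float = 0.40,
                            individual_share: float = 0.50) -> float:
    """Return the module percentage points earned from the case study.

    Assumption: both marks are out of 100, the case study is worth `weight`
    of the module, and `individual_share` of it is the individual component
    (the rest is the group component).
    """
    assignment_mark = (individual_share * individual_mark
                       + (1 - individual_share) * group_mark)
    return weight * assignment_mark


# Individual 70, group 80: assignment mark 75, i.e. 30 module points.
print(round(case_study_contribution(70, 80), 2))
```

For a student who is assessed individually (see the note under Deliverables), passing `individual_share=1.0` makes the whole mark come from the individual component.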

Groups

Your class will be divided into groups. Each group may contain a maximum of 4 members.

Minimum report requirements

Business Goal & Objectives


Once you have obtained the datasets for analysis, you and your group members must specify the ultimate purpose of mining the data. For example, if you are seeking patterns in your data to help you retain good customers, you might build one model to predict customer profitability and a second model to identify customers likely to leave.

Big Data Analytics Lifecycle & Methodologies


You have to adopt one of the knowledge discovery process methodologies, such as SEMMA, CRISP-DM or Fayyad's KDD process, to guide you through the project. It is very important to explain and justify the chosen methodology.
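One way to make the chosen methodology explicit is to map each report section to its phases and check that none is missed. The sketch below lists the standard phases of CRISP-DM, SEMMA and Fayyad's KDD process; the mapping of this brief's report sections onto CRISP-DM (`REPORT_TO_CRISP_DM`) is an illustrative assumption, not a required structure.

```python
# Standard phases, in order, of three knowledge discovery methodologies.
METHODOLOGIES = {
    "CRISP-DM": ["Business Understanding", "Data Understanding",
                 "Data Preparation", "Modeling", "Evaluation", "Deployment"],
    "SEMMA": ["Sample", "Explore", "Modify", "Model", "Assess"],
    "KDD (Fayyad et al.)": ["Selection", "Preprocessing", "Transformation",
                            "Data Mining", "Interpretation/Evaluation"],
}

# Hypothetical mapping of this brief's report sections to CRISP-DM phases.
REPORT_TO_CRISP_DM = {
    "Business goal & Objective": ["Business Understanding"],
    "Dataset Preparation": ["Data Understanding", "Data Preparation"],
    "Type of Prediction & Modelling Techniques": ["Modeling"],
    "Algorithms & Model Validation": ["Modeling", "Evaluation"],
    "Analysis & Recommendations": ["Evaluation", "Deployment"],
}


def uncovered_phases(methodology: str, mapping: dict) -> list:
    """Return the phases of `methodology` that no report section covers."""
    covered = {phase for phases in mapping.values() for phase in phases}
    return [p for p in METHODOLOGIES[methodology] if p not in covered]


print(uncovered_phases("CRISP-DM", REPORT_TO_CRISP_DM))  # []
```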

Level 3 Asia Pacific University of Technology and Innovation 10/2022



Dataset Preparation
This stage covers data selection, cleaning, formatting and exploration. The goal of exploration is to identify the most important fields for predicting an outcome and to determine which derived values may be useful.
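A minimal sketch of these steps using only the Python standard library. The field names (`visits`, `total_spend`, `tenure_months`, `churned`), the median imputation and the correlation-based ranking are illustrative assumptions; absolute Pearson correlation with a 0/1 outcome is only a quick screening heuristic, not a complete feature-selection method.

```python
from statistics import mean, median

# Hypothetical raw records, e.g. as read by csv.DictReader (every value is a string).
RAW = [
    {"visits": "10", "total_spend": "500", "tenure_months": "24", "churned": "0"},
    {"visits": "2", "total_spend": "40", "tenure_months": "3", "churned": "1"},
    {"visits": "", "total_spend": "300", "tenure_months": "18", "churned": "0"},
    {"visits": "1", "total_spend": "15", "tenure_months": "2", "churned": "1"},
    {"visits": "8", "total_spend": "", "tenure_months": "30", "churned": "0"},
    {"visits": "3", "total_spend": "60", "tenure_months": "5", "churned": ""},
]
FEATURES = ["visits", "total_spend", "tenure_months"]


def prepare(raw):
    """Select, clean and format the records, then add a derived field."""
    # Selection: keep only rows whose outcome is known (copy so `raw` is untouched).
    rows = [dict(r) for r in raw if r["churned"] != ""]
    # Cleaning + formatting: cast to float, filling gaps with the column median.
    for f in FEATURES:
        fill = median(float(r[f]) for r in rows if r[f] != "")
        for r in rows:
            r[f] = float(r[f]) if r[f] != "" else fill
    for r in rows:
        r["churned"] = int(r["churned"])
        # Derived value: average spend per visit (0.0 when there were no visits).
        r["spend_per_visit"] = r["total_spend"] / r["visits"] if r["visits"] else 0.0
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either variable is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def rank_fields(rows, outcome="churned"):
    """Rank candidate fields by |correlation| with the outcome, strongest first."""
    y = [r[outcome] for r in rows]
    fields = FEATURES + ["spend_per_visit"]
    scores = {f: abs(pearson([r[f] for r in rows], y)) for f in fields}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for field, score in rank_fields(prepare(RAW)):
    print(f"{field:15s} {score:.2f}")
```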

Type of Prediction & Modelling Techniques


The next step is deciding on the type of prediction that is most appropriate: (1) classification, predicting which category or class a case falls into, or (2) regression, predicting what numeric value a variable will have (if the variable varies with time, this is called time series prediction). In the example above, you might use regression to forecast customer profitability and classification to predict which customers might leave.
Now you can choose the model type: a neural net to perform the regression, perhaps, and a
decision tree for the classification. There are also traditional statistical models to choose from
such as logistic regression, discriminant analysis, or general linear models. The most important
thing is to choose the model type that meets your requirements.
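The difference between the two prediction types can be seen in a small sketch on hypothetical customer data. To stay dependency-free it swaps in simpler models than those named above: ordinary least squares for the regression (instead of a neural net) and a nearest-centroid rule for the classification (instead of a decision tree). The data and function names are hypothetical.

```python
from statistics import mean

# Hypothetical customers: (tenure_months, monthly_profit, churned: 1 = left)
CUSTOMERS = [(2, 5.0, 1), (3, 8.0, 1), (5, 11.0, 1),
             (18, 35.0, 0), (24, 49.0, 0), (30, 60.0, 0)]


def fit_least_squares(xs, ys):
    """Regression: fit y = a + b*x by ordinary least squares; return (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def fit_nearest_centroid(xs, labels):
    """Classification: remember the mean feature value of each class."""
    return {c: mean(x for x, lab in zip(xs, labels) if lab == c)
            for c in set(labels)}


def predict_class(centroids, x):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))


tenure = [c[0] for c in CUSTOMERS]
a, b = fit_least_squares(tenure, [c[1] for c in CUSTOMERS])
centroids = fit_nearest_centroid(tenure, [c[2] for c in CUSTOMERS])

# Regression answers "how much?"; classification answers "which class?".
print(f"predicted monthly profit at 12 months: {a + b * 12:.2f}")
print("predicted churn label at 12 months:", predict_class(centroids, 12))
```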

Algorithms & Model Validation


Many algorithms are available to build your models. You might build the neural net using backpropagation or radial basis functions. For the decision tree, you might choose among CART, C5.0, QUEST or CHAID.
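At the heart of CART is a greedy search for the binary split that most reduces impurity, measured by the Gini index. The sketch below finds that best split for a single node on hypothetical numeric data. A full CART tree repeats this search recursively and then prunes, and the other algorithms listed use different split criteria (CHAID, for instance, chooses splits with chi-square tests), so this is a sketch of the CART split step only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the best split x <= t.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split lowers the parent's impurity, feature and threshold are None.
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            if weighted < best[2]:
                best = (f, t, weighted)
    return best


# Hypothetical rows: (tenure_months, total_spend); label 1 = customer left.
ROWS = [(2, 40.0), (3, 60.0), (5, 15.0), (18, 500.0), (24, 300.0), (30, 170.0)]
LABELS = [1, 1, 1, 0, 0, 0]
print(best_split(ROWS, LABELS))  # (0, 11.5, 0.0): split on tenure at 11.5 months
```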
After building a model, you must evaluate its results and interpret their significance. Remember that the accuracy rate found during testing applies only to the data on which the model was built and tested; accuracy on new data may differ if that data differs in important ways from the original data set. You need to use a software tool of your choice to complete the modelling.
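Because an accuracy figure is tied to the data it was measured on, a common safeguard is k-fold cross-validation: every record is scored by a model that did not see it during training. The sketch below uses a 1-nearest-neighbour classifier purely as a stand-in model, on hypothetical one-feature data; the fold count and random seed are arbitrary choices. Even a cross-validated estimate still assumes that new data resembles the original data set.

```python
import random

# Hypothetical (feature, label) pairs forming two well-separated groups.
DATA = [(1, "stay"), (2, "stay"), (3, "stay"), (4, "stay"), (5, "stay"),
        (100, "leave"), (101, "leave"), (102, "leave"), (103, "leave")]


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def one_nn_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def cross_val_accuracy(data, k=3, seed=0):
    """Mean accuracy over k folds; each fold is scored by a model trained without it."""
    scores = []
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        correct = sum(one_nn_predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


print(cross_val_accuracy(DATA, k=3))
```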

Analysis & Recommendations


Once a data mining model has been built and validated, its results must be analysed and used to recommend actions. Discussion of social impacts and ethical issues is encouraged where relevant to the solution. The assignment is best concluded with an analysis and comparison of results between group members, together with justification of the findings.
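When comparing models across group members, it helps to report more than accuracy. The sketch below computes accuracy, precision, recall and F1 from confusion-matrix counts for two hypothetical members' predictions on the same test labels (1 = customer leaves); the data and function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 (0.0 where a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test-set labels and two members' predictions.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
member_a = [1, 0, 0, 1, 0, 0, 0, 0]
member_b = [1, 1, 1, 1, 0, 1, 0, 1]
for name, preds in [("member A", member_a), ("member B", member_b)]:
    print(name, metrics(y_true, preds))
```

Both hypothetical models reach 0.75 accuracy, but member A has precision 1.0 and recall 0.5, while member B has precision of about 0.67 and recall 1.0. Which is preferable depends on the business goal, for example whether missing a customer who leaves costs more than a wasted retention offer.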

Getting datasets

Every project must involve at least one dataset. The dataset should be unbiased and large enough to fulfil your assignment objective. Many interesting, freely available datasets can be found on the internet, for example social networking datasets, airline data, weather forecasting data and much more.
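A quick first check on whether a labelled dataset is skewed, and whether it is large enough, is to look at its class distribution. The sketch below assumes a classification target; the thresholds `min_rows` and `min_minority_share` are hypothetical, and class balance is only one aspect of bias (it cannot reveal, for example, a sample drawn from an unrepresentative population).

```python
from collections import Counter


def class_balance_report(labels, min_rows=1000, min_minority_share=0.10):
    """Return class shares and a list of simple warning signs.

    `min_rows` and `min_minority_share` are illustrative thresholds,
    not requirements of this assignment.
    """
    n = len(labels)
    shares = {label: count / n for label, count in Counter(labels).items()}
    warnings = []
    if n < min_rows:
        warnings.append(f"only {n} rows (fewer than {min_rows})")
    minority = min(shares, key=shares.get)
    if shares[minority] < min_minority_share:
        warnings.append(f"class {minority!r} is only {shares[minority]:.1%} of rows")
    return shares, warnings


# Hypothetical churn labels: 950 customers stayed, 50 left.
shares, warnings = class_balance_report(["stay"] * 950 + ["leave"] * 50)
print(shares)
print(warnings)
```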

Example of Open Datasets:

1. https://www.kdnuggets.com/datasets/government-local-public.html

2. https://github.com/awesomedata/awesome-public-datasets

3. AWS Open Datasets: https://registry.opendata.aws/


4. University of Edinburgh research datasets: http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html

5. http://www.rdatamining.com/resources/data

Data mining software packages:

You can implement your project using one of the following data mining software packages:

a) RapidMiner
b) WEKA: http://www.cs.waikato.ac.nz/ml/weka/
c) R (Rattle): http://rattle.togaware.com/
d) IBM Watson & SPSS
e) Fusionex GIANT
f) Microsoft
g) SAS: SAS OnDemand / SAS Viya
h) Spotfire

Deliverables

Group final report


One final report of approximately 5,000 words is expected from each group. Ensure that all chapters are clearly documented. Group contributions and individual components must be documented. The Turnitin similarity score should be kept below 15%.

Source code should be submitted together with the documentation as part of the digital submission. All submission links will be created by the class lecturer on Moodle.

Each student is required to present their part of the assignment (scope, data pre-processing, model development, analysis and interpretation) during the group presentation. The presentation schedule will be announced by the lecturer in class.

Note: If a student is unable to form a group due to insufficient student numbers or another reason approved by the module lecturer, the marking criteria above will be treated as 100% individual component (all criteria are marked as individual components).

PERFORMANCE CRITERIA


Distinction (75% and above)

This grade will be assigned to work which is considered to be of a very high standard and which meets at least 75% of the basic requirements listed above. The mapping between methodology steps should be excellent. All deliverables should be coherent with detailed descriptions. Overall documentation standards should be of excellent quality. Accurate, relevant and up-to-date referencing is visible. In order to obtain a grade at this level, the group should be able to address all issues with regard to the module.

Credit (65% – 74%)

This grade will be assigned to work which is considered to be of a high standard and which meets at least 65% of the basic requirements listed above. The mapping between methodology steps should be good. To obtain this grade, the assignment should show all techniques applied but may contain some errors. All deliverables should be coherent with detailed descriptions. Overall documentation should be of excellent quality. In order to obtain a grade at this level, the group should be able to address most issues with regard to the module. Accurate, relevant and up-to-date referencing is visible.

Pass (50% - 64%)

This grade will be assigned to work which is considered to be of an average standard and which meets at least 50% of the basic requirements listed above. The mapping between methodology steps should be good. The documentation should be of an adequate standard in terms of language, layout and flow. Some accurate, relevant and up-to-date referencing is visible. The group has an adequate level of professionalism and project knowledge.

Fail (Below 50%)


Work at this level will generally be of a low standard and may fail to meet even 50% of the basic requirements listed above. There is little or no evidence of mapping between methodology steps. The documentation is of a poor standard in terms of language, layout and flow. Minimal or no referencing was done. The group has a poor level of professionalism and project knowledge. Evaluation of each individual's contribution is poor.
