Universiti Teknologi Mara Test: Confidential 1 CS/FEB 2022/UCS551
QUESTION 1
(4 marks)
i. Descriptive analytics describes what has happened over a given period of time. Has
the number of views gone up? Are sales stronger this month than last?
ii. Diagnostic analytics focuses on why something happened. This involves more
diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales? Did the
latest marketing campaign affect sales?
iii. Predictive analytics moves to what is likely to happen in the near term. What
happened to sales the last time we had a hot summer? How many weather models predict a
hot summer this year?
iv. Prescriptive analytics moves into the territory of suggesting a course of action. If the
likelihood of a hot summer, as measured by the average of these five weather models, is
above 58%, then we should add an evening shift to the brewery and rent an additional
tank to increase output.
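The prescriptive rule in iv can be written as a small decision function. This is a minimal sketch: the function name `recommend_actions` and the five probability values are hypothetical, chosen only to show the model outputs being averaged and compared with the 58% threshold.

```python
from statistics import mean

# Hypothetical threshold from the example: act when the averaged
# probability of a hot summer is above 58%.
HOT_SUMMER_THRESHOLD = 0.58


def recommend_actions(model_probabilities):
    """Return recommended actions given each weather model's probability
    (between 0.0 and 1.0) of a hot summer."""
    if not model_probabilities:
        raise ValueError("need at least one model probability")
    # Average the models' outputs, as described in the prescriptive example.
    likelihood = mean(model_probabilities)
    if likelihood > HOT_SUMMER_THRESHOLD:
        return ["add evening shift", "rent additional tank"]
    return []


# Hypothetical outputs of five weather models (illustrative values only).
print(recommend_actions([0.7, 0.6, 0.55, 0.65, 0.5]))
# ['add evening shift', 'rent additional tank']
```

Here the average is 0.6, which is above 0.58, so both actions are recommended; an average of exactly 0.58 would not trigger them because the rule says "above".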
(8 marks)
3. List FOUR (4) types of data and provide ONE (1) example for each.
i. Structured and unstructured data: emails, documents, images (unstructured)
ii. Data structure: vector, array, matrix
iii. Level of measurement: nominal, ordinal, interval
iv. Univariate data: height
(8 marks)
A vector is a one-dimensional collection of values that all have the same data type,
while an array is a collection of elements of the same type placed in contiguous
memory locations that can be individually referenced by using an index to a unique
identifier.
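To make the "same type, contiguous, indexed" description concrete, here is a small sketch using Python's standard `array` module. This choice is an assumption: the answer itself is language-neutral, and Python's built-in lists, unlike `array.array`, can mix types.

```python
from array import array

# 'i' is the type code for signed C int: every element has this one type
# and the values are stored contiguously in memory.
scores = array("i", [70, 85, 92])

# Each element is referenced by a zero-based index.
print(scores[1])  # 85

# A value of another type is rejected rather than stored.
try:
    scores.append(3.5)
except TypeError as err:
    print("rejected:", err)

print(len(scores))  # 3
```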
(4 marks)
5. Describe FOUR (4) data processing techniques that can be used in processing the raw
data
i. Data cleaning. Data cleaning is the process of detecting and correcting (or removing)
errors in the data. Real-world data is normally incomplete, noisy and inconsistent: the
data available in data sources may be missing attribute values, data of interest and so
on. Data cleaning involves a number of techniques, including filling in missing values
manually and combined computer and human inspection. The output of the data cleaning
process is adequately cleaned data.
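A minimal sketch of two of these cleaning steps, removing duplicate records and filling in missing values, in Python. Filling with the mean is an assumption chosen for illustration (the answer does not prescribe a fill strategy), and the function name `clean` and the sample records are hypothetical.

```python
from statistics import mean


def clean(records, field):
    """Drop exact duplicate records, then fill missing (None or absent)
    values of `field` with the mean of the values that are present.
    Raises statistics.StatisticsError if no record has a value for `field`."""
    seen = set()
    unique = []
    for rec in records:
        # A sorted tuple of items identifies a record regardless of key order.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    present = [r[field] for r in unique if r.get(field) is not None]
    fill_value = mean(present)
    return [
        {**r, field: r[field] if r.get(field) is not None else fill_value}
        for r in unique
    ]


# Hypothetical raw records: one missing age and one exact duplicate.
raw = [
    {"id": 1, "age": 25},
    {"id": 2, "age": None},
    {"id": 1, "age": 25},
    {"id": 3, "age": 35},
]
print(clean(raw, "age"))
# [{'id': 1, 'age': 25}, {'id': 2, 'age': 30}, {'id': 3, 'age': 35}]
```

Duplicates are removed before the mean is computed so that repeated records do not bias the fill value.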
ii. Data transformation. Data transformation is the process of transforming and
consolidating the data into forms that are suitable for mining. Data transformation
normally involves normalization, aggregation, generalization and so on. After data
transformation, the available data is ready for data mining.
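Normalization, the first transformation listed, can be sketched with min-max scaling, one common normalization method (an assumption: the answer does not name a specific method). The function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the
    largest maps to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: there is no range to rescale.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


print(min_max_normalize([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```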
iii. Data sampling. Data sampling is a statistical analysis technique used to select,
manipulate and analyze a representative subset of data points in order to identify
patterns and trends in the larger data set being examined. It enables data scientists,
predictive modelers and other data analysts to work with a small, manageable amount of
data about a statistical population, so they can build and run analytical models more
quickly while still producing accurate findings.
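The simplest form of sampling, a simple random sample drawn without replacement, can be sketched with Python's `random` module. The function name `simple_random_sample` and the 10,000-record population are hypothetical; real sampling designs (stratified, cluster, and so on) may be more appropriate depending on the data.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct items uniformly at random, without replacement.
    Passing a seed makes the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical population: 10,000 numbered records.
population = list(range(1, 10_001))
sample = simple_random_sample(population, 100, seed=42)

# The sample mean should be close to, but not exactly, the population mean.
print(sum(sample) / len(sample), sum(population) / len(population))
```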
iv. Data subsetting and manipulation. Subsetting is the process of retrieving just the
parts of large files that are of interest for a specific purpose. This usually occurs in a
client-server setting, where the parts of interest are extracted on the server before
the data is sent to the client over a network. The main purpose of subsetting is to
save network bandwidth and storage space on the client computer.
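A minimal sketch of subsetting: keep only the rows and columns of interest, as a server might do before sending data to a client. The function name `subset` and the sales table are hypothetical.

```python
def subset(rows, columns, keep_row):
    """Return only the requested columns of the rows for which keep_row(row)
    is true."""
    return [{c: row[c] for c in columns} for row in rows if keep_row(row)]


# Hypothetical sales table held on the server.
rows = [
    {"region": "North", "month": "Jan", "sales": 120, "notes": "long free-text comment"},
    {"region": "South", "month": "Jan", "sales": 90, "notes": "long free-text comment"},
    {"region": "North", "month": "Feb", "sales": 150, "notes": "long free-text comment"},
]

# The client asks only for North's monthly sales, so "region" and "notes" are not sent.
print(subset(rows, ["month", "sales"], lambda r: r["region"] == "North"))
# [{'month': 'Jan', 'sales': 120}, {'month': 'Feb', 'sales': 150}]
```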
(8 marks)
6. Explain how to get the median for odd and even dataset.
Given a set of data, arrange the numbers in ascending order, from smallest to largest. If
the number of observations n is odd, the number in the middle of the list is the median.
This is the value of the ((n+1)/2)-th term. Otherwise, if the number of observations is
even, the median is the simple average of the two middle numbers, that is, the simple
average of the (n/2)-th and (n/2+1)-th terms.
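The positional rule above can be checked with a short Python function. The term positions in the text are 1-based, so each is shifted by one to get a Python list index. The function name `median_by_position` is hypothetical; the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_position(data):
    """Median via the ((n+1)/2)-th term (odd n) or the average of the
    (n/2)-th and (n/2+1)-th terms (even n), using 1-based positions."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index (n + 1) // 2 - 1.
        return values[(n + 1) // 2 - 1]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based indices n//2 - 1 and n//2.
    return (values[n // 2 - 1] + values[n // 2]) / 2


print(median_by_position([7, 3, 9, 1, 5]))  # 5
print(median_by_position([8, 2, 6, 4]))     # 5.0
print(median([8, 2, 6, 4]))                 # 5.0
```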
(4 marks)
(4 marks)
8. List 4 types of AI application and give one example for each type.
i. Government- Public safety agencies and utilities have a particular need for machine
learning, since they have multiple sources of data that can be mined for insights.
ii. Financial Services- Banks and other businesses in the financial industry use machine
learning technology to identify important insights in data and to prevent fraud.
iii. Healthcare- wearable devices and sensors that use data to assess the patient’s
health in real time.
iv. Oil and Gas- finding new energy sources, analyzing minerals in the ground,
predicting refinery sensor failure, and streamlining oil distribution to make it more
efficient and cost-effective.
(12 marks)
(6 marks)
10. Briefly explain two differences between supervised learning and unsupervised
learning.
(8 marks)
11. Describe how the classification task can be performed using a significant example for
this task.
The Performance (Classification) operator should be used only for evaluating the
performance of classification tasks. Many other performance evaluation operators are also
available in RapidMiner, such as the Performance operator, the Performance (Binominal
Classification) operator and the Performance (Regression) operator. Whereas the
Performance (Classification) operator is used with classification tasks only, the
Performance operator automatically determines the learning task type and calculates the
most common criteria for that type. You can use the Performance (User-Based) operator if
you want to write your own performance measure.
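In RapidMiner this evaluation is done by the operators above. The Python sketch below is only an illustration of two common classification criteria, accuracy and the confusion matrix, computed by hand; it is not RapidMiner's implementation, and the function name `evaluate_classification` and the "yes"/"no" labels are hypothetical.

```python
from collections import Counter


def evaluate_classification(actual, predicted):
    """Return (accuracy, confusion) where confusion counts each
    (actual label, predicted label) pair."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    confusion = Counter(zip(actual, predicted))
    # Correct predictions lie on the diagonal, where actual == predicted.
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return correct / len(actual), confusion


# Hypothetical labels for six test examples of a "buys product?" classifier.
actual = ["yes", "no", "yes", "yes", "no", "no"]
predicted = ["yes", "no", "no", "yes", "no", "yes"]
accuracy, confusion = evaluate_classification(actual, predicted)
print(accuracy)                  # 0.6666666666666666
print(confusion[("yes", "no")])  # 1
```

Four of the six predictions match, so accuracy is 4/6; the off-diagonal cells show one "yes" predicted as "no" and one "no" predicted as "yes".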
Classification
i. The number of classes is known.
ii. Popular algorithms for classification include Naïve Bayes Classifier, Decision
Trees and Random Forests.
Clustering
i. The number of classes is unknown.
ii. Popular algorithms used for clustering include K-Means, Mean-Shift
Clustering, and Density-Based Spatial Clustering of Applications with Noise.
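To contrast with classification, the sketch below groups unlabeled one-dimensional points with a bare-bones K-Means. K-Means still takes the number of clusters k as an input; what is unknown is the class of each point, since no labels are supplied. The function name `kmeans_1d` and the evenly spaced initialization are simplifying assumptions, not how library implementations choose starting centroids.

```python
def kmeans_1d(points, k, iterations=100):
    """Cluster 1-D points into k groups using no class labels.
    Returns (centroids, clusters)."""
    data = sorted(points)
    if not 0 < k <= len(data):
        raise ValueError("k must be between 1 and the number of points")
    # Naive initialization: pick k points spread evenly over the sorted data.
    centroids = [data[i * (len(data) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments no longer change the centroids.
        centroids = new_centroids
    return centroids, clusters


print(kmeans_1d([1, 2, 3, 10, 11, 12], k=2))
# ([2.0, 11.0], [[1, 2, 3], [10, 11, 12]])
```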
(8 marks)
a. Business
Data analysis is now broadly available at lower cost. Businesses can benefit from
taking data analytics to new levels, using information technology to support accurate,
consistent business experimentation that guides decision makers, and to examine
outputs, business models and improvements in customer experience. Financial
institutions are strong experimenters and continually refine the methods they use
to segment credit card customers. Companies in various sectors have gained
crucial insight from the structured data collected from different enterprise
systems and analyzed by commercial database management systems.
b. Medical
Data analytics in medical organizations can benefit the community. One
benefit is that diseases can be detected at an early stage through the
analysis of such large volumes of information, so that proper care and
treatment can be provided to an individual immediately and effectively. Data
analytics can also suggest measures that reduce people's healthcare
expenditure and help them lead a healthy life by taking early care based on
predictive information. Another area in which data analytics brings benefits is
identifying the patients who use the most health resources and are at the
greatest risk of adverse outcomes.
(16 marks)