
Universiti Teknologi Mara Test: Confidential 1 CS/FEB 2022/UCS551

Uploaded by

Hakim Razak

CONFIDENTIAL 1 CS/FEB 2022/UCS551

UNIVERSITI TEKNOLOGI MARA TEST

COURSE : INTRODUCTION TO DATA ANALYTICS AND APPLICATION
COURSE CODE : UCS551
EXAMINATION : FEB 2022
TIME : 3 HOURS
DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO DO SO

This examination paper consists of 4 printed pages

© Hak Cipta Universiti Teknologi MARA CONFIDENTIAL


NAME: AHMAD HAKIM BIN ABDUL RAZAK


CLASS: LG2414A
ID NO: 2020959863

QUESTION 1

1. Briefly describe the term data analytics.

Data analytics is the process of inspecting, cleansing, transforming and modelling data
with the goal of discovering useful information, suggesting conclusions and
supporting decision-making.

(4 marks)

2. Explain FOUR (4) types of data analytics.

i. Descriptive analytics describes what has happened over a given period of time. Has
the number of views gone up? Are sales stronger this month than last?
ii. Diagnostic analytics focuses more on why something happened. This involves more
diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales? Did the
latest marketing campaign impact sales?
iii. Predictive analytics moves to what is likely to happen in the near term. What
happened to sales the last time we had a hot summer? How many weather models predict a
hot summer this year?
iv. Prescriptive analytics moves into the territory of suggesting a course of action. If the
likelihood of a hot summer, measured as the average of these five weather models, is
above 58%, then we should add an evening shift to the brewery and rent an additional
tank to increase output.

(8 marks)
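
The prescriptive rule in item iv can be sketched as a small decision function. This is only an illustrative Python sketch: the function name `recommend_action` and the five probabilities are invented for the example, and only the 58% threshold comes from the answer above.

```python
from statistics import mean

def recommend_action(model_probs, threshold=0.58):
    """Suggest an action from hot-summer likelihoods given as values in 0.0 to 1.0."""
    # Average the likelihoods reported by the individual weather models.
    likelihood = mean(model_probs)
    if likelihood > threshold:
        return "add evening shift and rent an additional tank"
    return "keep current capacity"

# Hypothetical outputs of five weather models (invented numbers).
probs = [0.70, 0.55, 0.62, 0.66, 0.60]
print(recommend_action(probs))
```

The descriptive, diagnostic and predictive steps supply the inputs; the prescriptive step is the part that turns them into a recommended action.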

3. List FOUR (4) types of data and provide ONE (1) example for each.
i. Structured and unstructured data: email, documents, images (unstructured examples)
ii. Data structure: vector, array, matrix
iii. Level of measurement: nominal, ordinal, interval
iv. Univariate data: height

(8 marks)

4. Explain the difference between vector and array.

A vector is a one-dimensional collection of values that all have the same data type,
while an array is a collection of elements of the same type, placed in contiguous
memory locations, that can be individually referenced by using an index with a unique
identifier.

(4 marks)

5. Describe FOUR (4) data processing techniques that can be used in processing the raw
data.

i. Data cleaning. Data cleaning is the process of detecting and correcting problems in the
data. Data in the real world is normally incomplete, noisy and inconsistent. The data
available in data sources might be lacking attribute values, data of interest etc. Data
cleaning involves a number of techniques, including filling in the missing values manually
and combined computer and human inspection. The output of the data cleaning process is
adequately cleaned data.
ii. Data transformation. Data transformation is the process of transforming and
consolidating the data into different forms that are suitable for mining. Data transformation
normally involves normalization, aggregation, generalization etc. After data
transformation, the available data is ready for data mining.
iii. Data sampling. Data sampling is a statistical analysis technique used to select,
manipulate and analyze a representative subset of data points to identify patterns and
trends in the larger data set being examined. It enables data scientists, predictive
modelers and other data analysts to work with a small, manageable amount of data about
a statistical population to build and run analytical models more quickly, while still
producing accurate findings.
iv. Data subsetting and manipulating. Subsetting is the process of retrieving just the
parts of large files which are of interest for a specific purpose. This usually occurs in a
client-server setting, where the extraction of the parts of interest occurs on the server
before the data is sent to the client over a network. The main purpose of subsetting is to
save bandwidth on the network and storage space on the client computer.
(8 marks)
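
The four techniques above can be sketched on a tiny in-memory data set. This is a minimal Python illustration, not a prescribed method: the records, the `score` field, the mean-fill strategy and the 0.5 cut-off are all invented assumptions, and min-max normalization stands in for "normalization" in general.

```python
import random
from statistics import mean

# Invented raw records; None marks a missing attribute value.
raw = [
    {"id": 1, "score": 40.0},
    {"id": 2, "score": None},
    {"id": 3, "score": 80.0},
    {"id": 4, "score": 60.0},
]

# 1. Cleaning: fill missing scores with the mean of the known scores.
known = [r["score"] for r in raw if r["score"] is not None]
fill = mean(known)
cleaned = [{**r, "score": fill if r["score"] is None else r["score"]} for r in raw]

# 2. Transformation: min-max normalization of the scores to the range [0, 1].
scores = [r["score"] for r in cleaned]
lo, hi = min(scores), max(scores)
normalized = [{**r, "score": (r["score"] - lo) / (hi - lo)} for r in cleaned]

# 3. Sampling: draw a random subset of two records (seeded for repeatability).
rng = random.Random(0)
sample = rng.sample(normalized, k=2)

# 4. Subsetting: keep only the records of interest for a specific purpose.
subset = [r for r in normalized if r["score"] >= 0.5]

print(normalized)
print(subset)
```

In a client-server setting the subsetting step would run on the server, so that only `subset` travels over the network.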

6. Explain how to get the median for odd and even dataset.

Given a set of data, arrange the numbers in ascending order from smallest to largest. If
the number of observations is odd, the number in the middle of the list is the median.
This can be found by taking the value of the (n+1)/2-th term, where n is the number
of observations. If the number of observations is even, the median is the
simple average of the middle two numbers, that is, the average of the n/2-th and the
(n/2+1)-th terms.

(4 marks)
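
The odd and even cases described above can be written as a short Python function. This is a sketch for illustration; the function name and the sample numbers are invented, and the index arithmetic shifts by one because Python lists are 0-based while the formula counts terms from 1.

```python
def median(values):
    """Median using the (n+1)/2-th term (odd n) or the mean of the n/2-th and (n/2+1)-th terms (even n)."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd: the (n+1)/2-th term is at 0-based index (n+1)//2 - 1.
        return ordered[(n + 1) // 2 - 1]
    # Even: average the n/2-th and (n/2+1)-th terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([7, 3, 9]))     # 7
print(median([7, 3, 9, 1]))  # 5.0
```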

7. Briefly explain the importance of histogram in data visualization.

A histogram provides a visual representation of the distribution of a dataset: its location,
spread and skewness. It helps to visualize whether the distribution is
symmetric or skewed left or right, and whether it is unimodal, bimodal or
multimodal. It can also show any outliers or gaps in the data. Histograms can
display a large amount of data together with the frequency of its values. A histogram
function calculates and returns a frequency distribution, which we can use to get the
frequency of values in a dataset.

(4 marks)
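
The frequency distribution behind a histogram can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function `frequency_distribution`, the bin width of 10 and the marks are invented, and the values are assumed to be integers.

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Invented exam marks.
marks = [52, 67, 71, 58, 90, 73, 69, 55]
width = 10
dist = frequency_distribution(marks, width)

# Text histogram; empty bins are printed too, so gaps stay visible.
for start in range(min(dist), max(dist) + width, width):
    bar = "#" * dist.get(start, 0)
    print(f"{start:3d}-{start + width - 1:3d} | {bar}")
```

In this invented sample the 80-89 bin is empty, which is the kind of gap a histogram makes easy to spot.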

8. List 4 types of AI application and give one example for each type.

i. Government: Public safety and utilities have a particular need for machine learning
since they have multiple sources of data that can be mined for insights.
ii. Financial services: Banks and other businesses in the financial industry use machine
learning technology to identify important insights in data and to prevent fraud.
iii. Healthcare: Wearable devices and sensors can use data to assess the patient’s
health in real time.
iv. Oil and gas: Finding new energy sources, analyzing minerals in the ground,
predicting refinery sensor failure, and streamlining oil distribution to make it more
efficient and cost-effective.

(12 marks)

9. Explain the concept of learning in machine learning.


Learning is one of the fundamental building blocks of AI solutions. Learning is a
process that improves the knowledge of an AI program by making observations about
its environment. The AI learning process focuses on processing a collection of
input-output pairs for a specific function and predicting the output for new inputs.

(6 marks)
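
Learning from input-output pairs can be illustrated with a very small Python sketch. The technique used here is 1-nearest-neighbour, chosen only because it is short; the answer above does not name any particular algorithm, and the function name and data are invented.

```python
def predict(pairs, x):
    """Predict the output for x by copying the output of the closest known input."""
    nearest_input, nearest_output = min(pairs, key=lambda pair: abs(pair[0] - x))
    return nearest_output

# Invented input-output pairs: hours studied -> exam result.
pairs = [(1, "fail"), (2, "fail"), (6, "pass"), (8, "pass")]

print(predict(pairs, 7))  # pass
print(predict(pairs, 3))  # fail
```

The observed pairs are the program's "knowledge"; adding more pairs changes what it predicts for new inputs.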

10. Briefly explain two differences between supervised learning and unsupervised
learning.

The first difference is that supervised learning is a process of adjusting weights in a
neural net using a learning algorithm and labelled responses, while unsupervised
learning produces the output based on input data without labelled responses.
The second difference is that supervised learning is designed to perform pattern
classification, while unsupervised learning uses cluster analysis, which is used for
exploratory data analysis to find hidden patterns or groupings in data.

(8 marks)

11. Describe how the classification task can be performed using a significant example for
this task.

In RapidMiner, the Performance (Classification) operator should be used for performance
evaluation of classification tasks only. Many other performance evaluation operators are
also available in RapidMiner, e.g. the Performance operator, the Performance (Binominal
Classification) operator and the Performance (Regression) operator. The Performance
operator, on the other hand, automatically determines the learning task type and
calculates the most common criteria for that type. You can use the Performance
(User-Based) operator if you want to write your own performance measure.

Classification is a technique used to predict group membership for data
instances. For example, you may wish to use classification to predict whether the train
on a particular day will be 'on time', 'late' or 'very late'. Predicting whether the number
of people at a particular event will be 'below-average', 'average' or 'above-average' is
another example. To evaluate the statistical performance of a classification model,
the data set should be labeled, i.e. it should have an attribute with the label role and an
attribute with the prediction role. The label attribute stores the actual observed values,
whereas the prediction attribute stores the values of the label predicted by the
classification model under discussion.
(10 marks)
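
Comparing the label attribute with the prediction attribute can be sketched in plain Python. This is not RapidMiner's implementation, only an illustration of the idea; the six observations are invented, and accuracy and confusion counts are just two common criteria.

```python
from collections import Counter

# Invented label (observed) and prediction values for six train journeys.
label = ["on time", "late", "on time", "very late", "late", "on time"]
prediction = ["on time", "late", "late", "very late", "on time", "on time"]

# Accuracy: share of examples where the prediction matches the label.
correct = sum(1 for actual, predicted in zip(label, prediction) if actual == predicted)
accuracy = correct / len(label)

# Confusion counts: (actual, predicted) -> number of examples.
confusion = Counter(zip(label, prediction))

print(f"accuracy = {accuracy:.2f}")
for (actual, predicted), count in sorted(confusion.items()):
    print(f"actual={actual!r:12} predicted={predicted!r:12} count={count}")
```

The off-diagonal pairs, such as an 'on time' train predicted as 'late', show where the model makes its mistakes.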
12. Differentiate classification and clustering. (Give TWO (2) differences)

Classification
i. The number of classes is known.
ii. Popular algorithms for classification include Naïve Bayes Classifier, Decision
Trees and Random Forests.
Clustering
i. The number of classes is unknown.
ii. Popular algorithms used for clustering include K-Means, Mean-Shift
Clustering, and Density-Based Spatial Clustering of Applications with Noise.

(8 marks)
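
The clustering side of this comparison can be illustrated with a one-dimensional K-Means sketch in Python. The data, the starting centers and the function name `kmeans_1d` are invented for the example. K-Means still needs the analyst to choose how many groups to look for, but unlike classification it is given no labelled classes.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D K-Means: assign each point to its nearest center, then recompute centers."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # New center = mean of its group; an empty group keeps its old center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Unlabelled, invented measurements and two arbitrary starting centers.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)  # [2.0, 11.0]
print(groups)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

A classifier would instead be trained on points that already carry a class label, and would assign new points to one of those known classes.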

13. Discuss how data analytics can benefit these areas:

a. Business
Data analysis is now broadly available at lower cost. Businesses can use data
analytics at new levels, using information technology to support accurate,
stable business experimentation that guides decision makers and to examine
outputs, business models and improvements in customer experience.
Financial institutions are strong and leading experimenters that keep
refining their methods for segmenting credit card customers. Companies in
various sectors have gained crucial insight from the structured data collected
from different enterprise systems and analyzed by commercial database
management systems.

b. Medical
Data analytics in medical organizations can be beneficial to the community.
One of the benefits is that disease can be detected at an early stage through
the analysis of large volumes of information, so that proper care and treatment
can be provided immediately and effectively to an individual. Data analytics can
suggest measures that reduce people's healthcare expenditure and help them
lead a healthy life by taking early care based on predictive
information. Another area in which data analytics brings added benefit is
identifying the patients who use the most health resources and are at the
greatest risk of adverse outcomes.
(16 marks)

END OF QUESTION PAPER

