Lecture 1 - Introductory to Data Analytics

Data Analytics (BFM4633) | Dr. Ahmad Fakhri Ab.Nasir | [email protected] | 019-9857871

1. Introductory to Data Analytics

Before we go into the details of the processes involved in data analytics, we need to understand its terminology and definitions. This chapter also discusses the domains, applications, tools, and importance of data analytics.

1.1 Definition of Data

Source of definition: Wikipedia (https://en.m.wikipedia.org/wiki/Data)

Data is a set of values of subjects with respect to qualitative or quantitative variables.

i. Quantitative variables are numerical variables: counts, percentages, or other numbers.
ii. Qualitative variables, also called categorical variables, are variables that are not numerical. They describe data that fits into categories. For example: (i) eye colors (categories include: blue, green, brown, hazel), (ii) states (categories include: Florida, New Jersey, Washington), (iii) dog breeds (categories include: Alaskan Malamute, German Shepherd, Siberian Husky, Shih Tzu).
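To make the distinction concrete, here is a minimal Python sketch that sorts the columns of a small table into quantitative and qualitative variables. The records are made up, and the rule "a column is quantitative if every value is an int or a float" is an illustrative assumption, not a definition from the lecture.

```python
# Hypothetical records: each key is a variable, each value is one observation.
records = [
    {"eye_color": "blue", "state": "Florida", "age": 21, "height_cm": 170.5},
    {"eye_color": "brown", "state": "New Jersey", "age": 34, "height_cm": 162.0},
    {"eye_color": "hazel", "state": "Washington", "age": 28, "height_cm": 180.2},
]

def classify_variables(rows):
    """Split variable names into quantitative (numeric) and qualitative (categorical)."""
    quantitative, qualitative = [], []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            quantitative.append(name)
        else:
            qualitative.append(name)
    return quantitative, qualitative

quant, qual = classify_variables(records)
print("Quantitative:", quant)  # ['age', 'height_cm']
print("Qualitative:", qual)    # ['eye_color', 'state']
# The distinct values of a qualitative variable are its categories.
print(sorted({row["eye_color"] for row in records}))  # ['blue', 'brown', 'hazel']
```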

Figure 1: Some of the different types of data

The terms data, information, and knowledge are often used interchangeably; however, data becomes information when it is viewed in context or after analysis. While the concept of data is commonly associated with scientific research, data is collected by a huge range of organizations and institutions, including businesses (e.g., sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment rates, literacy rates) and non-governmental organizations (e.g., censuses of the number of homeless people by non-profit organizations).

Data is measured, collected, reported, and analyzed, after which it can be visualized using graphs, images, or other analysis tools. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing. Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs in stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data that is collected in an uncontrolled "in situ" environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording. Data has been described as the new oil of the digital economy.
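The cleaning idea and the "processed data of one stage is the raw data of the next" idea can be sketched in a few lines of Python. The readings and the plausible range of -60 to 15 °C are assumptions chosen to mirror the Arctic thermometer example; this is a simple range check, not a statistical outlier-detection method.

```python
# Hypothetical raw readings (deg C) from an outdoor Arctic station; None marks a missing entry.
raw_readings = [-31.5, -29.8, None, 35.2, -30.1, -300.0, -28.7]

# Assumed plausibility range for this location; a real SOP would define its own limits.
PLAUSIBLE_MIN, PLAUSIBLE_MAX = -60.0, 15.0

def remove_missing(readings):
    """Stage 1: drop missing entries."""
    return [r for r in readings if r is not None]

def remove_implausible(readings, low=PLAUSIBLE_MIN, high=PLAUSIBLE_MAX):
    """Stage 2: drop obvious instrument or data entry errors outside [low, high]."""
    return [r for r in readings if low <= r <= high]

# The processed output of stage 1 becomes the raw input of stage 2.
stage1 = remove_missing(raw_readings)
stage2 = remove_implausible(stage1)
print(stage1)  # [-31.5, -29.8, 35.2, -30.1, -300.0, -28.7]
print(stage2)  # [-31.5, -29.8, -30.1, -28.7]
```

The tropical 35.2 °C reading and the impossible -300.0 °C entry are both dropped in stage 2.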

1.2 Other Definitions of Data

i. Data in Computing Field


Examples: Character, string, integer, float, double etc.
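As a small illustration of these primitive types, Python's standard `struct` module can report the standard storage size of each one. The sizes below are the C-style standard sizes that `struct` defines (the `<` prefix selects standard sizes with no padding); Python's own built-in types are only `int`, `float` (a C double) and `str`.

```python
import struct

# Standard sizes of the primitive types listed above, via struct format codes.
formats = {"char": "<c", "integer (int)": "<i", "float": "<f", "double": "<d"}
for name, fmt in formats.items():
    print(f"{name}: {struct.calcsize(fmt)} byte(s)")

# A string is a sequence of characters; in Python it is the built-in str type.
word = "data"
print(type(word).__name__, len(word))  # str 4

# A 32-bit float keeps fewer significant digits than a 64-bit double.
as_float = struct.unpack("<f", struct.pack("<f", 0.1))[0]
print(as_float)  # 0.10000000149011612
print(0.1)       # 0.1 (Python's float is a C double)
```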

ii. Dataset
A data set (or dataset) is a collection of data. In the case of tabular data, a data set
corresponds to one or more database tables, where every column of a table represents
a particular variable, and each row corresponds to a given record of the data set in
question. The data set lists values for each of the variables, such as height and weight
of an object, for each member of the data set. Each value is known as a datum. Data
sets can also consist of a collection of documents or files
[https://en.wikipedia.org/wiki/Data_set].
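The tabular view described above (columns are variables, rows are records, each cell is a datum) can be sketched in Python as follows; the object heights and weights are made-up values.

```python
# A tiny hypothetical tabular dataset: columns are variables, rows are records.
columns = ["object_id", "height_cm", "weight_kg"]
rows = [
    (1, 170.0, 65.2),
    (2, 158.5, 52.0),
    (3, 182.3, 80.1),
]

# Each individual value (one cell) is a datum.
n_datums = sum(len(row) for row in rows)
print(f"{len(columns)} variables x {len(rows)} records = {n_datums} datums")

# Reading one variable means reading one column across all records.
heights = [row[columns.index("height_cm")] for row in rows]
print(heights)  # [170.0, 158.5, 182.3]
```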

iii. Database
A database is an organized collection of data, generally stored and accessed
electronically from a computer system. Where databases are more complex they
are often developed using formal design and modeling techniques. The database
management system (DBMS) is the software that interacts with end users, applications,
and the database itself to capture and analyze the data. The DBMS software additionally
encompasses the core facilities provided to administer the database. The sum total of
the database, the DBMS and the associated applications can be referred to as a
"database system". Often the term "database" is also used to loosely refer to any of the
DBMS, the database system or an application associated with the database. Computer
scientists may classify database-management systems according to the database model
that they support. Relational databases became dominant in the 1980s. These model
data as rows and columns in a series of tables, and the vast majority use SQL for writing
and querying data. In the 2000s, non-relational databases became popular, referred to
as NoSQL because they use different query languages [https://en.wikipedia.org/wiki/Database].
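A relational database can be tried out without installing anything, because Python ships with the `sqlite3` module (SQLite acts as the DBMS here). This is a minimal sketch with a hypothetical `students` table; it shows data modelled as rows and columns, with SQL used both to write and to query it.

```python
import sqlite3

# An in-memory relational database (nothing is written to disk).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table models data as rows and columns; the schema is part of the design step.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, cgpa REAL)")
cur.executemany(
    "INSERT INTO students (name, cgpa) VALUES (?, ?)",
    [("Aina", 3.72), ("Hakim", 3.15), ("Mei Ling", 3.88)],  # hypothetical records
)
conn.commit()

# SQL is used both for writing (INSERT above) and for querying (SELECT below).
cur.execute("SELECT name, cgpa FROM students WHERE cgpa >= ? ORDER BY cgpa DESC", (3.5,))
top_students = cur.fetchall()
print(top_students)  # [('Mei Ling', 3.88), ('Aina', 3.72)]
conn.close()
```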

Data versus Dataset: In my opinion, data is purely raw information, whereas a dataset is established (processed) data that has undergone a standard operating procedure (SOP) to produce a good set of data. For example, in the image processing field, the images for some applications are normally taken at the same image size and in a controlled environment (no lighting effects), etc. If you collect data yourself, make sure you follow the SOP for data collection in your field/area. A quick Google search will turn up many SOPs already developed by previous researchers.

Database versus Dataset: In my experience, a database (e.g., a cloud database) is larger than a dataset. It may consist of several datasets (many tables). It also uses programming statements and proper design and modelling techniques to organize the data well.
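The relationship can be sketched with Python's built-in `sqlite3` module: a hypothetical database holds two related tables, and a dataset is one flat table pulled out of it with a query and exported as CSV. The table names and values are assumptions for illustration only.

```python
import csv
import io
import sqlite3

# Hypothetical database holding several related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species (species_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, species_id INTEGER,
                          petal_length REAL,
                          FOREIGN KEY (species_id) REFERENCES species (species_id));
    INSERT INTO species VALUES (1, 'setosa'), (2, 'versicolor');
    INSERT INTO samples VALUES (1, 1, 1.4), (2, 1, 1.3), (3, 2, 4.7);
""")

# A dataset is one flat table extracted from the database with a query.
query = """
    SELECT s.sample_id AS sample_id, sp.name AS species, s.petal_length AS petal_length
    FROM samples AS s JOIN species AS sp ON s.species_id = sp.species_id
    ORDER BY s.sample_id
"""
cursor = conn.execute(query)
header = [col[0] for col in cursor.description]
dataset = cursor.fetchall()
conn.close()

# Export the dataset as CSV text (write to a file instead for real use).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(dataset)
print(buffer.getvalue())
```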

Figure 2: Database versus dataset illustration
Source: (https://csharp-station.com/Tutorial/AdoDotNet/Lesson05)

1.3 Dataset Sample and Its Importance

You can find many established datasets on the internet, for example at https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research. You can try to download some of the datasets from the lists given. Notice that data comes in several formats, such as text, sound (wav, mp3, etc.), image, ARFF, CSV, video, etc.
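ARFF is the native format of Weka, the tool used later in this course, while CSV is the most common general-purpose format. The sketch below stores the same two hypothetical records in both formats and reads them back with plain Python. The ARFF reader is a deliberately simplified assumption: it handles only dense data with unquoted names, not sparse rows, quoted strings, or date attributes.

```python
import csv
import io

# The same two records stored in two of the formats mentioned above.
ARFF_TEXT = """% Comment lines start with a percent sign
@relation iris_sample
@attribute sepallength numeric
@attribute petallength numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,1.4,Iris-setosa
7.0,4.7,Iris-versicolor
"""

CSV_TEXT = """sepallength,petallength,class
5.1,1.4,Iris-setosa
7.0,4.7,Iris-versicolor
"""

def read_simple_arff(text):
    """Minimal ARFF reader: dense data only, no quoted names, sparse rows or dates."""
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lowered = line.lower()  # ARFF keywords are case-insensitive
        if lowered.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif lowered.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append(line.split(","))
    return attributes, rows

def read_csv(text):
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, [row for row in reader]

print(read_simple_arff(ARFF_TEXT) == read_csv(CSV_TEXT))  # True
```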

Why do you need data? What will you do with the data? What kind of data do you need? Imagine that you want to make a robot that can follow some instructions (e.g., turn left or right). What will you need? First of all, you need voice data. Why? To make a robot that understands human speech, we need speech data as a speech recognition dictionary. The data can be in English, Malay, Arabic, Mandarin, etc. From the speech/audio signal captured by the microphone (the robot's ear), how can the robot understand what it is being told? You can get the answer from data analytics knowledge. In this case, the audio data will be processed by speech analysis/voice recognition/speech recognition (examples of applications) methods/models/algorithms, which will be discussed and learned further in this course. Finally, the robot converts the spoken instruction, which is in signal form, into a language that it understands (text, etc.).

Figure 3: Voice Recognition Systems
Source: (https://searchcustomerexperience.techtarget.com/definition/voice-recognition-speaker-recognition)

Figure 4: Basic Voice Recognition Mechanisms
Source: (https://www.advanced-media.co.jp/english/aboutus/amivoice)

1.4 Definition of Analytics

Source of definition: Lexico powered by Oxford (https://www.lexico.com/en/definition/analytics)

The systematic computational analysis of data or statistics.

Source of definition: Wikipedia (https://en.wikipedia.org/wiki/Analytics)

Analytics is the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns towards effective decision making. In other words, analytics can be understood as the connective tissue between data and effective decision making within an organization. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance.

Based on these two separate definitions (data and analytics), what is data analytics? Give your opinion and arguments.

Before you go further, I would like to explain the difference between data analysis and data analytics, two terms that are easily confused when you start working with data analytics.

Analysis is focused on understanding the past: what happened and why it happened.
Analytics focuses on why it happened and what will happen next
[https://en.wikipedia.org/wiki/Analytics].

To illustrate this, I will use the Iris flower dataset; you can find information about this dataset at https://en.wikipedia.org/wiki/Iris_flower_data_set.

Let’s say I want to perform data analysis on this dataset. What does it involve?

Recall the definition of data analysis: analysis is about understanding the past. From the data, I would like to know the average, minimum and maximum, kurtosis, skewness, etc. of the sepal length, sepal width, petal length, and petal width for each species. I then calculate all of the required statistics, and from them I can draw conclusions about the petal and sepal sizes of each species. This is called data analysis. Data analytics, by contrast, goes further: suppose I have an unlabelled sample; to which class (species) does it belong? Answering this question is called prediction, estimation, or recognition.
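Both halves of the Iris example can be sketched in plain Python on a small excerpt (the first five rows of each species from the commonly distributed version of the dataset). The `describe` function performs the data analysis step; the `predict` function performs a simple data analytics step. Nearest-centroid classification is chosen here only because it is easy to follow; it is not necessarily the method used later in the course. Skewness and kurtosis use the population formulas (excess kurtosis), so tools such as Weka or Orange, which may apply sample corrections, can report slightly different values.

```python
import math
from statistics import mean

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# First five rows of each species from the commonly distributed Iris dataset (cm).
IRIS = {
    "setosa": [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
               [4.6, 3.1, 1.5, 0.2], [5.0, 3.6, 1.4, 0.2]],
    "versicolor": [[7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
                   [5.5, 2.3, 4.0, 1.3], [6.5, 2.8, 4.6, 1.5]],
    "virginica": [[6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9], [7.1, 3.0, 5.9, 2.1],
                  [6.3, 2.9, 5.6, 1.8], [6.5, 3.0, 5.8, 2.2]],
}

def moments(values):
    """Population skewness and excess kurtosis; nan when all values are equal."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    if m2 == 0:
        return math.nan, math.nan
    m3 = mean((v - m) ** 3 for v in values)
    m4 = mean((v - m) ** 4 for v in values)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Data analysis: describe what the labelled data already tells us.
def describe(samples):
    summary = {}
    for i, name in enumerate(FEATURES):
        column = [s[i] for s in samples]
        skew, kurt = moments(column)
        summary[name] = {"mean": mean(column), "min": min(column),
                         "max": max(column), "skewness": skew, "kurtosis": kurt}
    return summary

# Data analytics: predict the species of an unlabelled sample (nearest centroid).
CENTROIDS = {sp: [mean(col) for col in zip(*rows)] for sp, rows in IRIS.items()}

def predict(sample):
    return min(CENTROIDS, key=lambda sp: math.dist(sample, CENTROIDS[sp]))

for species, rows in IRIS.items():
    stats = describe(rows)["petal_length"]
    print(species, round(stats["mean"], 2), stats["min"], stats["max"])

print(predict([5.0, 3.4, 1.5, 0.2]))  # setosa
print(predict([6.7, 3.0, 5.7, 2.2]))  # virginica
```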

1.5 Data Analytics Domain and Its Application

Organizations may apply analytics to business data to describe, predict, and improve business
performance. Specifically, areas within analytics include predictive analytics, prescriptive
analytics, enterprise decision management, descriptive analytics, cognitive analytics,
Big Data Analytics, retail analytics, supply chain analytics, store assortment and stock-
keeping unit optimization, marketing optimization and marketing mix modeling, web
analytics, call analytics, speech analytics, sales force sizing and optimization, price and
promotion modeling, predictive science, credit risk analysis, and fraud analytics. Since
analytics can require extensive computation (see big data), the algorithms and software used
for analytics harness the most current methods in computer science, statistics, and mathematics
[https://en.wikipedia.org/wiki/Analytics].
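The difference between descriptive, predictive, and prescriptive analytics can be shown on one tiny business example. In this sketch the monthly sales figures are hypothetical, the straight-line least-squares trend is chosen only for simplicity, and the 10% safety-margin stocking rule is an assumed business rule, not a standard method.

```python
import math
from statistics import mean

# Hypothetical monthly sales (units) for one product.
sales = [10, 12, 14, 16]

# Descriptive analytics: what happened?
average_sales = mean(sales)

# Predictive analytics: what will happen next? (least-squares straight-line trend)
def linear_forecast(y, steps_ahead=1):
    x = list(range(len(y)))
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    return intercept + slope * (len(y) - 1 + steps_ahead)

next_month = linear_forecast(sales)

# Prescriptive analytics: what should we do? Assumed rule: stock the forecast
# plus a 10% safety margin, rounded up to whole units.
SAFETY_MARGIN = 0.10
recommended_stock = math.ceil(next_month * (1 + SAFETY_MARGIN))

print(average_sales, next_month, recommended_stock)
```

With these numbers the trend rises by 2 units per month, so the forecast for the next month is 18 units and the recommended stock is 20 units.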


Simple exercise: In groups, identify a domain of data analytics, an application, how it works, and the impact of the developed system on mankind/society. You can refer to Table 1 as an example. Write at least 10 examples, with two examples for each domain. You can also add your final year project (FYP) to the list.

Table 1: Example of data analytics domain and its application


No. | Problem domain | Application | How it works | Impact to mankind/society
--- | --- | --- | --- | ---
1 | Transportation | Traffic monitoring | Videos from CCTV are analyzed to count the vehicles | Automates the process
  |  | Parking storage | Images of vehicles' plates are captured | Helps people find a misplaced vehicle in the car park
2 | Industrial automation | Fruit sorting | Images of fruit are taken from the conveyor | Speeds up the sorting process with less human error
  |  | PCB inspection | Images of PCBs are taken and processed | Speeds up the inspection with less human error
3 | Internet Search | Google Search | Provides information from all over the world to the user | Information is easy to gain and access
4 | Education | Career Prediction | Updates on students' progress based on their strengths, weaknesses and interests | Helps in determining a suitable career for students
5 | Healthcare | Health tracker | Helps to monitor the patient's health and condition | Eases the work of nurses and doctors in treating patients as they are
6 | Delivery Logistics | Package Transport | GPS is installed in the package to keep track of it | Faster delivery; the package can be monitored and the best shipping route or delivery time found
7 | Websites | Price Comparison | Compares the prices of a product from multiple vendors | Makes it easy to choose based on the data given
8 | Airline | Route Planning | Predicts flight delays or where to make a stop between destinations | Helps manage the departure and arrival times of planes
9 | Safety | Traffic light Control | Bad traffic on the road | Helps manage the traffic on the road
10 | Banking | Finance | Detects credit card, insurance and accounting fraud | Helps manage resources wisely

1.6 Other Terminologies


When you go deeper into this field, you will find many other terms that may make you even more confused. Don’t go crazy, because you are important in this life to build a better nation, as well as for your family, society, etc.

Some terms that overlap with the data analytics field are data science, data mining, pattern recognition, machine learning, artificial intelligence, deep learning, computer vision, image processing, etc. You can google them yourself to understand each term.

So what are the differences between them?


My summary of the definitions of each term:
● data analytics = data science = data mining = machine learning = pattern recognition (my aim/scope for this course)
● artificial intelligence is an extension of machine learning involving natural language processing (speech recognition), computer vision, and deep learning (artificial human brain); this will be covered in the Artificial Intelligence class, which is another elective subject for the BFM program. Another class is Digital Signal Processing, which is specifically for handling/processing signal data (continuous/sampled data).
● deep learning is also an extension of machine learning
● computer vision involves high-level understanding of digital images or videos
● image processing performs operations on digital images

To understand this, here are some examples:

(i) Speech Recognition - artificial intelligence (keyword: natural language processing)

(ii) Plant Species Recognition - pattern recognition

(iii) Automatic identification of false positive and false negative RFID readings (keyword: signal processing)

(iv) Human Detection based on R-CNN - deep learning (keyword: neural network)
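Example (iii) relies on the terms false positive and false negative. For an RFID reader, a false positive means the reader reports a tag that is not actually there, and a false negative means it misses a tag that is there. The sketch below counts these cases on hypothetical readings; it only illustrates the terms and is not the detection method used in that example.

```python
# Hypothetical RFID readings: True means "tag present".
actual    = [True, True, False, False, True, False]  # ground truth
predicted = [True, False, True, False, True, False]  # what the reader reported

def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for a binary detector."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)              # tag there, reported
    fp = sum((not a) and p for a, p in pairs)        # tag absent, reported anyway
    fn = sum(a and (not p) for a, p in pairs)        # tag there, missed
    tn = sum((not a) and (not p) for a, p in pairs)  # tag absent, not reported
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

print(confusion_counts(actual, predicted))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```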


Data Analytics scope:

i. Data in quantitative form (no images, no videos) and no continuous data (no signal data).
ii. The learning type must exclude neural network processing.
iii. No involvement of natural language processing.

Based on your understanding of all these terms, map the suitable term to each of the examples that you listed in Table 1.

1.7 Data Analytics Tools

There are many free tools available for data analytics tasks. From my Google search, here are some of the lists available:

i. https://bigdata-madesimple.com/top-30-big-data-tools-data-analysis/
ii. https://www.octoparse.com/blog/top-30-big-data-tools-for-data-analysis
iii. https://financesonline.com/data-analytics/

You can google it yourself if you want. There is a great deal of software available for different purposes, each with its own pros and cons. For this class, I will focus on the Weka (https://www.cs.waikato.ac.nz/ml/weka/) and Orange (https://orange.biolab.si/) tools.


Therefore, you need to install this software on your PC. Both tools are free, open-source, and ready to use.

So why do we need to learn data analytics if there are so many tools available?

1.8 Why Do We Need Data Analytics and Tools?

i. Understanding data analytics algorithms and models is expensive and time-consuming, so this knowledge is valuable: with it you can become a data analyst, which is among the trending jobs nowadays.
ii. Everything needs to be automated. To automate things, you need to understand data analytics.
iii. Tonnes of algorithms and models need to be verified.
iv. There are many different datasets (scenarios, case studies, etc.).
v. Why do we need data analytics tools? They are important for data visualization and provide standard, established algorithms.
