
LITE MATTERS 2023

LESSON 4.1

BIG DATA ANALYTICS

Data is the fingerprint of creation, and analytics is the new "Queen of Sciences." Hardly
any human activity, business decision, strategy, or physical entity fails to produce data or to
involve data analytics in informing it. As a result, data analytics has become core to our endeavors,
from business to medicine, research, management, product development, and all facets of life.

From a business perspective, data is now viewed as the new gold—and data analytics,
the machinery that mines, molds, and mints it. Data analytics is a set of computer-enabled
analytics methods, processes, and disciplines of extracting and transforming raw data into
meaningful insight, discovery, and knowledge that helps make more effective decisions.
Another definition describes it as the discipline of extracting and analyzing data to deliver new
insights about past performance and current operations and predict future events.

Data analytics is gaining significant prominence not just for improving business
outcomes or operational processes; it is also the new tool for improving quality, reducing costs,
and raising customer satisfaction. Moreover, it is fast becoming necessary for operational,
administrative, and even legal reasons.

Data analytics has come a long way and is gaining popularity thanks to the
eruption of the four SMAC technologies: social media, mobility, analytics, and cloud
computing. You might add a fifth for sensors and the Internet of Things (IoT). Each
technology is significant in transforming businesses and the data they generate.

What is Data?

Data (the plural of datum), as defined by the Merriam-Webster Dictionary, refers to the
following: factual information (such as measurements or statistics) used as a basis for
reasoning, discussion, or calculation; information in digital form that can be transmitted or
processed; or information output by a sensing device or organ that includes both useful and
irrelevant or redundant information and must be processed to be meaningful.

In computing, data is information translated into a form efficient for movement or
processing. Relative to today's computers and transmission media, data is information
converted into binary digital form. It is acceptable for data to be used as a singular subject or
a plural subject. Raw data is a term used to describe data in its most basic digital format.

The concept of data in the context of computing has its roots in the work of Claude
Shannon, an American mathematician known as the father of information theory. He ushered
in binary digital concepts by applying two-value Boolean logic to electronic circuits. Binary
digit formats underlie the CPUs, semiconductor memories, disk drives, and many peripheral
devices standard in computing today. Early computer input for control and data took the form
of punch cards, magnetic tape, and hard disks.

Data's importance in business computing became apparent early on through the popularity
of the terms "data processing" and "electronic data processing," which, for a time, came to
encompass the whole gamut of what is now known as information technology. Over the history
of corporate computing, specialization occurred, and a distinct data profession emerged along
with the growth of corporate data processing.

Recall: How is data stored?

Computers represent data, including video, images, sounds, and text, as binary values
using patterns of just two numbers: 1 and 0. A bit is the smallest data unit and represents just
a single value. Usually, storage and memory are measured in megabytes and gigabytes.
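
To make this concrete, here is a minimal Python sketch (illustrative only; the module prescribes no language) showing how a short piece of text is represented as patterns of 1s and 0s:

```python
# A minimal sketch: rendering the bytes of a string as 8-bit binary patterns.
text = "Hi"

# encode() turns the string into bytes; format(b, "08b") shows each byte
# as the 8-bit pattern of 1s and 0s the computer actually stores.
for b in text.encode("utf-8"):
    print(b, format(b, "08b"))

# Prints:
# 72 01001000
# 105 01101001
```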

The units of data measurement continue to grow as the amount of data collected and
stored grows. For example, the relatively new term "brontobyte" denotes data storage equal
to 10 to the 27th power bytes.

Data can be stored in file formats, as in mainframe systems using ISAM and VSAM.
Other file formats for data storage, conversion, and processing include comma-separated
values. These formats continued to find uses across various machine types, even as more
structured-data-oriented approaches gained footing in corporate computing.
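
As a small illustration of the comma-separated values format mentioned above, the following Python sketch (the records and field names are hypothetical) parses a CSV snippet with the standard library:

```python
import csv
import io

# A hypothetical dataset in comma-separated values (CSV) format.
raw = "name,age\nAda,36\nAlan,41\n"

# csv.DictReader parses each row into a dictionary keyed by the header row.
for row in csv.DictReader(io.StringIO(raw)):
    print(row["name"], row["age"])
```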

Greater specialization developed as databases, database management systems, and
relational database technology arose to organize information.

UNIT VALUE
1 byte 8 bits (binary digits)
1 kilobyte 1024 bytes
1 megabyte 1024 kilobytes
1 gigabyte 1024 megabytes
1 terabyte 1024 gigabytes
1 petabyte 1024 terabytes
1 exabyte 1024 petabytes
1 zettabyte 1024 exabytes
1 yottabyte 1024 zettabytes
1 brontobyte 1024 yottabytes
Table 1 – Common Data Storage Measurements
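
The 1024-based progression in Table 1 lends itself to a small conversion helper. Below is a Python sketch (illustrative, not part of the module) that expresses a raw byte count in the units above:

```python
# Express a byte count in the 1024-based units of Table 1.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes", "brontobytes"]

def human_readable(num_bytes: float) -> str:
    for unit in UNITS:
        # Stop once the value drops below 1024 (or units run out).
        if num_bytes < 1024 or unit == UNITS[-1]:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024

print(human_readable(5_368_709_120))  # 5.00 gigabytes
```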

What is Analytics?

Analytics is a broad term that encompasses the processes, technologies, frameworks,
and algorithms to extract meaningful insights from data. Raw data does not have a meaning
until it is contextualized and processed into useful information. Analytics is the process of
extracting and creating information from raw data by filtering, processing, categorizing,
condensing, and contextualizing the data. This information is then organized and
structured to infer knowledge about the system and/or its users, its environment, and its
operations and progress towards its objectives, thus making systems more intelligent and
efficient.

The analytics goals of the application drive the choice of technologies, algorithms, and
frameworks for analytics. The objectives of the analytics task may be any of the following:

▪ to predict something (for instance, whether a transaction is fraudulent, whether it
will rain on a particular day, or whether a tumor is benign or malignant),
▪ to find patterns in the data (for example, finding the top 10 coldest days in the year,
the pages visited most on a particular website, or the most searched celebrity in a
specific year),
▪ to find relationships in the data (for example, finding similar news articles, similar
patients in an electronic health record system, related products on an eCommerce
website, similar images, or the correlation between news items and stock prices).

Types of Data Analytics

There are four types of analytics. We start with the simplest one and move toward the
more sophisticated types. As it happens, the more complex an analysis is, the more value
it brings.

1. Descriptive Analytics

Descriptive analytics comprises analyzing past data to present it in a summarized form
that can be easily interpreted. Descriptive analytics aims to answer the question: What
has happened? A significant portion of analytics done today is descriptive analytics
through statistical functions such as counts, maximum, minimum, mean, top-N, and
percentage. These statistics help describe patterns in the data and present the data in a
summarized form. Thus, descriptive analytics helps summarize the data, as the sketch
after the examples below illustrates.

Examples are the following:

▪ computing the total number of likes for a particular post
▪ calculating the average monthly rainfall
▪ finding the average number of visitors per month to a website
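
As a concrete illustration, the short Python sketch below (with hypothetical visitor counts) computes the kinds of summary statistics just listed:

```python
# Descriptive analytics in miniature: summarizing hypothetical
# monthly visitor counts for a website.
from statistics import mean

visitors = [1200, 980, 1430, 1100, 1510, 990]  # hypothetical data

print("months observed:", len(visitors))                     # count
print("average visitors:", mean(visitors))                   # mean
print("busiest month:", max(visitors))                       # maximum
print("top 3 months:", sorted(visitors, reverse=True)[:3])   # top-N
```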
2. Diagnostic Analytics

Diagnostic analytics comprises analyzing past data to diagnose the reasons why
certain events happened. Thus, diagnostic analytics aims to answer the question: Why did
it happen?

Let us consider an example of a system that collects and analyzes sensor data from
machines for monitoring their health and predicting failures. While descriptive analytics
can help summarize the data by computing various statistics, diagnostic analytics can
provide more insights into why certain faults have occurred based on the patterns in the
sensor data for previous defects.

3. Predictive Analytics

Predictive analytics comprises predicting the occurrence of an event or its likely
outcome, or forecasting future values, using prediction models. Predictive analytics aims
to answer the question: What is likely to happen?

Examples where predictive analytics can be used include the following:

▪ predicting when a fault will occur in a machine
▪ predicting whether a tumor is benign or malignant
▪ predicting the occurrence of natural emergencies (events such as forest fires or
river floods)
▪ forecasting pollution levels

Predictive analytics is done using prediction models trained on existing data.
These models learn patterns and trends from the existing data and predict the occurrence
of an event or its likely outcome (classification models) or forecast numbers
(regression models). The accuracy of prediction models depends on the quality and
volume of the existing data available for training, such that all the patterns and
trends in the data can be learned accurately. Before a model is used for prediction,
it must be validated with existing data. The typical approach while developing
prediction models is to divide the existing data into training and test data sets (for example,
75% of the data is used for training, and 25% is used for testing the prediction model), as
sketched below.
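
This example uses scikit-learn, one common choice that the module itself does not name, with entirely hypothetical features and labels:

```python
# Sketch of a 75% training / 25% testing split for a classification model.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Hypothetical feature rows and binary labels (e.g., fraud = 1, not fraud = 0).
X = [[0.1, 1.2], [0.4, 0.9], [1.5, 0.2], [1.7, 0.4],
     [0.2, 1.1], [1.6, 0.3], [0.3, 1.0], [1.4, 0.5]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

# Hold out 25% of the data for validating the trained model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy on held-out test data:", model.score(X_test, y_test))
```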

4. Prescriptive Analytics

While predictive analytics uses prediction models to predict the likely outcome of an
event, prescriptive analytics uses multiple prediction models to predict various outcomes
and the best course of action for each outcome. Prescriptive analytics aims to answer the
question: What can we do to make it happen?

Prescriptive analytics can predict possible outcomes based on the current choice of
actions. Therefore, we can consider prescriptive analytics as a type of analytics that uses
different prediction models for different inputs. Prescriptive analytics prescribes actions or
the best option from the available options.

Examples that illustrate the uses of prescriptive analytics are the following:

▪ prescribing the best medicine for treating a patient based on the outcomes of various
medications for similar patients
▪ suggesting the best mobile data plan for a customer based on the customer's
browsing patterns
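
In miniature, prescriptive analytics amounts to evaluating each candidate action with a prediction model and recommending the best one. The toy Python sketch below makes this concrete; the outcome scores are hypothetical stand-ins for what trained models would produce:

```python
# Toy prescriptive loop: score each candidate action with a (stand-in)
# prediction model, then prescribe the action with the best outcome.

def predicted_recovery(medicine: str) -> float:
    # Hypothetical scores a real system would obtain from per-action models
    # trained on outcomes for similar patients.
    return {"medicine_a": 0.72, "medicine_b": 0.81, "medicine_c": 0.64}[medicine]

actions = ["medicine_a", "medicine_b", "medicine_c"]
best = max(actions, key=predicted_recovery)
print("prescribed action:", best)  # medicine_b
```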

What is Big Data?

Big data is a collection of datasets whose volume, velocity, or variety is so large that
storing, managing, processing, and analyzing the data using traditional databases and data
processing tools is difficult. In recent years, there has been exponential growth in the
structured and unstructured data generated by information technology, industrial, healthcare,
Internet of Things, and other systems.

According to an estimate by IBM, 2.5 quintillion bytes of data are created every day.
The volume of data created worldwide in 2022, according to Statista, is estimated at 97
zettabytes, up from the 79 zettabytes generated in 2021. By 2025, the amount generated
annually is expected to be double the 2021 figure. Of all the data in the world at the
moment, approximately 90% is replicated, with only 10% being genuine, new data.

Based on a report by DOMO, the following happens on the web in just 60 seconds,
illustrating the volume and speed at which we create data online:

• 5.9 million Google searches happen.
• Instagram users share 66,000 photos.
• Facebook users post 1.7 million pieces of content.
• People send 231.4 million emails.
• YouTubers upload 500 hours of videos.
• Snapchat users send 4.3 million snaps.
• Twitter users write 347,200 tweets.
• People send 16 million texts.
• Venmo users transfer $437,600.
• Amazon shoppers spend $443,000.

Big data can power the next generation of smart applications that leverage the power
of data to make the applications intelligent. Big data applications span a wide range of
domains: web, retail and marketing, banking and financial, industrial, healthcare,
environmental, Internet of Things, and cyber-physical systems.


Big data analytics involves collecting, storing, processing, and analyzing this massive-scale
data. Specialized tools and frameworks are required for big data analysis when:

1. the volume of data involved is so large that it is difficult to store, process, and analyze
data on a single machine,
2. the velocity of data is very high, and the data needs to be analyzed in real-time,
3. there is a variety of data involved, which can be structured, unstructured, or semi-
structured, and is collected from multiple data sources,
4. various types of analytics need to be performed to extract value from the data, such as
descriptive, diagnostic, predictive, and prescriptive analytics.

Big data tools and frameworks have distributed and parallel processing architectures and
can leverage the storage and computational resources of a large cluster of machines. Some
examples of big data are listed as follows:

▪ Data generated by social networks, including text, images, audio, and video data
▪ Clickstream data generated by web applications such as e-Commerce to analyze user
behavior
▪ Machine sensor data collected from sensors embedded in industrial and energy
systems for monitoring their health and detecting failures
▪ Healthcare data collected in electronic health record (EHR) systems
▪ Logs generated by web applications
▪ Stock markets data
▪ Transactional data generated by banking and financial applications

Types of Big Data

Big data can come in multiple forms, including structured and non-structured data such
as financial, text, multimedia, and genetic mappings. Contrary to much of the traditional data
analysis organizations perform, most big data is unstructured or semi-structured, requiring
different techniques and tools to process and analyze. Distributed computing environments
and massively parallel processing (MPP) architectures that enable parallelized data ingestion
and analysis are the preferred approaches to processing such complex data.


1. Structured Data

Any data that can be stored, accessed, and processed in a fixed format is termed 'structured'
data. Over time, computer science has achieved tremendous success in developing
techniques for working with such data (where the format is well known in advance)
and deriving value from it. However, we now foresee issues as such data
grows to a vast extent; typical sizes are in the range of multiple zettabytes. Data with
a defined data type, format, and structure (transaction data, online analytical processing
[OLAP] data cubes, traditional RDBMS tables, CSV files, and even simple spreadsheets) is an
example of structured data.

2. Semi-structured Data

Semi-structured data can contain elements of both structured and unstructured data. We can
see semi-structured data as having a structured form, but the structure is not defined by, for
example, a table definition in a relational DBMS. An example of semi-structured data is data
represented in an XML file, as the sketch below shows.
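
The Python sketch below (a made-up record set, parsed with the standard library) shows what makes XML semi-structured: the tags give the data shape, but no table definition fixes the fields in advance:

```python
# Parse a small XML fragment: structure comes from the tags, not a schema.
import xml.etree.ElementTree as ET

xml_data = """\
<patients>
  <patient id="1"><name>Ada</name><age>36</age></patient>
  <patient id="2"><name>Alan</name></patient>
</patients>
"""

for p in ET.fromstring(xml_data).findall("patient"):
    # Fields can vary per record; findtext() tolerates the missing <age>.
    print(p.get("id"), p.findtext("name"), p.findtext("age", default="unknown"))
```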


3. Quasi-structured Data

Quasi-structured data refers to the textual data with erratic data formats that can be
formatted with effort, tools, and time (for instance, web clickstream data that may contain
inconsistencies in data values).

4. Unstructured Data

Any data with an unknown form or structure is classified as unstructured data. In
addition to its huge size, unstructured data poses multiple challenges in processing it to
derive value. A typical example of unstructured data is a heterogeneous data
source containing a combination of simple text files, images, videos, etc. Nowadays,
organizations have a wealth of data available to them but, unfortunately, often don't know
how to derive value from it, since this data is in its raw, unstructured format.


Characteristics of Big Data

Big data can often be described by five characteristics known as the 5 V's: volume,
velocity, variety, veracity, and value.

1. Volume

Big data is a form of data whose volume is so large that it would not fit on a single
machine; therefore, specialized tools and frameworks are required to store, process, and
analyze such data. For example, social media applications process billions of messages
every day, industrial and energy systems can generate terabytes of sensor data every day,
and cab aggregation applications can process millions of transactions in a day.
The volumes of data generated by modern IT, industrial, healthcare, Internet of Things,
and other systems are growing exponentially, driven by the lowering costs of data storage
and processing architectures and the need to extract valuable insights from the data to
improve business processes, efficiency, and service to consumers. However, there is no
fixed threshold for the volume of data to be considered big data. Typically, the term big
data is used for massive-scale data that is difficult to store, manage, and process using
traditional databases and data processing architectures.

2. Velocity

The velocity of data refers to how fast the data is generated. Data generated by certain
sources can arrive at very high velocities, for example, social media data or sensor data.
Velocity is another important characteristic of big data and the primary reason for the
exponential growth of data. The high velocity of data results in the volume of data
accumulated becoming very large in a short time. Some applications can have strict
deadlines for data analysis (such as trading or online fraud detection), and the data needs
to be analyzed in real-time. Specialized tools are required to ingest such high-velocity data
into the big data infrastructure and analyze the data in real time.

3. Variety


Variety refers to the forms of the data. Big data comes in different forms, such as
structured, unstructured, or semi-structured, including text data, image, audio, video, and
sensor data. Big data systems must be flexible enough to handle such a variety of data.

4. Veracity

Veracity refers to how accurate the data is. The data needs to be cleaned to remove
noise to extract the value. Data-driven applications can reap the benefits of big data only
when the data is meaningful and accurate. Therefore, data cleansing is important so
incorrect and faulty data can be filtered out.

5. Value

The value of data refers to the usefulness of data for its intended purpose. It is related
to the veracity or accuracy of the data. The end goal of any big data analytics system is to
extract value from the data. For some applications, value also depends on how fast we can
process the data.

Domain-Specific Examples of Big Data

Big data applications span a wide range of domains, including (but not limited to) homes,
cities, environment, energy systems, retail, logistics, industry, agriculture, Internet of Things,
and healthcare.

1. Web
a. Web Analytics
b. Performance Monitoring
c. Ad Targeting and Analytics
d. Content Recommendation
2. Financial
a. Credit Risk Modeling
b. Fraud Detection
3. Healthcare
a. Epidemiological Surveillance
b. Patient Similarity-based Decision Intelligence Application
c. Adverse Drug Events Prediction
d. Detecting Claim Anomalies
e. Evidence-based Medicine
f. Real-time health monitoring
4. Internet of Things
a. Intrusion Detection
b. Smart Parking
c. Smart Roads
d. Structural Health Monitoring
e. Smart Irrigation
5. Environment
a. Weather Monitoring
b. Air Pollution Monitoring
c. Noise Pollution Monitoring
d. Forest Fire Detection
e. River Floods Detection
f. Water Quality Monitoring
6. Logistics and Transportation
a. Real-time Fleet Tracking
b. Shipment Monitoring
c. Remote Vehicle Diagnostics
d. Route Generation and Scheduling
e. Hyper-local Delivery
f. Cab/Taxi Aggregators
7. Industry
a. Machine Diagnosis and Prognosis
b. Risk Analysis of Industrial Operations
c. Production Planning and Control
8. Retail
a. Inventory Management
b. Customer Recommendation
c. Store Layout Optimization
d. Forecasting Demand

How Does Big Data Analytics Work?

Big data analytics involves collecting, processing, cleaning, and analyzing large datasets
to help organizations operationalize their big data.

1. Collect data.

Data collection looks different for every organization. With today's technology,
organizations can gather structured and unstructured data from various sources — from
cloud storage to mobile applications to in-store IoT sensors. Some data will be stored in
data warehouses where business intelligence tools and solutions can access it easily.

Raw or unstructured data that is too diverse or complex for a warehouse may be
assigned metadata and stored in a data lake.

2. Process data.

Once data is collected and stored, it must be appropriately organized to get accurate
results from analytical queries, especially when the data is large and unstructured. In
addition, available data is growing exponentially, making data processing a challenge
for organizations.

One processing option is batch processing, which looks at large data blocks over time.
Batch processing is useful when there is a longer turnaround time between collecting and
analyzing data.


Stream processing looks at small batches of data as they arrive, shortening the delay
between collection and analysis for quicker decision-making. However, stream processing
is more complex and often more expensive.
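
The difference can be sketched in a few lines of Python. The sensor readings below are hypothetical; the batch code waits for a full block, while the stream code reacts to each value as it arrives:

```python
# Batch vs. stream processing, in miniature. Readings are hypothetical.
readings = [3.1, 3.4, 9.8, 3.2, 3.3]

# Batch: analyze the whole accumulated block at once, after collection.
print("batch average:", sum(readings) / len(readings))

# Stream: act on each reading immediately, e.g., real-time anomaly alerts.
for r in readings:
    if r > 5.0:
        print("stream alert, anomalous reading:", r)
```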

3. Clean data.

Data, big or small, requires scrubbing to improve data quality and get more robust
results; all data must be formatted correctly, and any redundant or irrelevant data must be
eliminated or accounted for. Dirty data can obscure and mislead, creating flawed insights.
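
A minimal scrubbing pass might look like the following Python sketch, using pandas (one common tool; the module names none) on hypothetical records containing a duplicate row and a missing value:

```python
# A small data-cleaning pass: drop duplicate rows and rows with gaps.
import pandas as pd

df = pd.DataFrame({
    "customer": ["Ada", "Ada", "Alan", "Grace"],
    "spend": [120.0, 120.0, None, 95.5],
})

clean = (df.drop_duplicates()          # remove the redundant "Ada" row
           .dropna(subset=["spend"]))  # drop the row missing a required value
print(clean)
```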

4. Analyze data.

Getting big data into a usable state takes time. However, advanced analytics processes
can turn big data into big insights once it's ready. Some of these big data analysis methods
include:

(1) Data mining sorts through large datasets to identify patterns and relationships
by identifying anomalies and creating data clusters (see the clustering sketch
after this list).
(2) Predictive analytics uses an organization's historical data to make predictions,
identifying upcoming risks and opportunities.
(3) Deep learning imitates human learning patterns, using artificial intelligence and
machine learning to layer algorithms and find patterns in the most complex
and abstract data.
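
As a taste of the data mining method in (1), the sketch below uses scikit-learn's KMeans (an illustrative choice, not one the module specifies) to group hypothetical 2-D points into clusters:

```python
# Clustering as a simple data mining step: group points, inspect clusters.
from sklearn.cluster import KMeans

points = [[1, 2], [1, 4], [1, 0],      # hypothetical group near x = 1
          [10, 2], [10, 4], [10, 0]]   # hypothetical group near x = 10

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("cluster labels:", kmeans.labels_)            # cluster id per point
print("cluster centers:", kmeans.cluster_centers_)  # discovered group centers
```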

Benefits of (Big) Data Analytics

1. Decision-making improves.

Companies may use the information they obtain from data analytics to guide their
decisions, leading to improved results. Data analytics removes much guesswork from
preparing marketing plans, deciding what material to make, creating goods, and more.
With advanced data analytics technologies, new data can be constantly gathered and
analyzed to enhance your understanding of changing circumstances.

2. Marketing becomes more effective.

When businesses understand their customers better, they can sell to them more
efficiently. Data analytics also gives businesses invaluable insights into how their marketing
campaigns work so that they can fine-tune them for better results.

3. Customer service improves.

Data analytics provides businesses with deeper insight into their clients, helping them
to customize customer experience to their needs, offer more customization, and create
better relationships with them.

4. The efficiency of operations increases.


Data analytics will help businesses streamline operations, save resources, and improve
the bottom line. When companies obtain a better idea of what the audience needs, they
spend less time producing advertisements that do not meet their desires.

Challenges of Big Data Analytics

Although big data analytics brings several benefits to a business, its implementation is
not always straightforward. First, companies must adopt a data-driven culture and have the
necessary tools to collect, process, and analyze data. Here are some challenges organizations
might face while adopting big data analytics.

1. Quality of data.

In big data analytics, quality data is everything. Unfortunately, low-quality, duplicate, or
inconsistent data sets can lead to many problems, including misinterpretation, poor decision-
making, and, ultimately, loss of revenue. Low-quality data can also create involuntary bias in
a system.

Of course, big data can't be 100% accurate. And it doesn't have to be entirely accurate to be
useful. But significantly low-quality data sets will do more harm than good and won't bring
valuable insight. Duplicate data can also cause contradictions and spoil your efforts in
making decisions requiring the utmost accuracy.

2. Synchronization of data sources.

Data is collected from various sources, including social media platforms and company
websites. Businesses can also collect customer data using in-store facilities such as Wi-Fi. In
addition, retailers like Walmart are known to couple in-store surveillance with computer
vision technology to identify the aisles customers visit the most and the least.

Most businesses are growing at a rapid pace, which means the amount of data they
generate is also increasing. Although the storage side has been sorted out for a decade or
more by data lakes and data warehouses, synchronizing data across different data sources
can be challenging.

Combining data from different sources into a unified view is called data integration and is
crucial for deriving valuable insights. Unfortunately, this is one aspect of big data analytics
that many companies overlook, leading to logic conflicts and incomplete or inaccurate
results.

3. Organizational resistance.

Apart from some technological aspects of big data analytics, adopting a data-driven culture
in an organization can be challenging. For example, a 2021 NewVantage Partners Big Data
and AI Executive Survey revealed that only 24.4% of the participating companies had forged
a data culture within their firms.

Lack of understanding, lack of middle management adoption, business resistance, and poor
organizational alignment are reasons why companies have yet to adopt a data-driven culture.


4. Making big data accessible.

Collecting and processing data becomes more difficult as the amount of data grows.
Therefore, organizations must make data accessible and convenient for data owners of all
skill levels.

5. Maintaining quality data.

With so much data to maintain, organizations spend more time scrubbing for duplicates,
errors, absences, conflicts, and inconsistencies than ever.

6. Keeping data secure.

As the amount of data grows, so do privacy and security concerns. As a result, organizations
will need to strive for compliance and put tight data processes in place before taking
advantage of big data.

7. Finding the right tools and platforms.

New technologies for processing and analyzing big data are developed all the time.
Organizations must find the right technology to work within their established ecosystems and
address their particular needs. Often, the right solution is flexible and can accommodate
future infrastructure changes.

8. Other challenges

• Lack of talent is a significant challenge companies face while integrating big data.
Although the number of individuals opting for a career in data science and analytics is
steadily increasing, there is still a skill shortage.

• Data quality maintenance is another issue. Since data comes from multiple sources at
high velocity, the time and resources required to manage data quality properly can be
significant.
