
Data Analytic and Big Data: Edwin Puertas, Ph.D(c) - epuerta@utb.edu.co

The document discusses data analytics and big data. It describes the evolution of the internet and how the volume of data created every two days now exceeds what was created on the entire web up until 2003. It outlines the characteristics, types, and challenges of big data as well as data mining techniques and tools. Data science is defined as using scientific methods to extract knowledge from structured and unstructured data. The document also presents the methodology for data analytics projects.

Uploaded by Daniel Berrio

Data Analytic and Big Data
Introduction
Edwin Puertas, Ph.D(c).
Programa de Ingeniería de Sistemas y Computación
Introduction | Internet Growth

Eras of the Web

The Web is evolving towards a shared social experience, in which consumers will rely on their peers as they make online decisions and will shape future products.

E. Cambria. Affective computing and sentiment analysis. IEEE Intelligent Systems 31(2), 102-107 (2016).

Evolution of the Internet

Big social data analysis

Between the dawn of the Internet and the year 2003, there were five exabytes of information on the Web. Now, we create five exabytes every two days.

Big Data
▪ Macro data
▪ Big data
▪ Large-scale data
▪ Very large and complex data sets
▪ Structured or unstructured data

Characteristics of Big Data
▪ Accuracy: Is the information correct in every detail?
▪ Completeness: How comprehensive is the information?
▪ Reliability: Does the information contradict other trusted resources?
▪ Relevance: Do you really need this information?
▪ Timeliness: How up-to-date is the information? Can it be used for real-time reporting?
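
Two of these questions, completeness and timeliness, can be turned into simple measurable ratios. The following is a minimal sketch under stated assumptions: the sample records, the field names, the reference date, and the 30-day freshness threshold are all hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "city": "Cartagena", "updated": date(2024, 1, 10)},
    {"id": 2, "city": None, "updated": date(2023, 6, 1)},
    {"id": 3, "city": "Bogota", "updated": date(2024, 1, 12)},
    {"id": 4, "city": "Medellin", "updated": None},
]

def completeness(rows, field):
    """Fraction of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def timeliness(rows, field, today, max_age_days):
    """Fraction of rows whose `field` date is within `max_age_days` of `today`."""
    fresh = sum(
        r[field] is not None and (today - r[field]).days <= max_age_days
        for r in rows
    )
    return fresh / len(rows)

today = date(2024, 1, 15)  # assumed reference date for the example
city_completeness = completeness(records, "city")
freshness = timeliness(records, "updated", today, max_age_days=30)
print(city_completeness, freshness)
```

Accuracy, reliability, and relevance usually need an external reference or a business judgment, so they are not reduced to a ratio here.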

Volume
• The anticipated volume of data that
is processed by Big Data solutions
is substantial and ever-growing.
• High data volumes impose distinct
data storage and processing
demands, as well as additional
data preparation, curation and
management processes.

Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: concepts, drivers & techniques. Prentice Hall Press.

Velocity
• In Big Data environments, data can
arrive at fast speeds, and
enormous datasets can
accumulate within very short
periods of time.
• From an enterprise’s point of view,
the velocity of data translates into
the amount of time it takes for the
data to be processed once it enters
the enterprise’s perimeter.

Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: concepts, drivers & techniques. Prentice Hall Press.
Variety

• Data variety refers to the multiple formats and types of data that need to
be supported by Big Data solutions.
• Data variety brings challenges for enterprises in terms of data integration,
transformation, processing, and storage.

Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: concepts, drivers & techniques. Prentice Hall Press.
Veracity

▪ The quality of the data that is being captured can vary greatly.
▪ Accuracy of the analysis depends
on the veracity of the source data.

Value
• Value is defined as the usefulness
of data for an enterprise.
• The value characteristic is
intuitively related to the veracity
characteristic in that the higher the
data fidelity, the more value it holds
for the business.
• Value is also dependent on how
long data processing takes
because analytics results have a
shelf life.

Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: concepts, drivers & techniques. Prentice Hall Press.
Different types of data
The data processed by Big
Data solutions can be human-
generated or machine-
generated, although it is
ultimately the responsibility of
machines to generate the
analytic results.
▪ structured data
▪ unstructured data
▪ semi-structured data
Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: concepts, drivers & techniques. Prentice Hall Press.
[Figure: three panels labeled Structured Data, Unstructured Data, and Semi-structured Data]
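
The three types of data named above can be illustrated with a short sketch. The sample values below are hypothetical and chosen only for illustration: CSV stands in for structured data, JSON for semi-structured data, and a free-text sentence for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table or a CSV file.
structured = io.StringIO("id,name,age\n1,Ana,30\n2,Luis,25\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing records whose fields may vary (JSON).
semi_structured = (
    '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "x@example.com"}]'
)
docs = json.loads(semi_structured)

# Unstructured: free text with no predefined schema.
unstructured = "Ana wrote that the service was great, but delivery was slow."
words = unstructured.lower().replace(",", "").replace(".", "").split()

print(rows[0]["name"], sorted(docs[1].keys()), len(words))
```

Note that every CSV row shares the same columns, the two JSON records carry different fields, and the text has to be tokenized before any analysis is possible.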

Data workflow

The Data Science Workflow: https://towardsdatascience.com/the-data-science-workflow-43859db0415


Big Data Solutions
▪ Financial
▪ Government
▪ Education
▪ Industries
▪ Retailers
▪ Logistics
▪ Health

Data management infrastructure
• Decentralization and flexibility of
data architecture
• Distributed computing models that
manage unstructured data and
support highly intensive, large-scale
computing tasks
• Analyze large amounts of data in
distributed environments
• SQL and NoSQL databases
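
One common distributed computing model is map-reduce: each worker processes its own partition of the data, and the partial results are then merged. The sketch below is an illustration, not something taken from the slides. The partitions are hypothetical, and local threads stand in for worker nodes; a real deployment would rely on a cluster framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data partitions, standing in for chunks stored on different nodes.
partitions = [
    "big data needs distributed storage",
    "distributed computing processes big data",
    "data analytics turns data into value",
]

def map_count(chunk):
    """Map step: count the words within one partition."""
    return Counter(chunk.split())

def merge(a, b):
    """Reduce step: combine two partial counts into one."""
    return a + b

# Threads stand in for worker nodes in this sketch.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_count, partitions))

word_counts = reduce(merge, partial_counts, Counter())
print(word_counts["data"], word_counts["distributed"])
```

Because the map step touches only one partition at a time, the same code shape scales from three strings to many files spread across machines.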

Data Analytics

Knowing, classifying, filtering, and using information through exhaustive analysis is essential for data to become a true asset and a source of business for the company.

[Figure: stages labeled Extract, Processing, Analyzed]

Types of Analytics

Types of Analytics: descriptive, predictive, prescriptive analytics: https://www.dezyre.com/

Descriptive Analytics

Types of Analytics: descriptive, predictive, prescriptive analytics: https://www.dezyre.com/


Predictive Analytics

Types of Analytics: descriptive, predictive, prescriptive analytics: https://www.dezyre.com/


Prescriptive Analytics

Types of Analytics: descriptive, predictive, prescriptive analytics: https://www.dezyre.com/
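
The difference between the three types of analytics can be shown on one small data set. This is a minimal sketch under stated assumptions: the monthly sales figures, the stock level, and the ordering rule are hypothetical, and a least-squares line is used as one simple stand-in for a predictive model.

```python
from statistics import mean

# Hypothetical monthly sales figures (units), months numbered 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive: what happened? Summarize the past.
average_sales = mean(sales)

# Predictive: what is likely to happen? Fit a least-squares line and extrapolate.
mx, my = mean(months), mean(sales)
slope = sum((x - mx) * (y - my) for x, y in zip(months, sales)) / sum(
    (x - mx) ** 2 for x in months
)
intercept = my - slope * mx
forecast_month_7 = slope * 7 + intercept

# Prescriptive: what should we do? Apply a hypothetical stocking rule:
# order enough units to cover the forecast, given the stock on hand.
stock_on_hand = 120
order_quantity = max(0, round(forecast_month_7) - stock_on_hand)

print(average_sales, forecast_month_7, order_quantity)
```

Each step builds on the previous one: the summary describes the history, the forecast extends it, and the rule converts the forecast into an action.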


Challenges of Data Analytics

Current Data Analytics
▪ Internal, limited, and structured data sources
▪ Data is stored in data marts before being analyzed
▪ It is descriptive or reporting-oriented
▪ It is a process that requires months of development
▪ Analysts are separated from those responsible for business decisions

Data Analysis and Artificial Intelligence
▪ Internal and external sources; unlimited, structured and unstructured data
▪ Data from a common data warehouse
▪ Interactive reports, predictive analytics, and prescriptive analytics
▪ Increased performance made easy
▪ Business managers work directly with analysts

Data Mining
▪ It is the automatic extraction of
hidden predictive information from
databases.
▪ It studies methods and algorithms
that automatically extract
synthesized information
characterizing the hidden
relationships in the data.
▪ The data does not change while it
is being analyzed.

Data Mining Techniques

▪ Classification
▪ Association
▪ Outlier detection
▪ Clustering
▪ Regression
▪ Prediction
▪ Sequential patterns
▪ Decision trees
▪ Neural networks
▪ Data warehousing
▪ Long-term memory processing
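
Two of the techniques listed above, outlier detection and clustering, are small enough to sketch directly. The data, the threshold of two standard deviations, and the starting centers are hypothetical choices made for illustration; the z-score rule and the tiny one-dimensional k-means below are just one simple way to realize each technique.

```python
from statistics import mean, pstdev

# Hypothetical one-dimensional measurements with one obvious anomaly.
values = [10, 11, 9, 10, 12, 11, 50]

# Outlier detection: flag values more than 2 standard deviations from the mean.
mu, sigma = mean(values), pstdev(values)
outliers = [v for v in values if abs(v - mu) / sigma > 2]

def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: assign each point to its nearest center, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Clustering: group hypothetical customer ages into two segments.
ages = [18, 20, 22, 21, 60, 62, 65]
centers, clusters = kmeans_1d(ages, centers=[18, 65])

print(outliers, clusters)
```

In practice the number of clusters and the outlier threshold are tuning decisions that depend on the data and the business question.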

Data Mining and Data Analytics Tools
Data mining is supported by three technologies
that are now sufficiently mature:

▪ Massive data collection.
▪ Multiprocessing computers.
▪ Data mining algorithms.

What is Data Science?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

Dhar, Vasant. "Data science and prediction." Communications of the ACM 56.12 (2013): 64-73.

Methodology for data analytics projects
