Introduction to Big Data
Course Description
Grading (100%):
Final exam: 70%
Mid-term exam: 10%
Practice exam: 10%
Course work: 10%
Timing:
Lecture 3
Practice 3
Introduction
Buying Online
Shopping cart
Wish list and Previous purchases
Items rated and reviewed
Geo-location
Time-on-site and Duration of views
Links clicked & Text Searched
Telephone inquiries
Responses to marketing materials
Social media posting
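To make this concrete, the sketch below shows how such signals might be gathered into a single customer-activity record. It is a hypothetical illustration: the field names, values, and the idea of serializing the event to JSON are assumptions, not any particular retailer's schema.

```python
# Hypothetical sketch: one customer-activity event combining the signals
# listed above. All field names and values are illustrative assumptions,
# not a real retailer's schema.
from datetime import datetime, timezone
import json

event = {
    "customer_id": "C-1042",                      # assumed identifier
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "cart_items": ["laptop-stand", "usb-c-cable"],
    "wishlist_items": ["mechanical-keyboard"],
    "previous_purchases": 17,
    "rated_items": {"usb-c-cable": 4},            # item -> star rating
    "geo_location": {"lat": 24.71, "lon": 46.68},
    "time_on_site_seconds": 930,
    "links_clicked": ["/deals", "/reviews/usb-c-cable"],
    "search_queries": ["fast charging cable"],
    "marketing_response": "clicked_email_offer",
    "social_post": None,                          # no post captured this session
}

# Serialize the event, e.g. before sending it to a log or message queue.
print(json.dumps(event, indent=2))
```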
Customer Data
Big Data
Big data is generated by:
People,
Machines, or
Sensors.
What is Big Data?
The Big Data Framework organization categorizes the development of Big Data into three main phases:
Phase 1.0 (1970-2000): Big Data was mainly described in terms of data storage and analytics; it was an extension of modern database management systems and data warehousing technologies.
Phase 2.0 (2000-2010): with the rise of Web 2.0 and the spread of semi-structured and unstructured content, the notion of Big Data changed to embody advanced technical solutions for extracting meaningful information from dissimilar and heterogeneous data formats.
Phase 3.0 (2010-now): with the emergence of smartphones and mobile devices, sensor data, wearable devices, the Internet of Things (IoT), and many more data generators, Big Data has entered a new era and opened a new horizon with a new range of opportunities.
Big Data Characteristics
Big Data V-features
Volume
The amount of data. The size of the data plays a very critical role in determining the value that can be extracted from it.
This is evident as more than 90% of the world's data was produced in recent years.
In fact, more than 2.5 exabytes (1 EB = 10^18 bytes) of data have been created every day since as early as 2013, from every post, share, search, click, stream, and many other data producers; this is expected to reach about 463 exabytes per day by 2025.
People share about 500 terabytes of data per day on Facebook, and over 300 hours of video are uploaded to YouTube every minute.
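As a quick sanity check on these figures, here is a minimal Python sketch that converts the quantities quoted above into raw bytes and compares them. The numbers are simply the ones on this slide.

```python
# Minimal arithmetic sketch using the figures quoted above.
EB = 10**18   # bytes in one exabyte
TB = 10**12   # bytes in one terabyte

daily_2013 = 2.5 * EB          # ~2.5 EB created per day (2013 estimate)
daily_2025 = 463 * EB          # projected daily volume for 2025
facebook_daily = 500 * TB      # data shared on Facebook per day

print(f"2013 daily volume : {daily_2013:.3e} bytes")
print(f"2025 daily volume : {daily_2025:.3e} bytes")
print(f"Growth factor     : {daily_2025 / daily_2013:.0f}x")
print(f"Facebook share of 2013 volume: {facebook_daily / daily_2013:.2%}")
```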
Velocity
The speed at which data is generated and needs to be collected and processed.
Veracity
Being able to identify the relevance and accuracy of data and apply it to the appropriate purpose. It concerns:
Data availability
Correctness
Consistency
Variability
The way the meaning, structure, and flow of data can change over time.
Validity
Validity differs from veracity in that validity means "the correctness and accuracy of data with regard to the intended usage".
In other words, data can be trustworthy and thus satisfy the veracity aspect, yet a poor interpretation of that data may lead to unintended use. Moreover, the same truthful data can be valid for one application and invalid for a different one.
Vulnerability
Refers to the security of the collected datasets that will be used for later
analysis.
It also denotes errors in the system that permit harmful activities to be conducted on the collected datasets.
Big Data Challenges
The rate at which data grows is much faster than what existing processing systems can handle.
Current storage systems are not capable of storing all of this data.
What if the data volume gets so large that we do not know how to deal with it?
The demand for people with good analytical skills in big data is increasing.
Technical Issues
Fault Tolerance
Scalability
Quality of Data
Heterogeneous Data
Technical Issues: Fault Tolerance
Technical Issues: Quality of Data
Completeness
Validity
Accuracy
Consistency
Integrity
Timeliness
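To show what a few of these dimensions can mean in practice, here is a minimal Python sketch that scores a toy record set for completeness, validity, and timeliness. The records, field names, and rules (email pattern, plausible age range, one-year freshness window) are illustrative assumptions.

```python
# Hypothetical sketch of simple data-quality checks for a few of the
# dimensions listed above (completeness, validity, timeliness).
# The records, field names, and rules are illustrative assumptions.
import re
from datetime import datetime, timedelta, timezone

records = [
    {"email": "ali@example.com", "age": 34, "updated": "2024-05-01T10:00:00+00:00"},
    {"email": "not-an-email",    "age": -3, "updated": "2019-01-01T00:00:00+00:00"},
    {"email": None,              "age": 27, "updated": "2024-05-02T09:30:00+00:00"},
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
now = datetime.now(timezone.utc)

def completeness(recs, field):
    """Share of records where the field is present and non-null."""
    return sum(r[field] is not None for r in recs) / len(recs)

def validity(recs):
    """Share of records whose email matches a basic pattern and age is plausible."""
    ok = sum(bool(r["email"] and EMAIL_RE.match(r["email"])) and 0 <= r["age"] <= 120
             for r in recs)
    return ok / len(recs)

def timeliness(recs, max_age_days=365):
    """Share of records updated within the last `max_age_days`."""
    fresh = sum(now - datetime.fromisoformat(r["updated"]) <= timedelta(days=max_age_days)
                for r in recs)
    return fresh / len(recs)

print(f"completeness(email): {completeness(records, 'email'):.2f}")
print(f"validity:            {validity(records):.2f}")
print(f"timeliness:          {timeliness(records):.2f}")
```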
Technical Issues: Heterogeneous Data
Data is collected from different sources in different formats (a small normalization sketch follows this list):
Database
Websites
Social Networks
Files
Ontologies
APIs
….
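As an illustration of handling such heterogeneity, here is a minimal Python sketch that maps a database row, a JSON API payload, and a CSV line into one common record layout. The unified schema and all field names are assumptions made up for this example.

```python
# Hypothetical sketch: normalizing records from heterogeneous sources
# (a database row, a JSON API payload, a CSV line) into one common schema.
# The schema and field names are illustrative assumptions.
import csv
import io
import json

def from_db_row(row):
    """Row as returned by e.g. a relational query: (id, name, city)."""
    uid, name, city = row
    return {"id": str(uid), "name": name, "city": city, "source": "database"}

def from_api_json(payload):
    """JSON payload from a web or social-network API."""
    doc = json.loads(payload)
    return {"id": doc["userId"], "name": doc["displayName"],
            "city": doc.get("location"), "source": "api"}

def from_csv_line(text):
    """Single line of a CSV export file."""
    row = next(csv.DictReader(io.StringIO(text), fieldnames=["id", "name", "city"]))
    return {"id": row["id"], "name": row["name"], "city": row["city"], "source": "file"}

unified = [
    from_db_row((7, "Sara", "Riyadh")),
    from_api_json('{"userId": "42", "displayName": "Omar", "location": "Jeddah"}'),
    from_csv_line("19,Huda,Dammam"),
]
for record in unified:
    print(record)
```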
Big Data Analytics
A set of fundamental concepts/principles that underlie techniques for
extracting useful knowledge from large datasets containing a variety of data
types.
Big data analytics is a term that describes the process of using data to
discover trends, patterns, and other correlations, as well as using them to
make data-driven decisions.
Types of Big Data Analytics
There are four main types of big data analytics: descriptive,
diagnostic, predictive, and prescriptive analytics.
Descriptive Analytics
It allows you to identify trends in raw data and describe what is currently happening.
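For example, a descriptive summary of toy sales data might look like the sketch below; the regions and revenue figures are invented.

```python
# Minimal sketch of descriptive analytics: summarizing toy daily sales
# to describe what is currently happening. The data is invented.
from collections import defaultdict
from statistics import mean

sales = [  # (region, day, revenue)
    ("north", "Mon", 1200), ("north", "Tue", 1350), ("north", "Wed", 1100),
    ("south", "Mon",  800), ("south", "Tue",  950), ("south", "Wed", 1250),
]

by_region = defaultdict(list)
for region, _, revenue in sales:
    by_region[region].append(revenue)

for region, values in by_region.items():
    print(f"{region}: total={sum(values)}, average={mean(values):.1f}, "
          f"min={min(values)}, max={max(values)}")
```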
Diagnostic Analytics
Used to investigate data and content to answer "Why did it happen?" By analyzing the data, we can understand the reasons behind certain behaviors and events in a specific situation.
Some of the tools and techniques used for this task include searching for patterns in the data sets, filtering the data, probability theory, regression analysis, and more.
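As a small diagnostic example, the sketch below computes the correlation between page-load time and daily conversions to test one possible explanation for a drop in sales. Both the data and the suspected cause are invented for illustration.

```python
# Minimal sketch of diagnostic analytics: checking whether a drop in sales
# correlates with page-load time, as one candidate explanation.
# The data and the suspected cause are invented for illustration.
from math import sqrt

load_time_ms = [180, 220, 450, 900, 300, 1200, 250]   # page load time per day
conversions  = [52,  49,  35,  18,  44,  9,   47]     # purchases per day

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(load_time_ms, conversions)
print(f"correlation between load time and conversions: {r:.2f}")
# A strongly negative r points to slow pages as a plausible "why", which a
# deeper analysis (filtering, regression, experiments) would then confirm.
```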
Predictive Analytics
Answers the question, “What might happen in the future?”
Making predictions for the future can help your organization formulate
strategies based on likely scenarios.
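A minimal predictive sketch, assuming invented monthly demand data: fit a straight-line trend and extrapolate one month ahead. A real forecast would of course use richer models and proper validation.

```python
# Minimal sketch of predictive analytics: fitting a simple linear trend to
# past monthly demand and extrapolating one month ahead. Data is invented.
months = [1, 2, 3, 4, 5, 6]
demand = [100, 112, 119, 133, 141, 155]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(demand) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, demand)) / \
        sum((x - mean_x) ** 2 for x in months)
intercept = mean_y - slope * mean_x

next_month = 7
forecast = intercept + slope * next_month
print(f"trend: demand ≈ {intercept:.1f} + {slope:.1f} * month")
print(f"forecast for month {next_month}: {forecast:.0f} units")
```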
Prescriptive Analytics
Answers the question, “What should we do next?”
It takes the results from descriptive and predictive analysis and finds solutions for optimizing decisions through various simulations and techniques.
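As a small prescriptive example, the sketch below simulates a few candidate order quantities against an assumed demand distribution and recommends the most profitable one. Prices, costs, and the demand model are all invented assumptions.

```python
# Minimal sketch of prescriptive analytics: simulating candidate stock levels
# against uncertain demand and recommending the most profitable one.
# Prices, costs, and the demand distribution are invented assumptions.
import random

random.seed(0)
PRICE, COST = 15.0, 9.0            # sell price and unit cost per item
candidates = [80, 100, 120, 140]   # possible order quantities

def simulate_profit(order_qty, runs=10_000):
    """Average profit over many simulated demand scenarios."""
    total = 0.0
    for _ in range(runs):
        demand = random.gauss(mu=110, sigma=25)   # assumed demand model
        sold = min(order_qty, max(demand, 0))
        total += sold * PRICE - order_qty * COST
    return total / runs

results = {q: simulate_profit(q) for q in candidates}
best = max(results, key=results.get)
for q, profit in results.items():
    print(f"order {q:3d} units -> expected profit {profit:8.2f}")
print(f"recommended decision: order {best} units")
```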
Assignment