Big Data Lec1

Uploaded by

Aya Isma3eel

Big Data Analysis

Dr. Mona Abbass


Content
❑ Course Description
❑ Introduction to Big Data
▪ What is big data?
▪ Characteristics of Big Data
▪ Big data Challenges
Course Description
❑Aim of the course:
▪To provide students with the theoretical and practical skills
needed for big data analysis.
Course Content
1. Basic concepts in big data
2. Cloud computing
3. Introduction to big data analytics
4. Introduction to Hadoop technology
5. MapReduce
6. Revision
7. Final exam
Course Description
Grading (100%):

❑ Final exam 70
❑ Mid-term exam 10
❑ Practice exam 10
❑ Coursework 10

Timing:

❑ Lecture 3
❑ Practice 3
What is big data?
❑The notion of Big data stems from advances in database
technologies and from the need for solutions to handle the huge
deluge of datasets and the resulting lack of sufficient storage
capacity.

❑The notion of Big data has evolved over the past decades,
with each decade described in terms of computer disk space,
from megabytes (MB) in the 1970s to exabytes (EB), a scale
introduced in 2011.
What is big data?
❑Big data is a term for a collection of data sets so large and
complex that it often becomes difficult to process using
traditional data processing applications.

❑Big data is large amounts of different types of data produced
from various types of sources, such as
▪ People,
▪ Machines or
▪ Sensors.
What is big data?
The Big Data Framework organization categorizes the
development of Big data into three main phases:
❑Phase 1.0: Big data was mainly characterized by data storage
and analytics, as an extension of modern database
management systems and data warehousing technologies;
❑Phase 2.0: With the rise of Web 2.0 and the propagation of
semi-structured and unstructured content, the notion of Big data
evolved to embody advanced technical solutions for extracting
meaningful information from dissimilar and heterogeneous data
formats;
❑Phase 3.0: With the emergence of smartphones and mobile
devices, sensor data, the Internet of Things (IoT), and many more data
generators, Big Data has entered a new era and has opened a new
horizon with a new range of opportunities.
Characteristics of Big Data
❑ Big data has been amply characterized by the well-known 3Vs
(Volume, Velocity and Variety)
❑ The following are the 10 Vs of Big data:
▪ Volume
▪ Velocity
▪ Variety
▪ Veracity
▪ Variability
▪ Validity
▪ Vulnerability
▪ Volatility
▪ Visualization
▪ Value
Volume

❑ Refers to the vast growth in the amount of data.

❑ In fact, more than 2.5 quintillion (2.5 × 10^18) bytes have been created
daily since as early as 2013, from every post, share, search, click, stream,
and many more data producers.
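
To put that figure in perspective, the following minimal Python sketch converts a byte count into decimal (SI) storage units. The helper name `human_readable` is hypothetical and not part of the lecture; the input is simply the daily figure quoted above.

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

daily_bytes = 2.5e18  # 2.5 quintillion bytes, the daily figure quoted above
print(human_readable(daily_bytes))  # 2.5 EB
```

In other words, the quoted daily volume is about 2.5 exabytes.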
Velocity
❑ Represents the accumulation of data at high speed, in near real time
or real time, from dissimilar data sources.
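
One way to make velocity concrete is to measure how many events arrive within a recent time window. The sketch below is only an illustration: the class name `RateMeter` and the arrival times are made up, and a real streaming system would be considerably more involved.

```python
from collections import deque

class RateMeter:
    """Counts events seen in the most recent `window_seconds` (a sliding window)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp: float) -> int:
        """Register an event and return how many events fall inside the window."""
        self.timestamps.append(timestamp)
        # Drop events that are older than the window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)

meter = RateMeter(window_seconds=1.0)
for t in [0.0, 0.2, 0.4, 0.9, 1.3, 1.35]:  # made-up arrival times in seconds
    in_window = meter.record(t)
print(in_window)  # 4 events arrived in the last second
```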
Variety (Format)
❑ Involves collecting data from various sources and in fuzzy and
heterogeneous types.

❑ This includes importing data in dissimilar formats, namely
▪ Structured (tables residing in relational databases (RDBMS), etc.),
▪ Semi-structured (email, XML, and other markup languages, etc.) and
▪ Unstructured (text, pictures, audio files, video, sensor data, etc.).
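
The three formats above can be illustrated with a short Python sketch that uses only the standard library. The sample records are invented for illustration; the point is how much structure the data itself provides in each case.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table exported to CSV.
structured = "id,name,age\n1,Ada,36\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tags describe the fields, but records may differ in shape.
semi_structured = "<user id='1'><name>Ada</name><email>ada@example.org</email></user>"
user = ET.fromstring(semi_structured)

# Unstructured: free text with no schema; any structure must be inferred.
unstructured = "Ada wrote that the sensor failed twice on Monday."
words = unstructured.split()

print(rows[0]["name"])         # Ada (looked up by column name)
print(user.find("name").text)  # Ada (looked up by tag)
print(len(words))              # 9 (only a word count is available without analysis)
```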
Veracity
❑ Refers to the accuracy and correctness of data.

❑ Multiple factors help ensure the veracity of Big data:
▪ Trustworthiness of data origin;
▪ Reliability and security of data store;
▪ Data availability;
▪ Correctness and
▪ Consistency.
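
Two of the factors above, correctness and consistency, can be checked mechanically. The following is a minimal sketch under assumed conventions: the function name `veracity_report`, the `id` field, and the sample records are all hypothetical.

```python
def veracity_report(records, required_fields):
    """Count records with missing fields and list IDs that appear with conflicting values."""
    missing = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in required_fields)
    )
    seen = {}
    conflicting = set()
    for r in records:
        key = r.get("id")
        if key in seen and seen[key] != r:
            conflicting.add(key)  # same ID, different content: inconsistent
        seen.setdefault(key, r)
    return {"missing": missing, "conflicting_ids": sorted(conflicting)}

records = [
    {"id": 1, "city": "Cairo", "temp_c": 31},
    {"id": 2, "city": "", "temp_c": 28},       # missing city
    {"id": 1, "city": "Cairo", "temp_c": 45},  # contradicts the first record with id 1
]
print(veracity_report(records, ["id", "city", "temp_c"]))
# {'missing': 1, 'conflicting_ids': [1]}
```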
Variability
❑ Refers to variance in meaning, the number of inconsistencies, the multitude of
data dimensions, and inconsistent data arrival speeds.
Validity
❑ Refers to the extent to which “data are shown (or known) to be an accurate
indicator of the claim being made”.
❑ It differs from veracity in that validity means “the correctness and
accuracy of data with regard to the intended usage”.
❑ In other words, data can be trustworthy, and thus satisfy the veracity
aspect, yet poor interpretation of the data might lead to unintended
use. Moreover, the same veracious data can be valid for
one application and invalid for another.
Vulnerability
❑ Refers to the security of the collected datasets that will be used for
later analysis.

❑ It also denotes the flaws in the system that permit malicious
activities to be conducted on the collected datasets.
Volatility
❑ Refers to how long data remains valid to be stored or used before it
becomes obsolete or no longer relevant.

❑ It is a crucial dimension, since the cost of storage and maintenance
increases with longer Big data retention.
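
In practice, volatility is often handled with a retention rule that discards data once it is older than some limit. The sketch below assumes a 30-day retention period and a hypothetical helper `still_valid`; neither comes from the lecture.

```python
from datetime import datetime, timedelta

def still_valid(records, now, retention):
    """Keep only records whose timestamp lies within the retention period."""
    cutoff = now - retention
    return [r for r in records if r["timestamp"] >= cutoff]

now = datetime(2024, 1, 31)
records = [
    {"id": 1, "timestamp": datetime(2024, 1, 30)},
    {"id": 2, "timestamp": datetime(2023, 12, 1)},  # older than 30 days
    {"id": 3, "timestamp": datetime(2024, 1, 2)},
]
kept = still_valid(records, now, retention=timedelta(days=30))
print([r["id"] for r in kept])  # [1, 3]
```

A shorter retention period lowers storage cost but discards data that might still be relevant, which is the trade-off described above.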
Visualization
❑ Refers to the ability to present Big data in a visual context, such as
diagrams, graphs, maps, etc., for better understanding and
interpretation of the data.

❑ It also helps people and organizations discover patterns,
correlations, trends, relationships and dependencies.

❑ Big data visualization is a powerful tool for decision makers to access,
evaluate and interpret massive data, even in real time, and act upon it.
Value
❑ Represents the outcome product of Big data analysis (i.e. new
insights).
Big data V-features
Big data Challenges
❑Storing and processing issue
❑Privacy and Security
❑Data access and sharing
❑Analytical challenges
❑Skills requirements
❑Technical Issues
Storing and processing issue
❑The rate at which data grows is much faster than existing
processing systems can keep up with.

❑Current storage systems are not capable of storing all of this
data.

❑There is a need to develop processing systems that cater not only
to today's needs but also to future needs.
Privacy and Security
❑New devices and technologies like cloud computing provide a
gateway to access and to store information for analysis.

❑This integration of IT architectures will pose greater risks to data
security and intellectual property.
Data access and sharing
❑Generally, data is used for making accurate decisions.
❑The data should be available in an accurate, complete and timely
manner.
Analytical challenges
❑Traditional RDBMS are suitable only for structured data.
❑What if data volume gets so large that we do not know how to deal
with it?
❑Does all data need to be stored?
❑Does all data need to be analyzed?
❑Which data points are important?
❑How can data be used to best advantage?
Skills requirements
❑With the increase in the amount of (structured and unstructured) data
generated, there is a growing need for talent.

❑The demand for people with good analytical skills in big data is
increasing.
Technical Issues
❑Fault Tolerance
❑Scalability
❑Quality of Data
❑Heterogeneous Data
Fault Tolerance
❑A system's ability to continue operating uninterrupted
despite the failure of one or more of its components.
❑Fault-tolerant systems use backup components that
automatically take the place of failed components, ensuring
no loss of service.
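
The backup mechanism described above can be sketched in a few lines of Python. This is only an illustration of the idea: `call_with_failover`, `primary` and `backup` are hypothetical names, and the primary's failure is simulated.

```python
def call_with_failover(components, request):
    """Try each component in order and return the first successful result."""
    errors = []
    for component in components:
        try:
            return component(request)
        except Exception as exc:  # this component failed; fall through to the next one
            errors.append(exc)
    raise RuntimeError(f"all {len(components)} components failed: {errors}")

def primary(request):
    raise ConnectionError("primary is down")  # simulated failure

def backup(request):
    return f"handled {request!r} on backup"

result = call_with_failover([primary, backup], "read block 7")
print(result)  # handled 'read block 7' on backup
```

The caller still receives a result even though the primary component failed, which is the "no loss of service" property.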
Scalability
The property of a system to handle a growing amount of work by adding
resources to the system.

Vertical Scalability (Scale-up) –

In this type of scalability, we increase the power of the existing resources
in the working environment (for example, more CPU, memory or storage on the
same machine).
Scalability
Horizontal Scalability (Scale-Out) –

In this kind of scaling, more resources (for example, additional machines) are added alongside the existing ones.
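
A minimal sketch of why scaling out helps: if records are spread across nodes by a stable hash of their key, adding nodes reduces the load on each one. The function names and the simple modulo scheme are assumptions for illustration; real distributed systems use more elaborate partitioning.

```python
import zlib

def assign_node(key: str, num_nodes: int) -> int:
    """Map a record key to one of `num_nodes` workers using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def load_per_node(keys, num_nodes):
    """Count how many keys land on each node."""
    counts = [0] * num_nodes
    for key in keys:
        counts[assign_node(key, num_nodes)] += 1
    return counts

keys = [f"record-{i}" for i in range(1000)]  # made-up record keys
print(load_per_node(keys, 2))  # load with two nodes
print(load_per_node(keys, 4))  # the same work spread over four nodes
```

With four nodes, no node holds more records than the busiest node did with two.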


Quality of Data
Heterogeneous Data
Questions
1. What are the types (formats) of Big data?
2. Mention some of the Big data features.
3. Mention some of the Big data challenges.
Thanks
Dr. Mona Abbass
E-mail [email protected]
