Week 3 4th Revolution

The document discusses the exponential growth of data in today's digital world. Some key points: - The rise of the internet, social media, and technology has led to massive amounts of data being created every day by individuals, organizations, and devices. - During the COVID-19 pandemic, internet and technology usage increased dramatically, fueling even greater data growth as people spent more time online. - By the end of 2021, total global data is predicted to reach 74 zettabytes. Various statistics are provided on the data generated daily by internet activities like searches, emails, social media posts, and videos. - As more people and devices connect to the internet, data will continue growing

Uploaded by

momo Adam

Industrial Revolution – IV

By

Professor Jihad Mohamad ALJA’AM

1
BIG DATA
We live surrounded and submerged by DATA

DIGITAL DATA

2
3
Introduction
Since the invention of computers, people have
used the term data to refer to computer
information.

Example: I have data, I don’t have data

DATA
Data can be text or numbers written on
paper, or it can be bytes and bits inside the
memory of electronic devices, or it could be
facts that are stored inside a person’s mind.

DIGITAL DATA → Binary 0, 1


4
Data: Text + Video + Audio + Logs from Web

Web Logs File


Contains Information about Visitors

WE CAN FIND DATA EVERYWHERE

ALL PEOPLE CAN GENERATE DATA MASSIVELY

5
DATA EXPLOSION WITH INTERNET AND SOCIAL MEDIA

6
7
WORLD OF DATA

8
Internet and Data
You are on the Internet almost daily. You check
your email, send replies, maybe browse
websites, and even click on things (image, link).

Every move you make online generates data.

With around 4.66 billion active Internet users
worldwide, the data produced daily surpasses
the imagination.

9
SOCIAL MEDIA GENERATE DATA

A HUGE AMOUNT OF DATA IS GENERATED

10
The Internet & DATA & CORONA VIRUS
The coronavirus pandemic shuttered offices,
schools, restaurants, and other establishments.
It allowed people to spend more time on the
Internet for work, learning, and entertainment.

• 1.7 MB is how much data is created every
second per person. (Northeastern University)

Photos + Videos + Voices + Text

11
• 2.5 quintillion bytes of data were created
every day. (SG Analytics, 2020)

2.5 quintillion = 2,500,000,000,000,000,000 = 2.5 × 10^18 bytes/day


That is equivalent to 10 million discs, which
when stacked would be as tall as two Eiffel
Towers combined. (Dihuni, 2020)

12
• As of August 2020, in one Internet minute
WhatsApp users sent 41,666,667 messages.
That was the most media usage in 2020. (Domo, 2020)

• That was followed by voice or video calls,
which amounted to 1,388,889 per Internet
minute. (Domo, 2020)

13
• 404,444 users streamed on Netflix every
minute. (Domo, 2020)

• Amazon shipped 6,659 packages per
minute. This figure contributed to the
explosive growth of e-commerce. (Domo, 2020)

14
• Email users sent 306.4 billion emails per
day in 2020. In contrast, 293.6 billion were
exchanged in 2019. (Radicati Group, 2019;
TechJury, 2020)

• People sent 500 million tweets daily.
(TechJury, 2020). That was 5,787 tweets
per second. (e-Learning Infographics, 2020)

• 3.5 billion searches were made on Google
every day, making it the most visited search
engine. (e-Learning Infographics, 2020)
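The per-second and per-day figures on these slides are related by simple arithmetic. The sketch below is a minimal illustration of that conversion, using only numbers quoted above; the function names are made up for this example.

```python
# Converting between "per day", "per minute" and "per second" rates.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day
MINUTES_PER_DAY = 24 * 60        # 1,440 minutes in a day


def per_second(per_day):
    """Turn a daily count into an average count per second."""
    return per_day / SECONDS_PER_DAY


def per_day_from_minute(per_minute):
    """Turn a per-minute count into a daily count."""
    return per_minute * MINUTES_PER_DAY


# 500 million tweets per day (figure from the slide)
tweets_per_second = per_second(500_000_000)

# 41,666,667 WhatsApp messages per Internet minute (figure from the slide)
whatsapp_per_day = per_day_from_minute(41_666_667)

print(f"{tweets_per_second:,.0f} tweets per second")
print(f"{whatsapp_per_day:,} WhatsApp messages per day")
```

The first result reproduces the 5,787 tweets per second quoted on the slide, and the second shows that the WhatsApp figure corresponds to roughly 60 billion messages per day.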

15
• 300 hours of video were uploaded on
YouTube per minute. (e-Learning
Infographics, 2020)

• A connected car produced 4 TB of data in
one day. (Raconteur, 2020)

16
Smart Transportation

WHAT TO DO WITH DATA?


17
DATA ENGINEERING

1.DATA STORAGE
2.DATA PROCESSING
3.INFORMATION RETRIEVAL
4.SEARCHING DATA
5.ORGANISING DATA
6.DATA CLASSIFICATION
7.DATA CLEANING
8.COMPLETING MISSING DATA
9.REASONING, PREDICTION, PLANNING

18
DATA & EVENTS

• Over six million posts were made in one
day to commemorate Supreme Court
Justice Ruth Bader Ginsburg. (Facebook,
2020)

• When Kamala Harris was announced as the
United States vice-presidential candidate,
the announcement drew over 10 million
posts per day in August. (Facebook, 2020)

19
• Facebook created 4 PB of data in one day.
(Raconteur, 2020)
• Users posted 350 million photos in a day on
Facebook. (Raconteur, 2020)

• 47 million stories with the Support Small
Business Sticker were created on Instagram
in the last quarter of 2020. (Facebook,
2020)

20
• Instagram users uploaded 95 million photos
per day over the year. (e-Learning
Infographics, 2020).

• The average user stayed on the Instagram
app for 15 minutes. Within those 15
minutes, they commented, liked, searched, and
scrolled, adding more to the data produced.
(e-Learning Infographics, 2020).

• Two professionals signed up on LinkedIn
every second in 2020. (e-Learning
Infographics, 2020).

TECHNOLOGY GENERATES DATA


21
DATA GROWTH IN 2021
How much data is created every day in 2021? As
of April 2021, the number of people on the
Internet has grown by 7.6%. This means 60%
of the world’s population is now online.
• 74 zettabytes – the total data in the world
by the end of 2021, according to expert
predictions. (IDC & Statista, 2020)

• There would be a 3% growth of email users
in 2021. (Radicati Group, 2019)

• One study shows that 1.145 trillion MB of
data is created every day. (TechJury, 2021)

• There could be 2 trillion searches on Google
by the end of 2021. (Internet Live Stats,
2021)

22
• That would be close to six billion searches
per day (2 trillion ÷ 365 days ≈ 5.5 billion).
(Internet Live Stats, 2021)

• 3,026,626 emails are sent every second.
(Internet Live Stats, 2021)

• Of these, 67% are spam. (Internet Live
Stats, 2021)

• Users send 31 million messages every
minute each day on Facebook. (Strategic
Tech Investor, 2021)

• Facebook users view around 2.7 million
videos per minute every day. (Strategic
Tech Investor, 2021)

• Every year, more than 2.5 billion blog posts
go up. (GrowthBadger, 2021)

23
• Each month, users publish 70 million blog
posts and post 77 million new comments on
WordPress. (GrowthBadger, 2021)

• As more and more people use the internet,
cybersecurity threats also continue to
grow. To date, 230,000 new malware
samples are created every day. (PurpleSec,
2021).

24
WORLD OF ZETTABYTE

25
DATA = KNOWLEDGE

KNOWLEDGE GENERATE MONEY

26
27
A PILE OF DVDs THAT REACHES THE MOON
WHEN STACKED
DIFFICULTIES with DATA

28
Importance of DATA

If you work in human services because you hate
math, terms like “data” and “quantitative analysis”
might sound scary.

Don’t be intimidated! Data does not have to be
complicated.

Data is useful information that you collect
to support organizational decision-making
and strategy.

IMPROVE OUTCOMES

29
Quality
Improving quality is first and foremost among the
reasons why organizations should be using data.

DATA = KNOWLEDGE
MORE DATA = MORE KNOWLEDGE

YOU CAN SEE THE WORLD BETTER WITH DATA


30
MONITORING
Data allows you to monitor the health of your
organization:

Organizations are able to respond to challenges
before they become full-blown crises.

Effective quality monitoring will allow your
organization to be proactive rather than reactive
and will help the organization maintain best
practices over time.
PROACTIVE VERSUS REACTIVE

31
PROACTIVE Versus REACTIVE

Weather Provider Companies

32
Example.
Data: Weather NEXT Week
Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
35°C     38°C      45°C        51°C       55°C     60°C       65°C

PROACTIVE:
Inform people now so that they can prepare.
Provide sufficient bottles of water.
Check your air conditioners at home and at work.
Ban work from 12:00 to 15:00.
Thursday   Friday   Saturday   Sunday
51°C       55°C     60°C       65°C

REACTIVE:
Wait until Thursday, when some people die from the
heat, and only then take ACTION.
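The proactive approach can be sketched in a few lines of code: scan the forecast ahead of time and flag the dangerous days before they arrive. This is a minimal illustration using the temperatures from the slide; the 50°C threshold and the function name are assumptions made for this example, not part of the lecture.

```python
# Forecast for next week, in degrees Celsius (values from the slide).
forecast_c = {
    "Monday": 35,
    "Tuesday": 38,
    "Wednesday": 45,
    "Thursday": 51,
    "Friday": 55,
    "Saturday": 60,
    "Sunday": 65,
}

# Hypothetical threshold above which a heat warning is issued.
DANGER_THRESHOLD_C = 50


def days_needing_alert(forecast, threshold):
    """Return the days whose forecast temperature reaches the threshold."""
    return [day for day, temp in forecast.items() if temp >= threshold]


alert_days = days_needing_alert(forecast_c, DANGER_THRESHOLD_C)
print("Issue warnings now for:", ", ".join(alert_days))
```

With this threshold the flagged days are Thursday through Sunday, the same four days highlighted on the slide, and the warning can go out on Monday rather than after the heat arrives.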

33
DATA = MONEY

YOU CAN COLLECT AND SELL DATA

COMPANIES MAY BUY DATA

DATA SETS
Data about people
Data about organisations
Data about countries
Data about families
Data about students in universities
Data about products
Data about hotels
34
DATA and Strategy

Data allows organizations to measure the
effectiveness of a given strategy.

When strategies are put into place to overcome a
challenge, collecting data will allow you to
determine how well your solution is performing,
and whether or not your approach needs to be
tweaked or changed over the long-term.

A data strategy is a long-term plan that
defines the technology, processes,
people, and rules required to manage an
organization's information assets.
35
Example
Reduce the large number of students failing the
physics course.

Strategy:
1. Give more homework
2. Use videos to explain theoretical concepts
3. Reduce the weight of midterm and final exam scores
4. Ask the students to work in groups

Data Collection:
Collect the data over a period of six months and see
whether this strategy solves the problem:
REDUCE THE NUMBER OF FAILING STUDENTS.

You can adjust or change something in
your strategy based on the analysis of
the data.

36
Find Solutions to Problems
Data allows organizations to more effectively
determine the cause of problems.

Data allows organizations to visualize relationships
between what is happening in different locations
and departments.
DATA ENGINEERS

SOFTWARE FOR DATA VISUALISATION

37
Example: Travel Agency
AGA travel agency has 4 offices. Get data on sales
in every office over the year.
Collect DATA
Office     Sales        Consumption
Office-1   5 Million    3 Million
Office-2   25 Million   8 Million
Office-3   18 Million   8 Million
Office-4   1 Million    2 Million

Office-4: Loses money (consumption exceeds sales)

DATA ANALYSIS

Action-1: Provide training for staff.

Action-2: Reduce staff in Office-4 or even close it.
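The analysis step in this example amounts to subtracting consumption from sales for each office. The sketch below is one minimal way to do it, using the figures from the slide (in millions); the variable names are chosen for illustration only.

```python
# (sales, consumption) per office, in millions (figures from the slide).
offices = {
    "Office-1": (5, 3),
    "Office-2": (25, 8),
    "Office-3": (18, 8),
    "Office-4": (1, 2),
}

# Profit = sales - consumption for each office.
profit = {name: sales - cost for name, (sales, cost) in offices.items()}

# Offices with a negative profit are losing money.
losing = [name for name, value in profit.items() if value < 0]

for name, value in profit.items():
    print(f"{name}: {value:+d} million")
print("Losing money:", losing)
```

Only Office-4 ends up with a negative result, which is what motivates the two actions listed on the slide.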

38
Systems Advocacy
Data is a key component of systems advocacy.
Utilizing data will help you present a strong
argument for systems change.

Argue why it is important to make changes
in your current systems or software.

Whether you are advocating for increased
funding from public or private sources, or
making the case for changes in regulation,
illustrating your argument through the use of
data will allow you to demonstrate why
changes are needed.
1. Change something: Systems/Software
2. Ask for funding from the government
3. Increase/reduce the staff
4. Buy faster computers

DATA HELPS TO STRENGTHEN YOUR ARGUMENT
FOR CHANGES
39
DATA FOR STRATEGIC DECISIONS
Data will help you explain (both good and bad)
decisions to your stakeholders. Whether or not
your strategies and decisions have the outcome
you anticipated, you can be confident that you
developed your approach based not upon guesses,
but upon solid data analysis.

DATA HELPS TO EVALUATE DECISIONS &
ADOPTED STRATEGIES

Strategic Planning
Data allows you to replicate areas of strength
across your organization. Data analysis will help
you identify high-performing programs, service
areas, and people.

Once you identify your high-performers, you can
study them in order to develop strategies to assist
programs, service areas and people that are
under-performing (provide training).

40
ALL BUSINESSES NEED BIGDATA TO FLOURISH

HIGH DEMAND FOR DATA SCIENTISTS

Top Industries Hiring Data Scientists in 2022

https://www.naukri.com/learning/articles/top-industries-hiring-data-scientists/

41
WE CANNOT GROW BUSINESSES
WITHOUT DATA
The value of the data science market
is slated to reach $16 billion by 2025

42
Top Recruiters of DATA SCIENTISTS
 Amazon
 Flipkart

 Walmart

 Aditya Birla Fashion & Retail Ltd.

 Future Enterprises Ltd.

 Reliance Retail Ltd.

 K. Raheja Group (Shoppers’ Stop)

 Landmark Group (Lifestyle)

 ITC

43
How is Data Stored?

Computers represent data (e.g., text, images,
sound, video) as binary values that employ two
numbers: 1 and 0.

The smallest unit of data is called a “bit,” and it
represents a single value. Additionally, a byte
is eight bits long.

Memory and storage are measured in units
such as megabytes, gigabytes, terabytes,
petabytes, and exabytes.
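To make the idea of bits and bytes concrete, the short sketch below shows how a piece of text looks once it is stored as binary values. It assumes UTF-8 encoding, which the slides do not specify, and the function names are illustrative.

```python
def to_bits(text):
    """Encode text as UTF-8 and show each byte as eight binary digits."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]


def from_bits(bit_strings):
    """Reverse the process: turn 8-bit strings back into text."""
    return bytes(int(bits, 2) for bits in bit_strings).decode("utf-8")


bits = to_bits("Hi")
print(bits)             # one 8-bit string per byte
print(from_bits(bits))  # the original text again
```

Each character of "Hi" occupies one byte here, and each byte is shown as eight bits, matching the definition on the slide.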

44
Data

Data Measurement   Size
Bit                Single Binary Digit (1 or 0)
Byte               8 bits
Kilobyte (KB)      1,024 Bytes
Megabyte (MB)      1,024 Kilobytes
Gigabyte (GB)      1,024 Megabytes
Terabyte (TB)      1,024 Gigabytes
Petabyte (PB)      1,024 Terabytes
Exabyte (EB)       1,024 Petabytes

A zettabyte is storage for 30 Billion
4K movies
45
The Human Brain Capacity is 1.2ZB

A Huge Amount of Data needs to be
Stored, Structured and Searched
46
Current Technology fails to work with
Bigdata

Data Processing Cycle

Data processing is defined as the re-ordering
or re-structuring of data by people or machines
to increase its utility and add value for a
specific function or purpose.

Example

Search tweets on Qatar and World Cup.

TWEETED TEXTS ARE NOT STRUCTURED

HOW TO STRUCTURE THEM IN ORDER
TO EXTRACT SOME USEFUL
INFORMATION?

“What people think about Qatar”

47
We can address queries to structured data.

This is done with a language called
Structured Query Language, or SQL for
short.

For example, if we want to find out how
many users made a tweet between 10am
and 11am we could start from something like:
A QUERY IN SQL

-- Assumes Twitter.Time is stored as 'HH:MM:SS' text.
-- Counting the distinct UserId values in the result
-- gives the number of users.
SELECT Users.UserId, Twitter.Tweet, Twitter.Time
FROM Twitter
INNER JOIN Users ON Twitter.UserId = Users.UserId
WHERE Twitter.Time >= '10:00:00' AND Twitter.Time <= '11:00:00'
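The query can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration: the table layout, the sample tweets and the 'HH:MM:SS' text format for the Time column are all assumptions made for this example, and the time filter is written as a range whose two bounds are joined by AND.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Twitter (
    TweetId INTEGER PRIMARY KEY,
    UserId  INTEGER,
    Tweet   TEXT,
    Time    TEXT  -- assumed 'HH:MM:SS' format, so text comparison works
);
INSERT INTO Users VALUES (1, 'user_a'), (2, 'user_b'), (3, 'user_c');
INSERT INTO Twitter (UserId, Tweet, Time) VALUES
    (1, 'Qatar is ready', '09:45:00'),
    (1, 'World Cup!',     '10:05:00'),
    (2, 'Great match',    '10:30:00'),
    (2, 'What a goal',    '10:50:00'),
    (3, 'Good night',     '23:10:00');
""")

# Tweets made between 10am and 11am, joined with their users.
rows = conn.execute("""
    SELECT Users.UserId, Twitter.Tweet, Twitter.Time
    FROM Twitter
    INNER JOIN Users ON Twitter.UserId = Users.UserId
    WHERE Twitter.Time >= '10:00:00' AND Twitter.Time <= '11:00:00'
    ORDER BY Twitter.Time
""").fetchall()

# How many distinct users tweeted in that window.
user_count = conn.execute(
    "SELECT COUNT(DISTINCT UserId) FROM Twitter WHERE Time >= ? AND Time <= ?",
    ("10:00:00", "11:00:00"),
).fetchone()[0]

print(rows)
print("Users who tweeted between 10am and 11am:", user_count)
```

With this sample data, three tweets fall inside the window and they come from two different users, so the row list answers "which tweets" and the COUNT(DISTINCT ...) query answers "how many users".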

Organising Data
Standard data processing is made up of three basic
steps:

Input, Processing, and Output

48
Together, these three steps make up the data
processing cycle.

• Input: The input data gets prepared for
processing in a convenient form that depends
on the machine carrying out the
processing.

HOW TO PROCESS BIG-DATA?

• Processing: Next, the input data’s form is
changed to something more useful. For
example, information from timecards
(attendance) is used to calculate
paychecks.

• Output: In the final step, the processing
results are collected as output data, with its
final form depending on what it’s being
used for. Using the previous example,
output data becomes the employees’
actual paychecks.

HOW MUCH MONEY SHOULD BE GIVEN?

49
Employee Timecards: ATTENDANCE

ANALYSE THE TIMECARDS OVER THE WEEK
AND GENERATE THE PAYCHECK
ACCORDINGLY

$675.80 based on the hours worked
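The input, processing and output steps of the timecard example can be sketched as follows. The daily hours and the hourly rate are hypothetical values, picked only so that the result matches the $675.80 shown on the slide; the slides do not say how that amount was actually computed.

```python
from decimal import Decimal

# INPUT: hours worked per day, as read from a timecard (hypothetical values).
timecard_hours = {"Mon": 6, "Tue": 7, "Wed": 6, "Thu": 6, "Fri": 6}

# Hypothetical hourly rate; Decimal avoids floating-point rounding on money.
HOURLY_RATE = Decimal("21.80")


def process(timecard, rate):
    """PROCESSING: add up the week's hours and multiply by the hourly rate."""
    total_hours = sum(timecard.values())
    return total_hours * rate


# OUTPUT: the paycheck amount for the week.
paycheck = process(timecard_hours, HOURLY_RATE)
print(f"Paycheck: ${paycheck}")
```

Here 31 hours at $21.80 per hour gives a paycheck of $675.80.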


50
Big-Data
Big Data is data, but of a huge size

ERA OF
ZETTABYTE

'Big Data' is a term used to describe a
collection of data that is huge in size and
yet growing exponentially with time.

51
TONS OF DATA
Data that is very large in size, on the scale of
ZETTABYTES, is called Big Data

Working with Bigdata is problematic.
We need much more powerful software and
computers to work with Big Data.

BigData Needs Storage & Processing


52
WE NEED FAST INTERNET CONNECTIVITY
TO DEAL WITH BIGDATA
Connection Speed Technology
Internet Technology   Data Rate             Bits per second       Bytes per second      Kilobytes per second
28.8K Modem           28.8 Kbps             28,800 Bits           3,600 Bytes           3.5 Kilobytes
36.6K Modem           36.6 Kbps             36,600 Bits           4,575 Bytes           4.4 Kilobytes
56K Modem             56 Kbps               56,000 Bits           7,000 Bytes           6.8 Kilobytes
ISDN                  128 Kbps              128,000 Bits          16,000 Bytes          15 Kilobytes
T1                    1.544 Mbps            1,544,000 Bits        193,000 Bytes         188 Kilobytes
DSL                   512 Kbps to 8 Mbps    8,000,000 Bits        1,000,000 Bytes       976 Kilobytes
Cable Modem           512 Kbps to 52 Mbps   53,000,000 Bits       6,625,000 Bytes       6,469 Kilobytes (6.3MB/sec)
T3                    44.736 Mbps           44,736,000 Bits       5,592,000 Bytes       5,460 Kilobytes (5.3MB/sec)
Gigabit Ethernet      1 Gbps                1,000,000,000 Bits    125,000,000 Bytes     122,070 Kilobytes (119MB/sec)
OC-256                13.271 Gbps           13,271,000,000 Bits   1,658,875,000 Bytes   1,619,995 Kilobytes (1.5GB/sec)
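The columns of this table are related by two divisions: bits per second divided by 8 gives bytes per second, and bytes per second divided by 1,024 gives kilobytes per second. The sketch below is an illustration of that arithmetic, plus a rough estimate of how long a file takes to transfer; the function names and the 1 GB file are assumptions for this example, and real transfers are slower because of protocol overhead.

```python
def rate_breakdown(bits_per_second):
    """Convert a line rate in bits/s to (bytes/s, kilobytes/s)."""
    bytes_per_second = bits_per_second / 8
    kilobytes_per_second = bytes_per_second / 1024
    return bytes_per_second, kilobytes_per_second


def transfer_seconds(size_bytes, bits_per_second):
    """Ideal time to move size_bytes over a link, ignoring overhead."""
    return size_bytes * 8 / bits_per_second


# T1 line: 1.544 Mbps (row from the table)
t1_bytes, t1_kb = rate_breakdown(1_544_000)
print(f"T1: {t1_bytes:,.0f} bytes/s, about {t1_kb:,.0f} KB/s")

# Moving a 1 GB file over a 56K modem versus Gigabit Ethernet.
one_gb = 1024 ** 3
seconds_56k = transfer_seconds(one_gb, 56_000)
seconds_gigabit = transfer_seconds(one_gb, 1_000_000_000)
print(f"56K modem:        {seconds_56k / 3600:.1f} hours")
print(f"Gigabit Ethernet: {seconds_gigabit:.1f} seconds")
```

The T1 result reproduces the 193,000 bytes and 188 kilobytes per second in the table. The same 1 GB file needs more than 42 hours on a 56K modem and under 9 seconds on Gigabit Ethernet, which is why connection speed matters so much for Big Data.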

SPEED AFFECTS BUSINESSES

WE NEED FAST CONNECTIVITY
TO
WORK WITH BIGDATA

53
Types of various Units of Memory
Byte         01011111  01111111  00000000

Kilo Byte    1024 Bytes
Mega Byte    1024 Kilo
Giga Byte    1024 Mega
Tera Byte    1024 Giga
Peta Byte    1024 Tera
Exa Byte     1024 Peta
Zetta Byte   1024 Exa
Yotta Byte   1024 Zetta

54
Name        Equal To           Size (In Bytes)
Bit         1 Bit              1/8
Nibble      4 Bits             1/2 (rare)
Byte        8 Bits             1
Kilobyte    1,024 Bytes        1,024
Megabyte    1,024 Kilobytes    1,048,576
Gigabyte    1,024 Megabytes    1,073,741,824
Terabyte    1,024 Gigabytes    1,099,511,627,776
Petabyte    1,024 Terabytes    1,125,899,906,842,624
Exabyte     1,024 Petabytes    1,152,921,504,606,846,976
Zettabyte   1,024 Exabytes     1,180,591,620,717,411,303,424
Yottabyte   1,024 Zettabytes   1,208,925,819,614,629,174,706,176
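Every row of this table is the previous one multiplied by 1,024, so the whole table can be generated from powers of 1,024. The sketch below does that and adds a small helper for printing a byte count in the largest fitting unit; the helper name and its output format are choices made for this example.

```python
# Units in increasing order; each one is 1,024 times the previous one.
UNITS = [
    "Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
    "Petabyte", "Exabyte", "Zettabyte", "Yottabyte",
]

# Size of each unit in bytes: 1024 ** 0, 1024 ** 1, 1024 ** 2, ...
size_in_bytes = {name: 1024 ** power for power, name in enumerate(UNITS)}


def human_readable(n_bytes):
    """Express a byte count in the largest unit that fits."""
    for name in reversed(UNITS):
        if n_bytes >= size_in_bytes[name]:
            return f"{n_bytes / size_in_bytes[name]:.2f} {name}s"
    return f"{n_bytes} Bytes"


for name in UNITS:
    print(f"{name:<10} {size_in_bytes[name]:,}")

print(human_readable(1536))
```

The printed sizes match the "Size (In Bytes)" column above, from 1,024 for a kilobyte up to the 25-digit value for a yottabyte.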

55
DELUGE OF DATA

DATA SCIENTISTS ARE NEEDED IN ALL
BUSINESSES
56
TONS OF DATA GENERATED
57
Current software fails to deal with bigdata

58
Current systems will be very slow, and it is
almost impossible for them to deal with
BIGDATA.
Normally we work on data of size MB
(Word Doc, Excel) or at most GB
(Movies), but data in Peta bytes or Zetta
bytes, i.e. 10^15 or 10^21 bytes in size, is called
Big Data, and is almost impossible to work with
on such systems.

59
DATA SCIENCE ENGINEERS

GOOGLE WORKS WITH BIGDATA

60
Google processes more
than 20 petabytes of data
every day. This includes
around 3.5 billion search
queries.

“Data is the new oil of
Technology”

61
Volume of DATA

74 Zettabytes (74 trillion GBs) of data
would be generated by the Internet.

In short, such data is so large and
complex that none of the traditional data
management tools can store it or process it
efficiently.

62
The amount of data in the world was
estimated to be 44 zettabytes at the
dawn of 2020.

By 2025, the amount of data generated
each day is expected to reach 463
exabytes globally.

Google, Facebook, Microsoft, and
Amazon store at least 1,200
petabytes of information.

The world spends almost $1
million per minute on commodities on
the Internet based on BIGDATA

By 2025, there would be 75
billion Internet-of-Things (IoT)
devices in the world.

63
By 2030, nine out of every ten people
aged six and above would be digitally
active.

ZETTABYTES

The New York Stock Exchange generates about one
terabyte of new data per day.

64
Social media sites (Facebook, Google, LinkedIn, …)
generate 500 terabytes of new data
every day. This data is mainly generated in
terms of photo and video uploads,
message exchanges, comments etc.

A single jet engine can generate
10+ terabytes of data in 30 minutes of
flight time.

With many thousands of flights per day, the data
generated reaches many petabytes.

Weather Station: All the weather stations
and satellites produce very large amounts of data, which are
stored and manipulated to forecast
weather.

65
WEATHER PREDICTION COMPANIES CAN SELL
DATA TO ORGANISATIONS

Telecom company: Telecom companies like Ooredoo
and Vodafone study user trends and
publish their plans accordingly. For this,
they store the data of their millions of users for
analysis.

E-commerce site: Sites like Amazon, Flipkart
and Alibaba generate a huge amount of logs from
which users’ buying trends can be traced.

66
SOFTWARE TO HANDLE BIGDATA

What are you interested in? Science
Fiction, Perfumes, etc.

Software to analyse the log files and detect
user trends.

67
Identification: Your IP address.
Trends: Web pages you visited
Items you are interested in.
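Detecting user trends from a log file can be sketched as counting which pages each visitor (identified by IP address) opens most often. The log format below, one "IP page" pair per line, and the sample entries are made up for this illustration; real web server logs carry more fields and would need a fuller parser.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified web log: "<visitor IP> <page visited>".
log_lines = [
    "192.0.2.10 /books/science-fiction",
    "192.0.2.10 /books/science-fiction",
    "192.0.2.10 /perfumes",
    "198.51.100.7 /perfumes",
    "198.51.100.7 /perfumes",
]


def visits_by_visitor(lines):
    """Count page visits separately for every IP address."""
    visits = defaultdict(Counter)
    for line in lines:
        ip, page = line.split(maxsplit=1)
        visits[ip][page] += 1
    return visits


def top_interest(visits):
    """For each visitor, pick the page they opened most often."""
    return {ip: pages.most_common(1)[0][0] for ip, pages in visits.items()}


trends = top_interest(visits_by_visitor(log_lines))
for ip, page in trends.items():
    print(f"{ip} is mostly interested in {page}")
```

In this sample the first visitor is mostly interested in science fiction books and the second in perfumes, the kind of trend an e-commerce site could use to propose offers.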

Processing these massive amounts of
data is not impossible with new
technologies like quantum computing and
Hadoop.

Quantum computers are among these
technologies, which work a thousand
times faster than traditional
computers.
NEW COMPUTERS

QUANTUM COMPUTING TO HANDLE BIGDATA


68
QUANTUM COMPUTING TO WORK WITH BITCOINS
CRYPTO CURRENCY

69
How to process BigData

70
From DATA to KNOWLEDGE

CLEAN DATA, COMPLETE DATA

71
HADOOP FRAMEWORK FOR BIGDATA

Data are Text and Tables


CAN BE PROCESSED IN PARALLEL WITH
SEVERAL COMPUTERS
72
GOOGLE STRATEGY with HADOOP

GOOGLE STORES DATA ON DIFFERENT
COMPUTERS GROUPED INTO CLUSTERS

73
Hadoop Distributed File Systems
STORAGE

STORE DATA ON DIFFERENT COMPUTERS
GROUPED INTO CLUSTERS

74
Distributed Storage into Blocks

EVERY BLOCK CAN STORE A PORTION
OF DATA
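The idea of distributed storage in blocks can be illustrated without Hadoop itself. The sketch below cuts some data into fixed-size blocks and spreads them over the computers of a cluster in round-robin order; it is not the HDFS API, and the tiny block size, node names and placement rule are assumptions made for this example. Real HDFS uses far larger blocks and also keeps several copies of each block.

```python
def split_into_blocks(data, block_size):
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign blocks to nodes in turn, remembering each block's position."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        node = nodes[index % len(nodes)]
        placement[node].append((index, block))
    return placement


def reassemble(placement):
    """Collect the blocks from all nodes and join them in original order."""
    indexed = [pair for stored in placement.values() for pair in stored]
    return b"".join(block for _, block in sorted(indexed))


data = b"WE LIVE SURROUNDED AND SUBMERGED BY DATA"
nodes = ["computer-1", "computer-2", "computer-3"]  # hypothetical cluster

blocks = split_into_blocks(data, block_size=16)
placement = place_blocks(blocks, nodes)

for node, stored in placement.items():
    print(node, stored)
```

Each computer holds only a portion of the data, and `reassemble(placement)` rebuilds the original bytes from the pieces.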
75
PROCESSING BIGDATA IN PARALLEL

FILTERING AND GIVING RESULTS AFTER
PROCESSING

DISPLAYING THE RESULTS


76
HADOOP FOR BIG DATA
STORE DATA IN BLOCKS
PROCESS DATA IN PARALLEL
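Processing blocks in parallel and then combining the partial results is the pattern usually called map and reduce. The sketch below imitates it on one machine with Python's standard library: threads stand in for the separate computers of a cluster, and the sample text blocks are made up. It illustrates the idea only and does not use Hadoop.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Three blocks of text, as if stored on three different computers.
blocks = [
    "qatar world cup qatar",
    "world cup final",
    "qatar stadium",
]


def map_block(text):
    """Map step: count the words inside a single block."""
    return Counter(text.split())


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each block is processed by its own worker, in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_block, blocks))

word_counts = reduce_counts(partial_counts)
print(word_counts.most_common(3))
```

No worker ever needs to see the whole data set; each one handles its own block, and only the small partial counts are brought together at the end.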

77
78
DATA TYPES

79
Structured Data

Structured data can be defined as the data that
resides in a fixed field within a record. It is split
into multiple tables. All of the data follows the
same format. Structured data is easy to enter,
query, and analyze.

80
STRUCTURED DATA

81
Semi-Structured Data
To consider what semi-structured data is,
let's start with an analogy -- interviewing.
Let's say you're conducting a semi-structured
interview. This, as the name
implies, falls somewhere in between a
structured and an unstructured interview.

For context, a structured interview is one in
which the questions being asked, as well as the
order in which they are asked, are pre-determined
by your HR team and consistent for
each candidate.

82
An unstructured interview, on the other hand,
is one in which the questions, and the order in
which they are asked, are up to the discretion of
the interviewer -- and could be entirely different
for each candidate.

When you consider these two extremes, you
can begin to see the benefits of a semi-structured
interview, which is fairly consistent
and quantitative (like a structured interview),
but still provides the interviewer with a window
for building rapport and asking follow-up
questions.

Semi-structured data is similar in nature to
a semi-structured interview -- it's not as
messy and uncontrolled as unstructured
data, but not as rigid and readily
quantifiable as structured data.

83
Semi-structured data is information that does
not reside in a relational database or any other
data table, but nonetheless has some
organizational properties to make it easier to
analyze. A good example of semi-structured
data is HTML code to build web pages.

84
Unstructured data → Data in any format, with no predefined structure.
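The three data types can be contrasted with a small example of each. In the sketch below, the structured record has fixed fields, the semi-structured item is a piece of HTML whose tags give it some organisation, and the unstructured item is free text that can only be searched. All sample values and names are made up for this illustration.

```python
from html.parser import HTMLParser

# Structured: fixed fields, the same for every record (like a table row).
structured_row = {"UserId": 1, "Time": "10:05:00", "Tweet": "World Cup!"}

# Semi-structured: no table, but tags give the content some organisation.
semi_structured = "<html><body><h1>World Cup</h1><p>Qatar</p></body></html>"

# Unstructured: free text with no predefined fields at all.
unstructured = "Just landed in Doha, can't wait for the World Cup!"


class TagCollector(HTMLParser):
    """Collect the names of the opening tags found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(semi_structured)

# Structured data can be read by field name.
print(structured_row["Time"])
# Semi-structured data exposes its organisation through its tags.
print(collector.tags)
# Unstructured data has to be searched as raw text.
mentions_world_cup = "world cup" in unstructured.lower()
print(mentions_world_cup)
```

The further down the list, the more work is needed before the data can be queried: a field lookup, then a parser, then a text search.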

85
Data Velocity Defined
Data velocity refers to the speed at which data
is generated and collected.

The velocity rate is based on factors such
as the number of sensors present on
IoT-enabled devices and the number of
individuals using the Internet and social
media.

Velocity refers to the speed at which
data is entered into a system and must
be processed.

For example, Amazon captures every
click of the mouse while shoppers are
browsing on its website. This happens
rapidly.
86
It is incredibly important to have real-
time data at any time to make better
business decisions faster.

87
Search for Coco Chanel Perfume

Give the surfer spontaneous offers

Propose to you immediately some offers

PROPOSE DISCOUNTS

88
89
DATA visualization – 3D DATA

TO DISSEMINATE IDEAS

ACCESS – ORACLE Cannot handle bigdata

90
1. STRUCTURED
2. SEMI-STRUCTURED
3. UNSTRUCTURED

91
92
HADOOP IS SOFTWARE THAT HAS
MANY TOOLS TO
WORK WITH BIGDATA
USING CLUSTERS OF COMPUTERS

93
HADOOP IS FREE

94
95
