HAWASSA UNIVERSITY

Institute of Technology (IOT)


Faculty of Informatics

Introduction to Emerging Technologies

(CoSc1012)

Intr. to Emerging Tech. 2020


prep: Werkneh E.(Msc in Comp Net)
Chapter 2(Two)

Data Science

Objectives
Describe what data science is and the role of data scientists.

Differentiate between data and information.

Describe the data processing life cycle.

Understand different data types from diverse perspectives.

Describe the data value chain in the emerging era of big data.

Understand the basics of Big Data.

Describe the purpose of the Hadoop ecosystem components.


2.1. An Overview of Data Science
• Data science:
  • A multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.
• We live in a digital world where every action we take generates data.
  • It comes from our mobile devices, online interactions, sensors, computers, cameras, wearable devices, and watches, which collect, store, and process information about the environment around us.
• New, huge data sets are now open and publicly accessible.
  • This gives us the power to make more informed decisions, react more quickly to change, and better understand the world around us.
• Data science advances beyond the traditional skills of analyzing large amounts of data, data mining, and programming.

Cont’d…
• Data science produces data insights: insights you can use to understand and improve your business, your investments, your health, and even your lifestyle and social life.
• Using data science is like being able to see in the dark.
• Data scientists need to be curious and result-oriented, with exceptional industry-specific knowledge and communication skills that allow them to explain highly technical results.

Cont’d…
• To practice data science, in the true meaning of the term, you need the analytical know-how of math and statistics, the coding skills (Python, R, SQL) necessary to work with data, and an area of subject-matter expertise.

2.1.1. Data Vs. Information
Data:
• Representation of facts, concepts, or instructions.
• Represented using letters (A-Z, a-z), digits (0-9), or special characters (+, -, /, *, <, >, =, etc.).
• Has no meaning.
• Can be numbers, letters, pictures, line graphs, etc.
• Input for the computer.
• E.g. Abebe, 21, IT, 2.5, etc.
Information:
• Interpreted/processed data.
• Arranged in a meaningful and useful manner, and used for making decisions.
• E.g. possible information: Abebe is an IT student and has a low GPA.
• Decision: Abebe needs a tutorial.
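The step from data to information to a decision can be sketched in a few lines of Python. This is only an illustration: the GPA cutoff and the function names are assumptions made for the example, not part of the course material.

```python
# Raw data: isolated values with no meaning on their own.
record = ("Abebe", 21, "IT", 2.5)

# Assumption for illustration only: a GPA below this cutoff counts as "low".
LOW_GPA_CUTOFF = 2.75


def to_information(record):
    """Interpret a raw record so that it can support a decision."""
    name, age, department, gpa = record
    standing = "low" if gpa < LOW_GPA_CUTOFF else "satisfactory"
    return f"{name} is an {department} student, aged {age}, and has a {standing} GPA"


def decide(record):
    """Turn the interpreted record into a decision."""
    name, _, _, gpa = record
    if gpa < LOW_GPA_CUTOFF:
        return f"{name} needs a tutorial"
    return f"{name} needs no tutorial"


print(to_information(record))
print(decide(record))
```

The tuple alone says nothing; only after the values are labeled and compared against a rule do they become information that a decision can rest on.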
2.1.2. Data Processing Cycle
• Data processing
  • is the restructuring or reordering of data by people or machines to increase its usefulness and add value for a particular purpose.
• Data processing cycle
1. Input: text, numbers, audio, video, image, symbols
2. Processing: numbers compared/sorted/calculated, text formatted, image/sound edited, etc.
3. Output: result displayed and collected on the monitor, speaker, printers, etc.
4. Storage: data stored for later use
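A minimal sketch of the cycle in Python, assuming a hypothetical list of exam scores as the input and an in-memory list standing in for storage:

```python
import json


def collect_input():
    """Input: hypothetical exam scores entered into the system."""
    return [72, 55, 90, 64]


def process(scores):
    """Processing: the numbers are sorted and an average is calculated."""
    return {"sorted": sorted(scores), "average": sum(scores) / len(scores)}


def output(result):
    """Output: format the result for display on a monitor or printer."""
    return f"Scores {result['sorted']} have an average of {result['average']:.2f}"


def store(result, storage):
    """Storage: keep a serialized copy of the result for later use."""
    storage.append(json.dumps(result))


storage = []  # stands in for a disk or database
result = process(collect_input())
print(output(result))
store(result, storage)
```

Each function corresponds to one stage, so the output of one stage is the input of the next.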

2.3 Data types and their representation
From a computer programming perspective
• Integers (int): store whole numbers
• Booleans (bool): store one of only two values (true/false, on/off)
• Characters (char): store a single character
• Floating-point numbers (float): store real numbers
• Alphanumeric strings (string): store a combination of characters and numbers
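As a quick illustration, the same types can be seen in Python. The variable names and values are made up for the example, and Python has no separate char type, so a one-character string plays that role here.

```python
age = 21                # int: a whole number
is_enrolled = True      # bool: one of two values, True or False
grade = "A"             # no char type in Python; a one-character str is used instead
gpa = 2.5               # float: a real number
student_id = "IT2020A"  # str: a hypothetical alphanumeric string

for value in (age, is_enrolled, grade, gpa, student_id):
    # type(...).__name__ gives the name of the type Python assigned to the value
    print(f"{value!r}: {type(value).__name__}")
```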

Cont’d…
From a data analytics perspective
• Structured: stored, processed, and manipulated in tabular form (Excel, SQL) and ready for analysis (e.g. sensor data, weblogs, financial institutions' data)
• Unstructured: commonly generated from human activities and not organized in a predefined manner, such as audio and video files (e.g. satellite, radar, experimental, social media, mobile, website data)
• Semi-structured: doesn’t fit into a structured database system, but is structured by tags (JSON, XML)
• Metadata: data about data
  • One of the most important elements for the initial analysis of Big Data
  • In a set of photographs, for example, metadata could describe when and where the photos were taken.
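The difference between these categories shows up in how the data is read. The sketch below uses only Python's standard library, and every value in it (names, e-mail address, timestamp, location) is invented for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
table_text = "name,age,department\nAbebe,21,IT\nSara,22,CS\n"
rows = list(csv.DictReader(io.StringIO(table_text)))

# Semi-structured: no fixed table, but keys (tags) describe each value.
json_text = '{"name": "Abebe", "contacts": {"email": "abebe@example.com"}, "tags": ["student"]}'
document = json.loads(json_text)

# Unstructured: free text with no predefined organization.
note = "Met Abebe after class; he asked about tutorials."

# Metadata: data about data, here describing a photograph rather than being the photo.
photo_metadata = {"taken_at": "2020-03-01T10:15:00", "location": "Hawassa"}

print(rows[0]["department"])          # a column lookup works because the layout is fixed
print(document["contacts"]["email"])  # nested keys are followed one tag at a time
print(len(note.split()))              # free text only supports generic operations
print(photo_metadata["location"])
```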

2.4. Data Value Chain
• Data Acquisition: the process of gathering, filtering, and cleaning data before it is stored.
• Data Analysis: exploring, transforming, and modeling data, highlighting relevant data and extracting hidden information.
• Data Curation: management of data over its life cycle, ensuring it meets the quality requirements for its effective usage, through content creation, selection, classification, transformation, validation, and preservation activities.
• Data Storage:
  • RDBMS: used as the storage paradigm for a long time, but not scalable for big data.
  • NoSQL: designed with the scalability goal in mind.
• Data Usage: enhancing business decision making.
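The stages of the value chain can be pictured as a small pipeline. The following is a toy sketch only: the sensor readings, the valid temperature range, and the alert limit are all assumptions made for the example, and a plain dictionary stands in for a real database.

```python
raw_readings = ["21.5", "", "22.1", "n/a", "35.0", "21.9"]  # hypothetical sensor feed


def acquire(raw):
    """Acquisition: gather the readings and drop anything that is not a number."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append(float(item))
        except ValueError:
            continue
    return cleaned


def analyze(values):
    """Analysis: summarize the data and highlight the most relevant reading."""
    return {"count": len(values), "max": max(values)}


def curate(values, low=0.0, high=30.0):
    """Curation: keep only readings that pass an assumed quality check."""
    return [v for v in values if low <= v <= high]


def store(values, database):
    """Storage: a dictionary stands in for an RDBMS or NoSQL store."""
    database["temperatures"] = list(values)


def use(summary, limit=30.0):
    """Usage: turn the analysis into a decision."""
    return "inspect the sensor" if summary["max"] > limit else "no action needed"


database = {}
cleaned = acquire(raw_readings)
summary = analyze(cleaned)
store(curate(cleaned), database)
decision = use(summary)
print(summary, database, decision)
```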

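2.5. Basic concepts of big data
• What is Big Data?
  • Data sets so large and complex that they are difficult to store and process using traditional tools on a single computer.
  • Characterized by 4 or 5 Vs: volume, velocity, variety, and veracity, with value often counted as the fifth.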

2.5.2. Clustered Computing and Hadoop Ecosystem
• Cluster computing:
  • Connected computers that work together and behave like a single machine.
  • Big data can’t be handled by individual computers.
  • Clusters better address the high storage and computational needs of big data.
• Advantages:
  • Resource pooling: combining storage space, CPU, and memory
  • High availability: fault tolerance
  • Easy scalability: add more machines to the pool

• Requires:
❑ managing cluster membership and coordinating resource sharing, and
❑ scheduling actual work on individual nodes.

• Cluster membership and resource allocation can be handled by software like Hadoop’s YARN (“Yet Another Resource Negotiator”).
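To make membership, resource pooling, and scheduling concrete, here is a toy model in Python. It is not YARN and does not use any Hadoop API; the `Node` and `Cluster` classes, the node names, and the "most free memory first" rule are all assumptions chosen to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_memory_gb: int
    tasks: list = field(default_factory=list)


class Cluster:
    """Toy resource manager: tracks members and places tasks on nodes."""

    def __init__(self):
        self.nodes = {}

    def join(self, node):
        # Cluster membership: scaling out is just adding a machine to the pool.
        self.nodes[node.name] = node

    def leave(self, name):
        self.nodes.pop(name, None)

    def total_free_memory(self):
        # Resource pooling: the cluster's capacity is the sum of its members'.
        return sum(node.free_memory_gb for node in self.nodes.values())

    def schedule(self, task, memory_gb):
        # Scheduling: pick the node with the most free memory that fits the task.
        candidates = [n for n in self.nodes.values() if n.free_memory_gb >= memory_gb]
        if not candidates:
            raise RuntimeError(f"no node can run {task}")
        target = max(candidates, key=lambda n: n.free_memory_gb)
        target.free_memory_gb -= memory_gb
        target.tasks.append(task)
        return target.name


cluster = Cluster()
cluster.join(Node("node-1", 8))
cluster.join(Node("node-2", 16))
first = cluster.schedule("sort-logs", 10)
cluster.join(Node("node-3", 32))  # easy scalability: one more machine in the pool
second = cluster.schedule("build-index", 20)
print(first, second, cluster.total_free_memory())
```

A real resource manager also has to detect failed nodes and reschedule their work, which this sketch leaves out.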
2.5.2.2. Hadoop and its Ecosystem
• Open-source software for distributed storage and processing of large datasets across clusters of computers.
• Four key characteristics of Hadoop are:
  • Economical: ordinary computers can be used for data processing.
  • Reliable: stores copies of the data on different machines and is resistant to hardware failure.
  • Scalable: it is easily scalable, both horizontally and vertically.
  • Flexible: it can store both structured and unstructured data.
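The reliability point can be illustrated with a toy replica placement in Python. HDFS keeps three copies of each block by default, but the round-robin rule below is a simplification invented for this sketch and is not the placement policy HDFS actually uses; the block and node names are hypothetical.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for index, block in enumerate(blocks):
        first = index % len(nodes)
        # Consecutive positions modulo len(nodes) are distinct while replication <= len(nodes).
        placement[block] = [nodes[(first + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """A block stays readable while at least one of its holders is still up."""
    return [
        block
        for block, holders in placement.items()
        if any(holder not in failed_nodes for holder in holders)
    ]


nodes = ["node-1", "node-2", "node-3", "node-4"]
placement = place_replicas(["block-a", "block-b", "block-c"], nodes)
survivors = readable_blocks(placement, failed_nodes={"node-1", "node-2"})
print(placement)
print(survivors)
```

Because each block lives on three different machines, losing two of the four nodes in this example still leaves every block readable.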

2.5.3. Big Data Life Cycle with Hadoop
1. Ingesting data into the system
• Data is transferred to Hadoop from various sources.
• Sqoop transfers data from RDBMS to HDFS, whereas Flume transfers event data.
2. Processing the data in storage
• Spark and MapReduce perform data processing.
3. Computing and analyzing data
• Pig, Hive, and Impala
4. Visualizing the results
• Analyzed data can be accessed by users.
• Hue and Cloudera Search
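The processing stage mentions MapReduce. The idea behind it can be shown with a word count written in plain Python; this runs on one machine and does not use Hadoop, so it only mirrors the map, shuffle, and reduce steps that a cluster would distribute across nodes. The input lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs clusters", "clusters process big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

On a cluster, each node would run the map step on its own share of the lines, and the grouped keys would be split among nodes for the reduce step.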
