
Course Title: Introduction to Emerging Technologies

Credit Hour: 3 hrs.


Course Code: EmTe1012.
ECTS: 5 [3 Lecture hours and 0 Lab hours]
Lecture Schedule: Every ____________

Bedasa Wayessa
[email protected]

EmTe1012 1
Classroom Rules
• Latecomers will be tolerated only for the first 5 minutes of every class
• Talk to me, not to each other
• Do not sleep
• Do not use phones
• Failure to obey the classroom rules → a 2 to 3 class ban

EmTe1012 2
Assignment Submission
• Guidelines for submission will be provided with every assignment
• Re-grade requests will ONLY be entertained within one week after the
assignments have been handed back to students or after the assignment
due date
• IMPORTANT: Late submissions are allowed ONLY until 1 day after the
deadline, with a 10% mark deduction.
• IMPORTANT: Late + Copy = ZERO marks

EmTe1012 3
QUIZZES
• Quizzes will NOT be announced
• Re-grade requests will only be entertained within one week after the
marked quizzes have been handed back to students [with a tangible and
acceptable reason only]

EmTe1012 4
Chapter 2

Introduction to Data Science

EmTe1012 5
Outlines
• Introduction to Data Science
– Overview of Data Science
• Definition of data and information
• Data types and representation
– Data Value Chain
• Data Acquisition
• Data Analysis
• Data Curation
• Data Storage
• Data Usage
– Basic concepts of Big data

EmTe1012 6
Objectives
• Describe what data science is and the role of data scientists.
• Differentiate data and information.
• Describe the data processing life cycle.
• Understand different data types from diverse perspectives.
• Describe the data value chain in the emerging era of big data.
• Understand the basics of Big Data.
• Describe the purpose of the Hadoop ecosystem components.

EmTe1012 7
Activity
• What is data science?
• Can you describe the role of data in emerging technology?
• What are data and information?
• What is big data?

EmTe1012 8
Definition of Data Science
• Data science is a multi-disciplinary field that uses
– Scientific methods,
– Processes,
– Algorithms, and
– Systems to extract knowledge and insights from
– Structured,
– Semi-structured, and
– Unstructured data.
• Data science is much more than simply analyzing data.
• It offers a range of roles and requires a range of skills.

EmTe1012 9
Data and Information
• Data can be defined as a representation of
– facts,
– concepts, or
– instructions in a formalized manner which should be suitable for
communication, interpretation, or processing by humans or
electronic machines.
• It can be described as unprocessed facts and figures.
• It is represented with the help of characters such as
– letters of the alphabet (A-Z, a-z),
– digits (0-9), or
– special characters
EmTe1012 10
Data and Information
• Information
– is processed data on which decisions and actions are based.
– It is data that has been processed into a form that is meaningful to
the recipient.
• Information is interpreted data; it is created from
– organized,
– structured, and
– processed data in a particular context.

EmTe1012 11
Data Processing cycle
• Data processing is the re-structuring or re-ordering of data by
people or machines to increase its usefulness and add value for a
particular purpose.
• Data processing consists of three basic steps:
– input,
– processing, and
– output.
• These three steps constitute the data processing cycle.

EmTe1012 12
Data Processing cycle
• Input
– The input data is prepared in some convenient form for processing.
– The form will depend on the processing machine.
– For example, when electronic computers are used, the input data
can be recorded on any one of several types of storage media,
such as a hard disk, CD, flash disk, and so on.
• Processing
– The input data is changed to produce data in a more useful form.
– For example, interest can be calculated on deposits at a bank, or a
summary of the month's sales can be calculated from the sales
orders.

EmTe1012 13
Data Processing cycle
• Output
– The result of the preceding processing step is collected.
– The particular form of the output data depends on the use of the
data.
– For example, the output data may be the payroll for employees.
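
To make the three steps concrete, the following is a minimal Python sketch of the cycle, reusing the interest-on-deposit example from the slide; the deposit amounts and the 7% rate are made-up illustration values.

def read_input():
    # Input: data prepared in a convenient form (here, a hard-coded list;
    # in practice it might be read from a hard disk, CD, or flash disk).
    return [1000.00, 2500.50, 400.00]      # deposit balances

def process(deposits, annual_rate=0.07):
    # Processing: transform the input into a more useful form
    # (the interest earned on each deposit).
    return [round(amount * annual_rate, 2) for amount in deposits]

def write_output(interest):
    # Output: collect and present the result of the processing step.
    for i, value in enumerate(interest, start=1):
        print(f"Account {i}: interest = {value}")

write_output(process(read_input()))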

EmTe1012 14
Data Types and Their Representation
• Data types can be described from diverse perspectives.
1. Data types from a computer programming perspective
– In computer programming, a data type is simply an attribute of data
that tells the compiler or interpreter how the programmer intends
to use the data.
– Data types help ensure the correct use and interpretation of data.
– They prevent errors and improve code readability.
– Different data types have different properties and operations.
• Common data types include:
– Integers (int): used to store whole numbers, mathematically
known as integers

EmTe1012 15
Data Types and Their Representation
• Data types can be described from diverse perspectives.
1. Data types from a computer programming perspective
o Common data types include:
– Booleans (bool): used to represent values restricted to one of two
options: true or false
– Characters (char): used to store a single character
– Floating-point numbers (float): used to store real numbers
– Alphanumeric strings (string): used to store a combination of
characters and numbers

EmTe1012 16
Data Types and Their Representation
• A data type constrains the values that an expression, such as a variable or
a function, might take.
• The data type defines:
– the operations that can be done on the data,
– the meaning of the data, and
– the way values of that type can be stored.
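
As a quick illustration of these common data types and the operations they permit, here is a small Python sketch; note that Python has no separate char type, so a one-character string stands in for it, and the student ID value is invented.

age = 42               # integer (int): whole numbers
is_valid = True        # boolean (bool): restricted to True or False
grade = "A"            # character: Python uses a 1-character string for this
temperature = 36.6     # floating-point (float): real numbers
student_id = "ETS-0913-12"   # alphanumeric string (str), invented example

# The type determines which operations are allowed:
print(age + 1)               # arithmetic is defined for int
print(student_id.upper())    # string operations are defined for str
# print(age + student_id)    # would raise TypeError: int and str cannot be added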

EmTe1012 17
Data Types and Their Representation
2. Data types from a data analytics perspective
• From a data analytics point of view, there are three common types of
data:
– Structured: relational databases, spreadsheets, CSV files.
– Semi-structured: XML, JSON, log files.
– Unstructured: text documents, images, videos, audio
recordings.

EmTe1012 18
Data Types and Their Representation
• Structured data
– Is data that adheres to a pre-defined data model and is therefore
straightforward to analyze.
– Structured data conforms to a tabular format with a relationship
between the different rows and columns.
o Common examples of structured data are Excel files or SQL
databases.
– Each of these has structured rows and columns that can be sorted.
– Easy to store, query, and analyze.
– Well understood and supported by many tools and technologies.
– Provides a clear and organized view of the data.
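
A minimal sketch of structured data, using Python's built-in sqlite3 module; the table name and rows are invented, but they show how a fixed schema makes sorting and querying straightforward.

import sqlite3

# A small table with a pre-defined schema: every row has the same columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, quantity INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("pen", 10, 0.50), ("notebook", 3, 2.75), ("bag", 1, 15.00)],
)

# Because every row follows the same schema, sorting and analysis are easy:
for row in conn.execute("SELECT item, quantity * price FROM sales ORDER BY item"):
    print(row)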

EmTe1012 19
Data Types and Their Representation
• Semi-structured data
– is a form of structured data that does not conform to the formal
structure of data models associated with relational databases or
other forms of data tables, but contains tags or other markers to
separate semantic elements and enforce hierarchies of records and
fields within the data.
– More flexible than structured data.
– Therefore, it is also known as a self-describing structure.
– Can accommodate a wide variety of data formats and structures.
– Examples: JSON (JavaScript Object Notation), XML (Extensible
Markup Language), HTML (HyperText Markup Language)
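
A small Python sketch of semi-structured data: a JSON document whose tags (keys) describe the data and whose records do not all share the same fields; the two records are invented examples.

import json

raw = """
[
  {"name": "Abebe", "age": 21, "courses": ["EmTe1012", "Math"]},
  {"name": "Sara",  "email": "sara@example.com"}
]
"""

for record in json.loads(raw):
    # Fields vary per record (self-describing), so access them defensively.
    print(record["name"], record.get("age", "age not given"))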

EmTe1012 20
Data Types and Their Representation
• Unstructured data
– Is information that either does not have a predefined data model
or is not organized in a pre-defined manner.
– Unstructured information is typically text-heavy but may contain data
such as dates, numbers, and facts as well.
– This results in irregularities and ambiguities that make it difficult
to understand using traditional programs, as compared to data stored
in structured databases.
– Contains valuable information that is not captured by structured data.
– Can provide insights into customer behavior, sentiment, and trends.
– Examples of unstructured data include:
o text documents, audio and video files, or NoSQL databases.
EmTe1012 21
Data Types and Their Representation
• Metadata – data about data
– Metadata is data that provides additional information about other data.
– It is essentially "data about data."
– In a set of photographs, for example, metadata could describe
when and where the photos were taken.
o The metadata then provides fields for dates and locations
which, by themselves, can be considered structured data.
– It is one of the most important elements for big data analysis and
big data solutions. Metadata is used:
o to describe the characteristics of a dataset,
o to provide context and meaning to the data, and
o to make data more discoverable and reusable.
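
A minimal Python sketch of the photo-metadata example above; the file name and field values are invented, but they show how metadata attaches structured fields (dates, locations) to otherwise unstructured content.

# A photo (unstructured binary data) paired with structured "data about data".
photo_metadata = {
    "file":        "IMG_0042.jpg",
    "captured_at": "2023-11-05T14:32:00",        # when the photo was taken
    "location":    {"lat": 9.03, "lon": 38.74},  # where it was taken
    "camera":      "Phone camera, 12 MP",
}

# Because the metadata fields are structured, they are easy to search:
print(photo_metadata["captured_at"], photo_metadata["location"])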
EmTe1012 22
Activity
• Discuss data types from programming and analytics perspectives.
• Compare metadata with structured, unstructured, and semi-structured data.
• Give at least one example each of structured, unstructured, and semi-
structured data types.

EmTe1012 23
Data Value Chain
• The Big Data Value Chain describes the key steps involved in
transforming raw data into valuable insights and actions.
• It consists of the following high-level activities:
– Acquisition, Analysis, Curation, Storage and Usage.

EmTe1012 24
Data Value Chain
1. Data Acquisition
• It is the process of gathering, filtering, and cleaning data before the
data is put into a data warehouse.
• It is one of the major big data challenges in terms of infrastructure
requirements.
• The infrastructure required to support the acquisition of big data must:
– deliver low, predictable latency both in capturing data and in
executing queries;
– be able to handle very high transaction volumes, often in a
distributed environment; and
– support flexible and dynamic data structures.
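
A rough Python sketch of the acquisition step: gathering, filtering, and cleaning records before they are loaded into storage; the record format and field names are invented for illustration.

raw_records = [
    {"sensor": "s1", "temp": " 23.5 "},
    {"sensor": "s2", "temp": ""},          # missing reading -> filtered out
    {"sensor": "s1", "temp": "24.1"},
]

def acquire(records):
    for r in records:
        value = r["temp"].strip()
        if not value:          # filtering: drop incomplete records
            continue
        # cleaning: normalize the value into a proper numeric type
        yield {"sensor": r["sensor"], "temp": float(value)}

cleaned = list(acquire(raw_records))
print(cleaned)   # ready to be loaded into a data warehouse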

EmTe1012 25
Data Value Chain
2. Data Analysis
• It is concerned with making the raw data acquired amenable to use
in decision-making as well as domain-specific usage.
• Data analysis involves: Exploring, transforming, and modeling data
with the goal of:
• highlighting relevant data,
• synthesizing and
• extracting useful hidden information with high potential from a
business point of view.
• Related areas include data mining, business intelligence, and machine
learning.
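
A minimal Python sketch of the analysis step: exploring and summarizing acquired data to surface useful information; the monthly sales figures are invented.

import statistics

monthly_sales = [1200, 1350, 990, 1875, 1420, 1610]

# Simple exploration/summarization to highlight relevant information:
print("mean:  ", statistics.mean(monthly_sales))
print("median:", statistics.median(monthly_sales))
print("best month:", max(range(len(monthly_sales)), key=monthly_sales.__getitem__) + 1)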

EmTe1012 26
Data Value Chain
3. Data Curation
• It is the active management of data over its life cycle to ensure it meets
the necessary data quality requirements for its effective usage.
• Data curation processes can be categorized into different activities such as:
• content creation, selection, classification, transformation,
validation, and preservation.
• Data curation is performed by expert curators who are responsible for
improving the accessibility and quality of data.
• Data curators are responsible for ensuring that data are trustworthy,
discoverable, accessible, reusable and fit their purpose.

EmTe1012 27
Data Value Chain
4. Data Storage
• Data storage refers to the persistence and management of data in a scalable
way that satisfies the needs of applications that require fast access to the data.
• Traditional data storage: RDBMSs have been the main, and almost
only, solution to the storage paradigm for nearly 40 years.
• However, RDBMSs with their ACID (Atomicity, Consistency, Isolation, and
Durability) properties, which guarantee database transactions,
– lack flexibility with regard to schema changes, and
– their performance and fault tolerance suffer when data volumes and
complexity grow, making them unsuitable for big data scenarios.
• NoSQL technologies have been designed with the scalability goal in mind
and present a wide range of solutions based on alternative data models.
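
A rough sketch of the schema flexibility that NoSQL (document-style) storage offers compared with a fixed relational schema; plain Python dicts stand in for documents here, whereas a real system would be something like MongoDB or HBase.

# Each "document" is a dict; records need not share the same fields,
# so the schema can evolve without migrating existing data.
document_store = []
document_store.append({"user": "u1", "name": "Abebe"})
document_store.append({"user": "u2", "name": "Sara", "preferences": {"lang": "am"}})

# A relational table, by contrast, would need a schema change (ALTER TABLE)
# before the new "preferences" field could be stored.
for doc in document_store:
    print(doc)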
EmTe1012 28
Data Value Chain
5. Data Usage
• It covers the data-driven business activities that need
– access to data, its analysis, and the tools needed to integrate the
data analysis within the business activity.
• Data usage in business decision-making can enhance competitiveness
through:
– the reduction of costs,
– increased added value,
– Enhanced decision-making or
– any other parameter that can be measured against existing
performance criteria.

EmTe1012 29
Basic concepts of big data
• Big data is a collection of data sets so large and complex that it
becomes difficult to process them using on-hand database management
tools or traditional data processing applications.
– A large dataset means a dataset too large to reasonably process or store
with traditional tools or on a single computer.
• Big data is characterized by 3Vs and more:
– Volume: large amounts of data (zettabytes/massive datasets)
– Velocity: data is live streaming or in motion
– Variety: data comes in many different forms from diverse sources
– Veracity: can we trust the data? How accurate is it?

EmTe1012 30
Basic concepts of big data

Figure: Characteristics of big data

• Variability: Big data can be highly dynamic and constantly changing.


• Complexity: Big data analysis often requires specialized tools and techniques.
• Privacy and security: Ensuring the privacy and security of big data is crucial.
EmTe1012 31
Clustered Computing and Hadoop Ecosystem
• Individual computers are often inadequate for handling big data at
most stages.
• Challenges of Individual Computers for Big Data:
– Limited storage capacity
– Insufficient processing power
– Scalability limitations
– Single point of failure
• Solutions that overcome the limits of individual computers:
– Distributed computing
– Cloud computing
– Clustered Computing

EmTe1012 32
Clustered Computing and Hadoop Ecosystem
• Clustered Computing
• To better address the high storage and computational needs of big data,
computer clusters are a better fit.
• Big data clustering software combines the resources of many smaller
machines, seeking to provide a number of benefits.
• Benefits of Clustered Computing for Big Data:
– Resource Pooling: Combining the available storage space to hold
data is a clear benefit, but CPU and memory pooling are also
extremely important.
• Processing large datasets requires large amounts of all three of
these resources.

EmTe1012 33
Clustered Computing and Hadoop Ecosystem
• Clustered Computing
– High Availability: Clusters can provide varying levels of fault
tolerance and availability guarantees to prevent hardware or software
failures from affecting access to data and processing.
• This becomes increasingly important as we continue to
emphasize the importance of real-time analytics.
– Easy Scalability: Clusters make it easy to scale horizontally by
adding additional machines to the group.
• This means the system can react to changes in resource
requirements without expanding the physical resources on a
machine.

EmTe1012 34
Clustered Computing and Hadoop Ecosystem
• Using clusters requires a solution for managing:
– cluster membership,
– coordinating resource sharing and
– scheduling actual work on individual nodes.
• Cluster Management Software:
• Cluster membership and resource allocation can be handled by software like
– Hadoop YARN (Yet Another Resource Negotiator): A popular
resource management framework for Hadoop clusters.
• Clustered computing provides a powerful and flexible solution for addressing
the storage, computational, and scalability challenges of big data.

EmTe1012 35
Activity
• List and discuss the characteristics of big data.
• Describe the big data life cycle.
• Which step do you think is most useful, and why?
• List and describe each technology or tool used in the big data life cycle.
• Discuss the three methods of computing over a large dataset.

EmTe1012 36
Hadoop and its Ecosystem
• Hadoop is an open-source framework designed for distributed
processing of large datasets across clusters of computers.
• The Hadoop Ecosystem is a suite of tools that provides various
services to solve big data problems.
• It is a framework that allows for the distributed processing of large datasets
across clusters of computers using simple programming models.
• It is inspired by a technical document published by Google.
• The Hadoop Ecosystem refers to a collection of open-source
software tools and technologies that work together to provide a
comprehensive solution for big data processing and analysis.

EmTe1012 37
Hadoop and its Ecosystem
• The four key characteristics of Hadoop are:
– Economical: Its systems are highly economical as ordinary computers
can be used for data processing. [Cost-effective, Open-source]
– Reliable: It is reliable as it stores copies of the data on different
machines and is resistant to hardware failure. [Fault tolerance, Data
redundancy]
– Scalable: It is easily scalable, both horizontally and vertically.
• Horizontally, a few extra nodes help in scaling up the framework;
vertically, more resources (CPU, memory) can be added to existing nodes.
– Flexible: It is flexible and you can store as much structured and
unstructured data as you need to and decide to use them later.
• Data format and Programming language agnostic
EmTe1012 38
Hadoop and its Ecosystem
• There are four major elements of Hadoop i.e.
– HDFS, MapReduce, YARN, and Hadoop Common.
• Hadoop has an ecosystem that has evolved from its four core
components:
– data management,
– access,
– processing, and
– storage.
– It is continuously growing to meet the needs of Big Data.

EmTe1012 39
Hadoop and its Ecosystem
• It comprises the following components and many others:
– HDFS: Hadoop Distributed File System
– YARN: Yet Another Resource Negotiator
– MapReduce: Programming-based data processing
– Spark: In-memory data processing
– Pig, Hive: Query-based processing of data services
– HBase: NoSQL database
– Mahout, Spark MLlib: Machine learning algorithm libraries
– Solr, Lucene: Searching and indexing
– ZooKeeper: Managing the cluster
– Oozie: Job scheduling
EmTe1012 40
Hadoop and its Ecosystem

Figure: Hadoop Ecosystem
EmTe1012 41
Four Major Elements of Hadoop
• HDFS: is responsible for storing large data sets of structured or
unstructured data across various nodes, and thereby maintains the
metadata in the form of log files.
– It breaks down large files into smaller blocks and distributes them
across a cluster of machines.
– HDFS consists of two core components:
the Name Node and the Data Node.
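
A toy Python sketch of the idea behind HDFS block storage: a file is split into blocks, the blocks are spread over data nodes, and a name node keeps the metadata about which block lives where; the block size and node names here are invented (real HDFS blocks default to 128 MB).

BLOCK_SIZE = 4  # bytes, tiny on purpose for the demo
data_nodes = {"node1": [], "node2": [], "node3": []}
name_node = {}   # metadata: file -> list of (block_id, node)

def put(filename, content):
    # Split the file into fixed-size blocks.
    blocks = [content[i:i + BLOCK_SIZE] for i in range(0, len(content), BLOCK_SIZE)]
    placement = []
    for i, block in enumerate(blocks):
        node = list(data_nodes)[i % len(data_nodes)]   # round-robin placement
        data_nodes[node].append((f"{filename}#{i}", block))
        placement.append((f"{filename}#{i}", node))
    name_node[filename] = placement   # the name node only keeps metadata

put("log.txt", "abcdefghij")
print(name_node["log.txt"])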
• YARN (Yet Another Resource Negotiator):
– YARN is the resource management layer of Hadoop that manages
resources and schedules tasks across the cluster.
– It allows different data processing engines (like MapReduce, Spark,
etc.) to run on the same cluster.
EmTe1012 42
Four Major Elements of Hadoop
 MapReduce:
– A programming model and processing engine used to process and
generate large datasets in parallel across a distributed cluster.
– It consists of two main functions: map, which processes data into key-
value pairs, and reduce, which performs summarization or
aggregation on the output of the map phase.
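
A minimal, single-machine Python sketch of the MapReduce model, the classic word count; real Hadoop MapReduce distributes the map and reduce phases across a cluster, while this only shows the two functions conceptually.

from collections import defaultdict

documents = ["big data needs big clusters", "data science uses big data"]

# Map phase: emit (key, value) pairs -- here (word, 1).
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group values by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: aggregate the values for each key.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)   # {'big': 3, 'data': 3, ...}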
• Hadoop Common: refers to the collection of common utilities and
libraries that support other Hadoop modules.
– It is an essential part or module of the Apache Hadoop Framework,
along with the Hadoop Distributed File System (HDFS), Hadoop
YARN and Hadoop MapReduce.

EmTe1012 43
Big Data Life Cycle with Hadoop
• Ingesting data into the system
– The data is ingested or transferred into Hadoop from various sources
such as relational databases, other systems, or local files.
– Sqoop transfers data from an RDBMS to HDFS, whereas Flume
transfers event data.
• Processing the data in storage
– In this stage, the data is stored and processed.
– The data is stored in the distributed file system, HDFS, and in the
NoSQL distributed database, HBase. Spark and MapReduce perform data
processing.

EmTe1012 44
Big Data Life Cycle with Hadoop
• Computing and analyzing data
– Here, the data is analyzed by processing frameworks such as Pig,
Hive, and Impala.
– Pig converts the data using map and reduce operations and then analyzes it.
– Hive is also based on the map and reduce programming model and is most
suitable for structured data.
• Visualizing the results
– The fourth stage is Access, which is performed by tools such as Hue
and Cloudera Search.
– In this stage, the analyzed data can be accessed by users.

EmTe1012 45
Chapter Two Review Questions

Reading Assignment

EmTe1012 46
End of Chapter 2

Next:

EmTe1012 47
