BDA Module1

The document is a presentation on the topic of Big Data and Analytics. It includes: 1. An outline of the course covering vision/mission, outcomes, syllabus, and case studies. 2. Details on the course outcomes and syllabus including 6 modules, credits, and 5 textbooks. 3. Slides from Module 1 on introducing big data analytics, including the need for big data, characteristics, types, and techniques for handling large volumes of data.


B.N.M. Institute of Technology
Department of Computer Science & Engineering
Strive for Excellence

Semester: 7th Semester
Course Name: Big Data and Analytics
Course Code: 18CS72
Module No: 1
Presenter: A. K. Sreeja

Course Outline

 Stating Vision, Mission, Course Outcomes


 Course Syllabus & Text Books

1. Big Data
2. Scalability & Parallel Processing
3. Designing Data Architecture
4. Data Sources, Quality, Pre-Processing and Storing
5. Data Storage & Analysis
6. Big Data Analytics Applications & Case Studies


Course Outcomes:
CO1: Understand fundamentals of Big Data analytics.
CO2: Investigate the Hadoop framework and the Hadoop Distributed File System.
CO3: Illustrate the concepts of NoSQL using MongoDB and Cassandra for Big Data.
CO4: Demonstrate the MapReduce programming model to process big data, along with Hadoop tools.
CO5: Use Machine Learning algorithms for real-world big data.
CO6: Analyze web contents and Social Networks to derive useful insights.

Course Syllabus
CREDITS – 04



Course Syllabus
Text Books:
1. Raj Kamal and Preeti Saxena, "Big Data Analytics: Introduction to Hadoop, Spark, and Machine-Learning", McGraw Hill Education, 2018. ISBN: 9789353164966, 9353164966
2. Douglas Eadline, "Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem", 1st Edition, Pearson Education, 2016. ISBN-13: 978-9332570351
Reference Books:
1. Tom White, "Hadoop: The Definitive Guide", 4th Edition, O'Reilly Media, 2015. ISBN-13: 978-9352130672
2. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, "Professional Hadoop Solutions", 1st Edition, Wrox Press, 2014. ISBN-13: 978-8126551071
3. Eric Sammer, "Hadoop Operations: A Guide for Developers and Administrators", 1st Edition, O'Reilly Media, 2012. ISBN-13: 978-9350239261
4. Arshdeep Bahga, Vijay Madisetti, "Big Data Analytics: A Hands-On Approach", 1st Edition, VPT Publications, 2018. ISBN-13: 978-0996025577

Module 1

Introduction to Big Data


Analytics


Need of Big Data


1. Big Data
• Definition of Data
• Definition of Web data
• Classification of Data: Structured, Semi-structured and Unstructured
• Definition of Big Data
 Big Data is a high-volume, high-velocity and/or high-variety information asset that requires new forms of processing for enhanced decision making, insight discovery and process optimization.
 A collection of data sets so large or complex that traditional data processing applications are inadequate.


Big Data
Big Data Characteristics

The characteristics of Big Data, called the 3Vs (or 4Vs, when Veracity is added), are:
 Volume
 Velocity
 Variety
 Veracity


Big Data
Big Data Types

1. Social networks and web data
2. Transactions data and Business Processes (BPs) data
3. Customer master data
4. Machine-generated data
5. Human-generated data


Big Data
Big Data Classification


Big Data
Big Data Handling Techniques

• Storage of huge data volumes, data distribution, high-speed networks and high-performance computing.
• Open-source tools which are scalable and provide a virtualized environment.
• Data management using NoSQL, document databases, graph databases and other forms of databases as per the needs of the application.
• Data mining and analytics, data reporting, and machine-learning Big Data tools.


2. Scalability and Parallel Processing


Analytics Scalability to Big Data
• Scalability enables an increase or decrease in the capacity of data storage, processing and analytics.
• Scalability is the capability of a system to handle the workload as per the
magnitude of the work.
• Vertical scalability means scaling up the given system’s resources and
increasing the system’s analytics, reporting and visualization capabilities.
• Scaling up means designing the algorithm according to the architecture that
uses resources efficiently.
• Horizontal scalability means increasing the number of systems working in
coherence and scaling out the workload.
• Scaling out means using more resources and distributing the processing and
storage tasks in parallel.

Scalability and Parallel Processing


Massively Parallel Processing Platforms
Parallelization of tasks can be done at several levels:
• distributing separate tasks onto separate threads on
the same CPU,
• distributing separate tasks onto separate CPUs on
the same computer and
• distributing separate tasks onto separate
computers.
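The first two levels can be sketched with Python's concurrent.futures, which offers thread pools and process pools behind one interface. A minimal sketch; the task and the data are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # One independent task: sum one chunk of the data.
    return sum(chunk)

data = list(range(1000))                                      # 0 + 1 + ... + 999
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]  # 4 separate tasks

# Level 1: separate tasks on separate threads of the same CPU/process.
# Swapping in ProcessPoolExecutor distributes the same tasks onto
# separate CPUs; the third level (separate computers) needs a cluster
# framework such as Hadoop or Spark.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
```

The task code never changes; only the executor chooses the level of parallelism.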
 Distributed Computing Model
A distributed computing model uses cloud, grid or
clusters, which process and analyze big and large
datasets on distributed computing nodes connected by
high-speed networks.

Scalability and Parallel Processing


Cloud Computing
• “Cloud computing is a type of Internet-based
computing that provides shared processing
resources and data to the computers and other
devices on demand.”
• One of the best approaches for data processing is to perform parallel and distributed computing in a cloud-computing environment.
• Cloud resources can be Amazon Web Services (AWS) Elastic Compute Cloud (EC2), Microsoft Azure or Apache CloudStack.


Scalability and Parallel Processing


Cloud Computing
Features of Cloud Computing
• on-demand service,
• resource pooling,
• scalability,
• broad network access.
Cloud services can be accessed from anywhere and at any time through the Internet.
Cloud Services
There are three types of cloud services:
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)

Scalability and Parallel Processing


Cloud Computing
• Infrastructure as a Service (IaaS):
 Providing access to resources, such as hard disks, network connections, database storage, data centers and virtual server space, is Infrastructure as a Service (IaaS).
 Some examples are Tata Communications, and Amazon data centers and virtual servers.


Scalability and Parallel Processing


Cloud Computing
• Platform as a Service (PaaS):
 Providing the runtime environment that allows developers to build applications and services is cloud Platform as a Service.
 Examples are Hadoop Cloud Service and Oracle Big Data Cloud Service.


Scalability and Parallel Processing


Cloud Computing
• Software as a Service (SaaS):
 Providing software applications as a service to end-users is known as Software as a Service.
 Software applications are hosted by a service provider and made available to customers over the Internet.
 Some examples are Google Cloud SQL, IBM Big SQL and Oracle Big Data SQL.


Scalability and Parallel Processing


Grid and Cluster Computing

• Grid Computing:
 Grid Computing refers to distributed computing, in which a group of
computers from several locations are connected with each other to achieve a
common task.
 A group of computers that may be spread over remote locations comprises a grid.
 Grid computing, similar to cloud computing, is scalable.
 Like grid computing, cloud computing depends on sharing of resources (for example, networks, servers, storage, applications and services) to attain coordination and coherence among resources.


Scalability and Parallel Processing


Grid and Cluster Computing
• Cluster Computing:
 A cluster is a group of computers connected by a network. The group works
together to accomplish the same task. Clusters are used mainly for load
balancing.
 They shift processes between nodes to keep an even load on the group of
connected computers.
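The load-balancing idea can be sketched in a few lines; the node names and task costs below are invented, and each incoming task simply goes to the currently least-loaded node:

```python
# Toy cluster load balancer: shift each task to the least-loaded node
# to keep an even load across the group. Node names and costs are invented.
nodes = {"node-a": 0, "node-b": 0, "node-c": 0}

def assign(task_cost):
    target = min(nodes, key=nodes.get)  # pick the least-loaded node
    nodes[target] += task_cost          # account for the new work
    return target

placements = [assign(cost) for cost in [5, 3, 4, 2, 6]]
```

Real cluster managers also migrate running work and handle node failures; this sketch only shows the placement decision.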


Scalability and Parallel Processing


Volunteer Computing

• Volunteer computing is a distributed computing paradigm which uses the computing resources of volunteers.
• Volunteers are organizations or members who own personal computers.
• Example projects are science-related projects executed by universities or academia in general.


3. Designing Data Architecture


Data Architecture Design

Big Data architecture is the logical and/or physical layout/structure of how Big Data will be stored, accessed and managed within a Big Data or IT environment.
• Ingestion is the process of obtaining and importing data for immediate use or transfer. Ingestion may be in batches or in real time, using pre-processing or semantics.


Designing Data Architecture


Data Architecture Design


Designing Data Architecture


Managing Data for Analysis

Data management means enabling, controlling, protecting, delivering and enhancing the value of data and information assets.
The data management function includes:
1. Data asset creation, maintenance and protection.
2. Data governance, which includes establishing the processes for ensuring the availability, usability, integrity, security and high quality of data.
3. Data architecture creation, modelling and analysis.
4. Database maintenance, administration and management systems, e.g., RDBMS, NoSQL.
5. Managing data security, data access control, deletion, privacy and security.


Designing Data Architecture


Managing Data for Analysis

6. Managing data quality.
7. Data collection using the ETL (extract, transform, load) process.
8. Managing documents, records and contents.
9. Creation of reference and master data, and data control and supervision.
10. Data and application integration.
11. Integrated data management, fast access and analysis, automation and simplification of operations on the data.
12. Data warehouse management.
13. Maintenance of business intelligence.
14. Data mining and analytics algorithms.


4. Data Sources, Quality, Pre-Processing and Storing


Data Sources
• Sources can be external or internal.
• Structured Data Sources
 A data source name implies a defined name, which a process uses to identify the source.
 For example, a name which identifies the data stored for student grades during processing: the data source name could be StudentName_Data_Grades.
 A data dictionary enables references for accesses to data.
 The name of the dictionary can be UniversityStudents_DataPlusGrades.
• Unstructured Data Sources
• Data Sources: Sensors, Signals and GPS
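As a toy illustration of these names, a data dictionary can be modelled as a mapping from a data source name to references for accessing its data. The source name follows the slide's example; the fields and description are invented:

```python
# Toy data dictionary: references for accessing named data sources.
# The source name follows the slide's example; fields are illustrative.
data_dictionary = {
    "StudentName_Data_Grades": {
        "description": "Stored student grades used during processing",
        "fields": ["student_name", "course", "grade"],
        "type": "structured",
    }
}

def lookup(source_name):
    # A process identifies the source by its defined name.
    return data_dictionary[source_name]["fields"]
```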


Data Sources, Quality, Pre-Processing and Storing


Data Quality
High quality means data which enables all the required operations, analysis, decisions, planning and knowledge discovery to be done correctly.
Data quality has five R's:
• relevancy, recency, range, robustness and reliability.
Data Integrity
• Data integrity refers to the maintenance of
consistency and accuracy in data over its usable
life.
• Software that stores, processes or retrieves the data should maintain the integrity of the data.
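One common way such software maintains integrity (an illustrative sketch, not from the slides) is to store a checksum alongside the data and verify it on retrieval:

```python
import hashlib

def checksum(payload: bytes) -> str:
    # SHA-256 digest stored alongside the data when it is written.
    return hashlib.sha256(payload).hexdigest()

def is_intact(payload: bytes, stored_digest: str) -> bool:
    # On retrieval, recompute and compare: any corruption changes the digest.
    return checksum(payload) == stored_digest

record = b"grade=A,student=S1"   # invented record
digest = checksum(record)
```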

Data Sources, Quality, Pre-Processing and Storing


Data Quality

Factors Affecting Data Quality


• Data Noise
• Outlier
• Missing Value
• Duplicate value
Data Noise
• One of the factors affecting data quality is noise.
• Noise in data refers to data giving additional meaningless information besides the true (actual/required) information, e.g., WRMP.

Data Sources, Quality, Pre-Processing and Storing


Data Quality
Outlier
• An outlier refers to data that appears not to belong to the dataset, for example, data that is outside an expected range.
• Actual outliers need to be removed from the dataset, otherwise the result will be affected by a small or large amount.
Missing Value, Duplicate Value
• Another factor affecting data quality is missing values.
• A missing value implies data not appearing in the dataset.
• Another factor affecting data quality is duplicate values.
• A duplicate value implies the same data appearing two or more times in a dataset, e.g., ACVM.
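The factors above can be checked mechanically; a minimal sketch over invented sensor readings, with an assumed expected range:

```python
# Invented readings; None marks a missing value. Range is assumed.
readings = [21.5, 22.0, 22.0, None, 95.0, 21.8]
low, high = 0.0, 50.0                                      # expected range

present = [r for r in readings if r is not None]
missing = readings.count(None)                             # missing values
outliers = [r for r in present if not low <= r <= high]    # outside range
duplicates = {r for r in present if present.count(r) > 1}  # repeated values
```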

Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing
Data pre-processing is an important step at the
ingestion layer.
Data Cleaning
• Data cleaning refers to the process of
removing or correcting incomplete,
incorrect, inaccurate or irrelevant parts of
the data after detecting them.
• Data cleaning is done before mining of data.
• Data cleaning tools help in refining and
structuring data into usable data. Examples
of such tools are OpenRefine and
DataCleaner.
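A minimal cleaning pass in the spirit described, on invented records (tools such as OpenRefine automate this at scale):

```python
# Raw records with incomplete and incorrect parts (invented data).
raw = [
    {"name": "Asha", "grade": "82"},
    {"name": "", "grade": "77"},        # incomplete: no name -> remove
    {"name": "Ravi", "grade": "abc"},   # incorrect grade -> set to None
]

def clean(record):
    if not record["name"]:
        return None                     # drop incomplete records
    grade = record["grade"]
    record["grade"] = int(grade) if grade.isdigit() else None
    return record

cleaned = [c for r in raw if (c := clean(r)) is not None]
```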

Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing

Data Enrichment
• "Data enrichment refers to operations or processes which refine, enhance or improve the raw data."
Data Editing
• Data editing refers to the process of reviewing and adjusting the acquired datasets.
Data Reduction
• Data reduction enables the transformation of acquired information into an ordered, correct and simplified form.
Data Wrangling
• Data wrangling refers to the process of transforming and mapping the data.


Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing
Data Formats used during Pre-Processing
• Comma-Separated Values (CSV)
• JavaScript Object Notation (JSON)
• Tag-Length-Value (TLV)
• Key-value pairs
• Hash-key-value pairs
 CSV Format
• A comma-separated values (CSV) file is a plain-text file which stores table data consisting of numbers and text.
• An example is a table or Microsoft Excel file which needs conversion to CSV format: a student_record.xlsx converts to a student_record.csv file.
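The CSV round trip can be sketched with Python's built-in csv module (the student rows are invented):

```python
import csv
import io

rows = [["student", "grade"], ["Asha", 82], ["Ravi", 77]]  # invented table

# Write the table as plain-text comma-separated values.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()

# Read it back: every field returns as text (numbers and text alike).
parsed = list(csv.reader(io.StringIO(csv_text)))
```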

Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing
Data Format used during Pre-Processing
 CSV Format


Data Sources, Quality, Pre-Processing and Storing


Data Store Export to Cloud


Data Sources, Quality, Pre-Processing and Storing


Cloud Services


Data Sources, Quality, Pre-Processing and Storing

• Google Cloud Platform provides a cloud service called BigQuery.
• After pre-processing, data exports from a table or partition schema as JSON, CSV or Avro files.
• The BigQuery cloud service provides functions and roles such as bigquery.tables.create, bigquery.dataEditor, bigquery.dataOwner, bigquery.admin and bigquery.tables.updateData.
• Analytics uses Google Analytics 360. BigQuery exports data to a Google cloud store or cloud backup only.

Data Sources, Quality, Pre-Processing and Storing


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems

 Data Stores with Structured or Semi-Structured Data
• Traditional systems use structured or semi-structured data.
• The sources of structured data stores are traditional relational database management system (RDBMS) data, such as MySQL and DB2, enterprise servers and data warehouses.
• Examples of semi-structured data are XML and JSON documents and CSV files.


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems
• SQL
An RDBMS uses SQL (Structured Query Language), a language for viewing or changing (update, insert, append or delete) databases. SQL provides:
1. Create schema
2. Create catalog
3. DDL (Data Definition Language)
4. DML (Data Manipulation Language)
5. DCL (Data Control Language)
Distributed Database Management System: A distributed DBMS (DDBMS) is a collection of logically interrelated databases at multiple systems over a computer network.
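The DDL/DML distinction can be sketched with Python's built-in sqlite3 module (the table and rows are invented; a production RDBMS such as MySQL or DB2 accepts the same kind of SQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the schema.
con.execute("CREATE TABLE student (name TEXT, grade INTEGER)")

# DML: insert, update and query the data.
con.execute("INSERT INTO student VALUES ('Asha', 82)")
con.execute("UPDATE student SET grade = 85 WHERE name = 'Asha'")
grade = con.execute(
    "SELECT grade FROM student WHERE name = 'Asha'"
).fetchone()[0]
```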

5. Data Storage and Analysis


Data Storage and Management: Traditional Systems
• In-Memory Column-Format Data
 An in-memory columnar format allows faster data retrieval when only a few columns in a table need to be selected during query processing or aggregation.
 Online Analytical Processing (OLAP) in real-time transaction processing is fast when using in-memory column-format tables.
 In columnar in-memory data storage, the CPU accesses the values of a column in a single access to memory.
• In-Memory Row-Format Databases
 An in-memory row format allows much faster data processing during Online Transaction Processing (OLTP).
 Each row record has corresponding values in multiple columns, and the values store at consecutive memory addresses in row format.
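The two layouts can be sketched with plain Python containers (the table contents are invented): the row layout suits OLTP-style record access, while the column layout suits OLAP-style aggregation over one column.

```python
# The same invented table in two layouts.
row_store = [
    {"name": "Asha", "grade": 82, "city": "Mysuru"},
    {"name": "Ravi", "grade": 77, "city": "Hubli"},
]
column_store = {
    "name": ["Asha", "Ravi"],
    "grade": [82, 77],
    "city": ["Mysuru", "Hubli"],
}

# OLTP-style access: one whole record is natural in the row layout.
record = row_store[1]

# OLAP-style access: aggregating one column touches only that column's
# values in the columnar layout, which is why columnar in-memory
# tables speed up such queries.
avg_grade = sum(column_store["grade"]) / len(column_store["grade"])
```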


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems

Enterprise Data-Store Server and Data Warehouse
 Enterprise data servers use data from several distributed sources which store data using various technologies.
 All the data merges using an integration tool.
 Integration enables collective viewing of the datasets at the data warehouse.


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems


5. Data Storage and Analysis


Big Data Storage

Big Data NoSQL (Not Only SQL)
• NoSQL databases handle semi-structured data.
• Big Data stores use NoSQL.
• These stores do not integrate with applications using SQL alone.
• NoSQL is also used in cloud data stores.
• NoSQL stores may not use a fixed table schema.
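The absence of a fixed table schema can be sketched as a toy document collection (the documents are invented; production document stores are MongoDB, Cassandra, HBase, etc.): two documents in one collection may carry different fields.

```python
# Toy document collection: no fixed schema, unlike an RDBMS table.
collection = [
    {"_id": 1, "name": "Asha", "grades": [82, 91]},
    {"_id": 2, "name": "Ravi", "city": "Hubli"},  # different fields: allowed
]

def find(predicate):
    # Query without SQL: filter documents with a plain function.
    return [doc for doc in collection if predicate(doc)]

with_city = find(lambda d: "city" in d)
```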


5. Data Storage and Analysis


Big Data Storage
Coexistence of Big Data, NoSQL and Traditional Data Stores


5. Data Storage and Analysis


Big Data Platform
• A Big Data platform supports large datasets and volume of data.
• Managing Big Data requires large resources of MPPs, cloud, parallel processing and
specialized tools.
• Big data sources: Data storages, data warehouse, Oracle Big Data, MongoDB NoSQL,
Cassandra NoSQL.
Hadoop
• Big Data platform consists of Big Data storage(s), server(s) and data management and
business intelligence software. Storage can deploy Hadoop Distributed File System
(HDFS), NoSQL data stores, such as HBase, MongoDB, Cassandra. HDFS system is an
open source storage system.
• HDFS is a scaling, self-managing and self-healing file system.


5. Data Storage and Analysis


Big Data Platform


5. Data Storage and Analysis


Big Data Platform
Big Data Stack
• A stack consists of a set of software components
and data store units.
• Applications, machine learning algorithms,
analytics and visualization tools use Big Data
Stack (BDS) at a cloud service, such as Amazon
EC2, Azure or private cloud.
• The stack uses a cluster of high-performance machines.


5. Data Storage and Analysis


Big Data Platform
Big Data Stack


5. Data Storage and Analysis


Big Data Analytics
Big Data Analytics
• Data Analytics can be formally defined as
the statistical and mathematical data
analysis that clusters, segments, ranks and
predicts future possibilities.
• An important feature of data analytics is
its predictive, forecasting and prescriptive
capability.


5. Data Storage and Analysis


Big Data Analytics
Phases in Analytics
1. Descriptive analytics enables deriving additional value from visualizations and reports.
2. Predictive analytics is advanced analytics which enables extraction of new facts and knowledge, and then predicts/forecasts.
3. Prescriptive analytics enables derivation of additional value and better decisions on new option(s) to maximize profits.
4. Cognitive analytics enables derivation of additional value and better decisions. It refers to analysis of sentiments, emotions, gestures and facial expressions, similar to the analyses humans do.
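The first two phases can be illustrated numerically (the monthly sales figures are invented): descriptive analytics summarises what was observed, while predictive analytics fits a simple least-squares trend and forecasts the next value.

```python
sales = [10.0, 12.0, 13.0, 15.0]          # invented monthly figures

# Descriptive: summarise the observed data.
mean = sum(sales) / len(sales)

# Predictive: fit a least-squares straight line, then forecast month 5.
n = len(sales)
xs = range(n)
x_mean = sum(xs) / n
slope = sum((x - x_mean) * (y - mean) for x, y in zip(xs, sales)) / \
        sum((x - x_mean) ** 2 for x in xs)
intercept = mean - slope * x_mean
forecast = slope * n + intercept          # predicted value for the next month
```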

5. Data Storage and Analysis


Big Data Analytics
Phases in Analytics


5. Data Storage and Analysis


Big Data Analytics
Berkeley Data Analytics Stack(BDAS)
Berkeley Data Analytics Stack (BDAS) consists of data processing, data
management and resource management layers.


6. Big Data Applications and Case Studies


Big Data in Marketing and Sales
• Customer Value (CV) depends on three factors - quality, service and price.
• A definition of marketing is the creation, communication and delivery of
value to customers.
• Customer (desired) value means what a customer desires from a product.
• Customer (perceived) value means what the customer believes to have
received from a product after purchase of the product.
• Customer value analytics (CVA) means analyzing what a customer really
needs.
• CVA makes it possible for leading marketers, such as Amazon, to deliver consistent customer experiences.


Big Data Applications and Case Studies


Big Data in Marketing and Sales
 Big Data Analytics in Detection of Marketing Frauds
Big Data analytics enables fraud detection. Big Data usage has the following features for enabling detection and prevention of frauds:
• Fusing existing data at an enterprise data warehouse with data from sources such as social media, websites, blogs and e-mails, thus enriching existing data
• Using multiple sources of data and connecting with many applications
• Providing greater insights using querying of the multiple-source data
• Analyzing data to enable structured reports and visualization
• Providing high-volume data mining and new innovative applications, thus leading to new business intelligence and knowledge discovery.

Big Data Applications and Case Studies


Big Data in Marketing and Sales

 Big Data Risks

• The large volume and velocity of Big Data provide greater insights but also carry risks associated with the data used.
• Data included may be erroneous, less accurate or far from reality.
• Analytics introduces new errors due to such data.
• Five data risks, described by Bernard Marr, are data security, data privacy breach, costs affecting profits, bad analytics and bad data.


Big Data Applications and Case Studies


Big Data in Marketing and Sales
 Big Data Credit Risk Management
Financial institutions, such as banks, extend loans to industrial and household sectors. These institutions in many countries face credit risks, mainly risks of (i) loan defaults and (ii) untimely return of interest and the principal amount. Financing institutions are keen to get insights into the following:
1. Identifying high-credit-rating business groups and individuals
2. Identifying risks involved before lending money
3. Identifying industrial sectors with greater risks
4. Identifying types of employees (such as daily-wage earners at construction sites) and businesses (such as oil exploration) with greater risks
5. Anticipating liquidity issues (availability of money for further issue of credit and rescheduling of credit installments) over the years.

Big Data Applications and Case Studies


Big Data in Marketing and Sales

 Big Data and Algorithmic Trading


• Algorithmic trading is a method of executing large orders using automated, pre-programmed trading instructions that account for variables such as time, price and volume.
• Complex mathematical computations enable algorithmic trading and business investment decisions to buy and sell.
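A classic illustration of such a pre-programmed rule (not from the slides) is a moving-average crossover: signal a buy when the short-window average of the price is above the long-window average. The prices and window sizes below are invented.

```python
# Toy moving-average crossover rule; prices and windows are invented.
prices = [10, 10, 11, 12, 14, 13, 12, 11, 10, 9]
SHORT, LONG = 2, 4

def moving_avg(series, window, i):
    # Average of the `window` prices ending at index i.
    return sum(series[i - window + 1:i + 1]) / window

signals = []
for i in range(LONG - 1, len(prices)):
    short_ma = moving_avg(prices, SHORT, i)  # reacts quickly to the price
    long_ma = moving_avg(prices, LONG, i)    # reacts slowly
    signals.append("buy" if short_ma > long_ma else "sell")
```

Real algorithmic trading layers execution, volume and timing constraints on top; this sketch shows only the price-based signal.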


Big Data Applications and Case Studies


Big Data and Healthcare
Big Data analytics in healthcare uses the following data sources: (i) clinical records, (ii) pharmacy records, (iii) electronic medical records, (iv) diagnosis logs and notes and (v) additional data, such as deviations from a person's usual activities, medical leave from work and social interactions.
Healthcare analytics using Big Data can facilitate the following:
1. Provisioning of value-based and customer-centric healthcare
2. Utilizing the 'Internet of Things' for healthcare
3. Preventing fraud, waste and abuse in the healthcare industry and reducing healthcare costs (examples of fraud are excessive or duplicate claims for clinical and hospital treatments; an example of waste is unnecessary tests; abuse means unnecessary use of medicines, such as tonics, and of testing facilities)
4. Improving outcomes
5. Monitoring patients in real time.

Big Data Applications and Case Studies


Big Data in Medicine

Big Data analytics deploys large volumes of data to identify and derive intelligence using predictive models about individuals. Some applications are:
• Building the health profiles of individual patients and predictive models for diagnosing better and offering better treatment.
• Aggregating a large volume and variety of information from multiple sources, from DNA, proteins and metabolites to cells, tissues, organs, organisms and ecosystems, which can enhance the understanding of the biology of diseases. Big Data creates patterns and models by data mining that help in better understanding and research.
• Deploying wearable-device data, recorded during active as well as inactive periods, to provide a better understanding of patient health and better risk profiling of the user for certain diseases.


Big Data Applications and Case Studies


Big Data in Advertising
• The impact of Big Data is tremendous on the digital advertising industry. The digital
advertising industry sends advertisements using SMS, e-mails, WhatsApp, Linkedln,
Facebook, Twitter and other mediums.
• The data helps digital advertisers to discover new relationships, lesser competitive regions
and areas.
• Success from advertisements depend on collection, analyzing and mining. The new insights
enable the personalization and targeting the online, social media and mobile for advertisements
called hyper-localized advertising.
• Advertising on digital medium needs optimization. Too much usage can also effect
negatively. Phone calls, SMSs, e-mail-based advertisements can be nuisance if sent without
appropriate researching on the potential targets. The analytics help in this direction.
• The usage of Big Data after appropriate filtering and elimination is crucial enabler of
BigData Analytics with appropriate data, data forms and data handling in the right manner.

Module 2

Chapter 2: Introduction to Hadoop (T1)
Chapter 3: Hadoop Distributed File System Basics (T2)
Chapter 7: Essential Hadoop Tools (T2)
