BDA Module1

The document is a presentation on the topic of Big Data and Analytics. It includes: 1. An outline of the course covering vision/mission, outcomes, syllabus, and case studies. 2. Details on the course outcomes and syllabus including 6 modules, credits, and 5 textbooks. 3. Slides from Module 1 on introducing big data analytics, including the need for big data, characteristics, types, and techniques for handling large volumes of data.


B.N.M. Institute of Technology
Department of Computer Science & Engineering
Strive for Excellence

Semester: 7th Semester
Course Name: Big Data and Analytics
Course Code: 18CS72
Module No: 1
Presenter: A. K. Sreeja

Course Outline

 Stating Vision, Mission, Course Outcomes


 Course Syllabus & Text Books

1. Big Data
2. Scalability & Parallel Processing
3. Designing Data Architecture
4. Data Sources, Quality, Pre-Processing and Storing
5. Data Storage & Analysis
6. Big Data Analytics Applications & Case Studies


Course Outcomes:
CO1: Understand fundamentals of Big Data analytics.
CO2: Investigate the Hadoop framework and the Hadoop Distributed File System.
CO3: Illustrate the concepts of NoSQL using MongoDB and Cassandra for Big Data.
CO4: Demonstrate the MapReduce programming model to process big data, along with Hadoop tools.
CO5: Use Machine Learning algorithms for real-world big data.
CO6: Analyze web contents and Social Networks to derive useful insights.

Course Syllabus
CREDITS – 04



Course Syllabus
Text Books:
1. Raj Kamal and Preeti Saxena, "Big Data Analytics: Introduction to Hadoop, Spark, and Machine-Learning", McGraw Hill Education, 2018. ISBN: 9789353164966, 9353164966
2. Douglas Eadline, "Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem", 1st Edition, Pearson Education, 2016. ISBN-13: 978-9332570351
Reference Books:
1. Tom White, "Hadoop: The Definitive Guide", 4th Edition, O'Reilly Media, 2015. ISBN-13: 978-9352130672
2. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, "Professional Hadoop Solutions", 1st Edition, Wrox Press, 2014. ISBN-13: 978-8126551071
3. Eric Sammer, "Hadoop Operations: A Guide for Developers and Administrators", 1st Edition, O'Reilly Media, 2012. ISBN-13: 978-9350239261
4. Arshdeep Bahga, Vijay Madisetti, "Big Data Analytics: A Hands-On Approach", 1st Edition, VPT Publications, 2018. ISBN-13: 978-0996025577

Module 1

Introduction to Big Data


Analytics


Need of Big Data


1. Big Data
• Definition of Data
• Definition of Web data
• Classification of Data: Structured, Semi-structured and Unstructured
• Definition of Big Data
 Big Data is a high-volume, high-velocity and/or high-variety information asset that requires new forms of processing for enhanced decision making, insight discovery and process optimization.
 A collection of data sets so large or complex that traditional data processing applications are inadequate.


Big Data
Big Data Characteristics

The characteristics of Big Data, called the 3Vs (or 4Vs, when Veracity is added), are:
 Volume
 Velocity
 Variety
 Veracity


Big Data
Big Data Types

1. Social networks and web data
2. Transactions data and Business Processes (BPs) data
3. Customer master data
4. Machine-generated data
5. Human-generated data


Big Data
Big Data Classification


Big Data
Big Data Handling Techniques

• Storage of huge data volumes, data distribution, high-speed networks and high-performance computing.
• Open-source tools which are scalable and provide a virtualized environment.
• Data management using NoSQL, document databases, graph databases and other forms of databases as per the needs of the application.
• Data mining and analytics, data reporting, and machine-learning Big Data tools.


2. Scalability and Parallel Processing


Analytics Scalability to Big Data
• Scalability enables an increase or decrease in the capacity of data storage, processing and analytics.
• Scalability is the capability of a system to handle the workload as per the
magnitude of the work.
• Vertical scalability means scaling up the given system’s resources and
increasing the system’s analytics, reporting and visualization capabilities.
• Scaling up means designing the algorithm according to the architecture that
uses resources efficiently.
• Horizontal scalability means increasing the number of systems working in
coherence and scaling out the workload.
• Scaling out means using more resources and distributing the processing and
storage tasks in parallel.

Scalability and Parallel Processing


Massively Parallel Processing Platforms
Parallelization of tasks can be done at several levels:
• distributing separate tasks onto separate threads on
the same CPU,
• distributing separate tasks onto separate CPUs on
the same computer and
• distributing separate tasks onto separate
computers.
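The first two levels can be sketched with Python's concurrent.futures, which offers thread pools and process pools behind one interface. A minimal sketch; the task and the data are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # One independent task: sum one chunk of the data.
    return sum(chunk)

data = list(range(1000))                                      # 0 + 1 + ... + 999
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]  # 4 separate tasks

# Level 1: separate tasks on separate threads of the same CPU/process.
# Swapping in ProcessPoolExecutor distributes the same tasks onto
# separate CPUs; the third level (separate computers) needs a cluster
# framework such as Hadoop or Spark.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
```

The task code never changes; only the executor chooses the level of parallelism.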
 Distributed Computing Model
A distributed computing model uses cloud, grid or
clusters, which process and analyze big and large
datasets on distributed computing nodes connected by
high-speed networks.

Scalability and Parallel Processing


Cloud Computing
• “Cloud computing is a type of Internet-based
computing that provides shared processing
resources and data to the computers and other
devices on demand.”
• One of the best approaches for data processing is to perform parallel and distributed computing in a cloud-computing environment.
• Cloud resources can be Amazon Web Services (AWS) Elastic Compute Cloud (EC2), Microsoft Azure or Apache CloudStack.


Scalability and Parallel Processing


Cloud Computing
Features of Cloud Computing
• on-demand service,
• resource pooling,
• scalability,
• broad network access.
Cloud services can be accessed from anywhere and at any time through the Internet.
Cloud Services
There are three types of cloud services:
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)

Scalability and Parallel Processing


Cloud Computing
• Infrastructure as a Service (IaaS):
 Providing access to resources, such as hard disks, network connections, database storage, data centers and virtual server space, is Infrastructure as a Service (IaaS).
 Some examples are Tata Communications, and Amazon data centers and virtual servers.


Scalability and Parallel Processing


Cloud Computing
• Platform as a Service (PaaS):
 Providing the runtime environment that allows developers to build applications and services is cloud Platform as a Service.
 Examples are Hadoop Cloud Service and Oracle Big Data Cloud Service.


Scalability and Parallel Processing


Cloud Computing
• Software as a Service (SaaS):
 Providing software applications as a service to end-users is known as Software as a Service.
 Software applications are hosted by a service provider and made available to customers over the Internet.
 Some examples are Google Cloud SQL, IBM Big SQL and Oracle Big Data SQL.


Scalability and Parallel Processing


Grid and Cluster Computing

• Grid Computing:
 Grid Computing refers to distributed computing, in which a group of
computers from several locations are connected with each other to achieve a
common task.
 A group of computers that may be spread over remote locations comprises a grid.
 Grid computing, similar to cloud computing, is scalable.
 Like grid computing, cloud computing depends on sharing of resources (for example, networks, servers, storage, applications and services) to attain coordination and coherence among resources.


Scalability and Parallel Processing


Grid and Cluster Computing
• Cluster Computing:
 A cluster is a group of computers connected by a network. The group works
together to accomplish the same task. Clusters are used mainly for load
balancing.
 They shift processes between nodes to keep an even load on the group of
connected computers.
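The load-balancing idea can be sketched in a few lines; the node names and task costs below are invented, and each incoming task simply goes to the currently least-loaded node:

```python
# Toy cluster load balancer: shift each task to the least-loaded node
# to keep an even load across the group. Node names and costs are invented.
nodes = {"node-a": 0, "node-b": 0, "node-c": 0}

def assign(task_cost):
    target = min(nodes, key=nodes.get)  # pick the least-loaded node
    nodes[target] += task_cost          # account for the new work
    return target

placements = [assign(cost) for cost in [5, 3, 4, 2, 6]]
```

Real cluster managers also migrate running work and handle node failures; this sketch only shows the placement decision.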


Scalability and Parallel Processing


Volunteer Computing

• Volunteer computing is a distributed computing paradigm which uses the computing resources of volunteers.
• Volunteers are organizations or members who own personal computers.
• Example projects are science-related projects executed by universities or academia in general.


3. Designing Data Architecture


Data Architecture Design

Big Data architecture is the logical and/or physical layout/structure of how Big Data will be stored, accessed and managed within a Big Data or IT environment.
• Ingestion is the process of obtaining and importing data for immediate use or transfer. Ingestion may be in batches or in real time, using pre-processing or semantics.


Designing Data Architecture


Data Architecture Design


Designing Data Architecture


Managing Data for Analysis

Data management means enabling, controlling, protecting, delivering and enhancing the value of data and information assets.
The data management function includes:
1. Data asset creation, maintenance and protection.
2. Data governance, which includes establishing the processes for ensuring the availability, usability, integrity, security and high quality of data.
3. Data architecture creation, modelling and analysis.
4. Database maintenance, administration and management systems, e.g., RDBMS, NoSQL.
5. Managing data security, data access control, deletion, privacy and security.


Designing Data Architecture


Managing Data for Analysis

6. Managing data quality.
7. Data collection using the ETL (extract, transform, load) process.
8. Managing documents, records and contents.
9. Creation of reference and master data, and data control and supervision.
10. Data and application integration.
11. Integrated data management, fast access and analysis, automation and simplification of operations on the data.
12. Data warehouse management.
13. Maintenance of business intelligence.
14. Data mining and analytics algorithms.


4. Data Sources, Quality, Pre-Processing and Storing


Data Sources
• Sources can be external or internal.
• Structured Data Sources
 A data source name implies a defined name, which a process uses to identify the source.
 For example, a name which identifies the data stored for student grades during processing: the data source name could be StudentName_Data_Grades.
 A data dictionary enables references for accesses to data.
 The name of the dictionary can be UniversityStudents_DataPlusGrades.
• Unstructured Data Sources
• Data Sources: Sensors, Signals and GPS
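As a toy illustration of these names, a data dictionary can be modelled as a mapping from a data source name to references for accessing its data. The source name follows the slide's example; the fields and description are invented:

```python
# Toy data dictionary: references for accessing named data sources.
# The source name follows the slide's example; fields are illustrative.
data_dictionary = {
    "StudentName_Data_Grades": {
        "description": "Stored student grades used during processing",
        "fields": ["student_name", "course", "grade"],
        "type": "structured",
    }
}

def lookup(source_name):
    # A process identifies the source by its defined name.
    return data_dictionary[source_name]["fields"]
```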


Data Sources, Quality, Pre-Processing and Storing


Data Quality
High quality means data which enables all the required operations, analysis, decisions, planning and knowledge discovery to be done correctly.
Data quality has five R's:
• relevancy, recency, range, robustness and reliability.
Data Integrity
• Data integrity refers to the maintenance of
consistency and accuracy in data over its usable
life.
• Software that stores, processes or retrieves the data should maintain the integrity of the data.
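One common way such software maintains integrity (an illustrative sketch, not from the slides) is to store a checksum alongside the data and verify it on retrieval:

```python
import hashlib

def checksum(payload: bytes) -> str:
    # SHA-256 digest stored alongside the data when it is written.
    return hashlib.sha256(payload).hexdigest()

def is_intact(payload: bytes, stored_digest: str) -> bool:
    # On retrieval, recompute and compare: any corruption changes the digest.
    return checksum(payload) == stored_digest

record = b"grade=A,student=S1"   # invented record
digest = checksum(record)
```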

Data Sources, Quality, Pre-Processing and Storing


Data Quality

Factors Affecting Data Quality


• Data Noise
• Outlier
• Missing Value
• Duplicate value
Data Noise
• One of the factors affecting data quality is noise.
• Noise in data refers to data giving additional meaningless information besides the true (actual/required) information, e.g., WRMP.

Data Sources, Quality, Pre-Processing and Storing


Data Quality
Outlier
• An outlier refers to data that appears not to belong to the dataset, for example, data that is outside an expected range.
• Actual outliers need to be removed from the dataset, otherwise the result will be affected by a small or large amount.
Missing Value, Duplicate Value
• Another factor affecting data quality is missing values.
• A missing value implies data not appearing in the dataset.
• Another factor affecting data quality is duplicate values.
• A duplicate value implies the same data appearing two or more times in a dataset, e.g., ACVM.
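The factors above can be checked mechanically; a minimal sketch over invented sensor readings, with an assumed expected range:

```python
# Invented readings; None marks a missing value. Range is assumed.
readings = [21.5, 22.0, 22.0, None, 95.0, 21.8]
low, high = 0.0, 50.0                                      # expected range

present = [r for r in readings if r is not None]
missing = readings.count(None)                             # missing values
outliers = [r for r in present if not low <= r <= high]    # outside range
duplicates = {r for r in present if present.count(r) > 1}  # repeated values
```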

Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing
Data pre-processing is an important step at the
ingestion layer.
Data Cleaning
• Data cleaning refers to the process of
removing or correcting incomplete,
incorrect, inaccurate or irrelevant parts of
the data after detecting them.
• Data cleaning is done before mining of data.
• Data cleaning tools help in refining and
structuring data into usable data. Examples
of such tools are OpenRefine and
DataCleaner.
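A minimal cleaning pass in the spirit described, on invented records (tools such as OpenRefine automate this at scale):

```python
# Raw records with incomplete and incorrect parts (invented data).
raw = [
    {"name": "Asha", "grade": "82"},
    {"name": "", "grade": "77"},        # incomplete: no name -> remove
    {"name": "Ravi", "grade": "abc"},   # incorrect grade -> set to None
]

def clean(record):
    if not record["name"]:
        return None                     # drop incomplete records
    grade = record["grade"]
    record["grade"] = int(grade) if grade.isdigit() else None
    return record

cleaned = [c for r in raw if (c := clean(r)) is not None]
```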

Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing

Data Enrichment
• "Data enrichment refers to operations or processes which refine, enhance or improve the raw data."
Data Editing
• Data editing refers to the process of reviewing and adjusting the acquired datasets.
Data Reduction
• Data reduction enables the transformation of acquired information into an ordered, correct and simplified form.
Data Wrangling
• Data wrangling refers to the process of transforming and mapping the data.


Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing
Data Formats used during Pre-Processing
• Comma-Separated Values (CSV)
• JavaScript Object Notation (JSON)
• Tag-Length-Value (TLV)
• Key-value pairs
• Hash-key-value pairs
 CSV Format
• A comma-separated values (CSV) file is a plain-text file which stores table data consisting of numbers and text.
• An example is a table or Microsoft Excel file which needs conversion to CSV format: a student_record.xlsx converts to a student_record.csv file.
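The CSV round trip can be sketched with Python's built-in csv module (the student rows are invented):

```python
import csv
import io

rows = [["student", "grade"], ["Asha", 82], ["Ravi", 77]]  # invented table

# Write the table as plain-text comma-separated values.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()

# Read it back: every field returns as text (numbers and text alike).
parsed = list(csv.reader(io.StringIO(csv_text)))
```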

Data Sources, Quality, Pre-Processing and Storing


Data Pre-processing
Data Format used during Pre-Processing
 CSV Format


Data Sources, Quality, Pre-Processing and Storing


Data Store Export to Cloud


Data Sources, Quality, Pre-Processing and Storing


Cloud Services


Data Sources, Quality, Pre-Processing and Storing

• Google Cloud Platform provides a cloud service called BigQuery.
• After pre-processing, data exports from a table or partition schema as JSON, CSV or Avro files.
• The BigQuery cloud service provides functions and roles such as bigquery.tables.create, bigquery.dataEditor, bigquery.dataOwner, bigquery.admin and bigquery.tables.updateData.
• Analytics uses Google Analytics 360. BigQuery exports data to a Google cloud store or cloud backup only.

Data Sources, Quality, Pre-Processing and Storing


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems

 Data Stores with Structured or Semi-Structured Data
• Traditional systems use structured or semi-structured data.
• The sources of structured data stores are traditional relational database management system (RDBMS) data, such as MySQL and DB2, enterprise servers and data warehouses.
• Examples of semi-structured data are XML and JSON documents and CSV files.


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems
• SQL
An RDBMS uses SQL (Structured Query Language), a language for viewing or changing (update, insert, append or delete) databases. SQL provides:
1. Create schema
2. Create catalog
3. DDL (Data Definition Language)
4. DML (Data Manipulation Language)
5. DCL (Data Control Language)
Distributed Database Management System: A distributed DBMS (DDBMS) is a collection of logically interrelated databases at multiple systems over a computer network.
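The DDL/DML distinction can be sketched with Python's built-in sqlite3 module (the table and rows are invented; a production RDBMS such as MySQL or DB2 accepts the same kind of SQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the schema.
con.execute("CREATE TABLE student (name TEXT, grade INTEGER)")

# DML: insert, update and query the data.
con.execute("INSERT INTO student VALUES ('Asha', 82)")
con.execute("UPDATE student SET grade = 85 WHERE name = 'Asha'")
grade = con.execute(
    "SELECT grade FROM student WHERE name = 'Asha'"
).fetchone()[0]
```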

5. Data Storage and Analysis


Data Storage and Management: Traditional Systems
• In-Memory Column-Format Data
 An in-memory columnar format allows faster data retrieval when only a few columns in a table need to be selected during query processing or aggregation.
 Online Analytical Processing (OLAP) in real-time transaction processing is fast when using in-memory column-format tables.
 In columnar in-memory data storage, the CPU accesses the values of a column in a single access to memory.
• In-Memory Row-Format Databases
 An in-memory row format allows much faster data processing during Online Transaction Processing (OLTP).
 Each row record has corresponding values in multiple columns, and the values store at consecutive memory addresses in row format.
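The two layouts can be sketched with plain Python containers (the table contents are invented): the row layout suits OLTP-style record access, while the column layout suits OLAP-style aggregation over one column.

```python
# The same invented table in two layouts.
row_store = [
    {"name": "Asha", "grade": 82, "city": "Mysuru"},
    {"name": "Ravi", "grade": 77, "city": "Hubli"},
]
column_store = {
    "name": ["Asha", "Ravi"],
    "grade": [82, 77],
    "city": ["Mysuru", "Hubli"],
}

# OLTP-style access: one whole record is natural in the row layout.
record = row_store[1]

# OLAP-style access: aggregating one column touches only that column's
# values in the columnar layout, which is why columnar in-memory
# tables speed up such queries.
avg_grade = sum(column_store["grade"]) / len(column_store["grade"])
```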


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems

Enterprise Data-Store Server and Data Warehouse
 Enterprise data servers use data from several distributed sources which store data using various technologies.
 All the data merges using an integration tool.
 Integration enables collective viewing of the datasets at the data warehouse.


5. Data Storage and Analysis


Data Storage and Management: Traditional Systems


5. Data Storage and Analysis


Big Data Storage

Big Data NoSQL (Not Only SQL)
• NoSQL databases handle semi-structured data.
• Big Data stores use NoSQL.
• These stores do not integrate with applications using SQL alone.
• NoSQL is also used in cloud data stores.
• NoSQL stores may not use a fixed table schema.
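The absence of a fixed table schema can be sketched as a toy document collection (the documents are invented; production document stores are MongoDB, Cassandra, HBase, etc.): two documents in one collection may carry different fields.

```python
# Toy document collection: no fixed schema, unlike an RDBMS table.
collection = [
    {"_id": 1, "name": "Asha", "grades": [82, 91]},
    {"_id": 2, "name": "Ravi", "city": "Hubli"},  # different fields: allowed
]

def find(predicate):
    # Query without SQL: filter documents with a plain function.
    return [doc for doc in collection if predicate(doc)]

with_city = find(lambda d: "city" in d)
```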


5. Data Storage and Analysis


Big Data Storage
Coexistence of Big Data, NoSQL and Traditional Data Stores


5. Data Storage and Analysis


Big Data Platform
• A Big Data platform supports large datasets and volume of data.
• Managing Big Data requires large resources of MPPs, cloud, parallel processing and
specialized tools.
• Big data sources: Data storages, data warehouse, Oracle Big Data, MongoDB NoSQL,
Cassandra NoSQL.
Hadoop
• Big Data platform consists of Big Data storage(s), server(s) and data management and
business intelligence software. Storage can deploy Hadoop Distributed File System
(HDFS), NoSQL data stores, such as HBase, MongoDB, Cassandra. HDFS system is an
open source storage system.
• HDFS is a scaling, self-managing and self-healing file system.


5. Data Storage and Analysis


Big Data Platform


5. Data Storage and Analysis


Big Data Platform
Big Data Stack
• A stack consists of a set of software components
and data store units.
• Applications, machine learning algorithms,
analytics and visualization tools use Big Data
Stack (BDS) at a cloud service, such as Amazon
EC2, Azure or private cloud.
• The stack uses a cluster of high-performance machines.


5. Data Storage and Analysis


Big Data Platform
Big Data Stack


5. Data Storage and Analysis


Big Data Analytics
Big Data Analytics
• Data Analytics can be formally defined as
the statistical and mathematical data
analysis that clusters, segments, ranks and
predicts future possibilities.
• An important feature of data analytics is
its predictive, forecasting and prescriptive
capability.


5. Data Storage and Analysis


Big Data Analytics
Phases in Analytics
1. Descriptive analytics enables deriving additional value from visualizations and reports.
2. Predictive analytics is advanced analytics which enables extraction of new facts and knowledge, and then predicts/forecasts.
3. Prescriptive analytics enables derivation of additional value and better decisions on new option(s) to maximize profits.
4. Cognitive analytics enables derivation of additional value and better decisions. It refers to analysis of sentiments, emotions, gestures and facial expressions, similar to the analyses humans do.
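The first two phases can be illustrated numerically (the monthly sales figures are invented): descriptive analytics summarises what was observed, while predictive analytics fits a simple least-squares trend and forecasts the next value.

```python
sales = [10.0, 12.0, 13.0, 15.0]          # invented monthly figures

# Descriptive: summarise the observed data.
mean = sum(sales) / len(sales)

# Predictive: fit a least-squares straight line, then forecast month 5.
n = len(sales)
xs = range(n)
x_mean = sum(xs) / n
slope = sum((x - x_mean) * (y - mean) for x, y in zip(xs, sales)) / \
        sum((x - x_mean) ** 2 for x in xs)
intercept = mean - slope * x_mean
forecast = slope * n + intercept          # predicted value for the next month
```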

5. Data Storage and Analysis


Big Data Analytics
Phases in Analytics


5. Data Storage and Analysis


Big Data Analytics
Berkeley Data Analytics Stack(BDAS)
Berkeley Data Analytics Stack (BDAS) consists of data processing, data
management and resource management layers.


6. Big Data Applications and Case Studies


Big Data in Marketing and Sales
• Customer Value (CV) depends on three factors - quality, service and price.
• A definition of marketing is the creation, communication and delivery of
value to customers.
• Customer (desired) value means what a customer desires from a product.
• Customer (perceived) value means what the customer believes to have
received from a product after purchase of the product.
• Customer value analytics (CVA) means analyzing what a customer really
needs.
• CVA makes it possible for leading marketers, such as Amazon, to deliver consistent customer experiences.


Big Data Applications and Case Studies


Big Data in Marketing and Sales
 Big Data Analytics in Detection of Marketing Frauds
Big Data analytics enables fraud detection. Big Data usage has the following features for enabling detection and prevention of frauds:
• Fusing existing data at an enterprise data warehouse with data from sources such as social media, websites, blogs and e-mails, thus enriching existing data
• Using multiple sources of data and connecting with many applications
• Providing greater insights using querying of the multiple-source data
• Analyzing data to enable structured reports and visualization
• Providing high-volume data mining and new innovative applications, thus leading to new business intelligence and knowledge discovery.

Big Data Applications and Case Studies


Big Data in Marketing and Sales

 Big Data Risks

• The large volume and velocity of Big Data provide greater insights but also carry risks associated with the data used.
• Data included may be erroneous, less accurate or far from reality.
• Analytics introduces new errors due to such data.
• Five data risks, described by Bernard Marr, are data security, data privacy breach, costs affecting profits, bad analytics and bad data.


Big Data Applications and Case Studies


Big Data in Marketing and Sales
 Big Data Credit Risk Management
Financial institutions, such as banks, extend loans to industrial and household sectors. These institutions in many countries face credit risks, mainly risks of (i) loan defaults and (ii) untimely return of interest and the principal amount. Financing institutions are keen to get insights into the following:
1. Identifying high-credit-rating business groups and individuals
2. Identifying risks involved before lending money
3. Identifying industrial sectors with greater risks
4. Identifying types of employees (such as daily-wage earners at construction sites) and businesses (such as oil exploration) with greater risks
5. Anticipating liquidity issues (availability of money for further issue of credit and rescheduling of credit installments) over the years.

Big Data Applications and Case Studies


Big Data in Marketing and Sales

 Big Data and Algorithmic Trading


• Algorithmic trading is a method of executing large orders using automated, pre-programmed trading instructions that account for variables such as time, price and volume.
• Complex mathematical computations enable algorithmic trading and business investment decisions to buy and sell.
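A classic illustration of such a pre-programmed rule (not from the slides) is a moving-average crossover: signal a buy when the short-window average of the price is above the long-window average. The prices and window sizes below are invented.

```python
# Toy moving-average crossover rule; prices and windows are invented.
prices = [10, 10, 11, 12, 14, 13, 12, 11, 10, 9]
SHORT, LONG = 2, 4

def moving_avg(series, window, i):
    # Average of the `window` prices ending at index i.
    return sum(series[i - window + 1:i + 1]) / window

signals = []
for i in range(LONG - 1, len(prices)):
    short_ma = moving_avg(prices, SHORT, i)  # reacts quickly to the price
    long_ma = moving_avg(prices, LONG, i)    # reacts slowly
    signals.append("buy" if short_ma > long_ma else "sell")
```

Real algorithmic trading layers execution, volume and timing constraints on top; this sketch shows only the price-based signal.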


Big Data Applications and Case Studies


Big Data and Healthcare
Big Data analytics in healthcare uses the following data sources: (i) clinical records, (ii) pharmacy records, (iii) electronic medical records, (iv) diagnosis logs and notes and (v) additional data, such as deviations from a person's usual activities, medical leave from work and social interactions.
Healthcare analytics using Big Data can facilitate the following:
1. Provisioning of value-based and customer-centric healthcare
2. Utilizing the 'Internet of Things' for healthcare
3. Preventing fraud, waste and abuse in the healthcare industry and reducing healthcare costs (examples of fraud are excessive or duplicate claims for clinical and hospital treatments; an example of waste is unnecessary tests; abuse means unnecessary use of medicines, such as tonics, and of testing facilities)
4. Improving outcomes
5. Monitoring patients in real time.

Big Data Applications and Case Studies


Big Data in Medicine

Big Data analytics deploys large volumes of data to identify and derive intelligence using predictive models about individuals. Some applications are:
• Building the health profiles of individual patients and predictive models for diagnosing better and offering better treatment.
• Aggregating a large volume and variety of information from multiple sources, from DNA, proteins and metabolites to cells, tissues, organs, organisms and ecosystems, which can enhance the understanding of the biology of diseases. Big Data creates patterns and models by data mining that help in better understanding and research.
• Deploying wearable-device data, recorded during active as well as inactive periods, to provide a better understanding of patient health and better risk profiling of the user for certain diseases.


Big Data Applications and Case Studies


Big Data in Advertising
• The impact of Big Data is tremendous on the digital advertising industry. The digital
advertising industry sends advertisements using SMS, e-mails, WhatsApp, Linkedln,
Facebook, Twitter and other mediums.
• The data helps digital advertisers to discover new relationships, lesser competitive regions
and areas.
• Success from advertisements depend on collection, analyzing and mining. The new insights
enable the personalization and targeting the online, social media and mobile for advertisements
called hyper-localized advertising.
• Advertising on digital medium needs optimization. Too much usage can also effect
negatively. Phone calls, SMSs, e-mail-based advertisements can be nuisance if sent without
appropriate researching on the potential targets. The analytics help in this direction.
• The usage of Big Data after appropriate filtering and elimination is crucial enabler of
BigData Analytics with appropriate data, data forms and data handling in the right manner.

Module 2

Chapter 2: Introduction to Hadoop (T1)
Chapter 3: Hadoop Distributed File System Basics (T2)
Chapter 7: Essential Hadoop Tools (T2)
