Unit 1

The document discusses big data and related concepts. It defines data, information and knowledge, and provides examples of each. It then defines big data as large volumes of data that grow exponentially over time. The document outlines different types of big data including structured, unstructured and semi-structured data. It also describes characteristics of big data such as volume, variety and velocity. Finally, it discusses big data analytics and some challenges and sources of big data.

Uploaded by

Bhushan Kelkar


BIG DATA

Name: KARISHMA N. PARAB GAONKAR

ASSISTANT PROFESSOR
COMPUTER ENGINEERING DEPARTMENT
GOA COLLEGE OF ENGINEERING

Content

 Data, Information, Knowledge
 Example
 Big Data
 Examples
 Types of Big Data
 Characteristics of Big Data
 Big Data Analytics

DATA, INFORMATION, KNOWLEDGE

 Data: The quantities, characters, or symbols on which operations are performed, and which may be stored and transmitted over a medium.
 Example: weights, prices, costs, numbers, names, etc.

 Information: Data that has been converted into a useful or intelligible form.
 Example: report card, merit list, documents, etc.

 Knowledge: The ability of a person to use information for a specific purpose.
Example

 “386” is data.

 “Your marks are 386 out of 400” is information.

 “It is a result of your hard work” is knowledge.
Big Data
 Big data is also data, but of huge size.
 The term describes data that is huge in volume and yet keeps growing exponentially over time.
 It is not suitable for traditional data management tools.
 The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
Examples

 Stock market

 Social Media

 Sensors

 Web

Types of Big Data

 Structured

 Unstructured

 Semi-Structured

Structured Data
 Data that can be stored, accessed, and processed in a fixed format.
 It is organized into rows and columns.
 It is easy to derive values from.
 Example: relational databases
Unstructured Data
 Any data with unknown form or structure is classified as unstructured data.
 Its size is huge, which poses multiple challenges for processing.
 It contains heterogeneous data: text files, videos, images, etc.
 Example: the output of a Google search
Semi-Structured Data

 It contains both forms of data.
 Example: XML files, JSON, etc.
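
As a rough illustration of the three types, the sketch below handles similar content in each form. The sample records and field names (`name`, `marks`, `tags`) are made up for illustration, and only the Python standard library is used.

```python
import csv
import io
import json

# Structured: fixed rows and columns, so every value has a known position.
structured_csv = "name,marks\nAsha,386\nRavi,352\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_marks = sum(int(row["marks"]) for row in rows)

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured = '[{"name": "Asha", "marks": 386}, {"name": "Ravi", "tags": ["new"]}]'
records = json.loads(semi_structured)
# A field can be missing, so read it defensively.
marks_found = [r["marks"] for r in records if "marks" in r]

# Unstructured: free text with no schema; even a simple question
# ("how often is 'marks' mentioned?") needs ad hoc processing.
unstructured = "Asha scored good marks. Ravi said his marks will improve."
mentions = unstructured.lower().count("marks")
```

The structured case can be summed directly by column name, the semi-structured case needs a check for missing fields, and the unstructured case falls back to text processing.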
Characteristics Of Big Data

 Volume

 Variety

 Velocity

 Variability

Volume

Exponential increase in collected/generated data

[Figure: examples of data volume]
 12+ TBs of tweet data every day
 25+ TBs of log data every day
 ? TBs of data every day
 30 billion RFID tags today (1.3B in 2005)
 4.6 billion camera phones worldwide
 100s of millions of GPS-enabled devices sold annually
 2+ billion people on the Web by end 2011
 76 million smart meters in 2009… 200M by 2014
Variety

[Figure: the customer at the center, surrounded by data sources: social media, banking and finance, gaming, entertainment, purchases, and our known history]
Velocity

 Mobile devices (tracking all objects all the time)
 Social media and networks (all of us are generating data)
 Scientific instruments (collecting all sorts of data)
 Sensor technology and networks (measuring all kinds of data)
Some Make It 4 V’s

Growth of Big Data
Big Data Analytics
 It is the complete process of collecting, organizing, and analyzing huge sets of data (big data) to identify patterns and extract other useful information for decision making.
 Big data analytics can be used in various sectors such as media, education, healthcare, and various government and non-government sectors.
Example

 Uber: from driver, vehicle, and location data, fares, locations, etc. can be predicted.
 Banking: fraud detection
Content

 Structured data vs. Unstructured data

 Big Data vs Conventional Data

 Techniques used by big data

 Importance of Big Data Analytics

 Sources of Big Data

 Challenges of Big Data

 How big data differs

 Web data

1 Prof. Karishma Parab ,Computer Engineering dept,GEC

Structured data vs. Unstructured data

Structured Data | Unstructured Data
Predefined data models | No predefined data model
Usually text only | May be text, images, sound, videos, or other formats
Easy to search | Difficult to search
Resides in relational databases | Resides in non-relational databases
Generated by humans or machines in airline reservation systems, inventory control systems, ERP systems, etc. | Generated by humans and machines through word processing, email clients, and tools for viewing or editing media

Big Data vs Conventional Data

Big Data | Conventional Data
Huge data sets | Data set size in control
Unstructured data such as text, video, and audio | Normally structured data such as numbers and categories, but it can take other forms as well
Hard-to-perform queries and analysis | Relatively easy-to-perform queries and analysis
Needs a new methodology for analysis | Data analysis can be achieved by using conventional methods
Needs tools such as Hadoop, Hive, HBase, Pig, Sqoop, and so on | Tools such as SQL, SAS, R, and Excel alone may be sufficient
Raw transactional data | Aggregated, sampled, or filtered data
Generated by big financial institutions, Facebook, Google, Amazon, eBay, Walmart, and so on | Generated by small enterprises and small banks

Techniques used by big data
 Distributed storage and processing
 Non-relational databases
 Stream and complex event processing
 Data processing
 In-memory processing
 Reporting layer (visualization)

Importance of Big Data Analytics

 Cost reduction.

 Faster, better decision making.

 New products and services.


Sources of Big Data

 Social networks (Facebook, Twitter, etc.)
 Traditional business systems (banking, e-commerce, etc.)
 Internet of Things (satellite images, sensors, etc.)
 etc.

Challenges of Big Data
 Meeting the need for speed:
 The goal is not only to find relevant data but to find it quickly.
 Granularity of data
 Solution: hardware with increased memory and parallel processing.
 Grid computing approach

 Understanding the data
 Visualization
 Consider customers using a set of products, and understand what you are trying to infer from this data.
 Solution: proper domain expertise

Challenges of Big Data
 Addressing data quality
 Accuracy of data for decision making
 Solution: companies should have an information management process to ensure data is clean.
 Use of proactive methods (solving problems before they appear)

 Displaying meaningful results (clustering)
 Plotting points is difficult (many data points)
 Solution: cluster data for better visualization
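
The slides do not name a clustering method, so the sketch below uses the simplest stand-in: grouping points into square grid cells and plotting one count per cell instead of every point. The function name `bin_points` and the sample coordinates are hypothetical.

```python
from collections import Counter

def bin_points(points, cell_size):
    """Group (x, y) points into square cells and count points per cell."""
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )

# Three raw points collapse into two plotted cells.
points = [(0.5, 0.5), (0.7, 0.2), (3.1, 3.9)]
cells = bin_points(points, cell_size=1.0)
```

With millions of points the number of occupied cells stays bounded by the grid size, which is what makes the plot readable.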

Challenges of Big Data
 Dealing with outliers
 The graphical representations of data made possible by visualization can communicate trends and outliers much faster than tables containing numbers and text.
 Solution: remove the outliers
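
One common way to remove outliers is the interquartile range (IQR) rule. The slides do not say which rule is intended, so the choice of IQR, the factor `k=1.5`, and the sample readings are assumptions for this sketch.

```python
import statistics

def remove_outliers(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

readings = [10, 12, 11, 13, 12, 11, 95]  # 95 is the suspicious value
cleaned = remove_outliers(readings)
```

Whether a flagged point should really be dropped depends on the domain; an extreme value can also be the most interesting record in the data.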

How big data differs
 Automatically generated by machines
 An entirely new source of data
 Not user friendly
 Need to focus on the important part.

Web Data
 Data collected over the internet, specific to a corporation or to an individual.
 It is versatile data and is used for predictive analysis, which is why it is considered one of the most popular kinds of big data.
 Gives a 360-degree view of a particular item.

Web Data

 Earlier, companies started with basic recency, frequency, and monetary value (RFM) metrics attached to customers.
 But now companies have newly evolving big data sources such as data from web browsers, mobile applications, social media sites, etc.
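
The RFM metrics mentioned above can be computed from a plain transaction log. The sketch below assumes a hypothetical log of `(customer_id, purchase_date, amount)` tuples and measures recency in days.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("C1", date(2024, 1, 5), 40.0),
    ("C1", date(2024, 3, 1), 60.0),
    ("C2", date(2023, 11, 20), 15.0),
]

def rfm(transactions, today):
    """Return {customer: (recency_days, frequency, monetary_value)}."""
    summary = {}
    for customer, when, amount in transactions:
        last, count, total = summary.get(customer, (when, 0, 0.0))
        summary[customer] = (max(last, when), count + 1, total + amount)
    return {
        customer: ((today - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }

scores = rfm(transactions, today=date(2024, 3, 31))
```

Here `C1` bought recently, often, and for more money than `C2`, which is the kind of ranking RFM was traditionally used for before web data added behavioural detail.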

Web Data
 Some data may still be missing, so the improvement is to know everything the customer does as they go through your organization’s processes (a new level of understanding and interaction).
 Web data provides a new level of information, i.e. it provides factual information on customer preferences, future intentions, and motivations that is virtually impossible to get from other sources.
 Privacy: address privacy concerns by using an arbitrary identification number for each customer.
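
A minimal sketch of the arbitrary-identifier idea, assuming the mapping table itself is stored securely and separately from the analysis data. The class name `Pseudonymizer` and the sample events are hypothetical.

```python
import uuid

class Pseudonymizer:
    """Map real customer identifiers to arbitrary, stable ID numbers."""

    def __init__(self):
        self._ids = {}  # real identifier -> arbitrary ID; keep this secured

    def id_for(self, real_identifier):
        # Reuse the same arbitrary ID so one customer's events stay linked.
        if real_identifier not in self._ids:
            self._ids[real_identifier] = uuid.uuid4().hex
        return self._ids[real_identifier]

pseudo = Pseudonymizer()
events = [("asha@example.com", "product_view"), ("asha@example.com", "purchase")]
anonymous_events = [(pseudo.id_for(email), action) for email, action in events]
```

The analysis still sees that both events belong to one customer, but not who that customer is.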

Web Data
 What data should be collected?
 Purchases
 Requesting help
 Product views
 Forwarding a link
 Shopping basket additions
 Posting a comment
 Watching a video
 Registering for a webinar
 Accessing a download
 Executing a search
 Reading / writing a review

How Web Data Helps Organizations
 Shopping behaviour (search terms, search engines, reference sites)
 Identify products that are of interest to a customer before they make a purchase.
 Research behaviour (make it easy for customers to find their favourite option)
 Some customers go for specifications, others go for photos.
 Feedback behaviour

Web Data in Action
 Attrition modelling: identify the customers who are most at risk of cancelling their accounts, so that action can be taken proactively to prevent them from doing so.
 Response modelling: many models are created to help predict the choice a customer will make.
 Customer segmentation: segment customers based solely upon their typical browsing patterns.

Evolution of Analytic Scalability


Analytic Scalability

 A new level of scalability is needed.
 As the amount of data organizations process continues to increase, the same old methods for handling data just won’t work anymore.
 Organizations have to update their technologies to provide a higher level of scalability.

The Convergence of the Analytic and Data Environment
 Earlier, data was not in one place, and the tools required to run the analysis were not able to find the data.
 Analytic professionals had to pull all their data together into a separate analytics environment to do analysis.
 Analysts did the merging of data.
 Analysts do what is called “data preparation” (pull and merge).
 In the data warehousing world this process is called “extract, transform, and load (ETL).”
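
The pull-and-merge step can be sketched as a tiny ETL job. The two source lists, the table name `customer_orders`, and the use of an in-memory SQLite database as the analytics environment are all assumptions for illustration.

```python
import sqlite3

# Extract: two hypothetical sources that live in different places.
customers = [(1, "asha"), (2, "ravi")]
orders = [(1, "19.50"), (1, "5.50"), (2, "10.00")]  # amounts arrive as text

# Transform: clean names, convert amounts, and merge by customer id.
names = {cid: name.title() for cid, name in customers}
merged = [(cid, names[cid], float(amount)) for cid, amount in orders]

# Load: write the prepared rows into the analytics environment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_orders (id INTEGER, name TEXT, amount REAL)")
conn.executemany("INSERT INTO customer_orders VALUES (?, ?, ?)", merged)
totals = conn.execute(
    "SELECT name, SUM(amount) FROM customer_orders GROUP BY name ORDER BY name"
).fetchall()
```

Every row crosses from the sources into the separate environment before any analysis runs, which is the data movement that in-database processing later avoids.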

 Earlier they used mainframes, then they moved to relational databases (RDB) for scalability.
 Data marts: databases were built for each specific purpose or team, and relational databases were spread all over an organization.
 Enterprise data warehouse (EDW): combining the various database systems into one big system.

Traditional Analytic Architecture
Old way of doing things.


Modern In-Database Architecture

•Processing stays in the database.
•The user machine just has to submit the request.
•This is the concept of in-database analytics.

MASSIVELY PARALLEL PROCESSING SYSTEMS
 It is a mechanism for storing and analyzing large amounts of data.
 An MPP database spreads data out into independent pieces managed by independent storage and central processing unit (CPU) resources.
 It removes the constraint of having one central server with only a single set of CPU and disk resources to manage it.

Massively Parallel Processing System Data Storage

 An MPP database breaks the data into independent chunks with independent disk and CPU.

[Figure: a one-terabyte table split into ten 100-gigabyte chunks. A traditional database will query a one-terabyte table one row at a time; an MPP system runs 10 simultaneous 100-gigabyte queries.]

Massively Parallel Processing System Data Storage

 This allows much faster query execution, since many independent smaller queries run simultaneously instead of just one big query.
 It gets a little more complicated in cases where data must be moved between chunks, but MPP systems are designed to handle that quickly.
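
The chunk-and-combine idea can be sketched in a few lines. This is only an analogy: Python threads stand in for independent nodes, a list of integers stands in for the table, and the function names are hypothetical. It is not how an MPP database is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(rows, n_chunks):
    """Distribute rows round-robin across n_chunks independent pieces."""
    return [rows[i::n_chunks] for i in range(n_chunks)]

def query_chunk(chunk):
    """The small query each node runs on its own slice: a partial sum."""
    return sum(amount for amount in chunk if amount > 100)

def mpp_query(rows, n_chunks=10):
    chunks = split_into_chunks(rows, n_chunks)
    # Threads stand in for nodes that each have their own CPU and disk.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(query_chunk, chunks))
    return sum(partials)  # combine the partial results

table = list(range(1, 1001))  # stand-in for the one-terabyte table
result = mpp_query(table)
```

A sum splits cleanly because partial sums can simply be added. Queries such as joins on a column the data is not distributed by are the cases where rows must first move between chunks.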

Traditional Query versus an MPP Query

Advantages of MPP

 Data is stored at multiple locations (redundancy), so in case of failure, recovery is faster.
 Uses resource management tools to manage the CPU and disk space.
 Uses query optimizers to make sure queries are being optimally executed.

Using MPP Systems for Data Preparation and Scoring
 4 primary ways

1. SQL Push Down
2. User-Defined Functions (UDFs)
3. Embedded Processes
4. Predictive Modeling Markup Language (PMML) Scoring

SQL Push Down
 SQL is the native language of an MPP system, and it is efficient for a wide range of requirements.
 Many data preparation tasks can be translated into SQL and pushed down to the database.
 Analytic tools will often translate the logic from a model into SQL for the user, or the user can code an SQL script.
 So the data preparation or scoring processes end up being executed purely with SQL.
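
The difference between pulling data out and pushing logic down can be seen on a small scale. In this sketch an in-memory SQLite database stands in for the MPP system, and the `sales` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the MPP database
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],
)

# Without push-down: every row is pulled out and summed on the client.
client_side = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    client_side[region] = client_side.get(region, 0.0) + amount

# With push-down: the same logic is expressed in SQL, so only the
# small summarized result leaves the database.
pushed_down = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

Both approaches give the same totals; the second moves two rows instead of the whole table, and on a real MPP system the database can also run the aggregation in parallel.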

User-Defined Functions
 A UDF allows the user to define logic that can be executed in the same manner as a built-in SQL function.
 It involves compiling code into new database functions that can be called from an SQL query.
 User-defined functions are coded in languages like C++ or Java.
 Many analytic professionals do not program in these languages, so analytic tools can be used to generate the appropriate UDF and load it into the database.
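
The calling pattern of a UDF can be tried with Python's built-in `sqlite3` module, which lets a Python function be registered and called from SQL. This is a stand-in for the compiled C++ or Java UDFs described above; the `risk_score` logic, its coefficients, and the `accounts` table are hypothetical.

```python
import sqlite3

def risk_score(balance, late_payments):
    """Hypothetical scoring logic that is awkward to write in plain SQL."""
    score = 0.02 * late_payments - 0.00001 * balance
    return round(max(0.0, min(1.0, 0.5 + score)), 3)

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in.
conn.create_function("risk_score", 2, risk_score)

conn.execute(
    "CREATE TABLE accounts (id INTEGER, balance REAL, late_payments INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 10000.0, 0), (2, 500.0, 10)],
)
scored = conn.execute(
    "SELECT id, risk_score(balance, late_payments) FROM accounts ORDER BY id"
).fetchall()
```

Once registered, the function is applied row by row inside the query, so the scoring happens where the data lives.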

Embedded Processes
 It is an analytic tool’s engine actually running on the database itself.
 An embedded process is therefore capable of running programs directly inside the database.
 There is no translation required.
 This method requires the fewest changes to the code.

Predictive Modeling Markup Language
 Predictive modeling markup language (PMML) is a way to pass model results from one tool to another.
 The type of information included in a PMML feed includes a model type, the variable names and formats, and the parameter values.
 Drawback: the exact same variables in the exact same format must be available in the system where the PMML is being deployed.
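
A minimal sketch of PMML-style scoring. The XML below is a simplified fragment and not a schema-valid PMML document (a real file also carries a namespace, a header, a data dictionary, and a mining schema), and the variable names and coefficients are made up.

```python
import xml.etree.ElementTree as ET

# Simplified PMML-style fragment describing a linear regression
# exported by some other tool; all values are illustrative.
PMML_TEXT = """
<PMML>
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.1"/>
      <NumericPredictor name="income" coefficient="0.002"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def load_model(pmml_text):
    """Read the intercept and coefficients out of the model description."""
    table = ET.fromstring(pmml_text).find(".//RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients

def score(model, row):
    intercept, coefficients = model
    # The drawback noted above: every model variable must exist in the row.
    missing = set(coefficients) - set(row)
    if missing:
        raise KeyError(f"variables missing from input: {sorted(missing)}")
    return intercept + sum(c * row[name] for name, c in coefficients.items())

model = load_model(PMML_TEXT)
prediction = score(model, {"age": 40, "income": 1000})
```

The scoring side needs no knowledge of the tool that built the model, but it fails as soon as the deployed data lacks one of the model's variables.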

Content
 Cloud Computing – Public Cloud, Private Cloud

 Grid Computing

 Analytical Process

 Analytic Sandbox, Benefits


Cloud Computing
 It is the delivery of different services through the internet.
 These resources include tools and applications such as data storage, servers, databases, networking, and software.
 Instead of keeping files on a local (proprietary) storage device, you store them in a remote database.
 So the user is not required to be at a specific place to access the data, as the data is stored remotely.

Criteria for cloud environment
 3 criteria
 Enterprises incur no infrastructure or capital costs, only operational costs (on a pay-per-use basis).
 Capacity can be scaled up or down dynamically and immediately.
 The underlying hardware can be anywhere geographically (multi-tenancy mode).

Characteristics of Cloud Environment
 5 characteristics of a cloud environment

1. On-demand self-service
2. Broad network access
3. Resource pooling
4. Rapid elasticity
5. Measured service

Types of cloud environments
Two types of cloud environment

 Public clouds

 Private clouds


Public Clouds

 Users basically load their data onto a host system.
 Users are then allocated resources to use this data.
 Users get charged according to what they use.

Advantages of public cloud
 Bandwidth is available as needed, and users pay only for what they use.
 There is no need to buy a high-capacity system and then risk leaving half of the capacity unused.
 Simply pay for extra resources if they are needed for processing.
 Once access is granted, users load their data and start analyzing.
 It is easy to share data with others regardless of location, since a public cloud is outside the firewall.

Disadvantages of Public Cloud
 There are few performance guarantees in a public cloud (multiple users can request the same resources).
 This can cause high variability in performance.
 Security of data is a concern.
 It can get expensive if a cloud isn’t used wisely (e.g. a bad query).
 If an audited trail of data and where it sits is required, it is not possible to have that in a public cloud.

Private Cloud

 Owned exclusively by one organization and typically housed behind a corporate firewall.
 A private cloud is going to serve the exact same function as a public cloud, but just for the people or teams within a given organization.

Public Clouds versus Private Clouds


Advantages of private cloud

 The organization has complete control over data and system security.
 Data won’t leave the corporate firewall, so the data is at low risk.

Disadvantages of private cloud

 It is necessary to purchase and own the entire cloud infrastructure before allocating it out to users.

Grid Computing

 Used for computations and algorithms that cannot be converted into SQL or embedded in a UDF.
 So you pull the data out into a more traditional analytics environment and run analytic tools against that data in the traditional way.
 Large servers have been used for this task, which leads to expanding the size and number of servers.

Grid Computing
 A grid configuration can help both cost and performance.
 Instead of having a single high-end server, a large number of lower-cost machines are put in place.
 Jobs are given to individual servers and are processed simultaneously.
 Each machine may only be able to handle a fraction of the work of the original server and can potentially handle only one job at a time.
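
A rough model of the grid idea: whole jobs, rather than slices of one table, are handed to whichever machine is free. Threads stand in for the low-cost machines, and the job list and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job):
    """One analytic job; here just a sum of squares."""
    name, numbers = job
    return name, sum(n * n for n in numbers)

jobs = [(f"job-{i}", range(i * 100, (i + 1) * 100)) for i in range(8)]

def run_on_grid(jobs, machines=4):
    # Each worker models one low-cost machine that handles one job at a
    # time; waiting jobs are handed out as machines become free.
    with ThreadPoolExecutor(max_workers=machines) as grid:
        return dict(grid.map(run_job, jobs))

results = run_on_grid(jobs)
```

Eight jobs on four machines finish in roughly two rounds, but a single very large job would still be limited by the speed of the one machine it lands on.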

Grid Computing

 Disadvantage: big jobs won’t run as quickly on the cheaper machines as on a larger server.
 So the innovation would be to use high-performance analytics architectures, where the various machines in the grid are aware of each other and can share information.

Analytic Process
 Updating technology for scalability won’t be beneficial if you are using old analytic processes.
 Analytic professionals are beginning to use a database platform constantly for their work.
 So you provide a workspace, or sandbox, within the database system for analytic professionals.

Analytic Sandbox
 An analytic sandbox gives analytic professionals the permissions and access to use the enterprise data warehouse or data marts effectively.
 An analytic sandbox provides a set of resources with which in-depth analysis can be done to answer critical business questions.
 An analytic sandbox is ideal for data exploration, development of analytical processes, proof of concepts, and prototyping.

Sandbox
 A sandbox basically has a small set of users who create data within the sandbox, which is segregated from the production database.
 Sandbox users are allowed to load the data required for a project, which may not be part of the enterprise data model.
 Data in the sandbox has a limited life, i.e. build the data needed for a project and delete it as soon as the project is done.

Analytic Sandbox Benefits

 Benefits from the view of an analytic professional

 Benefits from the view of IT


Benefits from the view of an analytic professional:
 Independence (no need to ask permissions)
 Flexibility (can use the business intelligence, statistical analysis, and visualization tools that they need)
 Efficiency (no need to migrate the data)
 Freedom (no need to focus on system administration; the maintenance tasks can be shifted to IT)
 Speed (parallel processing)

Benefits from the view of IT
 Centralization
 Streamlining (both development and deployment)
 Simplicity
 Control (the production environment is safe if an experiment goes wrong)
 Costs (data marts in one central system)

Content
 Internal, External & Hybrid Sandbox
 Data analytic tools
 Analysis vs Reporting

Internal Sandbox
 The sandbox is physically located on the production system.
 The sandbox database itself is not a part of the production
database.
 The sandbox is a separate database container within the system.


Advantages
 Easy to set up (uses existing resources and infrastructure)
 Ability to directly join production data with sandbox data.
 Production data and all of the sandbox data are within the production system, so it is easy to link resources.
 An internal sandbox is very cost-effective since no new hardware is needed.
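
The "separate container, same system" arrangement can be imitated with SQLite's `ATTACH DATABASE`. This is a sketch under that assumption: one connection plays the production system, an attached in-memory database plays the sandbox, and all table names and rows are made up.

```python
import sqlite3

# One system, two database containers: production plus an attached sandbox.
system = sqlite3.connect(":memory:")
system.execute("ATTACH DATABASE ':memory:' AS sandbox")

system.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
system.executemany(
    "INSERT INTO main.customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
)

# Project data loaded by an analyst; not part of the enterprise data model.
system.execute("CREATE TABLE sandbox.survey (customer_id INTEGER, rating INTEGER)")
system.executemany("INSERT INTO sandbox.survey VALUES (?, ?)", [(1, 5), (2, 3)])

# Production and sandbox data are joined directly, with no data movement.
joined = system.execute(
    "SELECT c.name, s.rating FROM main.customers AS c "
    "JOIN sandbox.survey AS s ON s.customer_id = c.id ORDER BY c.id"
).fetchall()

# Sandbox data has a limited life: drop it when the project is done.
system.execute("DROP TABLE sandbox.survey")
production_rows = system.execute("SELECT COUNT(*) FROM main.customers").fetchone()[0]
```

Dropping the sandbox table leaves the production table untouched, which is the separation the internal sandbox relies on.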

Disadvantages
 There will be an additional load on the existing enterprise data warehouse or data mart.
 The sandbox will use both space and CPU resources (potentially a lot of resources).
 An internal sandbox can be constrained by production policies and procedures.

External Sandbox
 For an external sandbox, a physically separate analytic sandbox is created for testing and development of analytic processes.

Advantages

 The sandbox is a standalone environment, dedicated to advanced analytics development.
 Allows for flexibility in design and usage.
 An external sandbox reduces workload, because only analytic professionals are using the system.

Disadvantages

 Additional cost of the stand-alone system that serves as the sandbox platform.
 There will be some data movement from the production system into the sandbox before developing a new analysis.

Hybrid Sandbox
 A hybrid sandbox environment is the combination of an internal sandbox and an external sandbox.
 It allows analytic professionals the flexibility to use the power of the production system when needed, and also the external system.

Advantages

 Same as the internal and external sandboxes, plus flexibility of approach in analysis.

Disadvantages

 Both an internal and an external sandbox environment have to be maintained.
 It will also be necessary to establish some guidelines on when each sandbox option is used.

Modern Data Analytic Tools
 Basically used to identify patterns and establish relationships between the data.
 Basic statistical concepts
 Probability
 Sampling, and calculating estimates based on a sample
 Elements of inference

 Statistical model structure
 Data analysis involves how variables are related.
 Bayesian approach
 Supervised classification methods
 Support vector machines (SVM)

Reporting vs. Analysis
 The ultimate goal of both reporting and analysis is to increase sales and reduce costs.
 Purpose
 Tasks
 Outputs
 Delivery
 Value

Purpose
 Reporting: The process of organizing data into informational summaries in order to monitor how different areas of a business are performing.
 Analysis: The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
 Reporting translates raw data into information. Analysis transforms data and information into insights.
 Reporting shows you what is happening, while analysis focuses on explaining why it is happening and what you can do about it.

Tasks

 Reporting tasks: building, configuring, consolidating, organizing, formatting, and summarizing
 Analysis tasks: questioning, examining, interpreting, comparing, and confirming

Outputs
 Reporting follows a push approach.
 Reports are given to the users, who then extract meaningful information and take appropriate actions for themselves.
 Three main types of reporting: canned reports, dashboards, and alerts.
 Canned reports
 Can be accessed within the tools.
 They are static, with fixed metrics and dimensions.

 Dashboards
 These custom-made reports combine different KPIs and reports to provide a comprehensive, high-level view of business performance for specific audiences.
 Dashboards may include data from various data sources and are also usually fairly static.

 Alerts
 These conditional reports are triggered when data falls outside of expected ranges or some other pre-defined criterion is met.
 Once people are notified of what happened, they can take appropriate action as necessary.
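
An alert of this kind is essentially a range check that runs on a schedule. The metric names and thresholds below are hypothetical.

```python
def check_alerts(metrics, expected_ranges):
    """Return an alert message for each metric outside its expected range."""
    alerts = []
    for name, value in metrics.items():
        low, high = expected_ranges[name]
        if not low <= value <= high:
            alerts.append(f"ALERT: {name}={value} outside [{low}, {high}]")
    return alerts

# Hypothetical daily metrics and thresholds.
expected_ranges = {"daily_visits": (800, 1500), "bounce_rate": (0.0, 0.6)}
todays_metrics = {"daily_visits": 420, "bounce_rate": 0.35}
alerts = check_alerts(todays_metrics, expected_ranges)
```

Only `daily_visits` triggers a message here. The alert reports what happened; finding out why is left to analysis.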

 Analysis follows a pull approach, where particular data is pulled by an analyst in order to answer specific business questions.
 There are two main types: ad hoc responses and analysis presentations.

 Ad hoc responses
 Analysts receive urgent requests to answer a variety of questions; these requests are time sensitive and demand a quick response.
 The result is a short and concise report.

 Analysis presentations
 Some business questions are more complex in nature and require more time to perform an analysis.
 This analysis results in two key sections: key findings and recommendations.
 Key findings: the most meaningful and actionable information.
 Key recommendations: guidance on what action to take based on the analysis.

Delivery
 Report
 People can access reports through an analytics tool, Excel spreadsheet, or widget, or have them scheduled for delivery into their mailbox, mobile device, FTP site, etc.
 Automatic delivery of reports on a regular basis by a computer system.

 Analysis
 Humans use their superior reasoning and analytic skills to extract information from data and provide recommendations for the organization.

Value
 It’s important to understand the relationship between reporting and analysis in driving value.
 Value for successful web analytics
 The “Path to Value” diagram

Module 1.1 - Introduction To Big Data
No ratings yet
Module 1.1 - Introduction To Big Data
18 pages
Big Data
No ratings yet
Big Data
20 pages
Lecture 1
No ratings yet
Lecture 1
22 pages
Big Data Analtics (Unit 1)
No ratings yet
Big Data Analtics (Unit 1)
31 pages
12c Dataguard Switchover Best Practices Using DGMGRL (Dataguard Broker Command Prompt)
No ratings yet
12c Dataguard Switchover Best Practices Using DGMGRL (Dataguard Broker Command Prompt)
7 pages
01 Introduction
No ratings yet
01 Introduction
23 pages
Unit 1
No ratings yet
Unit 1
107 pages
IBM Spectrum Protect V8.1.8 C1000-051 Dumps
No ratings yet
IBM Spectrum Protect V8.1.8 C1000-051 Dumps
11 pages
Unit-2 R Programing
No ratings yet
Unit-2 R Programing
26 pages
Big Data and Analytics
No ratings yet
Big Data and Analytics
23 pages
CH 05
No ratings yet
CH 05
28 pages
50 Excel Interview Questions 1685459119
No ratings yet
50 Excel Interview Questions 1685459119
7 pages
INTRODUCTION: SLIDE 1 (Smriti)
No ratings yet
INTRODUCTION: SLIDE 1 (Smriti)
17 pages
BIG DATA INTRODUCTION Hadoop
No ratings yet
BIG DATA INTRODUCTION Hadoop
24 pages
Bda Unit I LM
No ratings yet
Bda Unit I LM
14 pages
Class Xii Computer Science Final Mock Test Paper QP
No ratings yet
Class Xii Computer Science Final Mock Test Paper QP
10 pages
BD 1
No ratings yet
BD 1
15 pages
Bda (Chapter 1)
No ratings yet
Bda (Chapter 1)
8 pages
Abap Managed Database Procedure 1731386745
No ratings yet
Abap Managed Database Procedure 1731386745
5 pages
Financial Tracking System PDF
No ratings yet
Financial Tracking System PDF
11 pages
Computer Science B.SC 1 To 6th Sem Syllabus 2024-25
No ratings yet
Computer Science B.SC 1 To 6th Sem Syllabus 2024-25
42 pages
04 5 Etc Passwd
No ratings yet
04 5 Etc Passwd
6 pages
Bebeam Read
No ratings yet
Bebeam Read
5 pages
Data Quality: The Other Face of Big Data: Barna Saha Divesh Srivastava
No ratings yet
Data Quality: The Other Face of Big Data: Barna Saha Divesh Srivastava
4 pages
API To Create Item Cost in OPM
No ratings yet
API To Create Item Cost in OPM
4 pages
Analytics Engineer
No ratings yet
Analytics Engineer
2 pages