Reporting V/s Analysis: Dr. Anil Kumar Dubey

The document distinguishes between reporting and analysis, highlighting that reporting organizes data into summaries for monitoring business performance, while analysis explores data to extract insights for improvement. It discusses various big data analytic tools like Apache Hadoop, Cassandra, and Spark, emphasizing their capabilities in handling large datasets and providing real-time insights. Additionally, it outlines the lifecycle phases of big data analytics and different types of analytics, including descriptive, diagnostic, predictive, and prescriptive analytics.

Reporting v/s Analysis

Dr. Anil Kumar Dubey


Associate Professor,
Computer Science & Engineering Department,
ABES EC, Ghaziabad
Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Uttar
Pradesh, Lucknow
Reporting v/s Analysis
 Reporting: the process of organizing data into informational summaries in order to monitor how different areas of a business are performing.

 Reporting translates raw data into information.

 Analysis: the process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.

 Analysis transforms data and information into insights.


Conti…
 Data reporting: gathering data into one place and presenting it in visual representations.

 Data analysis: interpreting your data and giving it context.
Conti…
 Analysis transforms data and information into insights.
 Reporting helps companies monitor their online business and be alerted when data falls outside of expected ranges.
Conti…
           Reporting                         Analysis
Purpose    Monitor and alert                 Interpret and recommend actions
Tasks      Build, configure, consolidate,    Question, examine, interpret,
           organize, format, summarize       compare, confirm
Outputs    Canned reports, dashboards,       Ad hoc responses, analysis
           alerts                            presentations (findings +
                                             recommendations)
Delivery   Accessed via tool, scheduled      Prepared and shared by analyst
           for delivery
Value      Distills data into information    Provides deeper insights into the
           for further analysis; alerts      business; offers recommendations
           the company to exceptions in      to drive action
           data
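The contrast in the table above can be sketched in a few lines of Python over hypothetical daily sales figures: reporting summarizes the data and alerts on out-of-range values, while analysis interprets them to produce an insight.

```python
# Reporting vs. analysis on hypothetical daily sales figures.
from statistics import mean, stdev

daily_sales = [120, 135, 128, 131, 62, 140, 133]  # hypothetical data

def report(data, lower=100):
    """Reporting: summarize and alert on values outside the expected range."""
    return {
        "total": sum(data),
        "average": round(mean(data), 2),
        "alerts": [(day, v) for day, v in enumerate(data) if v < lower],
    }

def analyze(data):
    """Analysis: interpret the numbers -- how unusual is the dip?"""
    m, s = mean(data), stdev(data)
    outliers = [v for v in data if abs(v - m) > 2 * s]
    return {"mean": round(m, 2), "stdev": round(s, 2), "outliers": outliers}

summary = report(daily_sales)
insight = analyze(daily_sales)
print(summary["alerts"])    # the day that fell below the expected range
print(insight["outliers"])  # values more than two standard deviations out
```

The report only flags that day 4 was low; the analysis adds context (how far outside normal variation it was), which is the starting point for a recommendation.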
Modern Data Analytic Tools
APACHE Hadoop
A Java-based open-source platform used to store and process big data.
Built on a cluster system that allows the system to process data efficiently and run computations in parallel.
Processes both structured and unstructured data, distributed from one server across multiple computers.
Hadoop also offers cross-platform support for its users.
Today, it is one of the best big data analytic tools and is popularly used by many tech giants such as Amazon, Microsoft, IBM, etc.
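Hadoop's parallelism follows the MapReduce model: a map step emits partial results per node, and a reduce step merges them. A minimal single-machine sketch of the same idea, using a word count over hypothetical input lines:

```python
# MapReduce in miniature: map each line to partial word counts,
# then reduce the partial counts into one result.
from collections import Counter
from functools import reduce

lines = ["big data big insights", "data drives insights"]  # hypothetical input

# Map: each line -> a Counter of word counts (done per node in Hadoop)
mapped = [Counter(line.split()) for line in lines]

# Reduce: merge the partial counts into a single result
word_counts = reduce(lambda a, b: a + b, mapped, Counter())
print(word_counts["data"])  # 2
```

In a real cluster the map step runs on many machines at once; the merge logic is the same.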
Conti…
Cassandra
 APACHE Cassandra is an open-source NoSQL distributed database used to manage large amounts of data.
 It is one of the most popular tools for data analytics and has been praised by many tech companies for its high scalability and availability without compromising speed and performance.
 It is capable of delivering thousands of operations every second and can handle petabytes of data with almost zero downtime.
 It was created at Facebook in 2008 and later released as open source.
Conti…
Qubole
It is a big data tool that helps fetch data along the value chain using ad-hoc analysis and machine learning.
Qubole is a data lake platform that offers end-to-end service, reducing the time and effort required to move data pipelines.
It can be configured across multi-cloud services such as AWS, Azure, and Google Cloud.
Besides, it also helps lower cloud computing costs by 50%.
Conti…
Xplenty
It is a data analytics tool for building data pipelines with minimal code.
With the help of its interactive graphical interface, it provides solutions for ETL, etc.
The best part of using Xplenty is its low investment in hardware and software, and it offers support via email, chat, telephone, and virtual meetings.
Xplenty is a platform to process data for analytics over the cloud, and it segregates all the data.
Conti…
Spark
 APACHE Spark is another framework used to process data and perform numerous tasks on a large scale.
 It is widely used among data analysts as it offers easy-to-use APIs with simple data-pulling methods, and it is capable of handling multiple petabytes of data as well.
 Spark set a record by processing 100 terabytes of data in just 23 minutes, breaking Hadoop's previous world record (71 minutes).
 This is why big tech giants are moving towards Spark now, and it is highly suitable for ML and AI today.
Conti…
Mongo DB
Coming into the limelight in 2010, MongoDB is a free, open-source, document-oriented (NoSQL) database used to store high volumes of data.
It uses collections and documents for storage; a document consists of key-value pairs, which are considered the basic unit of MongoDB.
It is popular among developers due to its support for multiple programming languages such as Python, JavaScript, and Ruby.
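The document/collection structure can be pictured with plain Python dictionaries. A minimal sketch (hypothetical records; a real driver such as pymongo exposes the same shape through its own API):

```python
# A MongoDB "collection" is a set of JSON-like documents,
# each document a group of key-value pairs.
import json

users = [  # hypothetical collection of documents
    {"_id": 1, "name": "Asha", "langs": ["Python", "Ruby"]},
    {"_id": 2, "name": "Ravi", "langs": ["JavaScript"]},
]

# Filtering by a field, the way a find({"name": "Asha"}) query would
match = [doc for doc in users if doc["name"] == "Asha"]
print(json.dumps(match[0]))
```

Unlike a relational row, each document can carry its own fields (here, a variable-length `langs` list), which is why MongoDB suits datasets that change frequently.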
Conti…
Apache Storm
 Storm is a robust, user-friendly tool used for data analytics, especially in small companies.
 The best part about Storm is that it has no programming-language barrier and can support any of them.
 It was designed to handle pools of large data in a fault-tolerant, horizontally scalable manner.
 When it comes to real-time data processing, Storm leads the chart because of its distributed real-time big data processing system, which is why many tech giants use APACHE Storm in their systems. Some of the most notable names are Twitter, Zendesk, NaviSite, etc.
Conti…
SAS
 One of the best tools for statistical modeling, used by data analysts.
 Using SAS, a data scientist can mine, manage, extract, or update data in different variants from different sources.
 Statistical Analysis System (SAS) allows a user to access data in any format (SAS tables or Excel worksheets).
 Besides that, it also offers a cloud platform for business analytics called SAS Viya and, to build a strong grip on AI & ML, has introduced new tools.
Conti…
Data Pine
Datapine is an analytics tool used for BI and was founded back in 2012 in Berlin, Germany.
It is mainly used for data extraction (small to medium companies fetching data for close monitoring).
With the help of its enhanced UI design, anyone can visit and check the data as per their requirements. It is offered in four different price brackets, starting from $249 per month.
They also offer dashboards by function and industry.
Conti…
Rapid Miner
It is a fully automated visual workflow design tool used for data analytics.
It is a no-code platform; users are not required to code to segregate data.
Though it is an open-source platform, it is limited to 10,000 data rows and a single logical processor.
With the help of RapidMiner, one can easily deploy ML models to the web or mobile (once the user interface is ready to collect real-time data).
Analytic Processes and Tools
 Big data analytics tools should be able to handle the volume, variety, and velocity of data.

 They should also be able to process data in real time or near-real time so that decisions can be made based on the most up-to-date information.

 Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences.
Conti…
 Big Data analytics provides various advantages: it can be used for better decision making, preventing fraudulent activities, among other things.

 Zoho Analytics is a powerful big data analytics tool that enables you to analyze massive data sets, whether on the cloud or on-premise.

 Zoho Analytics can connect to multiple data sources, including business applications, files and feeds, offline databases, cloud databases, and cloud drives.
Conti…
 Create business dashboards and insightful reports utilizing AI and ML technologies.

 Provide key business metrics on demand with its robust big data analytics software.
Lifecycle Phases of Big Data Analytics
The eight stages are:

 Stage 1 - Business case evaluation
 The Big Data analytics lifecycle begins with a business case, which defines the reason and goal behind the analysis.

 Stage 2 - Identification of data
 Here, a broad variety of data sources are identified.
Conti…
 Stage 3 - Data filtering
 All of the data identified in the previous stage is filtered here to remove corrupt data.

 Stage 4 - Data extraction
 Data that is not compatible with the tool is extracted and then transformed into a compatible form.
Conti…
 Stage 5 - Data aggregation
 In this stage, data with the same fields across different datasets is integrated.

 Stage 6 - Data analysis
 Data is evaluated using analytical and statistical tools to discover useful information.
Conti…
 Stage 7 - Visualization of data
 With tools like Tableau, Power BI, and QlikView, Big Data analysts can produce graphic visualizations of the analysis.

 Stage 8 - Final analysis result
 Here, the final results of the analysis are made available to business stakeholders, who will take action.
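Stages 3 through 6 can be sketched as one small pipeline. A minimal example over hypothetical order records: filter corrupt rows, transform incompatible fields, aggregate records sharing a field, then analyze statistically.

```python
# Lifecycle stages 3-6 over hypothetical order records.
from statistics import mean

raw = [
    {"region": "north", "amount": "120"},
    {"region": "north", "amount": "80"},
    {"region": "south", "amount": None},   # corrupt record
    {"region": "south", "amount": "200"},
]

# Stage 3 - Data filtering: drop corrupt records
clean = [r for r in raw if r["amount"] is not None]

# Stage 4 - Data extraction: transform into a compatible form (str -> int)
rows = [{"region": r["region"], "amount": int(r["amount"])} for r in clean]

# Stage 5 - Data aggregation: integrate records sharing the same field
by_region = {}
for r in rows:
    by_region.setdefault(r["region"], []).append(r["amount"])

# Stage 6 - Data analysis: evaluate with a statistical summary
averages = {region: mean(vals) for region, vals in by_region.items()}
print(averages)  # {'north': 100, 'south': 200}
```

In practice each stage is a separate tool or job (e.g., filtering in ingestion, aggregation in a warehouse), but the data flow is the same.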
Different Types of Big Data Analytics
Descriptive Analytics
 Summarizes past data into a form that people can easily read. This helps in creating reports, like a company's revenue, profit, sales, and so on. It also helps in the tabulation of social media metrics.

 Use Case: The Dow Chemical Company analyzed its past data to increase facility utilization across its office and lab space. Using descriptive analytics, Dow was able to identify underutilized space. This space consolidation helped the company save nearly US $4 million annually.
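Descriptive analytics of the kind in the Dow example reduces to summarizing past data. A minimal sketch with hypothetical utilization figures:

```python
# Descriptive analytics: summarize past utilization data to
# spot underused space (hypothetical figures).
from statistics import mean

utilization = {"lab-A": 0.91, "lab-B": 0.34, "office-1": 0.78, "office-2": 0.29}

avg = mean(utilization.values())
underused = sorted(k for k, v in utilization.items() if v < 0.5)
print(f"average utilization: {avg:.0%}")
print("candidates for consolidation:", underused)
```

The output is purely a readable summary of what happened; deciding *why* those spaces are underused and what to do about it belongs to the later analytics types.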
Conti…
Diagnostic Analytics
 This is done to understand what caused a problem in the first place. Techniques like drill-down, data mining, and data recovery are all examples. Organizations use diagnostic analytics because it provides in-depth insight into a particular problem.

 Use Case: An e-commerce company's report shows that their sales have gone down, although customers are adding products to their carts. This can be due to various reasons: the form didn't load correctly, the shipping fee is too high, or there are not enough payment options available. This is where you can use diagnostic analytics to find the cause.
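A drill-down for the cart-abandonment example can be as simple as breaking checkout failures down by step to locate the likely cause (hypothetical event counts):

```python
# Diagnostic drill-down: which checkout step loses the most customers?
from collections import Counter

abandoned = Counter({   # hypothetical abandonment events by last step seen
    "form_error": 12,
    "shipping_fee_shown": 57,
    "payment_options": 9,
})

cause, count = abandoned.most_common(1)[0]
total = sum(abandoned.values())
print(f"top cause: {cause} ({count}/{total} abandonments)")
```

Real diagnostic work would drill further (by device, region, time window), but each level follows this same segment-and-compare pattern.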
Conti…
Predictive Analytics
 This type of analytics looks into historical and present data to make predictions about the future, using data mining, AI, and machine learning. It works on predicting customer trends, market trends, and so on.

 Use Case: PayPal determines what precautions to take to protect its clients against fraudulent transactions. Using predictive analytics, the company combines historical payment data and user behavior data to build an algorithm that predicts fraudulent activities.
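Predictive analytics in miniature: fit a least-squares trend line to past data and project it forward. A sketch with hypothetical monthly sales (real predictive models are far richer, but the historical-data-to-forecast flow is the same):

```python
# Fit a least-squares line to past months and predict the next one.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # slope, intercept

months = [1, 2, 3, 4]          # hypothetical history
sales = [100, 110, 120, 130]
slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # project month 5
print(forecast)  # 140.0
```

Fraud prediction replaces the straight line with a classifier over many behavioral features, but it is the same move: learn a pattern from historical data, apply it to new data.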
Conti…
Prescriptive Analytics
 This type of analytics prescribes the solution to a particular problem. Prescriptive analytics works with both descriptive and predictive analytics. Most of the time, it relies on AI and machine learning.

 Use Case: Prescriptive analytics can be used to maximize an airline's profit. This type of analytics is used to build an algorithm that automatically adjusts flight fares based on numerous factors, including customer demand, weather, destination, holiday seasons, and oil prices.
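The fare-adjustment idea can be sketched as a rule that turns predicted inputs into a recommended action. The factors and base fare below are hypothetical; a production system would optimize over many more variables:

```python
# Prescriptive sketch: recommend a fare from demand and season.
def adjust_fare(base, demand, holiday):
    """Scale the base fare by predicted demand (0..1), add a holiday premium."""
    fare = base * (1 + 0.5 * demand)
    if holiday:
        fare *= 1.2  # hypothetical holiday-season premium
    return round(fare, 2)

print(adjust_fare(200, demand=0.8, holiday=True))  # 336.0
```

Note the division of labor: predictive analytics supplies the `demand` estimate; the prescriptive layer converts it into the action (the fare to charge).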
Big Data Analytics Tools
Some analytics tools:
 Hadoop - helps in storing and analyzing data

 MongoDB - used on datasets that change frequently

 Talend - used for data integration and management

 Cassandra - a distributed database used to handle large amounts of data
Conti…
 Spark - used for real-time processing and analyzing large amounts of data

 STORM - an open-source real-time computational system

 Kafka - a distributed streaming platform used for fault-tolerant storage

 R-Programming - a free, open-source programming language and software environment
Conti…
 Datawrapper - an online data visualization tool for making interactive charts

 Tableau Public - communicates the insights of the data through data visualization

 Content Grabber - a data extraction tool, suitable for people with advanced programming skills
THANK
YOU
