The document distinguishes between reporting and analysis, highlighting that reporting organizes data into summaries for monitoring business performance, while analysis explores data to extract insights for improvement. It discusses various big data analytic tools like Apache Hadoop, Cassandra, and Spark, emphasizing their capabilities in handling large datasets and providing real-time insights. Additionally, it outlines the lifecycle phases of big data analytics and different types of analytics, including descriptive, diagnostic, predictive, and prescriptive analytics.
Reporting v/s Analysis
Dr. Anil Kumar Dubey
Associate Professor, Computer Science & Engineering Department, ABES EC, Ghaziabad, affiliated to Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, Lucknow

Reporting v/s Analysis
Reporting: the process of organizing data into informational summaries in order to monitor how different areas of a business are performing.
Reporting translates raw data into information.
Analysis: the process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
Analysis transforms data and information into insights.
Data reporting: gathering data into one place and presenting it in visual representations.
Data analysis: interpreting your data and giving it context.
Reporting helps companies monitor their online business and be alerted when data falls outside of expected ranges.

Reporting vs Analysis at a glance:
Purpose - Reporting: monitor and alert; Analysis: interpret and recommend actions.
Tasks - Reporting: build, configure, consolidate, organize, format, summarize; Analysis: question, examine, interpret, compare, confirm.
Outputs - Reporting: canned reports, dashboards, alerts; Analysis: ad hoc responses, analysis presentations (findings + recommendations).
Delivery - Reporting: accessed via tool, scheduled for delivery; Analysis: prepared and shared by an analyst.
Value - Reporting: distills data into information for further analysis, alerts the business to exceptions in the data; Analysis: provides deeper insights into the company, offers recommendations to drive action.

Modern Data Analytic Tools

APACHE Hadoop
A Java-based open-source platform used to store and process big data. It is built on a cluster system that lets data be processed efficiently and in parallel, and it can handle both structured and unstructured data distributed from one server across multiple computers. Hadoop also offers cross-platform support. It is one of the most widely used big data analytic tools and is popular with tech giants such as Amazon, Microsoft, and IBM.

Cassandra
APACHE Cassandra is an open-source NoSQL distributed database used to fetch and manage large amounts of data. It is one of the most popular tools for data analytics and is praised for its high scalability and availability without compromising speed and performance. It can deliver thousands of operations every second and handle petabytes of data with almost zero downtime. It was created at Facebook in 2008 and later released publicly.

Qubole
An open-source big data tool that helps fetch data along the value chain using ad hoc analysis and machine learning. Qubole is a data lake platform offering end-to-end service, reducing the time and effort required to move data pipelines. It can be configured across multi-cloud services such as AWS, Azure, and Google Cloud, and it helps lower cloud computing costs by 50%.

Xplenty
A data analytics tool for building data pipelines with minimal code. Through its interactive graphical interface, it provides solutions for ETL and related workflows. Its strengths are a low investment in hardware and software and support via email, chat, phone, and virtual meetings. Xplenty processes data for analytics over the cloud and keeps the data segregated.

Spark
APACHE Spark is a framework used to process data and perform numerous tasks at large scale. It is widely used among data analysts because it offers easy-to-use APIs with simple data-pulling methods, and it can handle multiple petabytes of data. Spark set a record by processing 100 terabytes of data in just 23 minutes, breaking Hadoop's previous record of 71 minutes. This is why many large tech companies are moving toward Spark, and why it is highly suitable for ML and AI workloads today.
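The snippet below is a minimal PySpark sketch of the kind of large-scale processing described above: it reads a CSV, filters rows, and aggregates by a column. The file name (sales.csv) and the column names (status, region, amount) are illustrative assumptions, not part of the original slides.

```python
# Illustrative PySpark job: load a CSV of sales events, filter, aggregate.
# File path and column names (sales.csv, region, amount, status) are invented for this sketch.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-summary").getOrCreate()

# Read a (hypothetical) CSV with a header row, inferring column types.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Keep only completed orders, then total the amount per region.
summary = (
    sales.filter(F.col("status") == "completed")
         .groupBy("region")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("orders"))
         .orderBy(F.desc("total_amount"))
)

summary.show()
spark.stop()
```

Spark evaluates these transformations lazily and only runs the distributed job when an action such as show() is called.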
MongoDB
MongoDB came into the limelight in 2010. It is a free, open-source, document-oriented (NoSQL) database used to store high volumes of data. It stores data in collections and documents, where a document consists of key-value pairs, the basic unit of MongoDB. It is popular among developers because drivers are available for many programming languages, such as Python, JavaScript, and Ruby (a short pymongo sketch follows these tool slides).

Apache Storm
Storm is a robust, user-friendly tool used for data analytics, especially in small companies. A key advantage of Storm is that it has no programming-language barrier and can work with any language. It was designed to handle pools of large data with fault tolerance and horizontal scalability. For real-time data processing, Storm leads the chart because of its distributed real-time big data processing system, which is why many tech giants use APACHE Storm in their systems; notable names include Twitter, Zendesk, and NaviSite.

SAS
The Statistical Analytical System (SAS) is one of the best tools for statistical modeling used by data analysts. With SAS, a data scientist can mine, manage, extract, or update data in different variants from different sources, and access data in formats such as SAS tables or Excel worksheets. SAS also offers a cloud platform for business analytics called SAS Viya and has introduced new tools to strengthen its AI and ML capabilities.

Datapine
Datapine is an analytics tool used for BI, founded in 2012 in Berlin, Germany. It is mainly used for data extraction, with small and medium companies fetching data for close monitoring. With its enhanced UI design, anyone can check the data they need. It is offered in four price brackets, starting from $249 per month, and provides dashboards by function and industry.

Rapid Miner
A fully automated visual workflow design tool used for data analytics. It is a no-code platform, so users are not required to write code to segregate data. Although it is an open-source platform, it has a limitation of 10,000 data rows and a single logical processor. With Rapid Miner, one can easily deploy ML models to the web or mobile (once the user interface is ready to collect real-time data).
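As referenced in the MongoDB slide above, here is a minimal sketch of storing and querying key-value documents with the Python driver (pymongo). The connection string, database name ("shop"), and collection name ("orders") are invented for this example.

```python
# Illustrative pymongo usage: store documents (key-value pairs) in a collection
# and query them back. Connection string, database, and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["shop"]
orders = db["orders"]

# Insert a few documents; each document is a set of key-value pairs.
orders.insert_many([
    {"customer": "alice", "amount": 120.0, "status": "completed"},
    {"customer": "bob", "amount": 75.5, "status": "cart"},
])

# Query: find completed orders above a threshold.
for doc in orders.find({"status": "completed", "amount": {"$gt": 100}}):
    print(doc["customer"], doc["amount"])

client.close()
```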
Analytic Processes and Tools
Big data analytics tools should be able to handle the volume, variety, and velocity of data. They should also be able to process data in real time or near real time so that decisions can be made based on the most up-to-date information.
Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences.
Big Data analytics provides various advantages: it can be used for better decision making and for preventing fraudulent activities, among other things.
Zoho Analytics is a powerful big data analytics tool that enables you to analyze massive data sets, whether on the cloud or on-premise.
Zoho Analytics can connect to multiple data sources, including business applications, files and feeds, offline databases, cloud databases, and cloud drives.
It can be used to create business dashboards and insightful reports utilizing AI and ML technologies.
It provides key business metrics on demand through robust big data analytics software.

Lifecycle Phases of Big Data Analytics
The eight stages are:
Stage 1 - Business case evaluation
The Big Data analytics lifecycle begins with a business case, which defines the reason and goal behind the analysis.
Stage 2 - Identification of data
Here, a broad variety of data sources are identified.

Stage 3 - Data filtering
All of the identified data from the previous stage is filtered here to remove corrupt data.
Stage 4 - Data extraction
Data that is not compatible with the tool is extracted and then transformed into a compatible form.

Stage 5 - Data aggregation
In this stage, data with the same fields across different datasets are integrated.
Stage 6 - Data analysis
Data is evaluated using analytical and statistical tools to discover useful information.

Stage 7 - Visualization of data
With tools like Tableau, Power BI, and QlikView, Big Data analysts can produce graphic visualizations of the analysis.
Stage 8 - Final analysis result
Here, the final results of the analysis are made available to business stakeholders, who will take action.
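A minimal pandas sketch of stages 3 through 6 follows, assuming a hypothetical orders_raw.csv file with order_id, region, amount, and status columns; it only illustrates the flow of filtering, transformation, aggregation, and analysis, not a production pipeline.

```python
# Illustrative walk through lifecycle stages 3-6 with pandas.
# The CSV file and its columns (orders_raw.csv: order_id, region, amount, status)
# are placeholders invented for this sketch.
import pandas as pd

# Stage 3 - Data filtering: drop corrupt rows (missing keys or negative amounts).
raw = pd.read_csv("orders_raw.csv")
clean = raw.dropna(subset=["order_id", "amount"])
clean = clean[clean["amount"] >= 0]

# Stage 4 - Data extraction/transformation: convert fields into a compatible form.
clean["amount"] = clean["amount"].astype(float)
clean["region"] = clean["region"].str.strip().str.upper()

# Stage 5 - Data aggregation: integrate records that share the same fields.
by_region = clean.groupby("region", as_index=False).agg(
    total_amount=("amount", "sum"),
    orders=("order_id", "count"),
)

# Stage 6 - Data analysis: simple statistics to surface useful information.
print(by_region.describe())
print(by_region.sort_values("total_amount", ascending=False).head())
```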
Different Types of Big Data Analytics

Descriptive Analytics
Summarizes past data into a form that people can easily read. This helps in creating reports on a company's revenue, profit, sales, and so on. It also helps in the tabulation of social media metrics.
Use Case: The Dow Chemical Company analyzed its past data to increase facility utilization across its office and lab space. Using descriptive analytics, Dow was able to identify underutilized space. This space consolidation helped the company save nearly US $4 million annually.
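A small sketch of descriptive analytics with pandas: synthetic monthly sales records are tabulated into an easily readable report. The data is invented purely for illustration.

```python
# Illustrative descriptive analytics: summarize past data into a readable report.
# The DataFrame below is synthetic; in practice it would come from company records.
import pandas as pd

sales = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "product": ["A", "B", "A", "B", "A", "B"],
    "revenue": [1200, 800, 1500, 950, 1100, 1020],
})

# Tabulate revenue per month and per product, plus overall totals.
report = sales.pivot_table(index="month", columns="product",
                           values="revenue", aggfunc="sum", margins=True)
print(report)
```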
Diagnostic Analytics
This is done to understand what caused a problem in the first place. Techniques like drill-down, data mining, and data recovery are all examples. Organizations use diagnostic analytics because it provides in-depth insight into a particular problem.
Use Case: An e-commerce company's report shows that its sales have gone down, although customers are adding products to their carts. This can be due to various reasons: the form didn't load correctly, the shipping fee is too high, or there are not enough payment options available. Diagnostic analytics can be used to find the exact reason behind the drop.
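A sketch of a diagnostic drill-down in pandas follows, using synthetic checkout-funnel data (step, device, sessions) to locate where customers drop off; the numbers are invented and only illustrate the technique.

```python
# Illustrative diagnostic drill-down: locate where customers abandon checkout.
# The funnel data below is synthetic and only meant to show the drill-down idea.
import pandas as pd

events = pd.DataFrame({
    "step":     ["cart", "cart", "address_form", "address_form", "payment", "payment"],
    "device":   ["mobile", "desktop", "mobile", "desktop", "mobile", "desktop"],
    "sessions": [5000, 4000, 2100, 3600, 600, 3100],
})

# Drill down: sessions per step, split by device, to see where the drop happens.
funnel = events.pivot_table(index="step", columns="device", values="sessions", aggfunc="sum")
funnel = funnel.reindex(["cart", "address_form", "payment"])
print(funnel)
print(funnel / funnel.iloc[0])   # share of cart sessions that survive each step
```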
Predictive Analytics
This type of analytics looks at historical and present data to make predictions about the future. Predictive analytics uses data mining, AI, and machine learning to analyze current data and forecast customer trends, market trends, and so on.
Use Case: PayPal determines what precautions it has to take to protect clients against fraudulent transactions. Using predictive analytics, the company uses historical payment data and user behavior data to build an algorithm that predicts fraudulent activities.
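A toy predictive-analytics sketch with scikit-learn: a logistic regression trained on synthetic transaction features to flag likely fraud. This is not PayPal's algorithm; the features and labels are generated for illustration only.

```python
# Illustrative predictive analytics: train a simple classifier on synthetic
# transaction features to flag potentially fraudulent activity (toy example only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
amount = rng.exponential(scale=100, size=n)      # transaction amount
night = rng.integers(0, 2, size=n)               # 1 if made at night
new_device = rng.integers(0, 2, size=n)          # 1 if from an unseen device
# Synthetic label: fraud is more likely for large, night-time, new-device payments.
p = 1 / (1 + np.exp(-(0.01 * amount + 1.2 * night + 1.5 * new_device - 4)))
fraud = rng.random(n) < p

X = np.column_stack([amount, night, new_device])
X_train, X_test, y_train, y_test = train_test_split(X, fraud, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```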
Prescriptive Analytics
This type of analytics prescribes the solution to a particular problem. Prescriptive analytics works with both descriptive and predictive analytics, and most of the time it relies on AI and machine learning.
Use Case: Prescriptive analytics can be used to maximize an airline's profit. It can be used to build an algorithm that automatically adjusts flight fares based on numerous factors, including customer demand, weather, destination, holiday seasons, and oil prices.
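A toy prescriptive step in Python: turning a demand forecast into a recommended fare adjustment. The rule and the numbers are invented for illustration and are not an airline's actual pricing model.

```python
# Illustrative prescriptive step: map a demand forecast to a recommended action
# (a fare adjustment). The rule and the constants are invented for this toy sketch.
def recommend_fare(base_fare: float, predicted_demand: float, capacity: int,
                   is_holiday: bool, fuel_price_index: float) -> float:
    """Return a recommended fare for a flight given a predicted demand level."""
    load_factor = predicted_demand / capacity
    fare = base_fare
    if load_factor > 0.9:          # nearly full flight: raise the fare
        fare *= 1.25
    elif load_factor < 0.5:        # weak demand: discount to fill seats
        fare *= 0.85
    if is_holiday:                 # holiday-season premium
        fare *= 1.10
    fare *= 1 + 0.05 * (fuel_price_index - 1)   # pass through fuel costs
    return round(fare, 2)

# Example: 180-seat flight, 170 seats forecast to sell, holiday period, fuel 20% above baseline.
print(recommend_fare(base_fare=200.0, predicted_demand=170, capacity=180,
                     is_holiday=True, fuel_price_index=1.2))
```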
Big Data Analytics Tools
Some analytics tools:
Hadoop - helps in storing and analyzing data
MongoDB - used on datasets that change frequently
Talend - used for data integration and management
Cassandra - a distributed database used to handle large amounts of data
Spark - used for real-time processing and analyzing large amounts of data
STORM - an open-source real-time computational system
Kafka - a distributed streaming platform used for fault-tolerant storage
R-Programming - a free, open-source programming language and software environment for statistical computing
Datawrapper - an online data visualization tool for making interactive charts
Tableau Public - communicates the insights of the data through data visualization
Content Grabber - a data extraction tool suitable for people with advanced programming skills

THANK YOU