UNIT-1 BigData
A Big Data platform is a structured repository for large volumes of data. It typically
blends several data management hardware and software tools to store and manage
aggregated data sets at scale, usually on cloud storage. The platform keeps this
extensive information organized and accessible so that meaningful insights can be
derived from it, and the integration of Big Data and AI plays a significant role in
enhancing data processing and analysis.
b) Apache Spark: Apache Spark is known for its speed and efficiency in analysing data.
It helps organisations quickly make sense of their data and extract valuable insights
from it.
c) Apache Flink: Apache Flink is another data processing platform, similar to Spark, that
specialises in real-time data analysis. It is used for tasks where speed and low latency
are critical, such as monitoring online activities or financial transactions.
d) Amazon Web Services (AWS) Big Data services: AWS offers a suite of Big Data
services that run in the cloud. These services make it easier for companies to store,
process, and analyse data without the need for extensive infrastructure management.
e) Google Cloud Platform (GCP) Big Data services: Similar to AWS, Google Cloud
Platform provides a range of Big Data services in the cloud. These services help
organisations leverage Google's computing power and data analytics capabilities.
f) Microsoft Azure Big Data services: Microsoft Azure offers various Big Data
services, including data storage, processing, and analytics tools. These services are
designed to help businesses work with their data efficiently and effectively.
Intelligent Data Analysis (IDA)
Intelligent Data Analysis (IDA) refers to advanced methods for analyzing large
datasets to identify patterns, trends, and relationships. It combines techniques from
fields such as statistics, machine learning, and artificial intelligence to extract
meaningful insights from raw data.
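As a minimal sketch of the kind of pattern-finding IDA refers to, the Python snippet
below measures a trend and a relationship in a tiny dataset using only the standard
library. The figures and the names `ad_spend`, `sales`, `pearson` and `slope` are
illustrative assumptions, not data or methods taken from any real source.

```python
from statistics import mean

# Hypothetical monthly figures in thousands (illustrative only).
ad_spend = [10, 12, 15, 18, 20, 24]
sales = [100, 110, 128, 140, 155, 175]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def slope(ys):
    """Least-squares slope of ys against 0, 1, 2, ... (change per period)."""
    xs = list(range(len(ys)))
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

r = pearson(ad_spend, sales)   # relationship between two variables
trend = slope(sales)           # trend of one variable over time

print(f"correlation(ad_spend, sales) = {r:.3f}")
print(f"sales trend per month = {trend:.1f}")
```

A correlation close to 1 and a positive slope only suggest a relationship and an upward
trend; they do not prove that one variable causes the other.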
Benefits of IDA
Tableau:
A user-friendly data visualization platform known for its drag-and-drop interface
and ability to handle large datasets.
Excel:
A widely accessible tool for basic data cleaning, manipulation, and visualization.
Python:
A versatile programming language with extensive data analysis libraries like Pandas
and Matplotlib for data manipulation and visualization.
Apache Spark:
An open-source big data processing engine suitable for large-scale data analysis and
real-time streaming.
Qlik:
A platform for interactive data exploration and analysis with strong data integration
features.
SAS:
A comprehensive statistical analysis software with advanced capabilities for
predictive modeling and business intelligence.
Google Analytics:
A web analytics tool that tracks website traffic and user behavior.
Reporting vs. analytics
Reporting
Data reporting is about taking the available information (e.g. your dataset), organizing
it, and displaying it in a well-structured and digestible format called a “report”.
Reports can present data from various sources and make it available for anyone to
analyze. Reporting is a great way to help internal teams and experts answer the
question of what is happening.
Analytics
Analytics is about diving deeper into your data and reports to look for insights. It is
an attempt to answer why something is happening. Analytics supports decision-making
because its main goal is to make sense of the data by explaining the reasons behind
the reported numbers.
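The contrast between the two can be sketched in a few lines of Python. This is only an
illustration under assumed data: the `orders` records, the channel names and the
revenue figures are hypothetical. The first part produces a report (what is happening);
the second part breaks the same numbers down to look for an explanation (why it is
happening).

```python
from collections import defaultdict

# Hypothetical order records: (month, channel, revenue). Illustrative only.
orders = [
    ("Jan", "web", 400), ("Jan", "store", 600),
    ("Feb", "web", 700), ("Feb", "store", 580),
]

# Reporting: organize and display the numbers (what is happening).
report = defaultdict(int)
for month, _channel, revenue in orders:
    report[month] += revenue
for month, total in report.items():
    print(f"{month}: total revenue = {total}")

# Analytics: dig into the reported numbers to explain the change (why).
by_channel = defaultdict(lambda: defaultdict(int))
for month, channel, revenue in orders:
    by_channel[channel][month] += revenue

# Month-over-month change per channel.
change = {ch: m["Feb"] - m["Jan"] for ch, m in by_channel.items()}
driver = max(change, key=change.get)

print(f"Change by channel: {change}")
print(f"Largest contributor to the increase: {driver}")
```

The report shows that revenue rose from January to February; the breakdown suggests
that, in this made-up data, the increase came from the web channel while store revenue
fell slightly.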
Analytics vs. reporting: Key differences
Reporting Analytics
Types of reports
Long Reports: Long reports are usually longer than 10 pages and are typically used
on formal occasions.
Short Reports: Short reports are the opposite. They are shorter than 10 pages, contain
less data, and are usually shared on informal occasions (e.g. quickly sharing a set of
data).
Internal Reports: Internal reports are created and shared within the same organization,
or even within the same department.
External Reports: External reports are built with the aim of being shared outside the
organization.
Vertical Reports: Vertical reports are typically internal reports that are shared across
different levels of the hierarchy of the organization (e.g. sharing a report with your
manager or stakeholders).
Lateral Reports: Lateral reports are the ones that are shared horizontally within the
organization. Take, for example, a report shared between two different departments
(e.g. HR and finance).
Periodic Reports: Periodic reports are created at regular intervals (e.g. on a monthly
basis). The report keeps exactly the same format, but the data within it changes from
one interval to the next.
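The idea behind a periodic report, a fixed format filled with new data each interval,
can be sketched as a small template function in Python. The function name
`render_report`, the region names, the periods and the amounts are all hypothetical
and serve only as an illustration.

```python
def render_report(period, rows):
    """Render a fixed-format text report; only the data changes per period."""
    lines = [f"Sales report for {period}", "-" * 24]
    for name, amount in rows:
        lines.append(f"{name:<12}{amount:>12,}")
    lines.append("-" * 24)
    total = sum(amount for _, amount in rows)
    lines.append(f"{'Total':<12}{total:>12,}")
    return "\n".join(lines)

# Hypothetical monthly data (illustrative only).
monthly_data = {
    "2024-01": [("North", 12000), ("South", 9500)],
    "2024-02": [("North", 13100), ("South", 9900)],
}

# Same layout every month, different numbers.
reports = {p: render_report(p, rows) for p, rows in monthly_data.items()}
for text in reports.values():
    print(text, end="\n\n")
```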
Examples of reports
Types of analytics
Descriptive: Descriptive analytics assesses historical data to identify specific
patterns. The main goal is to answer what happened and whether it was expected, by
making comparisons with other timeframes.
Diagnostic: Once we know what is going on, the next step is to understand why. Suppose
you performed some descriptive analytics and found that sales went up by 12%.
Diagnostic analytics helps identify why this happened and what actually worked for
your business.
Predictive: Predictive analytics involves sophisticated techniques that use the
observed patterns to make forecasts about future performance, e.g. in financial data
analytics. While this may require specific expertise, it is extremely useful for being
better prepared for the future.
Prescriptive: Prescriptive analytics techniques help you identify the best course of
action. This type of analytics is frequently used by marketers to draft their
strategies and achieve better results.
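The four types can be walked through on one small dataset. The Python sketch below is
only an illustration under stated assumptions: the monthly `sales` and `ad_spend`
figures are made up (chosen so that sales rise by 12%), a simple straight-line fit
stands in for the "sophisticated techniques" a real project would use, and the
candidate budgets are hypothetical.

```python
from statistics import mean

# Hypothetical monthly sales and ad spend (illustrative only).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 102, 105, 107, 110, 112]
ad_spend = [10, 11, 13, 14, 16, 17]

def fit_line(xs, ys):
    """Least-squares fit ys ~ a + b * xs; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = num / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Descriptive: what happened? Compare the latest month with the first.
growth_pct = (sales[-1] - sales[0]) / sales[0] * 100

# Diagnostic: why? Estimate how much sales move per unit of ad spend.
a_ad, b_ad = fit_line(ad_spend, sales)

# Predictive: what is likely next? Extrapolate the time trend to month 7.
a_t, b_t = fit_line(months, sales)
forecast_next = a_t + b_t * 7

# Prescriptive: what should we do? Choose the candidate budget with the
# highest predicted sales minus spend, assuming the fitted line still holds.
candidate_budgets = [17, 19, 21]

def predicted_profit(budget):
    return (a_ad + b_ad * budget) - budget

best_budget = max(candidate_budgets, key=predicted_profit)

print(f"Descriptive: sales grew {growth_pct:.0f}% over the period")
print(f"Diagnostic: about {b_ad:.2f} extra sales per unit of ad spend")
print(f"Predictive: forecast for month 7 is about {forecast_next:.1f}")
print(f"Prescriptive: suggested budget = {best_budget}")
```

Because the fitted relationship is a straight line with a slope above 1, the largest
candidate budget always wins here; a more realistic prescriptive model would account
for diminishing returns and other constraints.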