UNIT-1 BigData

A Big Data Platform serves as a repository for large data volumes, utilizing hardware and
software tools for data management, often in the cloud. Key features include scalability,
distributed processing, real-time stream computing, and advanced analytics capabilities.
This unit also discusses the data analytics process, popular tools, and the distinctions
between reporting and analytics.

Big Data Platform

A Big Data Platform functions as a structured repository for large volumes of data. These
platforms blend a combination of data management hardware and software tools, usually
leveraging cloud storage, to store and manage aggregated data sets. They organize and
maintain this extensive information in a coherent and accessible manner so that meaningful
insights can be derived, with the integration of Big Data and AI playing a significant role
in enhancing data processing and analysis.

Features of Big Data Platforms


Big Data Platforms are designed to handle and analyse vast amounts of data efficiently.
Let’s explore some key features you can expect from these platforms:
1) Scalability: They can scale horizontally to manage increasing volumes of data without
compromising performance.
2) Distributed Processing: They use distributed computing to process large datasets
across multiple nodes, ensuring faster data processing.
3) Real-time Stream Computing: Capable of processing data in real time, which is crucial
for applications requiring immediate insights (a streaming sketch follows this list).
4) Machine Learning and Advanced Analytics: They offer built-in tools for Machine
Learning and Advanced Analytics to derive actionable insights from data.
5) Data Analytics and Visualisation: Provide tools for Data Analysis and visualisation
to help users make sense of complex data.
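As an illustration of real-time stream computing, here is a minimal Spark Structured
Streaming sketch in Python. It is only a sketch: it assumes a local Spark installation and
a text stream arriving on a socket at localhost:9999, both of which are assumptions made
for the example rather than features of any particular platform.

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read lines as an unbounded streaming DataFrame (assumed socket source).
lines = (spark.readStream.format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Continuously emit updated counts to the console as new data arrives.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()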

Functions of Big Data Platforms


Big Data Platforms follow a structured process to enable companies to harness data for
informed decision-making. This process involves several key steps:
a) Data Collection: This initial step systematically gathers data from various sources
such as databases, social media, and sensors. Methods like web scraping, data feeds,
APIs, and data integration tools are used to collect data, which is then stored in a central
repository, often a data lake or warehouse, for easy access and further analysis (a
collection sketch follows this list).
b) Data Storage: After collection, data must be stored efficiently for retrieval and
processing. Big Data Platforms typically use distributed storage systems like Hadoop
Distributed File System (HDFS), Google Cloud Storage, or Amazon S3. Understanding
the differences between Hadoop and MapReduce is important in this context, as Hadoop
provides the storage framework while MapReduce handles the processing. This
architecture ensures high availability, fault tolerance, and scalability.
c) Data Processing: Collected data is processed to extract valuable insights through
operations such as cleaning, transforming, and aggregating. Platforms like Apache
Hadoop and Apache Spark enable rapid computations and complex data transformations
(a processing sketch also follows this list).
d) Data Analysis: This step involves examining and interpreting large data volumes to
extract meaningful insights and patterns using machine learning algorithms, data mining
techniques, or visualisation tools. The results inform data-driven decisions, optimise
processes, and identify opportunities.
e) Data Quality Assurance: Ensuring data accuracy, consistency, integrity, relevance,
and security is crucial. Techniques like data quality management, lineage tracking, and
cataloguing help maintain robust data quality, giving organisations confidence in their
decision-making data.
f) Data Management: This involves organising, storing, and retrieving large data
volumes. Techniques such as data backup, recovery, and archiving ensure fault tolerance
and optimised data retrieval for various use cases.
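As a small illustration of data collection over an API (step a), here is a hedged Python
sketch. The endpoint URL, the JSON record format, and the staging file name are all
hypothetical, chosen only to show the pattern:

import json
import requests

# Pull records from a (hypothetical) REST endpoint.
resp = requests.get("https://api.example.com/v1/events", timeout=30)
resp.raise_for_status()

# Append the raw records to a local staging file for later loading
# into a data lake or warehouse.
with open("events_raw.jsonl", "a") as f:
    for record in resp.json():
        f.write(json.dumps(record) + "\n")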
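And as an illustration of data processing (step c), a minimal PySpark sketch that cleans,
transforms, and aggregates a dataset. The file name and the columns (order_id, amount,
discount, region) are made up for the example:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("SalesETL").getOrCreate()

df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Cleaning: drop rows with missing keys and remove duplicate orders.
clean = df.dropna(subset=["order_id", "amount"]).dropDuplicates(["order_id"])

# Transforming: derive a revenue column after applying the discount.
clean = clean.withColumn("revenue", F.col("amount") * (1 - F.col("discount")))

# Aggregating: total revenue per region.
clean.groupBy("region").agg(F.sum("revenue").alias("total_revenue")).show()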

Figure 1. Functions of a Big Data Platform

Popular Big Data Platforms


Big Data Platforms are capable of handling massive amounts of data and turning it into
valuable information. Here, we'll introduce you to a list of those platforms:
Figure 2. Popular Big Data Platforms
a) Apache Hadoop: Apache Hadoop is an excellent platform for storing and processing
large volumes of data. It's like a robust storage and data processing system that
companies use to handle and manage massive datasets.

b) Apache Spark: Apache Spark is known for its speed and efficiency in analysing data.
It's like a powerful tool that helps organisations quickly make sense of their data and
extract valuable insights from it.

c) Apache Flink: Apache Flink is another data processing platform, similar to Spark, that
specialises in real-time Data Analysis. It's used for tasks where speed and low latency are
critical, like monitoring online activities or financial transactions.

d) Amazon Web Services (AWS) Big Data services: AWS offers a suite of Big Data
services that run in the cloud. These services make it easier for companies to store,
process, and analyse data without the need for extensive infrastructure management.

e) Google Cloud Platform (GCP) Big Data services: Similar to AWS, Google Cloud
Platform provides a range of Big Data services in the cloud. These services help
organisations leverage Google's computing power and data analytics capabilities.

f) Microsoft Azure Big Data services: Microsoft Azure offers various Big Data
services, including data storage, processing, and analytics tools. These services are
designed to help businesses work with their data efficiently and effectively.

Intelligent Data Analysis (IDA)

Intelligent Data Analysis (IDA) refers to advanced methods for analyzing large
datasets to identify patterns, trends, and relationships. It combines techniques from
fields such as statistics, machine learning, and artificial intelligence to extract
meaningful insights from raw data.

Features of Intelligent Data Analysis

1. Pattern Recognition: IDA helps identify trends, patterns, and anomalies in
datasets that might be overlooked by traditional analysis methods.
2. Forecasting: Based on historical data, IDA enables the prediction of future
events, which is crucial for areas such as production planning and predictive
maintenance (a forecasting sketch follows this list).
3. Decision Support: Through IDA, businesses can gain data-driven insights
that support more informed decision-making, providing a solid foundation for
operational strategies.
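As a simple illustration of forecasting, here is a minimal Python sketch that fits a linear
trend to twelve months of made-up sales figures and predicts the next three months. Real
IDA systems would use far richer models; this only shows the idea:

import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)          # months 1..12
sales = np.array([100, 104, 110, 108, 115, 121,
                  125, 130, 128, 136, 141, 147])  # historical sales (made up)

# Fit a simple linear trend to the historical data.
model = LinearRegression().fit(months, sales)

future = np.arange(13, 16).reshape(-1, 1)         # months 13..15
print(model.predict(future))                      # forecasted sales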

Benefits of IDA

1. Better Decisions: Companies can make informed decisions based on accurate
and up-to-date data analysis.
2. Competitive Advantage: By identifying market opportunities, trends, and
risks, businesses can gain a competitive edge.
3. Increased Efficiency: IDA helps optimize business processes by identifying
inefficiencies and improving overall operations.

Need for Big Data


Big data is important because it helps businesses understand and use data to make
better decisions, improve operations, and identify new opportunities.
 Faster decision making: Analyze real-time data to make decisions and respond to
market changes quickly
 Improved operations: Streamline resource management and improve operational
efficiency
 Better product development: Understand customer needs and prioritize features
 Increased market intelligence: Track purchase behavior and market trends
 Risk management: Plan risk management strategies and identify potential risks
 Fraud detection: Analyze transaction data to detect suspicious patterns and prevent
fraud

Data analytics process and Tools


The data analytics process typically involves defining a business question, collecting
data, cleaning and preparing it, analyzing the data, visualizing insights, and
interpreting the results. Common tools used in this process include Power BI,
Tableau, Excel, Python, Apache Spark, Qlik, SAS, and Google Analytics, which can
be used for data manipulation, visualization, statistical analysis, and machine learning
depending on the specific needs of the analysis.
The data analytics process involves the following steps (a short end-to-end sketch in
Python follows the list):
Define the business question:
Clearly identify the problem or question you want to answer with data analysis.
Data collection:
Gather relevant data from various sources like databases, APIs, surveys, or web
scraping.
Data cleaning and preparation:
Remove inconsistencies, missing values, and outliers to ensure data quality.
Exploratory data analysis (EDA):
Analyze the data to understand patterns, distributions, and relationships between
variables through visualization techniques.
Feature engineering:
Create new variables or modify existing ones to improve the predictive power of the
model.
Model building and analysis:
Apply statistical methods, machine learning algorithms, or other analytical
techniques to extract insights.
Data visualization:
Present findings in a clear and understandable way using charts, graphs, and
dashboards.
Interpretation and communication:
Explain the results, draw conclusions, and communicate insights to stakeholders.
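Here is a short, illustrative Python sketch that walks through several of the steps above
(collection, cleaning, feature engineering, and exploratory analysis) using Pandas and
Matplotlib. The file name and column names are hypothetical:

import pandas as pd
import matplotlib.pyplot as plt

# Data collection: load raw data (hypothetical file).
df = pd.read_csv("orders.csv")

# Cleaning and preparation: drop missing values and trim extreme outliers.
df = df.dropna(subset=["price", "quantity"])
df = df[df["price"] < df["price"].quantile(0.99)]

# Feature engineering: derive a new variable.
df["order_value"] = df["price"] * df["quantity"]

# Exploratory analysis and visualization: distribution of order values.
df["order_value"].hist(bins=30)
plt.xlabel("Order value")
plt.ylabel("Frequency")
plt.show()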
Popular data analytics tools:

Microsoft Power BI:
A comprehensive business intelligence tool with strong data visualization
capabilities and integration with other Microsoft products.

Tableau:
A user-friendly data visualization platform known for its drag-and-drop interface
and ability to handle large datasets.

Excel:
A widely accessible tool for basic data cleaning, manipulation, and visualization.

Python:
A versatile programming language with extensive data analysis libraries like Pandas
and Matplotlib for data manipulation and visualization.

Apache Spark:
An open-source big data processing engine suitable for large-scale data analysis and
real-time streaming.

Qlik:
A platform for interactive data exploration and analysis with strong data integration
features.
SAS:
A comprehensive statistical analysis software with advanced capabilities for
predictive modeling and business intelligence.

Google Analytics:
A web analytics tool that tracks website traffic and user behavior.

Reporting vs. Analytics

Reporting

Data reporting is about taking the available information (e.g. your dataset), organizing
it, and displaying it in a well-structured and digestible format we call “reports”. You
can present data from various sources, making it available for anyone to analyze.

Reporting is a great way to help the internal teams and experts answer the question
of what is happening.

Analytics

Analytics is about diving deeper into your data and reports in order to look for
insights. It’s actually an attempt to answer why something is happening. Analytics
powers up decision-making as the main goal is to make sense of the data explaining
the reason behind the reported numbers.
Analytics vs. reporting: Key differences

The key differences across three pillars:

Pillar    Reporting                                 Analytics
Purpose   Focuses on what is happening              Focuses on why something is happening
Tasks     Cleaning, organizing and summarizing      Exploring, analyzing, and questioning
          your data                                 your data
Value     Transforms your data into information     Transforms the information into insights
                                                    & recommendations

Types of reports

 Long Reports: Long reports are usually longer than 10 pages and are typically used
on formal occasions.
 Short Reports: Short reports are the exact opposite. They are less than 10 pages,
contain less data, and are usually shared on informal occasions (e.g. quickly sharing a
set of data).
 Internal Reports: Internal reports are created and shared either within the same
organization or even the same department.
 External Reports: External reports are built with the aim of being shared outside the
organization.
 Vertical Reports: Vertical reports are typically internal reports that are shared across
different levels of the hierarchy of the organization (e.g. sharing a report with your
manager or stakeholders).
 Lateral Reports: Lateral reports are the ones that are shared horizontally within the
organization. Take, for example, a report shared between two different departments
(e.g. HR and finance).
 Periodic Reports: We call periodic reports the ones that are created periodically (e.g.
on a monthly basis). The report keeps the exact same format, but the data within
change based on the interval.

Examples of reports

 Financial Reports: Financial reports provide an overview of the most important
financial metrics of the organization (e.g. profits and losses, expenses and revenue).
 Marketing Reports: Marketing reports are used to evaluate marketing efforts. They
usually contain how much money was invested and what the return was in terms of
traffic.
 Sales Reports: Sales reports focus more on the revenue side of things. Usually, they
report the number of sales, the revenue, and items sold.
 Management Reports: The main goal of management reporting is to provide the
information investors need. This may include the return on investment, the share
price, and profits and losses for a given time frame.

Types of analytics

 Descriptive: Descriptive analytics is when you assess historical data and try to identify
specific patterns. The main goal is to answer what happened and whether it was expected,
making comparisons with other timeframes (a short sketch follows this list).
 Diagnostic: When we know what’s going on, the next step is to understand why. So
you may have performed some descriptive analytics techniques and you were able to
identify that sales went up by 12%. Diagnostic analytics is there to help identify why
this happened and what actually worked for your business.
 Predictive: Predictive analytics involves sophisticated techniques that can help you
use the patterns observed and make forecasts about future performance, e.g., financial
data analytics. While this may require specific expertise, it’s extremely useful in order
to be better prepared for the future.
 Prescriptive: Last but not least, prescriptive analytics techniques can help you
identify the best course of action. This type of analytics is frequently used by
marketers to draft their strategies and achieve better results.
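As a small illustration of descriptive analytics, here is a Python sketch that summarises
made-up sales by month and computes the month-over-month change, answering “what
happened”:

import pandas as pd

sales = pd.DataFrame({
    "month": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "amount": [120, 80, 150, 74],
})

# Total sales per month, then the percentage change between months.
by_month = sales.groupby("month")["amount"].sum()
change = by_month.pct_change() * 100

print(by_month)
print(change.round(1))  # month-over-month % change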
