Reviewerku


Chapter 1: Understanding Big Data

Concepts and Terminology

1. Datasets: Collections or groups of related data. Examples:
o Tweets in a file
o Image files in a directory
o Database rows in a CSV file
o Weather observations in XML files
2. Data Analysis: The process of examining data to find facts, relationships, patterns,
insights, and trends to support better decision-making. Example: Analyzing ice cream
sales data to determine how sales relate to temperature.
3. Data Analytics: A broader discipline that includes data analysis, management of the
complete data lifecycle, development of analysis methods, and use of scalable distributed
technologies.
o Descriptive Analytics: Answers questions about past events (e.g., sales volume
over 12 months).
o Diagnostic Analytics: Determines the cause of past events (e.g., why Q2 sales
were less than Q1 sales).
o Predictive Analytics: Predicts future events based on past data (e.g., chances of a
customer defaulting on a loan).
o Prescriptive Analytics: Suggests actions based on predictive analytics (e.g., the
best time to trade a stock).
4. Business Intelligence (BI): Analyzes data from business processes to gain insights and
improve organizational performance.
5. Key Performance Indicators (KPI): Metrics used to gauge success within a business
context, often displayed on dashboards.
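
The ice cream example in item 2 can be made concrete with a small sketch. The temperature and sales figures below are hypothetical and chosen only for illustration, and `pearson` is a helper written for this note, not something the source defines. The sketch measures how strongly sales move with temperature using the Pearson correlation coefficient.

```python
from math import sqrt

# Hypothetical observations: daily temperature (deg C) and ice cream units sold.
temperatures = [18, 21, 24, 27, 30, 33]
units_sold = [120, 150, 185, 210, 250, 290]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(temperatures, units_sold)
print(f"correlation between temperature and sales: {r:.3f}")
```

A value close to 1 suggests that sales rise with temperature in this sample. It describes a relationship in the data and does not by itself establish a cause.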

Big Data Characteristics

1. Volume: The substantial amount of data processed, requiring significant storage and
processing capacity.
2. Velocity: The speed at which data is generated and processed. High-velocity data
requires quick and efficient processing solutions.
3. Variety: The different formats and types of data (structured, unstructured, semi-
structured).
4. Veracity: The quality or fidelity of data. High-quality data (high signal-to-noise ratio) is
crucial for accurate analysis.
5. Value: The usefulness of data for an enterprise. Value depends on the quality and
timeliness of data processing and analysis.

Different Types of Data

1. Structured Data: Conforms to a data model or schema and is often stored in tabular
form (e.g., banking transactions, customer records).
2. Unstructured Data: Does not conform to a data model or schema. Examples include text
files (tweets, blog posts) and binary files (images, videos).
3. Semi-structured Data: Has some structure but is not relational. Examples include XML
and JSON files, EDI files, spreadsheets, RSS feeds.
4. Metadata: Provides information about a dataset’s characteristics and structure (e.g.,
XML tags, file attributes).
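
As a rough illustration of the four categories above, the following sketch handles one tiny sample of each using only the Python standard library. All sample values (customer rows, the JSON record, the tweet text) are made up for this note.

```python
import csv
import io
import json

# Structured: rows that conform to a fixed schema (hypothetical customer records).
csv_text = "customer_id,name,balance\n1,Ana,250.00\n2,Ben,90.50\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing and hierarchical, but not relational.
json_text = '{"user": "ana", "tags": ["big data", "notes"], "geo": {"lat": 14.6}}'
record = json.loads(json_text)

# Unstructured: free text with no data model; any structure must be derived.
tweet = "Loving the new ice cream flavor at the park today!"
word_count = len(tweet.split())

# Metadata: information about a dataset rather than its contents.
metadata = {
    "columns": list(rows[0].keys()),
    "row_count": len(rows),
    "size_bytes": len(csv_text.encode("utf-8")),
}

print(rows[0]["name"], record["geo"]["lat"], word_count, metadata)
```

The structured rows can be queried by column name straight away, the JSON record has to be navigated by key, and the tweet yields nothing beyond raw text until some processing (here, a simple word count) is applied.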

Key Points Summary

1. Datasets are collections of related data.
2. Data Analysis helps find insights and support decision-making.
3. Data Analytics encompasses the entire data lifecycle and includes various types of
analytics.
4. Business Intelligence (BI) and Key Performance Indicators (KPI) provide insights
into business performance.
5. Big Data Characteristics: Volume, Velocity, Variety, Veracity, and Value.
6. Types of Data: Structured, Unstructured, Semi-structured, and Metadata.

Concepts and Terminology

 Big Data: Field focused on analyzing, processing, and storing large datasets from various
sources when traditional methods are inadequate.
 Big Data Characteristics: Involves large, diverse, complex, and often unstructured data.

Different Types of Data

 Structured Data: Data that is organized and easily searchable (e.g., databases).
 Unstructured Data: Data without a pre-defined format (e.g., text, images).
 Semi-structured Data: Contains both structured and unstructured elements (e.g., JSON
files).

Big Data Analytics Lifecycle

 Involves identifying, procuring, preparing, and analyzing large amounts of raw,
unstructured data to extract meaningful information.

Types of Analytics

1. Descriptive Analytics
o Answers questions about past events.
o Example questions: What was the sales volume over the past 12 months?
o Provides historical data insights via reports or dashboards.
2. Diagnostic Analytics
o Determines the cause of past events.
o Example questions: Why were Q2 sales less than Q1 sales?
o Involves complex queries and multi-dimensional data analysis.
3. Predictive Analytics
o Predicts future events based on past data.
o Example questions: What are the chances a customer will default on a loan if they
miss a payment?
o Uses models to forecast outcomes and identify risks/opportunities.
4. Prescriptive Analytics
o Suggests actions based on predictive analytics results.
o Example questions: Which drug provides the best results among three options?
o Involves simulation of scenarios and incorporates both internal and external data.
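
The first two analytics types can be sketched on a toy dataset. The sales records below are hypothetical. The sketch covers only the descriptive and diagnostic questions; predictive and prescriptive analytics would need a fitted model and scenario simulation, which are outside what this small example shows.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, units sold).
sales = [
    ("Q1", "north", 400), ("Q1", "south", 350),
    ("Q2", "north", 410), ("Q2", "south", 240),
]

# Descriptive: what happened? Total units per quarter.
totals = defaultdict(int)
for quarter, _region, units in sales:
    totals[quarter] += units

# Diagnostic: why was Q2 lower than Q1? Break the change down by region.
by_region = defaultdict(dict)
for quarter, region, units in sales:
    by_region[region][quarter] = units
change = {region: q["Q2"] - q["Q1"] for region, q in by_region.items()}
largest_drop = min(change, key=change.get)

print(dict(totals))
print(change, "largest drop:", largest_drop)
```

The per-quarter totals answer the descriptive question, while the per-region breakdown is a minimal form of the multi-dimensional analysis that diagnostic analytics relies on.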

Applications and Benefits of Big Data

 Operational Optimization
 Actionable Intelligence
 Identification of New Markets
 Accurate Predictions
 Fault and Fraud Detection
 More Detailed Records
 Improved Decision-Making
 Scientific Discoveries

Business Intelligence (BI)

 Definition: BI helps organizations gain insights into performance by analyzing data from
business processes and information systems.
 Purpose: Enables management to correct issues and enhance organizational performance.
 Data Consolidation: Typically, data is consolidated into an enterprise data warehouse
for analytical queries.
 Dashboards: BI outputs are often displayed on dashboards for easy access and analysis
by managers.

Key Performance Indicators (KPI)

 Definition: Metrics used to gauge success within a specific business context.
 Alignment: Linked to strategic goals and objectives.
 Usage: Identifies business performance problems and demonstrates regulatory
compliance.
 KPI Dashboard: Displays multiple KPIs and compares actual measurements with
threshold values.
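
The comparison a KPI dashboard performs can be sketched as follows. The KPI names, actual values, and thresholds are hypothetical, and `kpi_status` is an illustrative helper, not part of any BI product.

```python
# Hypothetical KPIs: (name, actual value, threshold, whether higher is better).
kpis = [
    ("monthly_revenue_usd", 118_000, 100_000, True),
    ("customer_churn_pct", 6.5, 5.0, False),
    ("on_time_delivery_pct", 97.2, 95.0, True),
]


def kpi_status(actual, threshold, higher_is_better):
    """Return "OK" if the actual value meets the threshold, else "ALERT"."""
    met = actual >= threshold if higher_is_better else actual <= threshold
    return "OK" if met else "ALERT"


# Build the dashboard view: one status per KPI.
dashboard = {name: kpi_status(a, t, h) for name, a, t, h in kpis}
for name, status in dashboard.items():
    print(f"{name:<24}{status}")
```

The `higher_is_better` flag is needed because some KPIs, such as churn, count as healthy only when they stay below their threshold.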

Different Types of Data

1. Human-Generated Data
o Definition: Data resulting from human interaction with systems.
o Examples: Social media, blog posts, emails, photo sharing.
2. Machine-Generated Data
o Definition: Data generated by software programs and hardware devices.
o Examples: Web logs, sensor data, telemetry data.

Data Types

1. Structured Data
o Definition: Data that conforms to a data model or schema.
o Storage: Often stored in relational databases.
o Examples: Banking transactions, invoices, customer records.
2. Unstructured Data
o Definition: Data that does not conform to a data model or schema.
o Growth: Commonly estimated to make up about 80% of enterprise data; grows faster than structured data.
o Examples: Text files, media files (video, image, audio).
3. Semi-Structured Data
o Definition: Data with a defined level of structure but not relational.
o Formats: Hierarchical or graph-based.
o Examples: XML, JSON, sensor data.
4. Metadata
o Definition: Information about a dataset’s characteristics and structure.
o Importance: Crucial for processing, storage, and analysis in Big Data
environments.
o Examples: XML tags, attributes of a digital photograph.

Chapter 2: Business Motivations and Drivers for Big Data Adoption

Marketplace Dynamics

The evolving business landscape has forced companies to shift from internal efficiency and cost-
cutting to external focus and innovation. This change, driven by market disruptions and
economic fluctuations, emphasizes the need for companies to leverage external data and
advanced analytics.

Key Points:

 Economic crises like the dot-com bubble burst in 2000 and the global recession in 2008
prompted businesses to improve efficiency and cut costs.
 Post-recession, companies have focused on innovation to gain competitive advantages
and grow revenue.
 Integration of external data with internal data enhances business intelligence and
decision-making.

Business Architecture

Modern enterprise architecture increasingly integrates business architecture with technology
architecture. This holistic approach aligns strategic vision with operational execution through
well-defined linkages between abstract and concrete business elements.

Key Points:

 Business architecture includes elements like mission, vision, strategy, and key
performance indicators (KPIs).
 A layered system approach divides the organization into strategic, tactical, and
operational layers, each with distinct roles and metrics.
 Big Data enriches business architecture by providing context and insights across
organizational layers.

Business Process Management (BPM)

BPM focuses on optimizing business processes to deliver value efficiently. Big Data and
intelligent BPM systems enhance process management by enabling adaptive, goal-driven
execution.

Key Points:

 Business processes describe work activities, relationships, and responsible actors within
an organization.
 BPM systems (BPMS) integrate process models, organizational roles, business rules, and
user interfaces to create cohesive business applications.
 Combining Big Data analytics with BPM allows for adaptive and responsive process
execution.

Information and Communications Technology (ICT)

Advancements in ICT have accelerated Big Data adoption by providing essential tools and
environments for data analytics, digitization, and scalable computing.

Key Points:

 Data analytics and data science techniques are crucial for extracting insights from large
datasets.
 Digitization replaces physical mediums with digital ones, facilitating data collection and
interaction.
 Affordable technology and commodity hardware make Big Data solutions accessible to
businesses of all sizes.
 Social media and hyper-connected devices provide rich data sources for analysis.
 Cloud computing offers scalable, on-demand resources for Big Data processing.

Internet of Everything (IoE)

The IoE represents the convergence of ICT advancements, marketplace dynamics, business
architecture, and BPM, creating opportunities for innovation and optimized processes.

Key Points:

 IoE combines smart connected devices with business processes to create unique value
propositions.
 Big Data is central to IoE, enabling real-time analysis and decision-making.
 Examples like precision agriculture demonstrate how IoE and Big Data can optimize
workflows and generate value.
