STREAM PROCESSING 2 Marks Question and Answers

UNIT 1
1. Stream Processing
Stream processing addresses the need for real-time data analysis and decision-making. In scenarios where immediate data processing is crucial, such as fraud detection in banking or real-time monitoring in manufacturing, stream processing allows data to be analyzed as it arrives, enabling instantaneous responses.
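A minimal sketch of the idea in plain Python, with a hypothetical is_fraudulent rule: each event is evaluated the moment it arrives instead of waiting for a batch.

    # Event-at-a-time processing sketch (hypothetical stream and rule).
    def is_fraudulent(txn):
        # Placeholder rule: flag unusually large transactions.
        return txn["amount"] > 10_000

    def process_stream(transactions):
        for txn in transactions:            # handle each event as it arrives
            if is_fraudulent(txn):
                print("ALERT:", txn["id"])  # respond immediately, no batching

    process_stream([{"id": 1, "amount": 50}, {"id": 2, "amount": 25_000}])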
2. Data Migration
Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another. It is a common IT activity.
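As a rough illustration, assuming two in-memory SQLite databases stand in for the source and target systems, a migration extracts rows, optionally transforms them, and loads them into the new store:

    import sqlite3

    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    source.execute("INSERT INTO users VALUES ('Ada', 36)")
    target.execute("CREATE TABLE users (name TEXT, age INTEGER)")

    # Extract from the old system, transform, and load into the new one.
    for name, age in source.execute("SELECT name, age FROM users"):
        target.execute("INSERT INTO users VALUES (?, ?)", (name.upper(), age))
    target.commit()
    print(target.execute("SELECT * FROM users").fetchall())  # [('ADA', 36)]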
3. Data Mining
Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data mining techniques and tools help enterprises predict future trends and make more informed business decisions.
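A toy sketch using only the Python standard library: counting which item pairs co-occur across transactions is a much-simplified stand-in for association-rule mining.

    from collections import Counter
    from itertools import combinations

    # Count which item pairs co-occur in shopping baskets.
    transactions = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"bread", "butter"},
    ]
    pair_counts = Counter()
    for basket in transactions:
        pair_counts.update(combinations(sorted(basket), 2))
    print(pair_counts.most_common(1))  # [(('bread', 'milk'), 2)]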
5. Data as a Service
Data as a service (DaaS) is a business model where data is made available on demand, regardless of the consumer's location or infrastructure.
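A minimal consumer-side sketch, assuming a hypothetical HTTP endpoint: the consumer simply requests the data on demand, with the provider's location and infrastructure hidden behind the API.

    import json
    from urllib.request import urlopen

    # DaaS sketch: pull data on demand over HTTP.
    # The endpoint URL below is hypothetical.
    def fetch_dataset(url="https://daas.example.com/v1/sales?region=emea"):
        with urlopen(url) as response:
            return json.load(response)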
UNIT 2
2. Lambda Architecture
Lambda architecture is a data processing architecture that consists of a traditional batch data pipeline and a fast streaming data pipeline for handling real-time data. In addition to the batch and speed layers, Lambda architecture also includes a serving layer for responding to user queries.
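A minimal sketch of the three layers in plain Python (the metric name and numbers are made up): the serving layer merges a precomputed batch view with fresh increments from the speed layer.

    # Lambda architecture sketch.
    batch_view = {"page_views": 10_000}   # recomputed periodically over all data
    speed_layer = {"page_views": 42}      # incremented per event since the last batch run

    def serve(metric):
        # Serving layer: merge batch and real-time results.
        return batch_view.get(metric, 0) + speed_layer.get(metric, 0)

    print(serve("page_views"))  # 10042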
3. Big Data
Big data refers to extremely large and diverse collections of structured, unstructured, and semi-structured data that continues to grow exponentially over time.
Real-time analytics
Real-time analytics is the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly. For some use cases, real time simply means the analytics is completed within a few seconds or minutes after the arrival of new data.
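One common pattern is a sliding window; here is a minimal sketch in plain Python that keeps only the last 60 seconds of readings, so queries always reflect recent data.

    import time
    from collections import deque

    window = deque()  # (timestamp, value) pairs

    def record(value, now=None):
        now = now if now is not None else time.time()
        window.append((now, value))
        while window and window[0][0] < now - 60:  # drop readings older than 60 s
            window.popleft()

    def current_average():
        return sum(v for _, v in window) / len(window) if window else 0.0

    record(10)
    record(20)
    print(current_average())  # 15.0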
Stream processing
Stream processing allows applications to respond to new data events at the moment they occur. Rather than grouping data and collecting it at some predetermined interval, as batch processing does, stream processing applications collect and process data immediately as they are generated.
Message broker
A message broker is software that enables applications, systems, and services to communicate with each other and exchange information. The message broker does this by translating messages between formal messaging protocols.
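A minimal sketch of the decoupling idea using an in-memory queue; a real broker such as Kafka or RabbitMQ adds persistence, routing, and protocol translation on top of this.

    import queue

    # The queue decouples the sender from the receiver: neither needs to
    # know about the other, only about the broker.
    broker = queue.Queue()

    broker.put({"event": "order_created", "order_id": 7})  # producer side
    message = broker.get()                                 # consumer side
    print(message["event"])  # order_created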
Real-time vs. batch ETL
Real-time and batch ETL are two data extraction approaches. Real-time ETL (extract, transform, load) extracts, transforms and loads data as soon as it becomes available. Batch ETL processes data in batches according to a predetermined schedule or set of conditions.
Both approaches have their pros and cons. Which one is best suited for a business depends on factors such as speed requirements, the volume of incoming data, and security needs.
Comparing the two approaches side by side makes it clearer when to opt for real-time ETL and when batch ETL is sufficient.
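A contrast sketch in plain Python, with hypothetical transform and load helpers: batch ETL runs over accumulated records on a schedule, while real-time ETL handles each record on arrival.

    def transform(record):
        return record.strip().lower()

    def load(record):
        print("loaded:", record)

    def batch_etl(records):
        for record in records:      # runs on a schedule over a full batch
            load(transform(record))

    def realtime_etl(record):
        load(transform(record))     # runs the moment the record arrives

    batch_etl(["  Alpha ", " Beta "])
    realtime_etl("  Gamma ")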
UNIT 3
DATA MODELS AND QUERY LANGUAGES
1. The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd,[1][2] where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database. The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.
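A small illustration using Python's built-in sqlite3 module: the user states declaratively what rows they want, and the database engine decides how to store and retrieve them.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
    db.executemany("INSERT INTO employees VALUES (?, ?)",
                   [("Ada", "eng"), ("Grace", "eng"), ("Edgar", "research")])

    # Declarative query: state WHAT you want, not HOW to fetch it.
    rows = db.execute("SELECT name FROM employees WHERE dept = ?", ("eng",))
    print([name for (name,) in rows])  # ['Ada', 'Grace']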
In computing, the network model is a database model conceived as a flexible way of representing
objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in
which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy
or lattice.
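A minimal sketch in plain Python: because a record can be reachable from several parents, the schema forms a general graph rather than a strict hierarchy.

    # Network-model sketch: the same order record is owned by both a
    # customer and a product, which a strict hierarchy would not allow.
    orders = {"order_1": {"customer": "alice", "product": "widget"}}
    customers = {"alice": {"orders": ["order_1"]}}
    products = {"widget": {"orders": ["order_1"]}}

    print(customers["alice"]["orders"], products["widget"]["orders"])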
The Semantic Web is a vision about an extension of the existing World Wide Web, which
provides software programs with machine-interpretable metadata of the published information
and data. In other words, we add further data descriptors to otherwise existing content and data on the Web.
CODASYL
CODASYL's members were individuals from industry and government involved in data
processing activity. Its larger goal was to promote more effective data systems analysis, design,
and implementation. The organization published specifications for various languages over the
years, handing these over to official standards bodies (ISO, ANSI, or their predecessors) for
formal standardization.
SPARQL
SPARQL, pronounced 'sparkle', is the standard query language and protocol for Linked Open Data on the web and for RDF triplestores. SPARQL, short for "SPARQL Protocol and RDF Query Language", enables users to query information from databases or any data source that can be mapped to RDF.
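A small sketch using the third-party rdflib package (the example data is made up): an RDF graph is loaded and then queried with SPARQL.

    from rdflib import Graph  # third-party package: rdflib

    turtle = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/alice> foaf:name "Alice" .
    """
    g = Graph()
    g.parse(data=turtle, format="turtle")

    # SPARQL query over the RDF graph.
    results = g.query("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name WHERE { ?person foaf:name ?name }
    """)
    for row in results:
        print(row.name)  # Alice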
UNIT 4
EVENT PROCESSING WITH APACHE KAFKA
Kafka exposes five core APIs:
Producer API: lets applications publish streams of records to Kafka topics.
Consumer API: lets applications subscribe to topics and process the records.
Admin Client API: lets applications manage and inspect topics, brokers, and other Kafka objects.
Connect API: lets developers build and run reusable connectors that stream data between Kafka and external systems.
Kafka Streams API: lets applications act as stream processors, transforming input topics into output topics.
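A minimal sketch of the Producer and Consumer APIs using the third-party kafka-python package; it assumes a broker running at localhost:9092 and a hypothetical topic named 'events'.

    from kafka import KafkaProducer, KafkaConsumer  # third-party package: kafka-python

    # Producer API: publish a record to a topic.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b"user signed up")
    producer.flush()

    # Consumer API: subscribe to the topic and read records as they arrive.
    consumer = KafkaConsumer("events",
                             bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for message in consumer:
        print(message.value)  # b'user signed up'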