
STREAM PROCESSING-CCS368

TWO MARKS QUESTIONS AND ANSWERS

1, Stream processing

Stream processing addresses the need for real-time data analysis and decision-making. In scenarios where immediate data processing is crucial, such as fraud detection in banking or real-time monitoring in manufacturing, stream processing allows data to be analyzed as it arrives, enabling instantaneous responses.

The shift from batch processing to stream processing in many domains is driven by the increasing demand for real-time insights and the growing volume and velocity of data.

Batch processing and stream processing are two different approaches to handling data. Batch processing involves processing large volumes of data at once, at scheduled intervals. In contrast, stream processing involves continuously processing data in real time as it arrives.

2, Data Integration
Data integration is the process of combining data from multiple sources to provide a single, unified view. A related activity, data migration, is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another; it is a common IT activity.

3, DATA MINING
Data mining is the process of sorting through large data sets to identify patterns and relationships
that can help solve business problems through data analysis. Data mining techniques and tools
help enterprises to predict future trends and make more informed business decisions.

4, Six stages of the data processing cycle


 Step 1: Collection. The collection of raw data is the first step of the data processing
cycle. ...
 Step 2: Preparation. ...
 Step 3: Input. ...
 Step 4: Data Processing. ...
 Step 5: Output. ...
 Step 6: Storage.

5, Data as a service
Data as a service (DaaS) is a business model in which data is made available on demand, regardless of the consumer's location or infrastructure.

UNIT 2

1, Kappa Architecture is a software architecture used for processing streaming data.


The main premise behind the Kappa Architecture is that you can perform both real-time
and batch processing, especially for analytics, with a single technology stack. It is
based on a streaming architecture in which an incoming series of data is first stored in a
messaging engine like Apache Kafka. From there, a stream processing engine will read
the data and transform it into an analyzable format, and then store it into an analytics
database for end users to query.
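
For illustration, here is a minimal sketch of the streaming leg of a Kappa-style pipeline using the Kafka Streams Java API. The topic names, the uppercasing transformation, and the broker address are assumptions invented for this example, not part of the original text.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class KappaPipelineSketch {
    public static void main(String[] args) {
        // Basic configuration: application id and broker address (assumed values).
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kappa-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read raw events from an input topic, transform them into an
        // "analyzable" form (here simply uppercasing), and write them to an
        // output topic that an analytics database could consume.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> rawEvents = builder.stream("raw-events");
        rawEvents.mapValues(value -> value.toUpperCase())
                 .to("analytics-ready-events");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the topology cleanly on shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```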

2, Lambda architecture
Lambda architecture is a data deployment model for processing that consists of a traditional batch data pipeline and a fast streaming data pipeline for handling real-time data. In addition to the batch and speed layers, Lambda architecture also includes a data serving layer for responding to user queries.
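
As a rough illustration of the serving layer described above, the sketch below merges a precomputed batch view with an incremental real-time view when answering a query. The view structures, method names, and event keys are invented for the example, not taken from the original text.

```java
import java.util.HashMap;
import java.util.Map;

public class LambdaServingLayerSketch {
    // Batch view: counts recomputed periodically over the full historical data set.
    private final Map<String, Long> batchView = new HashMap<>();
    // Speed (real-time) view: counts for events that arrived after the last batch run.
    private final Map<String, Long> realtimeView = new HashMap<>();

    // The serving layer answers a query by combining both views.
    public long getEventCount(String key) {
        long batch = batchView.getOrDefault(key, 0L);
        long recent = realtimeView.getOrDefault(key, 0L);
        return batch + recent;
    }

    // The speed layer updates its view as each new event arrives.
    public void onNewEvent(String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    // The batch layer periodically replaces the batch view and resets the speed layer.
    public void onBatchRecompute(Map<String, Long> freshBatchView) {
        batchView.clear();
        batchView.putAll(freshBatchView);
        realtimeView.clear();
    }
}
```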

3,BIG DATA

Big data refers to extremely large and diverse collections of structured, unstructured, and semi-structured data that continues to grow exponentially over time.

Real-time analytics
Real-time analytics is the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly. For some use cases, real time simply means the analytics is completed within a few seconds or minutes after the arrival of the data.

Stream processing
Stream processing allows applications to respond to new data events at the moment they occur. Rather than grouping data and collecting it at some predetermined interval, as batch processing does, stream processing applications collect and process data immediately as it is generated.

What is a message broker?

A message broker is software that enables applications, systems, and services to communicate with each other and exchange information. The message broker does this by translating messages between formal messaging protocols.

What’s the Difference Between Real-time and Batch ETL?

Real-time and batch ETL are two data extraction approaches. Real-time ETL
(extract, transform, load) extracts, transforms and loads data as soon as it
becomes available. Batch ETL processes data in batches according to a
predetermined schedule or set of conditions.

Both approaches have their pros and cons; which one is best suited for a business depends on factors such as speed requirements, the volume of incoming data, and security needs.
UNIT 3
DATA MODELS AND QUERY LANGUAGES

1, The relational model (RM) is an approach to managing data using a structure and
language consistent with first-order predicate logic, first described in 1969 by English computer
scientist Edgar F. Codd,[1][2] where all data is represented in terms of tuples, grouped
into relations. A database organized in terms of the relational model is a relational database.

The purpose of the relational model is to provide a declarative method for specifying data and
queries: users directly state what information the database contains and what information they
want from it, and let the database management system software take care of describing data
structures for storing the data and retrieval procedures for answering queries

2, Document Object Model (DOM)

The Document Object Model (DOM) connects web pages to scripts or programming languages by representing the structure of a document, such as the HTML representing a web page, in memory. Usually it refers to JavaScript, even though modeling HTML, SVG, or XML documents as objects is not part of the core JavaScript language.

3, KEY VALUE PAIR


A key-value pair consists of two related data elements: a key, which is a constant that defines the data set (e.g., gender, color, price), and a value, which is a variable that belongs to that set (e.g., male/female, green, 100). Fully formed, key-value pairs could look like this: gender = male, color = green.
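
A key-value pair maps directly onto a map or dictionary structure. The short Java sketch below, using the example keys and values mentioned above, shows the idea.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyValueExample {
    public static void main(String[] args) {
        // Each entry is a key-value pair: the key names the data set,
        // the value is the variable that belongs to it.
        Map<String, String> record = new HashMap<>();
        record.put("gender", "male");
        record.put("color", "green");
        record.put("price", "100");

        // Looking up a value by its key.
        System.out.println("color = " + record.get("color"));
    }
}
```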
4, What is a NoSQL database?
NoSQL, also referred to as "not only SQL" or "non-SQL", is an approach to database design that enables the storage and querying of data outside the traditional structures found in relational databases.
5, Object–relational mismatch
Object–relational mismatch is a set of difficulties in moving between data in relational data stores and data in domain-driven object models. Relational database management systems (RDBMS) are the standard method for storing data in a dedicated database, while object-oriented (OO) programming is the default method for business-centric design in programming languages. The problem lies in neither relational databases nor OO programming, but in the conceptual difficulty of mapping between the two logic models. Both logical models are differently implementable using database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.

6. What are one-to-many and many-to-many relationships?

A one-to-many relationship means that each record in one table can be related to many records in a second table, while each record in the second table relates to only one record in the first. A many-to-many relationship means that for each record in one table there can be many records in another table and vice versa; it is usually implemented as two one-to-many relationships through a third (junction) table.
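
To make the "third table" concrete, here is a hypothetical sketch of a many-to-many relationship between students and courses, implemented as two one-to-many relationships through a junction table. The table and column names are invented for the example, the SQL is issued through JDBC, and an in-memory H2 database is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ManyToManySketch {
    public static void main(String[] args) throws Exception {
        // Assumed in-memory H2 database, purely for illustration.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement stmt = conn.createStatement()) {

            stmt.executeUpdate("CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(100))");
            stmt.executeUpdate("CREATE TABLE courses  (id INT PRIMARY KEY, title VARCHAR(100))");

            // Junction table: one row per (student, course) pairing.
            // Each foreign key is the 'many' side of a one-to-many relationship.
            stmt.executeUpdate(
                "CREATE TABLE enrollments (" +
                "  student_id INT REFERENCES students(id)," +
                "  course_id  INT REFERENCES courses(id)," +
                "  PRIMARY KEY (student_id, course_id))");
        }
    }
}
```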

7, What are network data models?

In computing, the network model is a database model conceived as a flexible way of representing
objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in
which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy
or lattice.

What is a flexible schema?

Schema flexibility means that you can define a table with some columns and then dynamically add more columns at run time, without needing to redefine the table structure.
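
As a small illustration of adding a column at run time, the sketch below issues an ALTER TABLE statement through JDBC. The connection URL, table, and column names are assumptions made for the example, with an in-memory H2 database assumed to be available.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FlexibleSchemaSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement stmt = conn.createStatement()) {

            // Initial table definition with a couple of columns.
            stmt.executeUpdate("CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(100))");

            // Later, at run time, the schema is extended with a new column
            // without recreating the table.
            stmt.executeUpdate("ALTER TABLE students ADD COLUMN email VARCHAR(255)");
        }
    }
}
```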
Structured query language
Structured query language (SQL) is a programming language for storing and processing information in a
relational database. A relational database stores information in tabular form, with rows and columns
representing different data attributes and the various relationships between the data values.

What Is Data Locality?

The data locality pattern allows us to move computation to the data. The data can live in the database or the file system. The situation is simple as long as our data fits into the disk or memory of our machines: processing can be local and fast.

What is a declarative query language?

In a declarative query language, we specify only what data we want; how to obtain it is delegated to the language engine. For example, in SQL such a query can be written as: Select * from students where department = "computer science".
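
To show a declarative query being executed from application code, here is a small JDBC sketch running the SELECT statement above. The students table, its columns, the sample row, and the connection URL are assumptions for the example (an in-memory H2 database is assumed).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DeclarativeQuerySketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement stmt = conn.createStatement()) {

            stmt.executeUpdate("CREATE TABLE students (name VARCHAR(100), department VARCHAR(100))");
            stmt.executeUpdate("INSERT INTO students VALUES ('Asha', 'computer science')");

            // We state *what* we want; the database engine decides *how* to fetch it.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM students WHERE department = 'computer science'")) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}
```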

Graph data model


The graph data model is often referred to as being whiteboard-friendly. Typically, when
designing a data model, people draw example data on the whiteboard and connect it to other data
drawn to show how different items connect. The whiteboard model is then re-formatted and
structured to fit normalized tables for a relational model.

CYPHER QUERY LANGUAGE


Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph. Cypher was largely an invention of Andrés Taylor while working at Neo4j (then Neo Technology).
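
For illustration, a minimal sketch of running a Cypher query from Java using the Neo4j driver (version 4.x assumed). The connection details, node labels, and property names are assumptions made for the example, and the org.neo4j.driver library must be on the classpath.

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

public class CypherQuerySketch {
    public static void main(String[] args) {
        // Assumed local Neo4j instance and credentials.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                                                  AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Declarative pattern match: find names of people that Alice knows.
            Result result = session.run(
                "MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend:Person) " +
                "RETURN friend.name AS name");

            while (result.hasNext()) {
                System.out.println(result.next().get("name").asString());
            }
        }
    }
}
```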

What is a graph query?


Graph queries, for the most part, attempt to identify an explicit pattern within the graph database.
Graph queries have an expressive power to return something at the level of an analytic in a
normal data processing system.
The Semantic Web

The Semantic Web is a vision about an extension of the existing World Wide Web, which
provides software programs with machine-interpretable metadata of the published information
and data. In other words, we add further data descriptors to otherwise existing content and data
on the Web

CODASYL

The Conference/Committee on Data Systems Languages (CODASYL) was a consortium formed in 1959 to guide the development of a standard programming language that could be used on many computers. This effort led to the development of the programming language COBOL, the CODASYL Data Model, and other technical standards.

CODASYL's members were individuals from industry and government involved in data
processing activity. Its larger goal was to promote more effective data systems analysis, design,
and implementation. The organization published specifications for various languages over the
years, handing these over to official standards bodies (ISO, ANSI, or their predecessors) for
formal standardization.

SPARQL

SPARQL, pronounced "sparkle", is the standard query language and protocol for Linked Open Data on the web or for RDF triplestores. SPARQL, short for "SPARQL Protocol and RDF Query Language", enables users to query information from databases or any data source that can be mapped to RDF.
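
As a hedged illustration, the sketch below runs a SPARQL SELECT query against a public endpoint using Apache Jena. The endpoint URL, the FOAF prefix, and the variable names are assumptions chosen for the example, and the Jena libraries are assumed to be on the classpath.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class SparqlQuerySketch {
    public static void main(String[] args) {
        // A simple SELECT: names of resources typed as persons (FOAF vocabulary).
        String queryString =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
            "SELECT ?name WHERE { ?person a foaf:Person ; foaf:name ?name } LIMIT 10";

        Query query = QueryFactory.create(queryString);

        // Assumed public SPARQL endpoint.
        try (QueryExecution qexec =
                 QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution soln = results.nextSolution();
                System.out.println(soln.getLiteral("name").getString());
            }
        }
    }
}
```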

UNIT 4
EVENT PROCESSING WITH APACHE KAFKA

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation, written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Kafka APIs
Apache Kafka® provides five core Java APIs to enable cluster and client management. These APIs can be used to build and manage powerful streaming applications:

 Producer API
 Consumer API
 Admin Client API
 Connect API
 Kafka Streams API

What is the Admin API?

In Kafka, the Admin Client API provides programmatic access to administrative operations on the cluster, such as creating, deleting, and inspecting topics and managing configurations and ACLs.
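
A minimal sketch of the Kafka Admin Client API creating and listing topics is shown below. The broker address, topic name, and partition/replication settings are assumed values for the example.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class AdminClientSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a topic with 3 partitions and a replication factor of 1.
            NewTopic topic = new NewTopic("payment-events", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();

            // List the topics now present in the cluster.
            System.out.println(admin.listTopics().names().get());
        }
    }
}
```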

What is the Producer API?


Producers publish (write) a stream of events to one or more Kafka topics. The Producer API
enables developers to create their own producers that write data to Kafka topics. The API
provides several options for configuring the behavior of the producer, such as setting the number
of acknowledgments required before considering a message as sent, or setting compression
options to reduce the size of messages.
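
The sketch below uses the Producer API to write a single event to a topic, setting the acknowledgment and compression options mentioned above. The topic name, key, value, and broker address are assumed values for the example.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for all in-sync replicas to acknowledge the write.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Compress message batches to reduce their size on the wire.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one event to the (assumed) "payment-events" topic.
            producer.send(new ProducerRecord<>("payment-events", "order-42", "PAID"));
            producer.flush();
        }
    }
}
```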
