LABORATORY RECORD
NAME :
REGISTER NUMBER :
YEAR/SEMESTER : III/VI
DEPARTMENT : ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
SUBJECT : CCS368 STREAM PROCESSING
ACADEMIC YEAR : 2024-2025(EVEN)
Certified to be the bonafide record of work done by Ms.________________________
Register Number ____________________________ of III Year VI Semester,
B.TECH Artificial Intelligence and Data Science course in the practical CCS368 –
STREAM PROCESSING Laboratory during the academic year 2024-2025.
Faculty in-charge Head of the Department
Submitted for the UNIVERSITY PRACTICAL EXAMINATION held at Arunachala College of Engineering
for Women on …………………
Internal Examiner External Examiner
INDEX
S.NO.   DATE   EXPERIMENT NAME                                               PAGE NO.   SIGNATURE
1.             INSTALL MONGODB
2.             DESIGN AND IMPLEMENT SIMPLE APPLICATION USING MONGODB
3.             QUERY THE DESIGNED SYSTEM USING MONGODB
4.             CREATE AN EVENT STREAM WITH APACHE KAFKA
5.             CREATE A REAL-TIME STREAM PROCESSING APPLICATION USING SPARK
6.             BUILD A MICRO-BATCH APPLICATION
7.             REAL-TIME FRAUD AND ANOMALY DETECTION
8.             REAL-TIME PERSONALIZATION, MARKETING AND ADVERTISING
Exp.no: 1
Date:
INSTALL MONGODB
AIM:
To install MongoDB and explore the various protocols.
INSTALL MONGODB
To install MongoDB, begin by visiting the official MongoDB website and navigate to the 'Downloads' section.
Select the appropriate version of MongoDB for your operating system (Windows, macOS, or Linux) and
download the installer package. Once the download is complete, follow the installation instructions provided
by MongoDB. For most operating systems, this involves running the installer package and following the
prompts in the installation wizard. MongoDB is a leading NoSQL database solution renowned for its flexibility,
scalability, and ease of use. It employs a document-oriented data model, storing data in JSON-like documents,
which allows for seamless integration with modern development practices. With its distributed architecture,
MongoDB excels in handling large volumes of data and high throughput applications. Its powerful querying
capabilities, including support for complex aggregations and secondary indexes, make it suitable for a wide
range of use cases, from content management to real-time analytics.
Installing MongoDB on Windows:
Follow the steps below to install MongoDB on Windows:
Step 1: Visit the official MongoDB website using any web browser.
Step 2: Click on Download, a new webpage will open with different installers of MongoDB.
Step 3: Downloading of the executable file will start shortly. It is a 64-bit installer and may take some time to download.
Step 4: Review the terms and conditions, select "I agree", and click Next.
Step 5: It will prompt confirmation to make changes to your system. Click on Next.
Step 6: Setup screen will appear, click on Install.
Step 7: The next screen will be Installing screen.
Step 8: Next, open the command prompt and type mongod --version. The installed version will be displayed.
Step 9: The next step is setting the PATH. Open the Windows search and look for "environment variables".
Step 10: Choose the Path variable, click New, paste the path to the MongoDB bin directory, and click OK. MongoDB is now installed
successfully.
RESULT:
Thus, MongoDB has been installed and the protocols have been verified successfully.
Exp.no: 2
Date:
DESIGN AND IMPLEMENT SIMPLE APPLICATION USING MONGODB
AIM:
To design and implement a simple application using MongoDB.
PROCEDURE:
Step 1: Application Requirements
Let's say we're building a web-based task management application called "Taskify."
Data Model:
Each task will have fields such as title, description, dueDate, priority, and status.
We’ll have a collection named tasks to store task documents, each representing a single task.
User Interaction:
Users will log in to the Taskify website.
Upon logging in, they’ll see a dashboard displaying a list of their tasks.
From the dashboard, users can add new tasks, view task details, update task information, mark tasks
as completed, and delete tasks.
Users can filter tasks based on status (pending/completed) or priority level using dropdown filters.
Users can search for tasks by entering keywords in a search bar.
Users can create projects and categorize tasks under each project.
Key Functionalities:
Task Management: Users can perform CRUD operations on tasks.
Task Filtering: Users can filter tasks based on status and priority.
Search: Users can search for tasks by title or description.
Project Management: Users can create projects and organize tasks within them.
Step 2: Set Up MongoDB
1. Install MongoDB on your system if you haven't already.
2. Start the MongoDB server.
3. Connect to MongoDB using a MongoDB client or shell.
Step 3: Design Data Model
1. Users Collection:
Each user in the system will have a unique identifier (_id).
User documents will contain fields like username, email, and password for authentication
purposes.
Optionally, additional fields like fullName or profilePicture can be included.
2. Tasks Collection:
Each task will have a unique identifier (_id).
Task documents will contain fields such as title, description, dueDate, priority, status, and
userId to associate tasks with users.
Optionally, we can include fields like projectId to associate tasks with projects if project
management functionality is implemented.
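As a hedged illustration of this data model (field names follow the description above; ObjectIds are shown as strings for readability), sample documents might look like this:
# Illustrative sample documents for the assumed Taskify data model
sample_user = {
    "_id": "607c86fd2bb8541b90831a67",   # ObjectId shown as a string
    "username": "priya",
    "email": "priya@example.com",
    "password": "<hashed password>",     # store a hash, never plain text
    "fullName": "Priya S."
}
sample_task = {
    "_id": "607c87122bb8541b90831a68",
    "title": "Complete project proposal",
    "description": "Write a detailed proposal for the upcoming project.",
    "dueDate": "2024-05-10T00:00:00Z",
    "priority": "High",
    "status": "Pending",
    "userId": "607c86fd2bb8541b90831a67"  # references the user above
}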
Step 4: Create a New Database and Collections
Use the MongoDB shell or a client to create a new database and collections based on your data model.
Step 5: Implement CRUD Operations
Write functions or methods to perform CRUD (Create, Read, Update, Delete) operations on your MongoDB
collections. Here's a small algorithm for each operation:
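A minimal sketch of the four operations using pymongo, assuming the taskify database and the tasks collection from the data model above:
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
tasks = client['taskify']['tasks']

# Create: insert a new task document
task_id = tasks.insert_one({
    'title': 'Complete project proposal',
    'priority': 'High',
    'status': 'Pending'
}).inserted_id

# Read: fetch a single task by its _id
task = tasks.find_one({'_id': task_id})

# Update: mark the task as completed
tasks.update_one({'_id': task_id}, {'$set': {'status': 'Completed'}})

# Delete: remove the task
tasks.delete_one({'_id': task_id})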
Step 6: Connect application to MongoDB
Integrate your application with MongoDB by using a MongoDB driver for your programming language (e.g.,
pymongo for Python).
Step 7: Implement Application Logic
Write the logic of the application using the CRUD operations defined earlier. Handle user input, perform data
validation, and execute database operations.
Step 8: Test the Application
Test the application thoroughly to ensure that it functions correctly and handles edge cases gracefully.
Step 9: Deploy the Application
Deploy the application to a production environment, making sure it's accessible to users.
OUTPUT:
RESULT:
Thus, the design and implementation of a simple application in MongoDB has been executed and verified successfully.
Exp.no: 3
Date:
QUERY THE DESIGNED SYSTEM USING MONGODB
AIM:
To query the designed system using MongoDB.
ALGORITHM:
Step 1: Connect to MongoDB
Step 2: Choose a Collection
Step 3: Choose a Query Method
Step 4: Construct Query Parameters
Step 5: Execute the Query
PROGRAM:
Step 1: Connect to MongoDB
const mongoose = require('mongoose');
mongoose.connect('mongodb://localhost:27017/taskify', { useNewUrlParser: true, useUnifiedTopology: true })
.then(() => console.log('Connected to MongoDB'))
.catch(error => console.error('Error connecting to MongoDB:', error));
Step 2: Choose a Collection
const Task = mongoose.model('Task', {
  title: String,
  description: String,
  dueDate: Date,
  priority: String,
  status: String,
  userId: mongoose.Schema.Types.ObjectId
});
Step 3: Choose a Query Method
Use the ‘find()’ method to retrieve tasks that match our criteria.
Step 4: Construct Query Parameters
const query = { priority: 'High' };
Step 5: Execute the Query
Task.find(query)
  .then(tasks => {
    // Handle the results
    console.log('Tasks with priority High:', tasks);
  })
  .catch(error => {
    console.error('Error querying tasks:', error);
  });
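The same query can also be issued from Python with pymongo (a sketch, assuming the same taskify database used above):
from pymongo import MongoClient

# Connect to the same database used by the Node.js example
client = MongoClient('mongodb://localhost:27017/')
tasks = client['taskify']['tasks']

# Find high-priority pending tasks, earliest due date first
query = {'priority': 'High', 'status': 'Pending'}
for task in tasks.find(query).sort('dueDate', 1):
    print(task['title'], task['dueDate'])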
OUTPUT:
Tasks with priority High: [
  {
    _id: 607c87122bb8541b90831a68,
    title: 'Complete project proposal',
    description: 'Write a detailed proposal for the upcoming project.',
    dueDate: 2024-05-10T00:00:00.000Z,
    priority: 'High',
    status: 'Pending',
    userId: 607c86fd2bb8541b90831a67
  },
  {
    _id: 607c872f2bb8541b90831a69,
    title: 'Review code changes',
    description: 'Review and provide feedback on the latest code changes.',
    dueDate: 2024-05-15T00:00:00.000Z,
    priority: 'High',
    status: 'Pending',
    userId: 607c86fd2bb8541b90831a67
  }
]
RESULT:
Thus, the experiment to query the designed system using MongoDB has been executed and verified successfully.
Exp.no: 4
Date:
CREATE AN EVENT STREAM WITH APACHE KAFKA
AIM:
To create an event stream with Apache Kafka.
PROCEDURE:
Step 1 : Install Apache Kafka
Download and install Apache Kafka from the official website: https://kafka.apache.org/downloads
Follow the installation instructions provided in the documentation for your operating system.
Step 2: Start ZooKeeper
Kafka depends on ZooKeeper for coordination. Start ZooKeeper by running the following command in the Kafka
installation directory:
bin/zookeeper-server-start.sh config/zookeeper.properties
Step 3: Start Kafka Server
Start the Kafka server by running the following command in the Kafka installation directory:
bin/kafka-server-start.sh config/server.properties
Step 4: Create a Topic
Create a Kafka topic to represent your event stream. Topics are used to categorize events. Run the following
command to create a topic named "events":
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic events
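To confirm that the topic was created, you can list the topics on the broker:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092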
Step 5: Produce Events
Write a Kafka producer application to publish events to the "events" topic. Here's an example using Python and
the confluent_kafka library:
from confluent_kafka import Producer

def delivery_callback(err, msg):
    if err:
        print('Message delivery failed:', err)
    else:
        print('Message delivered to', msg.topic())

p = Producer({'bootstrap.servers': 'localhost:9092'})

# Produce events
for i in range(10):
    p.produce('events', f'Event {i}', callback=delivery_callback)

p.flush()
Step 6: Consume Events
Write a Kafka consumer application to subscribe to the "events" topic and process the events. Here's an example
using Python and the confluent_kafka library:
from confluent_kafka import Consumer, KafkaError, KafkaException

c = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'my_consumer_group',
    'auto.offset.reset': 'earliest'
})

c.subscribe(['events'])

try:
    while True:
        msg = c.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                # End of partition
                print('%% %s [%d] reached end at offset %d\n' %
                      (msg.topic(), msg.partition(), msg.offset()))
            else:
                raise KafkaException(msg.error())
        else:
            print('Received message: {}'.format(msg.value().decode('utf-8')))
except KeyboardInterrupt:
    pass
finally:
    # Leave group and commit final offsets
    c.close()
Step 7: Run producer and consumer
Run the producer application to publish events to the "events" topic.
Run the consumer application to subscribe to the "events" topic and consume events.
OUTPUT:
RESULT:
Thus, the experiment to create an event stream with Apache Kafka has been executed and verified successfully.
Ex No:5
Date:
Create a Real-time Stream Processing Application using Spark Streaming
Aim:
To create a Real-Time Stream processing application using Spark Streaming.
Procedure:
Set up Apache Spark:
Ensure you have Apache Spark installed and configured on your system. You can download it from the
official Apache Spark website and follow the installation instructions provided there.
Choose a Streaming Source:
Determine the source of your streaming data. Common sources include Apache Kafka, Apache Flume,
Kinesis, TCP sockets, or even files in a directory that are continuously updated.
Initialize SparkContext and StreamingContext:
In your Python script, import the necessary modules from PySpark and initialize a SparkContext and
StreamingContext.
Create DStream:
Define a DStream (discretized stream) by connecting to the streaming source.
Define Transformations and Actions:
Apply transformations and actions to the DStream to process the data. This could include operations
like flatMap, map, reduceByKey, etc.
Output the Result:
Decide what to do with the processed data. You can print it to the console, save it to a file, push it to
another system, or perform further analysis.
Program:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
# Create a local StreamingContext with two working threads and batch interval of 1 second
sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)
# Create a DStream connected to hostname:port
lines = ssc.socketTextStream("localhost", 9999)
# Split each line into words
words = lines.flatMap(lambda line: line.split(" "))
# Count each word in each batch
word_pairs = words.map(lambda word: (word, 1))
word_counts = word_pairs.reduceByKey(lambda x, y: x + y)
# Print the result to console
word_counts.pprint()
ssc.start() # Start the computation
ssc.awaitTermination() # Wait for the computation to terminate
The same application can also be written with the newer Structured Streaming API:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
from pyspark.sql.functions import split
spark = SparkSession \
    .builder \
    .appName("StructuredNetworkWordCount") \
    .getOrCreate()
# Create DataFrame representing data in the stream
lines = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9999) \
    .load()
# Split the lines into words
words = lines.select(
    explode(
        split(lines.value, " ")
    ).alias("word")
)
# Generate word count
wordCounts = words.groupBy("word").count()
# Start running the query that prints the running counts to the console
query = wordCounts \
.writeStream \
.outputMode("complete") \
.format("console") \
.start()
query.awaitTermination()
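To test either version locally, start a simple text source in a separate terminal with netcat (nc -lk 9999), run the script with spark-submit, and type words into the netcat terminal; the word counts for each batch appear on the Spark console.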
Output:
Result:
Thus, we have successfully created a Real-time Stream processing application using Spark Streaming.
Ex No:6
Date:
Build a Micro-batch application
Aim:
To build a Micro-batch application for a telephone system.
Procedure:
Clearly outline the requirements for your micro-batch application.
Select the appropriate technologies for your application based on your requirements and preferences.
For example, you might choose Python with libraries like SQLAlchemy for database access and Pandas
for data manipulation, or Java with Spring Batch framework.
Ensure that you have access to the data source where your telephone call records are stored.
This could be a relational database, a NoSQL database, or any other data storage system.
Design a data model that represents the structure of your telephone call records.
Write code to connect to the data source and retrieve call records in batches.
Implement logic to process each batch of call records.
This could involve calculations, aggregations, filtering, or any other data manipulation operations.
Write the calculated statistics to an output destination.
This could be a database table, a file, a message queue, or any other suitable output method.
Program:
from sqlalchemy import create_engine, select, Column, Integer, String, DateTime
from sqlalchemy.orm import sessionmaker, declarative_base
from datetime import datetime, timedelta
from collections import defaultdict

Base = declarative_base()

# Assumed ORM model for the call records table
class CallRecord(Base):
    __tablename__ = 'call_records'
    id = Column(Integer, primary_key=True)
    caller_number = Column(String(20))
    call_start_time = Column(DateTime)
    call_end_time = Column(DateTime)

# Define SQLAlchemy engine
engine = create_engine('mysql://username:password@localhost/telephone_system')

# Define SQLAlchemy session
Session = sessionmaker(bind=engine)
session = Session()

# Function to process call records in batches
def process_call_records(batch_size):
    # Query call records in batches
    offset = 0
    while True:
        records = session.execute(select(CallRecord).offset(offset).limit(batch_size)).scalars().all()
        if not records:
            break
        # Process batch of call records
        process_batch(records)
        offset += batch_size

# Function to process a batch of call records
def process_batch(records):
    user_call_duration = defaultdict(lambda: defaultdict(int))
    for record in records:
        caller_number = record.caller_number
        call_start_time = record.call_start_time
        call_end_time = record.call_end_time
        call_duration = (call_end_time - call_start_time).total_seconds()
        # Calculate total call duration per user per day
        user_call_duration[caller_number][call_start_time.date()] += call_duration
    # Write statistics to output (e.g., another table or file)
    write_statistics(user_call_duration)

# Function to write statistics to output
def write_statistics(user_call_duration):
    for user, durations_per_day in user_call_duration.items():
        for date, total_duration in durations_per_day.items():
            print(f"User: {user}, Date: {date}, Total Duration: {total_duration}")

# Call the function to process call records in batches
process_call_records(batch_size=1000)

# Close the session
session.close()
Output:
Result:
Thus, we have successfully built a Micro-batch application for the telephone system.
Ex No:7
Date:
Real-time Fraud and Anomaly Detection
Aim:
To write a program for Real-time Fraud and Anomaly Detection.
Procedure:
Database Setup:
Install MongoDB: Download and install MongoDB from the official website
(https://www.mongodb.com/try/download/community).
Start MongoDB: Start the MongoDB service using the appropriate command for your operating system.
Access MongoDB Shell: Access the MongoDB shell to create a database and collection(s) for storing
transaction data.
Data Ingestion:
Establish a data pipeline to ingest real-time transaction data into MongoDB. This can be done using
various methods such as MongoDB Change Streams, messaging queues (e.g., Kafka), or directly through
API integration with transaction systems.
Ensure that each transaction record includes relevant information such as timestamp, transaction amount,
user ID, transaction type, etc.
Write scripts or applications to continuously insert incoming transaction data into the MongoDB
collection.
Real-time Processing:
Implement real-time processing logic to analyze incoming transactions for anomalies and fraudulent
patterns.
Use MongoDB Aggregation Pipeline to perform real-time aggregation, filtering, and analysis of
transaction data.
Anomaly Detection:
Develop algorithms or rules for detecting anomalies based on transaction attributes, historical patterns,
user behavior, etc.
Define thresholds or rules for identifying suspicious transactions, such as unusually large amounts,
frequent transactions within a short time, transactions from unusual locations, etc.
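The aggregation step mentioned above can be sketched with a pipeline like the following (the field names user_id and amount are assumptions based on the attributes listed earlier; the fraud_detection database matches the program below):
from datetime import datetime, timedelta
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
transactions = client['fraud_detection']['transactions']

# Count and total transactions per user over the last hour, then keep
# only users above an assumed frequency threshold (more than 5)
pipeline = [
    {'$match': {'timestamp': {'$gte': datetime.now() - timedelta(hours=1)}}},
    {'$group': {'_id': '$user_id', 'tx_count': {'$sum': 1}, 'total': {'$sum': '$amount'}}},
    {'$match': {'tx_count': {'$gt': 5}}},
]
for suspicious in transactions.aggregate(pipeline):
    print('Suspicious activity:', suspicious)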
Program:
from pymongo import MongoClient
from datetime import datetime, timedelta

# Connect to MongoDB
client = MongoClient('localhost', 27017)
db = client['fraud_detection']
transactions_collection = db['transactions']

def record_transaction(transaction):
    """Record a transaction in the MongoDB database."""
    transactions_collection.insert_one(transaction)

def detect_anomalies():
    """Detect anomalies in transactions."""
    # Define time window for detecting anomalies (e.g., last 24 hours)
    window_start = datetime.now() - timedelta(hours=24)
    # Query transactions within the time window
    transactions = transactions_collection.find({"timestamp": {"$gte": window_start}})
    for transaction in transactions:
        # Implement your anomaly detection algorithm here
        # For demonstration purposes, any transaction amount above $1000 is considered an anomaly
        if transaction['amount'] > 1000:
            print("Anomaly detected: ", transaction)

if __name__ == "__main__":
    # Simulate transaction data (replace with your real-time data source)
    transactions = [
        {"timestamp": datetime.now(), "amount": 500},
        {"timestamp": datetime.now() - timedelta(hours=12), "amount": 1500},
        {"timestamp": datetime.now() - timedelta(hours=20), "amount": 700},
        {"timestamp": datetime.now() - timedelta(hours=3), "amount": 1200},
    ]
    # Record transactions in MongoDB
    for transaction in transactions:
        record_transaction(transaction)
    # Detect anomalies
    detect_anomalies()
Output:
Anomaly detected: {'_id': ObjectId('609dc45cb127f47b9d18d274'), 'timestamp': datetime.datetime(2024, 5,
1, 3, 58, 52, 985747), 'amount': 1500}
Anomaly detected: {'_id': ObjectId('609dc45cb127f47b9d18d275'), 'timestamp': datetime.datetime(2024, 4,
30, 9, 58, 52, 985805), 'amount': 1200}
Result:
Thus, we have successfully built a Real-time Fraud and Anomaly Detection system.
Ex No:8
Date:
Real-time Personalization, Marketing and Advertising
Aim:
To write a program for Real-time Personalization, Marketing and Advertising.
Procedure:
Design your MongoDB schema to efficiently store and retrieve this data. Consider using collections for
users, products, campaigns, and events.
Use MongoDB Change Streams to listen for changes in relevant collections. Change Streams allow you
to subscribe to real-time data changes in the database.
Set up triggers to react to changes in user behavior, product updates, or campaign statuses. For example,
when a user makes a purchase, update their profile and trigger relevant marketing actions.
Use MongoDB's aggregation framework to perform real-time analytics on user data.
Use MongoDB to store marketing campaign data, such as email templates, audience segments, and
campaign performance metrics.
Integrate with advertising platforms like Google Ads or Facebook Ads to create targeted advertising
campaigns.
Use MongoDB to store ad creative assets, targeting criteria, and campaign performance data.
Utilize real-time user data to dynamically adjust ad targeting or creative content.
Optimize your MongoDB queries and indexes for performance, especially for real-time analytics and
personalization.
Use MongoDB's built-in tools or third-party monitoring solutions to monitor database metrics, query
performance, and resource utilization.
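As a minimal sketch of the Change Streams step above (assuming MongoDB runs as a replica set, which change streams require, and the marketing database used in the program below):
from pymongo import MongoClient

# Change streams require a replica set; this connection string is an assumption
client = MongoClient('mongodb://localhost:27017/?replicaSet=rs0')
db = client['marketing']

# An index on user_id supports the per-user lookups mentioned in the
# optimization step; creating it repeatedly is harmless
db['events'].create_index('user_id')

# React in real time to new documents in the events collection
with db['events'].watch([{'$match': {'operationType': 'insert'}}]) as stream:
    for change in stream:
        event = change['fullDocument']
        print('New event for user', event.get('user_id'), ':', event.get('type'))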
Program:
import time
import datetime
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['marketing']

# Define collections
users = db['users']
products = db['products']
campaigns = db['campaigns']
events = db['events']

# Function to update user profile
def update_user_profile(user_id, data):
    users.update_one({'_id': user_id}, {'$set': data}, upsert=True)

# Function to track user events
def track_event(user_id, event_type, event_data):
    event = {
        'user_id': user_id,
        'type': event_type,
        'data': event_data,
        'timestamp': datetime.datetime.utcnow()
    }
    events.insert_one(event)

# Function to retrieve personalized recommendations for a user
def get_personalized_recommendations(user_id):
    # Your recommendation algorithm implementation here
    # This can involve querying the user's past behavior, preferences, etc.
    # For simplicity, just return a few products for now
    return list(products.find().limit(5))

# Function to send personalized marketing email
def send_personalized_email(user_id, subject, body):
    # Your email sending implementation here
    print(f"Email sent to user {user_id}: Subject - {subject}, Body - {body}")

# Simulate user activity
def simulate_user_activity():
    # Simulate user behavior
    user_id = 123
    product_id = 456
    track_event(user_id, 'view_product', {'product_id': product_id})

    # Update user profile
    update_user_profile(user_id, {'last_activity': datetime.datetime.utcnow()})

    # Get personalized recommendations
    recommendations = get_personalized_recommendations(user_id)
    print("Personalized Recommendations:", recommendations)

    # Send personalized marketing email
    send_personalized_email(user_id, 'Check out our latest products!', '...')

# Main program
if __name__ == "__main__":
    # Simulate user activity every 10 seconds
    while True:
        simulate_user_activity()
        # Sleep for 10 seconds
        time.sleep(10)
Output:
Personalized Recommendations: [{'_id': 1, 'name': 'Product A', 'price': 100}, {'_id': 2, 'name': 'Product B',
'price': 150}, {'_id': 3, 'name': 'Product C', 'price': 200}, {'_id': 4, 'name': 'Product D', 'price': 120}, {'_id': 5,
'name': 'Product E', 'price': 180}]
Email sent to user 123: Subject - Check out our latest products!, Body - ...
Result:
Thus, we have successfully built a model for Real-time Personalization, Marketing and Advertising.