
CS-3001 Computer Networks

BSDS – Fall 2024

Assignment 2

Deadline: Sunday, October 13, 2024, 11:59 pm

Submission Instructions:

• This is a group assignment. Each group can have a maximum of 2 students.
• Create a single ZIP file containing all deliverables.
• Name your ZIP file as cs-3001_asg2_XXi-XXXX_YYi-YYYY.zip, replacing the Xs and Ys with
your respective roll numbers.
• Submit the ZIP file on Google Classroom within the specified deadline.
• Submissions other than Google Classroom will not be accepted.
• Correct and timely submission of the assignment is the responsibility of every student;
hence no relaxation will be given to anyone.
• Plagiarism is strictly prohibited. If any part of your work is found to be plagiarized, you will
receive zero marks for the entire assignment category.

Federated Learning for Linear Regression


Federated learning is a distributed approach to machine learning where data remains
decentralized, and models are trained locally on edge devices. Instead of aggregating raw data to
a central server, the local devices (or clients) train machine learning models independently on
their own datasets. After training, they share only the model parameters with the central server,
which aggregates these parameters to create a global model. This framework is designed to
preserve data privacy while enabling collaborative learning across multiple devices. Federated
learning is widely used in industries like healthcare, mobile technologies, and finance, where
privacy and security are paramount.

In this assignment, you will implement federated learning for a linear regression model with a
single independent variable using C/C++ client-server processes. You will learn how to handle
distributed machine learning, model parameter exchange between clients and server, and model
aggregation.

Centralized Linear Regression

You will first implement a standard linear regression model in C/C++ using a single dataset. You
will combine the nine separate training subsets of the provided dataset and train the model using
gradient descent. After training the model, you will evaluate its performance on the test subset
by calculating the root mean squared error (RMSE).
Steps:

• Combine the nine subsets into a single training set.
• Train a linear regression model with the following form:
$y = wx + b$
Where w is the weight, b is the bias, x is the number of hours studied, and y is the
predicted performance index.
• Evaluate the model's performance on the tenth subset (test set) using RMSE:
$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$

• Output the RMSE score of the model on the test set.
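
The training and evaluation steps above could be sketched roughly as follows. This is only an illustration: the names LinearModel, train_gd, and rmse are hypothetical, the loss is assumed to be mean squared error, and the learning rate and epoch count are placeholder values that the assignment does not specify and that would need tuning on the real data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Parameters of the model y = w*x + b.
struct LinearModel {
    double w = 0.0;
    double b = 0.0;
};

// Batch gradient descent on mean squared error.
// learning_rate and epochs are illustrative defaults, not assignment values.
LinearModel train_gd(const std::vector<double>& x, const std::vector<double>& y,
                     double learning_rate = 0.01, int epochs = 5000) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    LinearModel m;
    for (int e = 0; e < epochs; ++e) {
        double grad_w = 0.0, grad_b = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double err = (m.w * x[i] + m.b) - y[i];  // prediction minus target
            grad_w += err * x[i];
            grad_b += err;
        }
        // d(MSE)/dw = (2/N) * sum(err * x), d(MSE)/db = (2/N) * sum(err)
        m.w -= learning_rate * (2.0 / n) * grad_w;
        m.b -= learning_rate * (2.0 / n) * grad_b;
    }
    return m;
}

// Root mean squared error of the model on (x, y).
double rmse(const LinearModel& m, const std::vector<double>& x,
            const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double diff = y[i] - (m.w * x[i] + m.b);
        sum_sq += diff * diff;
    }
    return std::sqrt(sum_sq / static_cast<double>(x.size()));
}
```

A caller would pass the combined training vectors to train_gd and then the test vectors to rmse. If the loss diverges on the real data, a smaller learning rate is the first thing to try.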

Federated Linear Regression

In this part of the assignment, you will design a federated learning framework. This involves
developing both client and server programs in C/C++. The clients will each train a linear
regression model locally on one of the nine training subsets. Once training is complete, the
clients will send the trained model parameters (weight and bias) to the server. The server will
compute a weighted average of the received parameters to create a global model, which will then
be used to make predictions on the test set. The final performance of the model will be evaluated
using RMSE.

Client Program Steps:

• Take a subset of the dataset (text file) as input.
• Train a local linear regression model using the same form as described above. Use
gradient descent to optimize the parameters (weight and bias).
• Send the trained model parameters (weight and bias) to the server.
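
The assignment does not prescribe a wire format for the parameters, so the sketch below assumes one: a single text line holding the weight, the bias, and the number of local training rows. The names ModelUpdate, encode_update, and decode_update are hypothetical, and the sample count is an optional extra that the server may ignore.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// One client's trained parameters (field layout is an assumption).
struct ModelUpdate {
    double w = 0.0;
    double b = 0.0;
    unsigned long samples = 0;  // optional: number of local training rows
};

// Encodes an update as "w b samples\n". %.17g keeps enough digits for a
// double to survive the text round trip unchanged.
std::string encode_update(const ModelUpdate& u) {
    char buf[128];
    const int len = std::snprintf(buf, sizeof buf, "%.17g %.17g %lu\n",
                                  u.w, u.b, u.samples);
    assert(len > 0 && static_cast<std::size_t>(len) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(len));
}

// Parses a line produced by encode_update; returns false on malformed input.
bool decode_update(const std::string& line, ModelUpdate& out) {
    return std::sscanf(line.c_str(), "%lf %lf %lu",
                       &out.w, &out.b, &out.samples) == 3;
}
```

Assuming the client and server communicate over a connected TCP socket, the client could pass the encoded string to the POSIX send() call, and the server could read up to the newline before calling decode_update. A text format avoids byte-order and struct-padding issues between machines, at the cost of a few extra bytes.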

Server Program Steps:

• Wait to receive model parameters from all nine clients.
• Perform weighted averaging of the parameters:
$w_{global} = \frac{1}{9} \sum_{i=1}^{9} w_i, \quad b_{global} = \frac{1}{9} \sum_{i=1}^{9} b_i$
• Use the global model to predict performance on the test dataset and evaluate the results
using RMSE.
• Display the final RMSE for the global model.
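
The aggregation step could look something like the sketch below. average_equal follows the formula above, giving every client the same weight. average_by_samples is an assumption that goes beyond the handout: it weights each client by its number of training rows, which yields the same result whenever all subsets are the same size. All type and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parameters reported by one client; samples is its local row count.
struct ClientParams {
    double w;
    double b;
    std::size_t samples;
};

struct GlobalModel {
    double w;
    double b;
};

// Equal-weight mean: each of the K clients contributes 1/K (1/9 for nine clients).
GlobalModel average_equal(const std::vector<ClientParams>& clients) {
    assert(!clients.empty());
    GlobalModel g{0.0, 0.0};
    for (const ClientParams& c : clients) {
        g.w += c.w;
        g.b += c.b;
    }
    const double k = static_cast<double>(clients.size());
    g.w /= k;
    g.b /= k;
    return g;
}

// Sample-count-weighted mean (an assumption, not taken from the handout).
GlobalModel average_by_samples(const std::vector<ClientParams>& clients) {
    double total = 0.0;
    GlobalModel g{0.0, 0.0};
    for (const ClientParams& c : clients) {
        const double n = static_cast<double>(c.samples);
        g.w += n * c.w;
        g.b += n * c.b;
        total += n;
    }
    assert(total > 0.0);
    g.w /= total;
    g.b /= total;
    return g;
}
```

The server would call one of these only after all nine updates have arrived, then evaluate the resulting global model on the test subset.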

Dataset Description

You are provided with a dataset that contains 10000 records of students' performance and study
hours. It consists of ten subsets of data, of which nine will be used for training and the tenth
subset will be used as the test set to evaluate the performance of the trained model. Each subset
is provided as a text file containing multiple rows, with each row representing a single student
record. Each record consists of two values:
1. Study Hours: A floating-point number representing the number of hours the student
studied.
2. Performance Index: An integer representing the student's performance index, a score
ranging between 1 and 100 that reflects their overall academic performance.

Example:

SH, PI
3.5 78
5.0 85
2.2 60
7.1 92
4.3 75

The first row represents a student who studied for 3.5 hours and achieved a performance index of 78.
The second row shows a student who studied for 5.0 hours and had a performance index of 85, and so on.
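
Reading a subset in this layout might be done along the following lines. The sketch assumes each data row holds two whitespace-separated numbers and that a header such as "SH, PI" may or may not be present, so any row that does not start with two numbers is skipped. Tolerating comma separators is also an assumption. Dataset, read_records, and append are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Dataset {
    std::vector<double> hours;  // study hours (x)
    std::vector<double> score;  // performance index (y)
};

// Reads "hours score" rows from a stream. Rows that do not begin with two
// numbers (for example a "SH, PI" header or a blank line) are skipped.
Dataset read_records(std::istream& in) {
    Dataset d;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate comma-separated rows as well (assumption).
        std::replace(line.begin(), line.end(), ',', ' ');
        std::istringstream row(line);
        double h = 0.0, s = 0.0;
        if (row >> h >> s) {
            d.hours.push_back(h);
            d.score.push_back(s);
        }
    }
    assert(d.hours.size() == d.score.size());
    return d;
}

// Appends one subset to another, e.g. to build the combined training set.
void append(Dataset& into, const Dataset& part) {
    into.hours.insert(into.hours.end(), part.hours.begin(), part.hours.end());
    into.score.insert(into.score.end(), part.score.begin(), part.score.end());
}
```

Because read_records takes a std::istream, it works with a std::ifstream opened on a subset file as well as with an in-memory std::istringstream for quick checks. Calling append once per training subset produces the combined set used by the centralized model.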

Deliverables

• [50 marks] C/C++ code for both the centralized model and the federated learning model.
• [25 marks] A brief report, in PDF format, explaining the design and implementation of
both models with screenshots of code and output. The report should also include a
comparison of the two models and their results.
• [25 marks] A 10-minute screen recording demonstrating the functionality of your
programs. Explain the code and provide a demo of the client-server federated learning
process, including the final RMSE results. Each group member should explain their
individual contribution. Upload the video on Google Drive and add its link to the report.

Note: You will get zero marks in the whole assignment if any of the deliverables are missing.
