0% found this document useful (0 votes)
12 views20 pages

Si 2

Uploaded by

ratnalasaiganesh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views20 pages

Si 2

Uploaded by

ratnalasaiganesh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

A

Summer Internship-II
Report on
“Predicting Student Grades Using Multinomial Logistic Regression”

Submitted in Partial Fulfillment of the Requirements


for the award of the degree of

Bachelor of Technology
in
Electronics & Computer Engineering (ECM)
by
RATNALA SAI GANESH
21311A1976
B. Tech IV-Year I - Sem
Under the Guidance / Supervision of
Mrs. LATHA MADURI
Assistant Professor
Dept. of ECM

Department of Electronics & Computer Engineering


Sreenidhi Institute of Science & Technology
(Autonomous)
2024-2025

i
DEPARTMENT OF ELECTRONICS & COMPUTER ENGINEERING
SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY
(AUTONOMOUS)

CERTIFICATE

This is to certify that the Summer Industry Internship entitled “DATASCIENCE MASTER
VIRTUAL INTERNSHIP” being submitted by RATNALA SAI GANESH 21311A1976 in
partial fulfilment for the award of Bachelor of Technology degree in Electronics & Computer
Engineering to Sreenidhi Institute of Science and Technology, Yamnampet, Ghatkesar
Telangana, is a report of review work carried out by his/her during the academic year 2024 - 2025
under our guidance and supervision.

Mrs. LATHA MADURI DR. D. MOHAN


ASSISTANT PROFESSOR, HOD, PROFESSOR, ECM
ECM Dept. ECM Dept.

ii
iii
DECLARATION

This is to certify that the work reported in the present summer internship project titled
“Predicting Student Grades Using Multinomial Logistic Regression” is a record of work
done by me in the Department of Electronics and Computer Engineering, Sreenidhi
Institute of Science and Technology, Yamnampet, Ghatkesar.

The report is based on the internship done entirely by me and not copied from any other source.

RATNALA SAI GANESH


21311A1976

iv
ACKNOWLEDGMENT

I convey my sincere thanks to Dr T. Ch. Siva Reddy, Principal, Sreenidhi Institute of Science and
Technology, Ghatkesar for providing resources to complete this internship.

I am very thankful to Dr D. Mohan, Head of the ECM Department, Sreenidhi Institute of Science
and Technology, Ghatkesar for providing an initiative to this project and giving valuable timely
suggestions over my project work and for their kind cooperation in the completion of the internship.

I convey my sincere thanks to Mrs. Latha Maduri, Assistant Professor ECM Department and all
the faculties of the ECM department, Sreenidhi Institute of Science and Technology, for their
continuous help, cooperation, and support in completing this internship.

Finally, I extend my sense of gratitude to the almighty, my parents, all my friends and teaching and non-
teaching staff, who directly or indirectly helped us in this endeavour.

RATNALA SAI GANESH


21311A1976

v
ABSTRACT

Predicting student academic performance is crucial for identifying at-risk students and designing targeted
interventions to enhance educational outcomes. This research explores the application of Multinomial
Logistic Regression (MLR) to predict student grades across multiple categories. Leveraging historical
academic data, MLR provides a probabilistic framework to model the relationship between categorical
outcomes (grades) and predictor variables, such as demographic information, attendance records, socio-
economic background, and prior academic achievements. The study evaluates the effectiveness of MLR in
handling non-linear relationships and categorical grade outcomes while emphasizing its interpretability in
educational contexts.

The analysis highlights key predictors of academic performance, such as parental education level, study
habits, and teacher-student interactions. The model achieves robust classification accuracy and demonstrates
scalability to larger datasets, making it suitable for integration into institutional learning management
systems. Comparative performance metrics against alternative machine learning models underscore the
simplicity and efficiency of MLR for grade prediction.

The findings suggest that MLR can serve as a valuable decision-support tool for educators and
administrators, enabling data-driven strategies to improve learning experiences. Future work will explore
incorporating more dynamic variables, such as emotional well-being and co-curricular participation, to
enhance predictive accuracy.

Keywords: Multinomial Logistic Regression, Student Grades, Academic Performance Prediction,


Educational Data Mining, Predictive Analytics, Data-Driven Education

vi
INDEX

S.NO TITLE PAGE


NUMBER

1 INTRODUCTION 1

2 EXISTING SYSTEM 2

3 PROPSED SYSTEM 3

4 SOURCE CODE 4-5

5 EXPLANATION 6

6 ADVANTAGES 7

7 DISADVANTAGES 8

8 RESULTS 9-10

9 CONCLUSION AND FUTURESCOPE 11

10 BIBLIOGRAPHY 12

vii
LIST OF FIGURES

S.NO TITLE PAGE NUMBER

2.1 EXISTING SYSTEM 2

3.1 PROPOSED SYSTEM 3

8.1 LINE GRAPH 9

8.2 BAR GRAPH 10

viii
1. INTRODUCTION

The prediction of student academic performance has become a pivotal area of research in the domain of
educational data mining and analytics. Educational institutions increasingly rely on data-driven insights to
identify at-risk students, optimize learning strategies, and enhance overall academic outcomes. Among
various statistical and machine learning methods, Multinomial Logistic Regression (MLR) stands out as an
effective and interpretable approach for predicting categorical outcomes, such as grades.

Academic performance is influenced by a complex interplay of factors, including demographic attributes,


socio-economic background, attendance, prior academic records, and behavioral patterns. The capability to
accurately model these factors and predict performance can aid educators and administrators in providing
timely interventions, customizing learning experiences, and improving the quality of education delivery.

MLR is particularly well-suited for predicting academic performance since it handles multi-class
classification problems where the dependent variable has more than two categories. Unlike binary logistic
regression, MLR models the probabilities of multiple outcomes simultaneously, offering a comprehensive
understanding of how predictor variables influence different grade categories. Moreover, its probabilistic
nature allows stakeholders to interpret the model outcomes with clarity, making it a preferred choice in
educational settings.

This study focuses on applying MLR to predict student grades using historical academic and demographic
data. By analyzing these predictors, the research aims to highlight the effectiveness of MLR in educational
applications and demonstrate how it can facilitate proactive decision-making. Additionally, the study
compares MLR with other predictive models to underscore its simplicity, efficiency, and accuracy in multi-
class classification problems [1].

1
2. EXISTING SYSTEM

The prediction of student grades has traditionally relied on rule-based approaches and heuristic methods,
which often lack scalability and accuracy. These systems typically utilize fixed thresholds for assessing
performance indicators, such as attendance rates or test scores, to categorize students into predefined groups
(e.g., pass/fail). While easy to implement, such systems fail to account for the complexity and non-linearity
inherent in real-world academic data.

Recent advancements in educational data mining and machine learning have introduced sophisticated
methods for predicting student performance. These methods include decision trees, support vector machines
(SVM), and ensemble learning algorithms like random forests and gradient boosting. Although these
approaches provide higher predictive accuracy, they often lack interpretability, making it challenging for
educators to extract actionable insights.

Multinomial Logistic Regression (MLR) offers a balanced alternative, combining predictive accuracy with
ease of interpretation. Unlike traditional linear regression models, MLR handles categorical dependent
variables with multiple classes (e.g., grades such as A, B, and C). It assigns probabilities to each class based
on predictor variables such as demographics, socio-economic status, prior academic performance, and
behavioral traits. Additionally, MLR models are computationally efficient and can be easily integrated into
existing educational frameworks.

Existing systems leveraging MLR and similar approaches often incorporate data visualization tools to
provide educators with actionable dashboards. However, limitations persist, such as insufficient
incorporation of real-time data (e.g., emotional well-being or extracurricular activities) and difficulties in
handling missing or inconsistent data. Despite these challenges, MLR remains a valuable tool for academic
performance prediction due to its balance of simplicity and effectiveness. [2]

Fig 2.1 Existing Model


2
3. PROPOSED SYSTEM

The proposed system builds upon the limitations of existing systems by implementing a robust framework
for predicting student grades using Multinomial Logistic Regression (MLR), enhanced with efficient data
processing and real-time prediction capabilities. This system focuses on improving accuracy, scalability,
and interpretability, ensuring it meets the needs of educational institutions.
1. Data Collection: Raw data, including academic records, attendance, socio-economic details, and
behavioral patterns, is stored in a centralized database.
2. Data Preprocessing: A Data Preprocessing Module cleanses and standardizes the data to handle
missing values, inconsistencies, and outliers. Feature scaling and encoding techniques are applied to
make the data suitable for MLR.
3. Model Training: The Model Training Module employs MLR as the primary algorithm for multi-
class classification, categorizing grades into predefined levels (e.g., A, B, C). The model is trained
iteratively, optimizing for accuracy and computational efficiency. Comparative models (e.g., SVM,
Random Forest) are also tested to validate the superiority of MLR for this application.
4. Classification and Prediction: Upon receiving a Student Request via the Live Server the system
utilizes the trained MLR model to predict the student’s grade. The prediction probabilities are also
provided, allowing educators to assess the likelihood of each grade category.
5. User Interface: An intuitive User Interface enables students, teachers, and administrators to access
the system. Predicted grades are presented alongside visual insights, such as factor importance and
recommendations for improvement.
This proposed system not only improves prediction accuracy but also empowers stakeholders to take
proactive steps for student performance enhancement. [3]

Fig 3.1: Proposed System


3
4. SOURCE CODE

# Load necessary libraries


library(ggplot2)
library(nnet) # For multinomial logistic regression# Simulate
data for 10 students
set.seed(42)
students_data <- data.frame(
Marks = c(80, 90, 60, 45, 85, 70, 92, 50, 76, 65), # Marks out of 100
Attendance = c(95, 80, 70, 60, 90, 85, 88, 75, 80, 72), # Attendance percentage
Study_Time = c(8, 9, 6, 4, 10, 7, 9, 5, 6, 8), # Hours of study per week
Syllabus_Covered = c(90, 85, 75, 60, 80, 95, 92, 70, 77, 80) # Percentage of syllabuscovered
)
# Assign grades based on Marks
students_data$Grade <- factor(ifelse(students_data$Marks >= 85, 'A',
ifelse(students_data$Marks >= 70, 'B', ifelse(students_data$Marks >= 50, 'C', 'D'))),
levels = c('D', 'C', 'B', 'A'))
# View the data
print(students_data)
# Step 2: Train a model (Multinomial Logistic Regression using 'nnet' package)
model <- multinom(Grade ~ Marks + Attendance + Study_Time + Syllabus_Covered,data =
students_data)
# Step 3: Predict Grades
predictions <- predict(model, newdata = students_data)# Show
predictions
print("Predicted Grades:")
print(predictions)
# Step 4: Visualizations
# 1. Bar plot of Predicted Grade Distribution
ggplot(data = data.frame(Predicted_Grade = predictions), aes(x = Predicted_Grade)) +
geom_bar(fill = 'skyblue', color = 'black') +
labs(title = "Predicted Grade Distribution", x = "Grade", y = "Count") +theme_minimal()

4
# 2. Scatter plot of Marks vs Predicted Grade ggplot(students_data, aes(x =
Marks, y = Grade, color = Grade)) +geom_point(size = 4) +
labs(title = "Marks vs Predicted Grade", x = "Marks", y = "Predicted Grade") +
theme_minimal()

# 3. Line plot of Syllabus Covered vs Study Time ggplot(students_data, aes(x =


Syllabus_Covered, y = Study_Time)) +geom_line(color = 'blue') +
geom_point(size = 4, color = 'red') +
labs(title = "Syllabus Covered vs Study Time", x = "Syllabus Covered (%)", y = "StudyTime
(hours/week)") + theme_minimal()

5
5. EXPLANATION
o Step 1: Load Libraries:
- The script begins by loading the required libraries. `ggplot2` is imported for creating
visualizations, while the `nnet` package is used for training a multinomial logistic regression
model. These libraries provide the tools to handle data analysis and visualization seamlessly.
The `set.seed(42)` function is used to ensure reproducibility of results during data simulation.
o Step 2: Simulate Data
- A synthetic dataset is created to mimic student academic performance. The dataset includes
four key features: Marks, Attendance, Study_Time, and Syllabus Covered, representing
different factors influencing grades. A new column, Grade, is generated based on a logical
categorization of marks:
 'A' for marks ≥ 85,
 'B' for 70 ≤ marks < 85,
 'C' for 50 ≤ marks < 70,
 'D' for marks < 50.
- The `factor()` function ensures the grades are ordered, which is crucial for model training.
o Step 3: Train a Multinomial Logistic Regression Model
- The multinomial logistic regression model is built using the `multinom()` function, where the
Grade column is the dependent variable, and Marks , Attendance , Study_Time , and
Syllabus_Covered are the independent variables. This step enables the model to learn the
relationship between student performance metrics and their corresponding grades.
o Step 4: Make Predictions
- Using the trained model, predictions are made on the same dataset. The `predict()` function is
utilized to generate grade predictions for each student, which are then printed for review.
These predictions represent the system's assessment of each student's likely grade based on
the input features.
o Step 5: Visualizations
- Three visualizations are created to analyze and interpret the results:
 Bar Plot : Displays the distribution of predicted grades, highlighting the number of
students in each grade category.
 Scatter Plot : Shows the relationship between Marks and Predicted Grade ,
providing insights into how marks influence the assigned grade.
 Line Plot : Examines the correlation between Syllabus Covered and Study Time ,
emphasizing trends in how study efforts align with syllabus completion.[4]

6
6 . ADVANTAGES OF PROPOSED SYSTEM

1. Improved Prediction Accuracy: By leveraging Multinomial Logistic Regression (MLR), the


system effectively handles multi-class grade predictions with higher accuracy compared to
traditional methods. It captures the relationships between predictor variables and multiple grade
categories comprehensively.
2. Interpretability: MLR provides probabilities for each grade category, making the predictions easy
to interpret for educators and administrators. This transparency aids in better decision-making and
fosters trust in the system.
3. Real-Time Predictions: The system processes student data dynamically through the Live Server,
ensuring real-time grade prediction and immediate feedback. This feature enables prompt
interventions for students requiring additional support.
4. Scalability: Designed to handle large datasets, the system can process and predict grades for
institutions with thousands of students without significant performance degradation.
5. Data-Driven Insights: The integration of visualization tools provides actionable insights into key
factors affecting student performance. Educators can identify trends and tailor strategies to enhance
academic outcomes.
6. Comprehensive Data Utilization: By combining academic records, socio-economic factors,
attendance, and behavioral traits, the system ensures a holistic approach to performance prediction.
7. User-Friendly Interface: The intuitive interface makes it accessible for students, teachers, and
administrators, enabling them to view predictions, analyze influencing factors, and take necessary
actions.
8. Efficient Data Processing: The Data Preprocessing Module handles missing and inconsistent data
effectively, ensuring the integrity and quality of the input used for predictions.
9. Comparison with Other Models: The inclusion of comparative algorithms (e.g., SVM, Random
Forest) ensures that MLR is validated as the most suitable approach, combining efficiency with
simplicity.
10. Educational Intervention Support: The system highlights students at risk, enabling timely
interventions, personalized learning plans, and resource allocation.

This system is a step forward in modernizing educational analytics, offering a practical, interpretable, and
scalable solution for academic performance prediction.

7
7. DISADVANTAGES OF PROPSED SYSTEM

1. Dependence on Data Quality: The system's accuracy is heavily reliant on the quality of input data.
Missing, inconsistent, or biased data can lead to inaccurate predictions, limiting its effectiveness.
2. Limited Handling of Real-Time Behavioral Data: While the system uses historical data
effectively, it may not fully incorporate real-time behavioral factors such as emotional well-being or
sudden changes in academic performance, which can significantly influence grades.
3. Complexity in Feature Selection: Identifying and selecting the most relevant features for
Multinomial Logistic Regression can be challenging. Irrelevant or redundant features may affect the
model’s performance and interpretation.
4. Difficulty with High-Dimensional Data: Although MLR is efficient for small to medium-sized
datasets, its performance may degrade when dealing with high-dimensional data, as it requires
significant computational resources.
5. Limited Generalizability: The system is trained on specific datasets, which may limit its
applicability to institutions with vastly different academic structures, grading systems, or student
demographics.
6. Overfitting Risk: The model may overfit the training data, especially if it is not regularized
appropriately. This can reduce its ability to generalize to new, unseen data.
7. Static Nature of Models: Once trained, the model requires periodic updates with new data to
maintain accuracy. This manual retraining process can be time-consuming.
8. Inadequate for Complex Relationships: MLR assumes a linear relationship between predictor
variables and the log-odds of outcomes, which may not capture complex, non-linear dependencies in
student performance data.
9. Lack of Comprehensive Stakeholder Inputs: While predictions are useful, the system may lack
features to incorporate qualitative insights from educators, such as classroom observations or teacher
evaluations.
10. Privacy and Ethical Concerns: Collecting and processing sensitive student data, such as socio-
economic background or behavioral records, raises privacy and ethical issues. Robust security
measures are required to ensure data confidentiality.

Despite these limitations, the system’s design and implementation can be refined to overcome most
challenges, making it a reliable tool for academic performance prediction .

8
8. RESULTS & OUTPUTS

The dataset for the 10 students, after assigning grades based on the marks, looks like this:

No Marks Attendance Study_Time Syllabus_Covered Grade


1 80 95 8 90 B
2 90 80 9 85 A
3 60 70 6 75 C
4 45 60 4 60 D
5 85 90 10 80 A
6 70 85 7 95 B
7 92 88 9 92 A
8 50 75 5 70 C
9 76 80 6 77 B
10 65 72 8 80 C

Predicted Grades:
[1] B A C D A B A C B C
Levels: D C B A

Fig: 8.1. Line plot for Syllabus covered vs Study time

9
Fig: 8.2. Predicted Grade Distribution

10
9. CONCLUSION AND FUTURE SCOPE

8.1 CONCLUSION

In conclusion, data science is a powerful tool that is revolutionizing the way businesses operate
and make decisions. By uncovering transformative patterns, driving innovation, and enabling
real-time optimization, data science provides businesses with the insights needed to stay
competitive and achieve growth. It enhances decision-making through data-driven insights and
allows for personalized customer experiences, fostering greater satisfaction and loyalty. Whether
it’s improving operational efficiency, predicting trends, or creating new products, data science
has become an essential strategy for businesses across industries. As data continues to grow in
importance, companies that leverage data science effectively will be better equipped to adapt to
change, solve complex problems, and unlock new opportunities. Embracing data science not only
helps businesses thrive in today’s fast-paced environment but also ensures they are prepared for
future challenges.

8.2 FUTURE SCOPE

The future scope of data science is vast and continues to evolve as technology advances. Some
key areas where data science is expected to have a significant impact in the coming years include:

1. Artificial Intelligence (AI) and Machine Learning (ML) Integration: Data science will
continue to integrate with AI and ML, enabling smarter algorithms, automated decision-
making, and predictive analytics. Businesses will increasingly rely on these technologies to
enhance automation, optimize operations, and improve customer experiences.
2. Big Data and Real-Time Analytics: With the growing amount of data generated daily, the
ability to process and analyze large datasets in real time will become even more critical. Data
science will play a pivotal role in making sense of big data and providing actionable insights
instantaneously.
3. Natural Language Processing (NLP): As NLP technology advances, businesses will be able
to analyze and interpret vast amounts of text data more effectively. This includes applications
like sentiment analysis, chatbots, and automated content creation, improving customer
engagement and operational efficiency.

11
10. BIBLIOGRAPHY

[1] Provost, F., & Fawcett, T. (2013). Data Science and its Relationship to Big Data and Data-

Driven Decision Making. Big Data, 1(1), 51–59.

https://doi.org/10.1089/big.2013.1508

[2] Dhar, V. (2013). Data science and prediction. Communications of the ACM, 56(12), 64–73.

https://doi.org/10.1145/2500499

[3] Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: a

revolution that will transform supply chain design and management. Journal of Business

Logistics, 34(2), 77–84.

https://doi.org/10.1111/jbl.12010

[4] Agarwal, R., & Dhar, V. (2014). Editorial—Big Data, Data Science, and Analytics: The

Opportunity and Challenge for IS Research. Information Systems Research, 25(3), 443–448.

https://doi.org/10.1287/isre.2014.0546

[5] R for Data science. (n.d.). Google Books.

https://books.google.co.in/books?hl=en&lr=&id=TiLEEAAAQBAJ&oi=fnd&pg=PT9&dq=d

ata+science&ots=ZJr_gewVoO&sig=NT0XhXWI570DzWXu-

LXRGZVfLYQ&redir_esc=y#v=onepage&q=data%20science&f=false

12

You might also like