NIRF Documentationnn
NIRF Documentationnn
MACHINE LEARNING
Project Report
Bachelor of Technology
in
COMPUTER SCIENCE & ENGINEERING
by
P. GANESH GANGULY - 20L31A05I4
P. LOKESH NARAYANA - 20L31A05I9
MD. ZAKIR HUSSAIN - 20L31A05E8
N.K.S RAGHAVENDRA - 20L31A05G2
N.V LEKHENDRA - 20L31A05F6
1
VIGNAN’S INSTITUTE OF INFORMATION TECHNOLOGY(A)
Department of Computer Science & Engineering
CERTIFICATE
This is to certify that the major project entitled “COLLEGE NIRF RANK PREDICTION
USING MACHINE LEARNING” is a bonafide record of project work carried out under my
supervision by P. Ganesh Ganguly (20L31A05I4), P. Lokesh Narayana (20L31A05I9),
MD. Zakir Hussain (20L31A05E8), N.K.S Raghavendra(20L31A05G2), N.V Lekhendra
(20L31A05F6) during the academic year 2023 – 2024, in partial fulfilment of the requirements
for the award of the degree of Bachelor of Technology in Computer Science & Engineering
of VIGNAN’S INSTITUTE OF INFORMATION TECHNOLOGY (Autonomous). The
results embodied in this major project report have not been submitted to any other University
or Institute for the award of any Degree.
External Examiner
2
DECLARATION
We hereby declare that the project report entitled “College NIRF Rank Prediction using
Machine Learning” had done by us and has not been submitted either in part or whole for the
award of any degree or any other university.
Date:
Place:
3
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to acknowledge the assistance and cooperation we have
received from several persons while undertaking this Major Project. We owe a special debt
of gratitude to Mr. Ramaraju S.V.S.V.P, Assistant Professor Department of Computer Science
& Engineering, for his constant support and guidance throughout the course of our work. His
sincerity, thoroughness and perseverance have been a constant source of inspiration for us.
We also take the opportunity to acknowledge the contribution of Associate Professor Mr. B
Dinesh Reddy, Head of Department, Computer Science & Engineering, for his full support
and assistance during the development of the project. We also acknowledge the
contribution of all faculty members of the department for their kind assistance and
cooperation during the development of our project. Last but not the least, we acknowledge
our friends for their contribution in the completion of the project.
4
ABSTRACT
The National Institutional Ranking Framework (NIRF) is an annual ranking system initiated by
the Indian government to rank higher education institutions based on several parameters such
as teaching, research, and outreach activities. In this project, we propose to develop a machine
learning model that can predict the NIRF rank of an institution. Then based on the score of
previous years, we predict the rank by giving the performance indicators to the model. The
paper focuses on the use of Random Forest Regressor based Machine learning technique to
predict NIRF rank. Factors considered are Teaching, Learning and Resources (TLR) score,
Research and Professional Practice (RPC) score, Graduation Outcome (GO) score, Outreach
and Inclusivity (OI) score and Perception Score for particular college. The model is evaluated
using standard strategic indicator: Root Mean Square Error. The low value of this indicator
show that the model is efficient in predicting NIRF rank. We got score of 93% and RMSE of
15.47. We have completed ML model save and load operations using Joblib. We have created
a flask server for model deployment and deployed on Render as web service. We conducted
comprehensive evaluations on frequently used machine learning models and conclude that our
proposed solution outperforms due to the comprehensive feature engineering that we built. The
system achieves overall high accuracy for College NIRF rank prediction.
5
TABLE OF CONTENTS
1. Introduction 8 – 10
3. Literature Survey 16 – 18
4. Software Environment 19 – 39
5.1 Methodology 40
6. Conclusion 47 – 48
7. References 49 – 50
6
LIST OF FIGURES
3. System Architecture 15
7
INTRODUCTION
1.1 Introduction:
in India. The Framework was approved by the MHRD and launched by Minister of
The Framework uses several parameters for ranking purposes like resources, research,
and stakeholder perception. These parameters have been grouped into five clusters and
these clusters were assigned certain weightages. The weightages depend on the type
rankings. The methodology draws from the overall recommendations and broad
1. Teaching, Learning and Resources: This parameter checks the core activities in
8
4. Outreach and Inclusivity - Lays special emphasis on the representation of
women.
The NIRF ranking is determined by a complex process that involves the analysis of
teaching, research, graduation outcomes, outreach, and perception. The institutions are
then ranked based on their overall score, which is calculated using a weighted average
of these metrics.
determining the final rank. Machine learning algorithms can be used to build
predictive models that can accurately predict the NIRF rank of educational institutions
.By predicting the NIRF rank of educational institutions, stakeholders such as students,
parents, and educational institutions can make informed decisions about which
9
• To provide stakeholders such as students, parents, and educational institutions
with a reliable tool to make informed decisions regarding the choice of
educational institutions.
10
DESIGN AND METHODOLOGY
The methodology for predicting the NIRF rank of the Indian institutions using
machine learning algorithms typically involves the following steps:
1. Data Acquisition: The first step is to collect data on various performance metrics
for the educational institutions, such as research output, teaching quality, graduation
outcomes, and perception. The data can be collected from various sources such as
NIRF reports, university websites, and government databases. We have collected
dataset from Kaggle.
3. Feature Selection: The next step is to select the most relevant features that have
the most significant impact on the NIRF rank. Feature selection techniques such as
correlation analysis, principal component analysis (PCA), and recursive feature
elimination (RFE) can be used to identify the most important features.
4. Data Splitting: The dataset is then split into training and testing sets. The training
set is used to train the model, while the testing set is used to evaluate its
performance.
11
6. Model Evaluation: The trained model is evaluated using various performance
metrics such as root mean square error (RMSE), mean absolute error (MAE), and
R-squared. The evaluation helps to determine the accuracy and reliability of the
model and identify areas for improvement.
7. Model Deployment: Once the predictive model has been trained and evaluated, it
can be deployed for NIRF rank prediction. The model can be integrated into an
existing educational analytics platform or developed as a standalone application.
In conclusion, developing a machine learning model to predict the NIRF rank of Indian
institutions involves several crucial steps. Initially, data must be gathered from various
sources, such as Kaggle, to ensure a comprehensive understanding of each institution's
performance metrics. Once collected, the data undergoes preprocessing to clean up any
errors or inconsistencies, making it suitable for analysis. Feature selection is then
conducted to identify the most influential factors affecting NIRF rankings, streamlining
the model's focus for better accuracy.
12
2.1 Random Forest Regression:
Every decision tree has high variance, but when we combine all of them together in
parallel then the resultant variance is low as each decision tree gets perfectly trained on
that sample data, and hence the output does not depend on one decision tree but on
multiple decision trees. In the case of a classification problem, the final output is taken
by using the majority voting classifier. In the case of a regression problem, the final
output is the mean of all the outputs. This part is called Aggregation
13
The basic idea behind this is to combine multiple decision trees in determining the final
output rather than relying on individual decision trees. Random Forest has multiple
decision trees as base learning models. We randomly perform row sampling and feature
sampling from the dataset forming sample datasets for every model. This part is called
Bootstrap.
1. Bagging: It creates a different training subset from sample training data with
replacement & the final output is based on majority voting. For example, Random
Forest.
14
2.2 System Architecture:
This system architecture is designed for predictive modelling, starting from data
acquisition from Kaggle, after data pre-processing, feature ranking algorithms are used
to assess feature importance. Features are then selected based on their rank, and a
regression algorithm is applied to build a predictive model. Root Mean Square Error
(RMSE) is used to evaluate model performance, and the Random Forest algorithm can
be an alternative or complementary method for prediction. Overall, this architecture
enables the creation of accurate regression models for various predictive tasks.
15
LITERATURE SURVEY
INTRODUCTION:
This chapter provides an overview of related works in College NIRF Prediction using
Machine Learning.
The article titled "ML Use For Forecasting The NIRF Ranking Of Engineering
Colleges In India And PCA To Find The Correct Weightage For The Best Result"
explores the application of Machine Learning (ML) and Principal Component
Analysis (PCA) to optimize the National Institutional Ranking Framework (NIRF)
for engineering colleges in India. It evaluates NIRF criteria, proposes weightage
adjustments, and utilizes ML for rank prediction. PCA analysis complements ML
findings, suggesting modifications for enhanced accuracy. Insights highlight
disparities in funding and parameter weightage. The study advocates for refining
NIRF weightage to improve evaluation precision, showcasing the potential of ML
and PCA in ranking assessments.
2.2 Gadi Himaja, Gadu Srinivasa Rao and Gali Akarsh Naidu:
16
created using Flask, enabling easy access to the model's predictions for users
without programming expertise. This comprehensive approach offers valuable
insights and practical tools for ranking assessment in the education sector.
The article "FFT Based Ensembled Model to Predict Ranks of Higher Educational
Institutions" introduces a new way to predict how well universities and colleges rank
internationally. It's like guessing where your favorite team might end up in a
tournament. The tool, called EnFftRP, combines different methods to make better
guesses. By using a mix of six basic models and a special math technique called Fast
Fourier Transformation (FFT), it's able to make predictions more accurately.
Researchers tested this tool on data from 2005 to 2018 and found it did a great job better
than other methods. This means it's really good at guessing how well universities and
colleges will rank. It's like having a super-smart coach who can tell you where your
team stands among all the others. This tool is a big deal because it helps universities
and colleges understand how they're doing on a global scale.
17
2.5 Anika Tabassum, Mahamudul Hasan and Shibbir Ahmed:
"An Analytical Approach Towards the Prediction of Undefined Parameters for the
National Institutional Ranking Framework" explores a method to predict undefined
parameters in the National Institutional Ranking Framework (NIRF) for Higher
Education Institutions (HEIs) in India. NIRF ranks HEIs based on five key
parameters, some of which have undefined functions. This research aims to identify
the best-fitting regression machine learning model to approximate these undefined
functions. By studying various regression models and analyzing real NIRF data,
the study seeks to assist stakeholders in better understanding how NIRF scores are
calculated. This understanding can lead to more effective planning and decision-
making for improving HEI rankings. Through experimentation with real NIRF
data, the research offers insights into predicting and enhancing NIRF scores,
contributing to the continuous improvement of higher education institutions in
India.
18
SOFTWARE ENVIRONMENT
To run the provided project, you'll need to ensure that you have the necessary software
installed. Below are the steps to install the required software components:
1. Python: Make sure Python is installed on your system. You can download and
install Python from the official Python website: python.org. It's recommended to
install Python 3.x, as the provided code is compatible with Python 3.
2. pip: Pip is a package manager for Python. It's usually installed automatically when
you install Python. You can verify if pip is installed by running the following
command pip --version in your terminal or command prompt.
3. NumPy, pandas, joblib, Pillow, Matplotlib, Flask, Plotly: You can install these
Python libraries and frameworks using pip. Open your terminal or command
prompt and run the following command:
• pip install numpy pandas joblib Pillow matplotlib Flask plotly
You'll need a text editor or an integrated development environment (IDE) to write and
edit your code. Some popular choices include Visual Studio Code, PyCharm, Sublime
Text, Atom, etc. Once you have installed the required software and libraries, you can
proceed to run the provided Flask application.
Make sure you have the necessary dataset file (engineering.csv) and the trained model
file (college_rank_predictor.pkl) in your project directory. To run the Flask application,
navigate to your project directory in the terminal or command prompt and run the
following command: python app.py
19
This command will start the Flask development server, and you should see output
indicating that the server is running. You can then open a web browser and go to
http://127.0.0.1:5000/ to access your Flask application.
That's it! You have successfully installed the required software and run the provided
project.
NUMPY:
PANDAS:
Matplotlib:
20
Scikit-learn:
Scikit-learn provides a wide range of algorithms for tasks like classification (predicting
categories), regression (predicting continuous values), and clustering (grouping similar
data points). Using this train machine learning models on your data and assess their
performance using metrics like accuracy and precision. It is designed with a user-
friendly interface, allowing you to experiment with different algorithms and fine-tune
your models efficiently.
FLASK:
Joblib:
Pillow:
Pillow provides functions to open, edit, resize, and save images. You can crop, rotate,
and apply various filters. It allows you to draw basic shapes, text, and even create new
images from scratch. Due to its rich functionalities, Pillow is a popular choice for
various tasks involving image processing in Python applications, from simple editing
to complex computer vision projects.
21
Plotly
22
3. Splitting the dataset into training and testing sets:
• Scikit-learn's train_test_split() function is used for this purpose.
23
6. Evaluating the model's performance:
• Scikit-learn provides various metrics functions for evaluation, such as mean
squared error (MSE), root mean squared error (RMSE), mean absolute error
(MAE), and R-squared.
• These metrics can be calculated using functions from the sklearn.metrics
module.
24
8. Re-training the model with optimized hyperparameters:
• The model can be re-trained with the best hyperparameters obtained from the
tuning process.
These are the steps involved in building a Random Forest Regression model for NIRF
rank prediction using the mentioned libraries in Python. Each library plays a specific
role in different stages of the process, contributing to the overall workflow of model
building and evaluation.
25
4.3.2 ML model Testing Implementation:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df=pd.read_csv("./dataset/engineering.csv")
df.sample(6)
df.shape
df.info()
df.isnull().sum()
df.describe()
df.duplicated().sum()
clean_df=df.drop(["institute_id","name","city","state"],axis=True)
clean_df.sample(6)
X = clean_df.drop('rank', axis=1)
y = clean_df['rank']
print('Shape of X = ', X.shape)
print('Shape of y = ', y.shape)
from sklearn.model_selection import train_test_split
X_train,X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=51)
print('Shape of X_train = ', X_train.shape)
print('Shape of y_train = ', y_train.shape)
print('Shape of X_test = ', X_test.shape)
print('Shape of y_test = ', y_test.shape)
from sklearn.ensemble import RandomForestRegressor
regressorRFR =RandomForestRegressor(n_estimators=100, criterion='squared_error')
regressorRFR.fit(X_train, y_train)
regressorRFR.score(X_test, y_test)
y_pred2=regressorRFR.predict(X_test)
from sklearn.metrics import mean_squared_error,mean_absolute_error,r2_score
mse = mean_squared_error(y_test, y_pred2)
rmse = np.sqrt(mse)
print('MSE = ', mse)
26
print('RMSE = ', rmse)
from sklearn.model_selection import cross_val_score
cross_val_score(regressorRFR, X_train, y_train, cv=5, ).mean()
int(regressorRFR.predict([X_test.iloc[18, :]])[0].round())
y_test.iloc[18]
import joblib
joblib.dump(regressorRFR, "college_rank_predictor.pkl")
model = joblib.load("college_rank_predictor.pkl")
model.predict([X_test.iloc[18, :]])[0]
feature_importances = model.feature_importances_
feature_names = X.columns
feature_importance = dict(zip(feature_names, feature_importances)
sorted_feature_importance = sorted(feature_importance.items(), key=lambda x: x[1],
reverse=True)
for feature, importance in sorted_feature_importance:
print(f'{feature}: {importance}')
df_2016=pd.read_csv("./db/2016/EngineeringRanking_2016.csv")
df_2017=pd.read_csv("./db/2017/EngineeringRanking_2017.csv")
df_2018=pd.read_csv("./db/2018/EngineeringRanking_2018.csv")
df_2019=pd.read_csv("./db/2019/EngineeringRanking_2019.csv")
df_2020=pd.read_csv("./db/2020/EngineeringRanking_2020.csv")
df_2021=pd.read_csv("./db/2021/EngineeringRanking_2021.csv")
df_2016['year'] = 1
df_2016['year'] = 2
df_2017['year'] = 3
df_2018['year'] = 4
df_2019['year'] = 5
df_2020['year'] = 6
df_2021['year'] = 7
df_combined=pd.concat([df_2016,df_2017,df_2018,df_2019,df_2020,df_2021],
ignore_index=True)
excel_file_path_combined = 'combined_data.xlsx'
csv_file_path_combined = 'combined_data.csv'
27
df_combined.to_csv(csv_file_path_combined, index=False)
print(f"Combined DataFrame has been saved to {csv_file_path_combined}")
print(df_combined.columns)
clean_df=df_combined.drop(["InstituteId","InstituteName","City","State","Score"]
,axis=True)
clean_df.sample(6)
X = clean_df.drop('Rank', axis=1)
y = clean_df['Rank']
print('Shape of X = ', X.shape)
print('Shape of y = ', y.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=51)
print('Shape of X_train = ', X_train.shape)
print('Shape of y_train = ', y_train.shape)
print('Shape of X_test = ', X_test.shape)
print('Shape of y_test = ', y_test.shape)
model_combined = RandomForestRegressor(n_estimators=100, random_state=42)
model_combined.fit(X_train, y_train)
print(X_train)
predictions_combined = model_combined.predict(X_test)
from sklearn.metrics import mean_squared_error,mean_absolute_error,r2_score
mae_combined = mean_absolute_error(y_test, predictions_combined)
mse_combined = mean_squared_error(y_test, predictions_combined)
r2_combined = r2_score(y_test, predictions_combined)
print(f'Combined Data Mean Absolute Error: {mae_combined}')
print(f'Combined Data Mean Squared Error: {mse_combined}')
print(f'Combined Data R-squared: {r2_combined}')
import joblib
joblib.dump(model_combined, "college_rank_predictor1.pkl")
model = joblib.load("college_rank_predictor1.pkl")
feature_importances_combined = model_combined.feature_importances_
feature_names_combined = X.columns
28
feature_importance_combined=dict(zip(feature_names_combined,feature_importanc
es_combined)
sorted_feature_importance_combined= sorted(feature_importance_combined.items(),
key=lambda x: x[1], reverse=True)
for feature, importance in sorted_feature_importance_combined:
print(f'{feature}: {importance}')
import pandas as pd
import matplotlib.pyplot as plt
top_college_features = { 'tlr': 90,'rpc': 85,'go': 95,'oi': 80,'perception': 92}
def get_user_features():
user_features = {}
print("Please enter the features of your college:")
for feature in top_college_features.keys():
value = float(input(f"Enter value for {feature}: "))
user_features[feature] = value
return user_features
def predict_rank(features):
predicted_rank = 5 # Example prediction
return predicted_rank
user_features = get_user_features()
predicted_rank = predict_rank(user_features)
differences={feature:user_features[feature]-top_college_features[feature]for feature
in top_college_features}
df = pd.DataFrame({'Top College': top_college_features, 'Your College': user_features,
'Differences': differences})
fig, ax = plt.subplots(figsize=(12, 6))
df[['Top College', 'Your College']].plot(kind='bar', ax=ax, color=['blue', 'orange'],
width=0.4)
df['Differences'].plot(kind='bar', ax=ax, color='red', alpha=0.5, width=0.2)
ax.set_ylabel('Feature Values / Differences')
ax.set_title('Comparison of Feature Values with Top-Ranked College')
ax.annotate(f'Predicted Rank: {predicted_rank}', xy=(0.5, 0), xytext=(0, -40),
xycoords='axes fraction', textcoords='offset points', ha='center', va='top',
29
fontsize=12, color='red', bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5))
plt.xticks(rotation=45) # Rotate x-axis labels for better readability
plt.grid(True)
plt.legend(['Top College', 'Your College', 'Differences'])
plt.show()
1. Data Splitting: The dataset is split into training and testing sets using the
train_test_split function from sklearn.model_selection. This step ensures that the
model's performance can be evaluated on unseen data.
2. Model Evaluation Metrics: Several evaluation metrics are calculated to assess the
performance of the trained model on the test set. These metrics include Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared. These
metrics provide insights into how well the model generalizes to new, unseen data.
6. User Input Prediction: Although not directly related to testing the model's
performance, the function predict_rank allows users to input features of a college
and obtain a predicted rank using the trained model. This functionality can be
considered as a form of testing the model's deployment and usability.
30
4.3.3 FRONT-END IMPLEMENTATION:
<!DOCTYPE html>
<html>
<head>
<title>College NIRF Rank Predictor</title>
<link href="https://cdn.jsdelivr.net/npm/[email protected]
alpha1/dist/css/bootstrap.min.css" rel="stylesheet"
integrity="sha384GLhlTQ8iRABdZLl6O3oVMWSktQOp6b7In1Zl3/Jr59b6EGGoI1
aFkw7cmDA6j6gD" crossorigin="anonymous">
<style>
.plot-container {
display: flex;
flex-wrap: wrap;
justify-content: space-around;
margin-bottom: 20px;
}
.plot {
flex: 0 1 30%;
margin-bottom: 20px;
}
</style>
</head>
<body>
<div>
<img src="static/images/NIRF.png" class="w3-border w3-padding"
alt="BANNER" style="width:100%">
</div>
<center>
<h1>College NIRF Rank Predictor</h1>
<br>
{% if message %}
31
<div class="mb-3" style="width: 300px; border: 5px solid red; padding: 10px;
margin: 0;">
{{message}}
</div>
{% endif %}
<div class="mb-3" style="width: 300px; border: 5px solid gray; padding: 10px;
margin: 0;">
<form method="POST">
<label for="exampleFormControlInput1" class="form-label ">Teaching,
Learning and Resources (TLR) Score : </label>
<input type="number" step="0.01" name="tlr" placeholder="Score Range(1-
100)"><br>
<label for="exampleFormControlInput1" class="form-label">Research and
Professional Practice (RPC) Score : </label>
<input type="number" step="0.01" name="rpc" placeholder="Score Range(1-
100)" ><br>
<label for="exampleFormControlInput1" class="form-label">Graduation
Outcome (GO) Score :</label>
<input type="number" step="0.01" name="go" placeholder="Score Range(1-
100)"><br>
<label for="exampleFormControlInput1" class="form-label">Outreach and
Inclusivity (OI) Score :</label>
<input type="number" step="0.01" name="oi" placeholder="Score Range(1-
100)"><br>
<label for="exampleFormControlInput1" class="form-label">Perception Score
:</label>
<input type="number" step="0.01" name="perception" placeholder="Score
Range(1-100)"><br><br>
<label for="exampleFormControlInput1" class="form-label">Enter the rank to
compare with :</label>
<input type="number" step="0.01" name="rta" placeholder=""><br><br>
<input type="submit" value="Predict" class="btn btn-info"><br>
</form>
32
</div>
<div style="width: 500px; border: 5px solid rgb(5, 145, 143); padding: 10px;
margin: 0;">
{% if prediction is not none %}
<p >
<h3>The Predicted NIRF college rank is: <u><b>{{ prediction
}}</u></b></h3></p>
{% endif %}
</div>
{{ chart_html | safe }}<br>
{% if plot_filenames %}
<div class="plot-container">
{% for plot_filename in plot_filenames[:3] %}
<div class="plot">
<iframe src="{{ url_for('static', filename=plot_filename) }}" width="100%"
height="400px"></iframe>
</div>
{% endfor %}
</div>
<div class="plot-container">
{% for plot_filename in plot_filenames[3:] %}
<div class="plot">
<iframe src="{{ url_for('static', filename=plot_filename) }}" width="100%"
height="400px"></iframe>
</div>
{% endfor %}
</div>
<div>
{% endif %}
<br><br>
<p>
<h4><u>NOTE:</u></h4><br>
33
NIRF(National Institutional Ranking Framework) is an initiative of the Indian
government to rank higher educational institutions in India based on various
parameters such as teaching, learning, research, outreach, and perceptio & This
Machine Learning model is trained on the 2020 NIRF Ranking dataset</p>
</div>
</center>
</body>
</html>
1. Title and Styling: The HTML document starts with a title indicating the purpose
of the page, "College NIRF Rank Predictor." It imports the Bootstrap CSS
framework to style the page elements.
2. Banner and Header: The page includes an image of the NIRF logo as a banner.
Below the banner, a centered header displays the title "College NIRF Rank
Predictor."
3. Form for User Input: Users can input scores for five parameters - Teaching,
Learning and Resources (TLR) Score, Research and Professional Practice (RPC)
Score, Graduation Outcome (GO) Score, Outreach and Inclusivity (OI) Score, and
Perception Score. Additionally, users can enter a rank to compare with.
5. Charts and Plots: The page renders charts and plots related to the prediction
results. It divides plots into separate containers for better organization and
presentation.
6. Note Section: A note section provides information about NIRF and the machine
learning model, including its training data source.
34
4.3.4 BACK-END IMPLEMENTATION:
35
differences = {feature: user_features[feature] - top_college_features[feature] for
feature in top_college_features}
difference_count = sum(1 for diff in differences.values() if diff != 0)
# Predict rank (replace this with your actual prediction code)
tlr = float(request.form.get("tlr"))
rpc = float(request.form.get("rpc"))
go = float(request.form.get("go"))
oi = float(request.form.get("oi"))
perception = float(request.form.get("perception"))
prediction = model.predict([[tlr, rpc, go, oi, perception]])
prediction = prediction -1
plot_filenames = []
for feature, value in user_features.items():
fig = got.Figure()
# Add features of top-ranked college
fig.add_trace(got.Bar(x=[feature], y=[top_college_features[feature]],
name='Top College',marker_color='blue'))
# Add features of user-entered college
fig.add_trace(got.Bar(x=[feature],y=[value],name='Your College',
marker_color='orange'))
# Add differences
bar_color = 'green' if differences[feature] > 0 else 'red'
fig.add_trace(got.Bar(x=[feature],y=[differences[feature]],name='Differences',
marker_color=bar_color ))
# Update layout
fig.update_layout(title=f'Comparison of {feature} with Top-Ranked College',
xaxis_title='Features',
yaxis_title='Values / Differences',
barmode='group'
)
# Save plot as HTML file
plot_filename = f"plot_{feature}.html"
36
plot_filenames.append(plot_filename)
fig.write_html(f"static/{plot_filename}")
# Create Plotly chart
fig = got.Figure()
# Add features of top-ranked college
fig.add_trace(got.Bar(x=list(top_college_features.keys()),
y=list(top_college_features.values()),
name='Top College',
marker_color='blue'
))
# Add features of user-entered college
fig.add_trace(got.Bar(x=list(user_features.keys()),
y=list(user_features.values()),
name='Your College',marker_color='orange'))
bar_colors = ['green' if diff > 0 else 'red' for diff in differences.values()]
fig.add_trace(got.Bar(x=list(differences.keys()),y=list(differences.values()),
name='Differences',marker_color=bar_colors))
# Update layout
fig.update_layout(title='Comparison of Feature Values with Top-Ranked
College',
xaxis_title='Features',yaxis_title='Values / Differences',barmode='group' )
# Convert Plotly chart to HTML
chart_html = fig.to_html(full_html=False, include_plotlyjs='cdn')
# print(prediction)
Return render_template('index1.html',chart_html=chart_html,
plot_filenames=plot_filenames, prediction=int(prediction[0].round()))
return render_template('index1.html')
if __name__ == '__main__':
app.run(debug=True)
37
Flask is a lightweight web application framework in Python that can be used for
deploying machine learning models for NIRF rank prediction. Here are the steps
involved in deploying a Random Forest Regression model using Flask:
1. Develop the Random Forest Regression model using Python libraries such as
scikit-learn and pandas.
2. Save the trained model as a file using Python's joblib library.
3. Create a new Flask application and import the necessary libraries and the trained
model file.
4. Define a route in Flask that will handle incoming requests to predict the NIRF
rank.
5. An instance of the Flask application is created with app = Flask(__name__).
6. Routes are defined using @app.route('/'). The / route corresponds to the root URL
of the application.
7. The index() function is the view function for the root URL. It handles both GET
and POST requests.
8. The render_template() function is used to render HTML templates, passing data
to the templates.
9. request.form is used to access form data submitted by the user.
10. In the route function, pre-process the incoming data and pass it through the trained
model to make a prediction.
11. Return the predicted NIRF rank as a response to the client.
12. Test the Flask application locally to ensure that it is working correctly.
13. The app.run(debug=True) statement starts the Flask development server.
38
Plotly: Plotly is a graphing library that allows you to create interactive plots and charts.
In this code:
Overall, Flask and Plotly are used together in this code to create a web application that
allows users to compare their college's features with those of the top-ranked college
and visualize the differences in an interactive manner
39
SYSTEM INTERFACE AND RESULTS
5.1 METHODOLOGY
3. Model Prediction
4. Display Results
5. Store Plots
7. CSS Styling
8. Dependencies
• HTML Form: Allows users to input data such as Teaching, Learning and
Resources (TLR) Score, Research and Professional Practice (RPC) Score,
Graduation Outcome (GO) Score, Outreach and Inclusivity (OI) Score,
Perception Score, and the rank to compare with.
40
3. Model Prediction:
• Machine Learning Model: Trained model loaded using joblib for predicting
NIRF college rank based on user input.
• Prediction Logic: Predicts the NIRF college rank using the trained model
and user-provided feature scores.
4. Display Results:
These modules collectively allow users to input their data, predict the NIRF college
rank, visualize comparisons, and display the results on the webpage.
41
5.3 OUTPUT SCREENSHOTS
To execute the project open command prompt and navigate to the project folder
location and run the following command:
>python app.py
Now navigate to the 127.0.0.1 which is the flask development server, then the
following interface appears on the web browser:
42
Now enter the scores of the institution to predict it’s NIRF ranking, also enter the NIRF
rank of the institution to compare:
43
Figure 7. Comparison of feature values with a specified rank for analysis
44
Figure 9. Comparison of RPC - Research and Professional Practices
46
CONCLUSION
In conclusion, the NIRF rank prediction project aims to leverage machine learning
techniques to predict the National Institutional Ranking Framework (NIRF) rank of
Indian higher education institutions. The project's purpose is to provide insights into
the factors that contribute to an institution's NIRF rank, identify areas for improvement,
and help policymakers allocate resources to enhance the overall quality of higher
education in India. By building a Random Forest Regression model using scikit-learn,
the project demonstrates the potential of machine learning to predict NIRF rankings
with a high degree of accuracy.
The model has been trained and evaluated using a large dataset of Indian educational
institutions, and its performance has been measured using evaluation metrics such as
root mean squared error (RMSE). The future scope of the project is vast and
encompasses several potential avenues for further development, such as incorporating
more data sources, enriching data with text analysis, incorporating temporal trends,
exploring alternative machine learning models, and building a user-friendly interface.
Overall, the NIRF rank prediction project is a valuable contribution to the improvement
of the Indian higher education system, and its predictive model provides actionable
insights for institutions and policymakers.
The future scope of the NIRF rank prediction project is vast and encompasses several
potential avenues for further development and improvement. Here are some possible
directions for future work:
47
2. Incorporating Time-Series Analysis: NIRF rankings evolve over time, reflecting
changes and improvements in educational institutions. Implementing time-series
analysis techniques can enable the model to capture temporal trends and predict
future rankings based on historical data.
3. Dynamic Updating of Model: Implementing a system for dynamically updating
the prediction model with the latest NIRF ranking data can ensure that the
predictions remain accurate and reflective of current trends in educational quality.
6. Enriching Data with Text Analysis: The model could potentially leverage natural
language processing techniques to extract insights from unstructured data sources
such as institutional websites, research papers, and news articles. This could
provide a more comprehensive picture of an institution's strengths and weaknesses.
48
REFERENCES
[2] Bhatia, A., & Singh, S. P. (2021). Predicting NIRF Ranking using Machine
Learning. In Proceedings of the 3rd International Conference on Computing
Methodologies and Communication (pp. 547-553). Springer.
[3] Jha, P. C., & Aggarwal, M. (2019). Predicting NIRF Ranking of Indian Universities
and Institutes using Machine Learning Techniques. Journal of Data Science, 17(4),
611-626.
[5] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical
learning: data mining, inference, and prediction. Springer.
[9] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
[10] Kingma, D. P., & Ba, J. (2014). Adam: a method for stochastic optimization.
arXiv preprint arXiv:1412.6980.
[11] Nigam, A., & Singh, S. (2020). Predicting NIRF Ranking of Indian Engineering
Institutions using Machine Learning Techniques. International Journal of Engineering
Research and Technology, 13(2), 96-102.
49
[12] Kumar, A., & Kumar, M. (2021). NIRF Ranking Prediction using Ensemble
Machine Learning Techniques. In 2021 4th International Conference on Computing,
Communication and Networking Technologies (ICCCNT) (pp. 1-6). IEEE.
[13] Jain, A., & Sood, S. K. (2020). NIRF Ranking Prediction of Indian Universities
using Machine Learning Algorithms. International Journal of Computer Applications,
180(7), 1-5.
[14] Agrawal, A., & Singh, S. P. (2020). Predicting NIRF Ranking of Indian
Universities and Institutes using Supervised Learning Techniques. In 2020 3rd
International Conference on Computing, Communication and Security (ICCCS) (pp.
1-6). IEEE.
[15] Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and
TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly
Media, Inc.
50