Micro Project Report Format VISHAL & MILAN
On
COLLEGE MANAGEMENT SYSTEM
By
THAKOR MILAN, TALPADA VISHALKUMAR
Enrollment No: 23604031697, 236040316096
Date: 23.4.25
Place: B&B Institute of Technology, VV Nagar, Anand
1. Introduction
1.1 A College Management System (CMS) handles the administrative tasks of a college or educational institution.
1.2 It aims to reduce manual effort and provide a centralized platform for information management.
2. Features of the System
2.1 To digitalize the administrative processes in colleges.
2.2 To provide real-time access to information for students, faculty, and staff.
2.3 To manage academic and non-academic operations efficiently.
2.4 To improve communication between departments and stakeholders.
3. System Architecture
Architecture Type: MVC (Model-View-Controller)
Faculty Management
Add courses/subjects
Allocate subjects to faculty
Link subjects with students
Attendance Management
Fee Management
Grade Management: Recording and managing student grades, generating report cards,
and calculating GPAs.
User Authentication: Secure login and authorization mechanisms for different user
roles.
Reporting: Generating various reports on students, courses, attendance, and finances.
These systems aim to automate routine tasks, improve efficiency, and facilitate data-
driven decision-making within colleges.
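As an illustration of how the MVC split named above could apply to the Grade Management module, here is a minimal sketch. The class and function names (Student, render_report_card, GradeController) and the letter-grade scale are hypothetical; the report does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical letter-grade scale; the report does not define one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class Student:
    """Model: holds a student's data and no presentation logic."""
    name: str
    grades: dict = field(default_factory=dict)  # subject -> (letter, credit_hours)

    def gpa(self) -> float:
        # Credit-weighted average of grade points.
        total_hours = sum(hours for _, hours in self.grades.values())
        if total_hours == 0:
            return 0.0
        points = sum(GRADE_POINTS[letter] * hours
                     for letter, hours in self.grades.values())
        return points / total_hours


def render_report_card(student: Student) -> str:
    """View: turns model data into text for display."""
    lines = [f"Report card for {student.name}"]
    for subject, (letter, hours) in sorted(student.grades.items()):
        lines.append(f"  {subject}: {letter} ({hours} credits)")
    lines.append(f"  GPA: {student.gpa():.2f}")
    return "\n".join(lines)


class GradeController:
    """Controller: receives requests, updates the model, selects the view."""

    def __init__(self):
        self.students = {}

    def record_grade(self, name, subject, letter, hours):
        if letter not in GRADE_POINTS:
            raise ValueError(f"unknown grade: {letter}")
        student = self.students.setdefault(name, Student(name))
        student.grades[subject] = (letter, hours)

    def report_card(self, name):
        return render_report_card(self.students[name])
```

Keeping the GPA rule in the model and the formatting in the view means either one can change without touching the other, which is the main reason for choosing MVC here.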
It is usually insightful to take a look at examples from the dataset. The sample email contains a
URL, an email address (at the end), numbers, and dollar amounts. While many emails would
contain similar types of entities (e.g., numbers, other URLs, or other email addresses), the specific
entities (e.g., the specific URL or specific dollar amount) will be different in almost every email.
Therefore, one method often employed in processing emails is to “normalize” these values, so
that all URLs are treated the same, all numbers are treated the same, etc. For example, we could
replace each URL in the email with the unique string “httpaddr” to indicate that a URL was
present.
This has the effect of letting the spam classifier make a classification decision based on whether
any URL was present, rather than whether a specific URL was present. This typically improves
the performance of a spam classifier, since spammers often randomize the URLs, and thus the
odds of seeing any particular URL again in a new piece of spam are very small.
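The effect of URL normalization can be sketched as follows. The two sample messages and the helper name normalize_urls are made up for illustration; the pattern is a simple one that treats anything starting with http:// or https:// as a URL.

```python
import re

# Anything starting with http:// or https://, up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S*")


def normalize_urls(text: str) -> str:
    """Replace every URL with the single token 'httpaddr'."""
    return URL_PATTERN.sub("httpaddr", text)


# Two hypothetical spam messages that differ only in their randomized URL.
spam_a = "claim your prize at http://win-x7f3.example.com/a"
spam_b = "claim your prize at https://zq91.example.net/b?id=5"

# After normalization both messages are identical, so the classifier
# sees "a URL was present" rather than one specific URL.
same = normalize_urls(spam_a) == normalize_urls(spam_b)
```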
In processEmail, the following email preprocessing and normalization steps have been
implemented:
Lower-casing: The entire email is converted into lower case, so that capitalization is
ignored (e.g., IndIcaTE is treated the same as Indicate).
Stripping HTML: All HTML tags are removed from the emails. Many emails
come with HTML formatting; we remove all the HTML tags, so that only the content
remains.
Normalizing URLs: All URLs are replaced with the text “httpaddr”.
Normalizing Email Addresses: All email addresses are replaced with the text
“emailaddr”.
Normalizing Numbers: All numbers are replaced with the text “number”.
Normalizing Dollars: All dollar signs ($) are replaced with the text “dollar”.
Word Stemming: Words are reduced to their stemmed form. For example, “discount”,
“discounts”, “discounted” and “discounting” are all replaced with “discount”. Sometimes,
the Stemmer actually strips off additional characters from the end, so “include”,
“includes”, “included”, and “including” are all replaced with “includ”.
Removal of non-words: Non-words and punctuation have been removed. All white space
(tabs, newlines, spaces) has been trimmed to a single space character.
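Three of the steps listed above (lower-casing, stripping HTML, and removal of non-words with whitespace trimming) can be sketched with the standard library alone. This is a minimal sketch, not the processEmail code itself: the function name clean_text is hypothetical, and the entity normalization and stemming steps are left out on purpose.

```python
import re


def clean_text(raw: str) -> str:
    """Lower-case, strip HTML tags, drop non-word characters, collapse spaces."""
    text = raw.lower()
    # Strip HTML: anything between '<' and '>' becomes a space.
    text = re.sub(r"<[^<>]+>", " ", text)
    # Remove punctuation and other non-alphanumeric characters.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Trim tabs, newlines, and runs of spaces to a single space.
    text = re.sub(r"\s+", " ", text).strip()
    return text


cleaned = clean_text("<p>Hello,\tWORLD!</p>\n\nBuy   now.")
# cleaned == "hello world buy now"
```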
The result of these preprocessing steps looks like the following paragraph:
anyon know how much it cost to host a web portal well it depend on how mani visitor your expect thi can be
anywher from less than number buck a month to a coupl of dollarnumb you should checkout httpaddr or
perhap amazon ecnumb if your run someth big to unsubscrib yourself from thi mail list send an email to
emailaddr
While preprocessing has left word fragments and non-words, this form turns out to be much easier
to work with for performing feature extraction.
After preprocessing the emails, there is a list of words for each email. The next step is to choose
which words will be used in the classifier and which will be left out.
For simplicity, only the most frequently occurring words have been chosen as the set of words
considered (the vocabulary list). Since words that occur rarely in the training set appear in only
a few emails, they might cause the model to overfit the training set. The complete vocabulary list
is in the file vocab.txt. The vocabulary list was selected by choosing all words which occur at least
100 times in the spam corpus, resulting in a list of 1899 words. In practice, a vocabulary list with
about 10,000 to 50,000 words is often used.
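The selection rule described above (keep every word that occurs at least a given number of times) might look like the sketch below. The function name build_vocabulary is hypothetical, and the alphabetical ordering with 1-based indices is an assumption about how vocab.txt is laid out.

```python
from collections import Counter


def build_vocabulary(tokenized_emails, min_count=100):
    """Map each sufficiently frequent word to a 1-based index.

    tokenized_emails: iterable of lists of preprocessed words.
    min_count: minimum total number of occurrences across the corpus.
    """
    counts = Counter(word for email in tokenized_emails for word in email)
    # Assumption: the vocabulary is sorted alphabetically and indexed from 1.
    frequent = sorted(w for w, c in counts.items() if c >= min_count)
    return {word: index for index, word in enumerate(frequent, start=1)}


# Tiny made-up corpus; a lower threshold is used so the effect is visible.
toy_corpus = [["buy", "now"], ["buy", "cheap"], ["buy", "now"]]
toy_vocab = build_vocabulary(toy_corpus, min_count=2)
# "cheap" occurs only once, so it is left out of the vocabulary.
```

Raising min_count shrinks the vocabulary and removes rare words, which is the guard against overfitting mentioned above.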
Given the vocabulary list, each word in the preprocessed emails can now be mapped into a list of
word indices that contains the index of the word in the vocabulary list. For example, in the sample
email, the word “anyone” was first normalized to “anyon” and then mapped onto the index 86 in
the vocabulary list.
The code in processEmail performs this mapping. In the code, a given string str which is a single
word from the processed email is searched in the vocabulary list vocabList. If the word exists, the
index of the word is added into the word_indices variable. If the word does not exist, and is
therefore not in the vocabulary, the word can be skipped.
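The lookup described above can be sketched on its own, separate from the rest of processEmail. The helper name map_to_indices is hypothetical; the three vocabulary entries are the indices that appear for these words in the sample output of this report, and "zzz" stands in for any word that is not in the vocabulary.

```python
def map_to_indices(tokens, vocab):
    """Return the vocabulary index of each token, skipping unknown tokens."""
    return [vocab[token] for token in tokens if token in vocab]


# Small excerpt of the vocabulary (word -> index).
vocab_excerpt = {"anyon": 86, "know": 916, "how": 794}

indices = map_to_indices(["anyon", "know", "zzz", "how"], vocab_excerpt)
# "zzz" is not in the vocabulary, so it contributes no index.
```

Storing the vocabulary as a dictionary makes each lookup constant time, instead of scanning the whole list for every word.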
file_contents
"> Anyone knows how much it costs to host a web portal ?\n>\nWell, it depends on how many visitors
you're expecting.\nThis can be anywhere from less than 10 bucks a month to a couple of $100. \nYou
should checkout http://www.rackspace.com/ or perhaps Amazon EC2 \nif youre running something
big..\n\nTo unsubscribe yourself from this mailing list, send an email to:\ngroupname-
[email protected]\n\n"
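import re
from string import punctuation
from nltk.stem.snowball import SnowballStemmer

def getVocabList(path='vocab.txt'):
    # Assumption: each line of vocab.txt has the form "<index> <word>".
    vocabList = {}
    with open(path) as f:
        for line in f:
            idx, word = line.split()
            vocabList[word] = int(idx)
    return vocabList

def processEmail(email_contents):
    vocabList = getVocabList()
    word_indices = []
    # Optional: drop the email headers (everything before the first blank line).
    # hdrstart = email_contents.find("\n\n")
    # if hdrstart != -1:
    #     email_contents = email_contents[hdrstart:]
    # Lower-case and strip HTML tags (assumed from the steps described above).
    email_contents = email_contents.lower()
    email_contents = re.sub(r'<[^<>]+>', ' ', email_contents)
    # Handle Numbers.
    # Look for one or more characters between 0-9.
    email_contents = re.sub(r'[0-9]+', 'number', email_contents)
    # Handle URLS.
    # Look for strings starting with http:// or https://.
    email_contents = re.sub(r'(http|https)://[^\s]*', 'httpaddr', email_contents)
    # Handle email addresses (pattern assumed): strings with "@" in the middle.
    email_contents = re.sub(r'[^\s]+@[^\s]+', 'emailaddr', email_contents)
    # Handle $ sign.
    # Look for "$" and replace it with the text "dollar".
    email_contents = re.sub(r'[$]+', 'dollar', email_contents)
    # Remove punctuation (assumed use of the imported `punctuation`).
    email_contents = email_contents.translate(str.maketrans('', '', punctuation))
    print('==== Processed Email ====')
    # Process file
    l = 0
    stemmer = SnowballStemmer('english')
    for token in email_contents.split():
        # Drop any remaining non-alphanumeric characters, then stem.
        token = stemmer.stem(re.sub(r'[^a-z0-9]', '', token))
        if not token:
            continue
        # Record the index only if the word is in the vocabulary.
        if token in vocabList:
            word_indices.append(vocabList[token])
        # Print to screen, ensuring that the output lines are not too long.
        if l + len(token) + 1 > 78:
            print()
            l = 0
        print(token, end=' ')
        l = l + len(token) + 1
    # Print footer.
    print('\n\n=========================\n')
    return word_indices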
# Extract features.
word_indices = processEmail(file_contents)
# Print stats.
print('Word Indices: \n')
print(word_indices)
print('\n\n')
==== Processed Email ====
anyon know how much it cost to host a web portal well it depend on how mani
visitor your expect this can be anywher from less than number buck a month to
a coupl of dollarnumb you should checkout httpaddr or perhap amazon ecnumb if
your run someth big to unsubscrib yourself from this mail list send an email
to emailaddr
=========================
Word Indices:
[86, 916, 794, 1077, 883, 370, 1699, 790, 1822, 1831, 883, 431, 1171, 794, 1002, 1895, 592, 238, 162,
89, 688, 945, 1663, 1120, 1062, 1699, 375, 1162, 479, 1893, 1510, 799, 1182, 1237, 810, 1895, 1440,
1547, 181, 1699, 1758, 1896, 688, 992, 961, 1477, 71, 530, 1699, 531]
CONCLUSION:
This project successfully implemented a spam detection system using Naïve Bayes.
The model achieved over 98% accuracy, making it highly effective for classifying
SMS messages. Its simplicity and speed make it a suitable choice for real-time
applications such as spam filters in messaging apps or email systems.