Internship
Report on
INNOVATION/ ENTREPRENEURSHIP/ SOCIETAL INTERNSHIP (21INT68)
Submitted in partial fulfillment towards requirements for the award of the degree of
Bachelor of Engineering
in
Computer Science and Engineering (Artificial Intelligence and Machine Learning)
Submitted by
Fiza Naaz (4GW21CI011)
Under the Guidance of
Dr. Manjuprasad B
2023-2024
GSSS INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN
This is to certify that the Internship program is a bonafide work carried out by “Fiza Naaz”
(4GW21CI011) in partial fulfillment for the award of the degree of Bachelor of Engineering in
CSE (Artificial Intelligence and Machine Learning) of the Visvesvaraya Technological
University, Belagavi, during the year 2023-24. The report has been approved as it satisfies the
academic requirements prescribed for the Bachelor of Engineering degree.
……………………………..
Dr. Manjuprasad B
Guide & HoD
ACKNOWLEDGEMENT
The happiness and satisfaction that come with completing a task successfully would be
incomplete without acknowledging those who made it possible.
I am thankful to Smt. Vanaja B Pandit, Honorary Secretary, GSSSIETW, Mysuru, and the
Management of GSSSIETW, Mysuru for providing necessary support to carry out this internship.
I am thankful to Dr. Shivakumar M, Principal, GSSSIETW, Mysuru, for all the support he has
rendered.
I would like to thank my guide Dr. Manjuprasad B, Prof. & Head for his constant monitoring,
guidance & motivation throughout the tenure of this internship.
I am extremely pleased to thank my teammates, parents, family members, the CSE (AIML) staff,
and my classmates for their continuous support, inspiration, and encouragement. Last but not
least, I thank everyone who supported me, directly or indirectly, throughout my academic
journey.
Fiza Naaz
(4GW21CI011)
Executive Summary
This report details an internship project on "Result Analysis Automation Using Python Fitz and
PyMySQL", conducted as part of the INNOVATION/ENTREPRENEURSHIP/SOCIETAL
INTERNSHIP (21INT68) course of the Bachelor of Engineering degree in Computer Science and
Engineering (Artificial Intelligence and Machine Learning) at GSSS Institute of Engineering &
Technology for Women. The project aims to automate result analysis for the Department/Staff of
GSSSIETW by
efficiently parsing PDFs to extract academic results. The extracted data, including semester, name,
USN, subject codes and names, internal marks, external marks, and total marks, is systematically
stored in a MySQL database for efficient retrieval and analysis. The system categorizes results into
predefined grading categories (FCD, FC, SC, Fail) and dynamically calculates pass/fail rates per
subject, providing valuable analytics for academic performance assessment. The results are
presented in an Excel sheet for clarity.
Additionally, this report includes a summary of a hands-on Full Stack Web Development training
program for 5th-semester students, held as part of an Innovation/Entrepreneurship/Societal-based
internship. The program provided comprehensive knowledge in the three tiers of full-stack web
development, delivered by professionals from both academia and industry. In the database training,
students were trained in database fundamentals, including MySQL installation, workspace
management, and basic operations such as creating databases and tables, and adding/removing
values. For backend development with Node.js, students learned about servers and ports, installed
Visual Studio Code, and created backend frameworks to connect to databases, using Postman for
server requests. The frontend development training focused on Vanilla.js, where students
developed user-friendly front-end frameworks, hosted them on live servers, and learned about
HTML and CSS properties. They also linked the frontend with the backend for a seamless user
experience.
TABLE OF CONTENTS
Acknowledgement i
Executive summary ii
Table of Contents iii
List of figures v
Chapter 5 – Project Report: Result Analysis 30-31
5.1 Introduction 30
5.2 Objectives of the Project 30
5.3 Scope 31
References 39
Copyright Certificate 40
LIST OF FIGURES
Chapter-1
Overview
Objective: The primary goal of this project is to automate the analysis of academic results,
addressing the challenge of efficiently parsing PDF documents to extract critical academic data.
Scope: The project enables the Department/Staff of GSSSIETW to upload and analyze academic
results. The interface guides users through the PDF upload process, using server-side Python
scripts to parse the PDFs and extract data such as semester, name, USN, all the subject codes
and names present in the PDF, internal marks, external marks, and total marks.
Significance: The extracted data (semester, name, USN, subject codes and names, internal
marks, external marks, and total marks) is stored systematically in a MySQL database, enabling
efficient retrieval and analysis. The project calculates the SGPA, categorizes the results into
predefined grading categories (FCD, FC, SC, Fail), and dynamically calculates pass/fail rates
per subject, providing valuable analytics for academic performance assessment. The project also
identifies the top 3 toppers.
Outcome: The automated result analysis system streamlines the management of academic data,
offering efficient and accurate result analysis presented in a clear and concise Excel format. By
implementing this system, the Department/Staff can easily carry out the result analysis process,
significantly reducing the time and effort required for manual analysis, thereby enhancing the
overall efficiency of academic performance assessment.
Chapter-2
Internship Training
2.1 Internship Overview
During my internship, I had the enriching opportunity to contribute to the development of a social
application named "Dark Web". This project served as a practical and comprehensive
exploration of various facets of web development, ranging from client-side vanilla JavaScript API
calls to the intricacies of controllers, routes, and the use of MySQL for database management.
The overarching goal of the "Dark Web" application was to create a dynamic and engaging
platform that facilitated user interaction and content sharing. The duration of my internship was
dedicated to mastering the essential components that constitute a robust web application.
One of the key learning objectives was to understand and implement client-side API calls in vanilla JavaScript. This
involved delving into the intricacies of asynchronous communication, enabling the application to
seamlessly interact with external servers. Through hands-on experience, I gained a profound
understanding of API integration, ensuring that data flow between the client and server was both
efficient and responsive. The integration of client-side components was a major milestone. This
ensured a responsive and intuitive user interface, enhancing the overall user experience.
In parallel, I honed my skills in designing controllers, the linchpin of application logic. Crafting
these controllers involved meticulous planning to ensure the smooth flow of data and user
interactions. Each controller played a crucial role in orchestrating the application's behavior, and
I took on the responsibility of designing and implementing these components to enhance the
overall functionality of "Dark Web". The implementation of efficient controllers contributed to
the creation of a structured and navigable application.
2.5 Challenges
During the development of the "Dark Web" application, several challenges emerged:
1. Asynchronous Communication Complexity:
Ensuring smooth data flow between the client and server.
2. Integrating MySQL with Node.js:
Setting up and managing the MySQL database within the Node.js environment posed some
difficulties.
3. User Interface Responsiveness:
Achieving a responsive and intuitive user interface was crucial for enhancing the user
experience.
Future Enhancements:
• Enhanced Security Features: Implement advanced security measures, such as encryption
and secure authentication protocols, to protect user data and ensure privacy.
• Scalability Improvements: Optimize the application's architecture to handle increased
traffic and data volume, ensuring consistent performance as the user base grows.
• Feature Expansion: Introduce new features based on user feedback, such as enhanced
content sharing options, real-time notifications, and social media integration, to increase
user engagement and satisfaction.
• Performance Optimization: Continuously monitor and optimize the application's
performance, focusing on reducing load times and improving overall efficiency.
By addressing these challenges and achieving these accomplishments, the "Dark Web"
application project provided a comprehensive learning experience, equipping me with the skills
and knowledge necessary for future web development.
Chapter-3
Full Stack Workshop
3.1 Introduction to Full Stack Development
The workshop provided an introduction to full stack web development and database management.
The students were guided through the installation of MySQL (an open-source relational database
management system) and VS Code (a code editor). The MySQL Shell for VSCode extension was
installed to enable interactive editing and execution of SQL for MySQL databases.
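To give a flavour of these basic operations, the short sketch below drives them from Python using the pymysql library that also features later in this report. The server credentials, the college database, and the students table are illustrative placeholders, not the workshop's actual examples.

# A minimal sketch of the basic MySQL operations covered in the workshop,
# driven from Python with pymysql. Credentials and object names are
# illustrative placeholders.
import pymysql

connection = pymysql.connect(host='localhost', user='root', password='password')
try:
    with connection.cursor() as cursor:
        cursor.execute("CREATE DATABASE IF NOT EXISTS college")
        cursor.execute("USE college")
        # Create a table.
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS students (
                usn VARCHAR(10) PRIMARY KEY,
                name VARCHAR(100) NOT NULL
            )
        """)
        # Add a value.
        cursor.execute("INSERT INTO students (usn, name) VALUES (%s, %s)",
                       ('4GW21CI011', 'Fiza Naaz'))
        # Remove a value.
        cursor.execute("DELETE FROM students WHERE usn = %s", ('4GW21CI011',))
    connection.commit()
finally:
    connection.close()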
Chapter-4
Project
4.1 Problem Statement Overview
The primary problem addressed in this project is the manual and time-consuming process of
analyzing academic results. The project parses PDFs to extract critical academic data, including
name, USN, semester, subject codes, subject names, internal marks, external marks, total marks,
and results, and then calculates the SGPA of each student. The current manual approach is
error-prone, inefficient, and lacks a systematic method for storing and analyzing the data, making
it challenging to assess academic performance effectively.
• Ensure the interface is intuitive and enhances the overall user experience.
5. Data Analysis and Reporting:
• Provide detailed analytics for academic performance assessment.
• Present the results clearly and concisely in an Excel format for easy interpretation
and reporting.
2. The tutorial Extract PDF Content with Python by NeuralNine [2] demonstrates techniques
for extracting data from PDF files using Python. This resource offers practical guidance
on utilizing Python libraries such as PyPDF2 for parsing PDF content, which is crucial
for extracting academic result data from uploaded PDF files in the proposed system.
3. SQL + Python: Master Data Analysis and Create PDF Reports by Coding Is Fun [3]
explores the integration of SQL databases with Python for data analysis and report
generation. This tutorial provides insights into utilizing SQL queries to manipulate and
analyze data stored in MySQL databases, which aligns with the data management and
analysis requirements of the proposed system.
4. Anushree Raj and Rio D'Souza's paper on the implementation of MySQL in Python [4]
presents a detailed exploration of using MySQL databases in Python applications. This
resource delves into connecting to MySQL databases, executing queries, and handling
data retrieval and manipulation tasks, providing valuable guidance for integrating MySQL
with Python in the proposed system.
4.4 Solution
The solution to the problem involves a multi-step process of extracting, processing, and storing
academic data from PDF files into a structured format using Python. Below is a detailed
explanation of the solution:
1. Database Setup:
Tables Creation: The solution starts by creating necessary database tables to store extracted
data. The tables include:
• `extracted_data` for storing detailed academic records.
• `subject_details` for storing subject-specific information.
• `sgpa_data` for storing the Semester Grade Point Average (SGPA) for students.
• `topper_data` for storing top performers based on SGPA.
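A minimal sketch of this table-creation step, using pymysql, is shown below. The column names and types are assumptions for illustration, since the report does not reproduce the project's exact schema; only two of the four tables are written out.

# Illustrative sketch of the table-creation step; column names and types
# are assumed, not the project's exact schema.
import pymysql

def create_tables():
    connection = pymysql.connect(host='localhost', user='root',
                                 password='password', database='results_db')
    try:
        with connection.cursor() as cursor:
            cursor.execute("""
                CREATE TABLE IF NOT EXISTS extracted_data (
                    usn VARCHAR(10),
                    student_name VARCHAR(100),
                    semester VARCHAR(20),
                    subject_code VARCHAR(10),
                    subject_name VARCHAR(100),
                    internal_marks INT,
                    external_marks INT,
                    total_marks INT
                )
            """)
            cursor.execute("""
                CREATE TABLE IF NOT EXISTS sgpa_data (
                    usn VARCHAR(10),
                    semester VARCHAR(20),
                    student_name VARCHAR(100),
                    sgpa DECIMAL(4, 2)
                )
            """)
            # `subject_details` and `topper_data` are created analogously.
        connection.commit()
    finally:
        connection.close()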
2. Data Extraction:
• PDF Extraction: The solution reads PDF files containing academic results using the
PyMuPDF library (`fitz`). It extracts raw text from each page of the PDF documents.
• Text Processing: The raw text is processed to extract specific sections like "University
Seat Number," "Semester," "Student Name," and academic details such as subject
codes, names, and marks. Regular expressions are used to identify and extract these
details from the text.
• Data Structuring: Extracted data is organized into structured records, which are then
used to populate the database tables.
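The sketch below illustrates the extraction step with PyMuPDF and regular expressions. The patterns are simplified assumptions; the real ones depend on the exact layout of the university's result PDFs.

# Sketch of the PDF extraction and text-processing step. The regular
# expressions are simplified assumptions about the PDF layout.
import re
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_file_path):
    # Concatenate the raw text of every page in the PDF.
    text = ""
    with fitz.open(pdf_file_path) as document:
        for page in document:
            text += page.get_text()
    return text

def extract_specific_sections(text):
    # Pull the header fields out of the raw text with regular expressions.
    patterns = {
        "University Seat Number": r"University Seat Number\s*:\s*(\S+)",
        "Student Name": r"Student Name\s*:\s*(.+)",
        "Semester": r"Semester\s*:\s*(\d+)",
    }
    sections = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            sections[field] = match.group(1).strip()
    return sections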
3. Data Insertion:
Inserting Records: The solution inserts the extracted data into the respective database tables
(`extracted_data`, `subject_details`, `sgpa_data`, and `topper_data`). This step involves
inserting both individual records and aggregated data.
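A minimal sketch of the insertion step is given below; the parameterized executemany query lets pymysql escape every value safely. The connection details and record keys follow the assumptions of the earlier sketches.

# Sketch of the record-insertion step, reusing the assumed schema above.
import pymysql

def insert_extracted_data(records):
    connection = pymysql.connect(host='localhost', user='root',
                                 password='password', database='results_db')
    try:
        with connection.cursor() as cursor:
            cursor.executemany(
                """INSERT INTO extracted_data
                   (usn, student_name, semester, subject_code, subject_name,
                    internal_marks, external_marks, total_marks)
                   VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""",
                [(r['USN'], r['Name'], r['Semester'], r['Subject Code'],
                  r['Subject Name'], r['Internal Marks'], r['External Marks'],
                  r['Total Marks']) for r in records])
        connection.commit()
    finally:
        connection.close()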
4. Data Processing:
• Grade Point Calculation: For each academic record, the solution calculates the grade
point based on total marks and subject credits. A predefined scale is used to determine
the grade points.
• SGPA Calculation: The SGPA is computed for each student based on their
performance across all subjects. The calculation takes into account credits and grade
points for accurate SGPA computation.
• Result Evaluation: Based on the SGPA, the solution determines the final result, such
as "First Class with Distinction," "First Class," "Second Class," or "Fail."
5. Data Export:
• Excel File Creation: The processed data is exported to an Excel file using the `pandas` and
`openpyxl` libraries. The solution formats the Excel file with appropriate styles, including
bold headers and cell borders.
• Top Performers Extraction: After exporting the data, the solution identifies and lists the top
3 performers based on their SGPA. This information is added to a separate sheet in the same
Excel file.
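The export step can be sketched as below, assuming the processed records are already held in a pandas DataFrame with an 'SGPA' column; the file and sheet names are placeholders.

# Sketch of the Excel export: write the main sheet with pandas, style it
# with openpyxl, then append a "Top 3 Toppers" sheet. Names are placeholders.
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Font, Border, Side

def export_to_excel(df, excel_file_path='result_analysis.xlsx'):
    df.to_excel(excel_file_path, sheet_name='Results', index=False)

    # Apply bold headers and thin cell borders.
    workbook = load_workbook(excel_file_path)
    worksheet = workbook['Results']
    thin = Side(style='thin')
    border = Border(left=thin, right=thin, top=thin, bottom=thin)
    for cell in worksheet[1]:
        cell.font = Font(bold=True)
    for row in worksheet.iter_rows(min_row=1, max_row=worksheet.max_row,
                                   min_col=1, max_col=worksheet.max_column):
        for cell in row:
            cell.border = border
    workbook.save(excel_file_path)

    # Append the top 3 performers by SGPA on a separate sheet.
    top3 = df.sort_values(by='SGPA', ascending=False).head(3)
    with pd.ExcelWriter(excel_file_path, engine='openpyxl', mode='a') as writer:
        top3.to_excel(writer, sheet_name='Top 3 Toppers', index=False)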
Methodologies Used
1. Design Thinking:
• Empathize: I began by understanding the needs and challenges associated with
extracting and processing academic data from PDFs. This involved identifying
key user requirements and pain points.
• Define: The project goals were given by the department, including the need to
extract specific data fields from PDFs and store them in a structured format for
analysis. This stage helped in creating clear objectives and scope for the project.
• Ideate: I brainstormed solutions for text extraction, data processing, and
reporting. This included considering various methods and technologies to
handle the extraction and processing tasks effectively.
• Prototype: I developed initial prototypes of the extraction and processing
scripts. These prototypes were tested with sample data to evaluate their
effectiveness and make necessary adjustments.
• Test: I tested the prototypes with real-world data to ensure they met the
requirements. Feedback from testing was used to refine and improve the
solution.
2. Agile Development:
• Iterative Development: I adopted an iterative approach to development,
allowing for continuous refinement of the solution. Each iteration included
designing, implementing, testing, and reviewing the code.
Tools Used
1. Python:
• Libraries: Utilized libraries such as PyMuPDF (fitz) for PDF text extraction,
pandas for data manipulation and analysis, openpyxl for Excel file creation and
formatting, and pymysql for MySQL database interaction.
• Scripts: Developed Python scripts for automating the extraction, processing,
and storage of academic data.
2. MySQL:
• Database Management: Used MySQL for creating and managing the database
schema, storing extracted data, and performing queries. The pymysql library
was employed for connecting to and interacting with the MySQL database.
3. Excel:
• Data Export: Used openpyxl to export processed data to Excel. This included
formatting Excel files with styles such as bold headers and cell borders to
enhance readability.
4. Database Schema:
• Design: Created tables for student records, subject details, SGPA records, and top
performers. Indexes were added to improve query performance. I also included foreign
keys to maintain referential integrity, as sketched below.
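A sketch of how such an index and foreign key could be declared on the assumed schema, continuing the earlier pymysql sketches; the index and constraint names are illustrative.

# Illustrative index and foreign-key additions to the assumed schema.
import pymysql

connection = pymysql.connect(host='localhost', user='root',
                             password='password', database='results_db')
try:
    with connection.cursor() as cursor:
        # Speed up per-student lookups (also makes usn referenceable).
        cursor.execute("CREATE INDEX idx_sgpa_usn ON sgpa_data (usn)")
        # Keep topper rows consistent with the SGPA table.
        cursor.execute("""
            ALTER TABLE topper_data
            ADD CONSTRAINT fk_topper_usn
            FOREIGN KEY (usn) REFERENCES sgpa_data (usn)
        """)
    connection.commit()
finally:
    connection.close()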
Code Snippet
import os
from collections import defaultdict

import pandas as pd
from openpyxl.styles import Border, Side

# The helper functions called below (create_tables, extract_text_from_pdf,
# extract_specific_sections, extract_table_data, insert_extracted_data,
# fetch_credits, calculate_grade_point, insert_sgpa_data,
# extract_top_3_toppers and insert_topper_data) are defined elsewhere in
# the project.

if __name__ == "__main__":
    folder_path = 'C:\\Users\\Fiza Naaz\\Desktop\\Result\\5semAIML'
    semester_data = defaultdict(list)
    subject_details = {}
    try:
        create_tables()
        # Count the result PDFs in the folder for later validation.
        num_items_in_folder = len([file_name for file_name in os.listdir(folder_path)
                                   if file_name.endswith('.pdf')])
        print(f"Number of items in folder: {num_items_in_folder}")

        # Extract the text and tabular data of every PDF, grouped by semester.
        for file_name in os.listdir(folder_path):
            if file_name.endswith('.pdf'):
                pdf_file_path = os.path.join(folder_path, file_name)
                extracted_text = extract_text_from_pdf(pdf_file_path)
                extracted_sections = extract_specific_sections(extracted_text)
                table_data = extract_table_data(extracted_text, extracted_sections)
                semester = extracted_sections.get("Semester", "Unknown Semester")
                semester_data[semester].extend(table_data)

        # Store the records and compute a grade point for every subject row.
        for semester, data in semester_data.items():
            insert_extracted_data(data)
            for row in data:
                subject_code = row['Subject Code']
                subject_name = row['Subject Name']
                total_marks = row['Total Marks']
                credits = fetch_credits(subject_code)
                row['Grade Point'] = calculate_grade_point(total_marks, credits)

        df_list = []
        # ... per-student SGPA computation and main Excel sheet creation
        # (which produce `sgpa`, `usn`, `name`, `df`, `worksheet` and
        # `excel_file_path`) are omitted from this snippet ...
        rounded_sgpa = round(sgpa, 2)
        insert_sgpa_data(usn, semester, name, rounded_sgpa)

        # Draw thin borders around every data cell of the worksheet.
        border = Border(left=Side(style='thin'),
                        right=Side(style='thin'),
                        top=Side(style='thin'),
                        bottom=Side(style='thin'))
        for row in worksheet.iter_rows(min_row=2, max_row=worksheet.max_row,
                                       min_col=1, max_col=worksheet.max_column):
            for cell in row:
                cell.border = border

        # Validate that every PDF produced exactly one row before ranking toppers.
        num_rows_excel_sheet = len(df)
        print(f"Number of rows in Excel sheet: {num_rows_excel_sheet}")
        if num_items_in_folder == num_rows_excel_sheet:
            df_sorted = df.sort_values(by='SGPA', ascending=False)
            top_3_toppers_data = extract_top_3_toppers(df_sorted)
            print(top_3_toppers_data)
            insert_topper_data(top_3_toppers_data)
            # Replace any stale "Top 3 Toppers" sheet before writing the new one.
            with pd.ExcelWriter(excel_file_path, engine='openpyxl', mode='a') as writer:
                if 'Top 3 Toppers' in writer.book.sheetnames:
                    idx = writer.book.sheetnames.index('Top 3 Toppers')
                    writer.book.remove(writer.book.worksheets[idx])
                top_3_toppers_data.to_excel(writer, sheet_name='Top 3 Toppers', index=False)
            print("Excel file created successfully.")
            print("Data extraction and storage complete.")
        else:
            print("Number of items in the folder does not match the number of "
                  "rows in the Excel sheet.")
    except Exception as e:
        print(f"An error occurred: {e}")
Figure 4.8.2 Subject code credits present in the PDF, validation of the number of students whose
PDFs were read, and the top 3 toppers
Figure 4.8.3 Extracted data from the PDF stored in the database
Figure 4.8.4 Database tables containing the subject name, subject code, credits, year, and semester
Chapter-5
Project Report: Result Analysis
5.1 Introduction
The project focuses on developing a comprehensive system for extracting, processing, and
analyzing academic data from PDF files. The system leverages advanced techniques in data
extraction, cleaning, and database management to transform unstructured PDF documents into
structured and actionable insights. The main objectives are to automate the extraction of academic
records, compute relevant metrics such as SGPA, and generate detailed reports for academic
analysis. By integrating data processing with database management and reporting, the project aims
to enhance the efficiency and accuracy of managing academic performance data.
5.3 Scope
1. Data Extraction:
The project covers the extraction of academic data from PDF files provided by the
institution. It includes text extraction from multiple PDF formats and cleaning of extracted
data to ensure consistency and accuracy.
2. Database Design and Management:
The project involves designing a normalized database schema to store various types of
data, including student records, subject details, SGPA calculations, and top performers. It
includes the implementation of database tables, indexes, and relationships.
3. Data Processing and Calculation:
The project encompasses the development of algorithms to compute grade points, SGPA,
and other relevant metrics based on the extracted data. It includes handling of special
cases such as failed subjects and integration of these calculations into the database.
4. Reporting and Export:
The project includes generating detailed reports in Excel format, with properly formatted
data and clear presentation. It covers the creation of main reports as well as additional
sheets for top performers.
5. Error Handling and Validation:
The project includes implementing validation checks and error handling mechanisms to
ensure data integrity and address any discrepancies during extraction and processing.
Chapter-6
Internship Report: Full Stack Web Development
6.1 Introduction
The Department of Computer Science & Engineering and Information Science & Engineering
organized a three-day workshop on “FULL STACK WEB DEVELOPMENT”, in association with
the CSI Mysore Chapter, during a one-month internship for 5th semester Computer Science &
Engineering and Information Science & Engineering students on 14th, 15th, and 17th November
2023. Mr. Vinay Kumar Venkataramana, a creative enthusiast building PraCodAI and the
NETTED community, and an AI and research entrepreneur working in the AI domain, was the
resource person for the workshop.
➢ Designing Controllers:
• Crafting controllers involved meticulous planning to ensure the smooth flow of
data and user interactions. Each controller played a crucial role in orchestrating
the application's behavior.
➢ Defining Routes:
• Carefully structuring routes to create an intuitive and navigable user experience
was another key focus. This involved mapping out the various paths users could
take within the application.
6.4 Accomplishments:
• Integration of Client-Side Components: Ensured a responsive and intuitive user
interface, enhancing the overall user experience.
• Implementation of Efficient Controllers and Well-Defined Routes: Contributed to the
creation of a structured and navigable application.
• Maintenance of a Robust and Secure MySQL Database: Underscored the importance
of data integrity and scalability.
Chapter-7
Learning and Skills Acquired
Chapter-8
Summary of My Internship
References
[1] Prabhu T Kannan and Srividya K Bansal, "Unimate: A Student Information System," 2013
International Conference on Advances in Computing, Communications and Informatics
(ICACCI), pp. 1251-1256.
[2] Dipin Budhrani and Vivek Mulchandani, "Student Information Management System,"
IJEDR, 2018.
[3] Rajnish Tripathii, Raghvendra Singh, and Jaweria Usmani, "Campus Recruitment and
Placement System," International Conference on Recent Innovations in Science and
Engineering (ICRISE-18), April 2018.
Copyright Certificate