
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“Jnana Sangama”, Machhe, Belagavi, Karnataka-590018

Report on
INNOVATION/ENTREPRENEURSHIP/SOCIETAL INTERNSHIP (21INT68)

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Engineering

in

CSE (Artificial Intelligence and Machine Learning)

Submitted by

Fiza Naaz (4GW21CI011)


Under the Guidance of

Dr. Manjuprasad B

Professor & HOD

DEPARTMENT OF CSE (ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING)

GSSS INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN


(Affiliated to VTU, Belagavi, Approved by AICTE, New Delhi & Govt. of Karnataka)

K.R.S ROAD, METAGALLI, MYSURU-570016, KARNATAKA

2023-2024
GSSS INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN

K.R.S Road, Mysuru-570016, Karnataka


(Affiliated to VTU, Belagavi, Approved by AICTE -New Delhi & Govt. of Karnataka)

DEPARTMENT OF CSE (ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING)

This is to certify that the Internship program is a bona fide work carried out by "Fiza Naaz"
(4GW21CI011) in partial fulfillment for the award of the degree of Bachelor of Engineering in
CSE (Artificial Intelligence and Machine Learning) of Visvesvaraya Technological
University, Belagavi, during the year 2023-24. The report has been approved as it satisfies
the academic requirements prescribed for the Bachelor of Engineering degree.

……………………………..
Dr. Manjuprasad B
Guide & HoD
ACKNOWLEDGEMENT

The happiness and satisfaction that come with completing a task successfully would be
incomplete without acknowledging those who made it possible.

I am thankful to Smt. Vanaja B Pandit, Honorary Secretary, GSSSIETW, Mysuru, and the
Management of GSSSIETW, Mysuru for providing necessary support to carry out this internship.

I am thankful to Dr. Shivakumar M, Principal, GSSSIETW, Mysuru, for all the support he has
rendered.

I would like to thank my guide Dr. Manjuprasad B, Prof. & Head for his constant monitoring,
guidance & motivation throughout the tenure of this internship.

I am extremely pleased to thank my teammate, parents, family members, the CSE (AIML) staff
and my classmates for their continuous support, inspiration, encouragement and helping hand.
Last but not least, I thank all those who supported me directly or indirectly in my
academic journey.

Fiza Naaz
(4GW21CI011)

Executive Summary

This report details an internship project on "Result Analysis Automation Using Python Fitz and
PyMySQL", conducted as part of the INNOVATION/ENTREPRENEURSHIP/SOCIETAL
INTERNSHIP (21INT68) course of the Bachelor of Engineering degree in Computer Science and
Engineering (Artificial Intelligence and Machine Learning) at GSSS Institute of Engineering &
Technology for Women. The project aims to automate result analysis for the Department/Staff of
GSSSIETW by
efficiently parsing PDFs to extract academic results. The extracted data, including semester, name,
USN, subject codes and names, internal marks, external marks, and total marks, is systematically
stored in a MySQL database for efficient retrieval and analysis. The system categorizes results into
predefined grading categories (FCD, FC, SC, Fail) and dynamically calculates pass/fail rates per
subject, providing valuable analytics for academic performance assessment. The results are
presented in an Excel sheet for clarity.

Additionally, this report includes a summary of a hands-on Full Stack Web Development training
program for 5th-semester students, held as part of an Innovation/Entrepreneurship/Societal-based
internship. The program provided comprehensive knowledge in the three tiers of full-stack web
development, delivered by professionals from both academia and industry. In the database training,
students were trained in database fundamentals, including MySQL installation, workspace
management, and basic operations such as creating databases and tables, and adding/removing
values. For backend development with Node.js, students learned about servers and ports, installed
Visual Studio Code, and created backend frameworks to connect to databases, using Postman for
server requests. The frontend development training focused on Vanilla.js, where students
developed user-friendly front-end frameworks, hosted them on live servers, and learned about
HTML and CSS properties. They also linked the frontend with the backend for a seamless user
experience.


TABLE OF CONTENTS

Acknowledgement
Executive Summary
Table of Contents
List of Figures

Chapter 1 – Overview
1.1 Brief Overview of the Result Analysis
1.2 Brief Overview of the Full Stack Web Development Internship

Chapter 2 – Internship Training
2.1 Internship Overview
2.2 Client-Side Development and API Integration
2.3 Controllers and Application Logic
2.4 Project Experience: Social Application Development
2.5 Challenges
2.6 Accomplishments and Future Enhancements

Chapter 3 – Full Stack Workshop
3.1 Introduction to Full Stack Development
3.2 SQL and Database Management
3.3 Back-end Development
3.4 Front-end Development
3.5 Student Feedback

Chapter 4 – Project
4.1 Problem Statement Overview
4.2 Goals and Objectives of the Project
4.3 Literature Survey
4.4 Solution
4.5 Personal Role and Contributions
4.6 Methodologies and Tools
4.7 Overcoming Challenges
4.8 Outcomes, Code Snippet and Impact of the Project

Chapter 5 – Project Report: Result Analysis
5.1 Introduction
5.2 Objectives of the Project
5.3 Scope

Chapter 6 – Internship Report: Full Stack Web Development
6.1 Introduction
6.2 Project Experience: Social Application Development
6.3 Key Learning Objectives
6.4 Accomplishments
6.5 Potential Enhancements

Chapter 7 – Learning and Skills Acquired
7.1 Technical Skills
7.2 Soft Skills
7.3 Industry-specific Knowledge
7.4 Insights into the Innovation Process

Chapter 8 – Summary of My Internship

References
Copyright Certificate
LIST OF FIGURES

Figure 2.2.1  Fetching user data from external API
Figure 2.2.2  Express.js controller
Figure 4.8.1  Folders containing Excel sheet to be read
Figure 4.8.2  Displaying the subject code credits present in the PDF, validation of the number of
              students whose PDFs were read, and the top 3 toppers
Figure 4.8.3  Extracted data from the PDFs stored in the database
Figure 4.8.4  Database tables containing the subject name, subject code, credits, year and semester
Figure 4.8.5  Database containing the calculated SGPA of the students
Figure 4.8.6  Database containing the topper list
Figure 6.3.1  Fetching user data from external API
Figure 6.3.2  Express.js controller
Figure 6.3.3  Express.js routes
Figure 6.3.4  MySQL table

Chapter-1
Overview

1.1 Brief Overview of the Result Analysis


This report provides a comprehensive overview of an internship project titled "Result Analysis"
undertaken as part of the Bachelor of Engineering degree in Computer Science and Engineering
(Artificial Intelligence and Machine Learning) at GSSS Institute of Engineering & Technology for
Women.

Objective: The primary goal of this project is to automate the analysis of academic results,
addressing the challenge of efficiently parsing PDF documents to extract critical academic data.

Scope: The project enables the Department/Staff of GSSSIETW to upload and analyze academic
results. The project interface guides users through the PDF upload process, utilizing server-side
Python scripts to parse the PDFs and extract data such as semester, name, USN, all the subject
codes and names present in the PDF, internal marks, external marks, and total marks.

Significance: The extracted data (semester, name, USN, subject codes and names, internal marks,
external marks, and total marks) is stored systematically in a MySQL database, enabling efficient
retrieval and analysis. The project calculates the SGPA, categorizes the results into predefined
grading categories (FCD, FC, SC, Fail), and dynamically calculates pass/fail rates per subject,
providing valuable analytics for academic performance assessment. The project also identifies
the top 3 toppers.

Outcome: The automated result analysis system streamlines the management of academic data,
offering efficient and accurate result analysis presented in a clear and concise Excel format. By
implementing this system, the Department/Staff can carry out the result analysis process easily,
significantly reducing the time and effort required for manual analysis and thereby enhancing
the overall efficiency of academic performance assessment.


1.2 Brief Overview of the Full Stack Web Development Internship


During the internship, I contributed to the development of a social application named "Dark
Website". This project involved a practical and comprehensive exploration of web development,
covering the client-side vanilla API, controllers, routes, and MySQL for database management.
The key areas of learning included:
1. Client-Side Vanilla API: Implemented asynchronous communication for efficient
data interaction between the client and server.
2. Designing Controllers: Developed application logic to ensure smooth data flow and
user interactions.
3. Defining Routes: Structured routes to create an intuitive user experience.
4. Database Management with MySQL: Worked on data modeling, query optimization,
and ensuring data consistency.
The internship provided hands-on experience with full stack web development, covering
essential components such as APIs, controllers, routes and database management.

Key Highlights of the Internship


➢ Integration of Client-Side Components:
1. Implemented client-side vanilla API for seamless communication with external
servers.
2. Gained a profound understanding of API integration, ensuring efficient and
responsive data flow between client and server.
➢ Implementation of Efficient Controllers and Well-Defined Routes:
1. Designed controllers to handle application logic and data flow.
2. Structured routes to create an intuitive and navigable user experience, mapping
out the various paths users could take within the application.
➢ Database Management with MySQL:
1. Managed data consistency and query optimization in MySQL.
2. Worked on data modeling and ensured data consistency; the MySQL database served
as the backbone of the application, storing user profiles and content.


➢ Practical Application Development:
1. Contributed to the development of the "Dark Website" application, enhancing user
interaction and content sharing.
2. Developed a responsive and intuitive user interface, ensuring a seamless user
experience.

Accomplishments and Future Enhancements


Accomplishments:
• Developed a responsive and intuitive user interface.
• Ensured efficient data flow and interaction through well-designed controllers
and routes.
• Maintained a robust and secure database with MySQL.
Future Enhancements:
• Incorporate additional features to enrich user interaction.
• Further optimize database performance for scalability.


Chapter-2

Internship Training
2.1 Internship Overview

During my internship, I had the enriching opportunity to contribute to the development of a social
application aptly named "Dark Web". This project served as a practical and comprehensive
exploration of various facets of web development, ranging from the client-side vanilla API to the
intricacies of controllers, routes, and the utilization of MySQL for database management. The
overarching goal of the "Dark Web" application was to create a dynamic and engaging platform
that facilitated user interaction and content sharing. The duration of my internship was dedicated
to mastering the essential components that constitute a robust web application.

2.2 Client-Side Development and API Integration

One of the key learning objectives was to understand and implement client-side vanilla API. This
involved delving into the intricacies of asynchronous communication, enabling the application to
seamlessly interact with external servers. Through hands-on experience, I gained a profound
understanding of API integration, ensuring that data flow between the client and server was both
efficient and responsive. The integration of client-side components was a major milestone. This
ensured a responsive and intuitive user interface, enhancing the overall user experience.

2.3 Controllers and Application Logic

In parallel, I honed my skills in designing controllers, the linchpin of application logic. Crafting
these controllers involved meticulous planning to ensure the smooth flow of data and user
interactions. Each controller played a crucial role in orchestrating the application's behavior, and
I took on the responsibility of designing and implementing these components to enhance the
overall functionality of "Dark Web". The implementation of efficient controllers contributed to
the creation of a structured and navigable application.


2.4 Project Experience: Social Application Development

These are the key learnings from my internship training:

➢ Client-Side Vanilla API:

• Understanding and implementing client-side vanilla API involved delving into


the intricacies of asynchronous communication, enabling the application to
seamlessly interact with external servers.

• Example Code:

Figure 2.2.1 Fetching user data from external API

➢ Designing Controllers:
• Crafting controllers involved meticulous planning to ensure the smooth flow of
data and user interactions. Each controller played a crucial role in orchestrating
the application's behavior.
• Example Code:

Figure 2.2.2 Express.js controller


2.5 Challenges
During the development of the "Dark Web" application, several challenges emerged:
1. Asynchronous Communication Complexity:
Ensuring smooth data flow between the client and the server.
2. Integrating MySQL with Node.js:
Setting up and managing the MySQL database within the Node.js environment
posed some difficulties.
3. User Interface Responsiveness:
Achieving a responsive and intuitive user interface was crucial for enhancing the user
experience.

2.6 Accomplishments and Future Enhancements


Accomplishments:
• Successful API Integration: Achieved seamless integration of client-side vanilla API,
facilitating efficient and responsive communication between the client and server.
• Robust Controllers: Designed and implemented efficient controllers that ensured smooth
data flow and user interactions, contributing to a structured and navigable application.
• Responsive User Interface: Developed a responsive and intuitive user interface,
enhancing the overall user experience and making the application accessible across
different devices.
• Database Management: Effectively utilized MySQL for database management, ensuring
reliable data storage and retrieval processes.


Future Enhancements:
• Enhanced Security Features: Implement advanced security measures, such as encryption
and secure authentication protocols, to protect user data and ensure privacy.
• Scalability Improvements: Optimize the application's architecture to handle increased
traffic and data volume, ensuring consistent performance as the user base grows.
• Feature Expansion: Introduce new features based on user feedback, such as enhanced
content sharing options, real-time notifications, and social media integration, to increase
user engagement and satisfaction.
• Performance Optimization: Continuously monitor and optimize the application's
performance, focusing on reducing load times and improving overall efficiency.

By addressing these challenges and achieving these accomplishments, the "Dark Web"
application project provided a comprehensive learning experience, equipping me with the
skills and knowledge necessary for future web development.


Chapter-3
Full Stack Workshop
3.1 Introduction to Full Stack Development
The workshop provided an introduction to full stack web development and database management.
The students were guided through the installation of MySQL (an open-source relational database
management system) and VS Code (a code editor). The MySQL Shell for VSCode extension was
installed to enable interactive editing and execution of SQL for MySQL databases.

3.2 SQL and Database Management


Detailed explanations were given on various SQL statements and queries. The creation of
databases and tables with examples was demonstrated for basic operations performed in SQL,
namely CRUD (Create, Read, Update, and Delete). The students learned how to create and store
databases in MySQL. A database schema for a social media app was explained, based on which
the necessary database and tables were created and stored.
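As a concrete illustration of these basic operations, the following is a minimal sketch that runs
the same kind of CRUD statements through PyMySQL, the library used elsewhere in this report;
the workshop itself used the MySQL Shell for VS Code, and the credentials and the
social-media-style schema below are illustrative assumptions, not the workshop's exact material.

import pymysql

# Connect to a local MySQL server (assumed host and credentials).
conn = pymysql.connect(host="localhost", user="root", password="secret")
try:
    with conn.cursor() as cur:
        # Create a database and a table
        cur.execute("CREATE DATABASE IF NOT EXISTS social_app")
        cur.execute("USE social_app")
        cur.execute("""CREATE TABLE IF NOT EXISTS users (
                           id INT AUTO_INCREMENT PRIMARY KEY,
                           username VARCHAR(50) NOT NULL,
                           email VARCHAR(100) NOT NULL)""")
        # Add a value (Create), read it back, update it, then remove it (Delete)
        cur.execute("INSERT INTO users (username, email) VALUES (%s, %s)",
                    ("asha", "asha@example.com"))
        cur.execute("SELECT id, username, email FROM users")
        print(cur.fetchall())
        cur.execute("UPDATE users SET email = %s WHERE username = %s",
                    ("asha@new.example.com", "asha"))
        cur.execute("DELETE FROM users WHERE username = %s", ("asha",))
    conn.commit()
finally:
    conn.close()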

3.3 Back-end Development


On the second day of the workshop, students were introduced to backend development through
the installation of Node.js (an open-source server environment), NPM (Node Package Manager)
for downloading different Node.js packages, Express.js (a Node framework), and the mysql2
library from NPM for connecting MySQL databases. Postman, an API platform for building and
testing APIs, was also introduced. Code snippets for building APIs using Node.js and Express.js
were explained, and students created and tested these APIs in VS Code and Postman.


3.4 Front-end Development


On the third day of the workshop, students focused on the frontend implementation of the social
media web application. They used HTML, CSS, JavaScript, and Bootstrap for frontend
development. Various NPM packages were introduced, including cors, jsonwebtoken, bcryptjs,
and cookie-parser. Detailed explanations were given on creating interactive registration and login
web pages.

3.5 Student Feedback


At the conclusion of the workshop, student feedback was gathered. The students reported
achieving an intermediate understanding of full-stack web development and its tools, including
HTML, CSS, JavaScript, Bootstrap, Node.js, NPM packages, and MySQL. They also noted a
boost in confidence, which they felt would be advantageous for their internships and future
project implementations.


Chapter-4
Project
4.1 Problem Statement Overview
The primary problem addressed in this project is the manual and time-consuming process of
analyzing academic results. The project parses PDFs to extract critical academic data, including
name, USN, semester, subject codes, subject names, internal marks, external marks, total marks,
and results, and then calculates the SGPA of each student. The current manual approach is prone
to errors, inefficient, and lacks a systematic method for storing and analyzing the data, making it
challenging to assess academic performance effectively.

4.2 Goals and Objectives of the Project


The overarching goal of this project is to develop an automated system for result analysis, aimed
at streamlining and improving the efficiency and accuracy of the academic result processing at
GSSSIETW. The specific objectives include:
1. Automate PDF Parsing:
• Develop a system to automatically parse PDF documents containing academic
results.
• Extract essential data such as subject codes, names, marks, and results.
2. Database Management:
• Store the extracted data systematically in a MySQL database.
• Ensure efficient data retrieval and management.
3. Result Categorization:
• Implement algorithms to categorize results into predefined grading categories
(FCD, FC, SC, Fail).
• Dynamically calculate pass/fail rates per subject.
4. User-Friendly Interface:
• Develop a web interface that allows users to upload PDFs and view results
effortlessly.


• Ensure the interface is intuitive and enhances the overall user experience.
5. Data Analysis and Reporting:
• Provide detailed analytics for academic performance assessment.
• Present the results clearly and concisely in an Excel format for easy interpretation
and reporting.

4.3 Literature Survey


1. Riajul Kashem's Student Result Management System, built on a web framework [1], offers
valuable insights into the design and architecture of a similar academic management
system. This reference provides guidance on leveraging the framework for web
application development, which could be beneficial for building the user interface and
backend functionalities of the proposed system.

2. The tutorial Extract PDF Content with Python by NeuralNine [2] demonstrates techniques
for extracting data from PDF files using Python. This resource offers practical guidance
on utilizing Python libraries such as PyPDF2 for parsing PDF content, which is crucial
for extracting academic result data from uploaded PDF files in the proposed system.

3. SQL + Python: Master Data Analysis and Create PDF Reports by Coding Is Fun [3]
explores the integration of SQL databases with Python for data analysis and report
generation. This tutorial provides insights into utilizing SQL queries to manipulate and
analyze data stored in MySQL databases, which aligns with the data management and
analysis requirements of the proposed system.

4. Anushree Raj and Rio D'Souza's paper on the implementation of MySQL in Python [4]
presents a detailed exploration of using MySQL databases in Python applications. This
resource delves into connecting to MySQL databases, executing queries, and handling
data retrieval and manipulation tasks, providing valuable guidance for integrating MySQL
with Python in the proposed system.


5. GeeksForGeeks' article on PHP-MySQL database introduction [5] and W3Schools' PHP-


MySQL tutorial [6] offer foundational knowledge on using PHP for interacting with
MySQL databases. These resources cover essential concepts such as establishing database
connections, executing SQL queries, and handling database operations in PHP scripts,
which could be useful for implementing server-side functionalities in the proposed
system.

4.4 Solution
The solution to the problem involves a multi-step process of extracting, processing, and storing
academic data from PDF files into a structured format using Python. Below is a detailed
explanation of the solution:
1. Database Setup:
Tables Creation: The solution starts by creating the necessary database tables to store the
extracted data (a sketch of plausible helper definitions is given at the end of this section).
The tables include:
• `extracted_data` for storing detailed academic records.
• `subject_details` for storing subject-specific information.
• `sgpa_data` for storing the Semester Grade Point Average (SGPA) for students.
• `topper_data` for storing top performers based on SGPA.

2. Data Extraction:
• PDF Extraction: The solution reads PDF files containing academic results using the
PyMuPDF library (`fitz`). It extracts raw text from each page of the PDF documents.
• Text Processing: The raw text is processed to extract specific sections like "University
Seat Number," "Semester," "Student Name," and academic details such as subject
codes, names, and marks. Regular expressions are used to identify and extract these
details from the text.
• Data Structuring: Extracted data is organized into structured records, which are then
used to populate the database tables.


3. Data Insertion:
Inserting Records: The solution inserts the extracted data into the respective database tables
(`extracted_data`, `subject_details`, `sgpa_data`, and `topper_data`). This step involves
inserting both individual records and aggregated data.

4. Data Processing:
• Grade Point Calculation: For each academic record, the solution calculates the grade
point based on total marks and subject credits. A predefined scale is used to determine
the grade points.
• SGPA Calculation: The SGPA is computed for each student based on their
performance across all subjects. The calculation takes into account credits and grade
points for accurate SGPA computation.
• Result Evaluation: Based on the SGPA, the solution determines the final result, such
as "First Class with Distinction," "First Class," "Second Class," or "Fail."

5. Data Export:
• Excel File Creation: The processed data is exported to an Excel file using the `pandas`
and `openpyxl` libraries. The solution formats the Excel file with appropriate styles,
including bold headers and cell borders.
• Top Performers Extraction: After exporting the data, the solution identifies and lists the
top 3 performers based on their SGPA. This information is added to a separate sheet in
the same Excel file.

6. Validation and Reporting:


• Validation: The solution checks if the number of processed records matches the
number of PDF files processed. This ensures data integrity.
• Reporting: The solution prints information about the number of items processed and
any potential errors encountered during execution.
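The driver script in Section 4.8 calls helpers such as `create_tables` and `insert_sgpa_data`
without showing their definitions. The sketch below shows one plausible shape for them using
PyMySQL; the connection parameters and exact column definitions are illustrative assumptions,
not the project's actual code.

import pymysql

def get_connection():
    # Assumed local connection details; the real project's credentials differ.
    return pymysql.connect(host="localhost", user="root",
                           password="secret", database="result_analysis")

def create_tables():
    # Create the sgpa_data table used by the driver script; the other tables
    # (extracted_data, subject_details, topper_data) follow the same pattern.
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("""CREATE TABLE IF NOT EXISTS sgpa_data (
                               usn VARCHAR(20),
                               semester VARCHAR(30),
                               student_name VARCHAR(100),
                               sgpa DECIMAL(4, 2),
                               PRIMARY KEY (usn, semester))""")
        conn.commit()
    finally:
        conn.close()

def insert_sgpa_data(usn, semester, name, sgpa):
    # Parameterised insert; REPLACE keeps re-runs of the script idempotent.
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("REPLACE INTO sgpa_data (usn, semester, student_name, sgpa) "
                        "VALUES (%s, %s, %s, %s)", (usn, semester, name, sgpa))
        conn.commit()
    finally:
        conn.close()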


4.5 Personal Role and Contributions


1. Project Planning and Design:
• Requirements Analysis: I began by understanding the requirements for extracting and
processing academic data from PDF files. This involved identifying the necessary data
points and the structure of the database tables.
• System Design: I designed the overall system architecture, including the database
schema and data flow, to ensure that the extracted data could be effectively managed
and analyzed.
2. Development and Implementation:
• Database Setup: I created and configured the database schema, including tables for
extracted data, subject details, SGPA records, and top performers. This involved
writing SQL scripts to ensure data integrity and efficient querying.
• Data Extraction: I developed the Python scripts to extract text from PDF files using
PyMuPDF. This involved implementing text extraction and cleaning processes to
handle various formatting issues.
• Data Processing: I wrote functions to process the extracted text, calculate grade points
and SGPA, and determine student results based on predefined criteria. This included
developing algorithms for accurate calculations and handling special cases (e.g., fail
grades).
• Data Insertion: I implemented the logic for inserting processed data into the database,
ensuring that records were accurately stored and could be retrieved for further analysis.
• Excel Export: I developed the functionality to export the processed data to Excel,
including formatting and styling the output to ensure readability and usability.
3. Validation and Testing:
• Data Validation: I performed validation to ensure that the number of processed records
matched the number of PDF files. This step was crucial to verify the accuracy and
completeness of the data extraction process.
• Error Handling: I implemented error handling mechanisms to manage any issues
encountered during the extraction, processing, and storage stages. This included
logging errors and providing meaningful error messages.


4. Reporting and Documentation:
• Reporting: I generated reports summarizing the data extraction and processing results,
including the number of records processed, top performers, and any discrepancies
encountered.
• Documentation: I documented the code, processes, and system design to ensure that
the project could be understood and maintained by others. This included writing
comments in the code and preparing project documentation.

4.6 Methodologies and Tools

Methodologies Used
1. Design Thinking:
• Empathize: I began by understanding the needs and challenges associated with
extracting and processing academic data from PDFs. This involved identifying
key user requirements and pain points.
• Define: The project goals were given by the department, including the need to
extract specific data fields from PDFs and store them in a structured format for
analysis. This stage helped in creating clear objectives and scope for the project.
• Ideate: I brainstormed solutions for text extraction, data processing, and
reporting. This included considering various methods and technologies to
handle the extraction and processing tasks effectively.
• Prototype: I developed initial prototypes of the extraction and processing
scripts. These prototypes were tested with sample data to evaluate their
effectiveness and make necessary adjustments.
• Test: I tested the prototypes with real-world data to ensure they met the
requirements. Feedback from testing was used to refine and improve the
solution.


2. Agile Development:
• Iterative Development: I adopted an iterative approach to development,
allowing for continuous refinement of the solution. Each iteration included
designing, implementing, testing, and reviewing the code.

• Collaboration: I regularly interacted with the faculty to gather feedback and
made adjustments based on their needs. This ensured that the project remained
aligned with user expectations and requirements.
• Incremental Delivery: I focused on delivering incremental improvements to
the system, starting with basic functionality and progressively adding features
such as data processing.

Tools Used
1. Python:
• Libraries: Utilized libraries such as PyMuPDF (fitz) for PDF text extraction,
pandas for data manipulation and analysis, openpyxl for Excel file creation and
formatting, and pymysql for MySQL database interaction.
• Scripts: Developed Python scripts for automating the extraction, processing,
and storage of academic data.
2. MySQL:
• Database Management: Used MySQL for creating and managing the database
schema, storing extracted data, and performing queries. The pymysql library
was employed for connecting to and interacting with the MySQL database.
3. Excel:
• Data Export: Used openpyxl to export processed data to Excel. This included
formatting Excel files with styles such as bold headers and cell borders to
enhance readability.


4. Regular Expressions (Regex):


• Text Extraction: Employed regular expressions for parsing and extracting specific
data fields from the raw text extracted from PDF files (see the sketch at the end of
this section).
5. Version Control:
• Git: Used Git for version control to manage changes to the codebase and
collaborate with other developers if applicable.
6. IDE/Text Editor:
• Development Environment: Utilized an integrated development environment
(IDE) or text editor for coding, debugging, and testing the scripts.
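To show how PyMuPDF and the regular expressions listed above fit together, here is a minimal
sketch of the kind of extraction helpers the driver script in Section 4.8 relies on; the section
labels and the USN pattern are assumptions based on typical VTU result PDFs, not the project's
actual expressions.

import re
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_file_path):
    # Concatenate the raw text of every page in the PDF.
    pages = []
    with fitz.open(pdf_file_path) as doc:
        for page in doc:
            pages.append(page.get_text())
    return "\n".join(pages)

def extract_specific_sections(extracted_text):
    # Pull header fields out of the raw text with regular expressions.
    # The labels and the USN format (e.g. 4GW21CI011) are assumed here.
    sections = {}
    usn = re.search(r"University Seat Number\s*:\s*([0-9][A-Z]{2}\d{2}[A-Z]{2}\d{3})",
                    extracted_text)
    if usn:
        sections["University Seat Number"] = usn.group(1)
    sem = re.search(r"Semester\s*:\s*(\d+)", extracted_text)
    if sem:
        sections["Semester"] = sem.group(1)
    name = re.search(r"Student Name\s*:\s*(.+)", extracted_text)
    if name:
        sections["Student Name"] = name.group(1).strip()
    return sections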

4.7 Overcoming Challenges


1. Data Extraction Accuracy:
• Extracting text from PDFs can be error-prone due to varying formats and structures
of the documents. Inconsistent text extraction could lead to incomplete or
inaccurate data.
• I implemented robust text extraction and cleaning procedures using PyMuPDF
(fitz). Regular expressions were fine-tuned to handle different formats and ensure
that the relevant data fields were accurately extracted. I also tested the extraction
process with multiple sample PDFs to verify accuracy.
2. Data Formatting and Parsing:
• The text extracted from PDFs often required significant formatting and parsing to
identify and structure the data correctly.
• I developed comprehensive regular expressions to extract specific data fields and
implemented additional data cleaning steps to handle inconsistencies. This included
removing unwanted characters and standardizing the extracted data.
3. Database Schema Design:
• Designing a database schema that could efficiently store and query the diverse set
of data extracted from PDFs required careful planning.
• I designed a normalized database schema with separate tables for extracted data,
subject details, SGPA records, and top performers. Indexes were added to improve
query performance. I also included foreign keys to maintain referential integrity.

4. Handling Missing or Incomplete Data:


• Some PDFs contained missing or incomplete data, which could affect the accuracy
of the results.
• I implemented checks and default values for missing data fields during the
extraction and processing stages. For example, if a field like "Semester" was
missing, a default value of "Unknown Semester" was used to ensure that the process
continued smoothly.
5. Calculating Grade Points and SGPA:
• Accurate calculation of grade points and SGPA required careful handling of credits,
marks, and grades, including dealing with failed subjects.
• I created a detailed algorithm to calculate grade points based on predefined ranges
and to handle the impact of failing grades on SGPA (see the sketch at the end of this
section). The calculation logic was rigorously tested with different scenarios to
ensure correctness.
6. Data Integrity and Validation:
• Ensuring data integrity and validating that the processed data matched the number
of PDFs was critical for reliability.
• I implemented validation checks to compare the number of processed records with
the number of PDF files. Regular audits and error handling were incorporated to
detect and address any discrepancies.
7. Performance Optimization:
• Processing large volumes of data from multiple PDFs could lead to performance
issues, including slow execution and memory usage.
• I optimized the code for performance by efficiently handling data structures and
minimizing redundant operations. Batch processing and indexing were used to
improve database interaction speed.


8. Excel Export and Formatting:


• Exporting data to Excel with the required formatting, and ensuring that the file met
the project's requirements, proved complex.
• I used the openpyxl library to customize the Excel output, including styling headers
and adding borders. I also verified the formatting by generating sample reports.
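To make the SGPA step concrete: SGPA is the credit-weighted mean of grade points,
SGPA = (sum of credit_i * GP_i) / (sum of credit_i), and the driver script in Section 4.8
converts it to a percentage as SGPA * 10 before classifying the result. The following is a
minimal sketch of a `calculate_grade_point` helper under an assumed marks-to-grade-point
scale; the project's actual predefined ranges are not reproduced in this report.

def calculate_grade_point(total_marks, credits, max_marks=100):
    # Map a subject's total marks to a 10-point grade point. The scale below is
    # an illustrative assumption, not the project's actual predefined ranges.
    # `credits` is accepted to match the driver script's call signature; this
    # simplified scale does not use it.
    percent = (total_marks / max_marks) * 100
    if percent >= 90:
        return 10
    elif percent >= 80:
        return 9
    elif percent >= 70:
        return 8
    elif percent >= 60:
        return 7
    elif percent >= 50:
        return 6
    elif percent >= 40:
        return 5
    return 0  # fail: the driver script forces C*GP to 0 for 'F' results anyway

# Worked example: subjects with (credits, grade point) of (4, 9), (3, 8), (3, 10)
# give SGPA = (4*9 + 3*8 + 3*10) / (4 + 3 + 3) = 90/10 = 9.0, i.e. 90.00%.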

4.8 Outcomes, Code Snippet and Impact of the Project


1. Accurate Data Extraction:
• The project successfully extracted relevant academic data from PDF files with
high accuracy. The use of PyMuPDF and carefully crafted regular expressions
ensured that data fields such as student names, subject codes, marks, and results
were reliably extracted.
• Accurate data extraction laid the foundation for meaningful analysis and
reporting. It eliminated manual data entry errors and streamlined the data
processing workflow.
2. Efficient Data Processing:
• The project implemented efficient algorithms for calculating grade points, SGPA,
and determining student results. This included handling special cases like failed
subjects and calculating weighted averages.
• Automated processing reduced the time and effort required to compute academic
metrics, providing quick and accurate results for each student. This enabled timely
reporting and analysis.
3. Comprehensive Data Storage:
• A well-designed database schema was created, including tables for extracted data,
subject details, SGPA records, and top performers. This ensured that all relevant
data was stored systematically and could be easily queried.
• The structured storage of data facilitated efficient data retrieval and analysis. It
supported various reporting needs and allowed for future scalability of the system.


4. Detailed Reporting and Analysis:


• Outcome: The project generated detailed reports and Excel files with formatted
data, including student performance metrics, top performers, and overall results.
• Impact: The reports provided valuable insights into student performance and
facilitated decision-making. The formatted Excel files were ready for presentation
and further analysis, meeting the project’s reporting requirements.

5. Validation and Error Handling:


• Outcome: Robust validation checks and error handling mechanisms were
incorporated to ensure data integrity and handle any discrepancies encountered
during processing.
• Impact: These measures enhanced the reliability of the system, minimized the risk
of data corruption, and ensured that the final outputs were accurate and
trustworthy.
6. Enhanced User Experience:
• Outcome: The project delivered an easy-to-use system for extracting and
processing academic data, with user-friendly outputs and clear formatting.
• Impact: Improved user experience facilitated efficient data handling and reporting,
reducing the learning curve for users and increasing the overall usability of the
system.

Figure 4.8.1 Folders containing Excel sheet to be read


Code Snippet

# Imports required by this snippet; the helper functions called below
# (create_tables, extract_text_from_pdf, extract_specific_sections,
# extract_table_data, insert_extracted_data, fetch_credits, calculate_grade_point,
# insert_sgpa_data, swap_columns, extract_top_3_toppers, insert_topper_data)
# are defined elsewhere in the project.
import os
from collections import defaultdict

import pandas as pd
from openpyxl.styles import Border, Font, Side

if __name__ == "__main__":
    folder_path = 'C:\\Users\\Fiza Naaz\\Desktop\\Result\\5semAIML'
    semester_data = defaultdict(list)
    subject_details = {}
    try:
        create_tables()
        num_items_in_folder = len([file_name for file_name in os.listdir(folder_path)
                                   if file_name.endswith('.pdf')])
        print(f"Number of items in folder: {num_items_in_folder}")
        # Extract text and table data from every PDF, grouped by semester
        for file_name in os.listdir(folder_path):
            if file_name.endswith('.pdf'):
                pdf_file_path = os.path.join(folder_path, file_name)
                extracted_text = extract_text_from_pdf(pdf_file_path)
                extracted_sections = extract_specific_sections(extracted_text)
                table_data = extract_table_data(extracted_text, extracted_sections)
                semester = extracted_sections.get("Semester", "Unknown Semester")
                semester_data[semester].extend(table_data)
        # Store the raw records and compute the grade point for each subject row
        for semester, data in semester_data.items():
            insert_extracted_data(data)
            for row in data:
                subject_code = row['Subject Code']
                subject_name = row['Subject Name']
                total_marks = row['Total Marks']
                credits = fetch_credits(subject_code)
                row['Grade Point'] = calculate_grade_point(total_marks, credits)
        df_list = []
        for semester, data in semester_data.items():
            # Group the subject rows of each student by (USN, name)
            usn_name_groups = defaultdict(list)
            for row in data:
                usn_name_groups[(row['Usn'], row['StudentName'])].append(row)
            for (usn, name), group in usn_name_groups.items():
                row_dict = {'Semester': semester, 'University Seat Number': usn,
                            'Student Name': name}
                total_credits = 0
                total_grade_points = 0
                total_marks = 0
                # Check if any row has 'Result' as 'F'
                has_fail = any(row.get('Result', '') == 'F' for row in group)
                for row in group:
                    subject_code = row['Subject Code']
                    credits = fetch_credits(subject_code)
                    grade_points = row['Grade Point']
                    # Add credit column next to Total Marks
                    row['Credit'] = credits
                    # Calculate C*GP and set it to 0 if the result is 'F'
                    if row.get('Result', '') == 'F':
                        row['C*GP'] = 0
                    else:
                        row['C*GP'] = credits * grade_points
                    total_credits += credits
                    total_grade_points += row['C*GP']  # row['C*GP'] is 0 for F grades
                    total_marks += row['Total Marks']
                sgpa = total_grade_points / total_credits if total_credits != 0 else 0
                rounded_sgpa = round(sgpa, 2)
                insert_sgpa_data(usn, semester, name, rounded_sgpa)
                for i, row in enumerate(group):
                    for key, value in row.items():
                        if key not in ('Semester', 'Usn', 'StudentName'):
                            row_dict[f'{key} {i+1}'] = value
                row_dict['Total C*GP'] = total_credits * rounded_sgpa
                row_dict['Total Credits'] = total_credits
                row_dict['SGPA'] = rounded_sgpa
                row_dict['Total Marks'] = total_marks
                # Calculate percentage and determine the result
                percentage = rounded_sgpa * 10
                if has_fail:  # Any "F" in an individual "Result" column means "Fail"
                    result = 'Fail'
                elif percentage >= 70:
                    result = 'First Class with Distinction (FCD)'
                elif percentage >= 60:
                    result = 'First Class (FC)'
                elif percentage >= 35:
                    result = 'Second Class (SC)'
                else:
                    result = 'Fail'
                row_dict['%'] = f'{percentage:.2f}%'
                row_dict['Result'] = result
                df_list.append(row_dict)
        df = pd.DataFrame(df_list)
        # Drop all columns named 'AnnouncedDate'
        columns_to_drop = [col for col in df.columns if 'AnnouncedDate' in col]
        df.drop(columns=columns_to_drop, inplace=True)
        # Rename columns ending with numbers
        rename_dict = {}
        for column in df.columns:
            # Check if the column name ends with a number
            if column[-1].isdigit():
                # Extract the part before the number
                original_column_name = column.rsplit(' ', 1)[0]
                # Add the renamed column name to the dictionary
                rename_dict[column] = original_column_name
        # Rename columns in the DataFrame
        df.rename(columns=rename_dict, inplace=True)
        # Call the function to swap the desired columns
        swap_columns(df, 'Result', 'Grade Point')
        swap_columns(df, 'Result', 'Credit')
        swap_columns(df, 'Result', 'C*GP')
        # Write the DataFrame to Excel
        excel_file_path = 'C:\\Users\\Fiza Naaz\\Desktop\\Result\\Excel\\5semAIML.xlsx'
        with pd.ExcelWriter(excel_file_path, engine='openpyxl') as writer:
            df.to_excel(writer, sheet_name='Main', index=False)
            # Bold the header row and add borders
            worksheet = writer.sheets['Main']
            header_font = Font(bold=True)
            for cell in worksheet[1]:
                cell.font = header_font
            border = Border(left=Side(style='thin'),
                            right=Side(style='thin'),
                            top=Side(style='thin'),
                            bottom=Side(style='thin'))
            for row in worksheet.iter_rows(min_row=2, max_row=worksheet.max_row,
                                           min_col=1, max_col=worksheet.max_column):
                for cell in row:
                    cell.border = border
        num_rows_excel_sheet = len(df)
        print(f"Number of rows in Excel sheet: {num_rows_excel_sheet}")
        if num_items_in_folder == num_rows_excel_sheet:
            df_sorted = df.sort_values(by='SGPA', ascending=False)
            top_3_toppers_data = extract_top_3_toppers(df_sorted)
            print(top_3_toppers_data)
            insert_topper_data(top_3_toppers_data)
            # Re-open the workbook in append mode to (re)write the toppers sheet
            with pd.ExcelWriter(excel_file_path, engine='openpyxl', mode='a') as writer:
                if 'Top 3 Toppers' in writer.book.sheetnames:
                    idx = writer.book.sheetnames.index('Top 3 Toppers')
                    writer.book.remove(writer.book.worksheets[idx])
                top_3_toppers_data.to_excel(writer, sheet_name='Top 3 Toppers', index=False)
            print("Excel file created successfully.")
            print("Data extraction and storage complete.")
        else:
            print("Number of items in the folder does not match the number of rows "
                  "in the Excel sheet.")
    except Exception as e:
        print(f"An error occurred: {e}")


Figure 4.8.2 Displaying the subject code credits present in the PDF, validation of the number of
students whose PDFs were read, and the top 3 toppers

Figure 4.8.3 Extracted data from the PDFs stored in the database


Figure 4.8.4 Database tables containing the subject name, subject code, credits, year and semester

Figure 4.8.5 Database containing the calculated SGPA of the students


Figure 4.8.6 Database containing the topper list

Impact on the Project


1. Improved Efficiency:
• Impact: The automated extraction and processing of data significantly improved
the efficiency of handling academic records. Manual data entry and calculations
were replaced with streamlined processes, saving time and reducing errors.
2. Informed Decision-Making:
• Impact: The availability of accurate and comprehensive data enabled better
decision-making by providing insights into student performance, identifying top
performers, and highlighting areas for improvement.
3. Scalability and Flexibility:
• Impact: The system’s design allowed for scalability, making it adaptable to handle
larger volumes of data or additional features in the future. The modular approach
ensured that updates and enhancements could be integrated seamlessly.
4. Enhanced Reporting Capabilities:


• Impact: The project’s reporting capabilities facilitated clear communication of


results and performance metrics to stakeholders. The well-formatted Excel reports
and detailed analysis supported effective reporting and review processes.
5. Error Reduction and Data Integrity:
• Impact: By implementing rigorous validation and error handling, the project
ensured the accuracy and integrity of the data. This reduced the likelihood of
errors and provided confidence in the reliability of the system’s outputs.
6. Increased Automation:
• Impact: The automation of data extraction, processing, and reporting reduced
manual intervention and administrative overhead. This allowed for more efficient
use of resources and focused attention on strategic tasks.


Chapter-5
Project Report: Result Analysis

5.1 Introduction
The project focuses on developing a comprehensive system for extracting, processing, and
analyzing academic data from PDF files. The system leverages advanced techniques in data
extraction, cleaning, and database management to transform unstructured PDF documents into
structured and actionable insights. The main objectives are to automate the extraction of academic
records, compute relevant metrics such as SGPA, and generate detailed reports for academic
analysis. By integrating data processing with database management and reporting, the project aims
to enhance the efficiency and accuracy of managing academic performance data.

5.2 Objectives of the Project


1. Automate Data Extraction:
Develop a robust solution for extracting academic data from PDF files, including student
information, subject details, marks, and results.
2. Ensure Data Accuracy and Consistency:
Implement data cleaning and validation techniques to ensure that the extracted data is
accurate, consistent, and free from errors.
3. Design a Structured Database:
Create a well-organized database schema to store extracted data, including tables for
student records, subject details, SGPA calculations, and top performers.
4. Calculate Academic Metrics:
Develop algorithms to calculate important academic metrics such as grade points, SGPA,
and overall performance, handling cases of failed subjects appropriately.
5. Generate Detailed Reports:
Produce formatted reports and Excel sheets that provide clear and comprehensive insights
into student performance, including top performers and overall results.


6. Enhance Usability and Reporting:


Ensure that the system is user-friendly and that the reports are well-formatted for
presentation and further analysis, meeting the needs of stakeholders.
7. Implement Error Handling and Validation:
Incorporate mechanisms for validating data and handling errors to maintain data integrity
and reliability throughout the process.

5.3 Scope
1. Data Extraction:
The project covers the extraction of academic data from PDF files provided by the
institution. It includes text extraction from multiple PDF formats and cleaning of extracted
data to ensure consistency and accuracy.
2. Database Design and Management:
The project involves designing a normalized database schema to store various types of
data, including student records, subject details, SGPA calculations, and top performers. It
includes the implementation of database tables, indexes, and relationships.
3. Data Processing and Calculation:
The project encompasses the development of algorithms to compute grade points, SGPA,
and other relevant metrics based on the extracted data. It includes handling of special
cases such as failed subjects and integration of these calculations into the database.
4. Reporting and Export:
The project includes generating detailed reports in Excel format, with properly formatted
data and clear presentation. It covers the creation of main reports as well as additional
sheets for top performers.
5. Error Handling and Validation:
The project includes implementing validation checks and error handling mechanisms to
ensure data integrity and address any discrepancies during extraction and processing.


Chapter-6
Internship Report: Full Stack Web Development

6.1 Introduction
The Department of Computer Science & Engineering and Information Science & Engineering
organized a three-day workshop on "FULL STACK WEB DEVELOPMENT", in association with
the CSI Mysore Chapter, during a one-month internship for 5th semester Computer Science &
Engineering and Information Science & Engineering students, held on 14th, 15th, and 17th
November 2023. Mr. Vinay Kumar Venkataramana, a creative enthusiast building PraCodAI and
the NETTED community and an AI and research entrepreneur working in the AI domain, was the
resource person for the workshop.

6.2 Project Experience: Social Application Development


During my internship, I had the enriching opportunity to contribute to the development of a social
application aptly named "Dark Web". This project served as a practical and comprehensive
exploration of various facets of web development, ranging from the client-side vanilla API to the
intricacies of controllers, routes, and the utilization of MySQL for database management.
The overarching goal of the "Dark Web" application was to create a dynamic and engaging
platform that facilitated user interaction and content sharing. The duration of my internship was
dedicated to mastering the essential components that constitute a robust web application.

6.3 Key Learning Objectives:


➢ Client-Side Vanilla API:
• Understanding and implementing client-side vanilla API involved delving into
the intricacies of asynchronous communication, enabling the application to
seamlessly interact with external servers.
• Example Code:


Figure 6.3.1 Fetching user data from external API

➢ Designing Controllers:
• Crafting controllers involved meticulous planning to ensure the smooth flow of
data and user interactions. Each controller played a crucial role in orchestrating
the application's behavior.
• Example Code:

Figure 6.3.2 Express.js controller


➢ Defining Routes:
• Carefully structuring routes to create an intuitive and navigable user experience
was another key focus. This involved mapping out the various paths users could
take within the application.
• Example Code:

Figure 6.3.3 Express.js routes

➢ Database Management with MySQL:


• Working with MySQL included tasks such as data modeling, query optimization,
and ensuring data consistency. The MySQL database served as the backbone of
"Dark Website", storing user profiles and content and facilitating seamless
retrieval of information.
• Example Code:

Figure 6.3.4 MySQL table
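Since the figure itself is not reproduced in this text, the sketch below shows the kind of MySQL
table definitions it depicts for the "Dark Website" application, executed through PyMySQL for
consistency with the rest of this report; the column names and schema are illustrative
assumptions, not the application's actual tables.

import pymysql

# Assumed connection details for a local development database.
conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="dark_website")
try:
    with conn.cursor() as cur:
        # User profiles, which the application logic reads and writes
        cur.execute("""CREATE TABLE IF NOT EXISTS users (
                           id INT AUTO_INCREMENT PRIMARY KEY,
                           username VARCHAR(50) UNIQUE NOT NULL,
                           password_hash VARCHAR(255) NOT NULL)""")
        # Shared content, linked to its author by a foreign key
        cur.execute("""CREATE TABLE IF NOT EXISTS posts (
                           id INT AUTO_INCREMENT PRIMARY KEY,
                           user_id INT NOT NULL,
                           content TEXT,
                           created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                           FOREIGN KEY (user_id) REFERENCES users(id))""")
    conn.commit()
finally:
    conn.close()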


6.4 Accomplishments:
• Integration of Client-Side Components: Ensured a responsive and intuitive user
interface, enhancing the overall user experience.
• Implementation of Efficient Controllers and Well-Defined Routes: Contributed to the
creation of a structured and navigable application.
• Maintenance of a Robust and Secure MySQL Database: Underscored the importance
of data integrity and scalability.

6.5 Potential Enhancements:


Looking ahead, I envision potential enhancements for the "Dark Website" application, including
the incorporation of additional features to enrich user interaction and further refinement of
database optimization for scalability.


Chapter-7
Learning and Skills Acquired

7.1 Technical Skills


• Client-Side Vanilla API: Understanding and implementing asynchronous communication
to enable seamless interaction with external servers.
• Designing Controllers: Crafting controllers to ensure smooth data flow and user
interactions, playing a crucial role in orchestrating application behavior.
• Defining Routes: Structuring routes to create an intuitive and navigable user experience.
• Database Management with MySQL: Tasks such as data modeling, query optimization,
and ensuring data consistency. MySQL database serves as the backbone for storing user
profiles, content, and facilitating seamless information retrieval.
• Full Stack Development: Proficiency in front-end and back-end technologies such as PHP,
HTML, CSS, JavaScript, and SQL.
• Cloud Deployment: Skills in managing and scaling applications through Google Cloud.
• Agile Methodologies: Practical experience with Agile methodologies, user-centered
design, and iterative development.

7.2 Soft Skills


• Teamwork: Improved through regular collaboration with mentors and users.
• Communication: Enhanced by engaging with team members and presenting findings.
• Presentation: Refined through the demonstration of project work and results.

7.3 Industry-specific Knowledge


• Emerging Technologies: Exposure to new technologies and methodologies in the tech
industry.
• Project Management: Skills in managing projects, problem-solving, and iterative
development processes.


7.4 Insights into the Innovation Process


• API Integration: Ensuring efficient and responsive data flow between client and server.
• User Interface Design: Developing a responsive and intuitive user interface.
• Data Integrity and Validation: Implementing validation checks and error handling to ensure
data reliability.
• Performance Optimization: Handling large data volumes efficiently through optimized
coding practices.
• Automation: Reducing manual intervention in data extraction, processing, and reporting,
which increases efficiency and accuracy.
• Scalability: Designing systems that can handle increased traffic and data volumes while
maintaining performance.


Chapter-8

Summary of My Internship

During my internship, my project allowed me to develop a comprehensive set of technical skills
and gain practical experience in full stack development. I worked extensively with the client-side
Vanilla API to enable asynchronous communication with external servers, designed controllers
for smooth data flow, and defined routes to enhance user navigation for the social media webpage.
MySQL database management was used to store the data, involving data modeling and query
optimization to ensure data consistency. Additionally, I gained proficiency in front-end and back-
end technologies, including PHP, HTML, CSS, and SQL, which were crucial for developing a
responsive and efficient user interface. Cloud deployment and Agile methodologies were integral
to managing and scaling the application effectively. This project not only honed my technical
skills but also enhanced my soft skills in teamwork, communication, and presentation, providing
me with a well-rounded experience and deeper insights into the innovation process, including
API integration, data integrity, performance optimization, and scalability.


References

[1] Prabhu T Kannan, Srividya K Bansal, "Unimate: A Student Information System", 2013
International Conference on Advances in Computing, Communications and Informatics
(ICACCI), pp. 1251-1256.

[2] Dipin Budhrani, Vivek Mulchandani, "Student Information Management System", IJEDR, 2018.

[3] Rajnish Tripathii, Raghvendra Singh, Jaweria Usmani, "Campus Recruitment and Placement
System", International Conference on Recent Innovations in Science and Engineering
(ICRISE-18), April 2018.

[4] "PHP-MySQL Database Introduction", GeeksForGeeks,
https://www.geeksforgeeks.org/php-mysql-database-introduction/

[5] "PHP MySQL Database", W3Schools,
https://www.w3schools.com/php/php_mysql_intro.asp

[6] "PHP - Uploading File", GeeksForGeeks,
https://www.geeksforgeeks.org/php-uploading-file/amp/

[7] OpenAI (2023), ChatGPT (Mar 14 version) [Large language model],
https://chat.openai.com/chat


Copyright Certificate
