B.SC Cs Batchno 20
INTELLIGENCE AND PYTHON WEB APPLICATION
By
SCHOOL OF COMPUTING
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
JEPPIAAR NAGAR, RAJIV GANDHI SALAI,
MARCH 2021
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
(Established under Section 3 of UGC Act, 1956)
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI– 600119
www.sathyabama.ac.in
DATE:
This system can work through a large number of resumes: it first identifies
the right categories using different keywords and, once matching has been done,
ranks the top candidates against the job description using resume filters.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 OVERVIEW OF PROJECT
1.3.2 COMPONENTS
1.3.3 ROUTING
ARCHITECTURE
2 LITERATURE SURVEY
4.2.2 MODELS
4.2.3 VIEWS.PY
4.2.4 URLS.PY
4.3.2 COMPONENTS
4.3.3 SERVICES
4.3.4 ROUTING
4.4.2 GLOB
4.6 MYSQL
4.7 ENVIRONMENT
4.7.1 SUBLIME TEXT
RUNNING PROCESS
RUNNING PROCESS
6.1 SUMMARY
6.2 CONCLUSIONS
REFERENCE
APPENDIX
SOURCE CODE
LIST OF ABBREVIATIONS
1. HR - Human Resources
2. CSS - Cascading Style Sheets
3. SCSS - Sassy CSS (the CSS-compatible syntax of Sass, Syntactically Awesome Style Sheets)
4. API - Application Programming Interface
5. HTML - Hypertext Markup Language
6. DOM - Document Object Model
7. URL - Uniform Resource Locator
8. PDF - Portable Document Format
9. MVT - Model-View-Template
10. AI - Artificial Intelligence
CHAPTER 1
INTRODUCTION
Fig1.2: Django Logo
VIEW: View is the user interface — what you see in your browser
when you visit a website. These are represented by HTML/CSS files.
Fig1.3: Angular Logo
1.3.1 MODULES
1.3.2 COMPONENTS
it as a component, and provides the template and related component-specific
metadata.
1.3.3 ROUTING
The Angular Router NgModule provides a service that lets you define a
navigation path among the different application states and view hierarchies in
your app. It is modeled on the familiar browser navigation conventions:
• Enter a URL in the address bar and the browser navigates to a
corresponding page.
• Click links on the page and the browser navigates to a new page.
• Click the browser's back and forward buttons and the browser
navigates backward and forward through the history of pages you've
seen.
The router maps URL-like paths to views rather than pages. When a user
performs an action, such as clicking a link, that would load a new page in
the browser, the router intercepts the browser's behavior and shows
or hides view hierarchies.
The router interprets a link URL according to your app's view navigation
rules and data state. You can navigate to new views when the user clicks a
button or selects from a drop-down box, or in response to some other stimulus
from any source. The router logs activity in the browser's history, so the
back and forward buttons work as well.
data, in much the same way that template syntax integrates your views
with your program data. You can then apply program logic to choose
which views to show or to hide, in response to user input and your
own access rules.
The metadata for a component class associates it with a template that defines
a view. A template combines ordinary HTML with Angular directives and binding
markup, which allow Angular to modify the HTML before rendering it on the screen.
The metadata for a service class consists of the information Angular needs
to make it available to components through dependency injection (DI).
For a better understanding, refer to Fig1.4, the Angular architecture
diagram, given below:
Fig1.4: Angular Architecture Diagram
CHAPTER 2
LITERATURE SURVEY
The matching between the objects extracted from a PDF document and the
result of layout analysis is also presented. R. Mohemad et al. [12], on the
other hand, described an approach that automatically identifies and recognizes
the layout and structure of a PDF document together with its text, paragraphs
and tabular data. The JPedal extraction tool identifies all PDF objects and
returns the coordinates of rectangles on each page of the PDF document. To
identify tables, a hierarchical agglomerative clustering algorithm is
used to group the closest tokens by checking the coordinate positions
of objects.
Fang et al. [4] proposed a new method for extracting information from
PDF files. They use a modified version of PDFBox to extract and parse the text
contained in a PDF file and inject tags into the text to transform
it into semi-structured data that aids searching. The focus was on identifying the
title, author, address, abstract and keywords of each paper by taking
advantage of the text property information from PDFBox. T. Hassan [8]
developed a method to obtain text and images from a PDF file by
analyzing the layout structure. The PDFBox library is used to extract objects
from the PDF file; it returns data in rectangular boxes with their coordinate
values. A bottom-up segmentation algorithm then groups
segments into blocks representing logical elements on the page.
Burcu Yildiz et al. [3] described an approach for extracting tables from
PDF files. They use the pdftohtml tool to extract text from the PDF file and
follow a document-based approach to extract the tables. The tool returns text
chunks with property information such as top, left, width, height and font.
The algorithm comes down to merging text, positioning the table, and inserting
the cells and content of the PDF file. A prototype was then built to evaluate the
performance of the algorithm. This approach does not handle any
language-specific features. X. Y. Lin et al. [14] proposed a method that
combines rule-based and learning-based methods to identify mathematical
expressions in PDF documents. The work is divided into steps: in the
first, preprocessing step, the different types of objects are matched with
mathematical expression elements; in the second step, text lines are extracted;
in the third step, feature analysis is done to extract character and
layout features, and the rule-based and support vector methods are then
applied to identify formula areas.
CHAPTER 3
3.1 AIM
The aim is to build AI-based software on the MVT pattern, which supports
stable performance. Python is the base language, with the Django
framework handling the back-end process, and the user interface is built with
the Angular framework, which also remains stable as the process scales.
Once enough data has been collected from the current feature, AI could be
used to make the search option more efficient.
CHAPTER 4
Fig4.2: Resume Filter Architecture
4.2.1 SETTINGS.PY
Example code
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # Django REST framework
    'rest_framework',
    # Resume filter application
    'filterResume.apps.FilterresumeConfig',
    # CORS
    'corsheaders',
]
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    # CORS: placed before CommonMiddleware so that CORS headers are added
    # to every response, and CommonMiddleware is listed only once
    'corsheaders.middleware.CorsMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
CORS_ORIGIN_ALLOW_ALL = False
CORS_ORIGIN_WHITELIST = (
    'http://localhost:4200',
)
ROOT_URLCONF = 'miniproject.urls'
Fig4.4: Middleware
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
Fig4.5: Templates
WSGI_APPLICATION = 'miniproject.wsgi.application'
# Database
# https://docs.djangoproject.com/en/3.1/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'Resume',
        'USER': 'root',
        'PASSWORD': '1234567',
        'HOST': '127.0.0.1',
        'PORT': '3306',
    }
}
# Password validation
# https://docs.djangoproject.com/en/3.1/ref/settings/#auth-password-validators
Fig4.6: Database
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]
# Internationalization
# https://docs.djangoproject.com/en/3.1/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.1/howto/static-files/
STATIC_URL = '/static/'
4.2.2 MODELS
Example code
from django.db import models


# Account details of a registered admin user.
class RegisterDetails(models.Model):
    userName = models.CharField(max_length=70, blank=False, default='')
    # The default was the integer 1; a CharField default should be a string.
    emailId = models.CharField(max_length=50, blank=False, default='')
    password = models.CharField(max_length=20, blank=False, default='')


# Contact details and resume reference of a candidate.
# Note: the filter code in the appendix also updates 'match' and
# 'percentage', which this listing does not declare.
class CandidateDetails(models.Model):
    candidateName = models.CharField(max_length=70, blank=False, default='')
    mobileNumber = models.CharField(max_length=70, blank=False, default='')
    email = models.CharField(max_length=20, blank=False, default='')
    resume = models.CharField(max_length=2000, blank=False, default='')
Fig4.7: Models
4.2.3 VIEWS.PY
Example code
from django.shortcuts import render
from django.http import HttpResponse
from django.http.response import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from rest_framework.parsers import JSONParser
from rest_framework import status
from filterResume.models import RegisterDetails, CandidateDetails
from filterResume.serializers import AdminSerializer, CandidateSerializer


@csrf_exempt
def register(request):
    # if request.method == 'GET':
    #     filterResume = RegisterDetails.objects.all()
    #     filterResume_serializer = FilterResumeSerializer(filterResume, many=True)
    #     # In order to serialize objects, we must set 'safe=False'
    #     return JsonResponse(filterResume_serializer.data, safe=False)
    if request.method == 'POST':
        # Parse the JSON body and validate it with the serializer.
        admin_data = JSONParser().parse(request)
        admin_serializer = AdminSerializer(data=admin_data)
        if admin_serializer.is_valid():
            admin_serializer.save()
            return JsonResponse(admin_serializer.data,
                                status=status.HTTP_201_CREATED)
        return JsonResponse(admin_serializer.errors,
                            status=status.HTTP_400_BAD_REQUEST)
    # Reject any other HTTP method instead of returning None.
    return JsonResponse({'detail': 'Method not allowed'},
                        status=status.HTTP_405_METHOD_NOT_ALLOWED)
4.2.4 URLS.PY
When a user requests a page of the web app, Django's URL dispatcher
looks up the corresponding view through the urls.py file and
returns the view's response, or a 404 Not Found error if no pattern matches. In
urls.py, the most important thing is the "urlpatterns" list.
Example code
from django.conf.urls import url
from filterResume import views

urlpatterns = [
    url(r'^register/$', views.register),
    url(r'^candidateregister/$', views.candidateregister),
    url(r'^filter/$', views.filter),
]
Fig4.8: URLS.PY
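The lookup that the urlpatterns list drives can be pictured with a few lines of plain Python. The following is only a minimal sketch using the standard re module, not Django's actual resolver; URL_PATTERNS and resolve are hypothetical names introduced for illustration, and the handlers are plain strings standing in for the view functions.

```python
import re

# Pattern/handler pairs mirroring the three routes listed above.
# The handlers are plain strings here; in Django they are view functions.
URL_PATTERNS = [
    (re.compile(r'^register/$'), 'register'),
    (re.compile(r'^candidateregister/$'), 'candidateregister'),
    (re.compile(r'^filter/$'), 'filter'),
]


def resolve(path):
    """Return the first handler whose pattern matches, or None (a 404)."""
    # Patterns are matched against the path without its leading slash.
    path = path.lstrip('/')
    for pattern, handler in URL_PATTERNS:
        if pattern.search(path):
            return handler
    return None


print(resolve('/register/'))
print(resolve('/filter/python,django/'))
```

Under these assumptions, a path with an extra segment after filter/ finds no match, so a route that carries the keywords in the URL would need a pattern with a capture group for that segment.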
4.3.1 MODEL
Example code
mobileNumber:string;
email:string;
resume:string;
match:string;
percentage:string;
4.3.2 COMPONENTS
Component = One .html file + One .css or .scss file + One .ts file
1. candidate-list.component.css
2. candidate-list.component.html
3. candidate-list.component.spec.ts
4. candidate-list.component.ts
4.3.3 SERVICES
Angular services are singleton objects that are instantiated only
once during the lifetime of an application.
They contain methods that maintain data throughout the lifetime of an
application, i.e. data does not get refreshed and is available all the time. The
main objective of a service is to organize and share business logic, models, or
data and functions with different components of an Angular application.
Example code
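import { Injectable } from '@angular/core';
import { HttpClient } from '@angular/common/http';
import { Observable } from 'rxjs';

@Injectable()
export class UserService {
  private baseUrl1 = "http://127.0.0.1:8000";

  constructor(private http: HttpClient) { }

  register(user: User) {
    console.log(user);
    return this.http.post(this.baseUrl1 + "/register/", user);
  }

  passKeywords(Keywords: string): Observable<any> {
    console.log("from service", Keywords);
    // Keep the expression on the same line as "return"; a line break after
    // "return" would make the method return undefined.
    return this.http.get<CandidateDataList[]>(`${this.baseUrl1}/filter/${Keywords}/`);
  }
  // Remaining methods, such as getCandidateList, are not shown in this listing.
}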
4.3.4 ROUTING
In a single-page app, you change what the user sees by showing or hiding
portions of the display that correspond to particular components, instead of
going out to the server to get a new page. As users perform
application tasks, they need to move between the different views that
you have defined. To handle the navigation from one view to the next, you
use the Angular Router. The Router enables navigation by interpreting a
browser URL as an instruction to change the view.
Example code
export const AdminLayoutRoutes: Routes = [
{ path:'admin-signup', component: AdminSignupComponent },
{ path:'admin-signin', component: AdminSigninComponent },
{ path:'candidate-list', component: CandidateListComponent },
];
Example code
export class CandidateListComponent implements OnInit {
  constructor(private userData:UserService, private userService: UserService,
              private router: Router,) { }

  keywordForm=new FormGroup({
    filterKeyword : new FormControl()
  });
  loading = false;
  submitted = false;
  candidateList: any[];
  keyValue:any;
  Key_value:any;
  filterCandidate: any[]
  dataSource:any;
  filterDataSource:any;
  displayedColumns = ["candidateName","mobileNumber", "email",
    "resume","percentage"];

  ngOnInit() {
    this.userData.getCandidateList().subscribe(data => {
      this.candidateList = data;
      console.log("candidateList");
      console.log(this.candidateList);
      this.dataSource = new MatTableDataSource<CandidateDataList>(this.candidateList);
    });
  }

  onSubmit() {
    this.keyValue =this.keywordForm.controls[
    console.log("from KeyValue",this.keyValue);
    // stop here if form is invalid
    if (this.keywordForm.invalid) {
      console.log("Error");
      return;
    }
    this.loading = true;
    this.Key_value=this.keyValue.split(
    this.userService.passKeywords(this.Key_value)
      .subscribe(
        data => {
          this.submitted = true;
          this.filterCandidate = data;
          console.log("candidateListAfterGet"
          console.log(this.filterCandidate);
          this.filterDataSource = new
          this.loading = false;
          this.router.navigate(['/candidate
        },
        error => {
          console.log(error);
          this.loading = false;
        });
  }
}
In this resume filter process, the resumes are filtered mainly with the
PyPDF2 and glob libraries.
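The keyword matching at the heart of this filter can be sketched without any PDF handling. The example below is a simplified illustration and not the project's exact code, which appears in the appendix. The function names match_percentage and rank_candidates and the sample resume texts are hypothetical, and, as an assumption, matching here is case-insensitive and each keyword is counted at most once per resume.

```python
def match_percentage(text, keywords):
    """Percentage of the comma-separated keywords that occur in the text."""
    wanted = [k.strip().lower() for k in keywords.split(',') if k.strip()]
    if not wanted:
        return 0.0
    haystack = text.lower()
    hits = sum(1 for k in wanted if k in haystack)
    return hits / len(wanted) * 100


def rank_candidates(resume_texts, keywords):
    """Return (name, percentage) pairs sorted from best to worst match."""
    scored = [(name, match_percentage(text, keywords))
              for name, text in resume_texts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical resume texts, standing in for text extracted from PDFs.
resumes = {
    'a.pdf': 'Python developer with Django and MySQL experience',
    'b.pdf': 'Angular front-end developer',
}
print(rank_candidates(resumes, 'python,django,angular,mysql'))
```

With the sample data above, the first resume contains three of the four keywords and the second contains one, so the first is ranked higher.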
4.4.1 PYPDF2
The PyPDF2 package is a pure-Python PDF library that you can use for
splitting, merging, cropping, and transforming pages in your PDFs. According to
the PyPDF2 website, you can also use PyPDF2 to add data, viewing options, and
passwords to PDFs. Finally, you can use PyPDF2 to extract text and
metadata from your PDFs.
At the time of writing, the PyPDF2 package has not had a release since 2016.
However, it is still a solid and useful package that is worth your time to
learn.
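Text extraction with the PyPDF2 1.x interface (PdfFileReader, numPages, getPage and extractText, the same calls used in the appendix) might be wrapped as shown here. This is a hedged sketch: extract_all_text is a hypothetical helper, it accepts any reader object that offers those members, and the file name in the usage comment is a placeholder.

```python
def extract_all_text(reader):
    """Join the extracted text of every page, in page order.

    `reader` is expected to behave like a PyPDF2 1.x PdfFileReader:
    it exposes numPages and getPage(index), and each page object
    exposes extractText().
    """
    pages = (reader.getPage(index) for index in range(reader.numPages))
    return "\n".join(page.extractText() for page in pages)


# Usage with PyPDF2 1.x (placeholder file name):
#     import PyPDF2
#     with open('resume.pdf', 'rb') as handle:
#         text = extract_all_text(PyPDF2.PdfFileReader(handle))
```

Passing the reader in, rather than a path, keeps the helper independent of how the file is opened.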
4.4.2 GLOB
The glob module finds all the path names matching a specified pattern
according to the rules used by the Unix shell, although results are returned in
arbitrary order. No tilde expansion is done, but *, ?, and character ranges
expressed with [ ] will be correctly matched. This is done by using the
os.scandir( ) and fnmatch.fnmatch( ) functions in concert, and not by actually
invoking a subshell. Note that, unlike fnmatch.fnmatch(), glob treats filenames
beginning with a dot (.) as special cases.
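The behaviour described above can be tried on a throwaway folder. The sketch below uses only the standard library; the file names are made up for the demonstration, and the results are sorted because glob returns matches in arbitrary order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files, including a hidden one and a non-PDF.
    for name in ('alice.pdf', 'bob.pdf', 'notes.txt', '.hidden.pdf'):
        open(os.path.join(folder, name), 'w').close()

    # glob.escape protects any special characters in the folder path itself.
    pattern = os.path.join(glob.escape(folder), '*.pdf')
    pdf_paths = sorted(glob.glob(pattern))
    pdf_names = [os.path.basename(path) for path in pdf_paths]

print(pdf_names)
```

The hidden file is left out because the * wildcard does not match a leading dot, which is the special case mentioned above.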
MODEL: The Model acts as the interface to your data. It is responsible for
maintaining data. It is the logical data structure behind the entire application
and is represented by a database (generally a relational database such as
MySQL or PostgreSQL).
VIEW: The View is the user interface, that is, what you see in your browser when
you render a website. It is represented by HTML/CSS/JavaScript and Jinja
files.
4.6 MYSQL
MySQL has stand-alone clients that allow users to interact directly with a
MySQL database using SQL, but more often MySQL is used with other
programs to implement applications that need relational database capability.
MySQL is a component of the LAMP web application software stack, which is
an acronym for Linux, Apache, MySQL, Perl/PHP/Python. MySQL is used by
many database-driven web applications, including Drupal, Joomla, phpBB, and
WordPress.
Fig4.10: MySQL Logo
4.7 ENVIRONMENT
Visual Studio Code is used to develop and run the Angular front end, and
Sublime Text is used as the editor for the Django back end.
Visual Studio Code is a free source-code editor made by Microsoft for
Windows, Linux and macOS. Features include support for debugging, syntax
highlighting, intelligent code completion, snippets, code refactoring, and
embedded Git. Users can change the theme, keyboard shortcuts, preferences,
and install extensions that add additional functionality.
CHAPTER 5
RESULTS AND PERFORMANCE ANALYSIS
Fig5.1: Registration Page
Fig5.2:Login Page
Fig5.3:Candidate List
Step1: Django REST framework works on top of Django and helps us to build
RESTful web services flexibly. To install this package, run the command:
pip install djangorestframework
Step2: Create a Django project named miniproject with the command:
django-admin startproject miniproject
Step3: Connect the database in “settings.py” with your MySQL username and
password.
Step4: Run the following command to create a new Django app named
filterResume:
python manage.py startapp filterResume
Step5: After creating the model, generate the initial migration for the data
model:
python manage.py makemigrations filterResume
Step6: Run the following command to apply the generated migration:
python manage.py migrate filterResume
Step2: Import the extracted project into Visual Studio Code (File →
Add Folder to Workspace → import from the local folder).
Step3: Click “Terminal → New Terminal”.
Step4: Change the baseURL port to match the back-end server port.
Step5: In the terminal, first install the project's dependencies with the Node
Package Manager (npm) using the command “npm install”.
Step6: Then run the project using the command “ng serve”.
Step7: Check the output in the browser at the served URL, e.g.
“http://localhost:4001”.
1. Platform independent.
2. 24/7 accessible services.
3. Easy to use.
4. Customizable.
5. Can handle large data.
6. More productive and speedy.
7. Cost-effective.
8. Flexible.
CHAPTER 6
SUMMARY AND CONCLUSION
6.1 SUMMARY
The Resume Filter Web Application is made flexible and versatile using
the Django and Angular frameworks. With Django on the back end, the application
is designed to process a large number of resumes in a single run. Angular
provides the user interface and its different views.
Data is stored in a MySQL database. The resumes are filtered mainly with the
PyPDF2 and glob libraries. The PyPDF2 package is a pure-Python PDF
library that you can use for splitting, merging, cropping, and transforming
pages in your PDFs. The glob module finds all the path names matching a
specified pattern according to the rules used by the Unix shell, although
results are returned in arbitrary order. No tilde expansion is done, but *, ?, and
character ranges expressed with [] will be correctly matched. The complete
process follows the MVT (Model-View-Template) pattern, which is a
software design pattern made up of three important components:
Model, View and Template. Django acts as the main server, where the filter
and match process is carried out on the given PDF files. This is
achieved by giving the path where all the resumes are stored and with the help
of the PyPDF2 and glob libraries. Angular acts as the user interface, where the
user or admin can register the candidate details and can get the required
resumes by typing the keywords.
6.2 CONCLUSION
REFERENCE
[9] A. Manikandan, S. Choudhury and S. Majumder, "Text reader for visually
impaired people: Any reader", 2017 IEEE International Conference on Power,
Control, Signals and Instrumentation Engineering (ICPCSI), 2017.
[10] S. Sabab and M. Ashmafee, "Blind Reader: An intelligent assistant for blind",
2016 19th International Conference on Computer and Information Technology
(ICCIT), 2016.
[11] S. Sonth and J. Kallimani, "OCR based facilitator for the visually challenged",
2017 International Conference on Electrical, Electronics, Communication, Computer,
and Optimization Techniques (ICEECCOT), 2017.
[12] A. Jain and J. Sharma, "Classification and interpretation of characters in multi-
application OCR system", 2014 International Conference on Data Mining and
Intelligent Computing (ICDMIC), 2014.
[13] P. Thakare, K. Shubham, P. Ankit, R. Ajinkya and S. Om, "Interactive Reader and
Recogniser System", 2018 Second International Conference on Inventive
Communication and Computational Technologies (ICICCT), 2018.
[14] P. Manwatkar and S. Yadav, "Text recognition from images", 2015 International
Conference on Innovations in Information, Embedded and Communication Systems
(ICIIECS), 2015.
[15] S. Karmakar and U. Zhu, "Visualizing Text Readability", 2010 6th
International Conference on Advanced Information Management and Service (IMS),
2010.
[16] J. Sodnik, G. Jakus and S. Tomažič, "Enhanced Synthesized Text Reader for
Visually Impaired Users", 2010 Third International Conference on Advances in
Computer-Human Interactions, 2010.
[17] P. Manwatkar and K. Singh, "A technical review on text recognition from
images", 2015 IEEE 9th International Conference on Intelligent Systems and Control
(ISCO), 2015.
APPENDIX
SOURCE CODE
import glob
import os

import PyPDF2
from django import template

from filterResume.models import CandidateDetails

register = template.Library()


class MainCode:
    def FindMatch(self, text, keyword):
        # Return 1 if the keyword occurs in the page text, otherwise 0.
        if keyword in text:
            return 1
        else:
            return 0

    @register.filter(name='StringToList')
    def StringToList(self, EnteredKey, arg):
        # Split the entered keywords on the separator, e.g. ','.
        return EnteredKey.split(arg)


def filterResume(EnteredKey):
    obj = MainCode()
    mypath = "C:\\Users\\Sheela\\Downloads\\Resume_Filter\\pdfDocs"
    match = 0
    arg = ','
    keywordlist = obj.StringToList(EnteredKey, arg)
    length = len(keywordlist)
    # ASSUMPTION: the loop headers and the line that opens each PDF are missing
    # from the printed listing. They are reconstructed here from the surviving
    # debug comments ("first For", "While", "inner for") and variable names.
    for file in glob.glob(os.path.join(mypath, '*')):
        if file.endswith('.pdf'):
            with open(file, 'rb') as pdfFile:
                fileReader = PyPDF2.PdfFileReader(pdfFile)
                count = fileReader.numPages
                # Walk the pages from the last one to the first.
                while count > 0:
                    count -= 1
                    pageObj = fileReader.getPage(count)
                    text = pageObj.extractText()
                    for i in range(length):
                        keyword = keywordlist[i]
                        status = obj.FindMatch(text, keyword)
                        if status == 1:
                            match = match + 1
            percent = match / length * 100
            if percent > 1:
                print("start", percent)
                CandidateDetails.objects.filter(resume=file).update(
                    match=True, percentage=percent)
                print(file, str(percent))
            # Reset the counter before the next resume.
            match = 0
        else:
            print("not in format")
    return "completed"