
UNSUPERVISED MACHINE LEARNING FOR MANAGING

SAFETY ACCIDENTS IN RAILWAY STATIONS


A major project report submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE ENGINEERING

Submitted by
N. BHANU PRAKASH REDDY : 21BT5A0511

B.SAI KIRAN : 21BT5A0501

G. YASHWANTH : 21BT5A0507

N. SHASHANK : 21BT5A0521

TIMOTHI : 20BT5A0541

Under the Guidance of

Mrs. K. PRANUSHA
Assistant Professor (CSE)

Visvesvaraya College of Engineering & Technology


Affiliated to JNTUH, Hyderabad, certified by NAAC with ‘A’ GRADE
M.P. Patelguda (V), Ibrahimpatnam (M), R.R. District-501510
2023-24
Visvesvaraya College of Engineering & Technology
Affiliated to JNTUH, Hyderabad, certified by NAAC with ‘A’ GRADE
M.P. Patelguda (V), Ibrahimpatnam (M), R.R. District-501510

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE
This is to certify that this project report entitled UNSUPERVISED
MACHINE LEARNING FOR MANAGING SAFETY ACCIDENTS IN
RAILWAY STATIONS, submitted by N. Bhanu Prakash Reddy
(21BT5A0511), B. Sai Kiran (21BT5A0501), G. Yashwanth
(21BT5A0507), N. Shashank (21BT5A0521) and Timothy (20BT5A0541),
in partial fulfillment of the requirements for the degree of Bachelor of
Technology in Computer Science & Engineering to the Jawaharlal
Nehru Technological University, Hyderabad, during the academic
year 2023-24, is a bonafide record of work carried out under our guidance and
supervision.

The results embodied in this report have not been submitted to any other
University or Institution for the award of any degree or diploma.

Mrs. K. Pranusha                                   Mrs. T. Ramyasree
(Assistant Professor)                              (Head of Department)
(Internal Guide)                                   (Dept. of CSE)

(External Examiner)

DECLARATION
We hereby declare that this submission is our own work and that, to the best
of our knowledge and belief, it contains no material previously published or
written by another person nor material which to a substantial extent has been
accepted for the award of any other degree or diploma of the university or
other institute of higher learning, except where due acknowledgment has been
made in the text.

Signatures:

N. BHANU PRAKASH REDDY : 21BT5A0511


B.SAI KIRAN : 21BT5A0501

G. YASHWANTH : 21BT5A0507

N. SHASHANK : 21BT5A0521

TIMOTHI : 20BT5A0541

ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the project
undertaken during B.Tech. We would like to express our special thanks to
our Principal & Professor (CSE), Dr. D. Ramesh, for his moral support, and to the
college management of Visvesvaraya College of Engineering & Technology, Hyderabad,
for providing us the infrastructure to complete the project.

We owe a special debt of gratitude to Dr. S. Selva Kumar, Dean of Academics of
Visvesvaraya College of Engineering & Technology, Hyderabad, for his constant support
and guidance throughout the course of our work.

We thank Mrs. T. Ramyasree, Head of the Department of Computer Science &
Engineering, for her constant support and cooperation.

We owe a special debt of gratitude to our guide Mrs. K. Pranusha, Assistant
Professor (CSE), Visvesvaraya College of Engineering & Technology, Hyderabad, for her
guidance throughout the course of our work. It is only through her cognizant efforts that
our endeavors have seen the light of day.

We would also like to express our gratitude towards our parents/guardians and siblings
for their kind cooperation and encouragement, which helped us in the completion of this
project.

We do not want to miss the opportunity to acknowledge the contribution of all
faculty members of the department for their kind assistance and cooperation during the
development of our project. Last but not least, we acknowledge our friends for their
contribution to the completion of the project.

N. BHANU PRAKASH REDDY : 21BT5A0511


B.SAI KIRAN : 21BT5A0501

G. YASHWANTH : 21BT5A0507
N. SHASHANK : 21BT5A0521

TIMOTHI : 20BT5A0541
CONTENTS

Table of Contents Page No

Abstract I

List Of Figures II

List Of Output Screens III

1. Introduction 1
2. Literature Survey 4-9
3. Software Requirement Analysis 10-13
3.1 System Design and Development 10
3.1.1 Input Design 10
3.1.2 Output Design 10-11
3.2 Modules 11
3.2.1 Service Provider 11
3.2.2 View and Authorize User 11
3.2.3 Remote User 11
3.3 Feasibility Study 12-13
3.3.1 Request Clarification 12
3.3.2 Feasibility Study 12
3.3.2.1 Operational Feasibility 13
3.3.2.2 Economic Feasibility 13
3.3.2.3 Technical Feasibility 13
3.3.3 Request Approval 13
4. Software Design 14-22
4.1 DFD Diagram 14-15
4.2 UML Diagram 16
4.3 Use case Diagram 17-18
4.4 Class Diagram 19
4.5 Sequence Diagram 20

4.6 Control Flow Diagram 21-22


5. Software Requirement Specifications 23
5.1 Software Requirement Specifications 23
5.2 Hardware Requirement Specifications 23
6. Coding 24-36
6.1 Sample Coding 24-28
6.2 Python 29
6.2.1 History of Python 29
6.2.2 Features of Python 30-32
6.3 Django 32-36
6.3.1 History of Django 32
6.3.2 Q-Learning 33
6.3.3 Pandas 33-34
6.3.4 SK-Learn 34
6.3.5 Implementation of SK-Learn 34-35
6.3.6 Count Vectorizer 35
6.3.7 Voting Classifier 35-36
7. System Testing 37-45
7.1 Unit Testing 37
7.2 Integration Testing 38
7.2.1 Types of Integration Testing 39
7.2.1.1 Top-Down Integration Testing 39
7.2.1.2 Bottom-Up Integration Testing 39
7.3 User Acceptance Testing 40
7.3.1 Execute User Acceptance Testing 40
7.3.2 Purpose of User Acceptance Testing 41
7.4 Output Testing 41
7.5 Validation Checking 41
7.5.1 Text Field 42
7.5.2 Numeric Field 42
7.5.3 Preparation Test Data 42
7.5.4 Using Live Test Data 42
7.5.5 Using Artificial Test Data 43
7.6 User Training 43
7.7 Maintenance 43
7.8 Testing Strategy 44
7.8.1 System Testing 44
7.8.2 Unit Testing 44-45
8. Output Slides 46-51
9. Conclusion 52-53
10. Future Scope 54
11. Reference 55-56
ABSTRACT
For both passenger and freight transportation, railway operations must be reliable,
available, maintainable and safe (RAMS). In many urban areas, risks and safety accidents
at railway stations represent an essential safety concern for daily operations.
Moreover, such accidents lead to injuries and anxiety among the public, additional costs,
and damage to market reputation. Stations are under pressure from higher demand,
which strains infrastructure and raises safety-administration concerns. To analyse these
accidents and use technologies such as AI methods to enhance safety, unsupervised topic
modelling is proposed to better understand the contributors to these extreme accidents.

The study optimises Latent Dirichlet Allocation (LDA) for fatality accidents in railway
stations using textual data gathered by the RSSB, covering 1000 accidents at Indian
railway stations. This research describes how the machine learning topic method can
systematically spot accident characteristics to enhance safety and risk management in
stations and provide advanced analysis. The study evaluates the efficacy of mining text
from accident history, extracting information and lessons learned, and building a deeper,
coherent picture of the risks by assessing fatality accidents at a large and enduring scale.

This intelligent text analysis delivers predictive accuracy for valuable accident
information such as root causes and hot spots in railway stations. Further, improvements
in big data analytics result in an understanding of the nature of accidents in ways not
possible through narrow, domain-specific analysis of a limited set of accident reports.
This technology achieves high accuracy and opens a beneficial and extensive new era of
AI applications for safety in the railway industry and in other fields.

I
List Of Figures

S.No Fig No Name of the Figure Page No

1 2.1 RAMS 5

2 2.2 Framework of Textual Data 6

3 2.3 Steps for LDA 7

4 2.4 LDA Graph 8

5 2.5 Example of Datasets Inputs 9

6 4.1 DFD Diagram 15

7 4.2 Use case Diagram 18

8 4.3 Class Diagram 19

9 4.4 Sequence Diagram 20

10 4.5 Remote User 21

11 4.6 Service Provider 22

II
List Of Output Slides
S. No Fig No Name of the Figure Page No

01 8.1 Website Interface 51

02 8.2 User Registration Page 51

03 8.3 User Login 52

04 8.4 User Profile Interface 52

05 8.5 Prediction of Safety Accident 53

06 8.6 Service Provider Login Page 53

07 8.7 Service Provider Login Interface 54

08 8.8 Train and Test Datasets 54

09 8.9 Railway Accident Prediction Type Details 55

10 8.10 Railway Accident Prediction Type Ratio Details 55

11 8.11 All Remote Users Data 56

12 8.12 Predicted Datasets 56

III
1. INTRODUCTION
Trains, as public transportation, have been considered safer than other means.
However, passengers at train stations sometimes face many risks because of overlapping
factors such as station operation, design, and passenger behaviour. Due to gradually
increasing demand, heavy congestion, and the layout and design complexity of some
stations, there are potential risks during the operation of the stations.
Furthermore, passenger and public safety is the main concern of the
railway industry and one of the critical parts of the system. The European Union put
Reliability, Availability, Maintainability and Safety (RAMS) into practice as a standard
in 1999, known as EN 50126, aiming to prevent railway accidents and ensure a high level
of safety in railway operations. The RAMS analysis concepts minimise the risks to
acceptable levels and raise safety levels.
However, this remains an urgent issue: reports show that several people are killed
every year at railway stations, and some accidents lead to injuries or fatalities. For
example, in Japan in 2016, 420 accidents involved people being struck by a train,
resulting in 202 deaths. Of those 420 accidents, 179 (resulting in 24 fatalities) involved
falling from a platform and subsequent injury or death from being hit by a train. In the
UK in 2019/20, it was reported that most passenger injuries occur from accidents in
stations.
Most major injuries are the outcome of slips, trips and falls, of which there were
approximately 200; reducing such injuries on station platforms has a significant impact
and helps provide a quality, reliable and safe travel environment for all passengers,
workers and the public. Even when an accident does not result in deaths or injuries, it
still causes delay, cost, fear and anxiety among people, interruption of operations and
damage to the industry's reputation. Also, to provide or invest in any safety control
measures at stations, it is crucial to consider the risks associated with railway incidents
at the station and to identify the many factors related to an accident through a
comprehensive knowledge of the root causes of accidents, considering all possible
technologies.

Page | 1
The objective of this project is to analyse a collection of accident cases between
01/01/2000 and 17/04/2020 in order to introduce a smart method that is expected to
improve future safety levels, the risk management process, and the way data is collected
in railway stations. This data has been gathered by the RSSB and is agreed for use for
research purposes. Analysing an extensive amount of data recorded in different forms is
a challenging job. Nowadays, finding specific information in such mixed, digitised big
data, including web, video, images and other sources, is like searching for a needle in a
haystack. Thus, a powerful tool that assists in managing, searching and understanding
these vast amounts of information is needed. Many pre-processing techniques and
algorithms are required to obtain valuable characteristics from the enormous amount of
safety data in the stations, including textual data.
The project covers topic modelling to identify useful characteristics such as the
root causes of accidents, and also explores the factors, which are groups of words or
phrases that explain and summarise the content covered by accident reports, reducing
analysis time while maintaining high accuracy of outcomes. Topic modelling techniques
are robust, smart methods that are extensively applied in natural language processing for
topic detection and semantic mining from unstructured documents. Consequently, this
work suggests the LDA model, one of the best-known probabilistic unsupervised
learning methods, which identifies the topics implicit in a collection of documents.
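To make this concrete, the following minimal Python sketch shows the LDA workflow using scikit-learn; the sample report texts, topic count and preprocessing are illustrative assumptions, not the project's actual data or configuration.

# Minimal LDA topic-modelling sketch (illustrative; not the project's exact pipeline).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical accident narratives standing in for the RSSB report texts.
reports = [
    "passenger slipped on wet platform edge while boarding the train",
    "trespasser struck by passing train near the end of the platform",
    "passenger fell between train and platform during alighting",
]

# Convert the free-text reports into a bag-of-words matrix.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reports)

# Fit a small LDA model; n_components is the assumed number of latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic as a rough summary of accident themes.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top_words)}")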
Given the increasing adoption of new technologies, the data revolution, and the use of
AI in many fields, this paper suggests a smart analysis utilising topic modelling
techniques, which can be very useful and effective for semantic mining and latent
discovery in context documents and datasets. Other sources of data (images, videos and
numerical data) have been addressed using AI approaches that cover supervised
learning, so here the unstructured textual data is targeted.
Hence, our motivation is to investigate topic modelling approaches for risks and
safety accidents in stations. This work provides a topic modelling method based on
LDA, alongside other models for advanced analytics, aiming to contribute to the future
of smart safety and risk management in stations. Through applying the models, we
investigate fatality accidents in the railway.

Page | 2
This project establishes an innovative method for studying how the textual data of
railway station accident reports can be used efficiently to extract the root causes of
accidents and to establish the relationship between the text and the possible cause, given
that a fully automated process able to take text as input and provide outputs is not yet
available. Applying this method is expected to overcome several issues: aiding the
decision-maker in real time, extracting key information in a form understandable to
non-experts, identifying the details of an accident in greater depth, designing expert
smart safety systems, and making effective use of safety history records. Such results
could support safety and risk management analysis that is systematic and smarter. Our
approach uses the state-of-the-art LDA algorithm to capture the critical textual
information of accidents and their causes.

Page | 3
2. LITERATURE SURVEY
Text data is more essential nowadays than ever before; it is valuable and easy to store in
massive amounts for processing and mining. The public's use of social media is
expanding, and customers' reviews and reactions are a necessary and powerful tool for
quality services, sustainable tourism and transport, and other aspects such as
maintenance. Many points can be raised from such data mining technology. For
instance, call data, which is valuable raw long-term safety history data, contains many
inputs such as risk indicators, the time and day of the week, or the season. This big data
can be classified by different methods, contains information on safety hazards, can be
used to reduce accidents, and supports a proactive analysis approach [1]. Safety history
is a rich source for knowledge discovery and risk management analysis.

For instance, post-accident investigation reports by a responsible authority or expert
are one of the most common safety actions; they evaluate and analyse the causes of
accidents and the consequences of the risk, which is very effective for analysing
behaviour, hidden risk causes and lessons that can be learned. Text data takes many
source forms, including social media, emails and call recordings; such data exists in a
raw and unstructured state, requiring transformation and cleaning as part of topic
modelling to capture the needed information [2]. A framework based on textual data
sources using AI algorithms has been proposed to build a tag recording system from
safety records.

Such a method has the ability to explore and digest the complete history; it is powerful
for tracking, navigating through time to reveal how specific events have changed, and
can be adapted to many kinds of data.

Moreover, to enable automation and digitalisation concepts: more texts are now
available online than humans are able to read, analyse, explore and study to see how
they connect to each other. The topic model is fit to facilitate such issues and annotate
large archives of records [3]. Lessons must be learned to prevent repeat accidents in the
stations; a massive effort has been made in the field to control these issues, and
recommendations from investigations have been issued to maintain a high safety level.
Usually, many reports and documents are recorded and presented, from the risk
assessment through to the accident investigation reports from different organisations, in
which the narratives are indispensable. Regardless of whether or not the text data is
structured, many challenges are expected, such as massive data volumes, time, cost, the
shortage of experts and the context of the documents, which may contain non-standard
terms. These challenges and more can be reduced by the intelligent use of deep learning
methods to automate parts of the analysis process [4].

Fig no 2.1 RAMS

Despite the scattered application of such methods and the differences in the terms used
in the literature, there is a shortage of such applications in the railway industry.
Moreover, NLP has been implemented to detect defects in the requirements documents
of a railway signalling manufacturer [4], and also for translating contract terms into
technical specifications in the railway sector.

Additionally, to identify the significant factors contributing to railway accidents, a
taxonomy framework was proposed using Self-Organizing Maps (SOM) to classify
human, technology, and organisational factors in railway accidents [5]. Likewise,
association rule mining has been used to identify potential causal relationships between
factors in railway accidents [6]. In the field of machine learning applied to risk, safety
accidents and occupational safety, many ML algorithms have been used, such as SVM,
ANN, extreme learning machine (ELM), and decision tree (DT) [7]. Scholars have
conducted topic modelling, which has proved to be one of the most powerful methods in
data mining [8], in many fields and applied it in various areas such as software
engineering, medicine and health, and linguistic science. Furthermore, the literature
shows this technique has been utilised for predictions in areas such as occupational
accidents, construction and aviation: to understand occupational incidents in
construction, for construction injury prediction, for analysing the factors associated with
occupational falls, for steel-factory occupational incidents, and in cybersecurity and data
science [9]. Moreover, from 156 construction safety accident reports in urban rail
transport in India, risk information, relationships and factors have been extracted and
identified for safety risk analysis.

Fig no 2.2 Framework of textual data

From the literature it can be seen that there is no perfect model for all text
classification problems, and the process of extracting information from text is
incremental [10]. In the railway sector, a semi-automated method has been examined for
classifying unstructured text-based close-call reports, showing high accuracy.

Moreover, looking ahead, it has been reported that such technology could become
compulsory for safety management in railways [11]. Applying text analysis methods in
railway safety is expected to solve issues such as time-consuming and incomplete
analysis. Additionally, some advantages have been demonstrated: an automated process,
high productivity with quality, and an effective system for supervising safety in the
railway system. Moreover, machine learning methods have been applied for the
prevention of railway accidents. Many methods are used for data mining, including
machine learning, information extraction (IE), natural language processing (NLP), and
information retrieval (IR). For instance, to improve the identification of secondary
crashes, a text mining (classification) approach based on machine learning has been
applied to distinguish secondary crashes based on crash narratives; it shows satisfactory
performance and has great potential for identifying secondary crashes.

Fig no 2.3 Steps for LDA

Such methods are powerful for railway safety: they aid the decision-maker and help
investigate the causes of an accident, the relevant factors, and their correlations. It has
been shown that text mining has several areas of future development and advancement
for railway safety engineering. Text mining with probabilistic modelling and k-means
clustering is helpful for understanding the causal factors of rail accidents. From an
analysis of reports about major railroad accidents in the United States and from the
Transportation Safety Board of Canada, the study points out that factors such as lane
defects, wheel defects, level crossing accidents and switching accidents can lead to many
of the recurring accidents [12]. Text mining is used to understand the characteristics of
rail accidents, support safety engineers, and provide a worthwhile amount of information
in greater detail. Eleven years of accident report data in the U.S. were analysed using a
combination of text analysis and ensemble methods to better understand the contributors
to and characteristics of these accidents, yet more research is needed. Also, U.S. railroad
equipment accident reports have been used to identify themes using a comparison of text
mining methods (Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation
(LDA)) [13].

Fig no 2.4 LDA Graph

Additionally, to identify the main factors associated with injury severity, data mining
methods such as an ordered probit model, association rules, and classification and
regression tree (CART) algorithms have been applied, using the U.S. highway-railroad
grade crossing accidents database for the period 2007-2013, where factors such as train
speed, age, gender and time have been discussed [14]. In recent years, the big data
revolution has created opportunities in the railway industry and is opening up
data-driven safety analysis; approaches to proactively identify high-risk scenarios have
been recommended, such as applying Natural Language Processing (NLP), which has
been used for the extraction and analysis of risk factors from accident reports [12]. In
the context of deep learning, data from U.S. rail accident reports from 2001 to 2016 were
examined to extract the relationships between railroad accident causes and their
corresponding descriptions. Thus, for automatic understanding of domain-specific texts
and analysis of railway accident narratives, deep learning has been applied, which can
accurately classify accident causes, notice important differences in accident reporting,
and benefit safety engineers [13]. Also, text mining has been conducted to diagnose and
predict failures of switches [14]. For fault diagnosis of on-board equipment in
high-speed railways, a prior LDA model was introduced for fault feature extraction, and
a Bayesian network (BN) is also used for fault feature extraction. For automatic
classification of passenger complaint texts and eigenvalue extraction, the term
frequency-inverse document frequency algorithm has been used with a Naive Bayesian
classifier [15].

Fig no 2.5 Example of dataset inputs

Page | 9
Page | 10
3. SOFTWARE REQUIREMENT ANALYSIS
3.1 System Design and Development
3.1.1 Input Design
Input design plays a vital role in the software development life cycle and requires very
careful attention from developers. The purpose of input design is to feed data to the
application as accurately as possible, so inputs should be designed effectively to
minimise the errors that occur while feeding data. According to software engineering
concepts, the input forms or screens are designed to provide validation control over the
input limit, range and other related validations. This system has input screens in almost
all the modules. Error messages alert the user whenever a mistake is committed and
guide the user in the right way so that invalid entries are not made. This is discussed in
more detail under module design.

Input design is the process of converting user-created input into a computer-based
format. The goal of input design is to make data entry logical and free from errors;
errors in the input are controlled by the input design. The application has been developed
in a user-friendly manner. The forms have been designed in such a way that during
processing the cursor is placed at the position where data must be entered. In certain
cases, the user is also provided with an option to select an appropriate input from various
alternatives related to the field.

Validation is required for each item of data entered. Whenever a user enters erroneous
data, an error message is displayed, and the user can move on to the subsequent pages
only after completing all the entries on the current page.

3.1.2 Output Design


The output from the computer is mainly required to create an efficient method of
communication within the company, primarily between the project leader and his team
members, in other words, the administrator and the clients. The output of VPN is the
system which allows the project leader to manage his clients in terms of creating new
clients and assigning new projects to them, maintaining a record of project validity and
providing folder-level access to each client on the user side depending on the projects
allotted to him. After completion of a project, a new project may be assigned to the
client.

User authentication procedures are maintained from the initial stages. A new user may
be created by the administrator himself, or a user can register as a new user, but the task
of assigning projects and validating a new user rests with the administrator only.

The application starts running when it is executed for the first time. The server has to be
started, and then Internet Explorer is used as the browser. The project runs on the local
area network, so the server machine serves as the administrator while the other
connected systems act as the clients. The developed system is highly user-friendly and
can be easily understood by anyone using it, even for the first time.

3.2 Modules
3.2.1 Service Provider

In this module, the Service Provider has to log in using a valid username and password.
After a successful login, he can perform operations such as: Train & Test Railway Data
Sets, View Trained and Tested Railway Data Sets Accuracy in Bar Chart, View Railway
Data Sets Trained and Tested Accuracy Results, View Prediction of Railway Accident
Type, View Railway Accident Type Ratio, Download Predicted Data Sets, View
Railway Accident Type Ratio Results, and View All Remote Users.

3.2.2 View and Authorize Users

In this module, the admin can view the list of all registered users. The admin can view
each user's details, such as username, email and address, and authorizes the users.

3.2.3 Remote User

In this module, n users are present. A user should register before performing any
operations. Once a user registers, their details are stored in the database. After successful
registration, the user has to log in using the authorized username and password. Once
login is successful, the user can perform operations such as REGISTER AND LOGIN,
PREDICT RAILWAY ACCIDENT TYPE, and VIEW YOUR PROFILE.

3.3 Feasibility Study

• Preliminary Investigation

The first and foremost strategy for the development of a project starts from the idea of
designing a mail-enabled platform for a small firm in which it is easy and convenient to
send and receive messages; there is a search engine, an address book, and also some
entertaining games. When it is approved by the organization and our project guide, the
first activity, i.e. preliminary investigation, begins. The activity has three parts:

• Request Clarification

• Feasibility Study

• Request Approval

3.3.1 Request Clarification

After approval of the request by the organization and project guide, with an
investigation being considered, the project request must be examined to determine
precisely what the system requires.

Here our project is basically meant for users within the company whose systems can be
interconnected by the Local Area Network (LAN). In today's busy schedule, people need
everything to be provided in a ready-made manner. So, taking into consideration the vast
use of the net in day-to-day life, the corresponding development of the portal came into
existence.

3.3.2 Feasibility Analysis

An important outcome of preliminary investigation is the determination that the system
request is feasible. This is possible only if it is feasible within the limited resources and
time available. The different feasibilities that have to be analyzed are:

Page | 13
• Operational Feasibility
• Economic Feasibility
• Technical Feasibility

3.3.2.1 Operational Feasibility


Operational feasibility deals with the study of the prospects of the system to be
developed. This system operationally eliminates all the tensions of the admin and helps
him in effectively tracking the project's progress. This kind of automation will surely
reduce the time and energy previously consumed in manual work. Based on the study,
the system is proved to be operationally feasible.

3.3.2.2 Economic Feasibility

Economic feasibility, or cost-benefit analysis, is an assessment of the economic
justification for a computer-based project. As the hardware was installed from the
beginning and serves many purposes, the hardware cost of the project is low. Since the
system is network based, any number of employees connected to the LAN within the
organization can use this tool at any time. The Virtual Private Network is to be
developed using the existing resources of the organization. So, the project is
economically feasible.

3.3.2.3 Technical Feasibility


According to Roger S. Pressman, technical feasibility is the assessment of the technical
resources of the organization. The organization needs IBM-compatible machines with a
graphical web browser connected to the Internet and intranet. The system is developed
for a platform-independent environment. Java Server Pages, JavaScript, HTML, SQL
Server and WebLogic Server are used to develop the system. The technical feasibility
assessment has been carried out. The system is technically feasible for development and
can be developed with the existing facilities.

3.3.3 Request Approval

Not all requested projects are desirable or feasible. Some organizations receive so many
project requests from client users that only a few of them are pursued. However, those
projects that are both feasible and desirable should be put into the schedule. After a
project request is approved, its cost, priority, completion time and personnel
requirements are estimated and used to determine where to add it to the project list.
Only after the approval of the above factors can development work be launched.

Page | 14
4. SOFTWARE DESIGN
4.1 Data-Flow Diagram (DFD)
• The DFD is also called a bubble chart. It is a simple graphical formalism that
can be used to represent a system in terms of the input data to the system, the
various processing carried out on this data, and the output data generated by the
system.
• The data flow diagram (DFD) is one of the most important modelling tools. It is
used to model the system components: the system process, the data used by the
process, the external entities that interact with the system, and the information
flows in the system.
• A DFD shows how the information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.
• A DFD may be used to represent a system at any level of abstraction and may be
partitioned into levels that represent increasing information flow and functional
detail.

Page | 15
Fig no 4.1 Data Flow Diagram

Page | 16
4.2 UML Diagrams

UML stands for Unified Modeling Language. UML is a standardized, general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed, and was created, by the Object Management Group.

The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems. The UML represents a collection of best
engineering practices that have proven successful in the modeling of large and complex
systems.

The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS:
The Primary goals in the design of the UML are as follows:

• Provide users a ready-to-use, expressive visual modeling Language so that they


can develop and exchange meaningful models.

• Provide extendibility and specialization mechanisms to extend the core


concepts.

• Be independent of particular programming languages and development process.


• Provide a formal basis for understanding the modeling language.

• Encourage the growth of OO tools market.

• Support higher-level development concepts such as collaborations, frameworks,
patterns and components.

• Integrate best practices.

4.3 Use Case Diagram


A use case diagram in the Unified Modelling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their
goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.

Page | 18
Fig no 4.2 Use Case Diagram

4.4 Class Diagram

Page | 19
In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the
system's classes, their attributes, operations (or methods), and the relationships among
the classes. It explains which class contains which information.

Fig no 4.3 Class Diagram

4.5 Sequence Diagram

Page | 20
A sequence diagram in Unified Modeling Language (UML) is a kind
of interaction diagram that shows how processes operate with one
another and in what order. It is a construct of a Message Sequence
Chart. Sequence diagrams are sometimes called event diagrams,
event scenarios, and timing diagrams.

Fig no 4.4 Sequence Diagram

4.6 Control Flow Diagram

Page | 21
A control-flow diagram can consist of a subdivision to show sequential steps, with
if-then-else conditions, repetition, and/or case conditions. Suitably annotated geometrical
figures are used to represent operations, data, or equipment, and arrows are used to
indicate the sequential flow from one to another.

Fig no 4.5 Remote User

Page | 22
Fig no 4.6 Service Provider

5. SOFTWARE REQUIREMENT SPECIFICATION

Page | 23
A set of programs associated with the operation of a computer is called software.
Software is the part of the computer system which enables the user to interact with
several physical hardware devices.

5.1 Software Requirement Specification


The minimum software requirement specifications for developing this project are as
follows:
• Operating system : Windows 7/10 Ultimate.
• Coding Language : Python. V3.6.2
• Front-End : Python.
• Back-End : Django-ORM
• Designing : Html, CSS, JavaScript.
• Data Base : MySQL (WAMP Server).
5.2 Hardware Requirement Specification
The collection of internal electronic circuits and external physical devices used in
building a computer is called the Hardware. The minimum hardware requirement
specifications for developing this project are as follows:

• Processor - Pentium –IV


• RAM - 4 GB (min)
• Hard Disk - 120 GB
• Key Board - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - SVGA

6. CODING

6.1 Sample Code

Page | 24
def login(request): if request.method == "POST" and

'submit1' in request.POST:

username = request.POST.get('username')

password = request.POST.get('password')

try:

enter =
ClientRegister_Model.objects.get(username=username,password=password)

request.session["userid"] = enter.id

return redirect('ViewYourProfile')

except: pass

return render(request,'RUser/login.html') def

index(request):

return render(request, 'RUser/index.html')

def Register1(request): if request.method

== "POST":

username = request.POST.get('username')

email = request.POST.get('email') password

= request.POST.get('password') phoneno =

request.POST.get('phoneno') country =

request.POST.get('country') state =

request.POST.get('state') city =

request.POST.get('city') address =

Page | 25
request.POST.get('address') gender =

request.POST.get('gender')

ClientRegister_Model.objects.create(username=username, email=email,

password=password, phoneno=phoneno, country=country, state=state,

city=city,address=address,gender=gender) obj = "Registered

Successfully" return render(request, 'RUser/Register1.html',

{'object':obj})

else:

return render(request,'RUser/Register1.html') def

ViewYourProfile(request): userid = request.session['userid'] obj

= ClientRegister_Model.objects.get(id= userid) return

render(request,'RUser/ViewYourProfile.html',{'object':obj}) def

Predict_Accident_Type(request): if request.method == "POST":

if request.method == "POST":

RID= request.POST.get('RID')

Location= request.POST.get('Location')

Latitude= request.POST.get('Latitude')

Longitude= request.POST.get('Longitude')

Avgpassengersperday= request.POST.get('Avgpassengersperday')

Nooftrainspassing= request.POST.get('Nooftrainspassing')

Nooftrainsstopping= request.POST.get('Nooftrainsstopping')

Noofplatforms= request.POST.get('Noofplatforms')

Nooftracks= request.POST.get('Nooftracks')

Page | 26
Trainhaltingtime= request.POST.get('Trainhaltingtime')

Avgtrainspeed= request.POST.get('Avgtrainspeed')

Averageaccidentspermonth= request.POST.get('Averageaccidentspermonth')

population= request.POST.get('population')

PhysicalEnvironment= request.POST.get('PhysicalEnvironment')

DateTime= request.POST.get('DateTime') admin_found=

request.POST.get('admin_found') df = pd.read_csv('Datasets.csv',

encoding='latin-1') def apply_response(Label):

if (Label == 0):

return 0 # Safety Accident

elif (Label == 1):

return 1 # No Safety Accident

df['results'] = df['Label'].apply(apply_response)

cv = CountVectorizer() X = df['RID'] y=

df['results']

print("Sid")

print(X)

print("Results")

print(y)

X = cv.fit_transform(X) models =

[] RID1 = [RID] vector1 =

cv.transform(RID1).toarray()

Page | 27
predict_text = classifier.predict(vector1)

pred = str(predict_text).replace("[", "")

pred1 = pred.replace("]", "") prediction

= int(pred1) if (prediction == 0):

val = 'Safety Accident' elif (prediction

== 1): val = 'No Safety Accident'

print(val) print(pred1)

accident_type_prediction.objects.create(

RID=RID,

Location=Location,

Latitude=Latitude, Longitude=Longitude,

Avgpassengersperday=Avgpassengersperday,

Nooftrainspassing=Nooftrainspassing,

Nooftrainsstopping=Nooftrainsstopping,

Noofplatforms=Noofplatforms,

Nooftracks=Nooftracks,

Trainhaltingtime=Trainhaltingtime,

Avgtrainspeed=Avgtrainspeed,

Averageaccidentspermonth=Averageaccidentspermonth,

population=population,

PhysicalEnvironment=PhysicalEnvironment, DateTime=DateTime,

admin_found=admin_found, Prediction=val) return render(request,

Page | 28
'RUser/Predict_Accident_Type.html',{'objs': val}) return render(request,

'RUser/Predict_Accident_Type.html')

6.2 PYTHON

Python is a high-level, interpreted, interactive and object-oriented scripting language.
Python is designed to be highly readable. It uses English keywords frequently, whereas
other languages use punctuation, and it has fewer syntactical constructions than other
languages.

• Python is Interpreted: Python is processed at runtime by the interpreter. You


do not need to compile your program before executing it. This is similar to
PERL and PHP.

• Python is Interactive: You can actually sit at a Python prompt and interact with
the interpreter directly to write your programs.

• Python is Object-Oriented: Python supports Object-Oriented style or


technique of programming that encapsulates code within objects.

• Python is a Beginner's Language: Python is a great language for
beginner-level programmers and supports the development of a wide range of
applications, from simple text processing to WWW browsers to games.

6.2.1 History of Python

Page | 29
Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++,
Algol-68, SmallTalk, Unix shell and other scripting languages. Python is copyrighted.
Like Perl, Python source code is now available under the GNU General Public License
(GPL). Python is now maintained by a core development team at the institute, although
Guido van Rossum still holds a vital role in directing its progress.

6.2.2 Python Features

Python's features include:

o Easy-to-learn: Python has few keywords, simple structure, and a clearly


defined syntax. This allows the student to pick up the language quickly.

o Easy-to-read: Python code is more clearly defined and visible to the eyes.

o Easy-to-maintain: Python's source code is fairly easy to maintain.

o A broad standard library: Python's bulk of the library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.

o Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

o Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.

o Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.

Page | 30
o Databases: Python provides interfaces to all major commercial databases.

o GUI Programming: Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows
MFC, Macintosh, and the X Window system of Unix.

o Scalable: Python provides a better structure and support for large programs than
shell scripting.

Python has a big list of good features:

• It supports functional and structured programming methods as well as OOP.

• It can be used as a scripting language or can be compiled to byte-code for


building large applications.

• It provides very high-level dynamic data types and supports dynamic type
checking.

• It supports automatic garbage collection.

• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.

You can choose the right database for your application. Python Database API supports a
wide range of database servers such as −

• GadFly

• mSQL

• MySQL

• PostgreSQL

Page | 31
• Microsoft SQL Server 2000

• Informix

• Interbase

• Oracle

• Sybase

The DB API provides a minimal standard for working with databases using Python
structures and syntax wherever possible. This API includes the following:

• Importing the API module.

• Acquiring a connection with the database.

• Issuing SQL statements and stored procedures.

• Closing the connection
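A minimal sketch of this sequence using Python's built-in sqlite3 module, which follows the same DB-API pattern as the MySQL and other drivers listed above (the table name and values are purely illustrative):

import sqlite3                            # import the DB-API module

conn = sqlite3.connect("example.db")      # acquire a connection with the database
cur = conn.cursor()

# Issue SQL statements through the cursor.
cur.execute("CREATE TABLE IF NOT EXISTS station (name TEXT, platforms INTEGER)")
cur.execute("INSERT INTO station VALUES (?, ?)", ("Central", 4))
conn.commit()

for row in cur.execute("SELECT name, platforms FROM station"):
    print(row)

conn.close()                              # close the connection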

6.3 Django

Django (/ˈdʒæŋɡoʊ/ JANG-goh; sometimes stylized as django)[6] is a free and
open-source, Python-based web framework that runs on a web server. It follows the
model-template-views (MTV) architectural pattern.[7][8] It is maintained by the Django
Software Foundation (DSF), an independent organization established in the US as a
501(c)(3) non-profit.

Django's primary goal is to ease the creation of complex, database-driven websites. The
framework emphasizes reusability and "pluggability" of components, less code, low
coupling, rapid development, and the principle of don't repeat yourself.[9] Python is
used throughout, even for settings, files, and data models. Django also provides an
optional administrative create, read, update and delete interface that is generated
dynamically through introspection and configured via admin models. Some well-known
sites that use Django include Instagram, Mozilla, Disqus, Bitbucket, Nextdoor and
Clubhouse.
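As a small, hedged sketch of what a data model and its admin registration look like in Django (the Station model and its fields are hypothetical, not this project's actual models):

# models.py: a hypothetical data model; Django derives the database schema from it.
from django.db import models

class Station(models.Model):
    name = models.CharField(max_length=100)
    platforms = models.IntegerField(default=1)

    def __str__(self):
        return self.name

# admin.py: registering the model makes Django generate the CRUD admin pages for it.
from django.contrib import admin

admin.site.register(Station)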

6.3.1 History

Django was created in the autumn of 2003, when the web programmers at the Lawrence
Journal-World newspaper, Adrian Holovaty and Simon Willison, began using Python to
build applications. Jacob Kaplan-Moss was hired early in Django's development shortly
before Simon Willison's internship ended. It was released publicly under a BSD license
in July 2005. The framework was named after guitarist Django Reinhardt. Adrian
Holovaty is a Romani jazz guitar player inspired in part by Reinhardt's music.

In June 2008, it was announced that a newly formed Django Software


Foundation (DSF) would maintain Django in the future.

6.3.2 Q-Learning

Reinforcement learning is a learning paradigm in which a learning agent learns, over
time, to behave optimally in a certain environment by interacting continuously with that
environment. During its course of learning, the agent experiences various situations in
the environment it is in; these are called states. While in a state, the agent may choose
from a set of allowable actions, which may fetch different rewards (or penalties). Over
time, the learning agent learns to maximize these rewards so as to behave optimally in
any given state. Q-learning is a basic form of reinforcement learning that uses Q-values
(also called action values) to iteratively improve the behaviour of the learning agent.
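As a generic, textbook-style illustration of the Q-value update described above (a toy environment with assumed rewards, learning rate and discount factor; this is not part of the project's pipeline):

# Minimal tabular Q-learning update on a toy problem (illustrative values only).
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))       # Q-table of action values
alpha, gamma, epsilon = 0.1, 0.9, 0.2     # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    # Hypothetical environment: moving right (action 1) towards the last state pays off.
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(200):
    state = 0
    for _ in range(20):
        # Epsilon-greedy action selection.
        action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)  # learned action values per state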

6.3.3 Pandas

Pandas is a powerful, open-source Python library. The Pandas library is used for data
manipulation and analysis. Pandas consists of data structures and functions to perform
efficient operations on data.

Page | 33
The Pandas library is generally used for data science, but have you wondered
why? This is because the Pandas library is used in conjunction with other libraries that
are used for data science. It is built on top of the NumPy library which means that a lot
of the structures of NumPy are used or replicated in Pandas. The data produced by
Pandas is often used as input for plotting functions in Matplotlib, statistical analysis in
SciPy, and machine learning algorithms in Scikit-learn.

Why should you use the Pandas library? Python's Pandas library is an excellent tool to
analyse, clean, and manipulate data.

Here is a list of things that we can do using Pandas; a short sketch follows the list.

• Data set cleaning, merging, and joining.


• Easy handling of missing data (represented as NaN) in floating point as well as
non-floating-point data.

• Columns can be inserted and deleted from Data Frame and higher-dimensional
objects.

• Powerful group by functionality for performing split-apply-combine operations


on data sets.

• Data Visualization.
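A minimal illustration of a few of these operations (the column names and values are hypothetical, not the project's actual data set):

import numpy as np
import pandas as pd

# Hypothetical accident records with a missing value.
df = pd.DataFrame({
    "station": ["A", "B", "A", "C"],
    "accidents": [3, np.nan, 5, 2],
})

df["accidents"] = df["accidents"].fillna(0)              # handle missing data (NaN)
per_station = df.groupby("station")["accidents"].sum()   # split-apply-combine
print(per_station)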

6.3.4 SK-Learn

The scikit-learn project first began as scikits.learn, a Google Summer of Code venture
by French research scientist David Cournapeau. Its name refers to the idea that it is a
"SciKit" (SciPy Toolkit), an independently created and published extension to SciPy.
Later, other programmers rewrote the core codebase.

The French Institute for Research in Computer Science and Automation at


Rocquencourt, France, led the work in 2010 under the direction of Alexandre Gramfort,
Gael Varoquaux, Vincent Michel, and Fabian Pedregosa. On February 1st of that year,
the institution issued the project's first official release. In November 2012, scikit-learn

Page | 34
and scikit-image were cited as examples of scikits that were "well-maintained and
popular". One of the most widely used machine learning packages on GitHub is
Python's scikit-learn.

6.3.5 Implementation of SK-learn

Scikit-learn is mainly coded in Python and heavily utilizes the NumPy library for highly
efficient array and linear algebra computations. Some fundamental algorithms are also
built in Cython to enhance the efficiency of this library. Support vector machines,
logistic regression, and linear SVMs are performed using wrappers coded in Cython for
LIBSVM and LIBLINEAR, respectively. Expanding these routines with Python might
not be viable in such circumstances.

Scikit-learn works nicely with numerous other Python packages, including SciPy,
Pandas data frames, NumPy for array vectorization, Matplotlib, seaborn and plotly for
plotting graphs, and many more.

• Benefits of Using Scikit-Learn for Implementing Machine Learning Algorithms

You will discover that scikit-learn is well-documented and straightforward to
understand, regardless of whether you are seeking an overview of ML, wish to get up to
speed quickly, or seek the most recent ML learning tool. With the help of this high-level
toolkit, you can quickly construct a predictive data analysis model and use it to fit the
collected data. It is adaptable and works well alongside other Python libraries.
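For instance, a minimal fit-and-predict sketch with scikit-learn (toy features and labels assumed for illustration; this is not the project's actual model):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy numeric features and binary labels standing in for real station data.
X = [[0, 1], [1, 1], [2, 0], [3, 0], [4, 1], [5, 0]]
y = [0, 0, 0, 1, 1, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)           # fit the collected data
print(model.predict(X_test))          # predict on unseen records
print(model.score(X_test, y_test))    # mean accuracy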

6.3.6 Count Vectorizer

In order to use textual data for predictive modeling, the text must be parsed to remove
certain words – this process is called tokenization. These words need to then be encoded
as integers, or floating-point values, for use as inputs in machine learning algorithms.
This process is called feature extraction (or vectorization).

Scikit-learn’s CountVectorizer is used to convert a collection of text documents to a


vector of term/token counts. It also enables the pre-processing of text data prior to

Page | 35
generating the vector representation. This functionality makes it a highly flexible feature
representation module for text.
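For illustration, a minimal CountVectorizer sketch (the two sample sentences are invented, not taken from the accident reports):

from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "passenger slipped on the platform",
    "passenger struck by train at the platform edge",
]

cv = CountVectorizer()
X = cv.fit_transform(docs)             # tokenize and count terms per document
print(cv.get_feature_names_out())      # the learned vocabulary
print(X.toarray())                     # term-count vectors used as model inputs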

6.3.7 Voting Classifier

A Voting Classifier is a machine learning model that trains on an ensemble of numerous
models and predicts an output (class) based on the class with the highest probability of
being chosen as the output.

It simply aggregates the findings of each classifier passed into the Voting Classifier and
predicts the output class based on the majority of the votes. The idea is that, instead of
creating separate dedicated models and finding the accuracy of each of them, we create
a single model that trains on these models and predicts output based on their combined
majority vote for each output class.

• The Voting Classifier supports two types of voting.

Hard Voting: In hard voting, the predicted output class is the class that receives the
majority of the votes, i.e., the class most frequently predicted by the individual
classifiers. Suppose three classifiers predicted the output classes (A, A, B); the majority
predicted A, so A will be the final prediction.

Soft Voting: In soft voting, the output class is the prediction based on the average
probability given to that class. Suppose, for some input to three models, the prediction
probabilities for class A are (0.30, 0.47, 0.53) and for class B are (0.20, 0.32, 0.40). The
average for class A is 0.4333 and for B is 0.3067; the winner is clearly class A because it
has the highest probability averaged over the classifiers.
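A minimal scikit-learn sketch of both voting modes (the toy features, labels and choice of base estimators are assumptions for illustration only):

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X = [[0, 1], [1, 1], [2, 0], [3, 0], [4, 1], [5, 0]]
y = [0, 0, 0, 1, 1, 1]

estimators = [
    ("lr", LogisticRegression()),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

hard = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)
soft = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)

print(hard.predict([[2, 1]]))   # majority-vote prediction
print(soft.predict([[2, 1]]))   # averaged-probability prediction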

Page | 36
7. SYSTEM TESTING
Testing methodologies
The following are the Testing Methodologies:

o Unit Testing
o Integration Testing
o User Acceptance Testing
o Output Testing
o Validation Testing

7.1 Unit Testing

Unit testing focuses verification effort on the smallest unit of Software design that is the
module. Unit testing exercises specific paths in a module’s control structure to ensure
complete coverage and maximum error detection. This test focuses on each module
individually, ensuring that it functions properly as a unit. Hence, the naming is Unit
Testing.

During this testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths are
tested for the expected results. All error handling paths are also tested.
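As a brief, hedged illustration in this project's language, the sketch below unit-tests a stand-alone label-mapping helper modelled on the apply_response function from the sample code; the helper is reproduced here only so the test is self-contained.

import unittest

def apply_response(label):
    # Stand-in for the label-mapping helper: 0 -> Safety Accident, 1 -> No Safety Accident.
    if label == 0:
        return 0
    elif label == 1:
        return 1

class ApplyResponseTest(unittest.TestCase):
    def test_safety_accident_label(self):
        self.assertEqual(apply_response(0), 0)

    def test_no_safety_accident_label(self):
        self.assertEqual(apply_response(1), 1)

    def test_unknown_label_returns_none(self):
        self.assertIsNone(apply_response(2))

if __name__ == "__main__":
    unittest.main()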

Page | 37
Unit testing, as a principle for separately testing smaller parts of large software systems,
dates back to the early days of software engineering. In June 1956, H.D. Benington
presented at the US Navy's Symposium on Advanced Programming Methods for Digital
Computers the SAGE project and its specification-based approach, where the coding
phase was followed by "parameter testing" to validate component subprograms against
their specification, followed then by "assembly testing" for the parts put together.

In 1964, a similar approach is described for the software of the Mercury project,
where individual units developed by different programmers underwent "unit tests"
before being integrated together. In 1969, testing methodologies appear more structured,
with unit tests, component tests and integration tests with the purpose of validating
individual parts written separately and their progressive assembly into larger blocks.
Some public standards adopted end of the 60's, such as MIL-STD-483 and MIL-STD-
490 contributed further to a wide acceptance of unit testing in large projects.

Unit testing was in those times interactive or automated, using either coded tests or
capture and replay testing tools. In 1989, Kent Beck described a testing framework for
Smalltalk (later called SUnit) in "Simple Smalltalk Testing: With Patterns". In 1997,
Kent Beck and Erich Gamma developed and released JUnit, a unit test framework that
became popular with Java developers. Google embraced automated testing around
2005–2006.

Unit tests can be performed manually or via automated test execution. Automated
tests include benefits such as: running tests often, running tests without staffing cost,
consistent and repeatable testing. Testing is often performed by the programmer who
writes and modifies the code under test. Unit testing may be viewed as part of the
process of writing code.

7.2 Integration Testing

Integration testing addresses the issues associated with the dual problem of verification
and program construction. After the software has been integrated, a set of high-order
tests is conducted. The main objective of this testing process is to take unit-tested
modules and build a program structure that has been dictated by the design.

Page | 38
Integration testing is the process of testing the interface between two software
units or modules. It focuses on determining the correctness of the interface. The purpose
of integration testing is to expose faults in the interaction between integrated units. Once
all the modules have been unit-tested, integration testing is performed.

Integration testing is a software testing technique that focuses on verifying the


interactions and data exchange between different components or modules of a software
application. The goal of integration testing is to identify any problems or bugs that arise
when different components are combined and interact with each other. Integration
testing is typically performed after unit testing and before system testing. It helps to
identify and resolve integration issues early in the development cycle, reducing the risk
of more severe and costly problems later on.

Integration testing can be done module by module, so that a proper sequence is followed;
if you do not want to miss any integration scenarios, you have to follow the proper
sequence. Exposing defects in the interaction between the integrated units is the major
focus of integration testing.

Advantages:

• In bottom-up testing, no stubs are required.
• A principal advantage of this integration testing is that several disjoint subsystems can be tested simultaneously.
• It is easy to create the test conditions.
• It is best suited for applications that use a bottom-up design approach.
• It is easy to observe the test results.

Disadvantages:

• Driver modules must be produced.
• Complexity arises when the system is made up of a large number of small subsystems.
• Until the higher-level modules have been created, no working model of the system can be represented.

7.2.1. The following are the types of Integration Testing:

7.2.1.1. Top-Down Integration


This method is an incremental approach to the construction of program structure.
Modules are integrated by moving downward through the control hierarchy, beginning
with the main program module. The modules subordinate to the main program module
are incorporated into the structure in either a depth-first or breadth-first manner.
In this method, the software is tested from the main module downward, and the individual
stubs are replaced as the test proceeds.

7.2.1.2. Bottom-up Integration

This method begins construction and testing with the modules at the lowest level in
the program structure. Since the modules are integrated from the bottom up, the processing
required for modules subordinate to a given level is always available and the need for
stubs is eliminated. The bottom-up integration strategy may be implemented with the
following steps:
• The low-level modules are combined into clusters that perform a specific software sub-function.
• A driver (i.e., the control program for testing) is written to coordinate test case input and output.
• The cluster is tested.
• Drivers are removed and clusters are combined, moving upward in the program structure.
The bottom-up approach tests each module individually; each module is then integrated
with a main module and tested for functionality.
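
The following is a minimal sketch of such a driver in Python, assuming two hypothetical low-level modules (a pre-processing routine and a simple topic-assignment routine); it illustrates the idea of a bottom-up driver only and is not the project's actual integration code.

def preprocess(reports):
    # Low-level module 1 (stand-in): clean raw report text.
    return [" ".join(r.lower().split()) for r in reports]

def assign_topics(cleaned, n_topics=2):
    # Low-level module 2 (stand-in): assign a topic index to each record.
    return {text: i % n_topics for i, text in enumerate(cleaned)}

def driver():
    # The driver coordinates test case input and output for the cluster.
    sample = ["Passenger FELL on platform ", "Worker struck by train"]
    cleaned = preprocess(sample)
    assert all(isinstance(t, str) for t in cleaned), "interface mismatch"
    topics = assign_topics(cleaned)
    assert len(topics) == len(sample), "every record must receive a topic"
    print("Bottom-up cluster test passed:", topics)

if __name__ == "__main__":
    driver()

Once the cluster passes, the driver is removed and the tested modules are combined with the next level up in the program structure.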

7.3 User Acceptance Testing

User acceptance of a system is the key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with the
prospective system users during development and by making changes wherever required.
The developed system provides a friendly user interface that can easily be understood
even by a person who is new to the system.

User Acceptance Testing is a testing methodology where clients/end users
participate in product testing to validate the product against their requirements. It is
done either at the client's site or at the developer's site. For industries such as medicine or
aerospace, contractual and regulatory compliance testing, and operational acceptance
tests are also performed as part of user acceptance tests.

7.3.1 Execute UAT

Carrying out effective User Acceptance Testing starts by bringing real end users into
contact with the product. The team must decide what questions to ask users, what
information would be useful, and why it is relevant. Not every data point can be tested
at once, so the test scope usually needs refinement before launch; even so, testing
should give an early indication of whether enough value is being delivered or whether
the wrong question was asked. Carrying out effective User Acceptance Testing also has
some prerequisites.

These include a comprehensive knowledge base in which everything is tested and proven
before release; accurate information about user behaviour from beginning to end; and
appropriate visual aids for testing purposes on every front page, together with supporting
web tools or online services such as forums.

It is also worth developing such a database-like system with different levels of detail,
which becomes more useful as the business grows over time; after development, many more
possibilities open up when looking at the value of each level, since users do not always
accept exactly what they think they want but usually something better.

7.3.2 What is the purpose of UAT?

The purpose of User Acceptance Testing (UAT) is to identify bugs in software, systems,
and networks that may cause problems for users. UAT ensures that software can handle
real-world tasks and perform to development specifications. Users are allowed to
interact with the software before its official release to see if any features were
overlooked or if any bugs exist.

7.4 Output Testing

After performing validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the
specified format. The outputs generated or displayed by the system under consideration
are tested by asking the users about the format they require. Hence the output format is
considered in two ways: on screen and in printed form.

7.5 Validation Checking

Validation checks are performed on the following fields.

7.5.1 Text Field

The text field can contain only a number of characters less than or equal to its size.
The text fields are alphanumeric in some tables and alphabetic in other tables. An
incorrect entry always flashes an error message.

7.5.2 Numeric Field

The numeric field can contain only numbers from 0 to 9. An entry of any other character
flashes an error message. The individual modules are checked for accuracy and for what
they have to perform. Each module is subjected to a test run along with sample data. The
individually tested modules are then integrated into a single system. Testing involves
executing the program with real data; the existence of any program defect is inferred
from the output. The testing should be planned so that all the requirements are
individually tested.

A successful test is one that brings out the defects for inappropriate data and
produces an output revealing the errors in the system.
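
The sketch below shows, in Python, how such field-level checks might be coded; the field sizes and character rules are illustrative assumptions and not the project's actual form-validation code.

import re

def validate_text_field(value, max_len=50, alphabetic_only=False):
    # Reject values that exceed the field size or contain characters
    # the field does not allow (alphabetic or alphanumeric).
    if len(value) > max_len:
        return "Error: value exceeds field size"
    pattern = r"^[A-Za-z ]*$" if alphabetic_only else r"^[A-Za-z0-9 ]*$"
    if not re.match(pattern, value):
        return "Error: invalid characters entered"
    return "OK"

def validate_numeric_field(value):
    # The numeric field accepts only the digits 0 to 9.
    return "OK" if value.isdigit() else "Error: numeric field accepts digits only"

print(validate_text_field("Platform 4"))   # OK
print(validate_numeric_field("2024"))      # OK
print(validate_numeric_field("20A4"))      # Error: numeric field accepts digits only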

7.5.3 Preparation of Test Data

The above testing is done by taking various kinds of test data. Preparation of test data plays a
vital role in system testing. After preparing the test data, the system under study is
tested using that test data. While testing the system with this test data, errors are again
uncovered and corrected using the above testing steps, and the corrections are noted for
future use.

7.5.4 Using Live Test Data

Live test data are those that are actually extracted from organization files. After a system
is partially constructed, programmers or analysts often ask users to key in a set of data
from their normal activities. Then, the systems person uses this data as a way to partially
test the system. In other instances, programmers or analysts extract a set of live data
from the files and have them entered themselves.

It is difficult to obtain live data in sufficient amounts to conduct extensive testing.
And, although it is realistic data that will show how the system will perform for the
typical processing requirement, assuming that the live data entered are in fact typical,
such data generally will not test all combinations or formats that can enter the system.
This bias toward typical values then does not provide a true system test and in fact
ignores the cases most likely to cause system failure.
7.5.5 Using Artificial Test Data

Artificial test data are created solely for test purposes, since they can be generated to test
all combinations of formats and values. In other words, the artificial data, which can
quickly be prepared by a data-generating utility program in the information systems
department, make possible the testing of all logic and control paths through the
program.

The most effective test programs use artificial test data generated by persons other than
those who wrote the programs. Often, an independent team of testers formulates a
testing plan using the system specifications. The developed system has satisfied all the
requirements specified in the software requirement specification and was accepted.
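
A small sketch of such a data-generating utility is given below in Python; the column names and value lists are illustrative assumptions rather than the real accident-report schema used in this project.

import csv
import random

# Illustrative categorical values; the real schema may differ.
ACCIDENT_TYPES = ["fall", "struck by train", "electric shock"]
LOCATIONS = ["platform", "stairs", "concourse", "track side"]
TIMES = ["morning", "afternoon", "evening", "night"]

def generate_artificial_records(n, path="artificial_test_data.csv", seed=42):
    # Write n synthetic accident records covering random combinations of
    # the categorical values, so that many input paths can be exercised.
    random.seed(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["accident_type", "location", "time_of_day"])
        for _ in range(n):
            writer.writerow([random.choice(ACCIDENT_TYPES),
                             random.choice(LOCATIONS),
                             random.choice(TIMES)])

generate_artificial_records(100)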

Validation testing is the process of evaluating software during the development process,
or at the end of it, to determine whether it satisfies the specified business
requirements. Validation testing ensures that the product actually meets the client's
needs. It can also be defined as demonstrating that the product fulfils its intended use
when deployed in an appropriate environment.

7.6 User Training

Whenever a new system is developed, user training is required to educate them about
the working of the system so that it can be put to efficient use by those for whom the
system has been primarily designed. For this purpose, the normal working of the project
was demonstrated to the prospective users. Its working is easily understandable and
since the expected users are people who have good knowledge of computers, the use of
this system is very easy.

7.7 Maintenance

This covers a wide range of activities including correcting code and design errors. To
reduce the need for maintenance in the long run, we have more accurately defined the
user’s requirements during the process of system development. Depending on the
requirements, this system has been developed to satisfy the needs to the largest possible
extent. With development in technology, it may be possible to add many more features
based on the requirements in future. The coding and designing are simple and easy to
understand which will make maintenance easier.

7.8 Testing Strategy

A strategy for system testing integrates system test cases and design techniques into a
well-planned series of steps that results in the successful construction of software. The
testing strategy must incorporate test planning, test case design, test execution, and the
resultant data collection and evaluation. A strategy for software testing must
accommodate low-level tests that are necessary to verify that a small source code
segment has been correctly implemented, as well as high-level tests that validate
major system functions against user requirements.

Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing represents an interesting
anomaly for software. Thus, a series of tests is performed on the proposed
system before it is ready for user acceptance testing.

7.8.1 System Testing

Software, once validated, must be combined with other system elements (e.g., hardware,
people, databases). System testing verifies that all elements mesh properly and that
overall system function and performance are achieved. It also tests to find discrepancies
between the system and its original objectives, current specifications and system
documentation.

7.8.2 Unit Testing

In unit testing, the different modules are tested against the specifications produced for
them during design. Unit testing is essential for verification of the code produced
during the coding phase, and hence the goal is to test the internal logic of the modules.
Using the detailed design description as a guide, important control paths are tested to
uncover errors within the boundary of the modules. This testing is carried out during the
programming stage itself. In this testing step, each module was found to be
working satisfactorily with regard to the expected output from the module.

In due course, the latest technology advancements will be taken into consideration.
As part of the technical build-up, many components of the system will be generic in
nature so that future projects can either use them or interact with them. The future holds
a lot to offer to the development and refinement of this project.

8. OUTPUT SLIDES

Fig no 8.1 Webpage Interface

Fig no 8.2 User Registration Page

Fig no 8.3 User Login Page

Fig no 8.4 User Profile Interface

Fig no 8.5 Prediction of Safety Accidents

Fig no 8.6 Service Provider Login Page

Fig no 8.7 Service Provider Login Interface

Fig no 8.8 Train & Test Data Sets

Fig no 8.9 Railway Accident Prediction Type Details

Fig no 8.10 Railway Accident Type Ratio Details

Fig no 8.11 All Remote Users Data

Fig no 8.12 Predicted Data Sets

9. CONCLUSION

Topic models play an important role in many fields, and in particular in text mining for
safety and risk management in railway stations. In topic modelling, a topic is a list of
words that occur together in statistically significant ways. A text can be a voice record,
an investigation report, a review, a risk document and so on.
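
To make the idea concrete, the following is a minimal sketch of topic modelling with Latent Dirichlet Allocation using scikit-learn; the report snippets and the number of topics are made-up illustrations and are not the project's dataset or exact pipeline.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Made-up report snippets standing in for real incident narratives.
reports = [
    "passenger fell from the platform edge during rush hour",
    "worker received an electric shock near the track side cabinet",
    "passenger struck by train while crossing the track at night",
    "commuter slipped and fell on the wet platform stairs",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(reports)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Each topic is a ranked list of words; print the top words per topic.
terms = vectorizer.get_feature_names_out()
for idx, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {idx}: {top}")

Each printed topic is simply the list of words that the model finds to co-occur most strongly, which is the sense in which a topic is "a list of words that occur in statistically significant ways".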

This research demonstrates, through several cases, the power of unsupervised machine
learning topic modelling in promoting risk management and safety accident investigation,
and in restructuring accident recording and documentation at the industry level. In
describing the root causes of accidents, the suggested model shows that the platforms are
the hot spot in the stations. The outcomes reveal that station accidents occur mainly due
to falls, being struck by trains and electric shock. Moreover, night time and particular
days of the week appear to be significantly associated with these risks.

With increased safety text mining, knowledge is gained on a wide scale and over
different periods, resulting in greater RAMS efficiency and enabling the creation of a
holistic perspective for all stakeholders.

Application of the unsupervised machine learning technique is useful for safety since it
explores hidden patterns and deals with many challenges, such as:
• Handling text data from many perspectives and in unstructured forms
• Power for discovery, dealing with missing values, and spotting safety and risk keywords in the data
• Smart labelling, clustering, centroids, sampling and associated coordinates
• Capturing relationships and causations, and ranking risks and related information
• Prioritising risks and the implementation of measures
• Aiding the process of safety review and learning from long and massive experience
• Scales and weights can be used as configuration options for assessing risks

Although this paper highlights the innovative use of unsupervised machine learning in
the classification of railway accidents and in root-cause analysis, expanded research is
needed in the future on larger datasets covering the diversity of station locations, sizes,
safety cultures and other factors, using further unsupervised machine learning
techniques. Finally, this research enhances safety, but it also raises the importance of
data in text form and suggests redesigning the way data are gathered to make them more
comprehensive.

10. FUTURE SCOPE

11. REFERENCES

1. S. Terabe, T. Kato, H. Yaginuma, N. Kang, and K. Tanaka, ‘‘Risk assessment
model for railway passengers on a crowded platform,’’ Transp. Res. Rec., J.
Transp. Res. Board, vol. 2673, no. 1, pp. 524–531, Jan. 2019, doi:
10.1177/0361198118821925.
2. Annual Health and Safety Report 19/2020, RSSB, London, U.K., 2020.
3. D. M. Blei, ‘‘Probabilistic topic models,’’ Commun. ACM, vol. 55, no. 4,
pp. 77–84, Apr. 2012, doi: 10.1145/2133806.2133826.
4. M. Gethers and D. Poshyvanyk, ‘‘Using relational topic models to capture
coupling among classes in object-oriented software systems,’’ in Proc. IEEE
Int. Conf. Softw. Maintenance, Sep. 2010, pp. 1–10, doi: 10.1109/ICSM.2010.5609687.
5. D. M. Blei, A. Y. Ng, and M. I. Jordan, ‘‘Latent Dirichlet allocation,’’ J. Mach.
Learn. Res., vol. 3, nos. 4–5, pp. 993–1022, Mar. 2003, doi:10.1016/B978-0-
12411519-4.00006-9.

6. H. Alawad, S. Kaewunruen, and M. An, ‘‘A deep learning approach towards
railway safety risk assessment,’’ IEEE Access, vol. 8, pp. 102811–102832,
2020, doi: 10.1109/ACCESS.2020.2997946.
7. H. Alawad, S. Kaewunruen, and M. An, ‘‘Learning from accidents: Machine
learning for safety at railway stations,’’ IEEE Access, vol. 8, pp. 633–648, 2020,
doi: 10.1109/ACCESS.2019.2962072.
8. A. J.-P. Tixier, M. R. Hallowell, B. Rajagopalan, and D. Bowman, ‘‘Automated
content analysis for construction safety: A natural language processing system
to extract precursors and outcomes from unstructured injury reports,’’ Autom.
Construct., vol. 62, pp. 45–56, Feb. 2016, doi:10.1016/j.autcon.2015.11.001.
9. J. Sido and M. Konopik, ‘‘Deep learning for text data on mobile devices,’’ in
Proc. Int. Conf. Appl. Electron., Sep. 2019, pp. 1–4,
doi:10.23919/AE.2019.8867025.
10. A. Serna and S. Gasparovic, ‘‘Transport analysis approach based on bigdata and
text mining analysis from social media,’’ Transp. Res. Proc., vol. 33, pp. 291–
298, Jan. 2018, doi: 10.1016/j.trpro.2018.10.105.
11. P. Hughes, D. Shipp, M. Figueres-Esteban, and C. van Gulijk, ‘‘From free-text
to structured safety management: Introduction of a semiautomated classification
method of railway hazard reports to elements on a bow-tie diagram,’’ Saf. Sci.,
vol. 110, pp. 11–19, Dec. 2018, doi:10.1016/j.ssci.2018.03.011.
12. A. Chanen, ‘‘Deep learning for extracting word-level meaning from safety
report narratives,’’ in Proc. Integr. Commun. Navigat. Surveill. (ICNS), Apr.
2016, pp. 5D2-1–5D2-15, doi: 10.1109/ICNSURV.2016.7486358.
13. A. Ferrari, G. Gori, B. Rosadini, I. Trotta, S. Bacherini, A. Fantechi, and S.
Gnesi,
‘‘Detecting requirements defects with NLP patterns: An industrial experience in
the railway domain,’’ Empirical Softw. Eng., vol. 23, no. 6, pp. 3684–3733, Dec.
2018, doi: 10.1007/s10664-018-9596-7.
14. G. Fantoni, E. Coli, F. Chiarello, R. Apreda, F. Dell’Orletta, and G. Pratelli,
‘‘Text mining tool for translating terms of contract into technical specifications:
Development and application in the railway sector,’’ Comput. Ind., vol. 124, Jan.
2021, Art. no. 103357, doi:10.1016/j.compind.2020.103357.

15. G. Yu, W. Zheng, L. Wang, and Z. Zhang, ‘‘Identification of significant factors
contributing to multi-attribute railway accidents dataset (MARA-D) using SOM
data mining,’’ in Proc. 21st Int. Conf. Intell. Transp. Syst.(ITSC), Nov. 2018, pp.
170–175, doi: 10.1109/ITSC.2018.8569336.

