UNSUPERVISED MACHINE LEARNING FOR MANAGING
SAFETY ACCIDENTS IN RAILWAY STATIONS
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE ENGINEERING
Submitted by
N. BHANU PRAKASH REDDY : 21BT5A0511
G. YASHWANTH : 21BT5A0507
N. SHASHANK : 21BT5A0521
TIMOTHI : 20BT5A0541
CERTIFICATE
This is to certify that this project report entitled UNSUPERVISED
MACHINE LEARNING FOR MANAGING SAFETY ACCIDENTS IN
RAILWAY STATIONS, submitted by N. Bhanu Prakash Reddy
(21BT5A0511), B. Sai Kiran (21BT5A0501), G. Yashwanth
(21BT5A0507), N. Shashank (21BT5A0521) and Timothy (20BT5A0541),
in partial fulfillment of the requirements for the degree of Bachelor of
Technology in Computer Science & Engineering to the Jawaharlal
Nehru Technological University, Hyderabad, during the academic
year 2023-24, is a bonafide record of work carried out under our guidance and
supervision.
The results embodied in this report have not been submitted to any other
University or Institution for the award of any degree or diploma.
(External Examiner)
DECLARATION
We hereby declare that this submission is our own work and that, to the best
of our knowledge and belief, it contains no material previously published or
written by another person nor material which to a substantial extent has been
accepted for the award of any other degree or diploma of the university or
other institute of higher learning, except where due acknowledgment has been
made in the text.
Signatures:
G. YASHWANTH : 21BT5A0507
N. SHASHANK : 21BT5A0521
TIMOTHI : 20BT5A0541
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the project
undertaken during B.Tech. We would like to express our special thanks to
our Principal & Professor (CSE) Dr. D. Ramesh for his moral support, and to the
Management of Visvesvaraya College of Engineering & Technology, Hyderabad, for
providing us the infrastructure to complete the project.
We would also like to express our gratitude towards our parents/guardians and
siblings for their kind co-operation and encouragement, which helped us in the
completion of this project.
G. YASHWANTH : 21BT5A0507
N. SHASHANK : 21BT5A0521
TIMOTHI : 20BT5A0541
CONTENTS
Abstract I
List Of Figures II
1. Introduction 1
2. Literature Survey 4-9
3. Software Requirement Analysis 10-13
3.1 System Design and Development 10
3.1.1 Input Design 10
3.1.2 Output Design 10-11
3.2 Modules 11
3.2.1 Service Provider 11
3.2.2 View and Authorize User 11
3.2.3 Remote User 11
3.3 Feasibility Study 12-13
3.3.1 Request Clarification 12
3.3.2 Feasibility Study 12
3.3.2.1 Operational Feasibility 13
3.3.2.2 Economic Feasibility 13
3.3.2.3 Technical Feasibility 13
3.3.3 Request Approval 13
4. Software Design 14-22
4.1 DFD Diagram 14-15
4.2 UML Diagram 16
4.3 Use case Diagram 17-18
4.4 Class Diagram 19
4.5 Sequence Diagram 20
ABSTRACT
This intelligent text analysis delivers predictive accuracy for valuable accident
information such as the root causes of accidents and the hot spots in railway stations.
Further, improvements in big data analytics yield an understanding of the nature of
accidents, drawing on a considerable amount of safety history, in ways not possible
through narrow, domain-specific analysis of individual accident reports. This technology
stands out with high accuracy and opens a beneficial and extensive new era of AI
applications in railway industry safety and in other fields requiring safety applications.
List Of Figures
S. No Fig No Name of the Figure Page No
1 2.1 RAMS 5
List Of Output Slides
S. No Fig No Name of the Figure Page No
1.INTRODUCTION
Trains as public transportation have long been considered safer than other means.
However, passengers at train stations sometimes face many risks because of many
overlapping factors such as station operation, design, and passenger behaviour. Due to
gradually increasing demand, a heavily congested society, and the layout and design
complexity of some stations, there are potential risks during the operation of the
stations.
Furthermore, the safety of passengers, staff and the public is the main concern of the
railway industry and one of the critical parts of the system. The European Union put into
practice Reliability, Availability, Maintainability and Safety (RAMS) as a standard in
1999, known as EN 50126, aiming to prevent railway accidents and ensure a high level
of safety in railway operations. The RAMS analysis concepts lead to minimizing risks
to acceptable levels and raising safety levels.
However, this has been an urgent issue, and still the reports show that several
people are killed every year in railway stations, with some accidents leading to injuries
or fatalities. For example, in Japan in 2016, 420 accidents occurred that involved being
struck by a train, resulting in 202 deaths. Of those 420 accidents, 179 (resulting in 24
fatalities) involved falling from a platform and subsequent injury or death as a
consequence of being hit by a train. In the UK in 2019/20, it was reported that most
passenger injuries occur from accidents in stations.
Most major injuries are the outcome of slips, trips and falls, of which there
were approximately 200. Reducing such injuries on station platforms would have a
significant impact, providing a quality, reliable and safe travel environment for all
passengers, workers and the public. Even if an accident does not result in deaths or
injuries, such accidents cause delay, cost, fear and anxiety among people, interruption
of operations and damage to the industry's reputation. Also, to provide or invest in any
safety control measures in the stations, it is crucial to consider the risks associated
with railway incidents in the station and to identify the many factors related to an
accident through a comprehensive knowledge of the root causes of accidents,
considering all the possible technology.
The objective of this project is to analyse a collection of accident cases recorded
between 01/01/2000 and 17/04/2020 in order to introduce a smart method, which is
expected to improve the future safety level, the risk management process, and the way
data is collected in railway stations. This data was gathered by RSSB and agreed for use
for research purposes. Analysing an extensive amount of data recorded in different
forms is a challenging job. Nowadays, it is hard to obtain specific information in such a
mix of digitized big data, including the Web, video, images and other sources; it is a
search for a needle in a haystack. Thus, a powerful tool is indeed needed to assist in
managing, searching and understanding these vast amounts of information. Many pre-
processing techniques and algorithms are required to obtain valuable characteristics
from the enormous amount of safety data in the stations, including textual data.
The project covers topic modeling to identify useful characteristics such as the
root causes of accidents, and also explores the factors, which are multiple groups of
words or phrases that explain and summarize the content covered by accident reports,
reducing analysis time while keeping the accuracy of the outcomes high. Topic modeling
techniques are robust smart methods that are extensively applied in natural language
processing for topic detection and semantic mining from unstructured documents.
Consequently, this work adopts the LDA model, which is one of the best-known
probabilistic unsupervised learning methods that marks the topics implicit in a
collection of texts.
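To make this concrete, the short sketch below fits an LDA model to a handful of toy
accident narratives with scikit-learn and prints the top words of each latent topic. The
sample reports, the number of topics and all parameter values are illustrative
assumptions, not the project's actual dataset or configuration.

# A minimal LDA topic-modeling sketch on toy accident narratives.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [
    "passenger slipped on wet platform edge while boarding the train",
    "worker struck by train during maintenance work on the track",
    "passenger fell from the platform after losing balance near the edge",
    "electric shock reported near the overhead line at the station",
]

# Bag-of-words representation of the report narratives.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reports)

# Fit an LDA model with a small number of latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic as a summary of the latent themes.
terms = vectorizer.get_feature_names_out()
for idx, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {idx}: {', '.join(top)}")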
Given the increasing application of new technologies, the data revolution, and the
growing use of AI in many fields, this paper suggests a smart analysis utilizing topic
modeling techniques, which can be very useful and effective for semantic mining and the
discovery of latent context in documents and datasets. The other sources of data (images,
videos and numerical data) have been handled with AI approaches that cover supervised
learning, so the unstructured textual data is targeted here.
Hence, our motivation is to investigate topic modeling approaches for risk and
safety accident subjects in the stations. This work provides a topic modeling method
based on LDA, together with other models for advanced analytics, aiming to contribute
to the future of smart safety and risk management in the stations. Through applying the
models, we investigate the safety accidents, focusing on fatal accidents in the railway.
This project establishes an innovative method in the area, studying how the
textual sources of railway station accident reports can be efficiently used to extract the
root causes of accidents and to establish a link between the text and the possible cause,
given that a fully automated process able to take text as input and provide outputs is not
yet ready. Applying this method is expected to overcome issues such as aiding the
decision-maker in real time, extracting the key information so that it is understandable
by non-experts, better identifying the details of an accident in depth, designing expert
smart safety systems, and making effective use of the safety history records. Such
results could support safety analysis and risk management in becoming systematic and
smarter. Our approach uses the state-of-the-art LDA algorithm to capture the critical
textual information about accidents and their causes.
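Before a topic model can capture such information, the narratives must pass through the
pre-processing steps mentioned above. A minimal sketch of this cleaning step in plain
Python follows; the stop-word list and the sample narrative are illustrative assumptions.

import re

# A tiny, illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "on", "at", "of", "and", "was", "while"}

def preprocess(report):
    # Lower-case, tokenize, and drop stop words and very short tokens.
    tokens = re.findall(r"[a-z]+", report.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

print(preprocess("The passenger was struck while standing at the platform edge"))
# -> ['passenger', 'struck', 'standing', 'platform', 'edge']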
2. LITERATURE SURVEY
Text data is more essential nowadays than ever before; it is valuable and can easily be
stored in massive amounts to be processed and mined. The use of social media by the
public is expanding, and customers' reviews and reactions are a necessary and powerful
tool for quality services, sustainable tourism and transport, and other aspects such as
maintenance. Many points can be raised from such data mining technology. For
instance, call data, which is valuable raw material for long-term safety history, contains
many inputs such as risk indicators and the time and date of the week or the season.
This big data, which contains information on safety hazards, can be classified by
different methods, can be used to reduce accidents, and forms a proactive analysis
approach [1]. Safety history is a rich source for knowledge discovery and risk
management analysis.
Such a method has the ability to explore and digest the complete history; it is powerful
for tracking, navigating through time to reveal how specific events have changed, and
can be adapted to many kinds of data.
Text sources were presented from the initial risk assessment up to the accident
investigation reports from different organizations, for which the narratives are
indispensable. Regardless of whether or not the text data is structured, many challenges
are to be expected, such as massive data volumes, time, cost, the shortage of experts, and
context in the documents that may contain non-standard terms. These challenges and
more can be reduced by the intelligent use of Deep Learning methods to automate the
analysis as a part of the process [4].
Despite the scatter of applications of such methods and the differences in the
terms used in the literature, there is a shortage of such applications in the railway
industry. Moreover, NLP has been implemented to detect defects in the requirements
documents of a railway signalling manufacturer [4], and also for translating terms of
contracts into technical specifications in the railway sector.
Association rules mining has been used to identify potential causal relationships between
factors in railway accidents [6]. In the field of machine learning applied to risk, safety
accidents, and occupational safety, many ML algorithms have been used, such as SVM,
ANN, extreme learning machine (ELM), and decision tree (DT) [7]. Scholars have
conducted topic modelling in many fields, where the method has proved to be one of the
most powerful methods in data mining [8], applied in various areas such as software
engineering, medicine and health, linguistic science, etc. Furthermore, in the literature
this technique has been utilized for predictions in areas such as occupational accidents,
construction and aviation: to understand occupational incidents in construction, for
construction injury prediction, for analysing the factors associated with occupational
falls, for steel factory occupational incidents, and in cybersecurity and data science [9].
Moreover, from 156 construction safety accident reports in urban rail transport in India,
risk information, relationships and factors have been extracted and identified for safety
risk analysis.
From the literature it can be seen that there is no perfect model for all text
classification issues, and the process of extracting information from text is incremental
[10]. In the railway sector, a semi-automated method has been examined for classifying
unstructured text-based close call reports, showing high accuracy. Moreover, looking to
the future, it has been reported that such technology could become compulsory for
safety management in railways [11]. Applying text analysis methods in railway safety is
expected to solve issues such as time-consuming and incomplete analysis. Additionally,
some advantages have been demonstrated: an automated process, high productivity with
quality, and an effective system for supervising safety in the railway system. Moreover,
machine learning methods have been applied for the prevention of railway accidents.
Many methods are used for data mining, including machine learning, information
extraction (IE), natural language processing (NLP), and information retrieval (IR). For
instance, to improve the identification of secondary crashes, a text mining approach
(classification) based on machine learning has been applied to distinguish secondary
crashes from crash narratives, which shows satisfactory performance and has great
potential for identifying secondary crashes.
Such methods are powerful for railway safety; they aid the decision-maker and
help investigate the causes of an accident, the relevant factors, and their correlations. It
has been shown that text mining has several areas of future development and advances
for railway safety engineering. Text mining with probabilistic modelling and k-means
clustering is helpful for understanding the causal factors of rail accidents. From an
analysis of reports about major railroad accidents in the United States and from the
Transportation Safety Board of Canada, a study has pointed out that factors such as lane
defects, wheel defects, level crossing accidents and switching accidents can lead to
many of the recurring accidents [12]. Text mining is used to understand the
characteristics of rail accidents, to support safety engineers, and further to provide a
worthwhile amount of information in more detail. Eleven years of accident report data in
the U.S. were analysed with a combination of text analysis and ensemble methods to
better understand the contributors to and characteristics of these accidents, yet more
research is needed. Also, U.S. railroad equipment accident reports have been used to
identify themes using a comparison of text mining methods (Latent Semantic Analysis
(LSA) and Latent Dirichlet Allocation (LDA)) [13].
Additionally, to identify the main factors associated with injury severity, data
mining methods such as an ordered probit model, association rules, and classification
and regression tree (CART) algorithms have been applied, using the U.S. highway-
railroad grade crossing accident database for the period 2007-2013, where factors such
as train speed, age, gender and time have been discussed [14]. In recent years, the big
data revolution has created opportunities in the railway industry and is opening up
data-driven safety analysis; approaches to proactively identify high-risk scenarios have
been recommended, such as applying Natural Language Processing (NLP), which has
been used for the extraction and analysis of risk factors from accident reports [12]. In
the context of deep learning, rail accident reports in the U.S. from 2001 to 2016 were
examined to extract the relationships between railroad accident causes and their
corresponding descriptions. Thus, for automatic understanding of domain-specific texts
and analysis of railway accident narratives, deep learning has been applied, which
provided accurate classification of accident causes, revealed important differences in
accident reporting, and proved beneficial to safety engineers [13]. Also, text mining has
been applied to diagnose and predict failures of switches [14]. For high-speed railways,
in fault diagnosis of vehicle on-board equipment, a prior-LDA model was introduced for
fault feature extraction, and a Bayesian network (BN) is also used. For automatic
classification of passenger complaint texts and eigenvalue extraction, the term
frequency-inverse document frequency algorithm has been used with a Naive Bayesian
classifier [15].
3. SOFTWARE REQUIREMENT ANALYSIS
3.1 System Design and Development
3.1.1 Input Design
Input design plays a vital role in the life cycle of software development and requires very
careful attention from developers. The goal of input design is to feed data to the
application as accurately as possible, so inputs should be designed effectively so that the
errors occurring while feeding data are minimized. According to software engineering
concepts, the input forms or screens are designed to provide validation control over the
input limit, range and other related validations. This system has input screens in almost
all the modules. Error messages are developed to alert users whenever they commit a
mistake and to guide them in the right way so that invalid entries are not made. We
discuss this in detail under module design.
Input design is the process of converting user-created input into a computer-based
format. The goal of input design is to make data entry logical and free from errors;
errors in the input are controlled by the input design. The application has been developed
in a user-friendly manner. The forms have been designed in such a way that during
processing the cursor is placed in the position where data must be entered. In certain
cases, the user is also provided with an option to select an appropriate input from various
alternatives related to the field.
Validations are required for each item of data entered. Whenever a user enters erroneous
data, an error message is displayed, and the user can move on to subsequent pages only
after completing all the entries on the current page.
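As a hedged illustration of the validation-on-entry behaviour described above, the sketch
below uses a Django form; the field names, limits and error message are assumptions,
not the project's actual form definitions.

from django import forms

class StationRecordForm(forms.Form):
    # Field limits mirror the limit/range validations discussed above.
    username = forms.CharField(max_length=30)
    avg_train_speed = forms.IntegerField(min_value=0, max_value=300)

    def clean_username(self):
        # Reject entries that are not purely alphanumeric, mirroring the
        # alphanumeric text-field rule used elsewhere in this report.
        name = self.cleaned_data["username"]
        if not name.isalnum():
            raise forms.ValidationError("Invalid entry: alphanumeric only.")
        return name

# In a view, form = StationRecordForm(request.POST); when form.is_valid() is
# False, form.errors carries the messages the template flashes to the user.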
Each user can view the projects allotted to him; after completion of a project, a new
project may be assigned to the client.
User authentication procedures are maintained from the initial stages. A new user may
be created by the administrator himself, or a user can register himself as a new user, but
the task of assigning projects and validating a new user rests with the administrator only.
The application starts running when it is executed for the first time. The server has to be
started, and then the browser is used to access it. The project runs on the local area
network, so the server machine serves as the administrator while the other connected
systems act as the clients. The developed system is highly user-friendly and can be easily
understood by anyone using it, even for the first time.
3.2 Modules
3.2.1 Service Provider
In this module, the Service Provider has to log in using a valid user name and password.
After a successful login, he can perform operations such as: Train & Test Railway Data
Sets, View Trained and Tested Railway Data Sets Accuracy in Bar Chart, View Railway
Data Sets Trained and Tested Accuracy Results, View Prediction of Railway Accident
Type, View Railway Accident Type Ratio, Download Predicted Data Sets, View
Railway Accident Type Ratio Results, and View All Remote Users.
3.2.2 View and Authorize User
In this module, the admin can view the list of all registered users. The admin can view
user details such as user name, email and address, and authorizes the users.
3.2.3 Remote User
In this module, there are n numbers of users present. A user should register before
performing any operations. Once a user registers, their details are stored in the database.
After successful registration, the user has to log in using the authorized user name and
password. Once login is successful, the user can perform operations like REGISTER
AND LOGIN, PREDICT RAILWAY ACCIDENT TYPE, and VIEW YOUR PROFILE.
3.3 Feasibility Study
• Preliminary Investigation
The first and foremost strategy for the development of a project starts from the thought
of designing a mail-enabled platform for a small firm in which it is easy and convenient
to send and receive messages; there is a search engine, an address book, and also some
entertaining games. When it is approved by the organization and our project guide, the
first activity, i.e. preliminary investigation, begins. The activity has three parts:
• Request Clarification
• Feasibility Study
• Request Approval
3.3.1 Request Clarification
After the approval of the request by the organization and the project guide, with an
investigation being considered, the project request must be examined to determine
precisely what the system requires.
Here our project is basically meant for users within the company, whose systems can be
interconnected by a Local Area Network (LAN). In today's busy schedule, people need
everything to be provided in a ready-made manner. So, taking into consideration the vast
use of the internet in day-to-day life, the corresponding development of the portal came
into existence.
3.3.2 Feasibility Study
• Operational Feasibility
• Economic Feasibility
• Technical Feasibility
Not all requested projects are desirable or feasible. Some organizations receive so many
project requests from client users that only a few of them can be pursued. However,
those projects that are both feasible and desirable should be put into the schedule. After
a project request is approved, its cost, priority, completion time and personnel
requirements are estimated and used to determine where to add it to the project list. Only
after the approval of the above factors can development work be launched.
4. SOFTWARE DESIGN
4.1 Data-Flow Diagram (DFD)
• The DFD is also called a bubble chart. It is a simple graphical formalism that
can be used to represent a system in terms of the input data to the system, the
various processing carried out on this data, and the output data generated by the
system.
• The data flow diagram (DFD) is one of the most important modelling tools. It is
used to model the system components: the system processes, the data used by
the processes, the external entities that interact with the system, and the
information flows in the system.
• A DFD shows how information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.
• A DFD may be used to represent a system at any level of abstraction and may be
partitioned into levels that represent increasing information flow and functional
detail.
Fig no 4.1 Data Flow Diagram
4.2 UML Diagrams
The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML.
UML is a very important part of developing object-oriented software and of the
software development process. UML uses mostly graphical notations to express the
design of software projects.
GOALS:
The Primary goals in the design of the UML are as follows:
• Support higher-level development concepts such as collaborations, frameworks,
patterns and components.
Fig no 4.2 Use Case Diagram
4.4 Class Diagram
In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the
system's classes, their attributes, operations (or methods), and the relationships among
the classes. It explains which class contains what information.
4.5 Sequence Diagram
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, or timing diagrams.
4.6 Control Flow Diagram
A control-flow diagram can consist of a subdivision to show sequential steps, with
if-then-else conditions, repetition, and/or case conditions. Suitably annotated geometrical
figures are used to represent operations, data, or equipment, and arrows are used to
indicate the sequential flow from one element to another.
Fig no 4.6 Service Provider
5. SOFTWARE ENVIRONMENT
A set of programs associated with the operation of a computer is called software.
Software is the part of the computer system which enables the user to interact with
several physical hardware devices.
6. CODING
6.1 Sample Code
# The listing below is reconstructed from the garbled extraction of the
# original views: the imports, the lost 'def' lines and several statements
# around page breaks have been restored as clearly labelled assumptions.
import pandas as pd
from django.shortcuts import render, redirect
from sklearn.feature_extraction.text import CountVectorizer

from .models import ClientRegister_Model, accident_type_prediction


def login(request):
    if request.method == "POST" and 'submit1' in request.POST:
        username = request.POST.get('username')
        password = request.POST.get('password')
        try:
            enter = ClientRegister_Model.objects.get(username=username,
                                                     password=password)
            request.session["userid"] = enter.id
            return redirect('ViewYourProfile')
        except ClientRegister_Model.DoesNotExist:
            pass
    return render(request, 'RUser/login.html')


def index(request):
    # Registration view; several field reads were lost at page breaks and
    # are restored here from the create() call below.
    if request.method == "POST":
        username = request.POST.get('username')
        email = request.POST.get('email')
        password = request.POST.get('password')
        phoneno = request.POST.get('phoneno')
        country = request.POST.get('country')
        state = request.POST.get('state')
        city = request.POST.get('city')
        address = request.POST.get('address')
        gender = request.POST.get('gender')
        obj = ClientRegister_Model.objects.create(
            username=username, email=email, password=password,
            phoneno=phoneno, country=country, state=state,
            city=city, address=address, gender=gender)
        return render(request, 'RUser/ViewYourProfile.html', {'object': obj})
    return render(request, 'RUser/index.html')


def Predict_Accident_Type(request):
    # The original 'def' line is missing; the view name is inferred from the
    # template it renders.
    if request.method == "POST":
        RID = request.POST.get('RID')
        Location = request.POST.get('Location')
        Latitude = request.POST.get('Latitude')
        Longitude = request.POST.get('Longitude')
        Avgpassengersperday = request.POST.get('Avgpassengersperday')
        Nooftrainspassing = request.POST.get('Nooftrainspassing')
        Nooftrainsstopping = request.POST.get('Nooftrainsstopping')
        Noofplatforms = request.POST.get('Noofplatforms')
        Nooftracks = request.POST.get('Nooftracks')
        Trainhaltingtime = request.POST.get('Trainhaltingtime')
        Avgtrainspeed = request.POST.get('Avgtrainspeed')
        Averageaccidentspermonth = request.POST.get('Averageaccidentspermonth')
        population = request.POST.get('population')
        PhysicalEnvironment = request.POST.get('PhysicalEnvironment')
        DateTime = request.POST.get('DateTime')

        df = pd.read_csv('Datasets.csv')

        def apply_response(Label):
            # Label-to-class mapping; only the 'Label == 0' branch survived
            # in the original listing, the rest is an assumption.
            if Label == 0:
                return 0
            return 1

        df['results'] = df['Label'].apply(apply_response)

        cv = CountVectorizer()
        X = cv.fit_transform(df['RID'])
        y = df['results']

        # Classifier training was lost at a page break; a fitted estimator
        # (e.g. the voting classifier described in Section 6.3) is assumed.
        vector1 = cv.transform([RID]).toarray()
        predict_text = classifier.predict(vector1)
        val = predict_text[0]

        accident_type_prediction.objects.create(
            RID=RID,
            Location=Location,
            Latitude=Latitude,
            Longitude=Longitude,
            Avgpassengersperday=Avgpassengersperday,
            Nooftrainspassing=Nooftrainspassing,
            Nooftrainsstopping=Nooftrainsstopping,
            Noofplatforms=Noofplatforms,
            Nooftracks=Nooftracks,
            Trainhaltingtime=Trainhaltingtime,
            Avgtrainspeed=Avgtrainspeed,
            Averageaccidentspermonth=Averageaccidentspermonth,
            population=population,
            PhysicalEnvironment=PhysicalEnvironment,
            DateTime=DateTime,
            Prediction=val)  # field name assumed

        return render(request, 'RUser/Predict_Accident_Type.html',
                      {'objs': val})
    return render(request, 'RUser/Predict_Accident_Type.html')
6.2 PYTHON
• Python is Interactive: You can actually sit at a Python prompt and interact with
the interpreter directly to write your programs.
Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++,
Algol-68, Smalltalk, and Unix shell and other scripting languages. Python is copyrighted;
like Perl, Python source code is now available under the GNU General Public License
(GPL). Python is now maintained by a core development team, although Guido van
Rossum still held a vital role in directing its progress.
o Easy-to-read: Python code is more clearly defined and visible to the eyes.
o A broad standard library: Python's bulk of the library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
o Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
o Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.
o Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.
o Databases: Python provides interfaces to all major commercial databases.
o GUI Programming: Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows
MFC, Macintosh, and the X Window system of Unix.
o Scalable: Python provides a better structure and support for large programs than
shell scripting.
• It provides very high-level dynamic data types and supports dynamic type
checking.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
You can choose the right database for your application. The Python Database API
supports a wide range of database servers such as:
• GadFly
• mSQL
• MySQL
• PostgreSQL
• Microsoft SQL Server 2000
• Informix
• Interbase
• Oracle
• Sybase
The DB API provides a minimal standard for working with databases using Python
structures and syntax wherever possible. This API includes connection objects, cursor
objects for executing statements and fetching results, and a standardized set of errors
and exceptions.
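The following small, self-contained illustration of the DB API pattern uses Python's
built-in sqlite3 driver; the table and values are made up for the example.

import sqlite3

conn = sqlite3.connect(":memory:")   # connection object
cur = conn.cursor()                  # cursor object
cur.execute("CREATE TABLE stations (name TEXT, platforms INTEGER)")
cur.execute("INSERT INTO stations VALUES (?, ?)", ("Central", 6))
conn.commit()

cur.execute("SELECT name, platforms FROM stations")
print(cur.fetchall())                # [('Central', 6)]
conn.close()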
6.3 Django
Django is a free, open-source, Python-based web framework that encourages rapid
development and clean, pragmatic design. Its admin interface is generated dynamically
through introspection and configured via admin models. Some well-known sites that use
Django include Instagram, Mozilla, Disqus, Bitbucket, Nextdoor and Clubhouse.
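As a brief sketch of configuring the admin interface via admin models, the snippet
below registers the ClientRegister_Model used in the sample code of Section 6.1; the
displayed columns are an illustrative assumption.

from django.contrib import admin
from .models import ClientRegister_Model

@admin.register(ClientRegister_Model)
class ClientRegisterAdmin(admin.ModelAdmin):
    # Columns shown in the auto-generated change list.
    list_display = ("username", "email", "country", "city")
    search_fields = ("username", "email")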
6.3.1 History
Django was created in the autumn of 2003, when the web programmers at the Lawrence
Journal-World newspaper, Adrian Holovaty and Simon Willison, began using Python to
build applications. Jacob Kaplan-Moss was hired early in Django's development shortly
before Simon Willison's internship ended. It was released publicly under a BSD license
in July 2005. The framework was named after guitarist Django Reinhardt. Adrian
Holovaty is a Romani jazz guitar player inspired in part by Reinhardt's music.
6.3.2 Q-Learning
6.3.3 Pandas
Pandas is a powerful and open-source Python library. The Pandas library is used for data
manipulation and analysis. Pandas consist of data structures and functions to perform
efficient operations on data.
The Pandas library is generally used for data science. This is because Pandas is used in
conjunction with the other core data science libraries: it is built on top of the NumPy
library, which means that many NumPy structures are used or replicated in Pandas, and
the data produced by Pandas is often used as input for plotting functions in Matplotlib,
statistical analysis in SciPy, and machine learning algorithms in Scikit-learn.
Why should you use the Pandas library? Python's Pandas library is among the best tools
to analyse, clean, and manipulate data; a short sketch follows the feature list below.
• Columns can be inserted into and deleted from DataFrames and
higher-dimensional objects.
• Data Visualization.
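The minimal Pandas sketch below loads the accident dataset and inspects it; the file
name and column names mirror the sample code of Section 6.1 and are otherwise
assumptions.

import pandas as pd

df = pd.read_csv("Datasets.csv")
print(df.head())                      # first five records
print(df["Label"].value_counts())     # distribution of accident labels

# Column insertion and deletion, as mentioned in the feature list above.
df["HighTraffic"] = df["Avgpassengersperday"].astype(float) > 10000
df = df.drop(columns=["HighTraffic"])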
6.3.4 SK-Learn
Scikit-learn (formerly scikits.learn) is a free, open-source machine learning library for
the Python programming language; scikit-learn and scikit-image have been cited as
examples of scikits that are "well-maintained and popular". One of the most widely used
machine learning packages on GitHub is Python's scikit-learn.
Scikit-learn is mainly written in Python and heavily utilizes the NumPy library for highly
efficient array and linear algebra computations. Some fundamental algorithms are also
implemented in Cython to enhance the efficiency of the library. Support vector machines
are performed using a Cython wrapper around LIBSVM, while logistic regression and
linear SVMs use a similar wrapper around LIBLINEAR. Expanding these routines in
pure Python might not be viable in such circumstances.
Scikit-learn works nicely with numerous other Python packages, including SciPy,
Pandas data frames, NumPy for array vectorization, Matplotlib, seaborn and plotly for
plotting graphs, and many more.
In order to use textual data for predictive modeling, the text must be parsed into
individual words – this process is called tokenization. These words then need to be
encoded as integers, or floating-point values, for use as inputs to machine learning
algorithms. This process is called feature extraction (or vectorization).
Scikit-learn's CountVectorizer converts a collection of text documents into a matrix of
token counts: it tokenizes the documents, builds a vocabulary of the known words, and
uses that vocabulary when generating the vector representation. This functionality makes
it a highly flexible feature representation module for text.
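A short illustration of tokenization plus vectorization with CountVectorizer follows; the
two toy narratives are invented for the example.

from sklearn.feature_extraction.text import CountVectorizer

docs = ["passenger fell from platform", "passenger struck by train"]

cv = CountVectorizer()
X = cv.fit_transform(docs)           # learn the vocabulary and encode the docs

print(cv.get_feature_names_out())
# ['by' 'fell' 'from' 'passenger' 'platform' 'struck' 'train']
print(X.toarray())                   # integer counts, one row per document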
A Voting Classifier simply aggregates the findings of each classifier passed into it and
predicts the output class based on the majority of the votes. The idea is that instead of
creating separate dedicated models and finding the accuracy of each of them, we create a
single model which trains on these models and predicts output based on their combined
majority of votes for each output class.
Hard Voting: In hard voting, the predicted output class is the class with the highest
majority of votes, i.e., the class most frequently predicted by the individual classifiers.
Suppose three classifiers predicted the output classes (A, A, B); the majority predicted
A, hence A will be the final prediction.
Soft Voting: In soft voting, the output class is the prediction based on the average of the
probabilities given to that class. Suppose that, for some input, the three models predict
probabilities for class A of (0.30, 0.47, 0.53) and for class B of (0.20, 0.32, 0.40). The
average for class A is 0.4333 and for class B it is 0.3067; the winner is clearly class A
because it has the highest probability averaged across the classifiers.
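The sketch below demonstrates hard and soft voting with scikit-learn's VotingClassifier
on synthetic data; the choice of the three base models is an illustrative assumption.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0))]

# "hard" counts the majority vote; "soft" averages predict_proba outputs.
hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)
soft = VotingClassifier(estimators=base, voting="soft").fit(X, y)

print(hard.predict(X[:3]), soft.predict(X[:3]))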
7. SYSTEM TESTING
Testing methodologies
The following are the testing methodologies:
• Unit Testing
• Integration Testing
• User Acceptance Testing
• Output Testing
• Validation Testing

7.1 Unit Testing
Unit testing focuses verification effort on the smallest unit of software design: the
module. Unit testing exercises specific paths in a module's control structure to ensure
complete coverage and maximum error detection. This test focuses on each module
individually, ensuring that it functions properly as a unit; hence the name unit testing.
During this testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths are
tested for the expected results. All error handling paths are also tested.
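As a minimal example of testing one module in isolation, the sketch below unit-tests a
small label-mapping helper; apply_response mirrors the helper in the sample code of
Section 6.1, and its exact mapping is an assumption.

import unittest

def apply_response(label):
    # Map the raw dataset label to a binary response class (assumed mapping).
    return 0 if label == 0 else 1

class ApplyResponseTest(unittest.TestCase):
    def test_zero_label_maps_to_zero(self):
        self.assertEqual(apply_response(0), 0)

    def test_nonzero_label_maps_to_one(self):
        self.assertEqual(apply_response(3), 1)

if __name__ == "__main__":
    unittest.main()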
Unit testing, as a principle for separately testing smaller parts of large software systems,
dates back to the early days of software engineering. In June 1956, H. D. Benington
presented at the US Navy's Symposium on Advanced Programming Methods for Digital
Computers the SAGE project and its specification-based approach, where the coding
phase was followed by "parameter testing" to validate component subprograms against
their specification, followed by "assembly testing" for the parts put together.
In 1964, a similar approach was described for the software of the Mercury project,
where individual units developed by different programmers underwent "unit tests"
before being integrated. By 1969, testing methodologies appeared more structured, with
unit tests, component tests and integration tests whose purpose was to validate individual
parts written separately and their progressive assembly into larger blocks. Some public
standards adopted at the end of the 1960s, such as MIL-STD-483 and MIL-STD-490,
contributed further to the wide acceptance of unit testing in large projects.
Unit testing in those times was either interactive or automated, using coded tests or
capture-and-replay testing tools. In 1989, Kent Beck described a testing framework for
Smalltalk (later called SUnit) in "Simple Smalltalk Testing: With Patterns". In 1997,
Kent Beck and Erich Gamma developed and released JUnit, a unit test framework that
became popular with Java developers. Google embraced automated testing around
2005–2006.
Unit tests can be performed manually or via automated test execution. Automated
tests include benefits such as: running tests often, running tests without staffing cost,
consistent and repeatable testing. Testing is often performed by the programmer who
writes and modifies the code under test. Unit testing may be viewed as part of the
process of writing code.
7.2 Integration Testing
Integration testing addresses the issues associated with the dual problems of verification
and program construction. After the software has been integrated, a set of high-order
tests is conducted. The main objective of this testing process is to take unit-tested
modules and build a program structure that has been dictated by the design.
Integration testing is the process of testing the interface between two software
units or modules. It focuses on determining the correctness of the interface. The purpose
of integration testing is to expose faults in the interaction between integrated units. Once
all the modules have been unit-tested, integration testing is performed.
Integration testing can be done module by module. This should be done in a proper
sequence; if you don't want to miss out on any integration scenarios, you have to follow
the proper sequence. Exposing the defects in the interaction between the integrated units
is the major focus of integration testing.
Advantages:
Disadvantages:
• Until the lower-level modules have been created, no working model can be
represented.
This method begins construction and testing with the modules at the lowest level in
the program structure. Since the modules are integrated from the bottom up, the
processing required for modules subordinate to a given level is always available, and the
need for stubs is eliminated. The bottom-up integration strategy may be implemented
with the following steps:
• The low-level modules are combined into clusters that perform a specific
software sub-function.
• A driver (i.e. a control program for testing) is written to coordinate test case
input and output.
• The cluster is tested.
• Drivers are removed and clusters are combined, moving upward in the program
structure.
The bottom-up approach tests each module individually, and then each module is
integrated with a main module and tested for functionality, as the sketch below
illustrates.
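The toy driver below illustrates the bottom-up idea: it feeds test cases to a low-level
module and checks the outputs before the module is integrated upward. The module
under test is hypothetical.

def parse_passenger_count(raw):
    # Low-level module: convert a raw form field into a validated count.
    value = int(raw)
    if value < 0:
        raise ValueError("count cannot be negative")
    return value

def driver():
    # Driver: coordinate test-case input and expected output for the cluster.
    cases = [("120", 120), ("0", 0)]
    for raw, expected in cases:
        assert parse_passenger_count(raw) == expected, (raw, expected)
    print("low-level module cluster passed")

driver()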
7.3 User Acceptance Testing
User acceptance of a system is the key factor for the success of any system. The system
under consideration was tested for user acceptance by constantly keeping in touch with
the prospective system users at the time of development and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.
Carrying out effective user acceptance testing involves putting the product in front of
representative users. What questions should you ask them? What information would be
useful, what is relevant, and why? Not all possible data points can be tested at once, so a
lot may need refinement before launch; in practice, testing should reveal whether the
value being tested is real or whether the wrong question was asked.
Effective user acceptance testing also has prerequisites: the test plan should be
developed at different levels of detail so that it remains useful as the system grows,
since users do not always accept what they expect, but usually judge the system against
something better.
The purpose of User Acceptance Testing (UAT) is to identify bugs in software, systems,
and networks that may cause problems for users. UAT ensures that the software can
handle real-world tasks and perform to development specifications. Users are allowed to
interact with the software before its official release to see if any features were
overlooked or if any bugs exist.
7.4 Output Testing
After performing the validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the
specified format. The outputs generated and displayed by the system under consideration
were tested by asking the users about the format they require. The output format is
considered in two ways: one on screen and the other in printed format.
A text field can contain only a number of characters less than or equal to its size. The
text fields are alphanumeric in some tables and alphabetic in others. An incorrect entry
always flashes an error message.
The numeric field can contain only numbers from 0 to 9; an entry of any other character
flashes an error message. The individual modules are checked for accuracy against what
they have to perform. Each module is subjected to a test run along with sample data. The
individually tested modules are then integrated into a single system. Testing involves
executing the program with real data; the existence of any program defect is inferred
from the output. The testing should be planned so that all the requirements are
individually tested.
A successful test is one that brings out the defects for inappropriate data and produces
output revealing the errors in the system.
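The text-field and numeric-field checks described above can be sketched as small
validators; the sizes and messages are illustrative assumptions.

def validate_text(value, size, alpha_only=False):
    # Enforce the field-size limit and, where required, alphabetic content.
    if len(value) > size:
        raise ValueError("Entry exceeds the field size of %d characters." % size)
    if alpha_only and not value.isalpha():
        raise ValueError("Alphabetic characters only.")
    return value

def validate_numeric(value):
    # A numeric field can contain only the digits 0-9.
    if not value.isdigit():
        raise ValueError("Numeric field accepts digits 0-9 only.")
    return int(value)

print(validate_numeric("42"))   # 42
# validate_numeric("4a") raises ValueError, mirroring the flashed error message.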
The above testing is done by taking various kinds of test data. Preparation of test data
plays a vital role in system testing. After preparing the test data, the system under study
is tested using it. While testing the system with this test data, errors are uncovered and
corrected using the above testing steps, and the corrections are noted for future use.
Live test data are those that are actually extracted from organization files. After a system
is partially constructed, programmers or analysts often ask users to key in a set of data
from their normal activities. Then the systems person uses this data as a way to partially
test the system. In other instances, programmers or analysts extract a set of live data
from the files and enter it themselves.
Artificial test data are created solely for test purposes, since they can be generated to test
all combinations of formats and values. In other words, the artificial data, which can
quickly be prepared by a data-generating utility program in the information systems
department, make possible the testing of all logic and control paths through the
program.
The most effective test programs use artificial test data generated by persons other than
those who wrote the programs. Often, an independent team of testers formulates a
testing plan using the system specifications. The package satisfied all the requirements
specified in the software requirement specification and was accepted.
7.5 Validation Testing
Validation testing is the process of evaluating software during the development process,
or at the end of it, to determine whether it satisfies the specified business requirements.
Validation testing ensures that the product actually meets the client's needs. It can also
be defined as demonstrating that the product fulfills its intended use when deployed in
the appropriate environment.
7.6 User Training
Whenever a new system is developed, user training is required to educate users about
the working of the system so that it can be put to efficient use by those for whom it has
been primarily designed. For this purpose, the normal working of the project was
demonstrated to the prospective users. Its working is easily understandable, and since
the expected users are people who have a good knowledge of computers, the use of this
system is very easy.
7.7 Maintenance
This covers a wide range of activities including correcting code and design errors. To
reduce the need for maintenance in the long run, we have more accurately defined the
user’s requirements during the process of system development. Depending on the
requirements, this system has been developed to satisfy the needs to the largest possible
extent. With development in technology, it may be possible to add many more features
based on the requirements in future. The coding and designing are simple and easy to
understand which will make maintenance easier.
A strategy for system testing integrates system test cases and design techniques into a
well-planned series of steps that results in the successful construction of software. The
testing strategy must incorporate test planning, test case design, test execution, and the
resultant data collection and evaluation. A strategy for software testing must
accommodate the low-level tests that are necessary to verify that a small source code
segment has been correctly implemented, as well as the high-level tests that validate
major system functions against user requirements.
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing represents an interesting
anomaly for software. Thus, a series of tests is performed on the proposed system before
the system is ready for user acceptance testing.
Software, once validated, must be combined with other system elements (e.g. hardware,
people, databases). System testing verifies that all the elements mesh properly and that
overall system function and performance is achieved. It also tests to find discrepancies
between the system and its original objective, current specifications and system
documentation.
In unit testing, the different modules are tested against the specifications produced
during the design of the modules. Unit testing is essential for verification of the code
produced during the coding phase; hence the goal is to test the internal logic of the
modules. Using the detailed design description as a guide, important control paths are
tested to uncover errors within the boundary of the modules. This testing is carried out
during the programming stage itself. In this testing step, each module was found to be
working satisfactorily with regard to the expected output from the module.
8. OUTPUT SLIDES
Fig no 8.2 User Registration Page
Fig no 8.4 User Profile Interface
Fig no 8.6 Service Provider Login Page
Fig no 8.8 Train & Test Data Sets
Fig no 8.10 Railway Accident Type Ratio Details
Fig no 8.12 Predicted Data Sets
9. CONCLUSION
Topic models play an important role in many fields, including safety and risk
management in railway stations through text mining. In topic modeling, a topic is a list
of words that occur together in statistically significant ways. A text can be voice records,
investigation reports, reviews, risk documents and so on.
This research demonstrates various cases of the power of unsupervised machine
learning topic modeling in promoting risk management and safety accident
investigation, and in restructuring accident recording and documentation at the industry
level. In the description of the root causes of accidents, the suggested model has shown
that the platforms are the hot spots in the stations. The outcomes reveal the station
accidents to be occurring owing to a few main causes: falls, being struck by trains, and
electric shock. Moreover, night time and certain days of the week appear to be
significantly connected to the risks.
With increased safety text mining, knowledge is gained on a wide scale and across
different periods, resulting in greater RAMS efficiency and providing the basis for a
holistic perspective for all stakeholders.
10. FUTURE SCOPE
11. REFERENCES
6. H. Alawad, S. Kaewunruen, and M. An, ‘‘A deep learning approach towards
railway safety risk assessment,’’ IEEE Access, vol. 8, pp. 102811–102832,
2020, doi: 10.1109/ACCESS.2020.2997946.
7. H. Alawad, S. Kaewunruen, and M. An, ‘‘Learning from accidents: Machine
learning for safety at railway stations,’’ IEEE Access, vol. 8, pp. 633–648, 2020,
doi: 10.1109/ACCESS.2019.2962072.
8. A. J.-P. Tixier, M. R. Hallowell, B. Rajagopalan, and D. Bowman, ‘‘Automated
content analysis for construction safety: A natural language processing system
to extract precursors and outcomes from unstructured injury reports,’’ Autom.
Construct., vol. 62, pp. 45–56, Feb. 2016, doi:10.1016/j.autcon.2015.11.001.
9. J. Sido and M. Konopik, ‘‘Deep learning for text data on mobile devices,’’ in
Proc. Int. Conf. Appl. Electron., Sep. 2019, pp. 1–4,
doi:10.23919/AE.2019.8867025.
10. A. Serna and S. Gasparovic, ‘‘Transport analysis approach based on big data and
text mining analysis from social media,’’ Transp. Res. Proc., vol. 33, pp. 291–298,
Jan. 2018, doi: 10.1016/j.trpro.2018.10.105.
11. P. Hughes, D. Shipp, M. Figueres-Esteban, and C. van Gulijk, ‘‘From free-text
to structured safety management: Introduction of a semiautomated classification
method of railway hazard reports to elements on a bow-tie diagram,’’ Saf. Sci.,
vol. 110, pp. 11–19, Dec. 2018, doi:10.1016/j.ssci.2018.03.011.
12. A. Chanen, ‘‘Deep learning for extracting word-level meaning from safety
report narratives,’’ in Proc. Integr. Commun. Navigat. Surveill. (ICNS), Apr.
2016, pp. 5D2-1–5D2-15, doi: 10.1109/ICNSURV.2016.7486358.
13. A. Ferrari, G. Gori, B. Rosadini, I. Trotta, S. Bacherini, A. Fantechi, and S. Gnesi,
‘‘Detecting requirements defects with NLP patterns: An industrial experience in
the railway domain,’’ Empirical Softw. Eng., vol. 23, no. 6, pp. 3684–3733, Dec.
2018, doi: 10.1007/s10664-018-9596-7.
14. G. Fantoni, E. Coli, F. Chiarello, R. Apreda, F. Dell’Orletta, and G. Pratelli,
‘‘Text mining tool for translating terms of contract into technical specifications:
Development and application in the railway sector,’’ Comput. Ind., vol. 124, Jan.
2021, Art. no. 103357, doi:10.1016/j.compind.2020.103357.
15. G. Yu, W. Zheng, L. Wang, and Z. Zhang, ‘‘Identification of significant factors
contributing to multi-attribute railway accidents dataset (MARA-D) using SOM
data mining,’’ in Proc. 21st Int. Conf. Intell. Transp. Syst.(ITSC), Nov. 2018, pp.
170–175, doi: 10.1109/ITSC.2018.8569336.