
ERGONOMICS ANALYSIS

USING IMAGE PROCESSING


M.Tech Mini Project Report

submitted by
AMAL NN
M190125EE

Submitted in partial fulfillment for the award of the Degree of


Master of Technology in Electrical Engineering
in Industrial Power and Automation

Under the guidance of


Dr. T. K. Sunil Kumar

Department of Electrical Engineering


NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
NIT Campus P.O., Calicut - 673601, India
JULY 2020
ACKNOWLEDGEMENT

First and foremost I thank God Almighty. His blessings were with me throughout
my preparation.

I am grateful to Dr. Ashok S, Professor, Department of Electrical Engineering,


NIT Calicut, for providing me with the best facilities for the completion and presentation of my project.

I thank Dr. T. K. SUNIL KUMAR, Assistant Professor, Department of Electrical Engineering, for providing necessary information and guidance throughout my project.

I would also like to thank Dr. RIJIL RAMCHAND, Professor and Head of the Department of Electrical Engineering, for providing a felicitous environment and support during the entire course of the project.

I thank Mr. RIJO ABRAHAM, M.Tech scholar, Department of Electrical Engineering, and Ms. SANDRA C, M.Tech scholar, Department of Electrical Engineering, for providing necessary information and guidance throughout my project.

Finally, yet importantly, I would like to express my heartfelt thanks to my beloved parents for their blessings, and to my friends and classmates for their help and wishes for the successful completion of this project.

AMAL NN

DECLARATION

I, the undersigned, hereby declare that the project report “ERGONOMICS ANALYSIS USING IMAGE PROCESSING”, submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology in Electrical Engineering (Industrial Power and Automation) from the National Institute of Technology Calicut, is a bonafide work done by me under the supervision of the project in-charge, Dr. Ashok S, and under the guidance of Dr. Sunil Kumar. This submission represents my ideas in my own words and, where ideas or words of others have been included, I have adequately and accurately cited and referenced the original sources. I also declare that I have adhered to the ethics of academic honesty and integrity and have not misrepresented or fabricated any data, idea, fact or source in my submission. I understand that any violation of the above will be cause for disciplinary action by the institute and/or the University, and can also evoke penal action from the sources which have not been properly cited or from whom proper permission has not been obtained. This report has not previously formed the basis for the award of any degree, diploma or similar title of any other university.

Calicut AMAL NN
30-07-2020

CERTIFICATE

This is to certify that the mini project report entitled “ERGONOMICS ANALYSIS USING IMAGE PROCESSING” is a bonafide record of the mini-project done by AMAL NN (M190125EE) during the Winter Semester 2019-2020, in partial fulfillment of the requirements for the award of the Degree of Master of Technology in Electrical Engineering (Industrial Power and Automation) from the National Institute of Technology Calicut for the year 2020.

Dr.ASHOK S Dr. RIJIL RAMCHAND


Professor Professor
Faculty In-Charge of Mini Project Head of Department
Dept. of Electrical Engineering Dept. of Electrical Engineering
ABSTRACT

Ergonomic analysis of work posture is carried out to identify the risk behind a posture and to avoid serious musculoskeletal disorders. In many industries workers have to perform many tasks manually, some of them in awkward postures, and in the long run they experience several health issues. To avoid this, companies usually hire a group of ergonomic experts to analyze the work postures in their facilities and take the necessary action to prevent undesirable after-effects. Small-scale industries usually do not treat this as a serious issue, for the following reasons: 1. they may be unaware of work posture assessment; 2. currently available methodologies are not affordable for them. A significant improvement is therefore needed to solve this problem.

The main part of the assessment is the measurement of body joint angles, which is done either by direct observation or by using expensive wearable sensors. In the direct method, supervision by an expert is necessary; the sensor-based methods are costly (e.g., Kinect) and the sensors need to be placed on the body in the correct positions. So in this project I try to bring the advances in Artificial Intelligence and image processing to this field so that a better solution is obtained. I use image processing for body pose estimation, which identifies the key body joints; from that data we can estimate the information needed for the first phase of the assessment. The algorithm can be tested on a smartphone. This project describes a novel approach to ergonomics.

The entire project consists of three parts: 1. pose estimation with the help of image processing, 2. Haar cascade based object detection to identify the relative position of peripheral devices, and 3. fully automated RULA assessment. With a fully developed model, the user only needs to take a photograph of a particular working posture; the rest of the work is done by the smartphone itself, which makes the approach very suitable even for small-scale industries.
Keywords: Pose estimation, RULA assessment, Convolutional Neural Network,
Caffe architecture, Haar cascade.

Contents

1 Introduction 4
1.1 Ergonomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Computer workstation ergonomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Significance of the project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Report Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Literature Review 6
2.1 Types of assessment method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Direct Observation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 Using wearable sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Using Microsoft Kinect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Summary and Research Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Research Gap objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Proposed Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Work Posture Assessment Tools 12


3.1 RULA Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.1 Phase 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.2 Phase 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.3 Phase 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Further improvement in RULA in case of computer workstation ergonomics . . . . . . . . . . . 16

4 Image Processing 18
4.0.1 Convolutional Neural Networks (CNN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.0.2 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.0.3 Stride . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.0.4 Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.0.5 Features of CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.0.5.1 Convolution formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.0.5.2 Non linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.0.5.3 Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.0.5.4 Fully connected layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1 Convolutional Architecture for Fast Feature Embedding (CAFFE) . . . . . . . . . . . . . . . . 23
4.1.1 Highlights of Caffe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.1 Modularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.2 Separation of representation and implementation . . . . . . . . . . . . . . . . . 24
4.1.1.3 Test coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.4 Python and MATLAB bindings . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.5 Pre-trained reference models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Binary Large Object (BLOB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.1 The Recursive Grass-Fire Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Pose Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 MPII Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Haar Cascade Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4.1 Dataset Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5 Project Work 32
5.1 Part 1: Working Posture Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Part 2: Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Part 3: RULA Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

6 Result and Discussion 34

7 Conclusions 38
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.2 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.3 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

List of Figures

3.1 RULA Table A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


3.2 RULA Table B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 RULA Table C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Computer Workstation Posture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.1 Features in each layer in CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18


4.2 CNN input size visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3 CNN hidden layer connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.4 CNN Filter visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5 Convolution of filter (stride=2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.6 Zero padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.7 CNN Deep Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.8 Visualization of Convolution Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.9 Max pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.10 Caffe architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.11 Segmentation using blob . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.12 Blob connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.13 4 connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.14 Haar features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.15 Haar feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6.1 Pose estimation result of a random pose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34


6.2 Positive and Negative images for dataset preparation . . . . . . . . . . . . . . . . . . . . . . . . 35
6.3 Custom haar cascade for monitor detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.4 Python GUI for additional information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.5 Caffe model Pose estimation result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.6 Final RULA result for computer workplace posture . . . . . . . . . . . . . . . . . . . . . . . 37

Chapter 1

Introduction

Work posture analysis is very important in almost every industry, and ergonomics researchers commonly conduct various tests for this purpose. Until now they have had to record the relevant information (i.e., the angular displacement of different body parts) manually in order to perform the test. Even where software has automated the process, taking the relevant information from the posture is still manual (the user manual says to take a picture of the posture and measure the angles using a protractor). The main aim here is to identify disorders in the posture using image processing techniques and to extract the information needed to conduct the test.

1.1 Ergonomics
Ergonomics is the process of designing or arranging workplaces, products and systems so that they fit the people who use them. Lack of ergonomics and improper workplace design cause a number of work-related musculoskeletal disorders for employees and a huge cost for the company and society. In industry the main aims are to reduce excessive force, avoid awkward postures and reduce the problems related to musculoskeletal disorders. Some common body assessment methods are OWAS, RULA, REBA, NIOSH etc. We can conduct these tests for workplace design in any area, such as the manufacturing industry, construction sites and desktop setups, to reduce MSD.
Researchers use different methods for analysing working posture, trying to determine the severity of the MSD (musculoskeletal disorder) risk and assigning a test score to a particular posture. For small-scale industries, RULA (Rapid Upper Limb Assessment) and REBA (Rapid Entire Body Assessment) are carried out for the evaluation of work posture. Due to their simplicity and variety of applications, these tests have acquired popularity among workplace ergonomics investigators. Tasks such as repetitive lifting and lifting at varying heights, or tasks performed in very awkward postures such as kneeling, are examined under these assessments. The first step is the examination of the posture and identification of the relevant information according to the worksheet given in the manual. Usually these small-scale industries hire an ergonomic expert to conduct the test and draw the relevant inferences from it. The expert evaluates the upper and lower body postures separately and gives individual scores for the upper arm, lower arm, neck, trunk, legs etc. With the help of the worksheet, a final score is obtained which tells the danger level of the posture.
There is a lot of software available for conducting the test, but it still needs manual supervision: for example, the user manual for one of these tools states that, as the first step, we need to take a picture of a worker in their working posture and, using a protractor, measure the angles between the different body parts according to the test worksheet.

1.2 Computer workstation ergonomics
Besides such small-scale industries, lack of care about sitting posture in front of a computer for long periods also causes severe after-effects. Awkward posture may cause eye strain and back pain due to damage to the spinal cord. Here too, RULA and REBA work well for the investigation. In this setting a camera is likewise used to take a series of images of the test posture, and the assessments are conducted as explained before. Here also, by virtue of image processing techniques, extracting the relevant data for the test can be made easy.

1.3 Significance of the project


The benefits of ergonomics assessment are still out of reach for small-scale industries for the following reasons:

• They have to meet the additional expense of hiring ergonomic experts from outside
• The workers might be unaware of ergonomics

Why are the existing methods not suitable for such small-scale industries?

• The direct method is not accurate, as measurements are taken manually

• Wearing sensors on the body is not possible for all kinds of postures, and wearing and properly placing the sensors is not an easy task
• Even though Kinect-based assessment gives good results, cost becomes the barrier here

These are the core reasons that explain the significance of this project.

1.4 Report Layout


This report provides a detailed review of my project.
Chapter 2 contains the literature review, research gap and proposed model.
Chapter 3 explains the procedure and steps for conducting the RULA assessment; it also includes the recommended body posture for computer work.
Chapter 4 explains the important terminology used in this project, including CNNs, the Caffe model, BLOBs, the Haar cascade classifier and pose estimation.
Chapter 5 is a brief explanation of the project work I have done and describes the different parts of the complete project.
Chapter 6 is the results and discussion section, where the test results and final outcome are presented.
Chapter 7 summarizes the project, its benefits and the future extensions that are possible.

Chapter 2

Literature Review

Ergonomics researchers use different methods and assessment tools for evaluating work posture in real time. Different case studies are conducted in different ways: manually, directly, or using specialized equipment. The manual method consists of a checklist that the researcher has to verify against the particular posture. There are chances of error while taking measurements and performing computations: the researcher has to measure the different angles manually and categorise accordingly, and slight deviations in the measurements will lead to undesirable results. This method is also not reliable, as it can lead to biased interpretation.

2.1 Types of assessment method


2.1.1 Direct Observation Method
This is one of the simplest and most traditional methods for conducting a work posture assessment. Mainly RULA and REBA assessments are carried out with the help of the guidelines provided, including three tables which give the assessment scores. The main part of the assessment is locating the initial scores for different body parts such as the neck, trunk, upper and lower arm, and wrist. In the direct method these scores for the different body joints are obtained by manual inspection. A lot of research papers are available in which the direct method is used to find the risk associated with a particular work posture.
A study of ergonomics and fatigue during a manual process was conducted in the ABC company in India [1]. The manual process was recorded with a high definition video camera. The video was analyzed and discussions were held with workers regarding health issues during the process. Two techniques, Rapid Entire Body Assessment (REBA) and Rapid Upper Limb Assessment (RULA), were then used to evaluate the postural loading on the whole body. A similar manual process was assessed in other developed countries, whose ergonomic conditions are more advanced for the same task. The results were then compared and conclusions drawn. Various tasks like lifting, shifting and mounting were taken for assessment. For the initial posture analysis a video camera was used to take continuous footage of the respective tasks, and important information like repetition times and load weights was also noted. RULA and REBA scores were then calculated and, according to the scores, suggestions were made. A comparison was made between the Indian plant and plants in other developed countries. Developed countries use different tools, such as a jib crane with a vacuum hook, for tasks like spool lifting, spool shifting and spool mounting, which reduces the associated risks. It was found that, compared to developed countries, the work postures for the above-mentioned tasks in India show relatively high risk, which is obvious in the RULA and REBA assessment results. So the authors suggest using additional tools like vacuum hook cranes for these purposes to reduce the risk.

One more example is the case study conducted at MIDC Wardha (Maharashtra, India) [2], which is similar to the previous one. The assessment was carried out using the worksheet. The RULA method determined that the majority of the workers were at high risk levels and required immediate change; the REBA method determined that some of the workers were at lower levels and the majority at high risk levels. This shows the importance of studying small industries in India.
N. A. Ansari and Dr. M. J. Sheikh [2] say that small-scale industries play an important role in the Indian economy, but the workers still perform almost every task manually and hence experience musculoskeletal disorders and injuries in various parts of their bodies. They point out this issue through their paper, and observe that a lack of knowledge about the importance of work posture assessment may be the main reason for it. Conducting a work posture assessment identifies the areas where improvement can be made by using better tools to reduce the risk. In their study, 15 workers were chosen in MIDC Wardha. The workers had an average stature of 168.34 cm (±2.69 S.D.), average age of 35.8 years (±3.02 S.D.), average weight of 63.6 kg (±6.66 S.D.) and average experience of 11.2 years. Snapshots were analyzed for conducting the RULA and REBA assessments. They found that, according to the RULA assessment, around 40 percent of the workers were at a high risk level ("investigate and change immediately"), whereas 47 percent were found at medium risk levels ("investigate further and change soon"), and around 13 percent fell under "investigate further". The REBA assessment showed that around 53 percent of the workers were working at high risk levels. It was found that, if the workers continued to work in the same postures, they would suffer from MSDs related to the neck, trunk and wrist in the near future.
Not only RULA and REBA: OWAS is also an important tool in ergonomics assessment. Marta Gómez-Galán and José Pérez-Alonso conducted a study [5] with OWAS. They found that it has been applied mainly in three sectors: industry, health, and agriculture and livestock. It is one of the most widely used and tested semi-direct methods of MSD evaluation in the world, but it needs to be complemented with other indirect or direct methods. Whenever OWAS has been used, whether individually or together with other methods, MSD risks have been detected; this is perhaps an indicator that the evaluation parameters should be reviewed, because they may overestimate the risk.
Farshad Soheili-Fard and Amir Rahbar [6] used the OWAS and REBA methods to conduct an ergonomic investigation of workers in a tea factory in the Langroud region, Guilan, Iran. They conducted both REBA and OWAS tests and the results were quite revealing. Around 75 percent of the workers in the tea factories had a REBA score between 4 and 7, which means that their level of exposure to risk is medium and corrective measures are necessary. Another 12 percent of the workers showed REBA scores between 2-3 or 8-10. Just 5 postures were at the 11-15 score level. High risk was observed in the curling and oxidation unit, where corrective measures are needed immediately.
The OWAS results show that the curling and oxidation stage needs immediate corrective action, as these postures accounted for 78 percent of all back-related postures; also, 34 percent of the postures used by the workers of the tea factories had high or very high risk levels, so modifications are necessary. The average energy expenditure was 21.843 kJ min-1.

2.1.2 Using wearable sensors


The same tests can be conducted using wearable devices for more accurate results compared to the observation method. Body-mounted smartphones can be used for this purpose [3]. Wearable sensors (including motion, video, RFID and pressure sensors) have been used for detecting the occurrence of physical fatigue in simulated manufacturing tasks. Zahra Sedighi and Mohammad Ali Alamdar in their study mainly examine two things: first, estimating the physical fatigue level over time, and secondly, examining the use of wearable sensors to detect physical fatigue occurrence in simulated manufacturing tasks. Eight healthy participants were chosen for the assessment and sensor data were recorded. Important features from the five sensor locations were selected

using the Least Absolute Shrinkage and Selection Operator (LASSO), a popular variable selection methodology. The results show that the LASSO model performed well for both physical fatigue detection and modeling.
The experiment consisted of three lab sessions, each taking up to three hours. In the beginning session, participants completed a sleep quality questionnaire, a risk-taking behavior task (Balloon Analogue Risk Task (BART)), and a psycho-motor vigilance task (using PC-PVT). In addition, each subject was asked to lie in a supine position for five minutes to measure the resting heart rate. After the baseline measurements, the participant was given instructions on the relevant physically fatiguing task for the session. This study concentrates on the work process to model physical fatigue; so, in order to improve the model from a human work performance perspective, further analyses concerned with quantitative performance measures (e.g., the number of defects in a time window and the average task completion time over a time window) should also be examined.
Nipun D. Natha and Theodora Chasparib [7] show the importance of smartphone sensors in ergonomics. Sensors can be used to collect input data for machine learning algorithms that identify field activities and estimate activity durations; although the analysis was in an experimental setting, the findings can be generalized and can inform similar efforts in various occupations including construction, manufacturing, healthcare, transportation, and agriculture. Their approach uses smartphones to collect time-stamped motion data from body-mounted devices (i.e., accelerometer, linear accelerometer, and gyroscope signals), automatically detects workers' activities through a classification framework, and estimates activity duration and frequency information.
Megh Doshi, Harsh Shah and Heetika Gada [8] designed a wearable data acquisition system which measures the posture and joint angle values of a driver's body and gives an accurate RULA rating of the driver, in order to validate and compare the software simulation values obtained in CATIA. It consists of neck, shoulder, elbow, wrist and trunk angle sensors.
They put some effort into cockpit design, which starts by taking the basic dimensions of drivers such as driver height, sitting shoulder height, buttock width, shoulder width and the distance between elbows. Adequate clearances are then added to these dimensions to reach a basic frame design. Considering sub-system integration along with the above driver dimensions, a rough chassis is designed, which is inspected for different ergonomic aspects to ensure driver comfort. The RULA rating of the designed chassis is obtained using the CATIA computer software. The DAQ system measures the rating in both static and dynamic conditions. Body joint angles are measured using flex sensors and gyroscopes mounted on the driver's suit.
On testing the DAQ on a BAJA vehicle, the score was observed to vary continuously between RULA scores of 4 and 6 over 4 hours of continuous testing, as compared to the RULA score of 3 obtained in the software CATIA V5. They also conclude that software simulations cannot be perfect and, since RULA is a universal rating and not only for driver ergonomics, this wearable DAQ can be used to find the RULA score for normal everyday work activities as well, which can help determine risk factors for all activities and not only for driving a vehicle.
Several other papers [9] describe recent implementations of wearable sensors for quantitative instrumental-based biomechanical risk assessments in the prevention of WMSDs. They find that still too few researchers foresee the use of wearable technologies for biomechanical risk assessment, although the requirement for increasingly quantitative evaluations is growing, and innovative technology for biomechanical risk assessment is only at its initial stage.
Ranavolo Alberto, Francesco Draicchio and two other researchers in their paper [8] describe recent implementations of wearable sensors for quantitative instrumental-based biomechanical risk assessments in the prevention of WMSDs. Instrumental approaches based on inertial measurement units and sEMG sensors have been used for direct evaluations to classify lifting tasks into low and high risk categories. Wearable sensors have also been used for direct instrumental evaluations in the handling of low loads at high frequency, by using the local myoelectric manifestation of muscle fatigue estimation. In the field of rating against standard methods, on-body wireless sensor network based approaches for real-time ergonomic assessment in industrial manufacturing have been proposed. Several motion sensors are used for conducting the assessment. sEMG sensors provide a measure of the electrical activity (on the skin) of the muscles involved in the movement. Single- or double-differential
bipolar sEMG performed using wet electrodes is widely and easily used in ergonomics, both for research activities and directly at the workplace. IMUs (inertial measurement units) allow the measurement of the orientation, position, velocity and acceleration of each investigated segment and of the whole body posture.

2.1.3 Using Microsoft Kinect


Almost every limitation can be mitigated by using the Microsoft Kinect sensor for pose estimation [4]. Predicting the potential accuracy of the measurement for such complex 3D poses and sensor placements is challenging in classical experimental setups.
More advanced body joint angle measurement can be carried out using the Microsoft Kinect 3D sensor for pose estimation. Pierre Plantard and Hubert P. H. Shum [9] propose and evaluate a RULA ergonomic assessment in real work conditions using a recently published occlusion-resistant Kinect skeleton data correction. First, they compared the postures estimated with this method to ground-truth data in standardized laboratory conditions. Second, they compared the RULA scores to those provided by two professional experts in a non-laboratory, cluttered workplace. In their study, the corrected Kinect data provided more accurate RULA grand scores, even under the sub-optimal conditions induced by the workplace environment. They conducted the test on 12 male participants (age: 30.1 ± 7.0 years, height: 1.75 ± 0.05 m, mass: 62.2 ± 7.0 kg), who were equipped with 47 reflective markers positioned at standardized anatomical landmarks, as suggested in (Wu et al., 2005), to measure reference postures. The motion of the participants was recorded by both a Microsoft Kinect 2 system and a 15-camera Vicon optical motion capture system. Overall, 5 different workstations were assessed and the work task was recorded by a Microsoft Kinect 2 sensor. Their results show that, in controlled and real workstation environments, the method accurately assessed the RULA score, even in challenging environments with many occlusions. They also conclude that the system is easy to use and deploy in real work conditions, without disturbing the workers and without specific engineering skills, as no calibration is needed. Moreover, the method could assist ergonomists and improve the standardization of assessments performed at various geographic sites and periods.
Manghisi, V. M. [10] presents K2RULA, a semi-automatic RULA evaluation software based on the Microsoft Kinect v2 depth camera, aimed at detecting awkward postures in real time but also in off-line analysis. It speeds up the detection of critical conditions and reduces subjective bias. K2RULA is able to analyze off-line data and to save the results for deeper ergonomic studies. The authors conclude that the proposed system can be used effectively as a fast, semi-automatic and low-cost tool for RULA analysis. In their experiment they used 15 static postures, an ergonomics expert and a volunteer. While the volunteer held each static pose for a few seconds, each posture was recorded, and the RULA grand-scores were assessed using both K2RULA and Jack-TAT. The RULA expert analyzed the recorded video of each posture offline and assessed the RULA grand-scores. The results indicate that the K2RULA method is a better alternative to classical visual inspection evaluation. They also compared it with a commercial software, Jack-TAT, based on the Kinect v1 sensor. In summary, K2RULA grand-scores are equivalent to the assessments obtained with an optical motion capture system, are in perfect agreement with a RULA expert evaluation, and outperform the Jack-TAT tool based on the Kinect v1. K2RULA can be used effectively as a fast, semi-automatic and low-cost tool for RULA analysis.
Darius Nahavandi and Mohammed Hossny [11] investigated the utilization of Kinect sensors for real-time rapid upper limb assessment (RULA) to aid ergonomic analysis of assembly operations in industrial environments. Unlike earlier similar attempts, the work presented in their paper does not rely on tracking body parts and extracting a kinematically sound skeleton; instead, identifying a RULA score is formulated as a semantic segmentation problem. A random decision forest (RDF) classifier is used to give each pixel a RULA score based on postures captured with a Kinect camera. The results demonstrate a converging accuracy of 93 percent.

2.2 Summary and Research Gap
Industries are looking seriously at the musculoskeletal disorders that workers may experience due to awkward working postures, so they spend valuable time on ergonomics tests to identify the issues and get suggestions for rectifying them. A lot of case studies have been conducted in several industries, and some found that their workers were performing in seriously risky working postures. Different researchers employ different evaluation methods: direct observation, wearable sensors, or much more advanced sensors like the Kinect.
In the sensor-based method, researchers use various kinds of sensors and take into account external influencing factors like the muscle factor and load factor. This method gives quite satisfying results and eliminates observation error, but it uses sophisticated sensors, requires particular arrangements for attaching them to different parts of the body (which may not be applicable to every working posture), and needs expert supervision. The sensor position also affects the result, and finding a recommended sensor position is crucial.
To overcome these limitations, recent works have proposed taking the reliability of the Kinect data into account in the correction process. Reliability can then be integrated into a lazy learning framework to reconstruct a more reliable posture. Almost all the research shows that the results are quite promising: 3D modelling and machine learning algorithms open a new era in this field. But we still need to buy a Microsoft Xbox One Kinect sensor plus the adapter for Xbox One S / Windows to conduct the assessment, which costs around 300 USD and is not practical for small-scale industries.

2.2.1 Research Gap objectives


The Government of India is now promoting small-scale industries to begin new ventures, expanding into different areas and bringing their potential to its maximum. But ergonomics assessment is usually not carried out in these kinds of industries: sometimes they are unaware of ergonomics and the impact of wrong working postures, or they do not want to spend a large amount of money on ergonomics assessments. Simple observation-based tests do not cost much, but either they have to hire experts or they have to study the whole process themselves. Kinect-based tests give better results, but their cost may not be favourable for these small industries. A smartphone-based ergonomic study is suitable in this area, for the following reasons:

• Smartphones are now common to everyone, and there is no need for an expert

• The latest image processing techniques can be deployed in the form of a smartphone application
• An employee just needs to take a picture, and the rest of the analysis is done by the smartphone itself
• They get an instant result on their smartphone

2.2.2 Proposed Model
Using Python, the process of extracting the information needed for the ergonomics analysis test from an image is automated. Image processing is the best way to solve the issue, and there are different approaches to the problem: image classification, object detection or pose estimation. For each of these, plenty of algorithms and open-source libraries are available, of which ANN, CNN, R-CNN, Fast R-CNN, DNN etc. are the important ones. The proposed model consists of three main parts (a pose estimation sketch follows this list):

1. Working posture analysis using pose estimation.

2. Detection of surrounding tools: posture analysis can be automated by a pose estimation model, and neighbouring tools such as the mouse, monitor etc. can be detected by an object detection model.
3. Extraction of the relevant information about the posture from the image and conducting the test.
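As an illustration of part 1, the following is a minimal sketch of pose estimation with OpenCV's DNN module and a pre-trained Caffe pose model. The file names (a deploy prototxt and a .caffemodel trained on the MPII dataset), the 368x368 input size and the 15-keypoint count are assumptions based on the commonly distributed OpenPose MPII model, not a statement of the exact files used in this project.

    import cv2

    # Assumed model files: an OpenPose-style MPII deploy prototxt and weights.
    net = cv2.dnn.readNetFromCaffe('pose_deploy.prototxt',
                                   'pose_iter_160000.caffemodel')

    img = cv2.imread('worker_posture.jpg')
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1.0 / 255, (368, 368), (0, 0, 0))
    net.setInput(blob)
    out = net.forward()                    # one confidence map per keypoint

    points = []
    for i in range(15):                    # the MPII model predicts 15 keypoints
        _, conf, _, pt = cv2.minMaxLoc(out[0, i])
        x = int(pt[0] * w / out.shape[3])  # map heatmap coords back to image
        y = int(pt[1] * h / out.shape[2])
        points.append((x, y) if conf > 0.1 else None)

From pairs of detected joints, the body joint angles required by the RULA worksheet can then be computed with basic trigonometry.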

Chapter 3

Work Posture Assessment Tools

Awkward working postures lead to various musculoskeletal disorders, and it is necessary to carry out proper ergonomic assessment in these settings; proper benchmarking should be in place to rectify this. Many assessment tools are available to ergonomics researchers, for example RULA, REBA and OWAS. These tests are conducted on a particular working posture, and a final test score is produced which gives some insight into the severity associated with that posture.

3.1 RULA Assessment


RULA (Rapid Upper Limb Assessment) is a survey method developed for use in ergonomics investigations of workplaces where work-related upper limb disorders are reported. The tool requires no special equipment and provides a quick assessment of the postures of the neck, trunk and upper limbs, along with muscle function and the external loads experienced by the body. A coding system is used to generate an action list which indicates the level of intervention required to reduce the risk of injury due to physical loading on the operator. RULA was developed to

• Provide a method of screening a working population quickly, for exposure to a likely risk of work-related
upper limb disorders

• Identify the muscular effort which is associated with working posture, exerting force and performing static
or repetitive work, and which may contribute to muscle fatigue
• Give results which could be incorporated in a wider ergonomics assessment covering epidemiological,
physical, mental, environmental and organizational factors

The development of RULA occurred in three phases

1. The development of the method for recording working postures

2. Development of the system for grouping the body part posture scores
3. Development of grand score and action list

3.1.1 Phase 1
The whole body is divided into two groups, Group A and Group B. Group A consists of the upper arm, lower arm and wrist; Group B consists of the neck, trunk and legs. The different body parts are assigned individual scores, i.e., a score of 1 is given to the range of movement or working posture where the risk factors present are minimal, and higher numbers are allocated to parts of the movement range with more extreme postures.
Figure 3.1 shows the diagrams for scoring the posture of the body parts in Group A (the upper arm, lower arm and wrist), with a section to record the pronation or supination occurring. The scoring scheme for Group B is similarly explained in Figure 3.2.
As RULA can be conducted quickly, an assessment can be made of each posture in the work cycle. When using RULA, only the right or left side is assessed at a time. After observing the operator it may be obvious that only one arm is under load; however, if undecided, the observer should assess both sides. RULA can be extended to a more detailed examination. Most noticeably, a postural assessment of the fingers and thumb may be required in some investigations where exposure to risk factors is high for these digits. RULA does not include such detail, although any force exerted by the fingers or thumb is recorded as part of the assessment process.

3.1.2 Phase 2
A single score is required from Groups A and B which represents the level of postural loading of the musculoskeletal system due to the combined body parts. Muscle use and force scores: a scoring system was developed to include the additional load on the musculoskeletal system caused by excessive static muscle work, repetitive motion, and the requirement to exert force or maintain an external load while working. The total scores for Group A and Group B are calculated as follows:
Total Group A score = Score in Table A + Muscle score + Force/Load score
Total Group B score = Score in Table B + Muscle score + Force/Load score
The muscle score and force/load score capture the external risk factors which cause fatigue; the subsequent tissue damage depends on how long the operator is exposed to them. RULA provides a simplified and conservative rating system to be used as a guide to indicate whether these risk factors are present; it would be the function of a subsequent, more detailed assessment to establish their extent and effect on the operator's wellbeing and work. Muscle use is defined as repetitive if the action is repeated more than four times a minute. This is acknowledged as a conservative general definition from which a risk may be present; further assessment would, however, be required. If the load or force is 2 kg or less and held intermittently, the score is 0. If the intermittent load is 2-10 kg, a score of 1 is given. If a load of 2-10 kg is static or repeated, the score is 2. The score is also 2 if the load is intermittent but more than 10 kg. If a load or force of more than 10 kg is experienced statically or repeatedly, the score is 3, and a load or force of any magnitude with rapid build-up or a jolting action also scores 3.
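The force/load rules above translate directly into code. The following is a minimal sketch; the function name and arguments are my own, purely for illustration:

    def force_load_score(load_kg, intermittent, jolting=False):
        # Force/load score following the rules stated above.
        if jolting:
            return 3                         # rapid build-up or jolting action
        if load_kg <= 2 and intermittent:
            return 0
        if load_kg <= 10:
            return 1 if intermittent else 2  # 2-10 kg static/repeated scores 2
        return 2 if intermittent else 3      # more than 10 kg

    # Total Group A score = Table A score + muscle score + force/load score
    total_a = 3 + 1 + force_load_score(5, intermittent=True)  # example values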

3.1.3 Phase 3
In this phase the total scores from both groups are used to obtain the overall RULA score according to Table C. The inference from the grand score is as follows (a small sketch follows the list).

Figure 3.1: RULA Table A

Figure 3.2: RULA Table B
Figure 3.3: RULA Table C

• 1-2 : Acceptable score

• 3-4 : Further investigation, change needed


• 5-6 : Further investigation, change soon
• 7 : Investigate and implement change
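Put together, the grand-score interpretation can be expressed as a small sketch; the mapping follows the list above, and the function name is illustrative:

    def rula_action_level(grand_score):
        # Map the RULA grand score (from Table C) to its action level.
        if grand_score <= 2:
            return "Acceptable"
        if grand_score <= 4:
            return "Further investigation, change needed"
        if grand_score <= 6:
            return "Further investigation, change soon"
        return "Investigate and implement change"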

3.2 Further improvement of RULA in the case of computer workstation ergonomics
There is growing evidence that there needs to be flexibility in the way we sit at computer workstations. While
there is not one correct way to sit at a workstation, seating should support postures that can be changed
frequently within a comfortable range throughout the day. It should accommodate the:

• work being done


• visual demands
• worker’s individual differences

This will reduce fatigue and strain on the neck, shoulders, back and legs. All users should trial different positions to work out the best setup for themselves. An acceptable and well-supported seated position means:

• sitting with the body close to the desk

• the head and neck are in a forward facing and midline position (i.e. no backward arching of the neck or
forward extension of the chin)

• the shoulders are relaxed and symmetrical, with the elbows slightly closer to the side of the body

• using the preferred keying posture, depending on the style of keying used (i.e. traditional style or with forearm support)

• the back is supported by the chair backrest. The curved lower part of the backrest should fit into the lower back or the lumbar curve

• having an open angle of 100-120 degrees (slightly more than a right angle) at the hip. This can be achieved by adjusting the seat pan tilt and the backrest

• having knees at a height lower than or level with the hips

• ensuring a gap of 2-3 finger widths between the front of the chair and the back of the knees

• having feet flat on the floor or on a footrest

Figure 3.4 below illustrates an acceptable sitting position at the workstation. It allows for well-supported postures that can be changed within a comfortable range throughout the day.

Figure 3.4: Computer Workstation Posture

Chapter 4

Image Processing

4.0.1 Convolutional Neural Networks (CNN)


In the field of pattern recognition there were a lot of problems that classical neural networks could not solve, and CNNs arrived with ground-breaking results. A CNN has an advantage over an ANN, namely a reduced number of parameters, so researchers can train larger models. A CNN extracts features that are spatially independent: for example, in a face detection application we do not need to pay attention to where the faces are located in the images; the aim is to detect the object, and the position of the object in the picture is not important. Another important aspect is that different features are obtained as the input propagates towards the deeper layers. For example, refer to Figure 4.1: as the input image propagates through the layers, different features such as edges, shapes and facial features are extracted in layer 1, layer 2 and layer 3 respectively.

Figure 4.1: Features in each layer in CNN

4.0.2 Convolution
Assume the input to the CNN is a coloured image of 32x32 pixels in width and height. Since it is a coloured image, it has 3 channels which contain the intensity of the primary colours (RGB). So the input can be represented by a three-dimensional matrix of size 32x32x3.

Figure 4.2: CNN input size visualization

Fully connecting this input to even a pair of hidden neurons makes the weight tensor 4-dimensional, 32x32x3x2, so 6,144 parameters are needed to connect the input to just two nodes, and two neurons are not enough for any useful processing of an image. We could connect the input image to a next layer of neurons with exactly the same height and width (such a network could be applied to edge detection), but then the number of weight connections becomes 32x32x3 by 32x32.
It is a better and more efficient idea to look at local regions instead of the whole image. Figure 4.3 below shows the regional connection to the next layer, i.e., each neuron is connected to only a part of the previous layer. Thus, if we want to have 32x32 neurons in the next layer, we will have 5x5x3 by 32x32 connections, which is 76,800 connections (compared to 3,145,728 for full connectivity).

Figure 4.3: CNN hidden layer connection

This shows that the number of parameters drops drastically. The next simplifying assumption is to keep the local connection weights fixed for every neuron of the next layer (weight sharing). This again reduces the number of parameters: the number of weights drops to just 5x5x3 = 75. So one advantage of these assumptions is that the number of distinct weights decreases from around 3 million to just 75. Fixing the local connection weights is equivalent to sliding a window of size 5x5x3 across the image, which provides the opportunity to detect and recognize features regardless of their position in the image. This is why these networks are called convolutional.
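The connection counts quoted above are easy to verify with a quick check:

    # Connection counts for a 32x32x3 input and a 32x32 next layer.
    full = 32 * 32 * 3 * (32 * 32)   # full connectivity:   3,145,728 weights
    local = 5 * 5 * 3 * (32 * 32)    # 5x5x3 local regions:    76,800 weights
    shared = 5 * 5 * 3               # shared (convolutional) weights:     75
    print(full, local, shared)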
Figure 4.4 shows the effect of a convolution matrix for a window of size 3x3; the figure illustrates edge detection in image processing. These sliding windows are also called filters, as they act like classic filters (here the filter extracts the edges in an image). In a convolutional neural network, however, the filters are initialized randomly and the training procedure then shapes the filters so that they become more suitable for the given task. Adding more and more layers is actually beneficial for feature extraction: each layer corresponds to a separate filter matrix, and therefore different features can be extracted from the image. In the figure, another filter looks at the same part of the input image.

Figure 4.4: CNN Filter visualization

4.0.3 Stride
A CNN has more opportunities for reducing the number of parameters while also reducing side effects. In the previous example it was simply assumed that the nodes of the next layer have a lot of overlap with their neighbours in the regions they look at; we can control the degree of this overlap by controlling the stride. For example, Figure 4.5 shows a 7x7 image: if we move the filter one node at a time, we get only a 5x5 output. Note that the outputs of the three left matrices overlap. If we instead make every stride 2, the output will be 3x3. Put simply, not only the overlap but also the size of the output is reduced.

Figure 4.5: Convolution of filter (stride=2)

For an image of dimension N x N, a filter of size F x F, and S being the stride, the output size O can be calculated as

O = 1 + (N - F) / S

4.0.4 Padding
In CNN the convolution step causes loss of information at the border of the image. Border pixels are only
captured when the filter slides, they never have the chance to be seen. Append with zeros is one of the solution
for this and is known as zero padding. Zero padding also manages the output size.
For example if N=7, F=3 and stride = 1 then the output size will be 5x5. If we are introducing zero padding
the output remains 7x7 (same as that of input image size). Then the modified formula for output size including
zero padding becomes
N + 2P − F
O =1+
S

20
where P is the number of layers of zero padding. This padding idea helps us to prevent network output size
from shrinking with depth. Therefore, it is possible to have any number of deep convolutional networks.

Figure 4.6: Zero padding
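The two output-size formulas can be checked against the worked numbers from the text; a minimal sketch:

    def conv_output_size(n, f, s, p=0):
        # O = 1 + (N + 2P - F) / S, as derived above.
        return 1 + (n + 2 * p - f) // s

    print(conv_output_size(7, 3, 1))       # 5: no padding
    print(conv_output_size(7, 3, 1, p=1))  # 7: zero padding preserves the size
    print(conv_output_size(7, 3, 2))       # 3: stride 2 shrinks the output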

4.0.5 Features of CNN


Weight sharing brings translation invariance to the model: it helps the filters learn features regardless of their spatial position. Starting from random values, the filters will learn to detect edges (such as in Figure 4.5) if doing so improves performance. It is important to remember that if the spatial location of something in the input matters, then using shared weights is an extremely bad idea.
The computation can be in different dimensions according to the nature of the input. For sequential data (like an audio signal), 1-dimensional convolution can be employed; for images, the convolution is 2-dimensional for greyscale and 3-dimensional for coloured images. Figure 4.7 shows a visualization of CNN deep layers.

Figure 4.7: CNN Deep Layer

4.0.5.1 Convolution formula

Convolution can be explained by the following formula, applied for each pixel [i, j] in the output:

(x * w)[i, j] = Σ_m Σ_n x[m, n] w[i - m, j - n]

This equation is visualized in Figure 4.8 below: the element-by-element product of the input and the kernel is aggregated, and the sum becomes the corresponding point in the next layer.
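In practice, CNN frameworks implement the closely related cross-correlation (the kernel is not flipped). A direct NumPy sketch of that operation, with the stride and zero-padding options discussed above, could look as follows (assuming square inputs and filters for brevity):

    import numpy as np

    def conv2d(x, w, stride=1, padding=0):
        # Direct 2-D convolution as used in CNN layers (no kernel flip).
        if padding > 0:
            x = np.pad(x, padding)               # zero padding on all sides
        n, f = x.shape[0], w.shape[0]
        o = 1 + (n - f) // stride                # output size formula
        out = np.zeros((o, o))
        for i in range(o):
            for j in range(o):
                region = x[i*stride:i*stride+f, j*stride:j*stride+f]
                out[i, j] = np.sum(region * w)   # multiply element-wise, sum
        return out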

Figure 4.8: Visualization of Convolution Formula

4.0.5.2 Non linearity

The non-linearity layer comes just after the convolution. This layer is applied in order to saturate, or limit, the generated output. The sigmoid was the popular non-linear function for many years; it is described by the equation

Sigmoid(x) = 1 / (1 + exp(-x))

The hyperbolic tangent function has also been used. Nowadays many non-linear functions are in use, but the Rectified Linear Unit (ReLU) has become the most popular, for the reasons below (a small sketch follows the list). ReLU(x) equals 0 if x < 0 and x otherwise; the derivative of the ReLU function equals 0 if x < 0 and 1 otherwise.

• A simple definition of the derivative, and computational simplicity

• The sigmoid and tanh functions have vanishing gradients, whereas ReLU has a constant gradient for positive inputs

• ReLU creates a sparser representation, because a zero in its input leads to an exact zero in its output, whereas sigmoid and tanh always produce non-zero values, which may not be favourable for training
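As a minimal illustration, ReLU and its derivative in NumPy:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)        # 0 for x < 0, x otherwise

    def relu_grad(x):
        return (x > 0).astype(float)     # 0 for x < 0, 1 for x > 0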

4.0.5.3 Pooling

Pooling is down-sampling of the output after the non-linear function, which reduces complexity; it is nothing but a reduction of the resolution. Pooling does not affect the number of filters. Max pooling and average pooling are examples of pooling. Max pooling partitions the image into rectangular sub-regions and returns only the maximum value inside each sub-region. 2x2 max pooling is the most commonly used: when pooling is performed on the top-left 2x2 block (the pink area), the window moves by 2 and focuses on the top-right part, i.e., a stride of 2 is used here.

Figure 4.9: Max pooling

Pooling does not preserve the position of information; therefore, it should be applied only when the presence of a feature is important, rather than its spatial location.
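A minimal sketch of the 2x2, stride-2 max pooling described above:

    import numpy as np

    def max_pool2d(x, size=2, stride=2):
        # 2x2 max pooling with stride 2: keep the maximum of each block.
        h, w = x.shape
        oh, ow = 1 + (h - size) // stride, 1 + (w - size) // stride
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = x[i*stride:i*stride+size,
                              j*stride:j*stride+size].max()
        return out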

4.0.5.4 Fully connected layer

This layer is similar to the neuron arrangement in a classic neural network: the neurons are arranged in a single line, and each node of a fully-connected layer is directly connected to every node in both the previous and the next layer. Each of the nodes in the last frames of the pooling layer is connected, as a vector, to the first fully-connected layer. Most of the parameters of a CNN are found within these layers, and they take a long time to train; the large number of parameters makes the computation on the training examples complex.
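A minimal sketch of this flattening-plus-dense step (all shapes here are illustrative only):

    import numpy as np

    # Flatten the pooled feature maps and apply one fully-connected layer.
    pooled = np.random.rand(16, 5, 5)       # e.g. 16 feature maps of 5x5
    x = pooled.reshape(-1)                  # flatten to a 400-vector
    W = 0.01 * np.random.randn(10, x.size)  # 10 outputs -> 4,000 weights
    b = np.zeros(10)
    scores = W @ x + b                      # every output sees every input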

4.1 Convolutional Architecture for Fast Feature Embedding (CAFFE)


CAFFE (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework, originally developed at the University of California, Berkeley. It is open source, under a BSD license. It is written in
C++, with a Python interface. Convolutional Neural Networks, or CNNs, are discriminatively trained via
back-propagation through layers of convolutional filters and other operations such as rectification and pooling.
Following the early success of digit classification in the 90’s, these models have recently surpassed all known
methods for large-scale visual recognition, and have been adopted by industry heavyweights such as Google,
Facebook, and Baidu for image understanding and search.
While deep neural networks have attracted enthusiastic interest within computer vision and beyond, replication
of published results can involve months of work by a researcher or engineer. Sometimes researchers deem it

worthwhile to release trained models along with the paper advertising their performance. But trained models alone are not sufficient for rapid research progress: few toolboxes offer truly off-the-shelf deployment of state-of-the-art models, and those that do are often not computationally efficient and thus unsuitable for commercial deployment. Caffe was introduced to mitigate these issues: it is a fully open-source framework that affords clear access to deep architectures. The code is written in clean, efficient C++, with CUDA used for GPU
computation, and nearly complete, well-supported bindings to Python/Numpy and MATLAB. Caffe adheres
to software engineering best practices, providing unit tests for correctness and experimental rigor and speed
for deployment. It is also well-suited for research use, due to the careful modularity of the code, and the clean
separation of network definition (usually the novel part of deep learning research) from actual implementation.
While Caffe was first designed for vision, it has been adopted and improved by users in speech recognition,
robotics, neuroscience, and astronomy. We hope to see this trend continue so that further sciences and industries
can take advantage of deep learning.

4.1.1 Highlights of Caffe


Caffe provides a complete toolkit for training, testing, fine-tuning, and deploying models, with well-documented
examples for all of these tasks. As such, it is an ideal starting point for researchers and other developers looking
to jump into state-of-the-art machine learning. At the same time, it is likely the fastest available implementation
of these algorithms, making it immediately useful for industrial deployment.

4.1.1.1 Modularity

The software is designed from the beginning to be as modular as possible, allowing easy extension to new data
formats, network layers, and loss functions. Lots of layers and loss functions are already implemented, and
plentiful examples show how these are composed into trainable recognition systems for various tasks.

4.1.1.2 Separation of representation and implementation

Caffe model definitions are written as config files using the Protocol Buffer language. Caffe supports network
architectures in the form of arbitrary directed acyclic graphs. Upon instantiation, Caffe reserves exactly as
much memory as needed for the network, and abstracts away its underlying location on the host or GPU.
Switching between a CPU and a GPU implementation is exactly one function call; a sketch of such a model
definition follows.
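As a sketch of this separation, Caffe's Python NetSpec interface can build a definition and emit it as a prototxt
file; the layer sizes and the file name toy_net.prototxt below are illustrative assumptions, not taken from this
project:

import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))             # one 3-channel 224x224 image
n.conv1 = L.Convolution(n.data, num_output=16, kernel_size=3)  # illustrative layer sizes
n.relu1 = L.ReLU(n.conv1, in_place=True)
n.pool1 = L.Pooling(n.relu1, pool=P.Pooling.MAX, kernel_size=2, stride=2)

# to_proto() emits the architecture as Protocol Buffer text (a .prototxt definition);
# the implementation (CPU or GPU) is chosen separately, e.g. with caffe.set_mode_gpu().
with open("toy_net.prototxt", "w") as f:
    f.write(str(n.to_proto()))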

4.1.1.3 Test coverage

Every single module in Caffe has a test, and no new code is accepted into the project without corresponding
tests. This allows rapid improvements and refactoring of the codebase, and imparts a welcome feeling of
peacefulness to the researchers using the code.

4.1.1.4 Python and MATLAB bindings

For rapid prototyping and interfacing with existing research code, Caffe provides Python and MATLAB bind-
ings. Both languages may be used to construct networks and classify inputs. The Python bindings also expose
the solver module for easy prototyping of new training procedures.

4.1.1.5 Pre-trained reference models

Caffe provides reference models for visual tasks (for academic and non-commercial use; they are not covered
by the BSD license), including the landmark "AlexNet" ImageNet model with variations and the R-CNN
detection model, with more scheduled for release. The Caffe developers are strong proponents of reproducible
research and hope that a common software substrate will foster quick progress in the search over network
architectures and applications.

4.1.2 Architecture
Caffe stores and communicates data in 4-dimensional arrays called blobs. Blobs provide a unified memory
interface, holding batches of images (or other data), parameters, or parameter updates. Blobs conceal the
computational and mental overhead of mixed CPU/GPU operation by synchronizing from the CPU host to the
GPU device as needed. In practice, one loads data from the disk to a blob in CPU code, calls a CUDA kernel to
do GPU computation, and ferries the blob off to the next layer, ignoring low-level details while maintaining a
high level of performance. Memory on the host and device is allocated on demand (lazily) for efficient memory
usage.
Models are saved to disk as Google Protocol Buffers, which have several important features: minimal-size
binary strings when serialized, efficient serialization, a human-readable text format compatible with the binary
version, and efficient interface implementations in multiple languages, most notably C++ and Python.
A Caffe layer is the essence of a neural network layer: it takes one or more blobs as input, and yields one or more
blobs as output. Layers have two key responsibilities for the operation of the network as a whole: a forward pass
that takes the inputs and produces the outputs, and a backward pass that takes the gradient with respect to the
output, and computes the gradients with respect to the parameters and to the inputs, which are in turn back-
propagated to earlier layers. Caffe provides a complete set of layer types including: convolution, pooling, inner
products, nonlinearities like rectified linear and logistic, local response normalization, elementwise operations,
and losses like softmax and hinge. These are all the types needed for state-of-the-art visual tasks. Coding
custom layers requires minimal effort due to the compositional construction of networks.
In Figure 4.10, which illustrates the Caffe framework, blue boxes represent layers and yellow octagons represent
data blobs produced by or fed into the layers.

Figure 4.10: Caffe architecture

Caffe does all the bookkeeping for any directed acyclic graph of layers, ensuring correctness of the forward
and backward passes. Caffe models are end-to-end machine learning systems. A typical network begins with
a data layer that loads from disk and ends with a loss layer that computes the objective for a task such
as classification or reconstruction. The network is run on CPU or GPU by setting a single switch. Layers
come with corresponding CPU and GPU routines that produce identical results (with tests to prove it). The
CPU/GPU switch is seamless and independent of the model definition.

Caffe trains models by the fast and standard stochastic gradient descent algorithm. Figure 4.10 shows a typical
example of Caffe network training: a data layer fetches the images and labels from disk, passes them through
multiple layers such as convolution, pooling and rectified linear transforms, and feeds the final prediction into
a classification loss layer that produces the loss and gradients which train the whole network. This example is
found in the Caffe source code at lenet-train.prototxt. Data are processed in mini-batches that pass through
the network sequentially. Vital to training are learning rate decay schedules, momentum, and snapshots for
stopping and resuming, all of which are implemented and documented.

4.2 Binary Large Object (BLOB)


As already explained, Caffe models store and communicate data through 4-dimensional arrays called blobs. We
can generate a blob from an input image using the function OpenCV provides for this purpose:
cv2.dnn.blobFromImage, which takes the image as its argument and generates the blob. This blob is then fed
into the network that has already been created, so it is important to understand how blobs work.
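A minimal sketch of this step (the model and image file names below are placeholders, not the exact files used
in this project):

import cv2

# Placeholder file names for a Caffe pose model (deploy definition + trained weights).
net = cv2.dnn.readNetFromCaffe("pose_deploy.prototxt", "pose_iter_160000.caffemodel")

frame = cv2.imread("worker.jpg")
# NCHW blob: batch of 1, pixels scaled to [0, 1], resized to the network input size.
blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0 / 255, size=(368, 368),
                             mean=(0, 0, 0), swapRB=False, crop=False)
print(blob.shape)    # (1, 3, 368, 368): the 4-dimensional array described above
net.setInput(blob)
output = net.forward()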
Blob analysis is a fundamental machine-vision technique based on the analysis of consistent image regions. As
such it is a tool of choice for applications in which the objects being inspected are clearly discernible from the
background. A diverse set of blob analysis methods makes it possible to create tailored solutions for a wide
range of visual inspection problems. The main advantages of this technique are high flexibility and excellent
performance; its limitations are the requirement of a clear background-foreground relation and pixel-level
precision. For example, for the image in Figure 4.11, suppose we must design an algorithm that figures out how
many circles are present or identifies the position of the person. First we have to separate the different objects
in the image, and then we have to evaluate which object is the one we are looking for, i.e., circles and humans,
respectively. BLOB stands for Binary Large OBject and refers to a group of connected pixels in a binary image.
The term "large" indicates that only objects of a certain size are of interest, and that "small" binary objects are
usually noise. Note that this classical use of the term is distinct from Caffe's 4-dimensional data blobs described
above.

Figure 4.11: Segmentation using blobs

The purpose of BLOB extraction is to isolate the BLOBs (objects) in a binary image. As mentioned above,
a BLOB consists of a group of connected pixels. Whether or not two pixels are connected is defined by the
connectivity, that is, which pixels are neighbors and which are not. The two most often applied types of
connectivity are illustrated in Figure 4.12. The 8-connectivity is more accurate than the 4-connectivity, but
the 4-connectivity is often applied since it requires fewer computations and can therefore process the image
faster. The effect of the two types of connectivity is illustrated in the figure, where the binary image contains
either one or two BLOBs depending on the connectivity.

Figure 4.12: Blob connectivity

A number of different algorithms exist for finding BLOBs; such algorithms are usually referred to as
connected-component analysis or connected-component labeling. In the following we describe one of these
algorithms, known as the grass-fire algorithm, using 4-connectivity for simplicity.

4.2.1 The Recursive Grass-Fire Algorithm


The image is scanned pixel by pixel; at some point during the scan an object pixel (white pixel) is encountered
and the notion of grass-fire comes into play. In the binary image in Figure 4.13 the first object pixel is found at
coordinate (2, 0). At this
point you should imagine yourself standing in a field covered with dry grass. Imagine you have four arms (!)
and are holding a burning match in each hand. You then stretch out your arms in four different directions
(corresponding to the neighbors in the 4-connectivity) and simultaneously drop the burning matches. When
they hit the dry grass they will each start a fire which again will spread in four new directions (up, down, left,
right) etc. The result is that every single straw which is connected to your initial position will burn. This is the
grass-fire principle. Note that if the grass field contains a river the grass on the other side will not be burned.
Returning to our binary image, the object pixels are the "dry grass" and the non-object pixels are the water.
The algorithm looks in the four different directions, and if it finds a pixel which can be "burned", meaning an
object pixel, it does two things. Firstly, in the output image it gives this pixel an object label (basically a
number), and secondly it "burns" the pixel in the input image by setting it to zero (black). Setting it to zero
indicates that it has been burned and will therefore not be part of yet another fire. In a real grass field the fire
spreads simultaneously in all directions; in the computer, however, only one action can be performed at a time,
so the grass-fire proceeds as follows.
Let us apply the principle to Figure 4.13. The pixel at coordinate (2, 0) is labeled 1, since it is the first BLOB,
and then burned (marked by a 1 in the lower right corner). Next the algorithm tries to start a fire at the first
neighbor (3, 0) by checking whether it is an object pixel. It is indeed an object pixel and is therefore labeled 1
(same object) and "burned". Since (3, 0) is an object pixel, it now becomes the center of attention and its first
neighbor (4, 0) is investigated. Again, this is an object pixel and is therefore labeled 1, burned, and made the
center of attention. The first neighbor of (4, 0) is outside the image and therefore, by definition, not an object
pixel. The algorithm therefore investigates its second neighbor (4, 1). This is not an object pixel, so the third
neighbor of (4, 0) is investigated, namely (3, 0). This has already been burned and is therefore no longer an
object pixel. Then the last neighbor of (4, 0) is investigated, (4, -1).

Figure 4.13: 4-connectivity

This is outside the image and therefore not an object pixel. All the neighbors of (4, 0) have now been
investigated, so the algorithm traces back and looks at the second neighbor of (3, 0), namely (3, 1). This is an
object pixel and is therefore labeled 1, burned, and becomes the new focus of attention. In this way the
algorithm also finds (3, 2) to be part of object 1 and finally ends by investigating the fourth neighbor of (2, 0),
which is outside the image and therefore not an object pixel. All pixels which are part of the top object have
now been labeled with the same label 1, meaning that this BLOB has been segmented. The algorithm then
moves on following the scan path until it meets the next object pixel (1, 3), which is labeled 2 and starts a new
grass-fire.
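A minimal Python sketch of the recursive grass-fire labeling described above, assuming a NumPy binary image
with object pixels equal to 1 (Python's recursion limit makes this variant suitable only for small BLOBs;
production code would use an iterative version):

import numpy as np

def grassfire_label(binary):
    # Label 4-connected BLOBs; object pixels are 1, background pixels are 0.
    img = binary.copy()
    labels = np.zeros(img.shape, dtype=int)
    h, w = img.shape

    def burn(r, c, label):
        if r < 0 or r >= h or c < 0 or c >= w:   # outside the image: not an object pixel
            return
        if img[r, c] == 0:                        # background, or already burned
            return
        img[r, c] = 0                             # "burn" the pixel so it cannot ignite twice
        labels[r, c] = label                      # give it the current BLOB label
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):   # the four neighbors
            burn(r + dr, c + dc, label)

    next_label = 0
    for r in range(h):                            # scan the image row by row
        for c in range(w):
            if img[r, c]:                         # unburned object pixel: start a new fire
                next_label += 1
                burn(r, c, next_label)
    return labels, next_label

example = np.array([[0, 0, 1, 1, 1],
                    [0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 0],
                    [0, 1, 0, 0, 0]])
labels, count = grassfire_label(example)
print(count)   # 2 BLOBs under 4-connectivity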

4.3 Pose Estimation


Pose estimation is a computer vision technique that predicts and tracks the location of a person or object.
This is done by looking at a combination of the pose and the orientation of a given person/object. We can
also think of pose estimation as the problem of determining the position and orientation of a camera relative
to a given person or object. This is typically done by identifying, locating, and tracking a number of keypoints
on a given object or person. For objects, this could be corners or other significant features. And for humans,
these keypoints represent major joints like an elbow or knee.
There is also a key distinction to be made between 2D and 3D pose estimation. 2D pose estimation simply
estimates the location of keypoints in 2D space relative to an image or video frame: the model estimates an X
and a Y coordinate for each keypoint. 3D pose estimation transforms an object in a 2D image into a 3D object
by adding a z-dimension to the prediction. Here, 2D pose estimation is used to examine the working posture.
In this project, pose estimation is used for identifying key body joints such as the head, neck, chest, shoulders
and wrists; a total of 15 such key joints are identified using 2D pose estimation. Many image datasets with
annotated features are available, COCO and MPII being two examples. This project uses the MPII dataset.

4.3.1 MPII Dataset


The MPII Human Pose dataset is a state-of-the-art benchmark for the evaluation of articulated human pose
estimation. It consists of 25 thousand images covering 410 human activities, each image labeled with annotated
body joints. The key joints are represented by unique joint IDs, as listed below.

Head 0
Neck 1
Right Shoulder 2
Right Elbow 3
Right Wrist 4
Left Shoulder 5
Left Elbow 6
Left Wrist 7
Right Hip 8
Right Knee 9
Right Ankle 10
Left Hip 11
Left Knee 12
Left Ankle 13
Chest 14
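A sketch of how these joint IDs can be used with OpenCV's DNN module and an MPII-trained Caffe model; the
model file names, the 368x368 input size and the 0.1 confidence threshold are assumptions, not values taken
from this report:

import cv2

# Joint IDs from the table above (MPII convention).
JOINT_ID = {"Head": 0, "Neck": 1, "RShoulder": 2, "RElbow": 3, "RWrist": 4,
            "LShoulder": 5, "LElbow": 6, "LWrist": 7, "RHip": 8, "RKnee": 9,
            "RAnkle": 10, "LHip": 11, "LKnee": 12, "LAnkle": 13, "Chest": 14}

net = cv2.dnn.readNetFromCaffe("pose_deploy.prototxt", "pose_iter_160000.caffemodel")
frame = cv2.imread("posture.jpg")
h, w = frame.shape[:2]
blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368), (0, 0, 0), swapRB=False, crop=False)
net.setInput(blob)
heatmaps = net.forward()          # shape (1, n_maps, H, W): one heat map per joint

points = {}
for name, idx in JOINT_ID.items():
    hm = heatmaps[0, idx, :, :]
    _, conf, _, (x, y) = cv2.minMaxLoc(hm)   # peak of this joint's heat map
    if conf > 0.1:                            # assumed confidence threshold
        # rescale heat-map coordinates back to image pixels
        points[name] = (int(x * w / hm.shape[1]), int(y * h / hm.shape[0]))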

4.4 Haar Cascade Classifier


Haar cascade is a machine learning object detection algorithm used to identify objects in an image or video. It
is based on the concept of features proposed by Paul Viola and Michael Jones in their 2001 paper "Rapid
Object Detection using a Boosted Cascade of Simple Features" [15].

4.4.1 Dataset Preparation


It is a machine learning based approach in which a cascade function is trained from a large number of positive
and negative images and is then used to detect objects in other images. Initially we need to build the dataset
for classification, consisting of a large number of positive and negative images. For face detection, for example,
the positive images contain a wide variety of faces, while the negative images must contain no faces at all; a
negative image may be any image other than a face. The dataset thus consists of two folders, positive and
negative images, and is then ready for training.

4.4.2 Feature Extraction


Next we need to extract features. For this, the Haar features shown in Figure 4.14 are used. They are just
like convolutional kernels: each feature is a single value obtained by subtracting the sum of the pixels under
the white rectangle from the sum of the pixels under the black rectangle.

Figure 4.14: Haar features

All possible sizes and locations of each kernel are then used to calculate a large number of features (even a
24x24 window yields over 160,000 features). Each feature calculation requires the sum of the pixels under the
white and black rectangles. To make this efficient, Viola and Jones introduced the integral image, which
reduces the calculation of the sum of pixels over any rectangle, however large, to an operation involving just
four array references. This makes feature evaluation extremely fast.
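A minimal NumPy sketch of the integral-image trick (illustrative, not from the Viola-Jones implementation):

import numpy as np

img = np.arange(16).reshape(4, 4)          # toy 4x4 "image"
ii = img.cumsum(axis=0).cumsum(axis=1)     # integral image: ii[y, x] = sum of img[:y+1, :x+1]

def rect_sum(ii, y0, x0, y1, x1):
    # Sum over img[y0:y1+1, x0:x1+1] using at most four lookups.
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s

print(rect_sum(ii, 1, 1, 2, 2))   # 30
print(img[1:3, 1:3].sum())        # 30, computed directly for comparison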
But most of the features calculated this way are irrelevant. For example, consider the image in Figure 4.15.
The top row shows two good features: the first focuses on the property that the region of the eyes is often
darker than the region of the nose and cheeks, while the second relies on the property that the eyes are darker
than the bridge of the nose. Applying the same windows to the cheeks or anywhere else is irrelevant. The best
features are selected using AdaBoost.

Figure 4.15: Haar feature extraction

To select them, each feature is applied to all the training images. For each feature, the best threshold that
classifies the faces into positive and negative is found. There will obviously be errors and misclassifications, so
the features with the minimum error rate are selected, i.e., the features that best separate face and non-face
images. AdaBoost reduces the 160,000+ features to around 6,000, but applying even 6,000 features to every
window is still time-consuming. For this reason Viola and Jones introduced the concept of a cascade of
classifiers: instead of applying all 6,000 features to a window, the features are grouped into stages of classifiers
that are applied one by one (normally the first few stages contain very few features). If a window fails the first
stage, it is discarded and the remaining features are not considered; if it passes, the second stage of features is
applied, and the process continues. A window that passes all stages is a face region. The authors' detector had
over 6,000 features in 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages (the two features in
the image above were obtained as the best two features from AdaBoost). According to the authors, on average
only 10 of the 6,000+ features are evaluated per sub-window.

Chapter 5

Project Work

The whole project can be divided into three parts.

5.1 Part 1: Working Posture Estimation


Posture analysis is carried out with a modern image processing technique: pose estimation. In this project, the
postures of different body parts have to be extracted to conduct the ergonomics assessment. To conduct the
RULA assessment, information about the upper arm, lower arm, wrist, neck, trunk and legs is needed.
Determining how far these body parts deviate from neutral, and scoring each part according to the RULA score
sheet, is the primary task. Pose estimation identifies the body joints from an input image and gives a prediction
for each joint along with its coordinate position (in pixels). These coordinates can then be used to compute
joint angles: for example, the deviation of the upper arm from the vertical axis can be estimated from the
coordinates of the shoulder and elbow, and the lower arm likewise from the elbow and wrist coordinates, as in
the sketch below. In this way all the information needed for the RULA assessment is obtained.
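A minimal sketch of such an angle computation; the coordinates below are illustrative, and note that image
y-coordinates grow downwards:

import math

def deviation_from_vertical(p_top, p_bottom):
    # Angle in degrees between the segment p_top -> p_bottom and the vertical axis.
    dx = p_bottom[0] - p_top[0]
    dy = p_bottom[1] - p_top[1]
    return abs(math.degrees(math.atan2(dx, dy)))

# e.g. upper-arm deviation from the shoulder and elbow keypoints (illustrative pixels)
shoulder, elbow = (210, 160), (230, 250)
print(round(deviation_from_vertical(shoulder, elbow), 1))   # 12.5 degrees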

5.2 Part 2: Object Detection


For desktop workstation analysis, we sometimes need to consider the relative positions of computer peripherals
such as the monitor, keyboard and mouse. In this case object detection algorithms are required; here the Haar
cascade method is used. First, a custom dataset is built: for monitor detection, a dataset consisting of hundreds
of monitor images is assembled and used for training. After training, the model is obtained as an XML file and
is ready to deploy. A pre-trained Haar cascade model is used for detecting the human eye. A side-view image of
a computer working posture is given as input, and the model detects the monitor and the eye with their relative
positions. From this, it is easy to locate the position of the monitor with respect to eye level, so the model can
infer whether the monitor is well positioned.
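A sketch of such a check with OpenCV; cascade.xml is the custom monitor model mentioned above, while the
image file name and the eye-level rule of thumb (top edge of the screen at or slightly below eye level) are
assumptions:

import cv2

monitor_cascade = cv2.CascadeClassifier("cascade.xml")   # custom-trained monitor model
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("workstation_side_view.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

monitors = monitor_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if len(monitors) and len(eyes):
    mx, my, mw, mh = monitors[0]
    ex, ey, ew, eh = eyes[0]
    eye_level = ey + eh // 2
    if my >= eye_level:          # image y grows downwards
        print("Monitor top at or below eye level: OK")
    else:
        print("Monitor top above eye level: consider lowering the screen")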

5.3 Part 3: RULA Assessment


After extracting the relevant information, the next step is to use it: the RULA test is conducted with this
information. Additional information, such as wrist twist and muscle status, is supplied through a specially
designed Python app consisting of dropdown menus from which the user chooses the appropriate option. With
this information the RULA test can be started. Tables A, B and C are replicated in Python, from which the
group scores and the final RULA score are computed. In addition, the severity level and recommendations are
given as the final output.
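The mapping from the final RULA grand score to an action level is standard in the RULA literature; a minimal
sketch, with the recommendation wording paraphrased:

def rula_action_level(score):
    # Standard RULA action levels for the grand score (1-7).
    if score <= 2:
        return "Action level 1: posture acceptable if not held for long periods"
    if score <= 4:
        return "Action level 2: further investigation; changes may be required"
    if score <= 6:
        return "Action level 3: investigate further and change soon"
    return "Action level 4: investigate and implement changes immediately"

print(rula_action_level(4))   # matches the score obtained in Chapter 6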

Chapter 6

Result and Discussion

The pose estimation outputs the positions of all 15 joints with reasonably good accuracy for any given pose.
The Haar cascade model is trained using a set of positive and negative images, and the resulting model is
obtained as a cascade.xml file ready to deploy. Using this model the monitor is detected and its position in the
given image is tracked effectively. Haar detection is also used for the eye. This yields individual positions for
the monitor and the eye in a single image, and inferences are made from their relative positions. The GUI built
in Python lets the user supply additional external influencing factors from dropdown menus, which makes the
approach more user-friendly. Incorporating all this information, the rest of the RULA process is automated in
Python. For this particular posture, the total RULA score is 4, which implies that the risk level is low:
improvement of the posture is possible but not strictly necessary.

Figure 6.1: Pose estimation result of a random pose

Figure 6.2: Positive and Negative images for dataset preparation

Figure 6.3: Custom haar cascade for monitor detection

Figure 6.4: Python GUI for additional information

A random image is chosen (left) and pose estimation is carried out; the results (right) are shown in Figure 6.5.

Figure 6.5: Caffe model Pose estimation result

Figure 6.6: Final RULA result for a computer workplace posture

From the summary it can be seen that the wrist position has a score of 3, caused by the inclination due to the
incorrect posture. The upper and lower arm postures each contribute a score of 2; the summary shows that
these body parts need to be adjusted. In the Group B section, the neck contributes a score of 3, which is
another area where investigation is needed.

Chapter 7

Conclusions

7.1 Conclusions
With the help of image processing, the posture analysis part of the ergonomics assessment is automated, which
simplifies the procedure. With this advancement, we only need to take an image of an employee in his or her
normal working posture; the rest of the assessment is carried out automatically, and the output gives the action
level of the current working procedure so that necessary action can be taken to avoid musculoskeletal disorders.
One no longer needs to be an ergonomics expert to conduct a cost-effective workplace assessment, which makes
the method suitable for small-scale industries, where the test can be conducted effortlessly.

7.2 Benefits
With this method the user needs only to take a picture of a particular working posture, as in Figure 6.5, and
choose the appropriate options from the dropdown lists shown in Figure 6.4; the rest is fully automated, and a
complete summary is given after the posture is processed. The user does not need to buy expensive sensors,
worry about sensor placement on the body, or hire an ergonomics expert for the assessment. The pose
estimation used here predicts the different body joints very accurately, so the overall results are reliable.

7.3 Future Scope


Extracting the information needed for RULA and REBA assessment (i.e., upper arm displacement, lower arm
displacement, etc.) can be done using pose estimation. Since 2D pose estimation is used, information such as
wrist twist or neck twist cannot be determined, but this is possible with 3D pose estimation.
For computer workstation design, detection of the monitor, mouse, keyboard, etc. can be done with the Haar
cascade object detection model. The Haar cascade model could be replaced by TensorFlow object detection
API models for better results. In this project only monitor detection is presented; this can be extended to
keyboard and mouse detection to ensure their placement within the primary work area.

Bibliography

[1] G. S. Jadhav and G. V. Shind, "Ergonomic Evaluation Tools RULA and REBA Analysis: Case Study."

[2] N. A. Ansari and M. J. Sheikh, "Evaluation of Work Posture by RULA and REBA: A Case Study," IOSR
Journal of Mechanical and Civil Engineering (IOSR-JMCE), e-ISSN 2278-1684, p-ISSN 2320-334X, vol. 11,
issue 4, ver. III, Jul.-Aug. 2014, pp. 18-23, www.iosrjournals.org.

[3] Z. Sedighi Maman, M. A. Alamdar Yazdi, L. A. Cavuoto, and F. M. Megahed, "A Data-Driven Approach to
Modeling Physical Fatigue in the Workplace Using Wearable Sensors," Applied Ergonomics 65 (2017),
pp. 515-529.

[4] P. Plantard and E. Auvinet, "Pose Estimation with a Kinect for Ergonomic Studies: Evaluation of the
Accuracy Using a Virtual Mannequin," DOI: 10.3390/s150101785; M. Gómez-Galán and J. Pérez-Alonso,
"Musculoskeletal Disorders: OWAS Review," Industrial Health 55(4), May 2017, DOI: 10.2486/indhealth.2016-0191.

[5] F. Soheilifard and A. Rahbar, "Ergonomic Investigation of Workers in Tea Factories Using REBA and
OWAS Methods: Case Study," Agricultural Engineering International: The CIGR e-journal 19(3), October 2017.

[6] N. D. Nath and T. Chaspari, "Automated Ergonomic Risk Monitoring Using Body-Mounted Sensors and
Machine Learning," https://doi.org/10.1016/j.aei.2018.08.020.

[7] M. Doshi, H. Shah, and H. Gada, "Wearable DAQ (Data Acquisition System) for Measurement of R.U.L.A.
(Rapid Upper Limb Assessment) Rating of Vehicles," IEEE, ISBN 978-1-5386-2842-3.

[8] A. Ranavolo, F. Draicchio, T. Varrecchia, A. Silvetti, and S. Iavicoli, "Wearable Monitoring Devices for
Biomechanical Risk Assessment at Work: Current Status and Future Challenges," Int. J. Environ. Res. Public
Health 2018, 15, 2001, DOI: 10.3390/ijerph15092001.

[9] P. Plantard, H. P. H. Shum, A.-S. Le Pierres, and F. Multon, "Validation of an Ergonomic Assessment
Method Using Kinect Data in Real Workplace Conditions," Applied Ergonomics (2017).

[10] V. M. Manghisi et al., "Real Time RULA Assessment Using Kinect v2 Sensor," Applied Ergonomics (2017),
http://dx.doi.org/10.1016/j.apergo.2017.02.015.

[11] D. Nahavandi and M. Hossny, "Skeleton-Free RULA Ergonomic Assessment Using Kinect Sensors,"
Intelligent Decision Technologies 11 (2017), pp. 275-284, DOI: 10.3233/IDT-170292, IOS Press.

[12] S. Albawi, T. A. Mohammed, and S. Al-Zawi, "Understanding of a Convolutional Neural Network," IEEE, 2017.

[13] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe:
Convolutional Architecture for Fast Feature Embedding," UC Berkeley EECS, Berkeley, CA.

[14] G. Hidalgo, Y. Raaj, H. Idrees, D. Xiang, H. Joo, T. Simon, and Y. Sheikh, "Single-Network Whole-Body
Pose Estimation," arXiv:1909.13423v1 [cs.CV], 30 Sep 2019.

[15] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features," Conference
on Computer Vision and Pattern Recognition, 2001.

[16] M. F. Ghazali, M. Mat Salleh, N. Zainon, S. Zakaria, and C. D. M. Asyraf, "RULA and REBA Assessments
in Computer Laboratories," National Symposium on Advancements in Ergonomics and Safety (ERGOSYM2009),
1-2 December 2009, Perlis, Malaysia.

[17] Ergonomic Guide to Computer Based Workstations, PN 11334, version 1, last updated August 2012.

