Final Mini Project Report
submitted by
AMAL NN
M190125EE
First and foremost, I thank God Almighty; His blessings were with me throughout my preparation.
I would also like to thank Dr. RIJIL RAMCHAND, Professor and Head of the Department of Electrical Engineering, for providing a congenial environment and support during the entire course of the project.
AMAL NN
DECLARATION
Calicut AMAL NN
30-07-2020
CERTIFICATE
ABSTRACT
Ergonomic analysis of work posture is carried out to identify the risk associated with a posture and to avoid serious musculoskeletal disorders. In many industries workers have to perform many tasks manually, and some of these are done in awkward postures; over a long period they will experience several health issues. To avoid this, companies usually hire a group of ergonomic experts to analyse the work postures in their facilities and to take the necessary action to prevent undesirable after-effects. Small scale industries usually do not treat this as a serious issue, for two reasons: 1. they may be unaware of work posture assessment, and 2. the currently available methodologies are not affordable for them. A significant improvement is therefore needed to solve this problem.
The main part of the assessment is the measurement of the body joint angles, which is done either by direct observation or by using expensive wearable sensors. In the direct method, supervision from an expert is necessary; the sensors (such as the Microsoft Kinect) are costly and need to be placed on the body in the correct positions. In this project I therefore try to bring the advances in Artificial Intelligence and image processing to this field so that a better solution can be obtained. Image processing is used for body pose estimation, which identifies the key body joints; from that data the information needed for the first phase of the assessment can be estimated. The algorithm can be tested on a smartphone. This project describes a novel approach towards ergonomics.
The entire project consists of three parts: 1. pose estimation with the help of image processing, 2. Haar cascade based object detection for identification of the relative position of peripheral devices, and 3. fully automated RULA assessment. With the fully developed model, the user only needs to take a photograph of a particular working posture; the rest of the work is done by the smartphone itself, and hence this approach is suitable even for small scale industries.
Keywords: Pose estimation, RULA assessment, Convolutional Neural Network,
Caffe architecture, Haar cascade.
Contents
1 Introduction 4
1.1 Ergonomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Computer workstation ergonomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Significance of the project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Report Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Literature Review 6
2.1 Types of assessment method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Direct Observation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 Using wearable sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Using Microsoft Kinect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Summary and Research Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Research Gap objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Proposed Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Image Processing 18
4.0.1 Convolutional Neural Networks (CNN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.0.2 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.0.3 Stride . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.0.4 Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.0.5 Features of CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.0.5.1 Convolution formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.0.5.2 Non linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.0.5.3 Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.0.5.4 Fully connected layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1 Convolutional Architecture for Fast Feature Embedding (CAFFE) . . . . . . . . . . . . . . . . 23
4.1.1 Highlights of Caffe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.1 Modularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.2 Separation of representation and implementation . . . . . . . . . . . . . . . . . 24
4.1.1.3 Test coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.4 Python and MATLAB bindings . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1.5 Pre-trained reference models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Binary Large Object (BLOB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.1 The Recursive Grass-Fire Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Pose Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 MPII Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Haar Cascade Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4.1 Dataset Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5 Project Work 32
5.1 Part 1: Working Posture Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Part 2: Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Part 3: RULA Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7 Conclusions 38
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.2 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.3 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
List of Figures
Chapter 1
Introduction
Work posture analysis is very important in almost every industry. Ergonomics researchers commonly conduct various tests for this purpose. Until now they have had to manually record the relevant information (i.e. the angular displacement of different body parts) to perform the test. Even where software automates the scoring, taking the relevant information from the posture is still manual (the manual of one such tool says to take a picture of the posture and measure the angles using a protractor). The main aim here is to identify disorders in the posture using image processing techniques and to extract the information needed to conduct the test.
1.1 Ergonomics
Ergonomics is the process of designing or arranging workplaces, products and systems so that they fit the people who use them. A lack of ergonomics and improper workplace design cause a number of work-related musculoskeletal disorders (MSDs) for employees and a huge cost for the company and society. In industry the main aims are to reduce excessive force, avoid awkward postures and reduce problems related to musculoskeletal disorders. Some common body assessment methods are OWAS, RULA, REBA and NIOSH. These tests can be conducted for workplace design in any area, such as a manufacturing industry, a construction site or a desktop setup, to reduce MSDs.
Researchers use different methods for analysing working posture, determine the severity of the MSD (musculoskeletal disorder) risk and assign a test score to a particular posture. For small scale industries, RULA (Rapid Upper Limb Assessment) and REBA (Rapid Entire Body Assessment) are carried out for the evaluation of work posture. Due to their simplicity and variety of applications, these tests have acquired popularity among ergonomics investigators of workplaces. Workers performing tasks such as repetitive lifting or lifting at different heights, or performing a task in a very awkward posture such as kneeling, are examined under these assessments. The first step is the examination of the posture and identification of the relevant information according to the worksheet given in the manual. Usually these small-scale industries hire an ergonomic expert to conduct the test and make the relevant inferences from it. The expert assesses the upper and lower body postures separately and gives individual scores for the upper arm, lower arm, neck, trunk, legs and so on; with the help of the worksheet a final score is produced which indicates the danger level of the posture. There is software available for conducting the test, but it still needs manual supervision: for example, the user manual of one such tool states that in the first step one must take a picture of a worker in their working posture and, using a protractor, measure the angles between the different body parts according to the test worksheet.
1.2 Computer workstation ergonomics
Besides such small-scale industries, a lack of care about sitting posture in front of a computer for long periods also causes severe after-effects. Awkward postures may cause eye strain and back pain due to damage to the spine. Here too, RULA and REBA work well for the investigation: a camera is used to take a series of images of the test posture and the assessments are conducted as explained before. Again, by using image processing techniques, extracting the relevant data for the test becomes easy.
1.3 Significance of the project
Why are existing methods not suitable for such small scale industries?
• They have to meet the additional expense of hiring ergonomic experts from outside
• The workers might be unaware about ergonomics
These are the core reasons which explain the significance of this project.
Chapter 2
Literature Review
Ergonomics researchers use different methods and assessment tools for evaluating work posture in real time. Different case studies are conducted in different ways: manually, by direct observation, or using specialized equipment. The manual method consists of a checklist that the researcher has to fill in for the particular posture. There is a chance of error while taking measurements and doing the computation: the researcher has to manually measure the different angles and categorise the posture accordingly, and slight deviations in the measurement will lead to an undesirable result. This method is also not reliable, as it can lead to biased interpretation.
Another example is a case study conducted at MIDC Wardha (Maharashtra, India) [2], similar to the previous one; the assessment was carried out using the worksheets. The RULA method determined that the majority of the workers were under high risk levels and required immediate change. The REBA method determined that some of the workers were at lower levels and the majority at high risk levels. This shows the importance of such studies at small industries in India.
N. A. Ansari and Dr. M. J. Sheikh [2] state that small scale industries play an important role in the Indian economy, but the workers still perform almost every task manually and hence experience musculoskeletal disorders and injuries in various parts of their bodies. They point out this issue in their paper and observe that a lack of knowledge about the importance of work posture assessment is probably the main reason for it. Conducting a work posture assessment identifies the areas where improvement can be made, for example by using better tools, to reduce the risk. In their study, 15 workers were chosen in MIDC Wardha. The workers had an average stature of 168.34 cm (S.D. 2.69), average age of 35.8 years (S.D. 3.02), average weight of 63.6 kg (S.D. 6.66) and average experience of 11.2 years. Snapshots were analysed to conduct the RULA and REBA assessments. They found that, according to the RULA assessment, around 40 percent of the workers were at a high risk level ("investigate and change immediately"), whereas 47 percent were at a medium risk level ("investigate further and change soon") and around 13 percent fell under "investigate further". The REBA assessment showed that around 53 percent of the workers were working at high risk levels. It was found that, if the workers continued to work in the same postures, they would suffer from MSDs related to the neck, trunk and wrist in the near future.
Not only RULA and REBA: OWAS is also an important tool in ergonomic assessment. Marta Gómez-Galán and José Pérez-Alonso conducted a review of OWAS in their paper [5]. They found that it has been applied mainly in three sectors: industry, health, and agriculture and livestock. It is one of the most widely used and tested semi-direct methods of MSD evaluation in the world, but it needs to be complemented with other indirect or direct methods. Whenever OWAS has been used, whether individually or together with other methods, MSD risks have been detected; this is perhaps an indicator that the evaluation parameters should be reviewed, because they may overestimate the risk.
Farshad Soheili-Fard and Amir Rahbar [6] used the OWAS and REBA methods to conduct an ergonomic investigation of workers in a tea factory in the Langroud region, Guilan, Iran. They conducted both REBA and OWAS tests and the results were quite revealing. Around 75 percent of the workers in the tea factories had a REBA score between 4 and 7, which means that their level of exposure to risk is medium and corrective measures are necessary. Twelve percent of the workers showed REBA scores between 2-3 and 8-10, and just 5 postures were in the 11 to 15 score range. High risk was observed in the curling and oxidation unit, and corrective measures are needed there now.
The OWAS results show that the curling and oxidation stages need immediate corrective action, as these postures accounted for 78 percent of all back-related postures; in addition, 34 percent of the postures used by the workers of the tea factories had a high or very high risk level, so modifications are necessary. The average energy expenditure was 21.843 kJ min-1.
Zahra Sedighi Maman et al. [3] modelled physical fatigue in the workplace from wearable sensor data using the Least Absolute Shrinkage and Selection Operator (LASSO), a popular variable selection methodology. The results show that the LASSO model performed well for both physical fatigue detection and modelling. The experiment consists of three lab sessions, each taking up to three hours. At the beginning of a session, participants completed a sleep quality questionnaire, a risk taking behaviour task (the Balloon Analogue Risk Task (BART)) and a psycho-motor vigilance task (using PC-PVT). In addition, the subject was asked to lie in a supine position for five minutes to measure the resting heart rate. After the baseline measurements, the participant was given instructions on the relevant physically fatiguing task for the session. This study concentrates on the work process to model physical fatigue, so in order to improve the model from a human work performance perspective, further analyses concerned with quantitative performance measures (e.g., number of defects in a time window and average task completion time over a time window) should also be examined.
Nipun D. Nath and Theodora Chaspari [7] show the importance of smartphone sensors in ergonomics. Sensor data can be used as input to machine learning algorithms to identify field activities and estimate activity durations; although the analysis was carried out in an experimental setting, the findings can be generalised and inform similar efforts in various occupations including construction, manufacturing, healthcare, transportation and agriculture. Their approach uses body-mounted smartphones to collect time-stamped motion data (i.e., accelerometer, linear accelerometer and gyroscope signals), automatically detects workers' activities through a classification framework, and estimates activity duration and frequency information.
Megh Doshi, Harsh Shah and Heetika Gada [8] designed a wearable data acquisition system which measures posture and joint angle values of the driver's body and gives an accurate R.U.L.A. rating of the driver, in order to validate and compare the software simulation values obtained in CATIA. It consists of neck, shoulder, elbow, wrist and trunk angle sensors.
They put some effort into cockpit design, which starts by taking basic dimensions of drivers such as driver height, sitting shoulder height, buttock width, shoulder width and distance between elbows. Adequate clearances are then added to these dimensions to reach a basic frame design. Considering sub-system integration along with the above driver dimensions, a rough chassis is designed, which is inspected for different ergonomic aspects to ensure driver comfort. The RULA rating of the designed chassis is obtained using the CATIA computer software. The DAQ system measures the rating in both static and dynamic conditions; body joint angles are measured using flex sensors and a gyroscope mounted on the driver's suit.
On testing the DAQ on a BAJA vehicle, a continuously changing score ranging between a RULA score of 4 and 6 was observed over 4 hours of continuous testing, compared to the RULA score of 3 which had been obtained in the software CATIA V5. They also conclude that software simulations cannot be perfect; and since RULA is a universal rating and not only for driver ergonomics, this wearable DAQ can be used to find the RULA score for normal everyday work activities as well, which can help determine risk factors for all activities and not only for driving a vehicle.
Several other papers [9] describe recent implementations of wearable sensors for quantitative, instrument-based biomechanical risk assessments for the prevention of WMSDs. They find that still too few researchers foresee the use of wearable technologies for biomechanical risk assessment, although the requirement for increasingly quantitative evaluations is growing and innovative technology for biomechanical risk assessment is only at its initial stage.
Alberto Ranavolo, Francesco Draicchio and two other researchers, in their paper [8], describe recent implementations of wearable sensors for quantitative, instrument-based biomechanical risk assessment for the prevention of WMSDs. Instrumental approaches based on inertial measurement units and sEMG sensors have been used for direct evaluations to classify lifting tasks into low and high risk categories. Wearable sensors have also been used for direct instrumental evaluations of the handling of low loads at high frequency, by using the local myoelectric manifestation of muscle fatigue. For the rating of standard methods, on-body wireless sensor network based approaches for real-time ergonomic assessment in industrial manufacturing have been proposed. Several motion sensors are used for conducting the assessment. sEMG provides a measure of the electrical activity (on the skin) of the muscles involved in the movement; single or double differential bipolar sEMG performed using wet electrodes is widely and easily used in ergonomics, both for research activities and directly at the workplace. IMUs (inertial measurement units) allow the measurement of the orientation, position, velocity and acceleration of each investigated segment and of the whole body posture.
2.2 Summary and Research Gap
Industries are now looking seriously at the musculoskeletal disorders that workers may experience due to awkward working postures, so they spend valuable time on ergonomics tests to identify the issues and to get suggestions on how to rectify them. Many case studies have been conducted in industry, and some of them found that workers were performing under seriously poor working postures. Different researchers employ different methods for the evaluation: direct observation, wearable sensors, or more advanced sensors such as the Kinect.
In the direct method researchers use sensors and take external influencing factors such as the muscle factor and the load factor into account. This method gives quite satisfying results and largely eliminates error, but it uses sophisticated sensors and requires particular arrangements for attaching them to different parts of the body, which may not be applicable for every working posture. This method also needs expert supervision. In addition, the sensor position affects the result, and finding a recommended sensor position is crucial.
To overcome this limitation, recent works have proposed taking the reliability of the Kinect data into account in the correction process. Reliability can then be integrated into a lazy learning framework to reconstruct a more reliable posture. Almost all the research shows that the results are quite promising, and 3D modelling and machine learning algorithms open a new era in this field. However, we still need to buy a Microsoft Xbox One Kinect sensor plus the adapter for Xbox One S / Windows motion control to conduct the assessment, which costs around 300 USD and is not practical for small scale industries.
2.2.1 Research Gap objectives
These gaps can be addressed with a smartphone-based solution:
• Smart phones are now common to everyone and no expert is needed
• The latest image processing techniques can be deployed in the form of a smart phone application
• The employee just needs to take a picture and the rest of the analysis is done by the smartphone itself
• They get an instant result on their smartphone
2.2.2 Proposed Model
Using Python, the process of extracting information from an image to conduct the ergonomics analysis test is automated. Image processing is a good way to solve the issue, and there are different ways to approach the problem: image classification, object detection or pose estimation. For each of these, plenty of algorithms and open source libraries are available, of which ANN, CNN, RCNN, Fast RCNN and DNN are the important ones. The proposed model consists of three main parts:
1. Estimation of the working posture. Posture analysis is automated by a pose estimation model.
2. Detection of surrounding tools. Neighbouring tools such as the mouse, monitor etc. are detected by an object detection model.
3. Extraction of the relevant information about the posture from the image and conducting the test (a pipeline sketch in code follows below).
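To make the structure of the proposed model concrete, a minimal Python sketch of how the three parts could be chained is given below. The function names are placeholders standing in for the components described in the following chapters; this is not the project's actual code.

```python
# Skeleton of the proposed three-part model. Each function is a stub standing in for
# the component described later in the report (Chapter 4 for parts 1-2, Chapter 3 for part 3).
from typing import Dict, List, Optional, Tuple

Keypoint = Optional[Tuple[int, int]]   # (x, y) pixel coordinates, or None if not detected

def estimate_pose(image) -> List[Keypoint]:
    """Part 1: return the 15 MPII body joints found by the CNN pose model."""
    return [None] * 15                 # stub

def detect_devices(image) -> Dict[str, Tuple[int, int, int, int]]:
    """Part 2: return bounding boxes of peripherals (monitor, mouse, ...) from Haar cascades."""
    return {}                          # stub

def rula_assessment(joints: List[Keypoint], devices) -> Dict[str, object]:
    """Part 3: compute joint angles, fill the RULA worksheet and return score + advice."""
    return {"grand_score": None, "action": None}   # stub

def analyse_image(image):
    joints = estimate_pose(image)
    devices = detect_devices(image)
    return rula_assessment(joints, devices)
```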
Chapter 3
RULA Assessment
Awkward working postures lead to different musculoskeletal disorders, and it is necessary to carry out a proper ergonomic assessment in such settings; proper benchmarking is needed to rectify the problems. For this purpose many assessment tools are available which ergonomic researchers use, for example RULA, REBA and OWAS. These tests are conducted on a particular working posture and a final test score is produced which gives some insight into the severity associated with that posture.
RULA was developed to:
• provide a method of screening a working population quickly for exposure to a likely risk of work-related upper limb disorders;
• identify the muscular effort which is associated with working posture, exerting force and performing static or repetitive work, and which may contribute to muscle fatigue;
• give results which can be incorporated in a wider ergonomics assessment covering epidemiological, physical, mental, environmental and organizational factors.
RULA was developed in three phases:
1. Development of the method for recording the working posture
2. Development of the system for grouping the body part posture scores
3. Development of the grand score and action list
3.1.1 Phase 1
The whole body is divided into two groups, Group A and Group B. Group A consists of the upper arm, lower arm and wrist; Group B consists of the neck, trunk and legs. Each body part is assigned an individual score: the number 1 is given to the range of movement or working posture where the risk factors present are minimal, and higher numbers are allocated to parts of the movement range with more extreme postures.
Figure 3.1 shows the diagrams for scoring the posture of the body parts in Group A, which are the upper arm, lower arm and wrist, with a section to record the pronation or supination occurring. Similarly, the scoring scheme for Group B is explained in Figure 3.2.
As RULA can be conducted quickly, an assessment can be made of each posture in the work cycle. When using RULA, only the right or left side is assessed at a time. After observing the operator it may be obvious that only one arm is under load; however, if undecided, the observer should assess both sides. RULA can be extended to a more detailed examination. Most noticeably, the postural assessment of the fingers and thumb may be required in some investigations where exposure to risk factors is high for these digits. RULA does not include such detail, although any force exerted by the fingers or thumb is recorded as part of the assessment process.
3.1.2 Phase 2
A single score is required from Groups A and B which represents the level of postural loading of the musculoskeletal system due to the combined body parts.
Muscle use and force scores: a scoring system was developed to include the additional load on the musculoskeletal system caused by excessive static muscle work, repetitive motions and the requirement to exert force or maintain an external load while working. The total scores for Group A and Group B are calculated as follows:
Total Group A score = Score in Table A + Muscle score + Force/Load score
Total Group B score = Score in Table B + Muscle score + Force/Load score
The muscle score and force/load score represent the external risk factors which cause fatigue; the subsequent tissue damage depends on the time for which the operator is exposed. RULA provides a simplified and conservative rating system to be used as a guide to indicate whether these risk factors are present; it would be the function of a subsequent, more detailed assessment to establish their extent and effect on the operator's wellbeing and work. Muscle use is defined as repetitive if the action is repeated more than four times a minute. This is acknowledged as a conservative general definition from which a risk may be present; further assessment would, however, be required. If the load or force is 2 kg or less and held intermittently, the score is 0. If the intermittent load is 2-10 kg, a score of 1 is given. If a load of 2-10 kg is static or repeated, the score is 2. The score is also 2 if the load is intermittent but more than 10 kg. If a load or force of more than 10 kg is experienced statically or repeatedly, the score is 3. Lastly, if a load or force of any magnitude is experienced with a rapid build-up or a jolting action, the score is also 3.
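As an illustration, the scoring rules just described can be written as a small Python helper. The function names are my own, and the +1 muscle-use value for a mainly static posture is an assumption taken from the standard RULA worksheet rather than from the text above.

```python
# Sketch of the muscle-use and force/load scores described above.
# Thresholds follow the text; treating "static" as a posture held for a long period
# (and scoring it +1) is an assumption taken from the standard RULA worksheet.

def muscle_score(repetitions_per_minute: float, held_static: bool) -> int:
    """+1 if the posture is mainly static or the action repeats more than 4 times/minute."""
    return 1 if held_static or repetitions_per_minute > 4 else 0

def force_load_score(load_kg: float, intermittent: bool, shock_or_rapid_buildup: bool = False) -> int:
    """Force/load score following the rules stated in the text."""
    if shock_or_rapid_buildup:
        return 3                          # rapid build-up or jolting action, any magnitude
    if load_kg <= 2:
        return 0 if intermittent else 1   # <= 2 kg intermittent -> 0 (static case not in text: assumption)
    if load_kg <= 10:
        return 1 if intermittent else 2   # 2-10 kg: intermittent -> 1, static/repeated -> 2
    return 2 if intermittent else 3       # > 10 kg: intermittent -> 2, static/repeated -> 3

# Example: a 5 kg load that is held statically adds 2 points to the group score.
print(force_load_score(5, intermittent=False))    # -> 2
```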
3.1.3 Phase 3
In this phase the total scores from both groups are used to obtain the overall RULA score according to Table C. The inference from the grand score is as follows.
Figure 3.1: RULA Table A
Figure 3.2: RULA Table B
Figure 3.3: RULA Table C
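A sketch of how the Phase 3 lookup could be implemented is given below. The Table C values themselves appear only in Figure 3.3, so TABLE_C is left as a placeholder to be filled from that worksheet, and the action-level wording follows the standard RULA action list rather than anything stated in this report.

```python
# Sketch of the Phase 3 lookup: the grand score comes from Table C (Figure 3.3),
# indexed by the total Group A and Group B scores. TABLE_C is a placeholder
# structure; the real values must be copied from the worksheet in Figure 3.3.

TABLE_C = [[0] * 7 for _ in range(8)]     # 8 rows (score A 1..8+), 7 cols (score B 1..7+): fill from Figure 3.3

ACTION_LEVELS = {                          # standard RULA action levels (wording approximate)
    (1, 2): "Acceptable posture",
    (3, 4): "Investigate further",
    (5, 6): "Investigate further and change soon",
    (7, 7): "Investigate and change immediately",
}

def grand_score(total_a: int, total_b: int) -> int:
    row = min(total_a, 8) - 1              # scores above the table range are clamped
    col = min(total_b, 7) - 1
    return TABLE_C[row][col]

def action_level(score: int) -> str:
    for (lo, hi), text in ACTION_LEVELS.items():
        if lo <= score <= hi:
            return text
    return "Investigate and change immediately"
```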
This will reduce fatigue and strain on the neck, shoulders, back and legs. All users should trial different positions to work out the best set-up for themselves. An acceptable and well supported seated position means:
• the head and neck are in a forward facing and midline position (i.e. no backward arching of the neck or
forward extension of the chin)
• the shoulders are relaxed and symmetrical and elbows slightly closer to the side of the body
• the preferred keying posture is used, depending on the style of keying (i.e. traditional style or with forearm support)
• the back is supported by the chair backrest. The curved lower part of the backrest should fit into the
lower back or the lumbar curve
• having an open angle of 100-120 degrees (slightly more than a right angle) at the hip. This can be
achieved by adjusting the seat pan tilt and the backrest
Figure 3.4 below illustrates an acceptable sitting position at the workstation. It allows for well supported postures that can be changed within a comfortable range throughout the day.
Chapter 4
Image Processing
4.0.2 Convolution
Assume that the input to the CNN is a coloured image of 32x32 pixels in width and height. Since it is a coloured image, it has 3 channels which contain the intensities of the primary colours (RGB), so the input can be represented by a three-dimensional matrix of size 32x32x3.
Figure 4.2: CNN input size visualization
If we fully connect this input to a hidden layer with just two neurons, the weight tensor becomes four-dimensional, i.e. 32x32x3x2, so 6,144 parameters are needed to connect the input to only two nodes, and two neurons are not enough for any useful processing of an image. If instead we connect the input image to a next layer of neurons with exactly the same height and width (a network which could be applied, for example, to edge detection), the number of weight connections becomes 32x32x3 by 32x32.
A better and more efficient idea is to look at local regions instead of the whole image. Figure 4.3 shows this regional connection to the next layer, i.e. each neuron sees only a part of the previous layer. Thus, if we want 32x32 neurons in the next layer, we will have 5x5x3 by 32x32 connections, which is 76,800 connections (compared to 3,145,728 for full connectivity).
This shows that the number of parameters drops drastically. The next simplifying assumption is to keep the local connection weights fixed for every neuron of the next layer. This again reduces the number of parameters, bringing the number of weights down to just 5x5x3 = 75; one advantage of these assumptions is that the number of free parameters decreases from around 3 million to just 75. Fixing the local connection weights is equivalent to sliding a window of size 5x5x3 over the image, which makes it possible to detect and recognise features regardless of their position in the image. This is the reason why the operation is called convolution.
Figure 4.4 shows the effect of a convolution matrix for a window of size 3x3; the figure illustrates edge detection in image processing. These sliding windows are also called filters, since they act like classic filters (here the filter extracts the edges in an image). In a convolutional neural network, however, these filters are initialised randomly and then shaped by the training procedure into filters that are more suitable for the given task. Adding more layers is beneficial for feature extraction: each layer corresponds to a separate filter matrix, and therefore different features can be extracted from the image. In the figure, another filter looks at the same part of the input image.
Figure 4.4: CNN Filter visualization
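As a concrete illustration of the filter idea (not code from this project), the sketch below applies a fixed 3x3 edge-detection kernel to an image with OpenCV; in a CNN the kernel values would instead be learned during training, and the image file name here is only illustrative.

```python
# Applying a hand-crafted 3x3 edge-detection kernel with OpenCV, to illustrate the
# "filter" idea described above. In a CNN the kernel entries are learned, not fixed.
import cv2
import numpy as np

kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=np.float32)   # responds strongly at edges

image = cv2.imread("worker_posture.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative file name
edges = cv2.filter2D(image, -1, kernel)                         # -1: keep the input depth
cv2.imwrite("edges.png", edges)
```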
4.0.3 Stride
CNNs offer further opportunities to reduce the number of parameters while limiting the side effects. In the previous example the next layer's nodes have large overlaps with their neighbours, because the regions they look at overlap; we can control the degree of this overlap through the stride. For example, Figure 4.5 shows a 7x7 image: if we move the filter one node at a time, we obtain only a 5x5 output, and the outputs of the three left-most positions overlap. If instead we make every stride 2, the output will be 3x3. In short, increasing the stride reduces not only the overlap but also the size of the output.
For an image of dimension NxN, a filter of size FxF and a stride of S, the output size O can be calculated as
O = 1 + (N − F) / S
4.0.4 Padding
In a CNN the convolution step causes a loss of information at the border of the image: border pixels are only partially covered as the filter slides, so they never get the chance to contribute fully. Appending zeros around the image is one solution to this and is known as zero padding; zero padding also controls the output size.
For example, if N = 7, F = 3 and the stride is 1, the output size will be 5x5. If we introduce one layer of zero padding, the output remains 7x7 (the same as the input image size). The modified formula for the output size including zero padding becomes
O = 1 + (N + 2P − F) / S
where P is the number of layers of zero padding. This padding idea helps prevent the network output size from shrinking with depth; therefore, it is possible to build convolutional networks of any depth.
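The two output-size formulas can be checked with a small Python helper (the function name is my own):

```python
# Output spatial size of a convolution, following the two formulas above.
def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    """O = 1 + (N + 2P - F) / S  (P = 0 gives the unpadded formula)."""
    return 1 + (n + 2 * padding - f) // stride

print(conv_output_size(7, 3, stride=1))             # 5  -> 5x5 output
print(conv_output_size(7, 3, stride=2))             # 3  -> 3x3 output
print(conv_output_size(7, 3, stride=1, padding=1))  # 7  -> size preserved by zero padding
print(conv_output_size(32, 5, stride=1))            # 28
```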
4.0.5 Features of CNN
4.0.5.1 Convolution formula
Convolution can be explained by the following formula for each pixel of the output:
(x ∗ w)[i, j] = Σ_m Σ_n x[m, n] · w[i − m, j − n]
This equation is visualised in Figure 4.8 below: the element-by-element product of the input and the kernel is aggregated, and the sum becomes the corresponding point in the next layer.
Figure 4.8: Visualization of Convolution Formula
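A direct, unoptimised NumPy translation of this sliding-window sum is sketched below; it mirrors the description above rather than aiming for speed, and the averaging kernel is just an arbitrary example.

```python
# Direct NumPy implementation of the sliding-window sum described above: for each
# output pixel, the element-by-element product of the input patch and the kernel is
# aggregated. (CNN libraries implement this cross-correlation form; the strict
# convolution formula additionally flips the kernel.)
import numpy as np

def conv2d(x: np.ndarray, w: np.ndarray, stride: int = 1) -> np.ndarray:
    fh, fw = w.shape
    oh = 1 + (x.shape[0] - fh) // stride
    ow = 1 + (x.shape[1] - fw) // stride
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + fh, j * stride:j * stride + fw]
            out[i, j] = np.sum(patch * w)            # aggregate the element-wise product
    return out

image = np.random.rand(7, 7)
kernel = np.ones((3, 3)) / 9.0                       # simple averaging filter, as an example
print(conv2d(image, kernel).shape)                   # (5, 5), matching the output-size formula
print(conv2d(image, kernel, stride=2).shape)         # (3, 3)
```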
4.0.5.2 Non linearity
The non-linearity layer comes just after the convolution. It is applied in order to saturate, or limit, the generated output. The sigmoid was the popular non-linear function for many years and is described by the equation
Sigmoid(x) = 1 / (1 + exp(−x))
The hyperbolic tangent function has also been used. Nowadays many non-linear functions are in use, but the Rectified Linear Unit (ReLU) has become the most popular for the following reasons: ReLU(x) = 0 if x < 0 and ReLU(x) = x otherwise, and the derivative of ReLU is 0 if x < 0 and 1 otherwise.
• The ReLU creates a sparser representation, because the zero in the gradient leads to obtaining a complete zero, whereas sigmoid and tanh always have non-zero results and gradients, which might not be favourable for training (a short NumPy illustration follows below).
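For concreteness, the two activations discussed above can be written and compared in a few lines of NumPy (the helper names are my own):

```python
# The sigmoid and ReLU activations described above, applied element-wise with NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)           # 0 for negative inputs, identity otherwise

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(sigmoid(z))                     # smoothly squashed into (0, 1), never exactly zero
print(relu(z))                        # [0.  0.  0.  1.5] -- sparse: negatives become exact zeros
```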
4.0.5.3 Pooling
Pooling down-samples the output after the non-linear function and reduces complexity; it is essentially a reduction of resolution. Pooling does not affect the number of filters. Max pooling and average pooling are two examples. Max pooling partitions the image into rectangular sub-regions and returns only the maximum value inside each sub-region; 2x2 max pooling is the most commonly used. When pooling is performed on the top-left 2x2 block (the pink area in the figure), the window then moves by 2 and focuses on the top-right part, i.e. a stride of 2 is used.
Pooling does not preserve the position of information. Therefore, it should be applied only when the presence of information is important, rather than its exact spatial location.
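A minimal NumPy sketch of 2x2 max pooling with stride 2, as described above:

```python
# 2x2 max pooling with stride 2: each 2x2 block is replaced by its maximum value,
# halving the spatial resolution.
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2          # drop odd edge rows/cols
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

a = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [0, 1, 3, 4]])
print(max_pool_2x2(a))
# [[6 4]
#  [7 9]]
```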
4.0.5.4 Fully connected layer
The fully connected layer is similar to the layers of classic neural networks, where the neurons are arranged in a single line: each node of a fully-connected layer is directly connected to every node in both the previous and the next layer. The nodes in the last frames of the pooling layer are connected, as a vector, to the first fully-connected layer. These layers contain most of the parameters of the CNN and therefore take a long time to train, since so many parameters require heavy computation over the training examples.
4.1 Convolutional Architecture for Fast Feature Embedding (CAFFE)
It is worthwhile to release trained models along with the papers advertising their performance, but trained models alone are not sufficient for rapid research progress, and few toolboxes offer truly off-the-shelf deployment of state-of-the-art models; those that do are often not computationally efficient and thus unsuitable for commercial deployment. Caffe was introduced to mitigate these issues: a fully open-source framework that affords clear access to deep architectures. The code is written in clean, efficient C++, with CUDA used for GPU
computation, and nearly complete, well-supported bindings to Python/Numpy and MATLAB. Caffe adheres
to software engineering best practices, providing unit tests for correctness and experimental rigor and speed
for deployment. It is also well-suited for research use, due to the careful modularity of the code, and the clean
separation of network definition (usually the novel part of deep learning research) from actual implementation.
While Caffe was first designed for vision, it has been adopted and improved by users in speech recognition,
robotics, neuroscience, and astronomy. We hope to see this trend continue so that further sciences and industries
can take advantage of deep learning.
4.1.1 Highlights of Caffe
4.1.1.1 Modularity
The software is designed from the beginning to be as modular as possible, allowing easy extension to new data
formats, network layers, and loss functions. Lots of layers and loss functions are already implemented, and
plentiful examples show how these are composed into trainable recognition systems for various tasks.
4.1.1.2 Separation of representation and implementation
Caffe model definitions are written as config files using the Protocol Buffer language. Caffe supports network
architectures in the form of arbitrary directed acyclic graphs. Upon instantiation, Caffe reserves exactly as
much memory as needed for the network, and abstracts from its underlying location in host or GPU. Switching
between a CPU and GPU implementation is exactly one function call.
4.1.1.3 Test coverage
Every single module in Caffe has a test, and no new code is accepted into the project without corresponding
tests. This allows rapid improvements and refactoring of the codebase, and imparts a welcome feeling of
peacefulness to the researchers using the code.
4.1.1.4 Python and MATLAB bindings
For rapid prototyping and interfacing with existing research code, Caffe provides Python and MATLAB bindings. Both languages may be used to construct networks and classify inputs. The Python bindings also expose the solver module for easy prototyping of new training procedures.
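A minimal illustration of the Python bindings (pycaffe) is sketched below; the prototxt/caffemodel file names, the input blob name and the input size are placeholders, not files or settings used in this project.

```python
# Minimal pycaffe usage sketch: load a trained network and run a forward pass.
# The prototxt/caffemodel names and the "data" blob are placeholders.
import numpy as np
import caffe

caffe.set_mode_cpu()                                   # or caffe.set_mode_gpu()
net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

# Blobs are 4-D (num, channels, height, width); feed one dummy image and read the outputs.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
net.blobs["data"].data[...] = dummy
output = net.forward()
print({name: blob.shape for name, blob in output.items()})
```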
4.1.1.5 Pre-trained reference models
Caffe provides (for academic and non-commercial use—not BSD license) reference models for visual tasks,
including the landmark “AlexNet” ImageNet model with variations and the R-CNN detection model. More are
scheduled for release. We are strong proponents of reproducible research: we hope that a common software
substrate will foster quick progress in the search over network architectures and applications.
4.1.2 Architecture
Caffe stores and communicates data in 4-dimensional arrays called blobs. Blobs provide a unified memory
interface, holding batches of images (or other data), parameters, or parameter updates. Blobs conceal the
computational and mental overhead of mixed CPU/GPU operation by synchronizing from the CPU host to the
GPU device as needed. In practice, one loads data from the disk to a blob in CPU code, calls a CUDA kernel to
do GPU computation, and ferries the blob off to the next layer, ignoring low-level details while maintaining a
high level of performance. Memory on the host and device is allocated on demand (lazily) for efficient memory
usage.
Models are saved to disk as Google Protocol Buffers, which have several important features: minimal-size
binary strings when serialized, efficient serialization, a human-readable text format compatible with the binary
version, and efficient interface implementations in multiple languages, most notably C++ and Python.
A Caffe layer is the essence of a neural network layer: it takes one or more blobs as input, and yields one or more
blobs as output. Layers have two key responsibilities for the operation of the network as a whole: a forward pass
that takes the inputs and produces the outputs, and a backward pass that takes the gradient with respect to the
output, and computes the gradients with respect to the parameters and to the inputs, which are in turn back-
propagated to earlier layers. Caffe provides a complete set of layer types including: convolution, pooling, inner
products, nonlinearities like rectified linear and logistic, local response normalization, elementwise operations,
and losses like softmax and hinge. These are all the types needed for state-of-the-art visual tasks. Coding
custom layers requires minimal effort due to the compositional construction of networks.
In Figure 5.1 of the Caffe framework, blue boxes represent layers and yellow octagons represent data blobs produced by or fed into the layers.
Caffe does all the bookkeeping for any directed acyclic graph of layers, ensuring correctness of the forward
and backward passes. Caffe models are end-to-end machine learning systems. A typical network begins with
a data layer that loads from disk and ends with a loss layer that computes the objective for a task such
as classification or reconstruction. The network is run on CPU or GPU by setting a single switch. Layers
come with corresponding CPU and GPU routines that produce identical results (with tests to prove it). The
CPU/GPU switch is seamless and independent of the model definition
Caffe trains models by the fast and standard stochastic gradient descent algorithm. Figure 5.1 shows a typical
example of a Caffe network training: a data layer fetches the images and labels from disk, passes it through
multiple layers such as convolution, pooling and rectified linear transforms, and feeds the final prediction into
a classification loss layer that produces the loss and gradients which train the whole network. This example is
found in the Caffe source code at lenet-train.prototxt. Data are processed in mini-batches that pass through
the network sequentially. Vital to training are learning rate decay schedules, momentum, and snapshots for
stopping and resuming, all of which are implemented and documented.
4.2 Binary Large Object (BLOB)
The purpose of BLOB extraction is to isolate the BLOBs (objects) in a binary image. A BLOB consists of a group of connected pixels. Whether or not two pixels are connected is defined by the connectivity, that is, which pixels are neighbours and which are not. The two most often applied types of connectivity are illustrated in Figure 4.12. The 8-connectivity is more accurate than the 4-connectivity, but the 4-connectivity is often applied since it requires fewer computations and hence can process the image faster. The effect of the two different types of connectivity is illustrated in the figure, where the binary image contains either one or two BLOBs depending on the connectivity.
Figure 4.12: Blob connectivity
4.2.1 The Recursive Grass-Fire Algorithm
A number of different algorithms exist for finding the BLOBs, and such algorithms are usually referred to as connected component analysis or connected component labelling. In the following we describe one of these algorithms, known as the grass-fire algorithm; 4-connectivity is used for simplicity.
Figure 4.13: 4 connectivity
This is outside the image and therefore not an object pixel. All the neighbors of (4, 0) have now been
investigated and the algorithm therefore traces back and looks at the second neighbor of (3, 0), namely (3,1).
This is an object pixel and is therefore labeled 1, burned and becomes the new focus of attention. In this way
the algorithm also finds (3,2) to be part of object 1 and finally ends by investigating the fourth neighbor of
(2, 0). All pixels which are part of the top object have now been labeled with the same label 1, meaning that
this BLOB has been segmented. The algorithm then moves on following the scan path until it meets the next
object pixel (1, 3), which is then labeled 2, and starts a new grass-fire.
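A compact Python sketch of the grass-fire idea is given below. It uses an explicit stack instead of recursion (to avoid Python's recursion-depth limit) and 4-connectivity as in the text; it is an illustration, not code taken from this project.

```python
# Grass-fire style connected component labelling on a binary image (4-connectivity).
# Implemented with an explicit stack rather than recursion.
import numpy as np

def label_blobs(binary: np.ndarray) -> np.ndarray:
    labels = np.zeros_like(binary, dtype=int)
    next_label = 0
    h, w = binary.shape
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:        # unburned object pixel: start a new fire
                next_label += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and binary[cy, cx] and labels[cy, cx] == 0:
                        labels[cy, cx] = next_label       # "burn" the pixel
                        stack.extend([(cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)])
    return labels

img = np.array([[1, 1, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 0, 1]])
print(label_blobs(img))      # two BLOBs under 4-connectivity: labels 1 and 2
```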
4.3 Pose Estimation
2D pose estimation predicts the X and Y coordinates of each keypoint in an image; 3D pose estimation additionally transforms an object in a 2D image into a 3D object by adding a z-dimension to the prediction. Here 2D pose estimation is used to examine the working posture.
In this project pose estimation is used for identifying key body joints such as the head, neck, chest, shoulders and wrists; a total of 15 such key joints are identified. Many image datasets with keypoint annotations are available, COCO and MPII being two examples. Here the MPII dataset is used, with the following keypoint indices:
Head (0), Neck (1), Right Shoulder (2), Right Elbow (3), Right Wrist (4), Left Shoulder (5), Left Elbow (6), Left Wrist (7), Right Hip (8), Right Knee (9), Right Ankle (10), Left Hip (11), Left Knee (12), Left Ankle (13), Chest (14).
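The report does not reproduce its inference code, but a common way to run an MPII-trained Caffe pose model is through OpenCV's DNN module, as sketched below; the model file names, the 368x368 input size and the 0.1 confidence threshold are assumptions. The joint angles needed for the RULA worksheet can then be computed from pairs of these keypoints.

```python
# Running an MPII-trained Caffe pose-estimation model with OpenCV's DNN module.
# The prototxt/caffemodel names, input size and confidence threshold are assumptions.
import cv2

PROTO = "pose_deploy_linevec_faster_4_stages.prototxt"
WEIGHTS = "pose_iter_160000.caffemodel"
N_POINTS = 15                                   # the 15 MPII keypoints listed above

net = cv2.dnn.readNetFromCaffe(PROTO, WEIGHTS)
frame = cv2.imread("worker_posture.jpg")        # illustrative file name
h, w = frame.shape[:2]

blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368), (0, 0, 0), swapRB=False, crop=False)
net.setInput(blob)
heatmaps = net.forward()                        # 1 x K x H' x W' confidence maps

keypoints = []
for k in range(N_POINTS):
    _, conf, _, point = cv2.minMaxLoc(heatmaps[0, k])
    x = int(point[0] * w / heatmaps.shape[3])   # rescale heatmap coordinates to image size
    y = int(point[1] * h / heatmaps.shape[2])
    keypoints.append((x, y) if conf > 0.1 else None)

print(keypoints)                                # keypoints[0] is the head, keypoints[14] the chest
```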
4.4 Haar Cascade Classifier
Each Haar feature is a single value obtained by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle. All possible sizes and locations of each kernel are then used to calculate a large number of features (even a 24x24 window results in over 160,000 features). For each feature calculation we need the sum of the pixels under the white and black rectangles. To solve this, the authors introduced the integral image: it reduces the calculation of the sum of the pixels of any rectangle, however large, to an operation involving just four array references, which makes things very fast.
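The integral image trick can be illustrated in a few lines of NumPy; rect_sum below is my own helper name:

```python
# Integral image: the sum of any axis-aligned rectangle reduces to four lookups.
import numpy as np

img = np.random.randint(0, 256, size=(24, 24))

# Integral image with a zero row/column prepended so the corner arithmetic stays simple.
ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] using only four entries of the integral image."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

# Check against direct summation: the two numbers match.
print(rect_sum(2, 3, 10, 8), img[2:10, 3:8].sum())
```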
Among all the features calculated, however, most are irrelevant. Consider the image below: the top row shows two good features. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second feature selected relies on the property that the eyes are darker than the bridge of the nose. But applying the same windows to the cheeks or any other place is irrelevant. The best features are selected using AdaBoost.
Figure 4.15: Haar feature extraction
For this, each and every feature is applied to all the training images. For each feature, the algorithm finds the best threshold which classifies the faces into positive and negative. Obviously there will be errors and misclassifications; the features with the minimum error rate are selected, i.e. the features that best separate the face and non-face images.
It was already mentioned that a 24x24 window results in more than 160,000 features, and even applying just the 6,000 selected features to every window is time consuming. For this, Paul Viola and Michael Jones introduced the concept of a cascade of classifiers. Instead of applying all 6,000 features to a window, the features are grouped into different stages of classifiers and applied one by one (normally the first few stages contain very few features). If a window fails the first stage, it is discarded and the remaining features are not considered; if it passes, the second stage of features is applied, and the process continues. A window which passes all stages is a face region. The authors' detector had more than 6,000 features arranged in 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages (the two features in the image above are actually the best two features obtained from AdaBoost). According to the authors, on average only 10 features out of the 6,000+ are evaluated per sub-window.
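Once a cascade has been trained (this project trains its own cascade.xml for the monitor, as described in Chapter 6), detection with OpenCV takes only a few lines. The file names below are illustrative assumptions:

```python
# Using a trained Haar cascade for detection with OpenCV.
# "cascade.xml" stands for the monitor cascade trained for this project (see Chapter 6);
# the image file name is illustrative.
import cv2

cascade = cv2.CascadeClassifier("cascade.xml")
image = cv2.imread("workstation.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors control the image-pyramid step and detection strictness.
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", image)
```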
Chapter 5
Project Work
From the GUI the user selects the appropriate option; along with this information the RULA test can be started. Tables A, B and C are replicated in Python, from which the group scores and the final RULA score are estimated. In addition, the severity level and recommendations are given as the final output.
Chapter 6
Results
The pose estimation outputs the positions of all 15 joints with reasonably good accuracy for any given pose. The Haar cascade model is trained using a set of positive and negative images, and the trained model is obtained as a cascade.xml file which is ready to deploy. Using this model the monitor is detected and its position in the given image is tracked effectively. Haar detection for the eye is also used. This gives individual positions for the monitor and the eye contained in a single image, and inferences are made from their relative position. The GUI made with Python helps the user to provide additional external influencing factors from a dropdown menu, which makes the approach more user friendly. By incorporating all this information, the rest of the RULA process is automated using Python. For this particular posture, the total RULA score is 4; this implies that the risk level is low and an improvement in the posture can be made but is not strictly necessary.
Figure 6.2: Positive and Negative images for dataset preparation
Figure 6.4: Python GUI for additional information
A random image is chosen (left) and the pose estimation is carried out; the results (right) are shown in Figure 10.5.
Figure 6.6: Final RULA result for the computer workplace posture
From the summary it can be seen that the wrist position has a score of 3, which is caused by the inclination due to the wrong posture. The upper and lower arm postures also contribute a score of 2 each; the summary shows that these body parts need to be changed. From the Group B section, the neck contributes a score of 3, which is also an area where investigation is needed.
Chapter 7
Conclusions
7.1 Conclusions
With the help of image processing, the posture analysis part of the ergonomics assessment is automated, which simplifies the procedure. With this advancement we just need to take an image of an employee in his/her normal working posture; the rest of the assessment is done automatically and the output gives the action level of the current working posture, so that the necessary action can be taken to avoid different musculoskeletal disorders. Hence one does not have to be an ergonomics expert to conduct a cost-effective workplace assessment, which makes this approach suitable for small scale industries, where the test can be conducted effortlessly.
7.2 Benefits
With this method the user just needs to take a picture of a particular work posture, for example as in Figure 10.5, and choose the suitable options from the dropdown list as shown in Figure 10.4; the rest is fully automated and a complete summary is given after processing the posture. The user does not need to buy any expensive sensors, does not need to worry about their placement on the body, and does not need to hire an ergonomics expert for the assessment. The pose estimation used here gives very accurate results in predicting the different body joints, hence the overall results are reliable.
Bibliography
[1] Ganesh S. Jadhav and Gurunath V. Shind, Ergonomic Evaluation Tools RULA and REBA Analysis: A Case Study.
[2] N. A. Ansari and M. J. Sheikh, Evaluation of Work Posture by RULA and REBA: A Case Study, IOSR Journal of Mechanical and Civil Engineering (IOSR-JMCE), Volume 11, Issue 4, Ver. III (Jul-Aug 2014), pp. 18-23.
[3] Zahra Sedighi Maman, Mohammad Ali Alamdar Yazdi, Lora A. Cavuoto and Fadel M. Megahed, A data-driven approach to modeling physical fatigue in the workplace using wearable sensors, Applied Ergonomics 65 (2017), pp. 515-529.
[4] Pierre Plantard and Edouard Auvinet, Pose Estimation with a Kinect for Ergonomic Studies: Evaluation of the Accuracy Using a Virtual Mannequin, DOI: 10.3390/s150101785; Marta Gómez-Galán and José Pérez-Alonso, Musculoskeletal disorders: OWAS review, Industrial Health 55(4), May 2017, DOI: 10.2486/indhealth.2016-0191.
[5] Farshad Soheilifard and Amir Rahbar, Ergonomic investigation of workers in tea factories using REBA and OWAS methods: case study, Agricultural Engineering International: The CIGR e-journal 19(3), October 2017.
[6] Nipun D. Nath and Theodora Chaspari, Automated ergonomic risk monitoring using body-mounted sensors and machine learning, https://doi.org/10.1016/j.aei.2018.08.020.
[7] Megh Doshi, Harsh Shah and Heetika Gada, Wearable DAQ (Data Acquisition System) for measurement of R.U.L.A. (Rapid Upper Limb Assessment) rating of vehicles, IEEE Xplore Part Number: CFP18K74-ART, ISBN: 978-1-5386-2842-3.
[8] Alberto Ranavolo, Francesco Draicchio, Tiwana Varrecchia, Alessio Silvetti and Sergio Iavicoli, Wearable Monitoring Devices for Biomechanical Risk Assessment at Work: Current Status and Future Challenges, Int. J. Environ. Res. Public Health 2018, 15, 2001, DOI: 10.3390/ijerph15092001.
[9] Pierre Plantard, Hubert P. H. Shum, Anne-Sophie Le Pierres and Franck Multon, Validation of an ergonomic assessment method using Kinect data in real workplace conditions.
[10] V. M. Manghisi et al., Real time RULA assessment using Kinect v2 sensor, Applied Ergonomics (2017), http://dx.doi.org/10.1016/j.apergo.2017.02.015.
[11] Darius Nahavandi and Mohammed Hossny, Skeleton-free RULA ergonomic assessment using Kinect sensors, Intelligent Decision Technologies 11 (2017), pp. 275-284, DOI: 10.3233/IDT-170292, IOS Press.
[12] Saad Albawi, Tareq Abed Mohammed and Saad Al-Zawi, Understanding of a Convolutional Neural Network, IEEE, 2017.
[13] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama and Trevor Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, UC Berkeley EECS.
[14] Gines Hidalgo, Yaadhav Raaj, Haroon Idrees, Donglai Xiang, Hanbyul Joo, Tomas Simon and Yaser Sheikh, Single-Network Whole-Body Pose Estimation, arXiv:1909.13423v1 [cs.CV], 30 Sep 2019.
[15] Paul Viola and Michael Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, Conference on Computer Vision and Pattern Recognition, 2001.
[16] M. F. Ghazali, M. Mat Salleh, N. Zainon, S. Zakaria and C. D. M. Asyraf, RULA and REBA Assessments in Computer Laboratories, National Symposium on Advancements in Ergonomics and Safety (ERGOSYM2009), 1-2 December 2009, Perlis, Malaysia.
[17] Ergonomic guide to computer based workstations, PN 11334, Version 1, August 2012.