Informatics in Medicine Unlocked 26 (2021) 100717

Research on feature mining algorithm and disease diagnosis of pulse signal based on piezoelectric sensor☆,☆☆
Fan Lin a, b, Jincheng Zhang a, *, Zhongmin Wang a, b, Xiaokang Zhang a, Ruiling Yao a, Yan Li a
a School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an, Shaanxi, 710121, China
b Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an University of Posts and Telecommunications, Xi’an, Shaanxi, 710121, China


Keywords: Pulse diagnosis; Pathological feature mining; Physiological signal feature; Pulse wave analysis

The human pulse carries a variety of information that reflects the internal state of the body. However, the classical method of pulse diagnosis in traditional Chinese medicine (TCM) relies too heavily on the doctor’s experience, and its diagnostic results are too subjective. One way to overcome this problem with modern technology, grounded in the principles of TCM pulse diagnosis, is to use photoelectric sensors to collect the pulse signals of healthy people and patients with chronic diseases, organize the detailed pulse information into a data set, and analyze it with algorithms. However, this approach still makes it difficult to understand the patient’s physiological condition in detail or to explain the internal connection between abnormal pulse conditions and the underlying physiological condition. In our experiment, after denoising, smoothing, and removing the baseline drift of the subjects’ pulse data, we designed two algorithms to describe the differences between the two-dimensional pulse images of normal people and patients with chronic diseases. The resulting feature values are converted into a multi-dimensional array and used to train a support vector machine (SVM) classifier, whose classification accuracy is higher than that obtained with basic temporal features. Experimental results show that specific feature mining algorithms are feasible for disease detection. Through this analysis, we identified the pathological characteristics reflected in the two-dimensional pulse image, uncovered the internal connection between human pulse waveform characteristics and disease, described it algorithmically, and attempted to establish a method for detecting specific diseases from photoelectric signals.

* Corresponding author.

1. Introduction

Pulse diagnosis is a very common physical diagnostic method in TCM, in which the practitioner presses the pulse with the fingers to diagnose diseases. Before modern medicine became widespread, wrist pulse diagnosis was always the main method of diagnosing diseases in TCM [1–3]. However, diagnosing the pulse condition with the doctor’s fingers alone is highly subjective, so it is difficult to confirm the clinical holistic diagnosis (CHD) of TCM systematically [4–8]. Pulse signals can be used effectively to analyze a person’s health status and to reflect pathological changes in their physical condition [9]. As an important source for health status evaluation, the wrist pulse signal contains important information about the state of the human body; fitting a bi-modal Gaussian model to the data and extracting spatial features from the fit can, to some extent, diagnose diseases quickly and at low cost [10]. However, in clinical practice the accuracy of pulse diagnosis depends heavily on the practitioner’s skills and experience, and different practitioners may not give identical results for the same patient [11,12]. The lack of high-precision testing equipment and of the recording and analysis of pulse data are inherent shortcomings of ancient Chinese medicine, and its clinical diagnoses are also difficult to verify.

At present, using electronic equipment to detect pulse signals is the main way of combining modern TCM with computers. Moreover, the pulse signal can also reflect the physiological characteristics of the subject [13,14]. Research on TCM pulse diagnosis analyzes the pulse of the subject to judge the state of the patient. Using sensors and computers to obtain more detailed information than TCM practitioners can gather from experience is technically feasible, and machine learning techniques have been exploited to analyze health conditions based on the acquired pulse signals [15]. Because the pulse signal is a biological signal of great clinical diagnostic value, many sensors have been used to acquire it, including pressure (selected in this paper), photoelectric, electric pulse, and ultrasonic sensors [16–19]. Recently, the use of effective signal processing techniques to process the wrist pulse pressure signal has become a research hotspot. Effective methods for pulse signal analysis include the statistical analysis of wrist pulse signals, multi-feature fusion, temporal and spatial feature extraction, and independent factor (composite factor) analysis and prediction [20–24]. The basic idea of pulse wave analysis (PWA) is to use sensor equipment to collect human pulse information accurately and without interference [25,26]. The waveforms of these data can be used to extract the physiological characteristics of the test set. PWA can not only identify the physiological characteristics of individuals but also be used to analyze the differences between individuals caused by specific diseases [13,21]. PWA makes it possible to quantify the changes in vascular impedance caused by arterial stiffness and/or endothelial dysfunction in patients with cardiovascular disease. For example, a pulse oximeter sensor based on organic materials has been used to estimate the risk of disease by analyzing the photoelectric changes of the blood vessels in the fingertips [27]. Surrogate indicators of arterial stiffness and peripheral arterial resistance [28,29] are variables related to the prognosis of cardiovascular disease and can be determined by PWA. With technological innovation, the accuracy and data volume of piezoelectric sensors will increase, making the results more reliable. One team used a two-dimensional sensor built from a piezoelectric array to collect pulse signals, enabling omnidirectional and high-efficiency analysis of deformation; real-time measurement was achieved through a fast three-dimensional digital image correlation (3D-DIC) method and finally applied to PWA [30,31]. In the field of disease recognition, there is evidence that PWA based on pulse time-domain feature data images, that is, pulse information extracted by detecting the physiological state of the human body, can achieve high accuracy with an SVM classification model in disease classification research, and can serve as a non-invasive, reliable evaluation method for cardiovascular disease [32]. However, existing research that uses this model for disease identification does not take into account the interference caused by individual physiological differences. Because each person’s physiological characteristics differ, the pulse waveform also varies with age, gender, and activity [33]. For example, indices such as the augmentation index (AIx) and augmentation pressure (AP) extracted from the pulse wave of the wrist or the proximal right carotid artery differ between genders and ages [34–36]. Because of the influence of these many factors, it is difficult to define the extracted features as disease features. This research takes these factors into account in the classification algorithm by comparing each sample after feature extraction, and proposes corresponding solutions.

Therefore, this article first analyzes the characteristics of the data graphs generated from the data set. The two pathological feature extraction methods designed in this paper are the stability index and the three-peaks index, and these two sets of features are extracted by two algorithms. The pathological characteristics of the subject’s pulse image are reflected in the two sets of data, so both sets of indices are used for classifier identification. The SVM classification algorithm is applied to the extracted features, so that human diseases can finally be detected while accounting for individual physiological differences. With this experimental goal, we also tried to use photoelectric sensors to measure the pulse signal in pulse diagnosis and to explore the internal relationship between the distribution of the human pulse waveform and pathological characteristics.

2. Data processing and feature extraction of chronic diseases

2.1. Pulse wave decomposition

This study mainly uses photoelectric sensors for measurement, and the measured two-dimensional pulse image shows a typical peak-and-trough structure (as shown in Fig. 1). After decomposing each part of the subject’s image, we can obtain the first-level feature values that distinguish individuals.

As Fig. 1 shows, the pulse wave images of most people have a structure of three peaks (red circles in the figure) and three troughs (red triangles in the figure), from which several characteristic indexes can be extracted: the amplitude of the first peak (h1), the amplitude of the second peak (h2), the amplitude of the third peak (h3), the interval between h1 and h2 (Δta), the interval between h2 and h3 (Δtb), and the time points of the three troughs (t1, t2, t3). Fig. 1(b) is an example of this feature decomposition of a human wrist pulse image. However, some people’s pulse wave images do not fully conform to this pattern because of different measurement methods and physiological states. Examples include decreased vascular elasticity due to advanced age or illness, patients with arrhythmia or a weak heartbeat, and patients with some diseases affecting the circulatory system. By comparing patients’ waveforms, it is easy to identify such waveform interference and classify it accordingly.

However, according to our observations on a large amount of data, not all of the individual differences identified by this method are caused by disease. Most differences in pulse characteristics stem from the individual’s physiological characteristics, and these often affect the feature values more strongly than pathological features do, which makes individual pathological features harder to find. Therefore, because the subjects’ physiological characteristics differ, the segmentation algorithm should consider the factors that affect the shape of the pulse waveform. Because the feature extraction algorithm in this experiment is designed around the subject’s two-dimensional waveform image data, individual physiological differences and the type of sensor used will affect the accuracy of the final classification result during disease recognition, so these factors must be considered before experimenting. On this premise, we can distinguish which factors cause the differences in pulse waveform between patients with the disease and normal people.

As subjects age, the elasticity of their blood vessels gradually decreases. This is reflected in the pulse images of older subjects, in which the boundary between the systolic and diastolic phases also becomes inconspicuous [28,33]. In addition, according to our data, differences in heartbeat between the sexes are also reflected in different pulse waveforms: females’ hearts are more inclined to “beat”, and males’ hearts are more inclined to “systole.” Specifically, the ratio of the systolic to the diastolic phase over the pulse cycle is lower in women’s pulse images than in men’s [32,34,35].

An important finding of this study, consistent with the existing literature, is that the gender-related ratio of systolic to diastolic phase in the pulse cycle is much higher in men than in women, which makes it necessary to consider physiological differences in pulse diagnosis. Individual physiological differences caused by gender and age are comparatively easy to find. They are mainly reflected in the intensity of the pulse waveform and the shape of the systolic waveform. As Fig. 2(a) illustrates, men’s pulse waveform intensity is usually higher than women’s when other factors are the same. Similarly, the diastolic profile of young people’s pulses is much clearer than that of older people (because the elasticity of young people’s blood vessels changes the pulse waveform), whereas the diastolic phase of elderly people’s pulse waveforms lacks an easily observable vasoconstriction point, as Fig. 2(b) illustrates.
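The three-peak, three-trough decomposition described at the start of Section 2.1 (h1, h2, h3, Δta, Δtb, t1, t2, t3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: it assumes one already denoised and baseline-corrected pulse period sampled at a known rate `fs` (a hypothetical parameter), and it locates peaks and troughs by simple neighbour comparison, a technique chosen here for illustration only. The function names `local_maxima`, `local_minima`, and `decompose_period` are hypothetical, and the dictionary keys `dta` and `dtb` stand for Δta and Δtb. Periods with fewer than three peaks or troughs, such as the atypical waveforms discussed above, are returned as `None` so they can be handled separately.

```python
import numpy as np


def local_maxima(x):
    """Indices of interior samples above the left neighbour and not below the right one."""
    return np.where((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]))[0] + 1


def local_minima(x):
    """Indices of interior samples below the left neighbour and not above the right one."""
    return np.where((x[1:-1] < x[:-2]) & (x[1:-1] <= x[2:]))[0] + 1


def decompose_period(period, fs):
    """Hypothetical sketch: first-level features of one pulse period.

    period: amplitudes of one denoised, baseline-corrected pulse period.
    fs: sampling rate in Hz, used to convert sample indices to seconds.
    Returns None when fewer than three peaks or three troughs are found.
    """
    x = np.asarray(period, dtype=float)
    peaks = local_maxima(x)[:3]      # first three peaks in time order
    troughs = local_minima(x)[:3]    # first three troughs in time order
    if len(peaks) < 3 or len(troughs) < 3:
        return None                  # atypical waveform: handle separately
    return {
        "h1": float(x[peaks[0]]),                   # first peak amplitude
        "h2": float(x[peaks[1]]),                   # second peak amplitude
        "h3": float(x[peaks[2]]),                   # third peak amplitude
        "dta": float((peaks[1] - peaks[0]) / fs),   # interval h1 -> h2 (s)
        "dtb": float((peaks[2] - peaks[1]) / fs),   # interval h2 -> h3 (s)
        "t1": float(troughs[0] / fs),               # trough time points (s)
        "t2": float(troughs[1] / fs),
        "t3": float(troughs[2] / fs),
    }


# Synthetic period with three peaks and three troughs (illustrative, not real data).
example = [0, 5, 10, 6, 4, 7, 8, 5, 3, 4, 2, 1, 0.5, 1]
print(decompose_period(example, fs=10))
```

On real recordings, noise produces spurious local extrema, so a sketch like this would only be meaningful after the denoising and smoothing steps shown in Fig. 1(b) and Fig. 5(b).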
Moreover, acupuncture points are also a factor, and different acupuncture points reflect different images. In the pulse diagnosis theory of TCM, the acupoints on the human wrist are usually divided into three points, Cun, Guan, and Chi, which reflect the signals conducted by different organs of the human body and correspond to three different pulse waveforms. In the piezoelectric sensor measurement, the three acupoints are measured separately to obtain three different pulse waveform images (see Fig. 3).

Fig. 1. (a) Signal image difference between the piezoelectric (upper) and photoelectric (lower) sensors. (b) Characteristic values of a typical pulse period measured by the photoelectric sensor (after wavelet denoising).

Fig. 2. The main individual differences in human pulse images identified from existing medical research. (a) Differences in waveform intensity between subjects of different genders. (b) Pulse waveform shape at different ages (dicrotic notch), with an older man on the left and a young man on the right; the elderly subject’s pulse waveform lacks the vasoconstriction trough.

2.2. Research and experiment steps

There is already research on extracting physiological information features and using machine learning algorithms for disease diagnosis. However, these works often lack a systematic process, lack analysis of pulse graph data, and lack algorithms that can turn pathological differences into values. Although they mine disease characteristics in far greater depth and breadth than TCM practitioners, they are still very limited. Moreover, the feature values extracted from the signal are limited to single-feature data, so they cannot quantify the characteristics of the pulse data reflected in the two-dimensional image or the rate of change of its waveform. It is therefore impossible to dig deeper into pathological information and use it for disease diagnosis, and the accuracy is also very limited.

The pulse signal is usually collected by a sensor because it reflects the physiological characteristics of the subject; the data returned by the sensor, that is, the measured human pulse, can serve as the basis both for characterizing the subject’s physiology and for diagnosing disease. In TCM theory, different acupoints reflect the pulse signals of different organs of the human body, and the shape and characteristic values of the pulse also differ between acupoints of the same person.

Fig. 4 shows the standard process of this experiment, whose core is data collection and processing. Depending on their health status, the subjects were divided into normal people and pancreatitis patients. After removing the baseline of the recorded data set, we extracted from the pulse data the features required by the algorithms, which must differ to some degree between the two groups of subjects. We designed two feature extraction schemes to compute results on the pulse dataset, and calculated and grouped all pulse wave data for SVM classification.

Controlling measurement errors is a step that cannot be ignored. For instance, the subject’s wrist may move or vibrate during measurement, which can cause abnormal oscillation of the pulse waveform. A more systematic source of error is that analyzing data from different subjects requires accounting for factors such as measurement location, temperature, subject status, and equipment interference, so that horizontal comparisons are made under a unified standard. The key to the experiment is extracting features that distinguish the two pulse patterns and designing appropriate algorithms that make these features more distinguishable.

2.3. Preparation and characterization of the dataset

2.3.1. Data acquisition

This study uses piezoelectric sensors and embedded devices to measure changes in arterial blood flow. The devices are connected to a computer that records the pulse data as one-dimensional data, which are then sampled at regular intervals and converted into data sets. To verify our experimental method, we selected 57 subjects (29 normal people and 28 patients) of different ages and physiques for pulse wave data collection. We took 25 effective pulse periods as one group for
a single pulse measurement; 5–25 groups were recorded for each patient with a chronic disease. Similarly, we recorded 4–20 groups of pulse wave data from each healthy person in the same way (fewer groups than for the patients).

Fig. 3. Examples of data acquisition instruments and measurement methods (above). The three acupoints of the human pulse and the corresponding physiological information in TCM theory (bottom).

2.3.2. Pulse waves processing

Before data analysis, the recorded pulse wave images must be processed before they can be converted into a usable data set: the pulse baseline changes caused by the measurement or the instrument algorithm are removed, noise is removed from the original waveform data, and the pulse wave curve is extracted.

The baseline drift elimination algorithm works as follows. First, the average pulse period of the subject is calculated. After the pulse data are plotted as a two-dimensional graph, a point is selected at the peak position of the first pulse image and its height is recorded; for every other pulse, the height difference at the corresponding position in its period is calculated, and this value is used to shift the entire pulse period vertically. We overlap the multi-cycle pulse images of each subject and store them in the same file as one sample for analysis, which lets us analyze each subject’s pulse image more accurately. Second, once the data are integrated, the whole data model is easier to visualize, so the pulse distribution of each subject is easier to observe and the comparison between the normal and patient groups becomes clearer. Based on a comparative analysis of a large amount of data, this article makes a preliminary comparison between the two groups with the largest differences (pancreatitis patients and normal people), which serve as test samples for SVM classification.

Fig. 4. General flowchart of the pulse diagnosis experimental research program and description of each step.

2.3.3. Data images analysis

Using MATLAB, all the collected one-dimensional data are plotted as two-dimensional charts, and the pulse waveforms of all cycles are superimposed so that the distribution range of the pulse waveform can be observed directly.

We selected 30 groups of subjects (evenly distributed in age and balanced in gender) with different numbers of pulse wave data and grouped them by disease. After these data were processed as shown in Fig. 4, all pulse images were grouped and clustered.

For the control comparison, we first transform the pulse data set into charts, which shows the pulse changes of the two kinds of people in the most intuitive way. By superimposing all pulse wave images of a subject, we can clearly observe the range of that subject’s pulse waves in the two-dimensional image and analyze the difference between pancreatitis patients and normal people. Second, in the abscissa range of 30 to 60, the pulse wave images of patients with pancreatitis show two very similar peaks, whereas those of normal people have only one higher peak.

The original data must first be screened for anomalies with the stability algorithm (Equation (1)); an anomaly is a period whose stability value is significantly greater than the mean. Each period must also be trimmed, which is done by finding a fixed reference point in every cycle (as shown in Fig. 5(a)). Superimposing the multi-cycle pulse image signals makes it convenient to observe the pulse data distribution of each type of subject and to identify the unique pathological characteristics of patients with chronic diseases. If the periodic distribution of the pulse image is disordered, a pulse image that deviates from the reference value can be manually shifted horizontally and aligned with the other periodic images at the peak position.

Fig. 5. (a) Selecting the datum points in the pulse images and aligning them to a unified baseline. (b) The effect of denoising and smoothing the original pulse waveform.

The superimposed pulse waves of sixteen groups of normal people (four to twenty groups each, Fig. 6(a)) are more stable than those in Fig. 6(b). In the last two-thirds of the pulse wave images across 20 groups, that is, the abscissa range of 40 to 80, the pulse waves of pancreatitis patients show obvious disturbances.

The two pulse forms are separated by a specific algorithm that samples the 40 to 80 range of the pulse wave cycle. Along the horizontal axis, the data are sampled at short intervals, and the distance between each pair of adjacent sampling points in the two-dimensional image coordinates (amplitude on the ordinate, time in milliseconds on the abscissa) is taken (refer to Fig. 7). Because the pulse wave images of patients with pancreatitis vibrate more than those of the normal people in the control group, this distance can be used as an important indicator for separating the two kinds of subjects by SVM classification. To make this method more accurate, an appropriate sampling frequency must be set so that the partition is most effective. Generally, for pulse electrical signals, the most suitable sampling frequency is between 30 Hz and 90 Hz [37]; we set our sampling frequency to about 40 Hz, at which the pulse image is clearest.

2.4. Classification algorithm design

Based on the previous analysis and on observations from visualizing the existing PWA data, we discovered one of the factors behind the pathological difference, namely the stability of the pulse waveform. We therefore designed a classification algorithm that amplifies the difference between normal and pancreatitis pulse wave images and effectively extracts a value quantifying the difference between the two types of cases.

The black circles on the pulse waveform graph are taken as sampling points, each with its own height value. One of the cyan pulse waveform lines is selected for sampling, starting at 40 Hz and sampling backward from that abscissa position (the abscissa interval between sampling points is the sampling frequency). As shown in Fig. 7, the absolute value of the height difference between two adjacent sampling points is the absolute value difference. The instability of this part of the line graph is reflected in the absolute value difference of the
adjacent sampling points (when the stability of the cyan curve deviates from the normal value, real experiments usually remove the extremely unstable blue pulse line). We can use this idea to design an algorithm; the equation for calculating the instability of the pulse wave line is:

$$D_i = \sum_{i=0}^{N} \left\| \vec{X}_{t+ni} - \vec{X}_{t+(n-1)i} \right\|_2 = \sum_{i=0}^{N} \sqrt{\left(x_{t+ni} - x_{t+(n-1)i}\right)^2 + \left(y_{t+ni} - y_{t+(n-1)i}\right)^2}$$

$$n \in SF, \qquad X_i \in \left\{(x_t, y_t), (x_{t+n}, y_{t+n}), \ldots, (x_{t+N}, y_{t+N})\right\} \tag{1}$$

Equation (1) records the rate of change of the second half of each pulse wave image of the same tester. D_i represents the sampling-point variance of each pulse. X_i is the absolute coordinate of the sampling point at abscissa position i. The subscript t of X is the starting position of the sampling points, and the parameter n represents the sampling frequency SF. Different sampling frequencies not only give different results but also affect the accuracy of the classification algorithm. Equation (1) calculates and records the changes in the second half of each pulse wave line chart for each subject, and these values form the data set divided by SVM classification. A perceptron strategy is chosen to separate the two kinds of subjects in the SVM classification.

To determine accurately whether a patient has a certain chronic disease, it is usually necessary to extract multiple pathological features from the subject’s pulse data. An effective classification model extracts multiple pathological characteristic values from several sets of the subject’s pulse waveforms and represents them as multi-dimensional data. From observing a large amount of sample data, we found that the values calculated by the PWA method designed in this article are usually distributed in clusters. Across machine learning, SVM can classify large volumes of multi-dimensional data quickly and effectively when the data are clustered. Therefore, each group of multi-dimensional data is stored as a data set according to the recording times of the different subjects, so that it can be classified effectively. For a dataset

$$D = \left\{D_1, D_2, D_3, \ldots, D_N\right\} \tag{2}$$

we use the perceptron model and the perceptron strategy we designed to make the whole model linearly separable, and calculate all D_i for the two kinds of subjects. Suppose, therefore, that there is a hyperplane that accurately divides the sampling-point variances of each pulse in D from pancreatitis patients and normal people onto its two sides:

$$\exists\, \Pi : w \cdot X + b = 0 \tag{3}$$

Making them into a linearly separable data set gives:

$$\begin{cases} w \cdot D_i + b < 0, & \forall X_i = -1 \\ w \cdot D_i + b > 0, & \forall X_i = +1 \end{cases} \tag{4}$$

In some cases this is reasonable, and with certain practices it yields an algorithm model that can distinguish normal people from patients with pancreatitis. On this basis, setting the optimal sampling frequency maximizes the utility of the whole classification method.

Fig. 6(c) also shows that the control group of patients with other chronic diseases has a double-peak structure as well, so this method is applicable to them too. Compared with normal people, the pulse images of patients with pancreatitis show two peaks and an obvious concave arc structure (within the considered region). We designed an algorithm to find this double-peak (concave) structure in the pulse images of patients with pancreatitis:

$$A = \frac{1}{N} \sum_{i=0}^{N} y_{t+ni}, \qquad D_i = \sum_{i=0}^{N} \left( y_{t+ni} - A \right)$$

$$n \in SF, \quad Y \in \left\{(y_t), (y_{t+n}), (y_{t+2n}), \ldots, (y_{t+N})\right\}, \quad t = \max(Y), \quad T = \max\left(\complement_Y t\right) \tag{5}$$

Equation (5) was designed by comparing and analyzing the differences between the two kinds of testers. First, the mean vertical-coordinate height A of each sampling point is calculated over every normal person in the considered region (the region between the two troughs to the left and right of the second wave peak, as shown in Fig. 8). D_i records the difference between each sampling point of a pancreatitis patient and the normal mean value A. The parameters t and T represent the two peaks in the considered region. The D_i value of patients with pancreatitis is usually lower than that of normal people. This algorithm is used to determine whether there are three peaks in the pulse image; the value for normal people must be higher than that for pancreatitis patients (see Fig. 9).

Fig. 6. (a) Superimposed pulse waves of sixteen groups of normal people (four to twenty groups each). (b) Superimposed pulse waves of four groups of pancreatitis patients (five to twenty-five groups each). (c) Pulse waves of three groups of patients with appendicitis (A), acute appendicitis (AA), and duodenal ulcer (DBU).

3. Disease classifier and results

3.1. The calculation of pulse data

Based on the above algorithm designed for support vector machine classification, the stability of each group of pulse data can be quantified. With this algorithm, the second half of each tester’s pulse wave image is sampled at different frequencies to find the frequency that best distinguishes the two kinds of testers. To verify the accuracy of this method, we used 20 groups of subjects as a training set; their pulse stability is as follows. From Fig. 8, every value of each tester in the two multiline lists reflects Fig. 7; the number of pulse groups varies from 1 to 15, each pulse image in each group is calculated by Equation (1), and
sampling frequency n is 40 Hz, starting position i is located between The average value obtained by this algorithm will be slightly
wave peak h2 and trough t3 as the Fig. 2 shown. Two types of data from different. This is due to the increase of abnormal values caused by the
normal people and patients apply for SVM classification calculation (the oscillation of the pulse waveform. Using the given algorithm to segment
calculation accuracy can be adjusted by changing the starting position i and eliminate the abnormal pulse waveform, a more accurate value can
and sampling frequency n) in a certain proportion. be obtained.
The following twenty sets of data are randomly selected from all The Table 1 is calculated by using Equation (1), which are values
samples and used as a training set for classification testing. We bring the used to distinguish between normal and pancreatitis patients and
data of each set of randomly selected subjects into Equation (1), and Table 2 shows the detailed distribution of the stability index of the pulse
train the calculated values that reflect the stability of their pulse waveform of subject N4, and Table 1 shows the mean values of the pulse
waveforms. To avoid the bias in the calculation results caused by cycle images of the integration groups. This calculation of pulse waves
deliberate selection of data, each subject we select does not have a value can be used as indicators of the unitary classification algorithm.
specific number of groups, and the selected subjects are also completely However, to find the maximum partition plane and classify SVM, we
randomly selected from the entire sample. need to add another set of indexes for the SVM model to transform into a

Fig. 7. The pictorial diagram of one of pancreatitis patient pulse information (stability index, SI) extraction. Blue pulse wave line wave marked by a black dot is the
selected line chart of pulse wave. The black circle on the line chart is the sampling point (the sampling frequency in the diagram is 80 Hz).

Fig. 8. The comparison of pulse lines between pancreatitis and normal persons (concave structure index, CSI).

By observing the pulse images in Fig. 6, another obvious feature can be found that distinguishes the two types of subjects.

3.2. The classification results

Equation (1) and Equation (5) yield two kinds of calculated data, and the quadratic programming problem in the SVM module is transformed into its dual problem. With these two kinds of data as the training set, the solution is required to satisfy the Karush–Kuhn–Tucker (KKT) conditions, and the training data are converted into a standardized (unit-scale) feature space. We screened the pulse waveform data of several groups of normal people and patients with chronic diseases. From each group we selected the pulse-cycle data that best reflected the characteristics of the subject, labeled each as diseased or not, and used these data as the training set for kernel-method classification.

Table 3 shows the recognition accuracy obtained under different methods for several sets of pulse data, including those of the four pancreatitis patients. In other words, it gives the classification success rates of the two algorithms under different division methods. To verify the classification accuracy of the SVM kernel method, we first separate each of the two major features extracted from the pulse images, SI and CSI, with a linear division method and then with an SVM, computing the two sets of results separately. The two feature values are then combined into a two-dimensional array, and the support vector machine with its kernel method is used for classification. The effectiveness of the experimental scheme is judged by the accuracy of the classification results. Using multiple sets of pulse data from people with specific diseases and from normal people as the SVM training set, we obtain the classification accuracies shown in Table 3.
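The pipeline described above can be sketched in dependency-free Python: two indices per sample, standardization, then a kernel SVM. This is a minimal illustration under stated assumptions, not the authors' code. The following choices are all hypothetical:

- the z-score scaling;
- the RBF kernel and its `gamma`;
- the regularization `lam`;
- the kernelized Pegasos sub-gradient solver, which minimizes the hinge-loss objective without a bias term rather than solving the dual QP exactly.

The toy (SI, CSI) values are also invented. In practice a library solver such as scikit-learn's `sklearn.svm.SVC(kernel="rbf")` would normally be used.

```python
"""Sketch: RBF-kernel SVM on standardized two-dimensional (SI, CSI) features.

Assumptions, not taken from the paper: z-score standardization, an RBF kernel
with a hypothetical gamma, and the kernelized Pegasos sub-gradient solver
(hinge loss, no bias term) used in place of a quadratic-programming solver.
"""
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that SI and CSI share a common scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for a constant column
    scaled = [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]
    return scaled, mus, sds


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_kernel_pegasos(X, y, lam=0.01, gamma=1.0, iters=2000, seed=0):
    """Return alpha[j], the number of hinge-loss updates for each sample (y in {-1, +1})."""
    rng = random.Random(seed)
    alpha = [0] * len(X)
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        score = sum(a * yj * rbf(xj, X[i], gamma) for a, yj, xj in zip(alpha, y, X) if a)
        if y[i] * score / (lam * t) < 1:  # margin violated: sample i joins the expansion
            alpha[i] += 1
    return alpha


def predict(X, y, alpha, x, gamma=1.0):
    """Sign of the kernel expansion; the positive 1/(lam*t) factor does not change it."""
    s = sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, y, X) if a)
    return 1 if s >= 0 else -1


# Hypothetical toy (SI, CSI) pairs, NOT data from the paper: -1 = normal, +1 = patient.
raw = [(120, 0.20), (118, 0.25), (125, 0.22), (180, 0.60), (175, 0.55), (190, 0.65)]
labels = [-1, -1, -1, 1, 1, 1]
X, mus, sds = standardize(raw)
alpha = train_kernel_pegasos(X, labels)
train_pred = [predict(X, labels, alpha, x) for x in X]
new_point = [(v - m) / s for v, m, s in zip((185, 0.60), mus, sds)]  # reuse training scaling
print(train_pred, predict(X, labels, alpha, new_point))
```

With these well-separated toy points, all six training samples are classified correctly. The new point near the patient cluster is labeled +1. Standardizing with the training means and deviations, and reusing them for new samples, keeps the two indices on comparable scales. This matters for an RBF kernel, which depends on Euclidean distance.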

Fig. 9. Distribution of stability index (average value of photoelectric signal samples in each group) of normal and pancreatitis samples.

Table 1
Calculated SI of the pulse waves of each tester (partial).

The ± values in Table 3 are the training-set errors. Judging from the classification results on the selected training set, the accuracy of the SVM kernel method is significantly higher than that of the other methods, and it exceeds 95% for several subjects.

In this experiment, the classification code was written in Python and used to process the MATLAB data set. The feature-mining algorithms proposed in this paper may offer a new entry point for current pulse image recognition. Their main purpose is to determine whether a subject has a chronic disease. We expect the accuracy of separating chronic patients from normal people to be generally higher than that of separating normal people from one another. According to the N1-16 row of Table 3, if the classification accuracy of a subject's pulse-wave (PW) data is higher, or much higher, than 81.5%, the subject can be judged to be at risk of chronic disease. We therefore take the normal-versus-normal success rate in Table 3, about 82%, as the standard. A subject whose value exceeds it is judged to be a patient, and this standard serves as the basis for diagnosis. For the currently selected subjects, a threshold of 82% gives the model a 100% recognition rate for randomly selected patients with chronic diseases.

We randomly sampled three sets of pulse data from every subject (30 normal people and 30 patients). We then used a confusion matrix to analyze how well the classifier separates normal people from patients with each of four specific chronic diseases. As shown in Table 4, in a single test the classifier identifies patients with the four chronic diseases with success rates of 83.33%, 88.89%, 66.67%, and 94.44%. Its overall accuracies are 97.22%, 96.03%, 87.04%, and 96.30%, respectively. Among the existing subjects, every disease except AA (acute appendicitis) reaches an almost 100% diagnostic success rate when the subject is tested multiple times. This indicates that the classification method is feasible for disease prediction.

We have added a new PWA-based disease recognition model to the standard pulse diagnosis experiment process, which can improve the success rate of disease recognition. However, whether this model applies to individuals from other special populations (such as athletes, children, the elderly, and people with obesity) still requires more extensive verification.
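As a check on the single-test figures quoted for Table 4, the rates can be recomputed from its confusion counts. The sketch assumes that "Yes" in Table 4 means the classifier reported the disease. Sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and overall accuracy is (TP+TN)/N.

```python
"""Recompute per-disease rates from the Table 4 confusion counts.

Assumption: "Yes" = classified as diseased, "No" = classified as healthy.
"""

# disease: (TP, FN, FP, TN) read from the Table 4 rows "Chronic Disease" and "Healthy"
TABLE4 = {
    "Pancreatitis": (15, 3, 0, 90),
    "Appendicitis": (32, 4, 1, 89),
    "Acute appendicitis": (12, 6, 8, 82),
    "Duodenal ulcer": (17, 1, 3, 87),
}


def rates(tp, fn, fp, tn):
    """Return (sensitivity, specificity, overall accuracy)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


for name, counts in TABLE4.items():
    sens, spec, acc = rates(*counts)
    print(f"{name:20s} sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

The sensitivities come out as 83.33%, 88.89%, 66.67% and 94.44%. The overall accuracies are 97.22%, 96.03%, 87.04% and 96.30%.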

Table 2
Calculated SI of the pulse waves of tester N4.
Avg. 182.2439471 125.1787536 121.2582868 119.2583642 88.64006496 126.9122154 123.9768548

SI of each pulse wave:
243.4307764 89.4746307 147.4907703 90.3968228 73.65077466 144.7883605 130.0546737
154.4141778 203.9817782 113.7453793 80.56851777 76.10171886 104.3149293 114.8419646
163.0019556 98.768238 115.5987904 129.9065796 121.1505807 137.7865722 121.6705657
176.7413977 108.523761 101.4687908 117.2729878 70.76725107 140.1592647 113.1422266
182.6170942 115.8351541 113.8500397 106.9658541 85.4812433 104.7098793 114.8424235
114.6728744 113.6551208 155.6160473 117.9268775 77.0621336 175.0761616 125.1233579
104.980123 80.4526249 72.262149 98.6220622 94.4377 96.2729062 92.6679422
224.8037486 139.8371943 143.5160426 99.8817306 121.8542903 100.2172017 132.7198609
194.9709663 64.0655696 125.5684841 67.9219226 75.364019 135.1276061 112.124694
177.7616218 143.7572835 150.8910187 88.7445184 79.4428787 78.3096277 119.1937235
211.2481987 120.9563947 101.7888941 145.3608918 76.4314924 118.8401516 131.4446227
130.282267 110.382883 104.6147982 167.8756261 75.10691955 100.7401093 117.215563
187.442179 158.5483736 101.3721552 180.6183058 145.4863357 106.5292259 140.5061528
227.54534 134.6337038 155.0842084 180.9914472 89.97892348 194.804423 150.6289353
140.4671416 199.0331473 161.6320331 120.8065387 59.1087315 128.5514876 128.7167642
139.572658 120.9541999 132.8602114 120.8386309 85.60335796 165.2340945 122.0032128
324.1945802 109.7607738 112.692878 99.8527535 126.0456608 140.7098476

Pulse values of one normal subject; some abnormal data are excluded.

Table 3
Accuracy of separating each tester's pulse data from those of normal people, achieved by linear classification, SVM, and the SVM kernel method.
Tester | SI by linear classification | CSI by linear classification | SI by SVM | CSI by SVM | SI + CSI by SVM kernel method
P1 | 0.8793 | 0.931 | 0.895 ± 0.02 | 0.954 ± 0.03 | 0.9827 ± 0.002
P2 | 0.7586 | 0.8668 | 0.801 ± 0.02 | 0.91 ± 0.05 | 0.9791 ± 0.003
P3 | 0.7414 | 0.6034 | 0.754 ± 0.01 | 0.62 ± 0.01 | 0.9112 ± 0.02
P4 | 0.9138 | 0.9483 | 0.912 ± 0.01 | 0.947 ± 0.01 | 0.9962 ± 0.001
A1 | 0.8675 | 0.9295 | 0.895 ± 0.01 | 0.963 ± 0.04 | 0.9879 ± 0.003
AA1 | 0.7123 | 0.6221 | 0.718 ± 0.02 | 0.623 ± 0.01 | 0.8612 ± 0.003
AA2 | 0.9345 | 0.9574 | 0.887 ± 0.01 | 0.961 ± 0.01 | 0.9954 ± 0.002
DBU1 | 0.6842 | 0.5891 | 0.648 ± 0.09 | 0.733 ± 0.15 | 0.8524 ± 0.008
DBU2 | 0.7284 | 0.6963 | 0.791 ± 0.05 | 0.786 ± 0.08 | 0.9258 ± 0.004
N1-16 (mean) | 0.598 | 0.624 | 0.652 | 0.759 | 0.8148
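The screening rule of Section 3.2 can be sketched as follows: flag a subject whose kernel-method accuracy against normal people exceeds about 82%, the normal-versus-normal baseline in the N1-16 row. Two choices are assumptions of this illustration: the strict `>` comparison and dropping the ± errors. The N1-16 entry is a mean over normal subjects, so applying the rule to it only shows that the baseline falls below the cut-off.

```python
"""Apply the 82% cut-off to the SVM-kernel column of Table 3.

A subject whose accuracy against normal people exceeds the threshold is
flagged as at risk of chronic disease (strict '>' is an assumption).
"""

THRESHOLD = 0.82

# Last column of Table 3 (SI + CSI, SVM kernel method); the ± errors are omitted.
KERNEL_ACCURACY = {
    "P1": 0.9827, "P2": 0.9791, "P3": 0.9112, "P4": 0.9962,
    "A1": 0.9879, "AA1": 0.8612, "AA2": 0.9954,
    "DBU1": 0.8524, "DBU2": 0.9258,
    "N1-16 (mean)": 0.8148,
}


def flag_at_risk(accuracy, threshold=THRESHOLD):
    """True if the subject's accuracy exceeds the normal-versus-normal baseline."""
    return accuracy > threshold


flags = {tester: flag_at_risk(acc) for tester, acc in KERNEL_ACCURACY.items()}
patients = [t for t in KERNEL_ACCURACY if not t.startswith("N")]
recognition_rate = sum(flags[t] for t in patients) / len(patients)
print(flags)
print(f"patients flagged: {recognition_rate:.0%}")
```

Every patient row of Table 3 exceeds 0.82, so the script reports that 100% of the listed patients are flagged, while the N1-16 mean is not. This matches the recognition rate stated in the text.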

Table 4
Recognition success rate of multiple groups of pulse data for four diseases (Yes/No: disease detected / not detected by the classifier).
Real condition | Pancreatitis Yes | Pancreatitis No | Appendicitis Yes | Appendicitis No | Acute appendicitis Yes | Acute appendicitis No | Duodenal ulcer Yes | Duodenal ulcer No
Healthy | 0 | 90 | 1 | 89 | 8 | 82 | 3 | 87
Chronic disease | 15 | 3 | 32 | 4 | 12 | 6 | 17 | 1

More experimental subjects and data verification are needed. Nevertheless, by classifying the data obtained with the above two algorithms using the SVM kernel method, the model already achieves high accuracy. To optimize the algorithm further, the parameter n in Equation (1) can be set to a more suitable value. The SVM classification model can also be tuned further to obtain higher classification accuracy from the two calculated indicators.

The data indicate that the method described in Fig. 4 is feasible for detecting three chronic diseases, namely pancreatitis, appendicitis, and duodenal ulcer. However, any classification method will be strongly affected by individuals with special physiological conditions (such as children, the elderly, patients with arrhythmia, and others whose heart rate is abnormal). How to optimize the algorithm so that it also maintains high accuracy for people with abnormal heart rates and atypical pulse waves remains to be explored.

3.3. The difficulties and limitations in implementing this application

First of all, the biggest problem of the disease detection method designed in this paper is how to handle subjects with multiple chronic diseases. We selected the group of subjects confirmed to have pancreatitis. However, it is unknown whether subjects in poorer health, or with two chronic diseases, can be classified with the same accuracy. As shown in Fig. 6(b), the pulse waveforms of the third pancreatitis patient group may come from a person with more than two diseases. This affects the accuracy of the classification algorithm and is a factor we have to consider. A further difficulty is that our model is only a simple dichotomy: although it achieves high accuracy for patients with a specific chronic disease, it cannot determine which chronic diseases a subject with multiple chronic diseases is suffering from. Yet patients who need a quick check usually do not know their own health status.

Besides, among all the factors affecting the shape of pulse waveform images, age is very difficult to overcome. As a person ages, the elasticity of the blood vessels decreases substantially and the heartbeat weakens, so the pulse waveform image differs greatly from the standard pulse waveform image of the human body. This greatly increases the difficulty of using the method proposed in this paper to determine whether a subject has a certain disease. Different diseases also produce different pulse waveform manifestations, so the above two algorithms may not be equally effective for other chronic diseases. In fact, detecting each disease requires a separate set of algorithms, designed from the pulse waveform characteristics of that disease, to extract feature values that the SVM can classify. As the number of algorithms grows, the accuracy of disease monitoring increases, but so do the detection time and the computational resources consumed. How to balance the two is a question we will need to consider for a long time.

Finally, the scheme designed in this paper only verifies that it is feasible to distinguish between normal people and the several diseases mentioned above. Diagnosing patients with other diseases still requires more subjects and further experimental confirmation. We believe that, following the experimental ideas of this paper, a prototype PWA machine could be designed in the future to perform disease diagnosis. Such a machine could also verify whether the TCM theory of pulse diagnosis remains valid in modern medicine.

…interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

We are grateful to Zhongmin Wang for assistance with the experiments, and to the 211 Hospital of the People's Liberation Army for providing the pulse datasets to our institute.

4. Conclusion
This paper designs two algorithms through analysis and uses the