Application of Neural Network and Cluster Analyses To Differentiate TCM Patterns in Patients With Breast Cancer
Edited by:
Min Ye, Peking University, China
Reviewed by:
Xu Wu, Southwest Medical University, China
Gang Bai, Nankai University, China

Administration, College of Management, Fu Jen Catholic University, New Taipei City, Taiwan, 3 Research Center of Big Data, College of Management, Taipei Medical University, Taipei, Taiwan, 4 Information Technology Office, China Medical University Hospital, Taichung, Taiwan, 5 College of Management, Taipei Medical University, Taipei, Taiwan, 6 Executive Master Program of Business Administration in Biotechnology, College of Management, Taipei Medical University, Taipei, Taiwan, 7 School of Chinese Medicine, China Medical University, Taichung, Taiwan, 8 Research Center for Traditional Chinese Medicine, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan, 9 Chinese Medicine Research Center, China Medical University, Taichung, Taiwan, 10 Research Center for Chinese Herbal Medicine, China Medical University, Taichung, Taiwan, 11 Department of Chinese Medicine, An-Nan Hospital, China Medical University, Tainan, Taiwan
*Correspondence:
Ben-Chang Shia
[email protected]
Sheng-Teng Huang
[email protected];
[email protected]
† These authors have contributed equally to this work

Specialty section:
This article was submitted to Ethnopharmacology, a section of the journal Frontiers in Pharmacology

Received: 26 January 2020
Accepted: 23 April 2020
Published: 08 May 2020

Citation:
Huang W-T, Hung H-H, Kao Y-W, Ou S-C, Lin Y-C, Cheng W-Z, Yen Z-R, Li J, Chen M, Shia B-C and Huang S-T (2020) Application of Neural Network and Cluster Analyses to Differentiate TCM Patterns in Patients With Breast Cancer. Front. Pharmacol. 11:670. doi: 10.3389/fphar.2020.00670

Background and Purpose: Pattern differentiation is a critical element of the prescription process for Traditional Chinese Medicine (TCM) practitioners. Applying advanced machine learning techniques is expected to enhance the effectiveness of TCM in clinical practice. The aim of this study is to explore the relationships between clinical features and TCM patterns in breast cancer patients.

Methods: Records of breast cancer patients receiving TCM treatment were collected from a single medical center. We used a neural network model to standardize terminologies and to perform TCM pattern differentiation in breast cancer cases. Cluster analysis was applied to classify the clinical features in the breast cancer patient dataset. To evaluate the performance of the proposed method, we further compared the TCM patterns with the therapeutic principles of Chinese herbal medications prescribed in Taiwan.

Results: A total of 2,738 breast cancer cases were recruited and standardized. They were divided into five groups according to clinical features via cluster analysis. The pattern differentiation model revealed that liver-gallbladder dampness-heat was the primary TCM pattern identified in patients. The main therapeutic goals of the top 10 Chinese herbal medicines prescribed for breast cancer patients were to clear heat, drain dampness, and detoxify. These results demonstrated that the patterns identified by the neural network were consistent with the prescriptions of TCM clinical practitioners.

Conclusion: This is the first study using machine-learning methodology to standardize and analyze TCM electronic medical records. The patterns revealed by the analyses were highly correlated with the therapeutic principles of TCM practitioners. Machine learning technology could assist TCM practitioners to comprehensively differentiate patterns and identify effective Chinese herbal medicine treatments in clinical practice.
Keywords: traditional Chinese medicine, electronic medical records, breast cancer, neural network analysis,
cluster analysis, pattern differentiation
can convert TCM patterns into several codes and label the standard TCM terminologies. For each case analyzed as input, the specific TCM pattern was identified by determining the higher-weighted code of symptoms and signs. Forward and backward propagation through the neural network, which consisted of several hidden layers, was used to calculate the weighting of each code. The weighting of each pattern was based on different symptoms and signs. It was calculated with a modified version of the well-known heuristic Term Frequency-Inverse Document Frequency (TF-IDF) equation:

TF = (frequency of symptom A in code B) / (total frequency of all symptoms in code B)

$$\text{Term frequency} = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$$

$$\text{Inverse document frequency}_{\text{smooth}} = \log\left(1 + \frac{N}{n_t}\right)$$

Here $f_{t,d}$ is the frequency of symptom $t$ in code $d$. Following the standard TF-IDF convention, $N$ is the total number of codes and $n_t$ is the number of codes in which symptom $t$ appears.

The efficacy of this approach, as well as the details of the related methods, has been demonstrated in our previous study (Lin et al., 2019). A demo version of the DeepMedic software is available at: http://bigdata-demo.deepmedic.cn/.

Cluster Analysis
In statistical methodology, the purpose of cluster analysis is to group objects according to the characteristics of a particular dataset. Objects classified into the same group have similar characteristics, whereas objects classified into different groups differ considerably. We used K-means clustering to divide the data into groups. The number of clusters was determined from the total within-cluster sum of squares, by identifying the point at which its decrease levelled off.

Key Performance Indicators (KPI)
Each variable in the dataset of this study was recorded as a binary "yes/no" classification. Variables that are more evenly distributed are more effective at revealing similarities between clusters. Following the concept of the coefficient of variation, we therefore calculated the mean and standard deviation of each variable. The KPI, obtained by dividing the standard deviation by the mean, was used to select variables, as given by the formula below. A higher value of this statistic represents a more even variable. To find the optimal KPI, we restricted the capture frequency of each variable to more than 5%. Starting from the minimum KPI, we raised the threshold in steps of 0.01 to find the best one.

$$\mathrm{KPI}_{item} = \frac{s}{\bar{x}} = \frac{\sqrt{item_{yes}\,item_{no}}\,/\,item_{all}}{item_{yes}\,/\,item_{all}} = \frac{\sqrt{item_{yes}\,item_{no}}}{item_{yes}} = \sqrt{\frac{item_{no}}{item_{yes}}}$$

In this formula, $item_{yes}$, $item_{no}$, and $item_{all}$ are the numbers of records in which the variable is "yes", "no", and recorded in total.

The Analysis of Symptoms and Signs in Cluster Model
A variable was defined as a primary feature (PF) if it showed no statistically significant difference among clusters and had a frequency greater than 5%. Some symptoms differed significantly in frequency but had similar rankings among clusters, meaning that the difference between the highest and the lowest ranking was not more than 10 and the frequency of the variable in every cluster was more than 5%. These symptoms were also considered primary features of breast cancer cases, because they had similar importance in each cluster. The KPI whose cluster analytical result contained the largest number of primary features was defined as the best KPI.

A symptom is defined as a subjective experience of a disease or physical ailment reported by a patient, while a sign is defined as any abnormal indication of disease that is identified by TCM practitioners (Dodd et al., 2001). Pulse and tongue inspections are the primary diagnostic methods applied by TCM practitioners to collect data on clinical signs. Symptoms and signs are correlated, but their data collection methodologies differ. We therefore collected and analyzed data on symptoms and on three types of signs separately for subsequent TCM pattern differentiation.

Clinical signs, including tongue appearance, tongue coating, and pulse, were analyzed individually as separate variables. The symptoms and signs were ranked according to the frequency of concurrent events. To make the high-ranking symptom and sign variables more representative, we excluded variables with a frequency of less than 5%. The remaining variables were regarded as secondary features (SF) in each cluster.

TCM Pattern Identification With Various PF and SF
From the previous analysis, we obtained the PF and SF of each cluster in the cluster analysis with the best KPI. Because their frequencies differed, each SF had a different chance of occurring in the cluster. To analyze the various possibilities, we disassembled the SF in a cluster and combined them into "Sx_n", where "x" is the cluster number and "n" is the number of top SF symptoms included. For example, S1_5 represented the top five SF symptoms in cluster 1, and its frequency was taken as that of the fifth symptom. Finally, these were combined with the PF as "P + Sx_n". DeepMedic software was applied to objectively analyze the general TCM pattern of all combinations. We counted the number of each type of pattern and weighted each pattern by the frequency of the last symptom in each combination. This gave the percentage of each pattern occurring in the cluster, defined as the average frequency of that pattern divided by the sum of the average frequencies of all patterns in the cluster:

$$\mathrm{Percentage}_{ij} = \frac{f_{ij}}{F_i}, \qquad F_i = \sum_{j} f_{ij}$$

Here $f_{ij}$ is the average frequency of pattern $j$ in cluster $i$ and $F_i$ is the sum of the average frequencies of all patterns in cluster $i$, with $i = 1, 2, \ldots, 5$ and $j = 1, 2, \ldots$, up to the number of patterns in cluster $i$.

Chinese Herbal Prescriptions in Breast Cancer Patients
TCM herbs were classified into several categories based on their usage. To verify that the study objects were compatible with clinical prescriptions, we analyzed the top 10 single herbs and formulas prescribed by clinical TCM practitioners in Taiwan (Huang et al., 2017). To compare the usage of each herb and formula in terms of both frequency and dose, we ranked these medications by the number of person-days multiplied by the average daily dose.

Overall, the architecture of this study (see Figure 1) is composed of five steps, as shown below.

1. Standardize the terminologies of TCM.
2. Find the best KPI, i.e., the one whose cluster analytical result has the largest number of primary features.
3. Combine primary features and secondary features into different arrangements in each cluster.
4. Identify the TCM patterns of each combination in each cluster through machine-learning confirmation.
5. Compare the similarity between the TCM patterns in each cluster and the therapeutic principles of the top 10 Chinese herbal prescriptions in Taiwan.

RESULTS

Data Extraction
We selected only the initial visit records of individual patients. The remaining follow-up records contained incomplete data and were excluded. Every included record had to contain the patient's gender, age, and details of symptoms and signs. A total of 78,917 breast cancer patient records were recruited, including 2,913 complete initial visit records. Of these, 2,738 cases contained records of the specific herbs and formulas prescribed. The flowchart of our data acquisition is shown in Figure 2.

The Standardization of Clinical Features
In the 2,738 analyzable records, the twenty most frequent symptoms were "insomnia", "dry mouth", "lack of strength", "dizziness", "loss of appetite", "abdominal distention", "profuse dreaming", "bitter taste of mouth", "lumbago", "back pain", "afraid of cold", "loose stool", "headache", "nausea", "absence of thirst", "cough", "acid regurgitation", "soreness", "nocturia", and "dry eyes". The top five tongue appearances were "pale red tongue", "red tongue", "teeth-marked tongue", "dark red tongue", and "dry tongue". The top five tongue coatings were "white coating", "thin coating", "thin white tongue", "slimy coating", and "thick coating". The top five pulses were "string-like pulse", "slippery pulse", "fine pulse", "weak pulse", and "sunken pulse". The ranking and frequency of each symptom and sign are listed in Table 1.

Cluster Analysis
The decline in the total within-cluster sum of squares moderated when the data were divided into five groups. Five was therefore an acceptable number of groups for the analysis of breast cancer patient records (Supplementary Figure 1).

Symptoms and Signs of PF and SF in Each Cluster
The minimum KPI for this study of breast cancer patients was 0.231, and the best one was 0.252045. The frequency ranking differences of tongue appearances, tongue coatings, and pulses in
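The KPI values reported above come from the coefficient of variation defined in the Methods. For a binary variable with $item_{yes}$ positive and $item_{no}$ negative records, $s/\bar{x}$ reduces to $\sqrt{item_{no}/item_{yes}}$. The Python sketch below is a minimal illustration and not the authors' implementation. It computes the KPI both ways, applies the more-than-5% capture filter, and lists candidate thresholds from the minimum KPI upward in steps of 0.01. Ending the threshold grid at the largest KPI is an assumption. The step that scores each threshold by the number of primary features in the resulting clusters is not reproduced. The variable names and 0/1 data in the usage lines are hypothetical.

```python
import math
from statistics import mean, pstdev


def kpi_from_values(values):
    """KPI = s / x-bar for a binary variable coded 1 (yes) / 0 (no).

    Uses the population standard deviation, which for 0/1 data equals
    sqrt(item_yes * item_no) / item_all.
    """
    return pstdev(values) / mean(values)


def kpi_from_counts(item_yes, item_no):
    """Closed form of the same statistic: sqrt(item_no / item_yes)."""
    return math.sqrt(item_no / item_yes)


def candidate_kpis(variables, min_freq=0.05, step=0.01):
    """Return (KPI per retained variable, ascending grid of KPI thresholds).

    `variables` maps a variable name to its 0/1 values, one per case.
    Variables captured in no more than `min_freq` of cases are dropped.
    The grid starts at the minimum KPI and rises by `step`; stopping at
    the maximum KPI is an illustrative assumption.
    """
    kpis = {
        name: kpi_from_values(values)
        for name, values in variables.items()
        if mean(values) > min_freq
    }
    if not kpis:
        return kpis, []
    lo, hi = min(kpis.values()), max(kpis.values())
    n_steps = math.floor((hi - lo) / step + 1e-9)  # tolerance for float error
    grid = [round(lo + k * step, 6) for k in range(n_steps + 1)]
    return kpis, grid


if __name__ == "__main__":
    # Hypothetical records for 100 cases.
    data = {
        "insomnia": [1] * 80 + [0] * 20,
        "dry mouth": [1] * 50 + [0] * 50,
        "rare sign": [1] * 3 + [0] * 97,  # captured in 3% of cases, so dropped
    }
    kpis, grid = candidate_kpis(data)
    print(kpis)
    print(grid[0], grid[-1], len(grid))
```

Computing the KPI from raw 0/1 values and from counts gives the same number. That makes the closed form a convenient check when the records are already tallied.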
FIGURE 1 | The analytical architecture of TCM EMR in the patients with breast cancer. The workflow (1) ~ (5) describes the process of multiple analyses to classify
TCM clinical features and patterns of breast cancer patients, compared with therapeutic principles of Chinese herbal prescriptions by TCM practitioners in Taiwan.
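Step (1) of this workflow weights each code with the modified TF-IDF given in the Methods. The DeepMedic modifications and its neural network are not described in this section. The Python sketch below therefore implements only the two printed formulas, assuming that each code plays the role of a "document" whose "terms" are symptoms and signs. The `best_code` helper sums a case's weights and picks the highest-weighted code. It is a hypothetical stand-in for the network's decision, not DeepMedic's method. The counts in the usage lines are invented for illustration.

```python
import math
from collections import Counter


def term_frequency(code_counts):
    """TF(t, d) = f_{t,d} / sum over t' in d of f_{t',d}, for one code d."""
    total = sum(code_counts.values())
    return {t: f / total for t, f in code_counts.items() if f > 0}


def idf_smooth(codes):
    """IDF(t) = log(1 + N / n_t), with N codes and n_t codes containing t."""
    n_codes = len(codes)
    doc_freq = Counter()
    for counts in codes.values():
        doc_freq.update(t for t, f in counts.items() if f > 0)
    return {t: math.log(1 + n_codes / n_t) for t, n_t in doc_freq.items()}


def tfidf_weights(codes):
    """Weight of every symptom/sign within every code."""
    idf = idf_smooth(codes)
    return {
        code: {t: tf * idf[t] for t, tf in term_frequency(counts).items()}
        for code, counts in codes.items()
    }


def best_code(weights, case_features):
    """Hypothetical stand-in: pick the code whose summed weights are highest."""
    return max(
        weights,
        key=lambda code: sum(weights[code].get(t, 0.0) for t in case_features),
    )


if __name__ == "__main__":
    # Invented counts of symptoms observed under two pattern codes.
    codes = {
        "LGDH": {"bitter taste of mouth": 3, "insomnia": 1},
        "SSQD": {"loss of appetite": 2, "insomnia": 2},
    }
    w = tfidf_weights(codes)
    print(best_code(w, {"bitter taste of mouth", "insomnia"}))
```

A symptom that appears under every code, such as "insomnia" here, receives the smallest IDF. Distinctive symptoms therefore dominate the weighting.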
TABLE 1 | The top 20 symptoms and the top five signs in breast cancer patients.
Categories PF S1* S2 S3 S4 S5
*SX, Secondary features in cluster X; PF, Primary features; SF, Secondary features; AP, Amount of people.
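The S1 to S5 columns above hold the secondary features of each cluster. The Methods combine these with the primary features as "P + Sx_n". The minimal Python sketch below shows that bookkeeping and assumes the SF list of a cluster is already sorted by descending frequency. As described, each combination is weighted by the frequency of its n-th (last) SF symptom. `classify` is a hypothetical placeholder for the DeepMedic pattern call. Averaging the weights of the combinations that map to the same pattern is one reading of the "average frequency" $f_{ij}$, not a confirmed detail. All symptoms and frequencies in the usage lines are invented.

```python
from collections import defaultdict
from statistics import mean


def build_combinations(cluster_id, primary, secondary):
    """Return (label, symptoms, weight) for every "P + Sx_n" combination.

    `secondary` is a list of (symptom, frequency) pairs sorted by
    descending frequency. The weight of Sx_n is the frequency of its
    n-th (last) secondary symptom.
    """
    combos = []
    for n in range(1, len(secondary) + 1):
        top_n = secondary[:n]
        symptoms = list(primary) + [symptom for symptom, _ in top_n]
        weight = top_n[-1][1]
        combos.append((f"P + S{cluster_id}_{n}", symptoms, weight))
    return combos


def average_frequency_by_pattern(combos, classify):
    """Average weight of the combinations assigned to each pattern.

    `classify` is a hypothetical stand-in for DeepMedic: it maps a list
    of symptoms to a pattern name.
    """
    weights = defaultdict(list)
    for _label, symptoms, weight in combos:
        weights[classify(symptoms)].append(weight)
    return {pattern: mean(ws) for pattern, ws in weights.items()}


if __name__ == "__main__":
    primary = ["insomnia", "dry mouth"]
    secondary = [("bitter taste of mouth", 0.40), ("dizziness", 0.30), ("loss of appetite", 0.10)]
    combos = build_combinations(1, primary, secondary)
    toy_classifier = lambda s: "SSQD" if "loss of appetite" in s else "LGDH"
    print([label for label, _, _ in combos])
    print(average_frequency_by_pattern(combos, toy_classifier))
```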
main TCM pattern (70%) in cluster 2, followed by DLTF (22%). DLTF was the main TCM pattern (74%) in cluster 3, followed by LDSD (13%). In cluster 4, LGDH was again the main TCM pattern (59%), followed by DLTF (19%) and LDSD (9%). LGDH was the main TCM pattern (64%) in cluster 5, followed by RDT (15%) and spleen-stomach qi deficiency (SSQD) (11%). For the detailed definition of each pattern from the WHO (World Health Organization. Regional Office for the Western Pacific, 2007), please refer to Supplementary Table 2.

The Top 10 of Chinese Herbal Prescriptions in Breast Cancer Patients
Tables 4 and 5 list the top 10 Chinese herbal prescriptions in breast cancer patients. These included prescriptions that could clear heat, drain dampness and detoxify (29%), harmonize the liver and spleen (19%), tonify qi (18%), nourish the heart to tranquilize (15%), activate blood and resolve stasis (12%), tonify yin (4%), clear heat and resolve phlegm (2%), and act as an offensive purgative (1%). The components of each formula are summarized in Supplementary Table 3.

TABLE 3 | Average frequency and percentage of the main TCM patterns in each cluster subgroup.

Group | Pattern | AF | %
ALL | LGDH | 85% | 43%
ALL | DLTF | 38% | 20%
ALL | RDT | 22% | 12%
ALL | LKYD | 11% | 6%
ALL | LDSD | 11% | 6%
ALL | SSQD | 11% | 5%
ALL | RDH | 10% | 5%
ALL | QDBS | 7% | 4%
C1* | DLTF | 12% | 35%
C1 | RDH | 10% | 30%
C1 | LDSD | 8% | 23%
C1 | SSQD | 4% | 13%
C2 | LGDH | 70% | 70%
C2 | DLTF | 22% | 22%
C2 | LDSD | 9% | 9%
C3 | DLTF | 71% | 74%
C3 | LDSD | 13% | 13%
C3 | LKYD | 6% | 6%
C3 | QDBS | 6% | 6%
C4 | LGDH | 99% | 59%
C4 | DLTF | 33% | 19%
C4 | LDSD | 15% | 9%
C4 | LKYD | 12% | 7%
C4 | QDBS | 8% | 5%
C5 | LGDH | 99% | 64%
C5 | RDT | 22% | 15%
C5 | SSQD | 17% | 11%
C5 | LDSD | 10% | 7%
C5 | QDBS | 6% | 4%

*CX, Cluster X; TCM, Traditional Chinese medicine; AF, Average frequency; LGDH, Liver-gallbladder dampness-heat; DLTF, Depressed liver qi transforming into fire; RDT, Retained dampness-toxin; LKYD, Liver-kidney yin deficiency; LDSD, Liver depression and spleen deficiency; SSQD, Spleen-stomach qi deficiency; RDH, Retained dampness-heat; QDBS, Qi deficiency with blood stasis.
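The percentages in Table 3 follow the formula Percentage_ij = f_ij / F_i from the Methods: each pattern's average frequency is divided by the sum of the average frequencies in its cluster. The Python sketch below applies that formula to the AF column of Table 3. It illustrates the arithmetic and is not the authors' code. Because the printed AF values are themselves rounded, the recomputed shares can differ from the printed percentages by about one point. For example, cluster 2 gives 69% rather than 70% for LGDH.

```python
"""Sketch: Percentage_ij = f_ij / F_i applied to the Table 3 average frequencies.

Assumption: the AF values below are the rounded percentages printed in
Table 3, so recomputed shares can be off by about one point.
"""

# Average frequency (AF, in %) of each pattern per cluster, from Table 3.
TABLE3_AF = {
    "C1": {"DLTF": 12, "RDH": 10, "LDSD": 8, "SSQD": 4},
    "C2": {"LGDH": 70, "DLTF": 22, "LDSD": 9},
    "C3": {"DLTF": 71, "LDSD": 13, "LKYD": 6, "QDBS": 6},
    "C4": {"LGDH": 99, "DLTF": 33, "LDSD": 15, "LKYD": 12, "QDBS": 8},
    "C5": {"LGDH": 99, "RDT": 22, "SSQD": 17, "LDSD": 10, "QDBS": 6},
}


def pattern_percentages(avg_freq):
    """Return f_ij / F_i for every pattern j of one cluster i."""
    total = sum(avg_freq.values())  # F_i: sum of average frequencies in the cluster
    if total == 0:
        raise ValueError("cluster has no pattern frequencies")
    return {pattern: f / total for pattern, f in avg_freq.items()}


if __name__ == "__main__":
    for cluster, af in TABLE3_AF.items():
        shares = pattern_percentages(af)
        formatted = ", ".join(f"{p} {100 * s:.0f}%" for p, s in shares.items())
        print(f"{cluster}: {formatted}")
```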
FIGURE 3 | The TCM pattern distribution in each cluster subgroup. LKYD, Liver-kidney yin deficiency; LGDH, Liver-gallbladder dampness-heat; DLTF, Depressed
liver qi transforming into fire; LDSD, Liver depression and spleen deficiency; QDBS, Qi deficiency with blood stasis; SSQD, Spleen-stomach qi deficiency; RDT,
Retained dampness-toxin; RDH, Retained dampness-heat.
combination use of TCM (Liu et al., 2008; Huang et al., 2017).

Some Chinese medicinal herbs have demonstrated effects in controlling cancer progression, increasing susceptibility to radiotherapy and chemotherapy, elevating immunity, and decreasing the toxicities or side effects of cancer therapies (Yin et al., 2013). Given these potential therapeutic effects of TCM, we applied machine learning techniques to explore the relationships between clinical features and TCM patterns in breast cancer patients. For this purpose, TCM clinical records were gathered for text analysis.

Text analysis is a subfield of natural language processing (NLP). In the past, the lack of a widely adopted and consistently implemented medical terminology limited the use of machine learning in medical research, especially in the field of TCM. In this study, we used the DeepMedic software to analyze unstructured electronic TCM clinical records. The software standardized and integrated key TCM terminology through an NLP system and a neural network. A total of 2,738 breast cancer records were standardized and divided into five subgroups via cluster analysis, according to the frequency of the clinical features reported in each case. Because patterns are not directly observable, the TCM patterns were differentiated with the DeepMedic software by analyzing the PF and SF in each cluster subgroup.

The TCM Patterns in Breast Cancer Patients
As shown in Table 3 and Figure 3, LGDH was the main TCM pattern (43%) identified in breast cancer patients, which was consistent with the analysis of PF. The TCM patterns identified, including LGDH, DLTF, and RDH, indicate that the liver is the main disease location of breast cancer, while dampness and heat are the main pathological mechanisms. According to TCM theory, the liver is related to the neuro-endocrine-immune network. It is responsible for regulating emotion, promoting digestion and absorption, and maintaining qi and blood circulation via the nervous and endocrine systems (Liu et al., 2017). In TCM theory, "fire" is a more severe, advanced stage of "heat", while "toxin" indicates faster transmission of heat and a worsening condition. Heat and fire damage the yin, and depressed liver qi impairs the function of the spleen, so some patients exhibit both yin and spleen qi deficiencies. Qi deficiency with blood stasis (QDBS) was also one of the SF identified in breast cancer patients, because qi deficiency results in stagnated blood circulation. As shown in Table 3 and Figure 3, the frequencies of the LGDH and DLTF patterns had a great impact on these cluster subgroups. The presence of some minor TCM patterns also helped to distinguish the five subgroups.

Cluster 1
The percentage of the DLTF pattern (35%) was similar to that of the RDH pattern (30%). Additionally, the percentages of the LDSD (23%) and SSQD (13%) patterns were higher than those in the other cluster subgroups. This indicates that there was no dominant TCM pattern in cluster 1.

TABLE 4 | The top 10 Chinese herbal prescriptions, including single herbs and formulae, in breast cancer patients in Taiwan.

Herbal prescription | Total consumption (g)* | Therapeutic effect
Single herb
Hedyotis diffusa Willd. | 553153.5 | Clear heat, drain dampness and detoxify
Scutellaria barbata D. Don | 498634 | Clear heat, drain dampness and detoxify
Taraxacum mongolicum Hand.-Mazz. | 442880.6 | Clear heat, drain dampness and detoxify
Spatholobus suberectus Dunn | 277550.5 | Activate blood and resolve stasis
Zizyphus jujuba Mill var. spinosa | 236498.7 | Nourish the yin to tranquilize
Salvia miltiorrhiza Bge. | 220686.4 | Activate blood and resolve stasis
Astragalus membranaceus (Fisch.) Bunge | 209072.8 | Tonify qi
Polygonum multiflorum Thunb. | 154919.8 | Nourish the heart to tranquilize
Fritillaria thunbergii Miq. | 148233 | Clear heat and resolve phlegm
Rheum palmatum L. | 66898.8 | Offensive purgative
Formula
Jia-Wei-Xiao-Yao-San | 1604612.1 | Harmonize the liver and spleen
San-Zhong-Kui-Jian-Tang | 523792.8 | Clear heat, drain dampness and detoxify
Xue-Fu-Zhu-Yu-Tang | 517921.8 | Activate blood and resolve stasis
Xiang-Sha-Liu-Jun-Zi-Tang | 517717.2 | Tonify qi
Gui-Pi-Tang | 464660 | Nourish the heart to tranquilize
Bu-Zhong-Yi-Qi-Tang | 403355.6 | Tonify qi
Suan-Zao-Ren-Tang | 402041.9 | Nourish the heart to tranquilize
Zhen-Ren-Huo-Ming-Yin | 388555.2 | Clear heat, drain dampness and detoxify
Zhi-Bai-Di-Huang-Wan | 367875 | Tonify yin
Sheng-Mai-Yin | 336522.3 | Tonify qi

*The total consumption of the herb is the number of person-days multiplied by average daily dose.

TABLE 5 | The therapeutic effects of the commonly used Chinese herbs in breast cancer patients in Taiwan.

Therapeutic effect | Total consumption (g)* | Percentage
Clear heat, drain dampness and detoxify | 2407016.1 | 29%
Harmonize the liver and spleen | 1604612.1 | 19%
Tonify qi | 1466667.9 | 18%
Nourish the heart to tranquilize | 1258120.4 | 15%
Activate blood and resolve stasis | 1016158.7 | 12%
Tonify yin | 367875 | 4%
Clear heat and resolve phlegm | 148233 | 2%
Offensive purgative | 66898.8 | 1%

*The total consumption of the herb is the number of person-days multiplied by average daily dose.
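Table 5 can be derived from Table 4. Consumption (person-days multiplied by average daily dose) is summed per therapeutic effect, and each sum is divided by the grand total of 8,335,582.0 g. The Python sketch below performs that aggregation on the Table 4 figures and reproduces the Table 5 percentages. It is a check on the arithmetic, not the authors' code. One assumption is needed. The Table 5 total for "Nourish the heart to tranquilize" (1258120.4 g) is matched only when Zizyphus jujuba Mill var. spinosa (236498.7 g) is counted in that category, so the sketch groups it there.

```python
from collections import defaultdict

CLEAR_HEAT = "Clear heat, drain dampness and detoxify"
HEART = "Nourish the heart to tranquilize"
BLOOD = "Activate blood and resolve stasis"
QI = "Tonify qi"

# (prescription, total consumption in g, therapeutic effect) from Table 4.
# Assumption: Zizyphus jujuba is grouped under HEART so that Table 5 balances.
TABLE4 = [
    ("Hedyotis diffusa Willd.", 553153.5, CLEAR_HEAT),
    ("Scutellaria barbata D. Don", 498634.0, CLEAR_HEAT),
    ("Taraxacum mongolicum Hand.-Mazz.", 442880.6, CLEAR_HEAT),
    ("Spatholobus suberectus Dunn", 277550.5, BLOOD),
    ("Zizyphus jujuba Mill var. spinosa", 236498.7, HEART),
    ("Salvia miltiorrhiza Bge.", 220686.4, BLOOD),
    ("Astragalus membranaceus (Fisch.) Bunge", 209072.8, QI),
    ("Polygonum multiflorum Thunb.", 154919.8, HEART),
    ("Fritillaria thunbergii Miq.", 148233.0, "Clear heat and resolve phlegm"),
    ("Rheum palmatum L.", 66898.8, "Offensive purgative"),
    ("Jia-Wei-Xiao-Yao-San", 1604612.1, "Harmonize the liver and spleen"),
    ("San-Zhong-Kui-Jian-Tang", 523792.8, CLEAR_HEAT),
    ("Xue-Fu-Zhu-Yu-Tang", 517921.8, BLOOD),
    ("Xiang-Sha-Liu-Jun-Zi-Tang", 517717.2, QI),
    ("Gui-Pi-Tang", 464660.0, HEART),
    ("Bu-Zhong-Yi-Qi-Tang", 403355.6, QI),
    ("Suan-Zao-Ren-Tang", 402041.9, HEART),
    ("Zhen-Ren-Huo-Ming-Yin", 388555.2, CLEAR_HEAT),
    ("Zhi-Bai-Di-Huang-Wan", 367875.0, "Tonify yin"),
    ("Sheng-Mai-Yin", 336522.3, QI),
]


def consumption_by_effect(rows):
    """Sum consumption per therapeutic effect; return {effect: (grams, share)}.

    Effects are ordered from largest to smallest total consumption.
    """
    totals = defaultdict(float)
    for _name, grams, effect in rows:
        totals[effect] += grams
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {effect: (grams, grams / grand_total) for effect, grams in ranked}


if __name__ == "__main__":
    for effect, (grams, share) in consumption_by_effect(TABLE4).items():
        print(f"{effect:45s} {grams:12.1f} {share:6.1%}")
```

Weighting by person-days multiplied by dose, rather than by prescription counts alone, lets formulas taken in large daily amounts, such as Jia-Wei-Xiao-Yao-San, rank above more frequently listed single herbs.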
REFERENCES

Balneaves, L. G., Bottorff, J. L., Hislop, T. G., and Herbert, C. (2006). Levels of commitment: exploring complementary therapy use by women with breast cancer. J. Altern. Complement Med. 12 (5), 459–466. doi: 10.1089/acm.2006.12.459
Boon, H. S., Olatunde, F., and Zick, S. M. (2007). Trends in complementary/alternative medicine use by breast cancer survivors: comparing survey data from 1998 and 2005. BMC Womens Health 7 (4). doi: 10.1186/1472-6874-7-4
Chen, Z., Gu, K., Zheng, Y., Zheng, W., Lu, W., and Shu, X. O. (2008). The use of complementary and alternative medicine among Chinese women with breast cancer. J. Altern. Complement Med. 14 (8), 1049–1055. doi: 10.1089/acm.2008.0039
Chung, V. C., Wu, X., Lu, P., Hui, E. P., Zhang, Y., Zhang, A. L., et al. (2016). Chinese Herbal Medicine for Symptom Management in Cancer Palliative Care: Systematic Review And Meta-analysis. Med. (Baltimore) 95 (7), e2793. doi: 10.1097/MD.0000000000002793
Crocetti, E., Crotti, N., Feltrin, A., Ponton, P., Geddes, M., and Buiatti, E. (1998). The use of complementary therapies by breast cancer patients attending conventional treatment. Eur. J. Cancer 34 (3), 324–328. doi: 10.1016/s0959-8049(97)10043-0
Dodd, M., Janson, S., Facione, N., Faucett, J., Froelicher, E. S., Humphreys, J., et al. (2001). Advancing the science of symptom management. J. Adv. Nurs. 33 (5), 668–676. doi: 10.1046/j.1365-2648.2001.01697.x
Huang, K. C., Yen, H. R., Chiang, J. H., Su, Y. C., Sun, M. F., Chang, H. H., et al. (2017). Chinese Herbal Medicine as an Adjunctive Therapy Ameliorated the Incidence of Chronic Hepatitis in Patients with Breast Cancer: A Nationwide Population-Based Cohort Study. Evid. Based Complement Alternat. Med. 2017, 1052976. doi: 10.1155/2017/1052976
Lin, Y. C., Huang, W. T., Ou, S. C., Hung, H. H., Cheng, W. Z., Lin, S. S., et al. (2019). Neural network analysis of Chinese herbal medicine prescriptions for patients with colorectal cancer. Complement Ther. Med. 42, 279–285. doi: 10.1016/j.ctim.2018.12.001
Ling, S., and Xu, J. W. (2013). Model organisms and traditional chinese medicine syndrome models. Evid. Based Complement Alternat. Med. 2013, 761987. doi: 10.1155/2013/761987
Liu, S., Zhao, J., Liu, J., Sun, Z. -P., Hua, Y. -Q., Lu, D. -M., et al. (2008). Effects of Ru'ai Shuhou Recipe on 5-year recurrence rate after mastectomy in breast cancer. J. Chin. Integr. Med. 6 (10), 1000–1004. doi: 10.3736/jcim20081003
Liu, Z. W., Shu, J., Tu, J. Y., Zhang, C. H., and Hong, J. (2017). Liver in the Chinese and Western Medicine. Integr. Med. Int. 4 (1-2), 39–45. doi: 10.1159/000466694
Wang, Y., Yu, Z., Jiang, Y., Liu, Y., Chen, L., and Liu, Y. (2012). A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records. J. BioMed. Inform 45 (2), 210–223. doi: 10.1016/j.jbi.2011.10.010
World Health Organization. Regional Office for the Western Pacific (2007). WHO international standard terminologies on traditional medicine in the Western Pacific Region (Manila: WHO Regional Office for the Western Pacific).
Yin, S. Y., Wei, W. C., Jian, F. Y., and Yang, N. S. (2013). Therapeutic Applications of Herbal Medicines for Cancer Patients. Evid. Based Complement. Alternat. Med. 2013, 302426. doi: 10.1155/2013/302426
Zhang, Y., Liang, Y., and He, C. (2017). Anticancer activities and mechanisms of heat-clearing and detoxicating traditional Chinese herbal medicine. Chin. Med. 12 (20). doi: 10.1186/s13020-017-0140-2
Zhang, Z., Beck, M. W., Winkler, D. A., Huang, B., Sibanda, W., Goyal, H., et al. (2018). Opening the black box of neural networks: methods for interpreting neural network models in clinical applications. Ann. Transl. Med. 6 (11), 216. doi: 10.21037/atm.2018.05.32

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Huang, Hung, Kao, Ou, Lin, Cheng, Yen, Li, Chen, Shia and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.