A Text-Image Feature Mapping Algorithm Based on Transfer Learning

D. Pan and H. Yang

Research Article, 2018; 16:1139–1148
Transfer learning is neither supervised learning [4], nor unsupervised or semi-supervised learning, but a new machine learning method. During feature migration, even if the data in the source data space and the target data space do not intersect at the instance level, they may be related at the feature level [5]. Data with two feature perspectives can be used to establish a link between two different feature spaces. These data are not necessarily used as training data for knowledge learning, but they can act as a dictionary. Taking a subject event as the background, sufficient text-image information about the event on the Internet is used as a basis for knowledge migration.

Aiming at these problems, a text-image feature mapping algorithm based on transfer learning is proposed in this paper. It uses clustering technology to filter the existing data and find the data that is very similar to the target data [6]. The significant text features are calculated by the LDA model based on Gibbs sampling and information gain. The BOVW model and the naive Bayesian method are used to model the subject of the image data. With the help of the text-image co-occurrence data [7] under the same event, the text feature distribution is mapped to the image feature space, and the feature distribution of the image data under the same event is approximated.

2 A text-image feature mapping algorithm based on transfer learning

2.1 Transfer learning algorithm for clustering text

Although the existing auxiliary data is out of date, some of the existing data should still be very similar to the test data and can be used to help target task learning [8]. Therefore, clustering technology is used to find data that is very similar to the test data from the existing data. [...] each cluster, and focus on some specific clusters for further analysis. At the same time, clustering technology can also be used as a pre-processing step for other algorithms to effectively improve their performance [10].

2.1.2 Text representation and text similarity formula

According to the traditional vector space model (VSM) representation, the text content can be expressed as a weighted feature vector. Let D be a text set, d_i a text in the set, t a feature word, t_i the i-th feature word, and w_i the weight of the i-th feature word:

$$d_i = (t_1, w_1; t_2, w_2; \dots; t_n, w_n) \tag{1}$$

The weight w_i can be represented by the tf-idf weight of each feature. The tf-idf formula is as follows:

$$\text{tf-idf} = \sum_{d \in D} tf(d, t) \cdot \log \frac{|D|}{df(t)} \tag{2}$$

where tf(d, t) is the frequency of word t in text d, df(t) is the number of texts containing word t in text set D, and |D| is the number of texts in text set D.

The similarity between two texts can be calculated by the cosine of the angle α between their vectors. Assuming the two texts are d_1 = (t_1, w_1; t_2, w_2; ...; t_n, w_n) and d_2 = (t_1, σ_1; t_2, σ_2; ...; t_n, σ_n), the similarity between d_1 and d_2 is expressed as follows:

$$\mathrm{sim}(d_1, d_2) = \cos\alpha = \frac{\sum_{i=1}^{n} w_i \sigma_i}{\left(\sum_{i=1}^{n} w_i^2 \times \sum_{i=1}^{n} \sigma_i^2\right)^{1/2}} \tag{3}$$

The greater the value of sim(d_1, d_2), the more similar the two texts are, where w is the feature weight and σ is the weight in the approximate text.
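To make Eqs. (1)–(3) concrete, the snippet below is a minimal Python sketch (not the authors' implementation) that builds tf-idf weight vectors and computes the cosine similarity of two texts. It uses the per-document weight tf(d, t) · log(|D|/df(t)) for each w_i; the toy corpus, vocabulary and function names are assumptions introduced purely for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(corpus, vocab):
    """Weight each text: tf(d, t) * log(|D| / df(t)), per Eq. (2)."""
    tokenized = [doc.split() for doc in corpus]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))                 # document frequency of each word
    n_docs = len(corpus)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)                   # term frequency within this text
        vectors.append([tf[t] * math.log(n_docs / df[t]) if df[t] else 0.0
                        for t in vocab])
    return vectors

def cosine_similarity(w, sigma):
    """Eq. (3): sim = sum(w_i * sigma_i) / (sum(w_i^2) * sum(sigma_i^2))^(1/2)."""
    num = sum(a * b for a, b in zip(w, sigma))
    den = math.sqrt(sum(a * a for a in w) * sum(b * b for b in sigma))
    return num / den if den else 0.0

# illustrative toy corpus
corpus = ["milk powder safety incident report",
          "duck egg dye incident news",
          "milk powder incident follow up"]
vocab = sorted({t for doc in corpus for t in doc.split()})
vecs = tfidf_vectors(corpus, vocab)
print(round(cosine_similarity(vecs[0], vecs[2]), 3))
```

In the clustering-based filtering described in Section 2.1, similarities of this kind would be the basis for deciding which existing texts are close enough to the target data to be retained.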
2.1.3 Algorithm principle

Definition 2.1.1. Set Xb as the target sample space and Xa as the auxiliary sample space, and set Y = {0, 1} as the class space.

Definition 2.1.2 (test data set). S = {(x_i^t)}, where x_i^t ∈ Xb, i = 1, 2, ..., k, and k is the number of elements of set S.

Definition 2.1.3 (training data set). The training data set consists of two parts: Tb = {(x_j^b, c(x_j^b))}, where x_j^b ∈ Xb, j = 1, 2, ..., m, and Ta = {(x_i^a, c(x_i^a))}, where x_i^a ∈ Xa, i = 1, 2, ..., n. Here c(x) is the real class label of an instance, t is the feature word, i and j are index numbers, Tb is the target training data set, Ta is the auxiliary training data set, and m and n are the sizes of the target and auxiliary training data sets, respectively.

2.1.4 Algorithm steps

Input: two training data sets Ta and Tb, and a test data set S.
Output: the classification result h_t(X_t).
Read the training data Ta and Tb.
Classify the training data into N classes according to the class labels: T_i (i = 1, ..., N), where T_i is the set of instances labeled i.
Train the classification model h_t : X → Y.
Test the performance of the classification model on S and output the result [13].

2.2 A text-image feature mapping algorithm based on transfer learning

Based on the previous section, the existing data is filtered by the clustering technique [14], and the data which is very similar to the target data is obtained. Data with two feature perspectives are used to establish a link, and the two different feature spaces are connected. These data are not necessarily used as training data for knowledge learning, but they can act as a dictionary. Taking a subject event as the background, sufficient text-image information about the event on the Internet is used as a basis for knowledge migration.

2.2.1 Text-image co-occurrence data constrained by events

In the heterogeneous spatial learning model, the difficulty of the whole learning process is greatly reduced if data with two feature-space perspectives are used as an aid [15]. The heterogeneous spatial learning model under event constraints provides this possibility. The text-image co-occurrence data under event constraints are given here. E is an event set, with event e ∈ E; V is the whole image data set, and {v} ⊆ V is the set of relevant images under event e; D is the whole text data set, and the text set under event e is {d} ⊆ D; U_v is the image feature space, and U_D is the text feature space. A text-image co-occurrence data instance is vd ∈ S, where S is the co-occurrence data set, and u_v ∈ U_v and u_d ∈ U_D are the corresponding features of the image data instance and the text data instance, respectively. Under the constraint of events, the text-image co-occurrence data vd is formally described at the feature level.

2.2.2 Text subject modeling

The LDA model based on Gibbs sampling is used to extract subject information from the text sets for modeling [16], and the probability model is:

$$w_i \mid z_i, \varphi^{(z_i)} \sim \mathrm{disc}\left(\varphi^{(z_i)}\right), \quad \varphi \sim \mathrm{dir}(\beta) \tag{7}$$
$$z_i \mid \theta^{(d_i)} \sim \mathrm{disc}\left(\theta^{(d_i)}\right), \quad \theta \sim \mathrm{dir}(\alpha)$$

In order to deal with new text outside the event training text and to facilitate parameter inference, symmetric dir(α) and dir(β) prior probability assumptions are made.
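As an illustration of the topic model in Eq. (7), the following is a minimal collapsed Gibbs sampler for LDA with the symmetric dir(α) and dir(β) priors assumed above. It is a sketch rather than the authors' code: the hyperparameters, iteration count and the toy corpus of integer word ids are illustrative assumptions.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric dir(alpha), dir(beta) priors."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # document-topic counts
    n_kw = np.zeros((n_topics, n_vocab))     # topic-word counts
    n_k = np.zeros(n_topics)                 # total words assigned to each topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):           # initialize counts from random assignments
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # p(z_i = k | rest) proportional to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                  # resample and restore counts
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    phi = (n_kw + beta) / (n_k[:, None] + n_vocab * beta)       # topic-word distributions
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    return phi, theta

# toy corpus: each document is a list of integer word ids over a vocabulary of size 5
docs = [[0, 1, 2, 1], [2, 3, 3, 4], [0, 1, 4, 2]]
phi, theta = lda_gibbs(docs, n_topics=2, n_vocab=5, n_iter=100)
print(theta.round(2))
```

The returned φ (topic-word) and θ (document-topic) distributions are the quantities on which a subsequent information-gain ranking of topic features could operate.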
[...]

$$\frac{1 + n(f, w, c, S)}{|F| + n(w, c, S)}$$

where

$$n(f, w, c, S) = \sum_{(v,d) \in S} n(f \mid v)\, P(w, c \mid d) \tag{14}$$

$$n(w, c, S) = \sum_{(v,d) \in S} n(v)\, P(w, c \mid d) \tag{15}$$

2.2.5 Evaluation criteria

The goal of the text-to-image feature mapping algorithm is to estimate the feature distribution of the image information under each event category [21]. According to the feature-independence hypothesis of the BOVW model, image features are regarded as random variables that appear independently. The image feature distribution can be represented as a vector with the same size as the visual word bag:

$$P(f \mid c) = \langle P_i \ge 0 \rangle_{i=0}^{|F|-1}, \quad \sum_i P_i = 1 \tag{16}$$

Cosine similarity and the K-L (Kullback-Leibler) divergence are used as the performance evaluation measures [22]. It is assumed that the probability distribution p is the reference (datum) distribution and the other probability distribution q is the approximation of p. The greater the cosine similarity of the two distributions, the closer the two feature distributions are and the higher the degree of approximation. The formula for cosine similarity is as follows:

$$CS(P, q) = \sum_i P_i q_i \Big/ \left( \sqrt{\textstyle\sum_i P_i^2} \cdot \sqrt{\textstyle\sum_i q_i^2} \right) \tag{17}$$

The K-L divergence is an asymmetric measure of the difference between two probability distributions. Its value reflects how well distribution q approximates distribution p. In the determination of the feature distribution of the reference image data, the K-L divergence is defined as:

$$KL(P \,\|\, q) = \sum_i P_i \log_2 \frac{P_i}{q_i} \tag{18}$$

Based on the above methods, 15 categories of food safety incidents on the Internet are analyzed as data sets [23]. The corresponding categories are: E1: Sanlu milk powder incident; E2: red-cored duck egg incident; E3: turbot incident; E4: Jinhao tea oil incident; E5: Maile chicken incident; E6: plasticizer incident; E7: clenbuterol incident; E8: paraffin wax in hot pot incident; E9: gutter oil incident; E10: crayfish incident; E11: Fushou snail incident; E12: poisonous steamed bread incident; E13: maggot citrus incident; E14: bursting watermelon incident; E15: poisonous bird's nest incident. Depending on the duration of the incident [24], the number of related text downloads ranged from 800 to 2000, with texts accompanied by images accounting for about one-third to one-half of them. A text-image accompanying sample is regarded as a co-occurrence data instance; in the case of multiple images in one sample, each image is considered to correspond to the same accompanying text, and the number of co-occurrence data instances is counted according to the number of images. The image data of each food safety event are collected manually from Internet search engines and related web pages; for each event, 200~400 images are collected. The BOVW model is used to represent each image as a bag of visual words, and the histogram vector of each image is obtained.

Firstly, the feature distribution of the reference image data is constructed. Using all the images under each event category c, an image feature distribution is obtained as the reference feature distribution by the naive Bayesian classifier. Theoretically, the naive Bayesian classifier can calculate the real image feature distribution under the target category when the training data is sufficient. Two intuitive methods are compared with the text-image feature mapping algorithm. The first is the uniform distribution algorithm, which assumes that each image feature appears randomly, with the same probability, under each event target concept. The second is the tagged query algorithm, which uses the name of category c as the query keyword, searches in an Internet search engine [25], and uses the returned K images to train the naive Bayesian model to obtain the image feature distribution. The K value of the experiment is 50, based on experience [26–31].

3 Results

The comparison of the three algorithms under cosine similarity is shown in Figures 2, 3 and 4. Analyzing these results, we can see that the maximum value of the uniform distribution algorithm is 0.94 and the minimum value is 0.74; the maximum value of the label query algorithm is 0.97 and the minimum value is 0.76; and the maximum value of the proposed algorithm is 0.99 and the minimum value is 0.76. Through data comparison, the cosine similarity of the proposed algorithm is always higher than that of the uniform distribution algorithm and the label query algorithm.

Figure 9: Comparison of different algorithms for estimating the distribution under K-L divergence values.

The proposed algorithm can approximately estimate the image feature distribution of each event category from the text data of related events and the text-image co-occurrence data.

Under the 100 events, the similarity distribution of the text-image data is simulated. The proposed algorithm is compared in simulation with the uniform distribution algorithm and the label query algorithm, and the average optimal fitness and average operation time of each algorithm are obtained. The detailed results are given in Table 1.

Table 1: Simulation results of the approximate distribution of image data under 100 events

Type of algorithm                Average optimum fitness /%    Mean operation time /s
Uniform distribution algorithm   7.51                          54.09
Tagged query algorithm           8.22                          53.69
Algorithm in this paper          9.85                          34.72

From the analysis of Table 1, we can see that the optimal fitness of the proposed algorithm is 9.85%, that of the uniform distribution algorithm is 7.51%, and that of the label query algorithm is 8.22%, so the fitness of the proposed algorithm is the highest. In the comparison of average operation time, the proposed algorithm takes 34.72 s, the uniform distribution algorithm takes 54.09 s, and the label query algorithm takes 53.69 s, indicating that the proposed algorithm takes the shortest time and has the highest efficiency. This algorithm can quickly and effectively extract the approximate feature distribution of the text-image data under the 100 events.
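The two evaluation measures of Section 2.2.5 can be sketched in a few lines of Python; the reference distribution p and the two approximations q below are invented toy values, not the paper's experimental data.

```python
import numpy as np

def cosine_similarity(p, q):
    """Eq. (17): CS(P, q) = sum(P_i * q_i) / (sqrt(sum P_i^2) * sqrt(sum q_i^2))."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def kl_divergence(p, q, eps=1e-12):
    """Eq. (18): KL(P || q) = sum_i P_i * log2(P_i / q_i); eps guards empty bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log2(p / q)))

# reference feature distribution p (e.g. a naive Bayesian estimate as in Eq. (16))
p = np.array([0.40, 0.30, 0.20, 0.10])
# approximations: a mapped estimate and a uniform baseline
q_mapped = np.array([0.35, 0.32, 0.22, 0.11])
q_uniform = np.full(4, 0.25)

for name, q in [("mapped", q_mapped), ("uniform", q_uniform)]:
    print(name, round(cosine_similarity(p, q), 3), round(kl_divergence(p, q), 3))
```

A higher CS value and a lower K-L value both indicate that q approximates the reference distribution p more closely, which is the sense in which the proposed algorithm outperforms the two baselines above.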
4 Discussion

In the traditional machine learning framework, the task of learning is to learn a classification model from given, sufficient training data, and then use this model to classify and predict test documents. However, machine learning algorithms face a key problem in current Internet mining research: a large amount of training data in some emerging areas is difficult to obtain. The development of Internet applications is very fast, and a large number of new areas are emerging, from traditional news to web pages, pictures, blogs, podcasts and so on. Traditional machine learning needs to calibrate a large amount of training data in each field, which consumes manpower and material resources. Without a large amount of annotated data, many learning-related studies and applications cannot be carried out. Secondly, traditional machine learning assumes that training data and test data obey the same data distribution. However, in many cases, this same-distribution hypothesis is not satisfied. In addition, training data is often out of date. This often requires re-labelling a large volume of training data to meet our training needs, but labeling new data is expensive and requires manpower and material resources. On the other hand, if we have a lot of training data with different distributions, it would be wasteful to discard it completely. How to make rational use of such data is the problem that transfer learning aims to solve.

Transfer learning can transfer knowledge from existing data to help future learning. The goal of transfer learning is to use the knowledge learned from one environment to assist learning tasks in a new environment. Therefore, transfer learning does not make the same-distribution assumption of traditional machine learning. At present, work on transfer learning can be divided into two parts: case-based transfer learning in isomorphic space and feature-based transfer learning in isomorphic space. It is pointed out that case-based transfer learning has stronger knowledge transfer ability, while feature-based transfer learning has wider knowledge transfer ability; the two methods have their own merits. Transfer learning is a relatively new research direction in machine learning, and current research mainly focuses on data mining, natural language processing, information retrieval and image classification.
Machine learning has provided extensive research findings and results, but research into transfer learning is minimal. Features and samples are two important aspects of text categorization, and it is important to consider these two factors comprehensively. Sample-based transfer learning is another method to solve the problem of transfer learning. Traditional methods also use feature-based or sample-based transfer learning methods, but there is a lack of comprehensive use of the two. The algorithm proposed in this paper can find the data very similar to the test data from the existing data and improve the accuracy of the model.

5 Conclusions

In this paper, a text-image feature mapping algorithm based on transfer learning is proposed. Firstly, clustering technology is used to filter the existing data and find the data similar to the target data, in order to help the learning of the target task and improve the performance of the classifier. Then, the event text data is modeled by the latent Dirichlet allocation method, and the most prominent text features are selected by calculating the information gain of the topic features; the event images are modeled using the visual word bag model and the naive Bayesian method. The approximate extraction of the image feature distribution is realized through the text data feature distribution and the text-image co-occurrence data feature distribution under the same event. Compared with the traditional uniform distribution algorithm and the labeled query algorithm, the average cosine similarity of the proposed algorithm is 92%, that of the uniform distribution algorithm is 76%, and that of the labeled query algorithm is 84%. The average dispersion of the proposed algorithm is 0.06%, that of the uniform distribution algorithm is 0.17%, and that of the labeled query algorithm is 0.09%. The experimental data show that the proposed algorithm has the advantages of high cosine similarity and low dispersion.

References

[1] Wang F., Youh J., Fux Y., Auto-Adaptive Well-Distributed Scale-Invariant Feature for SAR Images Registration, Geomat. Inform. Sci. Wuhan Univ., 2015, 40(2), 159-163.
[2] Wang K., Shil Z., Design and Implementation of Fast Connected Component Labeling Algorithm based on FPGA, Comp. Eng. Appl., 2016, 52(18), 192-198.
[3] Lozoya R.C., Berte B., Cochet H., Model-based Feature Augmentation for Cardiac Ablation Target Learning from Images, IEEE Trans. Biomed. Eng., 2018, PP(99), 1-1.
[4] Cazade P.A., Zheng W., Pradagracia D., A Comparative Analysis of Clustering Algorithms: O2 Migration in Truncated Hemoglobin I from Transition Networks, J. Chem. Phys., 2015, 142(2), 025103.
[5] Wan S., Niu Z., A Learner Oriented Learning Recommendation Approach based on Mixed Concept Mapping and Immune Algorithm, Knowledge-Based Syst., 2016, 103(C), 28-40.
[6] Han X.H., Xiong X., Duan F., A New Method for Image Segmentation based on BP Neural Network and Gravitational Search Algorithm Enhanced by Cat Chaotic Mapping, Appl. Intel., 2015, 43(4), 855-873.
[7] Zhou T., Hu W., Ning J., An Efficient Local Operator-based Q-compensated Reverse Time Migration Algorithm with Multistage Optimization, Geophys., 2018, 83(3), S249-S259.
[8] Gorodnitskiy E., Perel M., Geng Y., Depth Migration with Gaussian Wave Packets based on Poincaré Wavelets, Geophys. J. Int., 2016, 205(1), 301-318.
[9] Rastogi R., Srivastava A., Khonde K., An Efficient Parallel Algorithm: Poststack and Prestack Kirchhoff 3D Depth Migration Using Flexi-depth Iterations, Comp. Geosci., 2015, 80, 1-8.
[10] Tosun S., Ozturk O., Ozkan E., Application Mapping Algorithms for Mesh-based Network-on-chip Architectures, J. Supercomp., 2015, 71(3), 995-1017.
[11] Kalantar B., Mansor S.B., Sameen M.I., Drone-based Land-cover Mapping Using a Fuzzy Unordered Rule Induction Algorithm Integrated into Object-based Image Analysis, Int. J. Remote Sens., 2017, 38(8-10), 2535-2556.
[12] Mackenzie C., Pichara K., Protopapas P., Clustering Based Feature Learning on Variable Stars, Astrophys. J., 2016, 820(2), 138.
[13] Li H., Zhu G., Cui C., Energy-efficient Migration and Consolidation Algorithm of Virtual Machines in Data Centers for Cloud Computing, Comput., 2016, 98(3), 303-317.
[14] Xiang T., Yan L., Gao R., A Fusion Algorithm for Infrared and Visible Images based on Adaptive Dual-channel Unit-linking PCNN in NSCT Domain, Infrared Phys. Technol., 2015, 69, 53-61.
[15] Dong J., Xiao X., Menarguez M.A., Mapping Paddy Rice Planting Area in Northeastern Asia with Landsat 8 Images, Phenology-based Algorithm and Google Earth Engine, Remote Sens. Envir., 2016, 185, 142-154.
[16] Li Q., Zhou H., Zhang Q., Efficient Reverse Time Migration based on Fractional Laplacian Viscoacoustic Wave Equation, Geophys. J. Int., 2016, 204(1), 488-504.
[17] Medrano E.A., Wiel B.J.H.V.D., Uittenbogaard R.E., Simulations of the Diurnal Migration of Microcystis Aeruginosa, based on a Scaling Model for Physical-biological Interactions, Ecolog. Mod., 2016, 337, 200-210.
[18] Matsubayashi A., Asymptotically Optimal Online Page Migration on Three Points, Algorithmica, 2015, 71(4), 1035-1064.
[19] Yap W.S., Phan C.W., Yau W.C., Cryptanalysis of a New Image Alternate Encryption Algorithm based on Chaotic Map, Nonlin. Dyn., 2015, 80(3), 1483-1491.
[20] Rastogi R., Londhe A., Srivastava A., 3D Kirchhoff Depth Migration Algorithm, Comp. Geosci., 2017, 100(C), 67-75.
[21] Thierry P., Lambaré G., Podvin P., 3-D Preserved Amplitude Prestack Depth Migration on a Workstation, Geophys., 2015, 64(1), 222-229.
[22] Zheng X.W., Lu D.J., Wang X.G., A Cooperative Coevolutionary Biogeography-based Optimizer, Appl. Intel., 2015, 43(1), 1-17.
[23] Wang M., Study on Operation Reliability of Transfer System of Urban Transportation Hub based on Reliability Theory, Automat. Instrument., 2016, (1), 418-534.
[24] Cong S., Gao M.Y., Cao G., Ultrafast Manipulation of a Double Quantum-Dot Charge Qubit Using Lyapunov-Based Control Method, IEEE J. Quant. Electr., 2015, 51(8), 1-8.
[25] Yan X., Yang S., Hong H.E., Load Adaptive Control Based on Frequency Bifurcation Boundary for Wireless Power Transfer System, J. Pow. Supp., 2017, 43(4), 1025-1084.
[26] Lokesha V., Deepika T., Ranjini P.S., Cangul I.N., Operations of Nanostructures Via SDD, ABC4 and GA5 Indices, Appl. Math. Nonlin. Sci., 2017, 2(1), 173-180.
[27] Molinos-Senante M., Guzman C., Benchmarking Energy Efficiency in Drinking Water Treatment Plants: Quantification of Potential Savings, J. Clean. Prod., 2018, 176, 417-425.
[28] Gao W., Farahani M.R., Aslam A., Hosamani S., Distance Learning Techniques for Ontology Similarity Measuring and Ontology Mapping, Cluster Computing - The J. Net. Soft. Tools Appl., 2017, 20(2SI), 959-968.
[29] Ge S.B., Ma J.J., Jiang S.C., Liu Z., Peng W.X., Potential Use of Different Kinds of Carbon in Production of Decayed Wood Plastic Composite, Arabian J. Chem., 2018, 11(6), 838-843.
[30] Singh K., Gupta N., Dhingra M., Effect of Temperature Regimes, Seed Priming and Priming Duration on Germination and Seedling Growth on American Cotton, J. Envir. Biol., 2018, 39(1), 83-91.
[31] Hosamani S.M., Correlation of Domination Parameters with Physicochemical Properties of Octane Isomers, Appl. Math. Nonlin. Sci., 2016, 1(2), 345-352.