
Interspeech 2018

2-6 September 2018, Hyderabad, India

User Information Augmented Semantic Frame Parsing using Progressive Neural Networks
Yilin Shen, Xiangyu Zeng, Yu Wang, Hongxia Jin

Samsung Research America, Mountain View, CA, USA


{yilin.shen,shane.z,yu.wang1,hongxia.jin}@samsung.com

Abstract

Semantic frame parsing is a crucial component in spoken language understanding (SLU) for building spoken dialog systems. It consists of two main tasks: intent detection and slot filling. State-of-the-art deep learning models have demonstrated good results on these tasks. However, these models require not only a large-scale annotated training set but also a long training procedure. In this paper, we aim to alleviate these drawbacks for semantic frame parsing by utilizing ubiquitous user information. We design a novel progressive deep neural network model that incorporates prior knowledge of user information at an intermediate stage to train a semantic frame parser better and faster. Due to the lack of a benchmark dataset with real user information, we synthesize the simplest types of user information (location and time) on the ATIS benchmark data. The results show that our approach leverages such simple user information to outperform state-of-the-art approaches by 0.25% for intent detection and 0.31% for slot filling using the standard training data. When using smaller training data, the performance improvement on intent detection and slot filling reaches up to 1.35% and 1.20% respectively. We also show that our approach can achieve performance similar to state-of-the-art approaches using less than 80% of the annotated training data. Moreover, the training time needed to reach that performance is reduced by over 60%.

Index Terms: Spoken Language Understanding, User Information Augmentation, Progressive Neural Networks

1. Introduction

With the emergence of artificially intelligent voice-enabled personal assistants in daily life, spoken language understanding (SLU) systems have attracted increasing research attention. As the key component of an SLU system, semantic frame parsing aims to identify the user's intent and extract semantic constituents from a natural language utterance, a.k.a. intent detection and slot filling. Existing approaches include independent models that learn intent detection [1, 2] and slot filling [3, 4, 5, 6, 7, 8] separately, as well as joint models that learn the two tasks together [9, 10, 11, 2].

Unfortunately, the aforementioned approaches suffer from several main drawbacks. First, they require a large-scale annotated corpus to train a high quality parser. Since an SLU system aims to understand all varieties of user utterances, the corpus is further required to extensively cover those varieties. However, collecting such an annotated corpus is very expensive and needs heavy human labor. Secondly, training existing parser models oftentimes takes a long time to achieve good performance. These drawbacks are magnified by the recent quick growth of capabilities in personal assistants [12]: to develop a new domain, we need to generate a new utterance dataset and spend a long time training a new semantic frame parsing model. Thus, it is critically desirable to design a new semantic frame parsing model that alleviates the need for both a large amount of annotated training data and a long training time.

In this paper, we investigate how user information can be incorporated into semantic frame parsing to overcome the above drawbacks. We design a novel progressive attention-based recurrent neural network (Prog-BiRNN) model that first annotates the information types and then distills the related prior knowledge w.r.t. each type of information to continue learning intent detection and slot filling. Our approach is motivated by the recent success of the attention-based RNN model [2] for joint learning of intent detection and slot filling, and of progressive neural networks [13] in many multi-task learning applications. Our model consists of a main RNN structure stacked with a set of different layers, which are trained one by one in a progressive manner.

Organization: Section 2 describes the background and related work. We discuss our new problem definition in Section 3. Section 4 presents our proposed model and its training procedure. We show the experimental results in Section 5. Section 6 concludes the paper.

2. Background & Related Work

2.1. Semantic Frame Parsing

Intent detection and slot filling are the two main tasks in building a semantic frame parser for spoken language understanding (SLU). That is, the goal of semantic frame parsing is to understand all varieties of user utterances by correctly identifying the user's intents and slot tags. Given an input utterance as a sequence x of length T, intent detection identifies the intent class I for x, and slot filling maps x to the corresponding label sequence y of the same length T (Table 1).

Intent detection is treated as an utterance classification problem, which can be modeled using conventional classifiers such as support vector machines (SVM) [1] or RNN based models [2]. As a sequence labeling problem, slot filling can be solved using traditional machine learning approaches, including the maximum entropy Markov model [3] and conditional random fields (CRF) [14], as well as recurrent neural network (RNN) based approaches that take and tag each word of an utterance one by one [4, 5, 6, 7, 8]. Recent research focuses on joint models that learn the two tasks together [9, 10, 11, 2].

2.2. Joint Attention-based RNN Model

We recall the state-of-the-art approach in [2], referred to as the Att-BiRNN model, which is used as the base of our approach. Att-BiRNN is a joint RNN model that learns the two tasks together. It first uses a bidirectional RNN with basic LSTM cells to read the input utterance as a sequence x.

At each time stamp t, a context vector c_t is learned and concatenated with the RNN hidden state h_t, i.e., c_t ⊕ h_t, to learn a slot attention for predicting the slot tag y_t. All hidden states of the slot filling attention layer are used to predict the intent label in the end. The objective function of the Att-BiRNN model is:

P(y|x) = \max_{\theta_r, \theta_s, \theta_I} \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x; \theta_r, \theta_s, \theta_I)    (1)

where θ_r, θ_s, θ_I are the trainable parameters of the different components (utterance BiRNN, slot filling attention layer and intent classifier) of the Att-BiRNN model.

[Figure 1: Progressive Attention based RNN Model. The utterance BiRNN (BiLSTM) layer feeds a user info tagging layer (tags such as B-loc); its outputs weight the prior distance vectors d_t and context vectors c_t that feed the slot filling layer (tags such as B-fromloc, B-toloc) and the intent detection layer (e.g., Flight), illustrated on the fragment "... between NY and Miami ...".]

3. Problem Definition

We propose the User Info Augmented Semantic Frame Parsing problem for the same two tasks, intent detection and slot filling, by considering the following additional inputs.
User Info Dictionary: This defines the categorical relation between user info types and slots. In other words, each key in the dictionary is a type of user info and its corresponding value is the set of slots belonging to this type. The generation of this dictionary is not the focus of our paper since, in practice, it can simply be produced by a software developer when defining the slots during the development of a new domain. Each type of user info is associated with an external or pre-trained model to extract its semantically meaningful prior knowledge. For example, the semantics of a location is represented by its longitude and latitude, such that the distance between two locations reflects their actual geographical distance.

User Info for Each Utterance: Each input sequence x is associated with its corresponding user info U. U is represented as a set of tuples <Info Type, Info Content>. For the example utterance in Table 1, the first gray row shows our generated user info with type "User Location" and content "Brooklyn, NY". Learning user info has been well studied, e.g., user contextual information (time, location, activity, etc.) via smartphones [15] and the Internet of Things [16], and user interests (favorite food, etc.) using recommendation models [17].

Table 1: ATIS corpus sample with intent and slot annotations with additional user info and its corresponding user info sequence (in gray)

utterance (x):     round trip flights between ny and miami
slots (y):         B-round_trip I-round_trip O O B-fromloc O B-toloc
intent (I):        atis_flight
user info (U):     {"User Location": "Brooklyn, NY"}
user info seq (z): O O O O B-loc O B-loc

Remarks: One may argue that this is a simple extension of the semantic frame parsing problem in which the user info can simply be encoded into an existing model as a new input or a new state. However, such naive approaches ignore the different semantic meanings between user info and the language context of an utterance, as well as between different types of user info. Thus, as we later show in the experiments (Section 5), these baseline approaches do not show any advantage over existing approaches without user info.
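To make these two inputs concrete, the following Python sketch builds the Table 1 example and derives the user info sequence z from the slot tags via the user info dictionary; the dictionary contents and the keyword-matching helper are illustrative assumptions rather than the paper's exact format:

# Illustrative data structures for the two additional inputs (Table 1 example);
# slot/type names follow ATIS conventions but the exact format is an assumption.
user_info_dictionary = {                       # user info type -> slots of that type
    "loc":  ["fromloc.city_name", "toloc.city_name"],
    "time": ["depart_time.time", "arrive_time.time"],
}
utterance = "round trip flights between ny and miami".split()
slots     = ["B-round_trip", "I-round_trip", "O", "O", "B-fromloc", "O", "B-toloc"]
user_info = {"User Location": "Brooklyn, NY"}  # per-utterance tuples <Info Type, Info Content>

def user_info_sequence(slot_tags, info_dict):
    """Derive z: keep the IOB prefix and replace the slot name by its user info type."""
    z = []
    for tag in slot_tags:
        prefix, _, name = tag.partition("-")
        info_type = next((t for t in info_dict if t in name), None)   # keyword match, e.g. "loc"
        z.append(prefix + "-" + info_type if info_type else "O")
    return z

print(user_info_sequence(slots, user_info_dictionary))
# ['O', 'O', 'O', 'O', 'B-loc', 'O', 'B-loc']  -- matches the user info seq (z) in Table 1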
4. Proposed Approach

In this section, we describe the main idea and the details of our proposed Prog-BiRNN model as well as its training procedure.

4.1. Progressive Attention-based RNN Model

As the name indicates, our main idea is to train the semantic frame parsing model progressively with an intermediate task before achieving the final goal of intent detection and slot filling. This is motivated by the recent success of progressive neural networks [13]. Specifically, for each utterance x, we first define the user info sequence z using the user info dictionary; in Table 1, the last row shows the user info sequence corresponding to this example. Our approach first trains a user info tagging component to derive z. Then, the prior knowledge with semantic meaning for each type of user info is distilled into the model to continue training for intent detection and slot filling.

As shown in Figure 1, our proposed Prog-BiRNN model is designed based on the state-of-the-art Att-BiRNN model [2] and consists of the following four main components.

Utterance BiRNN Layer: We use the same bidirectional RNN (BiRNN) with LSTM cells (BiLSTM) to encode an utterance as in [2]. The hidden state h_t at each time step t is the concatenation of the forward state fh_t and the backward state bh_t, i.e., h_t = fh_t ⊕ bh_t.
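A minimal sketch of this encoder is shown below, assuming a PyTorch-style BiLSTM (the paper does not name a framework) with placeholder vocabulary and dimension sizes:

# Minimal sketch of the utterance BiLSTM encoder; the framework and all sizes
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim, T = 1000, 128, 64, 7   # placeholder sizes
embed = nn.Embedding(vocab_size, emb_dim)
bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, T))           # one utterance of length T
h, _ = bilstm(embed(tokens))                            # h: (1, T, 2 * hidden_dim)
# h[:, t, :] is the concatenation fh_t ⊕ bh_t of the forward and backward states.
print(h.shape)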
User Info Tagging Layer: This component labels the user info type of each word in the input utterance. Since the labeling is based on the language context of the input utterance, we follow the previous work [2] and use a language context vector c_t at each time stamp t, computed as the weighted sum of all hidden states {h_k}_{1≤k≤T}, i.e., c_t = \sum_{k=1}^{T} \alpha_{t,k} h_k. Here \alpha_t = softmax(e_t), i.e., \alpha_{t,j} = \exp(e_{t,j}) / \sum_{k=1}^{T} \exp(e_{t,k}), and e_{t,k} = g(s^u_{t-1}, h_k) is learned by a feed-forward neural network g with the previous hidden state s^u_{t-1} defined as the concatenation of h_{t-1} and c_{t-1}, i.e., s^u_{t-1} = h_{t-1} ⊕ c_{t-1}. At each time step t, the user info tagging layer outputs P^u_t as follows:

P^u_t = softmax(W_u s^u_t);   \tilde{z}_t = \arg\max_{\theta_u} P^u_t    (2)
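To make the attention computation concrete, the following NumPy sketch traces the context vector c_t and Eq. (2) over one utterance; the scorer g is simplified to a single linear map and all matrices are random stand-ins for trained parameters (illustrative assumptions, not the paper's exact network):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

T, d, n_tags = 7, 64, 3                  # utterance length, size of h_t, user info tags in IOB format incl. "O"
h = np.random.randn(T, d)                # h_t = fh_t ⊕ bh_t from the utterance BiLSTM
W_g = np.random.randn(3 * d)             # toy scorer: g(s, h_k) = W_g . (s ⊕ h_k)
W_u = np.random.randn(n_tags, 2 * d)     # tag projection over s^u_t = h_t ⊕ c_t
h_prev, c_prev = np.zeros(d), np.zeros(d)
tags = []
for t in range(T):
    s_prev = np.concatenate([h_prev, c_prev])                        # s^u_{t-1} = h_{t-1} ⊕ c_{t-1}
    e = np.array([W_g @ np.concatenate([s_prev, h[k]]) for k in range(T)])
    alpha = softmax(e)                                               # alpha_t = softmax(e_t)
    c_t = alpha @ h                                                  # c_t = sum_k alpha_{t,k} h_k
    P_u = softmax(W_u @ np.concatenate([h[t], c_t]))                 # Eq. (2)
    tags.append(int(np.argmax(P_u)))                                 # predicted user info tag z~_t
    h_prev, c_prev = h[t], c_t
print(tags)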

Slot Filling Layer: This is the key layer for distilling user info into the model to help reduce the need for annotated training data. It shares the same hidden state h_t and language context c_t with the user info tagging layer. For each word in the utterance, we use external knowledge to derive the prior distance vectors d_t = {d_t(1), ..., d_t(|U|)} at each time stamp t (green in Figure 1), where |U| is the number of user info types in IOB format. Each element d_t(j) is defined as follows:

d_t(j) = sigmoid(\beta(j) \odot \delta_t(j))    (3)

where ⊙ stands for element-wise multiplication, β(j) is a |U|-dimensional trainable vector, and δ_t(j) is the distance between the t-th word and the user info w.r.t. the prior knowledge of type j.
Next, we define the calculation of the distance δ_t(j) for each info type j at time stamp t through the example in Figure 1. Let δ_t(loc) be the distance w.r.t. the location type of user info; it is a one-dimensional scalar in this case. Taking the second word "NY" as an example, which is tagged with the "Location" type of user info, its location distance is

δ_2(loc) = dist("NY", "Brooklyn, NY") ≈ 4.8 (miles)

obtained from an external location based service, i.e., the Google Maps Distance Matrix API [18]. If the word and the user info are of different types, we set the distance δ_t(j) to -1 so that its corresponding d_t(j) is close to 0 via the sigmoid function.

To feed the prior distance vectors d_t into the slot filling layer, we weight each element d_t(j) and the language context c_t by the softmax probability distribution P^u_t from the user info tagging layer. Intuitively, this determines how important each type of user info, or the language context of the utterance, is for predicting the slot tag of each word. Thus, the input Φ_t of the LSTM cell at each time step t in the slot filling layer is:

Φ_t = P^u_t(1) d_t(1) ⊕ · · · ⊕ P^u_t(|U|) d_t(|U|) ⊕ P^O_t c_t    (4)

where P^u_t(j) and P^O_t stand for the probabilities that the t-th word is predicted as user info type j and as "O", meaning none of the types. Note that we discuss how to deal with the IOB format in Section 4.2.2. At last, the state s^s_t at time step t is computed as h_t ⊕ Φ_t and the slot tag is predicted as follows:

P^s_t = W_s s^s_t;   \tilde{y}_t = \arg\max_{\theta_s} P^s_t    (5)
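The following NumPy sketch traces Eqs. (3)-(4) for a single time step, assuming two user info types (location and time), precomputed distances δ_t(j) (in practice obtained from an external service such as the Distance Matrix API [18]) and scalar d_t(j); all numbers are illustrative stand-ins for trained values:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_info, d = 2, 64                          # |U| user info types (loc, time); size of c_t
delta_t = np.array([4.8, -1.0])            # delta_t(loc) ~ 4.8 miles; -1 for the mismatched type
beta = np.array([1.5, 1.5])                # stand-in for the trainable scaling beta(j)
d_t = sigmoid(beta * delta_t)              # Eq. (3): mismatched types are pushed toward 0
c_t = np.random.randn(d)                   # language context shared with the tagging layer
P_u = np.array([0.7, 0.1, 0.2])            # P_t^u(loc), P_t^u(time), P_t^O from the tagging layer
Phi_t = np.concatenate([P_u[:n_info] * d_t, P_u[n_info] * c_t])      # Eq. (4)
print(Phi_t.shape)                         # (n_info + d,): input of the slot filling LSTM cell at step t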
Intent Detection Layer: We add an additional intent detection layer as in [2] to generate the probability distribution P^I over intent class labels from the concatenation of the hidden states of the slot filling layer, i.e., s^I = s^s_1 ⊕ ... ⊕ s^s_T:

P^I = softmax(W_I s^I);   \tilde{I} = \arg\max_{\theta_I} P^I

Remarks: The sharing of the hidden state h_t and language context c_t between the user info tagging and slot filling layers is crucial to reducing the required annotated training data. For the user info tagging layer, h_t and c_t are mainly used to tag the words that belong to one of the user info types. The semantic slots of these words can then be easily tagged in the slot filling layer by utilizing the distilled prior knowledge instead of using h_t and c_t again. The slot filling layer thus relies on h_t and c_t mainly to tag the remaining words that do not belong to any type of user info.
4.2. Progressive Training with IOB Format Support

4.2.1. Training Algorithm

The training procedure is conducted progressively, step by step. The first step is to train the user info tagging component with the loss function L_u:

L_u(\theta_r, \theta_u) \triangleq -\frac{1}{n} \sum_{i=1}^{|U|} \sum_{t=1}^{n} z_t(i) \log P^u_t(i)    (6)

where |U| is the number of user info types in IOB format. Then, we train the slot filling layer with loss function L_s and the intent classifier with loss function L_I simultaneously. In the meanwhile, we also allow fine tuning of the parameters θ_r and θ_u in the utterance BiRNN and user info tagging layers.

L_s(\theta_r, \theta_I, \theta_s, \theta_u) \triangleq -\frac{1}{n} \sum_{i=1}^{|S|} \sum_{t=1}^{n} y_t(i) \log P^s_t(i)    (7)

L_I(\theta_r, \theta_I, \theta_s, \theta_u) \triangleq -\sum_{i=1}^{|I|} I(i) \log P^I(i)    (8)

where |S| is the number of slots in IOB format and |I| is the number of intents. P(i) stands for the probability P(X = x_i). Moreover, θ_r, θ_u, θ_s, θ_I are the parameters of the utterance BiRNN, user info tagging, slot filling and intent detection components of our proposed Prog-BiRNN model.
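A minimal NumPy sketch of the two-phase objective is given below, with Eqs. (6)-(8) written as token- and utterance-level cross-entropies over random stand-in model outputs; here n is taken as the number of tokens in one utterance, which is an assumption about the paper's notation:

import numpy as np

def cross_entropy(one_hot, probs, eps=1e-12):
    return -np.sum(one_hot * np.log(probs + eps))

n, U, S, I = 7, 3, 5, 4                  # tokens, |U| info tags, |S| slot tags, |I| intents
rng = np.random.default_rng(0)

def rand_dist(k):                        # random stand-in for a softmax output
    p = rng.random(k)
    return p / p.sum()

z = np.eye(U)[rng.integers(0, U, n)]     # gold user info tags (one-hot)
y = np.eye(S)[rng.integers(0, S, n)]     # gold slot tags
I_gold = np.eye(I)[rng.integers(0, I)]   # gold intent

P_u = np.stack([rand_dist(U) for _ in range(n)])
P_s = np.stack([rand_dist(S) for _ in range(n)])
P_I = rand_dist(I)

# Phase 1: train (theta_r, theta_u) with the user info tagging loss L_u, Eq. (6).
L_u = sum(cross_entropy(z[t], P_u[t]) for t in range(n)) / n
# Phase 2: jointly train slot filling and intent detection with L_s (Eq. 7) and
# L_I (Eq. 8), while fine-tuning theta_r and theta_u.
L_s = sum(cross_entropy(y[t], P_s[t]) for t in range(n)) / n
L_I = cross_entropy(I_gold, P_I)
print(L_u, L_s + L_I)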
4.2.2. Details of IOB Format Support

Thanks to the progressive training procedure, the IOB format is naturally supported in our model. As shown in Figure 2, in the case of "New York" with "B-loc I-loc" user info tags, we take the two words together to extract the prior geographical distance dist("New York", "Brooklyn, NY"). Moreover, since B-loc and I-loc are considered as different tags in the output P^u_t of the user info tagging component, they can be directly used to infer B-fromloc and I-fromloc in the slot filling component accordingly.

In the case that the type of user info for the t-th word is incorrectly tagged, the hidden state h_t and language context c_t will be used to infer the slot tags, since the user info tagging output P^u_t puts more weight on h_t and c_t in this case. In addition, the second training phase, which jointly trains all components, also learns to use more language context to correct the incorrectly tagged type of user info.

[Figure 2: Support of IOB Format (omitted other model details). "New York" tagged B-loc I-loc shares a single distance δ_1(loc) = δ_2(loc) = dist("New York", "Brooklyn, NY") and is mapped to B-fromloc I-fromloc in the slot filling layer.]
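As a sketch of this span handling, the helper below groups consecutive B-/I- user info tags into a single phrase so that one distance lookup can be shared by all positions of the span; the function name and the toy example are illustrative, not the authors' code:

def merge_spans(tokens, info_tags):
    """Group 'B-x I-x' runs into (phrase, type) spans; other tokens yield (token, None)."""
    spans, cur, cur_type = [], [], None
    def flush():
        if cur:
            spans.append((" ".join(cur), cur_type))
    for tok, tag in zip(tokens, info_tags):
        if tag.startswith("B-"):
            flush()
            cur, cur_type = [tok], tag[2:]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur.append(tok)
        else:
            flush()
            cur, cur_type = [], None
            spans.append((tok, None))
    flush()
    return spans

print(merge_spans(["flights", "from", "New", "York"], ["O", "O", "B-loc", "I-loc"]))
# [('flights', None), ('from', None), ('New York', 'loc')]
# -> one lookup dist("New York", "Brooklyn, NY") shared by the B-loc and I-loc positions.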

Remarks: The prior knowledge distillation capability of our approach leverages user information to largely improve performance and reduce the amount of annotated training data required. Moreover, the overall training time is also largely shortened, since our approach divides SLU into simpler subproblems, each of which is much easier to train.

5. Experimental Evaluation

5.1. Dataset

We evaluate our approach on the ATIS (Airline Travel Information Systems) dataset [19], a widely used dataset in SLU research. The training set contains 4,978 utterances from the ATIS-2 and ATIS-3 corpora, and the test set contains 893 utterances from the ATIS-3 data sets. There are 127 distinct slot labels and 22 different intent classes.

Due to the lack of benchmark datasets with user info, we design the following two mechanisms to synthesize two types of user info in the ATIS dataset: user contextual location and user preferred time periods. We first construct the user info dictionary by including all slots with the "loc" keyword under contextual location and all slots with the "time" keyword under user preferred time period.

The prior distance δ of a contextual location is computed using the Google Maps Distance Matrix API [18]. For a time period, we calculate δ as the difference between the time stamp tagged in an utterance and the middle time stamp of the user preferred time period.

Table 2: Examples of synthesized user info in ATIS dataset

Utterance: "i need a flight from dallas to san francisco"  {"fromloc.city_name": "dallas"}
  User info: contextual location = Fort Worth,TX

Utterance: "all flights to baltimore after 6 pm"  {"depart_time.time": "6 pm"}
  User info: preferred depart period = evening

Utterance: "i want to fly from boston at 838 am and arrive in denver at 1110 in the morning"  {"fromloc.city_name": "boston"}, {"arrive_time.time": "1110"}, {"arrive_time.period_of_day": "morning"}
  User info: contextual location = Cambridge,MA; preferred arrive period = morning

Contextual Location: W.l.o.g., we synthesize user contextual locations based on the intuitive assumption that the user's location is usually close to the flight departure city. We first extract all values (real locations) of slots whose names contain "fromloc". Then, for each real location, we use the Google Places API [20] to find the nearby cities within 50 km. For each utterance having slots containing "fromloc", we add a nearby city of this slot value as its location. When there is more than one nearby city, we randomly select one of them.
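A minimal sketch of this synthesis step is shown below; the hard-coded nearby-city table is a stand-in for what the Google Places API [20] lookup of cities within 50 km would return, and the helper name is hypothetical:

import random

NEARBY_CITIES = {                      # illustrative table, not real Places API output
    "dallas": ["Fort Worth,TX", "Arlington,TX"],
    "boston": ["Cambridge,MA", "Quincy,MA"],
}

def synthesize_location(slots):
    """Pick a random nearby city for a 'fromloc' slot value, if one is known."""
    for name, value in slots.items():
        if "fromloc" in name and value.lower() in NEARBY_CITIES:
            return random.choice(NEARBY_CITIES[value.lower()])
    return None

print(synthesize_location({"fromloc.city_name": "dallas", "toloc.city_name": "san francisco"}))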
Preferred Time Periods: Following the Oxford dictionary, we consider four periods of a day: morning (6am-12pm), afternoon (12pm-6pm), evening (6pm-12am) and night (12am-6am). For each utterance having slots with the "time" keyword, we generate one depart and one arrive time preference by selecting from these four periods as follows. If there is a slot containing "depart_time", we set the preferred time period based on the value of this slot; for example, if the slot value is "8pm", we set the preferred time period to "evening" since "8pm" belongs to the period 6pm-12am. For the slot "depart_time.period_of_day", we simply match the keywords to synthesize the user preferred depart time period. We synthesize the arrive period preference in the same way.
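The time-to-period mapping can be sketched as follows; the accepted parsing formats and the helper name are illustrative assumptions:

from datetime import datetime

PERIODS = [("morning", 6, 12), ("afternoon", 12, 18), ("evening", 18, 24), ("night", 0, 6)]

def time_to_period(time_str):
    """Map a tagged time such as '8pm' or '838 am' to one of the four periods of a day."""
    s = time_str.replace(" ", "").lower()
    for fmt in ("%I%p", "%I:%M%p", "%I%M%p", "%H%M"):
        try:
            hour = datetime.strptime(s, fmt).hour
            break
        except ValueError:
            continue
    else:
        return None
    return next(name for name, lo, hi in PERIODS if lo <= hour < hi)

print(time_to_period("8pm"))     # evening (8pm falls in 6pm-12am)
print(time_to_period("838 am"))  # morning (6am-12pm)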
5.2. Baseline Competitors & Implementation Details

In addition to the state-of-the-art baseline Att-BiRNN [2], we also design another baseline competitor that uses user info, as discussed at the end of Section 3. For fairness, we concatenate the user info directly to the input of the slot filling layer in Att-BiRNN; all user info is concatenated together without distinguishing between types. We call these two baselines Att-BiRNN with/without User Info, respectively.

Also, we follow the exact same hyperparameters as the original paper of the base Att-BiRNN model [2], since our model does not introduce additional hyperparameters.

5.3. Results with Different Sizes of Training Set

We evaluate our Prog-BiRNN model on subsets of the full ATIS training set, randomly sampling three different sizes (2,000, 3,000 and 4,000 utterances) out of the total 4,978 utterances. Figure 3 reports the average performance over 10 differently sampled training sets of each size.

Since location related slots form the majority of all slots in the ATIS dataset, we first consider using only contextual location as user info. As shown in Figure 3a, the F1 score of slot filling outperforms both baseline approaches with around 0.2% absolute gain at each size. The accuracy improvement of intent detection is around 0.1% and up to 0.2% for the full size training set; this slightly smaller margin is due to the small number of intent classes. When using both contextual location and preferred time period as user info, we observe more significant improvements, with a 0.25% gain for intent detection and a 0.31% gain for slot filling. Note that our reported intent detection accuracy differs from that of the baseline paper [2] since we use all 22 intents of the ATIS dataset. In particular, when using smaller training data, i.e., 2,000 training utterances, the performance improvement on intent detection and slot filling reaches 1.35% and 1.20%, respectively. More significantly, our Prog-BiRNN model can use fewer than 4,000 (80%) annotated utterances, together with simple user location and preferred time period information, to achieve the performance of the baseline approaches on both intent detection and slot filling.

[Figure 3: Performance results with different sizes of training set. (a) Contextual Location Only; (b) Contextual Location & Preferred Time Periods. Each panel plots slot filling F1 score and intent classifier accuracy (%) against the size of the training set for Prog-BiRNN and the Att-BiRNN baselines with and without user info.]

5.4. Training Time Results

We also compare the training time of our Prog-BiRNN and the baseline approaches. Since our approach mainly focuses on improving slot filling, Figure 4 reports the average slot filling F1 score after each epoch of training. Thanks to the small number of user info types, the first (user info tagging) training phase takes only 3 epochs to achieve over 92% accuracy, which is sufficient for the second training phase. As one can see, the number of epochs (these 3 epochs included) it takes to reach a competitive slot filling performance is over 60% smaller than for both baseline approaches.

[Figure 4: Training time results on full size training set using both contextual location & preferred time periods as user info: slot filling F1 score against the number of training epochs for Prog-BiRNN and the two Att-BiRNN baselines.]

6. Conclusion

We present a novel progressive neural network model that trains a semantic frame parser by incorporating user information. Using simple user information, we show that our approach not only significantly improves performance but also largely reduces the need for an annotated training set. In addition, our approach shortens the training time required to achieve competitive performance. Thus, we enable the quick development of semantic frame parsing models with less annotated training data in new domains.

7. References

[1] P. Haffner, G. Tur, and J. H. Wright, "Optimizing SVMs for complex call classification," in Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP'03). 2003 IEEE International Conference on, vol. 1. IEEE, 2003, pp. I–I.
[2] B. Liu and I. Lane, "Attention-based recurrent neural network models for joint intent detection and slot filling," arXiv preprint arXiv:1609.01454, 2016.
[3] A. McCallum, D. Freitag, and F. C. N. Pereira, "Maximum entropy Markov models for information extraction and segmentation," in Proceedings of the Seventeenth International Conference on Machine Learning, ser. ICML '00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 591–598.
[4] K. Yao, B. Peng, Y. Zhang, D. Yu, G. Zweig, and Y. Shi, "Spoken language understanding using long short-term memory neural networks," in 2014 IEEE Spoken Language Technology Workshop (SLT), Dec 2014, pp. 189–194.
[5] G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. Hakkani-Tur, X. He, L. Heck, G. Tur, D. Yu et al., "Using recurrent neural networks for slot filling in spoken language understanding," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 23, no. 3, pp. 530–539, 2015.
[6] B. Peng and K. Yao, "Recurrent neural networks with external memory for language understanding," arXiv preprint arXiv:1506.00195, 2015.
[7] B. Liu and I. Lane, "Recurrent neural network structured output prediction for spoken language understanding," in Proc. NIPS Workshop on Machine Learning for Spoken Language Understanding and Interactions, 2015.
[8] G. Kurata, B. Xiang, B. Zhou, and M. Yu, "Leveraging sentence-level information with encoder LSTM for natural language understanding," arXiv preprint, 2016.
[9] D. Guo, G. Tur, W.-t. Yih, and G. Zweig, "Joint semantic utterance classification and slot filling with recursive neural networks," in Spoken Language Technology Workshop (SLT), 2014 IEEE. IEEE, 2014, pp. 554–559.
[10] P. Xu and R. Sarikaya, "Convolutional neural network based triangular CRF for joint intent detection and slot filling," in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Dec 2013, pp. 78–83.
[11] D. Hakkani-Tür, G. Tür, A. Celikyilmaz, Y.-N. Chen, J. Gao, L. Deng, and Y.-Y. Wang, "Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM," in INTERSPEECH, 2016, pp. 715–719.
[12] https://www.businessinsider.com/amazon-alexa-how-many-skills-chart-2017-7
[13] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, "Progressive neural networks," arXiv preprint arXiv:1606.04671, 2016.
[14] C. Raymond and G. Riccardi, "Generative and discriminative algorithms for spoken language understanding," in INTERSPEECH, 2007.
[15] Ö. Yürür, C. H. Liu, Z. Sheng, V. C. M. Leung, W. Moreno, and K. K. Leung, "Context-awareness for mobile sensing: A survey and future directions," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 68–93, 2016.
[16] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, "Context aware computing for the Internet of Things: A survey," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 414–454, 2014.
[17] X. Su and T. M. Khoshgoftaar, "A survey of collaborative filtering techniques," Advances in Artificial Intelligence, vol. 2009, pp. 4:2–4:2, Jan. 2009. [Online]. Available: https://dx.doi.org/10.1155/2009/421425
[18] https://developers.google.com/maps/documentation/distance-matrix
[19] C. T. Hemphill, J. J. Godfrey, and G. R. Doddington, "The ATIS spoken language systems pilot corpus," in Proceedings of the Workshop on Speech and Natural Language, ser. HLT '90. Stroudsburg, PA, USA: Association for Computational Linguistics, 1990, pp. 96–101.
[20] https://developers.google.com/places/web-service

