STJ Post
Christopher Ayling
August 2018
1 Topic Outline
As seen in the 2016 White Paper released by the Japan Ministry of Education, Culture, Sports, Science and Technology (MEXT), artificial intelligence plays a central role in the functioning of the future Japanese society (dubbed a "Super Smart Society"). [4] describes a Super Smart Society as
2 Introduction
This report presents three articles of research related to computer science and AI technologies relevant to the advent and functioning of Japan's Super Smart Society. For each article, the context, technical details, results, relevance and role are summarized and discussed.
The first article presents a novel approach to detecting information about a room using a mobile device's microphone [3]. The second proposes a novel reward function for use in reinforcement learning [1], while the third introduces an interpretability-enhancing variation of the modern generative adversarial network (GAN) architecture [9].
3 Research Articles
3.1 Inferring Room Semantics Using Acoustic Monitoring [3]
3.1.1 Context
In 2017, the IEEE International Workshop on Machine Learning for Signal Processing was held in Roppongi, Tokyo. At the event, recent advances in machine learning for signal processing were presented in talks and tutorials. These advances included using convolutional neural networks (CNNs) for interpretable EEG analysis, sketch-to-photo inversion using GANs, and techniques for inferring room semantics from
audio. This section focuses on the paper about room semantic inference and
provides a summary of the paper’s technical details and results along with a
discussion on the technology’s impact on and relevance to the Super Smart
Society.
Figure 1: Test set confusion matrix from [3]
3.1.3 Results
Results can be seen in Figure 1. The support vector machine (SVM) using ambient sounds performed better than the Gaussian mixture model (GMM) using room impulse responses (RIR). It was noted that because of distinctive structural features, RIR is effective in rooms such as bathrooms and lecture theatres.
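As an illustration of the ambient-sound approach, the sketch below classifies rooms from audio clips using MFCC features and an SVM. The feature choice and interfaces are assumptions made for illustration; the actual pipeline and features in [3] differ.

```python
# Minimal sketch: classify room type from ambient audio with MFCC features + SVM.
# Assumes a list of (wav_path, room_label) pairs; features are illustrative only.
import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def mfcc_features(wav_path):
    # Load audio and summarize each MFCC coefficient over time (mean + std).
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_room_classifier(clips):
    # clips: list of (wav_path, room_label), e.g. ("office_01.wav", "office").
    X = np.stack([mfcc_features(path) for path, _ in clips])
    y = np.array([label for _, label in clips])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf
```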
3.2 Curiosity-driven reinforcement learning with homeostatic
regulation [1]
3.2.1 Context
Araya is a Japanese research laboratory based in Tokyo. Araya's mission is "to transform information into value for society" and as such they are constantly exploring new technologies. The skills and fields represented among the members of Araya include data science, neuroscience, physics, maths and psychiatry. Research produced by Araya has been published in journals such as Nature and PNAS and presented at conferences such as NIPS and ICIIBMS. This section will focus on the paper presented at the International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS) held in Okinawa, Japan in late 2017. The paper's title is "Curiosity-driven reinforcement learning with homeostatic regulation".
Reinforcement learning (RL) is a discipline of machine learning in which problems are formulated as an agent acting in an environment with the aim of maximizing a reward function. An industrial robot in a warehouse attempting to correctly store items, or a humanoid robot learning to dance for the entertainment of humans, are examples of problems suited to RL.
In RL, the reward function captures the goal of the agent. The agent then learns which actions to take in each possible state to maximize the reward. Rewards can be either extrinsic or intrinsic. Goals such as the item-storing task mentioned earlier are extrinsic, while an aim to learn something new is intrinsically motivated. Intrinsic rewards are important because they favour the development of broad competencies over the honing of narrowly applicable skills [7].
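To make the formulation concrete, the sketch below shows a generic agent-environment loop in which the reward the agent learns from is the sum of an extrinsic task reward and a weighted intrinsic bonus. The `env` and `agent` interfaces are hypothetical placeholders, not taken from [1].

```python
# Minimal sketch of an RL interaction loop combining extrinsic and intrinsic reward.
# `env` and `agent` are hypothetical objects following a Gym-style interface.

def run_episode(env, agent, intrinsic_bonus, beta=0.1):
    state = env.reset()
    done = False
    total = 0.0
    while not done:
        action = agent.act(state)
        next_state, extrinsic_reward, done = env.step(action)
        # Total reward = task (extrinsic) reward + weighted intrinsic bonus,
        # e.g. a curiosity term computed from the agent's prediction error.
        reward = extrinsic_reward + beta * intrinsic_bonus(state, action, next_state)
        agent.update(state, action, reward, next_state)
        total += reward
        state = next_state
    return total
```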
3.2.2 Summary
The paper "Curiosity-driven reinforcement learning with homeostatic regulation" presents a novel reward function. The proposed reward function encourages both actions that lead to new situations and actions that lead to situations from which the next action will yield more information about the future state.
Encouraging actions which lead to new situations and knowledge is known as curiosity. The amount of new information learned is quantified by calculating the difference between the agent's prediction and the observed state [6]. This difference is known as the forward model error.
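Concretely, the forward model error can be computed as the squared distance between the state predicted by a learned forward model and the state actually observed. The numpy sketch below assumes states are real-valued vectors and that `forward_model` is any learned predictor; it is illustrative, not the implementation from [1] or [6].

```python
import numpy as np

def forward_model_error(forward_model, state, action, observed_next_state):
    # The forward model predicts the next state from (state, action);
    # the curiosity signal is the squared error of that prediction.
    predicted_next_state = forward_model(state, action)
    return np.sum((predicted_next_state - observed_next_state) ** 2)
```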
3.2.3 Technical Description
The proposed reward function is an extension of the curiosity reward function proposed by [6].
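One plausible form of the two rewards, reconstructed from the description that follows (the exact expressions and scaling in [1] and [6] may differ), is:

$$ r_t^{i} = \frac{\eta}{2} \left\lVert f(s_t, a_t) - s_{t+1} \right\rVert_2^2 $$

$$ r_t^{i} = \frac{\eta}{2} \left( \left\lVert f(s_t, a_t) - s_{t+1} \right\rVert_2^2 + \alpha \left\lVert k(s_t, a_t, a_{t+1}) - s_{t+2} \right\rVert_2^2 \right) $$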
where f(s_t, a_t) is the forward model and k(s_t, a_t, a_{t+1}) is the extended model. These models were implemented using deep neural networks but should work with any function suitable for predicting the future state. The first equation (from [6]) captures heterostatic motivation and encourages actions which lead to large forward model errors. The second equation combines the heterostatic motivation with homeostatic motivation by also encouraging actions which lead to forward model errors in areas of the state-action space which are already familiar to the agent. The advantage of the new approach is that the motivation to explore completely new state spaces is regulated by balancing it against a motivation to fill out the agent's knowledge of existing state spaces. The parameter α determines the strength of this regulation.
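A minimal sketch of how such a regulated reward could be computed, assuming both models are callables that predict a future state and mirroring the reconstructed equations above (the actual networks and scaling in [1] differ):

```python
import numpy as np

def regulated_curiosity_reward(f, k, s_t, a_t, a_next, s_next, s_next2,
                               eta=1.0, alpha=0.5):
    # Heterostatic term: error of the forward model f(s_t, a_t) vs. s_{t+1}.
    heterostatic = np.sum((f(s_t, a_t) - s_next) ** 2)
    # Homeostatic term: error of the extended model k(s_t, a_t, a_{t+1})
    # vs. the state two steps ahead; alpha sets the regulation strength.
    homeostatic = np.sum((k(s_t, a_t, a_next) - s_next2) ** 2)
    return 0.5 * eta * (heterostatic + alpha * homeostatic)
```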
3.2.4 Results
The hypothesis tested through experimental validation was that "exploring an environment with several non-linearities could be optimized by regulating the agent curiosity with homeostatic drive". The experimental setup was a three-room environment in which an agent learns a control policy under varying levels of homeostatic regulation.
Figure 2: Accuracy of the forward model learned by the agent as a function
of homeostatic regulation (α). [1]
Super Smart Society. It is important for society that smart agents align ethically at all stages of decision making and acting.
SelfExGAN has three components: an encoder (E), a generator (G) and a discriminator (D). An existing architecture known as adversarial generator-encoder (AGE) networks uses the same components [8]. SelfExGAN makes use of a Nash equilibrium between the components in order to relate latent inputs to training data, while AGE networks aim to minimize reconstruction error.
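For orientation, the sketch below outlines the three components for image data in PyTorch. The layer sizes and latent dimension are placeholders; the actual architectures, losses and training procedures of SelfExGAN and AGE are not reproduced here.

```python
import torch.nn as nn

LATENT_DIM = 64  # hypothetical latent size, for illustration only

# Encoder E: maps an image to a latent code.
E = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, LATENT_DIM),
)

# Generator G: maps a latent code back to an image.
G = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)

# Discriminator D: scores whether an image looks real.
D = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
```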
3.3.3 Results
generation of new labelled training data and evaluation of similarities. Figure 5 shows examples of fake data and Figure 4 shows a visualization of the latent space. The clustered structure seen in the visualization indicates that the latent space of the SelfExGAN is indeed understandable.
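A common way to produce such a latent-space visualization is to encode a batch of data and project the codes to two dimensions, for example with t-SNE. The sketch below assumes an encoder like E above; it is a generic recipe, not the exact procedure from [9].

```python
# Project encoded samples to 2D with t-SNE and plot them, coloured by label.
import matplotlib.pyplot as plt
import torch
from sklearn.manifold import TSNE

def plot_latent_space(encoder, images, labels):
    with torch.no_grad():
        codes = encoder(images).numpy()                  # (N, LATENT_DIM) codes
    points = TSNE(n_components=2).fit_transform(codes)   # (N, 2) embedding
    plt.scatter(points[:, 0], points[:, 1], c=labels, s=5)
    plt.title("Latent space (t-SNE projection)")
    plt.show()
```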
4 Conclusion
Three papers which varied in domain, scope, methods and technology were
summarized and their relevance to the Super Smart Society discussed.
References
[1] Ildefons Magrans de Abril and Ryota Kanai. Curiosity-driven reinforcement learning with homeostatic regulation. https://arxiv.org/pdf/1801.07440.pdf, 2018. Accessed: 12/08/2018.