Showing 1–17 of 17 results for author: Salvi, G

  1. arXiv:2404.16547  [pdf, other]

    eess.AS cs.AI cs.SD

    Developing Acoustic Models for Automatic Speech Recognition in Swedish

    Authors: Giampiero Salvi

    Abstract: This paper is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 16 pages, 7 figures

    MSC Class: 68T10 ACM Class: I.5.0; I.2.0; I.2.7

    Journal ref: European Student Journal of Language and Speech, 1999

  2. arXiv:2401.06588  [pdf, other]

    eess.AS cs.AI cs.CV cs.LG cs.SD

    Dynamic Behaviour of Connectionist Speech Recognition with Strong Latency Constraints

    Authors: Giampiero Salvi

    Abstract: This paper describes the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The constraints are imposed by the task of deriving the lip movements of a synthetic face in real time from the speech signal, by feeding the phonetic string into an articulatory synthesiser. Particular attention has been paid to analysing the interaction between the time evolut…

    Submitted 12 January, 2024; originally announced January 2024.

    ACM Class: I.5.0; I.2.7; E.4

    Journal ref: Speech Communication Volume 48, Issue 7, July 2006, Pages 802-818

  3. arXiv:2401.05717  [pdf, other]

    eess.AS cs.IT cs.LG cs.SD

    Segment Boundary Detection via Class Entropy Measurements in Connectionist Phoneme Recognition

    Authors: Giampiero Salvi

    Abstract: This article investigates the possibility of using the class entropy of the output of a connectionist phoneme recogniser to predict time boundaries between phonetic classes. The rationale is that the value of the entropy should increase in the proximity of a transition between two segments that are well modelled (known) by the recognition network since it is a measure of uncertainty. The advantage of th…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    ACM Class: I.5.0; I.2.7; E.4

    Journal ref: Speech Communication Volume 48, Issue 12, December 2006, Pages 1666-1676

  4. arXiv:2307.06701  [pdf, other]

    cs.CV cs.AI cs.LG

    S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

    Authors: Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi

    Abstract: We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capab…

    Submitted 11 June, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: 14 pages, 7 figures, 3 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence on 2023-07-12

    ACM Class: I.2.10; I.4.10; I.4.5; I.4.2; I.2.6

  5. arXiv:2208.04554  [pdf, other]

    cs.CV cs.LG

    Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation

    Authors: Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi

    Abstract: We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data. By utilizing a novel objective function, each layer in HR-VQVAE learns a discrete representation of the residual from previous layers through a vector quantized encoder. Furthermore, the representations at each layer are hierarchically linked to those at previou…

    Submitted 9 August, 2022; originally announced August 2022.

    Comments: 12 pages plus supplementary material. Submitted to BMVC 2022

    ACM Class: I.4; I.2

  6. arXiv:2106.06147  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    NAAQA: A Neural Architecture for Acoustic Question Answering

    Authors: Jerome Abdelnour, Jean Rouat, Giampiero Salvi

    Abstract: The goal of the Acoustic Question Answering (AQA) task is to answer a free-form text question about the content of an acoustic scene. It was inspired by the Visual Question Answering (VQA) task. In this paper, based on the previously introduced CLEAR dataset, we propose a new benchmark for AQA, namely CLEAR2, that emphasizes the specific challenges of acoustic inputs. These include handling of var…

    Submitted 12 January, 2024; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) in April 2021 (first revision February 2022)

    ACM Class: I.2.7; I.2.10; I.5.0

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, Volume: 45 Issue: 4, Page(s): 4997-5009

  7. arXiv:2009.05184  [pdf, other]

    eess.SP cs.CR eess.SY

    STEP-GAN: A Step-by-Step Training for Multi Generator GANs with application to Cyber Security in Power Systems

    Authors: Mohammad Adiban, Arash Safari, Giampiero Salvi

    Abstract: In this study, we introduce a novel unsupervised countermeasure for smart grid power systems, based on generative adversarial networks (GANs). Given the pivotal role of smart grid systems (SGSs) in urban life, their security is of particular importance. In recent years, however, advances in the field of machine learning have raised concerns about cyber attacks on these systems. Power systems, amo…

    Submitted 10 September, 2020; originally announced September 2020.

  8. arXiv:1902.11280  [pdf, ps, other]

    cs.LG cs.SD eess.AS stat.ML

    From Visual to Acoustic Question Answering

    Authors: Jerome Abdelnour, Giampiero Salvi, Jean Rouat

    Abstract: We introduce the new task of Acoustic Question Answering (AQA) to promote research in acoustic reasoning. The AQA task consists of analyzing an acoustic scene composed of a combination of elementary sounds and answering questions that relate the position and properties of these sounds. The kinds of relational questions asked require that the models perform non-trivial reasoning in order to answer…

    Submitted 28 February, 2019; originally announced February 2019.

  9. arXiv:1902.09705  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Beyond the Self: Using Grounded Affordances to Interpret and Describe Others' Actions

    Authors: Giovanni Saponaro, Lorenzo Jamone, Alexandre Bernardino, Giampiero Salvi

    Abstract: We propose a developmental approach that allows a robot to interpret and describe the actions of human agents by reusing previous experience. The robot first learns the association between words and object affordances by manipulating the objects in its environment. It then uses this information to learn a mapping between its own actions and those performed by a human in a shared environment. It fi…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: code available at https://github.com/gsaponaro/tcds-gestures, IEEE Transactions on Cognitive and Developmental Systems

    Journal ref: IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 209-221, June 2020

  10. arXiv:1811.10561  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS stat.ML

    CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning

    Authors: Jerome Abdelnour, Giampiero Salvi, Jean Rouat

    Abstract: We introduce the task of acoustic question answering (AQA) in the area of acoustic reasoning. In this task an agent learns to answer questions on the basis of acoustic context. In order to promote research in this area, we propose a data generation paradigm adapted from CLEVR (Johnson et al. 2017). We generate acoustic scenes by leveraging a bank of elementary sounds. We also provide a number of func…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: NeurIPS 2018 Visually Grounded Interaction and Language (ViGIL) Workshop

  11. arXiv:1804.02772  [pdf, other]

    stat.ML cs.AI cs.LG

    Active Mini-Batch Sampling using Repulsive Point Processes

    Authors: Cheng Zhang, Cengiz Öztireli, Stephan Mandt, Giampiero Salvi

    Abstract: The convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches. We explore sampling schemes where similar data points are less likely to be selected in the same mini-batch. In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator. This generalizes recent work on using Determinantal Point Processes (DPP…

    Submitted 20 June, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

  12. arXiv:1711.09714  [pdf, other]

    cs.RO cs.CL cs.HC stat.ML

    Language Bootstrapping: Learning Word Meanings From Perception-Action Association

    Authors: Giampiero Salvi, Luis Montesano, Alexandre Bernardino, José Santos-Victor

    Abstract: We address the problem of bootstrapping language acquisition for an artificial system similarly to what is observed in experiments with human infants. Our method works by associating meanings to words in manipulation tasks, as a robot interacts with objects and listens to verbal descriptions of the interactions. The model is based on an affordance network, i.e., a mapping between robot actions, ro…

    Submitted 27 November, 2017; originally announced November 2017.

    Comments: code available at https://github.com/giampierosalvi/AffordancesAndSpeech

    ACM Class: I.2.9; I.2.10; I.2.7; I.2.6

    Journal ref: in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), Volume: 42 Issue: 3, year 2012, pages 660-671

  13. arXiv:1711.09055  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Interactive Robot Learning of Gestures, Language and Affordances

    Authors: Giovanni Saponaro, Lorenzo Jamone, Alexandre Bernardino, Giampiero Salvi

    Abstract: A growing field in robotics and Artificial Intelligence (AI) research is human-robot collaboration, whose target is to enable effective teamwork between humans and robots. However, in many situations human teams are still superior to human-robot teams, primarily because human teams can easily agree on a common goal with language, and the individual members observe each other effectively, leveragin…

    Submitted 24 November, 2017; originally announced November 2017.

    Comments: code available at https://github.com/gsaponaro/glu-gestures

    Journal ref: International Workshop on Grounding Language Understanding (GLU), Satellite of Interspeech 2017

  14. arXiv:1711.08992  [pdf, other]

    cs.CV cs.CL cs.HC cs.LG stat.ML

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    Authors: Kalin Stefanov, Jonas Beskow, Giampiero Salvi

    Abstract: This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system robus…

    Submitted 18 July, 2019; v1 submitted 24 November, 2017; originally announced November 2017.

    Comments: 10 pages, IEEE Transactions on Cognitive and Developmental Systems

    ACM Class: I.2; I.4; I.5

  15. arXiv:1610.00520  [pdf, other]

    stat.ML cs.CL cs.LG

    Semi-supervised Learning with Sparse Autoencoders in Phone Classification

    Authors: Akash Kumar Dhaka, Giampiero Salvi

    Abstract: We propose the application of a semi-supervised learning method to improve the performance of acoustic modelling for automatic speech recognition based on deep neural networks. As opposed to unsupervised initialisation followed by supervised fine tuning, our method takes advantage of both unlabelled and labelled data simultaneously through mini-batch stochastic gradient descent. We tested the m…

    Submitted 3 October, 2016; originally announced October 2016.

    Comments: 5 pages, 1 figure, 2 tables

  16. arXiv:1606.09163  [pdf, other]

    cs.CL cs.CV cs.NE stat.ML

    Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

    Authors: Akash Kumar Dhaka, Giampiero Salvi

    Abstract: We present a systematic analysis of the performance of a phonetic recogniser when the window of input features is not symmetric with respect to the current frame. The recogniser is based on Context Dependent Deep Neural Networks (CD-DNNs) and Hidden Markov Models (HMMs). The objective is to reduce the latency of the system by reducing the number of future feature frames required to estimate the cu…

    Submitted 29 June, 2016; originally announced June 2016.

    Comments: 4 pages, 3 figures

  17. arXiv:1305.4544  [pdf, other]

    cs.CV

    Efficient Image Retargeting for High Dynamic Range Scenes

    Authors: Govind Salvi, Puneet Sharma, Shanmuganathan Raman

    Abstract: Most real-world scenes have a very high dynamic range (HDR). The mobile phone cameras and digital cameras available on the market are limited in both dynamic range and spatial resolution. The same argument applies to display devices of limited dynamic range, which also differ in spatial resolution and aspect ratio. In this paper, we address the problem of display…

    Submitted 20 May, 2013; originally announced May 2013.