This paper investigates the impact of different telephone channels, represented by impairments introduced by modern telecommunication networks (e.g. speech coding, bandwidth limitation, packet loss), on the intelligibility of synthesized speech. Both subjective and objective assessments are used. Two speech intelligibility prediction models, PESQ Intelligibility and POLQA Intelligibility, are evaluated by comparing their predictions with subjectively obtained intelligibility scores. The results show that all the investigated degradations seriously impair the subjectively measured intelligibility of the synthesized speech. Furthermore, the correlations between PESQ Intelligibility predictions and subjective scores are too low for accurate prediction of speech intelligibility, whereas POLQA Intelligibility provides good intelligibility predictions when a closed-response experimental setup is used.
Since the beginning of the 21st century, there has been
great development in studies related to encephalography
and the psychological consequences of binaural audio. The objective
of this paper is to analyze the effects of binaural audio as a
relaxation method for people who suffer from anxiety and stress.
The study took place at ITESM, Santa Fe Campus, where twenty
participants around 21 years old took part. They were
divided into two groups, Experimental and Control. The first
listened to the Theta binaural stimuli and the second listened to
a stimulus with nature sounds (birds and wind).
The results show that the experimental group reached conscious
relaxation. The control group reached unconscious relaxation
(sleep), but when the stimuli stopped their stress returned to its
previous level, and for some participants it increased.
This paper investigates the impact of frequent, small playout delay adjustments (time-shifting) of 30 ms or less, introduced into silence periods by Voice over IP (VoIP) jitter buffer strategies, on the listening quality perceived by the end user. The quality impact is assessed using both a subjective method (quality scores obtained from a subjective listening test) and an objective method based on perceptual modelling. Two objective methods are used: PESQ (Perceptual Evaluation of Speech Quality, ITU-T Recommendation P.862) and POLQA (Perceptual Objective Listening Quality Assessment, ITU-T Recommendation P.863). Moreover, the relative accuracy of both objective models is assessed by comparing their predictions with the subjective assessments. The results show that the impact of the investigated playout delay adjustments on subjective listening quality scores is negligible. By contrast, a significant impact is reported for the objective listening quality scores predicted by the PESQ model; that is, the PESQ model fails to predict quality scores correctly for this kind of degradation. Finally, the POLQA model is shown to perform significantly better than PESQ. We conclude the paper by identifying further related research that arises from this study.
A recent neuro-spiking coding scheme for feature
extraction from biosonar echoes of various plants is examined with a
variety of stochastic classifiers. The derived feature vectors are used
in well-known stochastic classifiers, including nearest-neighbor,
single Gaussian and a Gaussian mixture with EM optimization.
The classifiers' performance is evaluated using cross-validation and bootstrapping techniques. It is shown that the various classifiers perform equivalently and that the modified preprocessing configuration yields considerably improved results.
An improved processing description for biosonar signal
processing in a cochlea model is proposed and
examined. It is compared with conventional models using a modified
discrimination analysis, and both are tested. Their performance is
evaluated with echo data captured from natural targets (trees).
Results indicate that the phase characteristics of the low-pass filters
employed in the echo processing have a significant effect on class
separability for this data.
Audio steganography is the technique of hiding secret information in the samples of an audio signal. In this work, a new least significant bit (LSB) audio steganographic technique is introduced. The proposed technique embeds a varying number of bits per sample: the number of bits embedded in the nth sample is defined by the decimal value of the three least significant bits of the (n-1)th sample. The proposed technique does not affect the quality of the resulting audio signal, and the difference between the original and stego audio is unnoticeable. It is also more secure than the conventional LSB technique because the amount of data carried by each audio sample varies unpredictably.
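The rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `embed` and `extract` are hypothetical, the bit count is assumed to be used directly (0 to 7 bits), and the count is assumed to be read from the previous sample after it has itself been modified, so that the receiver can recompute it from the stego audio alone. The abstract does not confirm any of these details.

```python
def embed(samples, bits):
    """Hide `bits` (a list of 0/1) in a copy of integer PCM `samples`.

    Assumption: the number of bits placed in sample n is the value of the
    3 LSBs of sample n-1 as it appears in the stego signal.
    """
    stego = list(samples)
    pos = 0
    for n in range(1, len(stego)):
        if pos == len(bits):
            break
        # Bit count from the previous (already final) sample, capped by what is left.
        k = min(stego[n - 1] & 0b111, len(bits) - pos)
        if k == 0:
            continue  # this sample carries no payload
        chunk = 0
        for b in bits[pos:pos + k]:
            chunk = (chunk << 1) | b
        # Clear the k lowest bits, then write the payload chunk into them.
        stego[n] = (stego[n] & ~((1 << k) - 1)) | chunk
        pos += k
    if pos < len(bits):
        raise ValueError("cover audio too short for this payload")
    return stego


def extract(stego, n_bits):
    """Recover `n_bits` payload bits by replaying the same counting rule."""
    bits = []
    for n in range(1, len(stego)):
        if len(bits) == n_bits:
            break
        k = min(stego[n - 1] & 0b111, n_bits - len(bits))
        if k == 0:
            continue
        chunk = stego[n] & ((1 << k) - 1)
        bits.extend((chunk >> shift) & 1 for shift in range(k - 1, -1, -1))
    return bits
```

In this sketch the receiver must know the payload length in advance. Writing up to seven bits into one sample can be audible at low bit depths, so a practical scheme would probably cap or remap the count; that choice is left open here.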
For the forensic sector, audio analysis is a scientific field that is still expanding and highly fragmented. Nevertheless, it holds great potential, not only for the human voice, an evident carrier of clues and meaning, but also for extracting all those sound details useful to an investigation, such as a gunshot (and hence the type of weapon), the surrounding environment, the type of car, the sex of the speaker, and so on. During the meeting, the main methods for the digital analysis of recordings will be presented through examples and real cases on topics such as speaker recognition, verification of the integrity of audio material, and the problem of subjective interpretation of speech content in the presence of strong environmental noise (such as radio, road traffic, rain, and so on). Interception, whether telephone or of another kind, is a fundamental tool for the preliminary investigation and for the prosecution and defence cases, but it is also delicate material that requires correct analysis methodologies. Given a question, science in this field does not yet provide evidentiary certainty, only indications for or against. Unfortunately, it is much easier to cause irreparable harm to society when potential evidence has been analysed incorrectly. This concept will be made explicit through the example of a real case (the review of a conviction), to provide a further means of defence against analyses based on scientifically incorrect methods.
Target audience

IT technicians, company managers, network administrators, judicial authorities, investigators, lawyers, IT expert witnesses, consultants and other people working in the technical/legal sector who are interested in activities relating to computer forensics.
For an improved neuro-spike representation of auditory signals within
cochlea models, a new digital ARMA-type low-pass filter structure is proposed.
It is compared to its more conventional AR-type counterpart on a classification of
biosonar echoes, in which echoes from various tree species insonified with a bat-like
chirp call are converted to biologically plausible feature vectors. Next, parametric
and non-parametric models of the class-conditional densities are built from the
echo feature vectors. The models are deployed in single-shot and sequential-decision
classification algorithms. The results indicate that the proposed ARMA filter structure
offers improved single-echo classification performance, which leads to faster
sequential-decision making than its AR-type counterpart.
By displacement of the absolute zero axis we mean the phenomenon commonly known in English as DC offset: an error present in some audio signals that shows up as an asymmetry between the signal's positive and negative excursions about the X axis, which in this article we prefer to call the absolute zero axis.
This displacement can take many forms depending on the case. It may be uniform across the whole signal, or present in only one or more sections. If the signal is stereo, it may be found in just one channel, equally in both, or in both channels at different levels.
In this article we treat the axis displacement from its origins, identifying its causes and the problems it brings, and proposing a methodology for removing it satisfactorily from an audio waveform.
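As an illustration of the two situations described above (an offset that is uniform across the signal, and one that varies between sections), here is a minimal sketch of two common textbook remedies. It is not the methodology proposed in the article, which this abstract does not detail. The function names and the pole radius `r=0.995` are assumptions chosen for the example.

```python
def remove_dc_mean(x):
    """Uniform offset: subtract the mean of the whole signal."""
    mean = sum(x) / len(x)
    return [s - mean for s in x]


def dc_blocker(x, r=0.995):
    """Section-dependent offset: one-pole DC-blocking high-pass filter.

    Difference equation: y[n] = x[n] - x[n-1] + r * y[n-1], with 0 < r < 1.
    Values of r closer to 1 give a lower cutoff but a slower settling time.
    """
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for s in x:
        out = s - prev_x + r * prev_y
        y.append(out)
        prev_x, prev_y = s, out
    return y
```

For a stereo file, either function would be applied to each channel separately, since the offset may differ between channels. The filter version leaves a short transient at the start of the signal while it settles.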
Presented at ICMC-SMC 2014, Athens.
This research integrates sensory and scientific instruments
to analyze the relationship between subjective evaluations
of digitally restored audio and its computer-extracted perceptual
descriptors. Statistical methods have been used
to compare the displacement of three types of remediated
content in subspaces obtained from data expressed both by
individuals and by feature extraction algorithms.
Qualitative demands in audio restoration are tightly connected
to the information embedded in remediated content:
it is crucial to be aware that every choice rebalances
that information and affects its reception. Listeners, in turn, do not
perform an acousmatic reduction of auditory information but recode
it, interleaving contextual and aesthetic approaches
according to their sensitivity and under the influence of their
cultural background.
By analyzing the displacement in the subspaces
related to the descriptive characteristics with the greatest variability,
the semantic divergence resulting from operations
intended to improve sound quality was interpreted,
and a predictive model aimed at optimizing those operations was hypothesized.
This writing summarizes and reviews Deep Learning and Its Applications to Signal and Information Processing.
The quality and quantity of acoustical data available to researchers are rapidly increasing with
advances in technology. Recording cetaceans with a 500 kHz sampling rate provides a more
complete signal representation than traditional sampling at 96 kHz and lower. Such sampling
provides a profusion of data concerning various parameters, such as click duration, inter-click
intervals, frequency, amplitude and phase. However, the literature disagrees on the
use and definitions of these acoustic terms and parameters. In this study, Amazon River dolphins
(Inia geoffrensis) were recorded using a 500 kHz sampling rate in the Peruvian Amazon River
watershed. Subsequent spectral analyses, including time waveforms, fast Fourier transforms and
wavelet scalograms, demonstrate acoustic signals with differing characteristics. These high-
frequency, broadband signals are compared, and differences are highlighted, although
an unambiguous way to describe these acoustic signals is currently lacking. The need for
standards in cetacean bioacoustics with regard to terminology is emphasized.
This article addresses the cultural and historical significance of the modern interpretation of extended range guitars, and also discusses the specifications of the instruments and their place in a mix. The term extended range guitar refers to an electric guitar that adds a seventh, eighth, or even ninth string to the traditional six. First, the article explores how artists such as Steve Vai, Korn, Meshuggah, and Animals As Leaders have shaped the commercial creation and success of these instruments and their cultural effect on the musical community, specifically within heavy metal. Secondly, it examines the specifications of extended range guitars, such as scale length, string gauges, hardware, wood, and price, in relation to the seventh, eighth, and ninth strings. Lastly, it discusses how to approach a mix with extended range instruments, considering that these guitars now occupy low frequencies that overlap with those of the drum kit and bass guitar.
This report gives a complete functional description of a new digital sensor board called the SAVANT node, including the main theoretical background of the specific algorithms, data and criteria that I took into account to develop a WSN platform. The overall design of the SAVANT node is based on the symptoms of savant syndrome in humans [1]. A WSN (Wireless Sensor Network) environment usually implies a homogeneous network structure; including a new special node with more powerful computational resources could change the dynamics of such a network.
Since Barry Truax’s early pioneering work, the increasing availability of powerful and inexpensive desktop computers has led many composers, researchers and artists to experiment and work with granular synthesis and/or granulation processes. During the last two decades a stream of granulation software has appeared, but despite this spread, all these applications have important limitations in terms of efficiency, usability, flexibility and control. This paper describes an abstraction of an efficient and flexible granular processing system built into the Max/MSP environment.
Research has used the cardiac orienting response to show that structural changes in the auditory environment cause people to briefly but automatically pay attention to messages such as radio broadcasts, podcasts, and web streaming. The voice change, an example of an auditory structural feature, elicits orienting across multiple repetitions. This article reports two experiments designed to investigate whether automatic attention allocation to repeated instances of other auditory structural features (namely production effects, jingles, and silence) is a robust phenomenon or whether repetition leads to habituation. In Study 1 we show that listeners of a simulated radio broadcast exhibit orienting responses following the onset of auditory structural features that differ in semantic content. The prediction that listeners would not habituate to feature repetition was not supported. Instead, both jingles and synthesized production effects result in more iconic orienting responses to the second repetition compared to the first. However, orienting diminished significantly following the third repetition of both. Study 2 replicates this result using multiple repetitions of structural features containing identical semantic content.
In this work, we are concerned with a new iterative Kalman filtering scheme in which the parameters of a linear predictor model are estimated from noisy speech. When only noise-corrupted speech is available, the enhancement performance of the Kalman filter depends on the accuracy of the linear prediction coefficients (LPCs) and of the excitation variance estimates. However, linear-prediction-based (LPC) speech analysis is known to be sensitive to the presence of additive noise. To overcome this problem, we present an analysis and application of an LPC-based formant enhancement method that modifies the log magnitude spectrum of the LPC model and then re-evaluates new LPCs to be applied in the Kalman filter. These enhanced LPCs are a useful indicator of Kalman filter performance. Our enhancement experiments use the NOIZEUS speech corpus, on which the proposed method achieves higher objective and subjective results than other enhancement methods.
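For readers unfamiliar with how LPCs and the excitation variance are obtained from a speech frame, the following is a minimal sketch of the standard autocorrelation method with the Levinson-Durbin recursion. It is a textbook procedure shown for orientation only; it does not implement the formant enhancement or the Kalman filter of this paper, and the function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of a frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for x[n] ~ sum_k a[k] * x[n-k].

    Returns (a, err): the coefficients a[1..order] as a list, and the
    final prediction-error power, an estimate of the excitation variance
    (in the same scaling as r).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

With noisy input, the autocorrelation values are biased by the noise, which is one way to see why the estimated coefficients degrade and why the paper re-estimates them after enhancement.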
In the quest for synthetic sound of a natural acoustic character, fretted instruments such as the guitar present numerous challenges. For fully articulated synthesis, sample-based methods become unwieldy due to the large range of subtle variations in timbre and the resulting storage requirements. A physical modeling approach thus becomes an attractive option. Here, a vibrating string is subject to intermittent contact/recontact phenomena along the length of the fretboard; furthermore, the string is driven by a plucking interaction and stopped by a finger, whose position and applied force are gestural parameters. The hypotheses underlying this model thus depart significantly from those which inform standard physical modeling methodologies, such as digital waveguides or modal synthesis, and an appeal to direct time-space integration techniques is of interest. In this article, a finite difference time domain method is developed, with a penalty potential allowing for a convenient model of distributed collision. Implementation details are discussed, and simulation results and visualisations are presented illustrating a variety of typical playing gestures. Finally, given that such methods for highly nonlinear systems are prone to numerical instability, a brief description of an energy-balanced or Hamiltonian framework is provided, allowing for convenient numerical stability conditions.
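To give a feel for what a finite difference time domain method with an energy balance looks like, here is a minimal sketch for the simplest possible case: a lossless ideal string with fixed ends and no collisions. It deliberately omits the fretboard, finger, pluck model and penalty potential described in the article, and all names and parameter values are assumptions made for the example. The update is the standard explicit scheme for the 1D wave equation, stable for a Courant number `lam = c*k/h` of at most 1, and `energy` is the discrete quantity this scheme conserves.

```python
def pluck_shape(n_intervals, peak, amplitude=1.0):
    """Triangular initial displacement with fixed (zero) ends."""
    u = []
    for l in range(n_intervals + 1):
        if l <= peak:
            u.append(amplitude * l / peak)
        else:
            u.append(amplitude * (n_intervals - l) / (n_intervals - peak))
    return u


def step(u, u_prev, lam):
    """One explicit time step of the wave equation; end points stay at zero."""
    u_next = [0.0] * len(u)
    for l in range(1, len(u) - 1):
        u_next[l] = (2.0 * u[l] - u_prev[l]
                     + lam * lam * (u[l + 1] - 2.0 * u[l] + u[l - 1]))
    return u_next


def energy(u, u_prev, c, h, k):
    """Discrete energy (kinetic + potential) conserved by `step`."""
    kinetic = 0.5 * h * sum(((a - b) / k) ** 2 for a, b in zip(u, u_prev))
    potential = 0.5 * c * c * h * sum(
        ((u[l + 1] - u[l]) / h) * ((u_prev[l + 1] - u_prev[l]) / h)
        for l in range(len(u) - 1))
    return kinetic + potential


def simulate(n_intervals=50, lam=0.9, steps=200, c=1.0, length=1.0):
    """Run the scheme from a pluck at rest; return final state and energies."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("the explicit scheme needs 0 < lam <= 1")
    h = length / n_intervals      # grid spacing
    k = lam * h / c               # time step from the Courant number
    u_prev = pluck_shape(n_intervals, peak=n_intervals // 5)
    u = list(u_prev)              # equal first two states: zero initial velocity
    energies = [energy(u, u_prev, c, h, k)]
    for _ in range(steps):
        u, u_prev = step(u, u_prev, lam), u
        energies.append(energy(u, u_prev, c, h, k))
    return u, energies
```

Monitoring that the energy stays constant to machine precision is a practical check on an implementation. Once nonlinear collision forces are added, a simple explicit update like this one no longer comes with such a guarantee, which is the motivation for the energy-balanced framework mentioned in the abstract.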
Academia © 2015