Sound

Authors and titles for recent submissions

[ total of 45 entries: 1-25 | 26-45 ]

Fri, 26 Apr 2024

[1] arXiv:2404.16619 [pdf, other]: Title: The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

Authors: Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu

Comments: Accepted in Grand Challenge of ICASSP 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2404.16436 [pdf, ps, other]: Title: Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics

Authors: Ben Williams, Bart van Merriënboer, Vincent Dumoulin, Jenny Hamer, Eleni Triantafillou, Abram B. Fleishman, Matthew McKown, Jill E. Munger, Aaron N. Rice, Ashlee Lillis, Clemency E. White, Catherine A. D. Hobbs, Tries B. Razak, Kate E. Jones, Tom Denton

Comments: 18 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2404.16259 [pdf, other]: Title: An Experiment with Electric Guitar Signals for Exploring the Virtuosity based on the Entropy of Music

Authors: Igor Lugo, Martha G. Alatriste-Contreras

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2404.16743 (cross-list from cs.CL) [pdf, other]: Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation

Authors: Chanho Park, Mingjie Chen, Thomas Hain

Comments: Accepted to LREC-COLING 2024 (long)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5] arXiv:2404.16547 (cross-list from eess.AS) [pdf, other]: Title: Developing Acoustic Models for Automatic Speech Recognition in Swedish

Authors: Giampiero Salvi

Comments: 16 pages, 7 figures

Journal-ref: European Student Journal of Language and Speech, 1999

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[6] arXiv:2404.16305 (cross-list from cs.MM) [pdf, other]: Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model

Authors: Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:2404.16216 (cross-list from cs.CV) [pdf, other]: Title: ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling

Authors: Arjun Somayazulu, Sagnik Majumder, Changan Chen, Kristen Grauman

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2404.16104 (cross-list from eess.AS) [pdf, other]: Title: Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective

Authors: Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle

Comments: 5 pages, 2 figures, keywords: Gender, Diachrony, Vocal Tract Resonance, Vocal register, Broadcast speech

Journal-ref: Radek Skarnitzl & Jan Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague 2023, pp. 753-757. Guarant International. ISBN 978-80-908 114-2-3

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9] arXiv:2404.13101 (cross-list from eess.IV) [pdf, ps, other]: Title: DensePANet: An improved generative adversarial network for photoacoustic tomography image reconstruction from sparse data

Authors: Hesam Hakimnejad, Zohreh Azimifar, Narjes Goshtasbi

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)

Thu, 25 Apr 2024

[10] arXiv:2404.15637 [pdf, other]: Title: HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

Authors: Xinlei Niu, Jing Zhang, Charles Patrick Martin

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[11] arXiv:2404.15854 (cross-list from cs.CR) [pdf, other]: Title: CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

Comments: Submitted to IEEE TDSC

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2404.15704 (cross-list from cs.LG) [pdf, other]: Title: Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13] arXiv:2404.15321 (cross-list from eess.SP) [pdf, other]: Title: Characteristics-Based Design of Multi-Exponent Bandpass Filters

Authors: Samiya A Alkhairy

Comments: 14 pages, 5 figures, 2 tables, 62 equations. Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 24 Apr 2024 (showing first 12 of 15 entries)

[14] arXiv:2404.15181 [pdf, ps, other]: Title: Tailors: New Music Timbre Visualizer to Entertain Music Through Imagery

Authors: ChungHa Lee

Comments: 47 pages, 9 figures, 5 tables

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[15] arXiv:2404.15160 [pdf, ps, other]: Title: Vector Signal Reconstruction Sparse and Parametric Approach of direction of arrival Using Single Vector Hydrophone

Authors: Jiabin Guo

Comments: 22 pages. arXiv admin note: substantial text overlap with arXiv:2404.13568

Subjects: Sound (cs.SD)
[16] arXiv:2404.15143 [pdf, other]: Title: Every Breath You Don't Take: Deepfake Speech Detection Using Breath

Authors: Seth Layton, Thiago De Andrade, Daniel Olszewski, Kevin Warren, Carrie Gates, Kevin Butler, Patrick Traynor

Comments: Submitted to ACM journal -- Digital Threats: Research and Practice

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[17] arXiv:2404.14946 [pdf, other]: Title: StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

Authors: Sen Liu, Yiwei Guo, Xie Chen, Kai Yu

Comments: Accepted by ICASSP 2024

Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 11521-11525

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[18] arXiv:2404.14771 [pdf, other]: Title: Music Style Transfer With Diffusion Model

Authors: Hong Huang, Yuyi Wang, Luyao Li, Jun Lin

Comments: 8 pages, 6 figures, ICMC 2023

Journal-ref: International Computer Music Conference (ICMC 2023) pp. 40-47, October 2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[19] arXiv:2404.15208 (cross-list from cs.SI) [pdf, other]: Title: Analysis and Visualization of Musical Structure using Networks

Authors: Alberto Alcalá-Alvarez, Pablo Padilla-Longoria

Subjects: Social and Information Networks (cs.SI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Physics and Society (physics.soc-ph)
[20] arXiv:2404.15176 (cross-list from eess.AS) [pdf, other]: Title: Voice Passing : a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition

Authors: David Doukhan, Simon Devauchelle, Lucile Girard-Monneron, Mía Chávez Ruz, V. Chaddouk, Isabelle Wagner, Albert Rilliard

Comments: 5 pages, 1 figure, keywords: Transgender voice, Gender perception, Speaker gender classification, CNN, X-Vector

Journal-ref: Proc. INTERSPEECH 2023, 5207-5211

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[21] arXiv:2404.15168 (cross-list from eess.AS) [pdf, other]: Title: Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech

Authors: Hasmot Ali, Md. Fahad Hossain, Md. Mehedi Hasan, Sheikh Abujar, Sheak Rashed Haider Noori

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[22] arXiv:2404.14913 (cross-list from eess.AS) [pdf, other]: Title: Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations

Authors: Theo Lepage, Reda Dehak

Comments: accepted at Odyssey 2024: The Speaker and Language Recognition Workshop. arXiv admin note: text overlap with arXiv:2306.03664

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[23] arXiv:2404.14903 (cross-list from eess.AS) [pdf, other]: Title: Multi-Sample Dynamic Time Warping for Few-Shot Keyword Spotting

Authors: Kevin Wilkinghoff, Alessia Cornaggia-Urrigshardt

Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Sound (cs.SD)
[24] arXiv:2404.14860 (cross-list from eess.AS) [pdf, other]: Title: Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance

Authors: Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

Comments: 13 pages, 6 figures, Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25] arXiv:2404.14736 (cross-list from cs.HC) [pdf, ps, other]: Title: Qualitative Approaches to Voice UX

Authors: Katie Seaborn, Jacqueline Urakami, Peter Pennefather, Norihisa P. Miyake

Journal-ref: ACM Computing Surveys (2024)

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
