
Sound

Authors and titles for recent submissions

[ total of 45 entries: 1-25 | 26-45 ]

Fri, 26 Apr 2024

[1]  arXiv:2404.16619 [pdf, other]
Title: The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge
Comments: Accepted in Grand Challenge of ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2404.16436 [pdf, ps, other]
Title: Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics
Comments: 18 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3]  arXiv:2404.16259 [pdf, other]
Title: An Experiment with Electric Guitar Signals for Exploring the Virtuosity based on the Entropy of Music
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:2404.16743 (cross-list from cs.CL) [pdf, other]
Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation
Comments: Accepted to LREC-COLING 2024 (long)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5]  arXiv:2404.16547 (cross-list from eess.AS) [pdf, other]
Title: Developing Acoustic Models for Automatic Speech Recognition in Swedish
Authors: Giampiero Salvi
Comments: 16 pages, 7 figures
Journal-ref: European Student Journal of Language and Speech, 1999
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[6]  arXiv:2404.16305 (cross-list from cs.MM) [pdf, other]
Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2404.16216 (cross-list from cs.CV) [pdf, other]
Title: ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8]  arXiv:2404.16104 (cross-list from eess.AS) [pdf, other]
Title: Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective
Comments: 5 pages, 2 figures, keywords: Gender, Diachrony, Vocal Tract Resonance, Vocal register, Broadcast speech
Journal-ref: Radek Skarnitzl & Jan Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague 2023, pp. 753-757. Guarant International. ISBN 978-80-908 114-2-3
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9]  arXiv:2404.13101 (cross-list from eess.IV) [pdf, ps, other]
Title: DensePANet: An improved generative adversarial network for photoacoustic tomography image reconstruction from sparse data
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)

Thu, 25 Apr 2024

[10]  arXiv:2404.15637 [pdf, other]
Title: HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[11]  arXiv:2404.15854 (cross-list from cs.CR) [pdf, other]
Title: CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Comments: Submitted to IEEE TDSC
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2404.15704 (cross-list from cs.LG) [pdf, other]
Title: Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13]  arXiv:2404.15321 (cross-list from eess.SP) [pdf, other]
Title: Characteristics-Based Design of Multi-Exponent Bandpass Filters
Comments: 14 pages, 5 figures, 2 tables, 62 equations. Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 24 Apr 2024 (showing first 12 of 15 entries)

[14]  arXiv:2404.15181 [pdf, ps, other]
Title: Tailors: New Music Timbre Visualizer to Entertain Music Through Imagery
Authors: ChungHa Lee
Comments: 47 pages, 9 figures, 5 tables
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[15]  arXiv:2404.15160 [pdf, ps, other]
Title: Vector Signal Reconstruction Sparse and Parametric Approach of direction of arrival Using Single Vector Hydrophone
Authors: Jiabin Guo
Comments: 22 pages. arXiv admin note: substantial text overlap with arXiv:2404.13568
Subjects: Sound (cs.SD)
[16]  arXiv:2404.15143 [pdf, other]
Title: Every Breath You Don't Take: Deepfake Speech Detection Using Breath
Comments: Submitted to ACM journal -- Digital Threats: Research and Practice
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[17]  arXiv:2404.14946 [pdf, other]
Title: StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Comments: Accepted by ICASSP 2024
Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 11521-11525
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[18]  arXiv:2404.14771 [pdf, other]
Title: Music Style Transfer With Diffusion Model
Comments: 8 pages, 6 figures, ICMC 2023
Journal-ref: International Computer Music Conference (ICMC 2023) pp. 40-47, October 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[19]  arXiv:2404.15208 (cross-list from cs.SI) [pdf, other]
Title: Analysis and Visualization of Musical Structure using Networks
Subjects: Social and Information Networks (cs.SI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Physics and Society (physics.soc-ph)
[20]  arXiv:2404.15176 (cross-list from eess.AS) [pdf, other]
Title: Voice Passing : a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition
Comments: 5 pages, 1 figure, keywords: Transgender voice, Gender perception, Speaker gender classification, CNN, X-Vector
Journal-ref: Proc. INTERSPEECH 2023, 5207-5211
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[21]  arXiv:2404.15168 (cross-list from eess.AS) [pdf, other]
Title: Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[22]  arXiv:2404.14913 (cross-list from eess.AS) [pdf, other]
Title: Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
Comments: accepted at Odyssey 2024: The Speaker and Language Recognition Workshop. arXiv admin note: text overlap with arXiv:2306.03664
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[23]  arXiv:2404.14903 (cross-list from eess.AS) [pdf, other]
Title: Multi-Sample Dynamic Time Warping for Few-Shot Keyword Spotting
Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Sound (cs.SD)
[24]  arXiv:2404.14860 (cross-list from eess.AS) [pdf, other]
Title: Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Comments: 13 pages, 6 figures, Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25]  arXiv:2404.14736 (cross-list from cs.HC) [pdf, ps, other]
Title: Qualitative Approaches to Voice UX
Journal-ref: ACM Computing Surveys (2024)
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)