5PMSTTA2: Audio Processing - WPMTTTA2

Number of hours
- Lectures 8.0
- Projects 0
- Tutorials 8.0
- Internship 0
- Laboratory works 4.0
ECTS 2.0

Goal(s)

This course covers the fundamentals of audio processing (automatic analysis of natural and artificial sound scenes, music signals, etc.): the basics of sound and sound recording; fundamental tools for the analysis, transformation, and synthesis of audio signals; speech enhancement in noise; audio source separation; and spatial (multichannel) processing. It addresses both classical approaches based on signal and channel models and recent approaches based on machine learning, deep learning in particular.

Contact Laurent GIRIN, Ronald PHLYPO

Content(s)

Part 1: Fundamentals of sound and sound recording
Part 2: Fundamentals of audio analysis/synthesis (discrete Fourier transform, short-time Fourier transform, phase vocoder)
Part 3: Speech denoising and audio source separation (single-channel setup)
Part 4: Multichannel spatial processing (with a focus on multichannel audio source separation)

Prerequisites

Solid skills in signal processing (analog and digital, deterministic and statistical).
The deep learning aspects are closely connected to the deep learning course, the corresponding audio processing project (Projet de simulation logicielle), and the speech processing course, all in 3A Sicom.

Assessment

Written exam: 2h
Lab work: report
Computation of the grade: Written exam: 50%, Lab work report: 50%

Additional Information

Course list

Curriculum->Double-Diploma Engineer/Master->Semester 9

Curriculum->Master->Semester 9

Bibliography

J. B. Allen & L. R. Rabiner, A unified approach to short-time Fourier analysis and synthesis, Proceedings of the IEEE, 1977.
J. Benesty, S. Makino & J. Chen, Speech enhancement, Springer, 2006.
R. E. Berg & D. G. Stork, The physics of sound, Prentice Hall, 1995.
M. Dolson, The phase vocoder: A tutorial, Computer Music Journal, 1986.
E. Jacobsen & R. Lyons, The sliding DFT, IEEE Signal Processing Magazine, 2003.
H. Kuttruff, Room acoustics, CRC Press, 2016.
J. Le Roux, E. Vincent & H. Erdogan, Learning-based approaches to speech enhancement and separation, Tutorial at Interspeech Conference 2016.
P. C. Loizou, Speech enhancement: Theory and practice, CRC Press, 2013.
M. Müller, Fundamentals of Music Processing, Springer, 2015.
A. V. Oppenheim & R. W. Schafer, Digital Signal Processing, Prentice Hall, 1975.
E. Vincent, T. Virtanen & S. Gannot (Eds.), Audio source separation and speech enhancement, John Wiley & Sons, 2018.
D. Wang & J. Chen, Supervised speech separation based on deep learning: An overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.

Update - 06/10/2025