Phelma Programs 2022

5PMSTTA2: Audio Processing - WPMTTTA2

  • Number of hours

    • Lectures 8.0
    • Projects 0
    • Tutorials 8.0
    • Internship 0
    • Laboratory works 4.0

  • ECTS credits: 2.0

Goal(s)

This course covers the fundamentals of audio processing (automatic analysis of natural and artificial sound scenes, music signals, etc.): basics of sound and sound recording; fundamental tools for the analysis, transformation and synthesis of audio signals; speech enhancement in noise; audio source separation; and spatial (multichannel) processing. It addresses both classical approaches based on signal and channel models and recent approaches based on machine learning, in particular deep learning.

Contact: Laurent GIRIN

Content(s)

Part 1: Fundamentals of sound and sound recording
Part 2: Fundamentals of audio analysis/synthesis (discrete Fourier transform, short-time Fourier transform, phase vocoder)
Part 3: Speech denoising and audio source separation (single-channel setting)
Part 4: Multichannel spatial processing (with a focus on multichannel audio source separation)
Part 5: Deep generative models for sound synthesis (speaker: Fanny Roche, Arturia)
Part 6: Fundamentals of Music Information Retrieval (speaker: Geoffroy Peeters, Telecom ParisTech)
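As an illustration of the analysis/synthesis framework in Part 2 (a sketch, not the course's own code), the following pure-Python example performs short-time Fourier transform (STFT) analysis followed by overlap-add (OLA) resynthesis, in the spirit of Allen & Rabiner (1977) listed in the bibliography. Assumptions: a periodic Hann analysis window with a hop of half the frame length (so the shifted windows sum to 1 and plain overlap-add recovers the signal), zero padding at both ends, and a naive O(N²) DFT for readability. The function names `hann`, `dft`, `idft`, `stft` and `istft` are hypothetical; in practice an FFT-based routine such as `scipy.signal.stft`/`scipy.signal.istft` would be used.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window; copies shifted by n_fft // 2 sum to 1 (COLA)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n_fft = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]


def idft(spectrum):
    """Inverse DFT; keeps the real part because the analysed frames are real."""
    n_fft = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
             for k in range(n_fft)) / n_fft).real
        for n in range(n_fft)
    ]


def stft(x, n_fft=16):
    """STFT analysis: zero-pad both ends, window each frame, hop = n_fft // 2.

    n_fft is assumed to be even.
    """
    hop = n_fft // 2
    win = hann(n_fft)
    padded = [0.0] * n_fft + list(x) + [0.0] * n_fft
    n_frames = (len(padded) - n_fft) // hop + 1
    return [
        dft([padded[m * hop + n] * win[n] for n in range(n_fft)])
        for m in range(n_frames)
    ]


def istft(frames, length, n_fft=16):
    """OLA synthesis: inverse-transform each frame and add it back at its offset."""
    hop = n_fft // 2
    out = [0.0] * ((len(frames) - 1) * hop + n_fft)
    for m, spectrum in enumerate(frames):
        for n, value in enumerate(idft(spectrum)):
            out[m * hop + n] += value
    # The windows sum to 1 wherever two frames overlap, so no normalisation is
    # needed; strip the leading zero padding and trim to the original length.
    return out[n_fft:n_fft + length]


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
    frames = stft(signal)  # 11 frames of 16 complex bins
    # A transformation step (e.g. per-bin gains) would modify `frames` here.
    rebuilt = istft(frames, len(signal))
    print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # round-off level
```

Any transformation, for example per-bin gains of the kind used in spectral-domain denoising, would act on the frames returned by `stft` before `istft` is called; with no modification, the round trip reproduces the input up to floating-point round-off.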



Prerequisites

Solid skills in signal processing (analog and digital, deterministic and statistical).
The deep learning material is closely linked to the deep learning course, to the associated audio processing project (Projet de simulation logicielle, i.e. the software simulation project) and to the speech processing course, all in the third year (3A) of the Sicom program.

Test

Written exam: 2h
Lab work: report
Computation of the grade: Written exam: 50%, Lab work report: 50%

Additional Information

Course list
Curriculum->Double-Diploma Engineer/Master->Semester 9
Curriculum->Master->Semester 9

Bibliography

J. B. Allen & L. R. Rabiner, "A unified approach to short-time Fourier analysis and synthesis", Proceedings of the IEEE, 1977.
J. Benesty, S. Makino & J. Chen, Speech enhancement, Springer, 2006.
R. E. Berg & D. G. Stork, The physics of sound, Prentice Hall, 1995.
M. Dolson, "The phase vocoder: A tutorial", Computer Music Journal, 1986.
E. Jacobsen & R. Lyons, "The sliding DFT", IEEE Signal Processing Magazine, 2003.
H. Kuttruff, Room acoustics, CRC Press, 2016.
J. Le Roux, E. Vincent & H. Erdogan, "Learning-based approaches to speech enhancement and separation", tutorial at Interspeech, 2016.
P. C. Loizou, Speech enhancement: Theory and practice, CRC Press, 2013.
M. Müller, Fundamentals of music processing, Springer, 2015.
A. V. Oppenheim & R. W. Schafer, Digital signal processing, Prentice Hall, 1975.
E. Vincent, T. Virtanen & S. Gannot (Eds.), Audio source separation and speech enhancement, John Wiley & Sons, 2018.
D. Wang & J. Chen, "Supervised speech separation based on deep learning: An overview", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.