We present audio samples for the causal Demucs model trained on the DNS challenge dataset, as presented in the paper "Real Time Speech Enhancement in the Waveform Domain". We used the causal Demucs with H=64, Revecho augmentation with partial dereverberation (10% of the reverb kept), and 1% of the dry signal added back.
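As a rough illustration of adding back 1% of the dry signal, the sketch below blends the unprocessed input with the enhanced output. It assumes that "dry signal" means the unprocessed input and that the enhanced output is weighted by the remaining 99%; the function name `mix_dry` and the exact weighting are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def mix_dry(noisy: Sequence[float], enhanced: Sequence[float], dry: float = 0.01) -> List[float]:
    """Blend a fraction `dry` of the unprocessed input back into the enhanced output.

    Assumption: the output is dry * noisy + (1 - dry) * enhanced, sample by sample.
    """
    if len(noisy) != len(enhanced):
        raise ValueError("noisy and enhanced must have the same length")
    return [dry * n + (1.0 - dry) * e for n, e in zip(noisy, enhanced)]
```

With `dry=0.01`, a sample that the model fully suppressed still keeps 1% of its original amplitude.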
We used a specific causal implementation for evaluation, which feeds the model audio frames of 40 ms with a stride of 16 ms. The model outputs a prediction for the left-most 16 ms of each input frame. On a quad-core Intel i7-8565U CPU (2.0 GHz, up to the AVX2 instruction set), it takes just about 16 ms to evaluate, allowing for real-time speech enhancement on a laptop. The model weighs 135 MB, with future work planned on quantization.
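The framing scheme described above can be sketched as follows. This is a minimal illustration, not the evaluation code itself: it assumes 16 kHz audio, so a 40 ms frame is 640 samples and a 16 ms stride is 256 samples, and `enhance_frame` is a hypothetical stand-in for the model. Samples at the end of the signal that do not fill a whole frame are simply not emitted here.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # assumption: 16 kHz audio
FRAME_MS = 40         # frame length fed to the model
STRIDE_MS = 16        # hop between consecutive frames

FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 640 samples
STRIDE_LEN = SAMPLE_RATE * STRIDE_MS // 1000  # 256 samples


def stream_enhance(
    signal: Sequence[float],
    enhance_frame: Callable[[Sequence[float]], Sequence[float]],
) -> List[float]:
    """Run a frame-level model over `signal` and keep the left-most stride of each output."""
    out: List[float] = []
    start = 0
    while start + FRAME_LEN <= len(signal):
        frame = signal[start:start + FRAME_LEN]
        enhanced = enhance_frame(frame)    # hypothetical model call
        out.extend(enhanced[:STRIDE_LEN])  # keep only the left-most 16 ms
        start += STRIDE_LEN
    return out


def identity_model(frame: Sequence[float]) -> List[float]:
    """Placeholder model that returns its input unchanged."""
    return list(frame)


def real_time_factor(compute_ms: float, stride_ms: float = STRIDE_MS) -> float:
    """Processing time per stride divided by the audio duration of that stride."""
    return compute_ms / stride_ms
```

Under these assumptions, a real-time factor at or below 1 means the model keeps up with incoming audio; the 16 ms evaluation time quoted above against a 16 ms stride gives `real_time_factor(16)` equal to 1.0.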
The following samples are artificial mixtures with reverb added, from the DNS blind test set.
The following samples are real recordings from the DNS blind test set.