Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain. In it, we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, both stationary and non-stationary, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve the model's performance and generalization.
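To illustrate what optimizing in both the time and frequency domains can look like, here is a minimal sketch in plain Python. The description above does not say which loss functions are used, so everything in it is an assumption made for illustration: the mean absolute error on the waveform, the framed magnitude-spectrum term, the function names, and the `frame_size`, `hop`, and `spectral_weight` parameters. It is not the repository's implementation.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude spectrum of one real-valued frame, using a naive DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def time_loss(estimate, target):
    """Mean absolute error between two waveforms (time-domain term)."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def spectral_loss(estimate, target, frame_size=8, hop=4):
    """Mean absolute error between framed magnitude spectra (frequency-domain term)."""
    total, count = 0.0, 0
    for start in range(0, len(target) - frame_size + 1, hop):
        est_mag = dft_magnitudes(estimate[start:start + frame_size])
        tgt_mag = dft_magnitudes(target[start:start + frame_size])
        total += sum(abs(a - b) for a, b in zip(est_mag, tgt_mag))
        count += len(tgt_mag)
    return total / count if count else 0.0


def combined_loss(estimate, target, spectral_weight=1.0):
    """Weighted sum of the time-domain and frequency-domain terms (assumed form)."""
    return time_loss(estimate, target) + spectral_weight * spectral_loss(estimate, target)


# Toy signals: a short sine wave and a copy with alternating +/-0.1 noise added.
clean = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
noisy = [x + 0.1 * (-1) ** t for t, x in enumerate(clean)]

print(combined_loss(clean, clean))
print(combined_loss(noisy, clean))
```

A perfect estimate gives a loss of zero, and the noisy copy is penalized by both terms. A practical version would presumably use a tensor library's STFT rather than a naive DFT.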
Main Code: 1,784 LOC (25 files) = PY (92%) + YAML (6%) + IN (<1%) + CFG (<1%)
Secondary code: Test: 0 LOC (0); Generated: 0 LOC (0); Build & Deploy: 47 LOC (4); Other: 388 LOC (3)
File Size: 0% long (>1000 LOC), 80% short (<= 200 LOC)
Unit Size: 0% long (>100 LOC), 43% short (<= 10 LOC)
Conditional Complexity: 0% complex (McCabe index > 50), 51% simple (McCabe index <= 5)
Logical Component Decomposition: primary (4 components)
1 year, 4 months old
0% of code updated more than 50 times. Also see temporal dependencies for files frequently changed in the same commits.
Goals: Keep the system simple and easy to change (4)
Latest commit date: 2022-01-17
1 commit (30 days)
1 contributor (30 days)
generated by sokrates.dev (configuration) on 2022-01-25