Demixing stereo track into multiple sources

Programming applications for making music on Linux.


Post by unfa »

Hi!

I've been thinking of a signal-processing unit that could be used to extract multiple mono tracks from a single stereo track, based on the different loudnesses (X/Y mode) and/or timing/phase (A/B mode) of nearly identical spectral patterns present in the L and R channels.
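
To make that a bit more concrete, the two mixing models I have in mind look roughly like this, in my own simplified notation (s_i are the individual instrument signals, g the per-channel gains, tau_i the inter-channel delays):

[code]
% X/Y (coincident pair): the sources differ only in level between the channels
L(t) = \sum_i g^L_i \, s_i(t), \qquad R(t) = \sum_i g^R_i \, s_i(t)

% A/B (spaced pair): the sources differ mainly in arrival time / phase
L(t) = \sum_i s_i(t), \qquad R(t) = \sum_i s_i(t - \tau_i)
[/code]

Demixing then means estimating the individual s_i given only L(t) and R(t).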

The first test could be to extract 3 mono tracks from a single stereo track containing a mix of 3 instruments:
[Attachment: Concept 01.png]
I've made a very simple sound file where the three instruments are 3 sine waves: 500, 1000 and 1500 Hz, panned -50%, 0% and +50% respectively.
The spectrum is extremely simple, as each instrument contains only one harmonic (the fundamental), and the stereo separation is quite wide.
Also, each instrument plays a 1-second solo at the beginning of the file; then, after a 1-second pause, all three sound together for 3 seconds:
[Attachment: Test 01.ogg.zip]
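
For anyone who wants to recreate a test file like this, a minimal Python sketch along these lines should do. I'm assuming a simple linear pan law and writing a WAV instead of an OGG, so it won't be bit-identical to my attachment:

[code]
import numpy as np
import soundfile as sf  # pip install soundfile

SR = 44100  # sample rate in Hz

def sine(freq, duration):
    """Mono sine wave of the given frequency (Hz) and duration (seconds)."""
    t = np.arange(int(SR * duration)) / SR
    return np.sin(2 * np.pi * freq * t)

def pan(mono, position):
    """Linear pan law: position -1.0 = hard left, 0.0 = centre, +1.0 = hard right."""
    left = mono * (1.0 - position) / 2.0
    right = mono * (1.0 + position) / 2.0
    return np.stack([left, right], axis=1)

instruments = [(500, -0.5), (1000, 0.0), (1500, +0.5)]  # (frequency, pan)

silence = np.zeros((SR, 2))  # 1 second of stereo silence
solos = [pan(sine(f, 1.0), p) for f, p in instruments]      # 1 s solo each
tutti = sum(pan(sine(f, 3.0), p) for f, p in instruments)   # 3 s all together

mix = np.concatenate(solos + [silence, tutti]) * 0.3  # scale down to avoid clipping
sf.write("Test_01.wav", mix, SR)
[/code]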
However, one can't simply "take this, subtract it from that and voila!" to extract individual mono tracks from this mix, or can one?
I wasn't able to get any good results using Audacity's Noise Removal combined with inversion and summing.
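
To show what I mean, using the linear pan gains from the sketch above (so L = 0.75*sin500 + 0.5*sin1000 + 0.25*sin1500 and R mirrored; the exact numbers change with a different pan law, but the point stands):

[code]
side = L - R   # = 0.5*sin500 - 0.5*sin1500 : the centre 1000 Hz is cancelled,
               #   but 500 and 1500 Hz are still mixed together
mid  = L + R   # = sin500 + sin1000 + sin1500 : the whole mix folded to mono
[/code]

So a plain sum or difference removes one source instead of isolating one; with three sources and only two channels, no fixed pair of channel weights can isolate a single instrument, which is why I think some per-frequency processing is needed.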

It's a challenge

Now the task is to create a program that would:

1. Analyse a given stereo sound file and detect the number of instruments (distinguishable mono fields inside the stereo field) in it;
2. Create a phase-correlation / amplitude-difference profile for each instrument (for example A: -50% pan, B: 0% pan, C: +50% pan);
3. Use the generated information to separate the instruments and create multiple mono (or stereo, after panning the extracted signals appropriately) sound files that can be mixed again to reproduce the original stereo signal (see the sketch after this list).
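
Here's a rough sketch of how steps 2 and 3 could work for a purely level-panned (X/Y-style) mix: take a short-time Fourier transform, estimate a pan position for every time/frequency bin from the L/R level difference, and assign each bin to the nearest instrument. This is pan-based binary masking cobbled together from what I've read, not a finished tool, and the pan positions are hard-coded here instead of detected as in step 1:

[code]
import numpy as np
from scipy.signal import stft, istft
import soundfile as sf

NPERSEG = 4096  # STFT window length in samples

def demix_by_pan(stereo, sr, pan_positions):
    """Split a purely level-panned stereo mix into one mono track per pan position.

    Every STFT bin is assigned to whichever target pan position is closest to
    the bin's own level-based pan estimate (a crude binary mask; timing/phase
    differences are ignored, so this only covers the X/Y case)."""
    L, R = stereo[:, 0], stereo[:, 1]
    _, _, ZL = stft(L, fs=sr, nperseg=NPERSEG)
    _, _, ZR = stft(R, fs=sr, nperseg=NPERSEG)

    # Per-bin pan estimate in [-1, +1] from the channel magnitudes
    magL, magR = np.abs(ZL), np.abs(ZR)
    pan = (magR - magL) / (magR + magL + 1e-12)

    targets = np.asarray(pan_positions, dtype=float)
    nearest = np.argmin(np.abs(pan[..., None] - targets), axis=-1)  # nearest target per bin

    sources = []
    for i in range(len(targets)):
        mask = (nearest == i)
        # Resynthesise only the bins assigned to this instrument, folded to mono
        _, mono = istft((ZL + ZR) / 2 * mask, fs=sr, nperseg=NPERSEG)
        sources.append(mono)
    return sources

stereo, sr = sf.read("Test_01.wav")  # the test file from the sketch above
for i, src in enumerate(demix_by_pan(stereo, sr, [-0.5, 0.0, +0.5])):
    sf.write(f"source_{i}.wav", src, sr)
[/code]

On the synthetic test this should pull the three sines apart fairly cleanly; on real material the bins of different instruments overlap, so a hard mask like this produces artefacts, and that's where the more serious maths in the links below starts.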

I first started thinking about this when I was asked whether it's possible to remove reverb from a voice recording. I thought:
"The reverb's early reflections and tail will have a different stereo width and different phase relationships than the voice itself, so based on this I should be able to separate the dry sound from the wet to some degree." Unfortunately, I was provided a mono recording, so this idea was shelved for the time being.

I'm thinking about how human perception of sound works, and what "DSP processing" the brain uses to discriminate the different sound sources in the stereo mix it is fed with. Why can a surround sound system do what stereo headphones cannot? Or can they? How do we know that one sound is coming from a different source than another?

I think software developers can use psychoacoustics to create a program that will also "discriminate" different sound sources given only a stereo track.
I believe that once the simplest working implementation is done, further improvements to the processor will give us a useful tool for "demixing" sound.

I've found some sources on this matter (search results for "sound demixing"):
http://www.math.uci.edu/icamp/summer/re ... _demix.pdf
http://mdsp.smartelectronix.com/2005/11 ... -demixing/

I wonder if anyone has developed such a thing (are there any commercial tools?). Is there proof that this is possible? How far can it go?

I guess I don't need to convince anyone that such a tool would be a great thing for any sound engineer - most likely for removing unwanted reverb from human speech.

What do you think? Do you want to help with this project? I'm not a good coder myself, and I can't write DSP code, but I guess there are lots of skilled programmers who can; maybe we can come up with something awesome together :)