This is a weekend project idea. Wouldn't it be cool to sing into a mic and have your voice converted to the sweet sound of a 2A03 pulse-width-modulated square wave? "Yes, Matt, that would be cool! But how can it be done!?" I'll tell you.
I've got a VST instrument that does my NES synthesis ready to go. We just need to send it some MIDI data with pitch bend and mod wheel parameters for the duty cycle modulation. As you all know, duty cycle (or pulse width) modulation is a critical part of that NES square wave sound. The 2A03's pulse channels offer four duty settings: 12.5%, 25%, 50%, and 75%, although 75% is just an inverted 25% and sounds the same, so there are really three distinct timbres.
First order of business is pitch detection, obviously. We'll do that with an algorithm suited to the human voice, plus some sensible adjustments, like smoothing and continuation, to steady the detection.
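To make that a bit more concrete, here is a minimal sketch of one way to do it: plain autocorrelation pitch detection on a single chunk, followed by a median filter over the resulting pitch track. I'm not claiming this is the best detector for voice. The function names, the 80 to 1000 Hz search range, and the 0.3 voicing threshold are all my own assumptions for illustration.

```python
def detect_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0, threshold=0.3):
    """Estimate the pitch of one chunk (Hz) by autocorrelation.

    Returns None when the chunk looks silent or unvoiced.
    fmin, fmax and threshold are illustrative guesses, not tuned values.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]           # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None
    min_lag = int(sample_rate / fmax)       # shortest period we accept
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Normalized correlation of the chunk with a shifted copy of itself.
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_corr < threshold:
        return None
    return sample_rate / best_lag


def smooth_track(pitches, width=3):
    """Median-filter a list of per-chunk pitches, skipping unvoiced (None) chunks."""
    half = width // 2
    out = []
    for i, p in enumerate(pitches):
        if p is None:
            out.append(None)
            continue
        window = sorted(q for q in pitches[max(0, i - half): i + half + 1]
                        if q is not None)
        out.append(window[len(window) // 2])
    return out
```

The median filter is there to knock out one-chunk octave jumps, which are the usual failure mode of this kind of detector. The pure-Python loops are slow, so anything real-time would want a faster implementation.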
We can go beyond this and use formants to control the duty cycle. The goal is to make an "Oh" sound produce a perfect square wave (50% pulse width) and an "Eee" sound produce a sharper sound, say 12.5% pulse width, and so on. That is to say, make that square wave talk. Linear predictive analysis can help us separate the source from the filter in periodic signals - and the filter, in this case, is the vocal tract. Or, we can get at the formants by matching the coarse spectral shape to a basis set, but I really haven't researched this. Remember, we only need to identify three or four different formant classes, one for each of the 2A03 pulse widths. Let it suffice to say: it can be done!
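Here is a rough sketch of the linear prediction route, under a pile of assumptions. It fits an all-pole model to a chunk with the Levinson-Durbin recursion, picks the peaks of the resulting spectral envelope as formant estimates, and chooses a duty cycle from the second formant. The F2 thresholds (1000 Hz and 1800 Hz), the model order rule of thumb, and every function name are my own guesses, not something I have tuned against real singing.

```python
import cmath
import math


def autocorrelate(frame, order):
    """Autocorrelation values r[0..order] of one chunk."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def lpc(r):
    """Levinson-Durbin: r[0..p] -> coefficients [1, a1, ..., ap] of A(z)."""
    order = len(r) - 1
    a, err = [1.0], r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:                    # silent, or already fully predicted
            break
        k = -sum(a[j] * r[i - j] for j in range(i)) / err
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return a + [0.0] * (order + 1 - len(a))


def formant_peaks(a, sample_rate, n_points=256):
    """Frequencies (Hz) of the local maxima of the envelope 1/|A(e^jw)|."""
    mags = []
    for m in range(n_points):
        w = math.pi * m / n_points          # 0 up to just under Nyquist
        resp = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / max(abs(resp), 1e-9))
    step = (sample_rate / 2.0) / n_points
    return [i * step for i in range(1, n_points - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


def duty_from_formants(peaks):
    """Pick a pulse width from F2. The thresholds are guesses (assumption)."""
    if len(peaks) < 2:
        return 0.5                          # no usable F2: fall back to square
    f2 = peaks[1]
    if f2 < 1000.0:
        return 0.5                          # "Oh"-like: low second formant
    if f2 < 1800.0:
        return 0.25
    return 0.125                            # "Eee"-like: high second formant


def frame_duty(frame, sample_rate):
    """Chunk of samples -> duty cycle (0.5, 0.25 or 0.125)."""
    order = 2 + int(sample_rate / 1000)     # common rule of thumb (assumption)
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]   # Hamming window
    return duty_from_formants(
        formant_peaks(lpc(autocorrelate(windowed, order)), sample_rate))
```

Peak picking on the envelope is cruder than solving for the roots of the prediction polynomial, and it will merge formants that sit close together, but it needs nothing beyond the standard library. Pre-emphasis is left out to keep the sketch short.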
So we have broken the incoming vocal signal into little chunks, and for each chunk, we have the pitch and the formant class. We package this as MIDI data and patch it into the VST plugin that synthesizes our Nintendo beeps. Voilà: the voice-to-NES converter.
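The packaging step might look something like this sketch, which turns one chunk's pitch and duty cycle into raw MIDI messages: a mod wheel (CC 1) value for the duty, a 14-bit pitch bend for the fraction of a semitone, and a note-on. The duty-to-CC table and the 2-semitone bend range are assumptions on my part, since they depend entirely on how the synth on the receiving end is set up.

```python
import math

# Hypothetical mapping from duty cycle to mod wheel value (assumption:
# the synth splits the CC range into three zones).
DUTY_TO_CC = {0.125: 0, 0.25: 64, 0.5: 127}


def hz_to_note_and_bend(freq_hz, bend_range=2.0):
    """Nearest MIDI note plus a 14-bit pitch bend value (8192 = no bend).

    bend_range is the synth's bend range in semitones (assumed to be 2).
    """
    exact = 69.0 + 12.0 * math.log2(freq_hz / 440.0)   # A4 = note 69
    note = max(0, min(127, int(round(exact))))
    bend = 8192 + int(round((exact - note) / bend_range * 8192))
    return note, max(0, min(16383, bend))


def frame_to_midi(freq_hz, duty, channel=0, velocity=100):
    """Build the raw MIDI messages for one chunk as a list of bytes objects."""
    note, bend = hz_to_note_and_bend(freq_hz)
    return [
        bytes([0xB0 | channel, 1, DUTY_TO_CC[duty]]),               # CC 1: mod wheel
        bytes([0xE0 | channel, bend & 0x7F, (bend >> 7) & 0x7F]),   # bend: LSB, MSB
        bytes([0x90 | channel, note, velocity]),                    # note on
    ]
```

For example, `frame_to_midi(440.0, 0.5)` yields a mod wheel message at 127, a centered pitch bend, and a note-on for A4. A real converter would only send a new note-on when the nearest note changes, and follow the pitch in between with bend messages alone.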
Extra Credit:
Get your fricatives on that noise channel. What!!!?
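One cheap way to spot fricatives, sketched below, is the zero-crossing rate: hissy sounds like "sss" and "fff" flip sign far more often than voiced ones. The thresholds here are placeholders I made up, and what you then send to the noise channel is left open.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def is_fricative(frame, zcr_threshold=0.3, min_energy=1e-4):
    """Guess whether a chunk is a fricative: loud enough and very 'crossy'.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    energy = sum(s * s for s in frame) / len(frame)
    return energy > min_energy and zero_crossing_rate(frame) > zcr_threshold
```

Chunks flagged this way could skip the pitch and formant steps and trigger the noise channel instead.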
Update:
We can take care of the aforementioned dirty work with a few really helpful Max/MSP externals from Tristan Jehan.