|February 23rd, 2018|
|ideas, music, tech|
- The shape of my fingers on the mandolin fretboard indicates what chord I'm about to play.
- I tap my foot on downbeats.
That is, I'd like to infer the bass note from hand position, and then get the timing and intensity from foot tapping.
I already know how to do the foot portion: I regularly play with something that converts foot tapping to MIDI, so the tricky part is figuring out which note to play.
There's a lot of research on figuring out chords from audio, but that's not useful to me here. Consider switching from one chord to the next: the earliest you could hear the new chord is the downbeat, and by that point it's too late to make the bass come out on the right note. Plus, you often want the bass on the downbeat and the mandolin on the upbeat, in which case the mandolin isn't sounding at all when the bass needs to play.
What I think could work better is using a camera. The chord shapes I play are very straightforward, and I switch shapes far enough in advance that something could interpret the shape and infer the appropriate bass note to play before I hit the foot trigger.
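The timing idea here, where the camera keeps updating a guess and the foot tap decides when it sounds, can be sketched roughly as below. This is a minimal illustration, not a real MIDI setup: `BassLatch`, `on_frame`, `on_foot_tap`, and the `NOTE_TO_MIDI` table are all hypothetical names, and the choice of notes and octave is an assumption.

```python
# Illustrative bass-register MIDI note numbers (with middle C = 60).
# Which notes and which octave to use is an assumption for this sketch.
NOTE_TO_MIDI = {"E": 40, "G": 43, "A": 45, "C": 48, "D": 50}


class BassLatch:
    """Remembers the latest chord guess and plays it when the foot comes down."""

    def __init__(self):
        self.current_note = None  # nothing recognized yet

    def on_frame(self, predicted_note):
        # Called once per camera frame with the classifier's guess.
        # Anything not in the table (including None) leaves the old guess alone.
        if predicted_note in NOTE_TO_MIDI:
            self.current_note = predicted_note

    def on_foot_tap(self, velocity):
        # Called by the foot trigger. Returns (midi_note, velocity),
        # or None if no chord shape has been recognized yet.
        if self.current_note is None:
            return None
        return (NOTE_TO_MIDI[self.current_note], velocity)
```

Because the hand moves to the new shape before the downbeat, the latched note has time to update before `on_foot_tap` fires; the tap only supplies timing and intensity.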
This might be the sort of thing that deep learning has made a lot easier? Here's how I think training this could work:
- Set up a camera pointing at the mandolin fretboard.
- Record a video where you only play variations of the same chord. Say, lots of different things that should all trigger the bass note 'G'. Move around as you do this, trying to get a range of fretboard angles.
- Record another video where you do the same thing with a different chord.
- Pull out the frames, label each one with the video it came from, and train on that.
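The labeling step in that list could look something like the sketch below. It only builds the list of (frame, label) pairs; decoding the actual images and training a model are left out. The names `LabeledFrame` and `label_frames`, the file names, and the idea of holding out the tail end of each video for evaluation are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledFrame:
    video: str        # which recording the frame came from
    frame_index: int  # position of the frame within that recording
    label: str        # bass note the whole recording should trigger


def label_frames(videos, every_nth=5, holdout_percent=20):
    """Label sampled frames by source video and split off an evaluation set.

    `videos` maps a bass-note label to (video_name, frame_count).
    The last `holdout_percent` of each video is held out rather than a
    random sample, since neighboring frames are nearly identical and a
    random split would make accuracy look better than it is.
    """
    train, holdout = [], []
    for label, (video, n_frames) in videos.items():
        cutoff = n_frames * (100 - holdout_percent) // 100
        for i in range(0, n_frames, every_nth):
            example = LabeledFrame(video, i, label)
            if i < cutoff:
                train.append(example)
            else:
                holdout.append(example)
    return train, holdout


# Hypothetical recordings: one video per bass note.
train, holdout = label_frames(
    {"G": ("g_chords.mp4", 100), "D": ("d_chords.mp4", 50)},
    every_nth=10,
)
```

Sampling every Nth frame keeps the dataset from being dominated by near-duplicates; the held-out frames give a rough first check before trying it live.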
Alternatively, you could train two networks, one to identify the hand shape, and another to identify the fret that the lowest string is being played at.
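If the two-network version worked, combining the outputs would be simple arithmetic. Here's a rough sketch, assuming standard mandolin tuning (lowest string is G) and assuming each hand shape puts a known chord tone on the lowest string. The shape names and the `SHAPE_ROOT_OFFSET` table are made up for illustration, and the third is assumed to be a major third.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Pitch class of the open lowest string in standard mandolin tuning (G).
LOWEST_STRING_PITCH_CLASS = 7

# Hypothetical shape labels: semitones from the chord root up to the note
# the shape places on the lowest string.
SHAPE_ROOT_OFFSET = {
    "root_on_low_string": 0,
    "third_on_low_string": 4,   # assumes a major third
    "fifth_on_low_string": 7,
}


def bass_note(shape, fret):
    """Combine the two predictions into the name of the chord root."""
    # Note sounding on the lowest string at this fret.
    sounding = (LOWEST_STRING_PITCH_CLASS + fret) % 12
    # Step back down from that chord tone to the root.
    root = (sounding - SHAPE_ROOT_OFFSET[shape]) % 12
    return NOTE_NAMES[root]
```

For example, a shape with the fifth on the lowest string played at the 7th fret has a D sounding there, which makes the root G.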
People who know more ML than me: is this actually the sort of thing that's relatively straightforward with entry-level tools these days? With enough accuracy that it's not going to play the wrong notes live in an unfamiliar environment?