Remove music from live audio stream, leaving just voices - possible?

Greetings. I'm looking for a filter or tool that can remove the music from a live video source (e.g., a NASA broadcast) while leaving the voices and non-musical sounds intact. It doesn't have to be perfect, but it needs to remove or mute enough of the music to avoid triggering YouTube's copyright detection, without distorting the voices too much.

To be clear: I'm not trying to circumvent copyright rules. I'm trying to remove potentially copyrighted incidental background music from an otherwise copyright-free broadcast. This is my "holy grail".

The human brain can easily distinguish between music and speech, and can "filter" out the music in order to focus on the words. Surely there must be a way to do this programmatically, possibly with some sort of AI processing. A simple frequency filter is not sufficient; voice and music share too many of the same frequencies.

A processing delay of a few seconds would be acceptable; it doesn't have to be an instantaneous filter. However, it must still be possible to sync the "scrubbed" audio output with the video.

Does such a filter or tool exist? I'd even be willing to pay a programmer with audio expertise to develop one. I'm a developer myself, but I have no knowledge of audio algorithms or AI.