It's pretty complex, actually, and since I've never used Ableton, I'll explain it as best I can.
Since you 'ripped' the audio from something else, the words and music are already mixed together, so it's very hard to separate them. As far as I know, you would have to carefully cut out the different pitches in the audio to separate the words from the music. And even then, you'll still hear some odd-sounding bits here and there.
That's the best I can offer, I'm afraid. :|