python - Google Speech returning all words ever spoken, instead of just the words from the transcript - Stack Overflow

admin2025-04-19 1

Using Google Speech in Python, I can get a transcript for each spoken phrase with result.alternatives[0].transcript. But when I look at the words for the phrase, result.alternatives[0].words always returns an array of ALL of the words ever spoken, not just the words from the transcript, which seems wrong. Is this a bug, or is there some way to filter or reset the words array? I'm only interested in the words in the spoken phrase.

My code:

if not response.results:
    continue

result = response.results[0]
if not result.alternatives:
    continue

transcript = result.alternatives[0].transcript
confidence = result.alternatives[0].confidence
words = result.alternatives[0].words

if result.is_final:
    print("*******************")
    sensory_log.info(f"Final STT output: {transcript}")
    print(f"Confidence: {confidence:.2f}")
    self.process_input(transcript)

    # Check for multiple speakers using words
    if words:
        print(words)
        # Track unique speaker IDs using a list
        speaker_ids = []
        for word in words:
            print(f"Word: {word.word} (speaker_tag: {word.speaker_tag})")
            if word.speaker_tag not in speaker_ids:
                speaker_ids.append(word.speaker_tag)

        print(f"Detected {len(speaker_ids)} speakers")

asked Mar 4 at 0:03 by JackKalish

1 It looks strange. Maybe it is a bug, and maybe you should report it to the authors of this module. – furas Commented Mar 4 at 10:01
1 Hi @jacki, I have posted an answer. I hope it helps. Please consider accepting and upvoting it if it does, as per Stack Overflow guidelines; this helps other contributors with their research. – Sourav Dutta Commented Mar 5 at 21:37


1 Answer

The problem here is that result.alternatives[0].words also contains all the words from the previous transcripts. You can filter the words with the help of each word's start time (start_time = word_info.start_time), so that when result.is_final is True you do not capture the words from a previous transcript.

In your code, you have to modify this section:

if words: print(words)

You can refer to this documentation when changing your code.
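The start_time filter suggested above could look roughly like the following. This is only a minimal sketch, not the library's own mechanism: the Word class is a hypothetical stand-in for the word objects the API returns, it assumes start_time and end_time are datetime.timedelta values (older client versions may expose a Duration-like object instead), and words_since and cutoff are illustrative names.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Word:
    """Hypothetical stand-in for the API's per-word info object."""
    word: str
    speaker_tag: int
    start_time: timedelta  # assumed to be a timedelta
    end_time: timedelta


def words_since(words, cutoff):
    """Return only the words that start at or after `cutoff`."""
    return [w for w in words if w.start_time >= cutoff]


# Simulated cumulative word lists from two consecutive final results.
first_final = [
    Word("hello", 1, timedelta(seconds=0.0), timedelta(seconds=0.4)),
    Word("there", 1, timedelta(seconds=0.4), timedelta(seconds=0.9)),
]
second_final = first_final + [
    Word("good", 2, timedelta(seconds=1.5), timedelta(seconds=1.8)),
    Word("morning", 2, timedelta(seconds=1.8), timedelta(seconds=2.4)),
]

cutoff = timedelta(0)  # end time of the last word already handled
phrases = []
for words in (first_final, second_final):
    fresh = words_since(words, cutoff)
    if fresh:
        # Remember where this phrase ended so the next final result
        # only yields words spoken after it.
        cutoff = fresh[-1].end_time
    phrases.append(fresh)
    speaker_ids = sorted({w.speaker_tag for w in fresh})
    print([w.word for w in fresh], f"speakers: {speaker_ids}")
```

In the question's code, the same idea would mean keeping cutoff on the object between final results and passing result.alternatives[0].words through the filter before counting speaker tags.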

isFinal indicates whether the results obtained within this list entry are interim or final. Check out the full doc for more info.

You can also file a bug in the issue tracker and vote for it with a +1.

When reposting, please cite the original address: http://conceptsofalgorithm.com/Algorithm/1745066752a283033.html
