Using Google Speech in Python, I'm able to get a transcript for each phrase spoken using result.alternatives[0].transcript, but when I try to look at the words for the phrase, result.alternatives[0].words always returns an array of ALL of the words ever spoken, not just the words from the transcript... which seems wrong? Is this a bug, or is there some way to filter out/reset the words array, since I'm only interested in the words in the spoken phrase.
My code:
if not response.results:
    continue
result = response.results[0]
if not result.alternatives:
    continue
transcript = result.alternatives[0].transcript
confidence = result.alternatives[0].confidence
words = result.alternatives[0].words
if result.is_final:
    print("*******************")
    sensory_log.info(f"Final STT output: {transcript}")
    print(f"Confidence: {confidence:.2f}")
    self.process_input(transcript)
    # Check for multiple speakers using words
    if words:
        print(words)
        # Track unique speaker IDs using a list
        speaker_ids = []
        for word in words:
            print(f"Word: {word.word} (speaker_tag: {word.speaker_tag})")
            if word.speaker_tag not in speaker_ids:
                speaker_ids.append(word.speaker_tag)
        print(f"Detected {len(speaker_ids)} speakers")
The problem is that result.alternatives[0].words also contains all the words from the previous transcripts. You can filter the words by their start time (start_time = word_info.start_time) when result.is_final is True, so that words from previous transcripts are not captured.
In your code, the section to modify is the one that begins with if words: print(words). You can refer to this documentation when changing your code.
isFinal indicates whether the results obtained within this list entry are interim or final. Check out the full doc for more info.
You can also file a bug on the issue tracker here and vote for it with a +1.
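The start_time filtering suggested in this answer could look roughly like the sketch below. It is not taken from the Speech-to-Text samples: the Word dataclass is a stand-in for the API's WordInfo, words_since is a hypothetical helper name, and the timestamps are assumed to be datetime.timedelta values (older client versions expose a Duration with seconds and nanos instead, which would need converting first). It also assumes word timestamps keep increasing over the whole stream.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Word:
    """Stand-in for the API's WordInfo; only the fields used here (assumption)."""
    word: str
    start_time: timedelta
    end_time: timedelta
    speaker_tag: int = 0


def words_since(words, cutoff):
    """Hypothetical helper: keep only the words that start at or after `cutoff`."""
    return [w for w in words if w.start_time >= cutoff]


# Simulated final results: the second word list repeats the first one,
# which is the behaviour described in the question.
first_final = [
    Word("hello", timedelta(seconds=0.0), timedelta(seconds=0.4), 1),
    Word("there", timedelta(seconds=0.4), timedelta(seconds=0.9), 1),
]
second_final = first_final + [
    Word("good", timedelta(seconds=2.0), timedelta(seconds=2.3), 2),
    Word("morning", timedelta(seconds=2.3), timedelta(seconds=2.9), 2),
]

cutoff = timedelta(0)  # end time of the last word already handled
phrases = []
for words in (first_final, second_final):
    new_words = words_since(words, cutoff)
    phrases.append([w.word for w in new_words])
    if new_words:
        # The next phrase cannot start before this one ended.
        cutoff = new_words[-1].end_time
```

In the question's loop, the same idea would mean keeping cutoff outside the response loop and updating it only inside the if result.is_final: branch, then counting speaker tags over new_words rather than over words. If the stream is restarted and the timestamps start again from zero, cutoff would need to be reset as well.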