-
Notifications
You must be signed in to change notification settings - Fork 574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support AWS plugin for TTS and STT #1302
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 827a09c The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@@ -19,4 +19,5 @@ pip install \ | |||
"${SCRIPT_DIR}/livekit-plugins-rag" \ | |||
"${SCRIPT_DIR}/livekit-plugins-playai" \ | |||
"${SCRIPT_DIR}/livekit-plugins-silero" \ | |||
"${SCRIPT_DIR}/livekit-plugins-turn-detector" | |||
"${SCRIPT_DIR}/livekit-plugins-turn-detector" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw we already have install-plugins-editable.sh not sure why this script was committed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This chnages is very old, submitted by another contributor, I'm picking up now and fixing stuff. will share final PR soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was moved from CI in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need two versions because editable is not the same thing as local install. mypy requires latter
return credentials.access_key, credentials.secret_key | ||
|
||
|
||
TTS_SPEECH_ENGINE = Literal["standard", "neural", "long-form", "generative"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this to another file. Check how we do it for other TTS/STT
|
||
response = await client.synthesize_speech(**_strip_nones(params)) | ||
|
||
if "AudioStream" in response: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit avoid the extra indent here
except Exception as e: | ||
logger.exception(f"an error occurred while streaming inputs: {e}") | ||
|
||
handler = TranscriptEventHandler(stream.output_stream, self._event_ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we create a separate class?
self, | ||
*, | ||
voice: str | None = "Ruth", | ||
language: TTS_LANGUAGE | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
language: TTS_LANGUAGE | None = None, | |
language: TTS_LANGUAGE | None = None, |
We should always allow a str too here, we can't guarantee we will update the languages quickly
*, | ||
voice: str | None = "Ruth", | ||
language: TTS_LANGUAGE | None = None, | ||
output_format: TTS_OUTPUT_FORMAT = "pcm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to expose the output format. we only support pcm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do support mp3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah true!
|
||
# If API key and secret are provided, create a session with them | ||
if api_key and api_secret: | ||
session = boto3.Session( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this making network calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boto3.Session()
doesn’t make network calls, but session.get_credentials() does if API keys and secrets aren’t cached., but we’re calling it during initialization, it’s a one-time operation.
try: | ||
async for frame in self._input_ch: | ||
if isinstance(frame, rtc.AudioFrame): | ||
await stream.input_stream.send_audio_event( | ||
audio_chunk=frame.data.tobytes() | ||
) | ||
await stream.input_stream.end_stream() | ||
|
||
except Exception as e: | ||
logger.exception(f"an error occurred while streaming inputs: {e}") | ||
|
||
async def handle_transcript_events(): | ||
try: | ||
async for event in stream.output_stream: | ||
if isinstance(event, TranscriptEvent): | ||
self._process_transcript_event(event) | ||
except Exception as e: | ||
logger.exception( | ||
f"An error occurred while handling transcript events: {e}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using try — finally. We have an utility for it here
finally: | ||
await utils.aio.gracefully_cancel(*tasks) | ||
except Exception as e: | ||
logger.exception(f"An error occurred while streaming inputs: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is swallowing exceptions? In this case the baseclass will not try to reconnect on failure
def get_client(self): | ||
"""Returns a client creator context.""" | ||
return self._session.create_client( | ||
"polly", | ||
region_name=self._opts.speech_region, | ||
aws_access_key_id=self._api_key, | ||
aws_secret_access_key=self._api_secret, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we hide this?
from typing import Any, Callable | ||
|
||
import aiohttp | ||
from aiobotocore.session import AioSession, get_session # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't support types?
This PR implements AWS plugin for TTS and STT