Support AWS plugin for TTS and STT #1302

Open. Wants to merge 40 commits into base: main.

Collaborator

@jayeshp19 jayeshp19 commented Dec 26, 2024

This PR implements an AWS plugin for TTS and STT.


changeset-bot bot commented Dec 26, 2024

🦋 Changeset detected

Latest commit: 827a09c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
livekit-plugins-aws Minor


@@ -19,4 +19,5 @@ pip install \
  "${SCRIPT_DIR}/livekit-plugins-rag" \
  "${SCRIPT_DIR}/livekit-plugins-playai" \
  "${SCRIPT_DIR}/livekit-plugins-silero" \
- "${SCRIPT_DIR}/livekit-plugins-turn-detector"
+ "${SCRIPT_DIR}/livekit-plugins-turn-detector" \
Member

@theomonnom theomonnom Jan 13, 2025

Btw, we already have install-plugins-editable.sh; not sure why this script was committed.

Collaborator Author

This change is very old and was submitted by another contributor. I'm picking it up now and fixing things; I'll share the final PR soon.

Collaborator Author

This was moved from CI in this PR

Member

We need two versions because an editable install is not the same thing as a local install; mypy requires the latter.

return credentials.access_key, credentials.secret_key


TTS_SPEECH_ENGINE = Literal["standard", "neural", "long-form", "generative"]
Member

@theomonnom theomonnom Jan 13, 2025

We should move this to another file. Check how we do it for other TTS/STT
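
A minimal sketch of what this could look like, assuming the aliases live in a dedicated module. The file name `models.py` and the helper `is_known_engine` are hypothetical and not taken from the PR; only the `TTS_SPEECH_ENGINE` definition comes from the diff above.

```python
# models.py (hypothetical file name): shared type aliases for the plugin.
from typing import Literal, get_args

# Copied from the diff under review.
TTS_SPEECH_ENGINE = Literal["standard", "neural", "long-form", "generative"]


def is_known_engine(value: str) -> bool:
    """Illustrative helper: check a string against the Literal's allowed values."""
    return value in get_args(TTS_SPEECH_ENGINE)
```

The TTS module would then import `TTS_SPEECH_ENGINE` from this file instead of defining it next to the credential helpers.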


response = await client.synthesize_speech(**_strip_nones(params))

if "AudioStream" in response:
Member

nit: avoid the extra indent here

except Exception as e:
    logger.exception(f"an error occurred while streaming inputs: {e}")

handler = TranscriptEventHandler(stream.output_stream, self._event_ch)
Member

Why do we create a separate class?

self,
*,
voice: str | None = "Ruth",
language: TTS_LANGUAGE | None = None,
Member

Suggested change
language: TTS_LANGUAGE | None = None,
language: TTS_LANGUAGE | None = None,

We should always allow a str here too; we can't guarantee we'll update the language list quickly.
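
A sketch of the widened annotation this comment seems to ask for. The `TTS_LANGUAGE` values below are an illustrative subset (the PR's real list is not shown in this thread), and `PollyOptionsSketch` is a hypothetical class used only to make the example runnable.

```python
from typing import Literal, Optional, Union

# Illustrative subset only; the actual alias in the PR may differ.
TTS_LANGUAGE = Literal["en-US", "en-GB", "fr-FR"]


class PollyOptionsSketch:
    """Hypothetical options holder mirroring the constructor under review."""

    def __init__(
        self,
        *,
        voice: Optional[str] = "Ruth",
        # Known codes get editor completion, but any str is still accepted,
        # so newly added languages work before the Literal is updated.
        language: Union[TTS_LANGUAGE, str, None] = None,
    ) -> None:
        self.voice = voice
        self.language = language
```

With this shape, a type checker accepts `PollyOptionsSketch(language="xx-YY")` even though `"xx-YY"` is not in the Literal.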

*,
voice: str | None = "Ruth",
language: TTS_LANGUAGE | None = None,
output_format: TTS_OUTPUT_FORMAT = "pcm",
Member

I don't think it makes sense to expose the output format. We only support pcm.

Collaborator Author

we do support mp3

Member

ah true!

@jayeshp19 jayeshp19 changed the title from "[draft] Support AWS plugin for TTS and STT" to "Support AWS plugin for TTS and STT" Jan 20, 2025
@jayeshp19 jayeshp19 marked this pull request as ready for review January 20, 2025 09:51

# If API key and secret are provided, create a session with them
if api_key and api_secret:
    session = boto3.Session(
Member

Is this making network calls?

Collaborator Author

boto3.Session() doesn’t make network calls, but session.get_credentials() does if the API key and secret aren’t cached. Since we’re calling it during initialization, it’s a one-time operation.
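
The "one-time operation" pattern described here can be sketched without boto3. Everything below is hypothetical: `CredentialCacheSketch` is not in the PR, and the `resolve` callable merely stands in for a lookup such as `session.get_credentials()` that may touch the network.

```python
from typing import Callable, Optional, Tuple


class CredentialCacheSketch:
    """Hypothetical holder that resolves credentials once, at construction."""

    def __init__(
        self,
        api_key: Optional[str],
        api_secret: Optional[str],
        resolve: Callable[[], Tuple[str, str]],
    ) -> None:
        if api_key and api_secret:
            # Explicit keys: no lookup is needed at all.
            self._creds = (api_key, api_secret)
        else:
            # Possibly slow lookup, performed exactly once here.
            self._creds = resolve()

    @property
    def credentials(self) -> Tuple[str, str]:
        # Later reads reuse the cached pair and never call `resolve` again.
        return self._creds
```

The trade-off is that any lookup cost is paid inside the constructor rather than on the first request.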

Comment on lines 154 to 173
try:
    async for frame in self._input_ch:
        if isinstance(frame, rtc.AudioFrame):
            await stream.input_stream.send_audio_event(
                audio_chunk=frame.data.tobytes()
            )
    await stream.input_stream.end_stream()

except Exception as e:
    logger.exception(f"an error occurred while streaming inputs: {e}")

async def handle_transcript_events():
    try:
        async for event in stream.output_stream:
            if isinstance(event, TranscriptEvent):
                self._process_transcript_event(event)
    except Exception as e:
        logger.exception(
            f"An error occurred while handling transcript events: {e}"
        )
Member

Instead of using try/finally, we have a utility for this here.

    finally:
        await utils.aio.gracefully_cancel(*tasks)
except Exception as e:
    logger.exception(f"An error occurred while streaming inputs: {e}")
Member

I think this is swallowing exceptions. In that case, the base class won't try to reconnect on failure.
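
A minimal sketch of the alternative, using plain asyncio: log the failure, re-raise it so the caller can decide to reconnect, and still clean up the tasks. The local `gracefully_cancel` is a stand-in written for this example and is not the `utils.aio.gracefully_cancel` implementation; `run_stream` and its parameters are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("aws-stt-sketch")


async def gracefully_cancel(*tasks: "asyncio.Task") -> None:
    """Stand-in helper: cancel the tasks and wait until they have finished."""
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


async def run_stream(
    send_audio: Callable[[], Awaitable[None]],
    read_transcripts: Callable[[], Awaitable[None]],
) -> None:
    tasks = [
        asyncio.create_task(send_audio()),
        asyncio.create_task(read_transcripts()),
    ]
    try:
        await asyncio.gather(*tasks)
    except Exception:
        logger.exception("an error occurred while streaming")
        raise  # propagate, so a caller's retry logic still sees the failure
    finally:
        await gracefully_cancel(*tasks)
```

Because the exception is re-raised after logging, whatever wraps `run_stream` can still observe the error and retry.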

Comment on lines 108 to 115
def get_client(self):
    """Returns a client creator context."""
    return self._session.create_client(
        "polly",
        region_name=self._opts.speech_region,
        aws_access_key_id=self._api_key,
        aws_secret_access_key=self._api_secret,
    )
Member

Should we hide this?

from typing import Any, Callable

import aiohttp
from aiobotocore.session import AioSession, get_session # type: ignore
Member

They don't support types?


4 participants