This project is a Streamlit application that integrates AI chat capabilities with a camera feed. It allows users to interact with an AI model using text or voice input and receive responses in both text and audio formats. The application also supports multiple languages and voice selections.
- Camera Integration: Start and stop the camera feed directly from the application.
- Voice Input: Use your microphone to input queries.
- Text Input: Type your queries directly into the chat interface.
- Language Support: Choose from multiple languages for input and output.
- Voice Selection: Select from a variety of voices for audio responses.
- LLM Service Selection: Choose between different LLM services for generating responses.
- Clone the repository: `git clone https://github.com/JKL404/AI-Chat-with-Camera-Integration.git`
- Navigate to the project directory: `cd AI-Chat-with-Camera-Integration`
- Install the required packages: `pip install -r requirements.txt`
Create a `.env` file in the root of your project directory and add the following keys:

ELEVENLABS_API_KEY="your_elevenlabs_api_key"
GROQ_API_KEY="your_groq_api_key"

Replace `your_elevenlabs_api_key` and `your_groq_api_key` with your actual API keys.
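For reference, here is a minimal sketch of how these keys might be loaded at startup, assuming the project uses python-dotenv; the actual loading code in app.py may differ:

```python
# Minimal sketch (assumption: python-dotenv is installed and used by the app).
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

elevenlabs_key = os.getenv("ELEVENLABS_API_KEY")
groq_key = os.getenv("GROQ_API_KEY")

if not elevenlabs_key or not groq_key:
    raise RuntimeError("Missing API keys: check your .env file")
```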
Run the application using Streamlit: `streamlit run app.py`
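app.py is the Streamlit entry point. For orientation only, here is a bare-bones sketch of how a chat interface like this one is commonly structured in Streamlit; this is not the project's actual code, and the placeholder reply would be replaced by a real model call:

```python
# Illustrative skeleton of a Streamlit chat loop; not the project's actual app.py.
import streamlit as st

st.title("AI Chat with Camera Integration")

if "messages" not in st.session_state:
    st.session_state.messages = []  # chat history persists across reruns

# Replay the conversation so far on every rerun
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# Read a new prompt and append a placeholder reply
if prompt := st.chat_input("Ask something..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    reply = f"(model response to: {prompt})"  # swap in a real LLM call here
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```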
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.
The application supports multiple languages including English, Nepali, Hindi, and French. You can select your preferred language from the sidebar. Additionally, a variety of voices are available for audio responses, which can also be selected from the sidebar. This feature allows users to tailor the application to their linguistic and auditory preferences.
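As an illustration of how such pickers are typically wired up in a Streamlit sidebar (the option lists and widget labels below are assumptions, not the app's actual values):

```python
# Illustrative sketch: language and voice pickers in the sidebar.
# Option lists and labels are placeholders, not the app's real values.
import streamlit as st

LANGUAGES = ["English", "Nepali", "Hindi", "French"]
VOICES = ["Rachel", "Adam", "Bella"]  # placeholder voice names

language = st.sidebar.selectbox("Language", LANGUAGES)
voice = st.sidebar.selectbox("Voice", VOICES)

st.sidebar.caption(f"Responses will use {language} with the '{voice}' voice.")
```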
You can choose between different LLM services such as Groq and Anthropic for generating AI responses. This selection can be made from the sidebar under the "LLM Service Selection" section. Different services may offer varying response styles or capabilities, so users can experiment to find the best fit for their needs.
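Below is a rough sketch of how such a toggle could dispatch a prompt to either service. The model names and function shape are assumptions, and using Anthropic would also require an ANTHROPIC_API_KEY in your environment (not listed in the `.env` example above):

```python
# Sketch only: dispatch a prompt to Groq or Anthropic based on the sidebar choice.
# Model names are placeholders; the app's real models and prompts may differ.
import os
from groq import Groq
from anthropic import Anthropic

def ask_llm(service: str, prompt: str) -> str:
    """Send a prompt to the selected service and return the reply text."""
    if service == "Groq":
        client = Groq(api_key=os.getenv("GROQ_API_KEY"))
        resp = client.chat.completions.create(
            model="llama-3.1-8b-instant",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    # Anthropic client reads ANTHROPIC_API_KEY from the environment
    client = Anthropic()
    resp = client.messages.create(
        model="claude-3-haiku-20240307",  # placeholder model name
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text
```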
- Camera: The camera can be started and stopped using the controls in the sidebar. The feed is displayed in the sidebar when active, allowing users to see what the AI is "seeing" and potentially use this visual input in their queries (a minimal sketch of this start/stop pattern follows the list).
- Chat: The chat interface allows for both text and voice input. Voice input can be activated by clicking the "Speak" button, which will record and transcribe your speech. This dual input method ensures flexibility in how users can interact with the AI.
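To make the camera and chat controls above more concrete, here are two illustrative sketches. First, a start/stop camera feed in the sidebar using OpenCV; the widget labels and session-state keys are assumptions, and the app's real capture loop may differ:

```python
# Rough sketch: start/stop a webcam feed from the sidebar (not the app's actual code).
import cv2
import streamlit as st

if "camera_on" not in st.session_state:
    st.session_state.camera_on = False

if st.sidebar.button("Start Camera"):
    st.session_state.camera_on = True
if st.sidebar.button("Stop Camera"):
    st.session_state.camera_on = False

frame_slot = st.sidebar.empty()
if st.session_state.camera_on:
    cap = cv2.VideoCapture(0)   # default webcam
    ok, frame = cap.read()      # grab one frame per Streamlit rerun
    cap.release()
    if ok:
        frame_slot.image(frame, channels="BGR")
```

Second, one possible shape for the "Speak" flow, using the speech_recognition package; the project may record audio differently or transcribe with a hosted model, so treat this purely as an illustration:

```python
# Illustrative "Speak" button flow (assumes the speech_recognition package).
import speech_recognition as sr
import streamlit as st

if st.button("Speak"):
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        st.info("Listening...")
        audio = recognizer.listen(source, timeout=5)
    try:
        text = recognizer.recognize_google(audio)  # free Google Web Speech API
        st.write(f"You said: {text}")
    except sr.UnknownValueError:
        st.warning("Could not understand the audio.")
```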
- Ensure that your camera and microphone permissions are enabled for the application.
- If the application does not start, check that all dependencies are installed correctly and that your API keys are set in the `.env` file.