# Multimodal-AI

MVP Chatbot is an AI-powered chatbot capable of image recognition and text generation via the Replicate API, using the LLaVA v1.6 (Mistral 7B) vision-language model under the hood. The chatbot interacts with users, processes uploaded images, and generates text responses based on user input.

## Features
- Image recognition
- Image generation
- Text generation
- Interactive chat experience
- Easy to set up and extend
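Under the hood, each chat turn reduces to a single Replicate model call. A minimal sketch of that call, assuming `REPLICATE_API_TOKEN` is exported in the environment (the app itself builds a client from `REPLICATE_API_KEY` in `.env`, as shown in the code overview below); the image URL is a placeholder:

```python
# Minimal sketch: call the LLaVA v1.6 (Mistral 7B) model directly on Replicate.
import replicate

output = replicate.run(
    "yorickvp/llava-v1.6-mistral-7b:19be067b589d0c46689ffa7cc3ff321447a441986a7694c01225973c2eafc874",
    input={
        "prompt": "Describe this image.",
        "image": "https://example.com/photo.png",  # placeholder URL
        "max_tokens": 1024,
    },
)
print("".join(output))  # the model streams tokens; join them into one string
```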
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/Kaif9999/Multimodal-AI
   cd Multimodal-AI
   ```
2. Create a virtual environment:

   ```bash
   python -m venv venv
   ```
3. Activate the virtual environment:

   - On Windows:

     ```bash
     .\venv\Scripts\activate
     ```

   - On macOS/Linux:

     ```bash
     source venv/bin/activate
     ```
4. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
5. Set up environment variables:

   Create a `.env` file in the root directory and add the following variables (a quick sanity check for these settings appears after the installation steps):

   ```env
   REPLICATE_API_KEY=<your_replicate_api_key>
   REPLICATE_TEXT_MODEL=yorickvp/llava-v1.6-mistral-7b
   REPLICATE_TEXT_MODEL_VERSION=19be067b589d0c46689ffa7cc3ff321447a441986a7694c01225973c2eafc874
   REPLICATE_IMAGE_MODEL=stability-ai/sdxl
   REPLICATE_IMAGE_MODEL_VERSION=7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc
   ```
6. Run the chatbot with Chainlit:

   ```bash
   chainlit run main1.py
   ```

   (Running `python main1.py` directly will not start the Chainlit UI.)
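If the app complains about missing settings, here is a minimal sanity check, assuming python-decouple (already a project dependency) and a hypothetical `sanity_check.py` in the project root:

```python
# sanity_check.py (hypothetical, not part of the repo): verify .env is readable.
from decouple import config

for key in ("REPLICATE_API_KEY", "REPLICATE_TEXT_MODEL", "REPLICATE_TEXT_MODEL_VERSION"):
    value = config(key)  # raises UndefinedValueError if the key is missing
    shown = value[:6] + "..." if key == "REPLICATE_API_KEY" else value
    print(f"{key} = {shown}")
```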
## Usage

Interact with the chatbot:

- Start a chat and send messages.
- Upload images for recognition.
## Project Structure

```
Multimodal-AI/
├── venv/              # Virtual environment directory
├── .env               # Environment variables
├── .gitignore         # gitignore file
├── main1.py           # Main script to run the chatbot
├── requirements.txt   # Python dependencies
├── README.md          # Project documentation
├── chainlit.md        # Text displayed on the chatbot frontend
└── test_main1.py      # Unit and mock tests for main1.py
```
## Code Overview

The main script, `main1.py`, initializes the chatbot, sets up the Replicate client, processes user messages, and handles image uploads.
**Imports:**

```python
import asyncio

import chainlit as cl
import replicate
import requests
from chainlit import user_session
from decouple import config
```
**On chat start:** Initializes the message history and the Replicate client.

```python
@cl.on_chat_start
async def on_chat_start():
    # Start each session with an empty message history.
    message_history = []
    user_session.set("MESSAGE_HISTORY", message_history)

    # Build a Replicate client from the API key in .env.
    api_token = config("REPLICATE_API_KEY")
    client = replicate.Client(api_token=api_token)
    user_session.set("REPLICATE_CLIENT", client)
```
**Upload image:** Uploads a local image to Replicate's file hosting and returns a public serving URL.

```python
def upload_image(image_path):
    # Request a pre-signed upload URL from Replicate.
    upload_response = requests.post(
        "https://dreambooth-api-experimental.replicate.com/v1/upload/filename.png",
        headers={"Authorization": f"Token {config('REPLICATE_API_KEY')}"},
    ).json()

    # PUT the raw image bytes to the pre-signed URL.
    with open(image_path, "rb") as f:
        file_binary = f.read()
    requests.put(
        upload_response["upload_url"],
        headers={"Content-Type": "image/png"},
        data=file_binary,
    )

    # The serving URL can be passed to the vision model.
    return upload_response["serving_url"]
```
**On message:** Processes user messages, uploads any attached images, and streams the model response.

```python
@cl.on_message
async def main(message: cl.Message):
    msg = cl.Message(content="", author="mvp assistant")
    await msg.send()

    # Collect any image attachments from the incoming message.
    images = [file for file in message.elements if "image" in file.mime]

    prompt = (
        "You are a helpful Assistant that can help me with image recognition "
        f"and text generation.\n\nPrompt: {message.content}"
    )
    message_history = user_session.get("MESSAGE_HISTORY")
    client = user_session.get("REPLICATE_CLIENT")

    if images:
        # Vision request: reset the history and send the uploaded image's URL.
        message_history = []
        url = upload_image(images[0].path)
        input_vision = {
            "image": url,
            "top_p": 1,
            "prompt": prompt,
            "max_tokens": 1024,
            "temperature": 0.6,
        }
    else:
        # Text-only request: include the running chat history.
        input_vision = {
            "top_p": 1,
            "prompt": prompt,
            "max_tokens": 1024,
            "temperature": 0.5,
            "history": message_history,
        }

    # Run the text model pinned in .env and stream tokens to the UI.
    output = client.run(
        f"{config('REPLICATE_TEXT_MODEL')}:{config('REPLICATE_TEXT_MODEL_VERSION')}",
        input=input_vision,
    )
    ai_message = ""
    for item in output:
        await msg.stream_token(item)
        await asyncio.sleep(0.1)  # pace the stream without blocking the event loop
        ai_message += item
    await msg.send()

    # Persist the turn in the session history.
    message_history.append(f"User: {message.content}")
    message_history.append(f"Assistant: {ai_message}")
    user_session.set("MESSAGE_HISTORY", message_history)
```
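`test_main1.py` is not reproduced here; as one illustration of the mocking approach, here is a sketch of a unit test for `upload_image` with both HTTP calls stubbed out (names and URLs are placeholders):

```python
# Illustrative sketch only; the repo's real tests live in test_main1.py.
from unittest.mock import mock_open, patch

from main1 import upload_image

@patch("main1.config", return_value="test-key")   # avoid needing a real .env
@patch("main1.requests")
def test_upload_image_returns_serving_url(mock_requests, mock_config):
    # Stub the POST that hands back pre-signed upload/serving URLs.
    mock_requests.post.return_value.json.return_value = {
        "upload_url": "https://upload.example/put-here",      # placeholder
        "serving_url": "https://serve.example/filename.png",  # placeholder
    }
    # Stub the file read so no image is needed on disk.
    with patch("builtins.open", mock_open(read_data=b"fake-png-bytes")):
        url = upload_image("any.png")

    assert url == "https://serve.example/filename.png"
    mock_requests.put.assert_called_once()
```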
## Roadmap

- Implement an image generation function (see the sketch after this list).
- Accept input in multiple languages and return output in the user's desired language.
- Implement text-to-speech.
- Integrate Gemini Flash for image and text generation.
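A possible starting point for the image generation item, reusing the SDXL model already pinned in `.env`; this is a sketch, not the project's implementation, and the output handling assumes SDXL's usual Replicate schema:

```python
# Hypothetical helper for the planned image-generation feature.
import replicate
from decouple import config

def generate_image(prompt: str) -> str:
    client = replicate.Client(api_token=config("REPLICATE_API_KEY"))
    output = client.run(
        f"{config('REPLICATE_IMAGE_MODEL')}:{config('REPLICATE_IMAGE_MODEL_VERSION')}",
        input={"prompt": prompt},
    )
    # SDXL on Replicate typically returns a list of image URLs.
    return output[0]
```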
## Contributing

Contributions are welcome! Please follow these steps to contribute:
1. Fork the repository.
2. Create a new branch:

   ```bash
   git checkout -b feature-branch
   ```

3. Make your changes.
4. Commit your changes:

   ```bash
   git commit -m "Add feature"
   ```

5. Push to the branch:

   ```bash
   git push origin feature-branch
   ```

6. Create a Pull Request.