A Chrome extension that provides an AI-powered assistant to help you navigate and interact with web pages through natural language commands.
- 🤖 Natural Language Control: Interact with web pages using simple text commands
- 🖱️ Visual Cursor: See where the AI assistant is clicking with an animated cursor
- 🔍 Smart Element Detection: Automatically identifies and interacts with clickable elements
- ⌨️ Form Filling: Fill out forms and input fields with natural language instructions
- 📜 Content Extraction: Extract page content as plain text or Markdown
- 🔄 Navigation: Search Google, navigate to URLs, and go back in history
- ⬆️ Scrolling: Control page scrolling with natural commands
- ⚡ Multiple AI Providers: Support for both OpenAI and Google Gemini
- Click: Click on any interactive element on the page
- Fill: Input text into form fields
- Search Google: Perform Google searches directly
- Navigate: Go to specific URLs or go back in history
- Scroll: Scroll the page up or down
- Send Keys: Send keyboard inputs to active elements
- Extract Content: Get page content as text or markdown
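One way to picture the actions listed above is as a single tagged union, so each action carries only the fields it needs. This is a minimal sketch: the type name `Action`, the `kind` tags, the field names, and the helper `describeAction` are all hypothetical and are not taken from the extension's source.

```typescript
// Hypothetical action shapes; illustrative only, not the extension's actual schema.
type Action =
  | { kind: "click"; target: string }
  | { kind: "fill"; target: string; text: string }
  | { kind: "searchGoogle"; query: string }
  | { kind: "navigate"; url: string }
  | { kind: "goBack" }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "sendKeys"; keys: string }
  | { kind: "extractContent"; format: "text" | "markdown" };

// Turn an action into a short human-readable summary, e.g. for a chat log.
function describeAction(action: Action): string {
  switch (action.kind) {
    case "click":
      return `Click ${action.target}`;
    case "fill":
      return `Fill ${action.target} with "${action.text}"`;
    case "searchGoogle":
      return `Search Google for "${action.query}"`;
    case "navigate":
      return `Navigate to ${action.url}`;
    case "goBack":
      return "Go back in history";
    case "scroll":
      return `Scroll ${action.direction}`;
    case "sendKeys":
      return `Send keys ${action.keys}`;
    case "extractContent":
      return `Extract content as ${action.format}`;
  }
}
```

Because the `switch` covers every `kind`, the compiler flags any action added to the union but not handled here.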
- Clone this repository or download the source code
- Open Chrome and navigate to `chrome://extensions/`
- Enable "Developer mode" in the top right corner
- Click "Load unpacked" and select the directory containing the extension files
- Click the extension icon in Chrome to open the options page
- Choose your preferred AI provider (OpenAI or Google Gemini)
- Configure the API settings for the provider you chose:
  - OpenAI: enter your OpenAI API key and optionally customize the model (default: gpt-4o-mini)
  - Google Gemini: enter your Gemini API key and optionally customize the model (default: gemini-2.0-flash-exp)
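The configuration rules above (a required API key, plus an optional model that falls back to a per-provider default) can be sketched as a small settings resolver. Only the two default model names come from this README. The names `ProviderSettings`, `DEFAULT_MODELS`, and `resolveSettings` are assumptions made for illustration, and the real options page may validate input differently.

```typescript
type Provider = "openai" | "gemini";

// Hypothetical shape of what the options page might save.
interface ProviderSettings {
  provider: Provider;
  apiKey: string;
  model?: string; // optional override; empty or missing means "use the default"
}

// Default models as documented in this README.
const DEFAULT_MODELS: Record<Provider, string> = {
  openai: "gpt-4o-mini",
  gemini: "gemini-2.0-flash-exp",
};

// Validate the key and fill in the default model when none is given.
function resolveSettings(settings: ProviderSettings): Required<ProviderSettings> {
  const apiKey = settings.apiKey.trim();
  if (apiKey === "") {
    throw new Error(`Missing API key for ${settings.provider}`);
  }
  const model = settings.model?.trim() || DEFAULT_MODELS[settings.provider];
  return { provider: settings.provider, apiKey, model };
}
```

Rejecting an empty key at this point would surface a misconfiguration before any request is sent to the provider.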
- Click the extension icon to open the sidebar
- Type your command in natural language (e.g., "Click the login button" or "Fill the email field with user@example.com")
- The AI assistant will:
  - Analyze the page structure
  - Identify relevant elements
  - Execute the requested action
  - Provide visual feedback with the cursor
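The four steps the assistant performs can be sketched as a small pipeline. This is a sketch under heavy assumptions: the extension uses an AI model to pick the element, whereas here a case-insensitive substring match stands in for that choice. `PageElement`, `ActionOutcome`, `identifyElement`, and `clickByLabel` are hypothetical names, not the extension's API.

```typescript
// Hypothetical description of a clickable element found while analyzing the page.
interface PageElement {
  id: number;
  label: string;
  x: number; // center coordinates, used to position the visual cursor
  y: number;
}

interface ActionOutcome {
  ok: boolean;
  message: string;
  cursor?: { x: number; y: number };
}

// Stand-in for the AI's choice: first element whose label contains the query.
function identifyElement(elements: PageElement[], query: string): PageElement | undefined {
  const needle = query.trim().toLowerCase();
  if (needle === "") return undefined;
  return elements.find((el) => el.label.toLowerCase().includes(needle));
}

// Execute the click on the chosen element and report where the cursor should move.
function clickByLabel(
  elements: PageElement[],
  query: string,
  click: (el: PageElement) => void,
): ActionOutcome {
  const target = identifyElement(elements, query);
  if (target === undefined) {
    return { ok: false, message: `No element matches "${query}"` };
  }
  click(target);
  return {
    ok: true,
    message: `Clicked "${target.label}"`,
    cursor: { x: target.x, y: target.y },
  };
}

// Usage with made-up elements; the click callback just records the element id.
const elements: PageElement[] = [
  { id: 1, label: "Log in", x: 40, y: 12 },
  { id: 2, label: "Sign up", x: 120, y: 12 },
];
const clicked: number[] = [];
const outcome = clickByLabel(elements, "sign up", (el) => clicked.push(el.id));
```

Passing the click behavior in as a callback keeps the sketch free of DOM calls, so the selection logic can be exercised outside a browser.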
Enable debug mode in the options page to:
- View captured screenshots in the chat
- See detailed logging information
- Help troubleshoot interactions
Here are some example commands you can try:
- "Click the sign up button"
- "Fill the username field with johndoe"
- "Search for Chrome extensions"
- "Scroll down one page"
- "Go back to the previous page"
- "Extract the main content as markdown"
- Google Chrome browser
- API key from either OpenAI or Google Gemini
- Active internet connection
The extension consists of:
- Content Script: Handles page interactions and element detection
- Background Script: Manages AI communication and extension state
- Sidebar Interface: Provides the chat interface for user commands
- Options Page: Allows configuration of AI providers and settings
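To illustrate how these components might talk to each other, the sketch below models the messages as a tagged union and records which component sends and receives each one. The message names and the `route` helper are hypothetical. In a real extension such messages would typically travel over Chrome's messaging APIs (for example `chrome.runtime.sendMessage` and `chrome.tabs.sendMessage`), which this sketch leaves out so it can run anywhere.

```typescript
type Component = "sidebar" | "background" | "content";

// Hypothetical message types; the extension's real protocol may differ.
type Message =
  | { type: "userCommand"; text: string } // user typed a command in the sidebar
  | { type: "performAction"; action: string } // background asks the page to act
  | { type: "actionResult"; ok: boolean; detail: string } // page reports back
  | { type: "assistantReply"; text: string }; // background answers in the chat

// Map each message type to its sender and receiver.
function route(message: Message): { from: Component; to: Component } {
  switch (message.type) {
    case "userCommand":
      return { from: "sidebar", to: "background" };
    case "performAction":
      return { from: "background", to: "content" };
    case "actionResult":
      return { from: "content", to: "background" };
    case "assistantReply":
      return { from: "background", to: "sidebar" };
  }
}
```

In this layout the background script is the only component that talks to both the sidebar and the content script, which matches its described role of managing AI communication and extension state.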
- API keys are stored locally in Chrome storage
- Screenshots are only used for AI analysis and are not stored
- No data is collected or stored outside of your browser
If the extension isn't working as expected:
- Check if your API key is correctly configured
- Ensure you have an active internet connection
- Try refreshing the page
- Check the browser console for error messages
- Enable debug mode for more detailed logging
This project is licensed under the MIT License - see the LICENSE file for details.