Voice Input with ElevenLabs - Chrome Extension

A Chrome extension that lets you dictate into any text field on a web page using ElevenLabs speech recognition.

Important: API Key

⚠️ This extension is provided without an API key. To use it, you must:

Register on the ElevenLabs website
Go to "Profile Settings"
Find or generate your API key
Carefully review the ElevenLabs Privacy Policy and Terms of Service, and note that:
- ElevenLabs may store and analyze submitted audio recordings
- Your voice data may be used to improve the service
- Usage limitations apply depending on your account type

We recommend:

Using a unique API key for this extension
Regularly updating your key for security purposes
Monitoring your key usage in your ElevenLabs dashboard
If necessary, restricting key usage by IP or setting other limitations in ElevenLabs settings

Features

Start recording by double-pressing and holding Cmd (Mac) or Ctrl (Windows/Linux)
Convert speech to text using the ElevenLabs API
Customizable recognition language (Russian, English, and others)
Insert recognized text into the active text field with context awareness
Automatically copy to clipboard if no active input field is present
Customizable visual indication of the recording state
Settings page opens automatically when the API key is missing
Undo support for inserted text (Cmd+Z / Ctrl+Z)
Advanced recognition settings:
- Sound event marking (laughter, applause, etc.)
- Timestamps for words or characters
- Speaker identification in dialogues
- Speaker count specification (up to 32)
- Keywords to improve recognition accuracy
- Preferred microphone selection
- Audio recording debugging
- Visual recording indication customization

Service Architecture (PageObject Pattern)

The extension uses the PageObject pattern to encapsulate interaction with the DOM, browser APIs, and other external services.

Main advantages of the architecture:

Modularity - each service is responsible for its area
Encapsulation - complexities of working with DOM and browser APIs are hidden
Improved testability - modules can be tested in isolation
Cleaner code, free of low-level DOM manipulation
Easier maintenance - functionality is divided into logical blocks
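
To illustrate the testability point, here is a minimal sketch of a coordinator in the PageObject spirit. It is not the project's core/page-object.js: the class shape, the factory-based constructor, and the createMemoryLogger and createUiService helpers are hypothetical. Each service is built by a factory that receives the coordinator, so services can reach each other without globals and a test can swap any of them for a fake.

```javascript
// Hypothetical PageObject-style coordinator (not the project's actual class).
// Services are created by factories so tests can substitute fakes.
class PageObject {
  constructor(factories) {
    // Instantiate each service, passing the coordinator so a service can
    // use other services (e.g. ui -> logger) instead of global state.
    for (const [name, create] of Object.entries(factories)) {
      this[name] = create(this);
    }
  }
}

// Hypothetical logger that keeps messages in memory (handy in tests).
function createMemoryLogger() {
  const lines = [];
  return {
    lines,
    info: (msg) => lines.push(`INFO ${msg}`),
    error: (msg) => lines.push(`ERROR ${msg}`),
  };
}

// Hypothetical UI service: this fake only logs; a real one would touch the DOM.
function createUiService(page) {
  return {
    showMask: () => page.logger.info("mask shown"),
    hideMask: () => page.logger.info("mask hidden"),
  };
}

// Wire the coordinator with the fake logger and exercise the UI service.
const page = new PageObject({ logger: createMemoryLogger, ui: createUiService });
page.ui.showMask();
page.ui.hideMask();
```

Because the UI service only talks to page.logger, the test never needs a DOM; replacing createMemoryLogger with a console-based logger changes nothing else.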

Project structure:

/
├── core/                        # Application core
│   ├── page-object.js           # Main PageObject class - service coordinator
│   └── localization.js          # Core localization module
├── services/                    # Services encapsulating functionality
│   ├── logger.js                # Logging service
│   ├── chrome.js                # Chrome API service
│   ├── i18n.js                  # Internationalization service
│   ├── settings.js              # Settings service
│   ├── dom.js                   # DOM service
│   ├── events.js                # Events service
│   ├── media.js                 # Media service (audio)
│   ├── ui.js                    # UI elements service
│   ├── text.js                  # Text insertion service
│   ├── clipboard.js             # Clipboard service
│   ├── notifications.js         # Notifications service
│   ├── audio-analyzer.js        # Audio analysis service
│   ├── speech-api.js            # Speech recognition API service
│   └── keyboard-controller.js   # Keyboard events service
├── content.js                   # Content script entry point
├── background.js                # Background script
├── popup.html/js                # Extension popup window
├── options.html/js              # Extension settings page
└── manifest.json                # Extension manifest

Using PageObject

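// In the content script, the PageObject instance is available through the global variable window.page;
// it is created during initialization in content.js.
// Calls that use await must run inside an async function (content scripts are not modules, so top-level await is unavailable).

// Examples of using various services: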

// Logging
window.page.logger.info('Information message');
window.page.logger.error('Error', errorObject);

// Working with events
window.page.events.onKeyDown('Control', (event, currentTime) => {
  // Handle Control key press
});

// Working with media
const stream = await window.page.media.getAudioStream();
const recorder = window.page.media.createRecorder(stream);
window.page.media.startRecording();

// Text insertion
await window.page.text.insertText('Recognized text');

// UI management
window.page.ui.showMask();
window.page.ui.changeMaskColor('rgba(0, 255, 0, 0.15)');
window.page.ui.hideMask();

// Working with settings
const apiKey = await window.page.settings.getValue('apiKey');
await window.page.settings.checkApiKey();

// Localization
const message = window.page.i18n.getTranslation('key_name');
window.page.i18n.setLanguage('en');
window.page.i18n.applyTranslations(document.querySelector('.my-element'));

Extension Installation

Download all files into a single folder on your computer
Open Chrome and navigate to chrome://extensions/
Enable "Developer Mode" (toggle in the top right corner of the page)
Click the "Load unpacked" button
Select the folder containing the extension files

Usage

Go to any web page and place your cursor in a text field
Quickly double-press and hold Cmd (Mac) or Ctrl (Windows/Linux)
Speak while holding the key
Release the key to finish recording
The recognized text will be inserted into the active field with undo capability (Cmd+Z / Ctrl+Z)
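
The double-press-and-hold gesture can be modeled as a small state machine. This is a sketch under assumptions: the DoublePressHoldDetector class, its 400 ms window, and the attachToKeyboard helper are hypothetical and do not reproduce the extension's keyboard-controller.js. In browser keyboard events, Ctrl reports e.key === "Control" and Cmd reports e.key === "Meta".

```javascript
// Hypothetical detector for "double-press and hold" of a modifier key.
// windowMs (assumed 400 ms) is the maximum gap between the first release
// and the second press.
class DoublePressHoldDetector {
  constructor({ windowMs = 400, onStart, onStop }) {
    this.windowMs = windowMs;
    this.onStart = onStart;
    this.onStop = onStop;
    this.lastReleaseAt = -Infinity; // time of the last release that did not stop a recording
    this.isDown = false;
    this.recording = false;
  }

  keyDown(time) {
    if (this.isDown) return; // ignore auto-repeat while the key is held
    this.isDown = true;
    // A second press shortly after a release starts recording.
    if (time - this.lastReleaseAt <= this.windowMs) {
      this.recording = true;
      this.onStart(time);
    }
  }

  keyUp(time) {
    this.isDown = false;
    if (this.recording) {
      // Releasing the held key finishes the recording.
      this.recording = false;
      this.lastReleaseAt = -Infinity; // next recording needs a fresh double press
      this.onStop(time);
    } else {
      this.lastReleaseAt = time; // remember the first press of a potential pair
    }
  }
}

// Browser-only wiring (assumed usage): key is "Control" or "Meta".
function attachToKeyboard(detector, target, key) {
  target.addEventListener("keydown", (e) => { if (e.key === key) detector.keyDown(e.timeStamp); });
  target.addEventListener("keyup", (e) => { if (e.key === key) detector.keyUp(e.timeStamp); });
}
```

Keeping the timing logic separate from addEventListener lets the gesture be tested with plain numbers instead of synthetic key events. This sketch does not limit how long the first press lasts; a stricter detector could.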

Extension Configuration

The extension provides a user-friendly interface for configuration:

Click on the extension icon in the Chrome toolbar
Click the "Open Settings" button
On the settings page, specify:
- ElevenLabs API key (required)
  - Get the key from the ElevenLabs website (see "Important: API Key" section)
  - The key field is hidden by default for security
  - The "Show" button allows you to view and edit the key
  - The field is revealed automatically when validation fails
  - Errors are highlighted with an animation
- Speech recognition language
- Enable/disable sound event marking
- Timestamp detail level (disabled/by words/by characters)
- Speaker identification in dialogues
- Maximum number of speakers (1-32)
- Keywords to improve recognition (up to 100 words)
- Preferred microphone for faster recording startup
- Audio recording debugging (playback of recording)
- Visual recording process indication
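
The limits listed above (1 to 32 speakers, up to 100 keywords, three timestamp levels) could be enforced by a validation function like the one below. The setting names (apiKey, timestamps, maxSpeakers, keywords), the comma-separated keyword format, and the "none"/"word"/"character" level values are assumptions for illustration, not the extension's stored format.

```javascript
// Hypothetical settings validation; key names and formats are assumptions.
const TIMESTAMP_LEVELS = ["none", "word", "character"];
const MAX_SPEAKERS = 32;
const MAX_KEYWORDS = 100;

function validateSettings(settings) {
  const errors = [];

  // The API key is the only required field.
  if (!settings.apiKey || settings.apiKey.trim() === "") {
    errors.push("apiKey is required");
  }

  // Timestamp detail level: disabled / by words / by characters.
  if (!TIMESTAMP_LEVELS.includes(settings.timestamps)) {
    errors.push(`timestamps must be one of ${TIMESTAMP_LEVELS.join(", ")}`);
  }

  // Maximum number of speakers: integer from 1 to 32.
  const n = settings.maxSpeakers;
  if (!Number.isInteger(n) || n < 1 || n > MAX_SPEAKERS) {
    errors.push(`maxSpeakers must be an integer from 1 to ${MAX_SPEAKERS}`);
  }

  // Keywords: split a comma-separated string into trimmed, non-empty words.
  const keywords = (settings.keywords ?? "")
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  if (keywords.length > MAX_KEYWORDS) {
    errors.push(`at most ${MAX_KEYWORDS} keywords are allowed`);
  }

  return { valid: errors.length === 0, errors, keywords };
}
```

Returning every error at once (rather than stopping at the first) suits a settings page that highlights all invalid fields together.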

Requirements

Chrome version 80 or higher
Microphone access
Valid ElevenLabs API key (registration required)
Internet connection for sending audio to the ElevenLabs API
HTTPS (secure context) pages for clipboard functionality
Agreement with ElevenLabs' privacy policy and terms of service

Security and Privacy

The extension does not store or transmit data to third parties other than ElevenLabs
Audio recordings are sent directly to the ElevenLabs API
API key is stored locally in the browser in encrypted form
We recommend regularly checking key usage in your ElevenLabs dashboard
If you suspect your key has been compromised, immediately replace it in the extension settings and deactivate the old key in ElevenLabs

Operational Features

The extension considers context when inserting text:

If the cursor is after a punctuation mark (., !, ?), a space is added before insertion
If the cursor is after a line break character, text is inserted without additional formatting
In other cases, ". " is added before the inserted text to separate thoughts
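
The three rules above can be expressed as a pure function over the text before the cursor. This is a sketch, not the extension's text.js: the names separatorFor and insertAtCursor are hypothetical, and the behavior for an empty field (no prefix) is an assumption, since the rules above do not cover it.

```javascript
// Hypothetical helper: choose the separator placed before recognized text.
function separatorFor(textBeforeCursor) {
  if (textBeforeCursor.length === 0) return ""; // assumption: start of an empty field
  const last = textBeforeCursor[textBeforeCursor.length - 1];
  if (last === "\n") return "";         // after a line break: no extra formatting
  if (".!?".includes(last)) return " "; // after punctuation: just a space
  return ". ";                          // otherwise: close the previous thought
}

// Hypothetical helper: insert text at a cursor offset in a plain string value,
// returning the new value and the cursor position after the insertion.
function insertAtCursor(value, cursor, recognized) {
  const before = value.slice(0, cursor);
  const inserted = separatorFor(before) + recognized;
  return {
    value: before + inserted + value.slice(cursor),
    cursor: cursor + inserted.length,
  };
}
```

Writing to a field's value directly clears the browser's undo history. One common way to keep Cmd+Z / Ctrl+Z working is document.execCommand("insertText", false, text), which is deprecated but still supported in Chrome; whether the extension uses it is not stated here.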

When no active input field is present:

Recognized text is automatically copied to the clipboard
A notification of successful copying is displayed
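
The choice between inserting and copying might look like the sketch below. Everything here is hypothetical except navigator.clipboard.writeText, which is the standard browser API a caller could pass in as clipboard. The insert and notify helpers are injected so the logic runs without a page. writeText rejects on insecure (non-HTTPS) pages or when the document is not focused, which matches the limitations listed further down.

```javascript
// Hypothetical check for a focused element that accepts typed text.
function isEditable(el) {
  if (!el) return false;
  if (el.isContentEditable) return true;
  const tag = (el.tagName || "").toUpperCase();
  const writable = !el.readOnly && !el.disabled;
  if (tag === "TEXTAREA") return writable;
  if (tag === "INPUT") {
    const textTypes = ["", "text", "search", "email", "url", "tel"];
    return textTypes.includes((el.type || "").toLowerCase()) && writable;
  }
  return false;
}

// Hypothetical delivery step: insert if possible, otherwise copy and notify.
async function deliverText(text, { activeElement, insert, clipboard, notify }) {
  if (isEditable(activeElement)) {
    await insert(activeElement, text);
    return "inserted";
  }
  await clipboard.writeText(text); // may reject on insecure or unfocused pages
  notify("Text copied to clipboard");
  return "copied";
}
```

In a content script the call could be deliverText(text, { activeElement: document.activeElement, insert, clipboard: navigator.clipboard, notify }), with insert and notify supplied by the relevant services.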

Visual recording indication:

Yellow overlay during recording initialization
Green overlay during active recording
Can be disabled in settings
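
A sketch of how recording states could map to overlay colors through the UI service methods shown earlier (showMask, changeMaskColor, hideMask). The state names and the yellow value are assumptions; the green value is the one used in the changeMaskColor example in "Using PageObject".

```javascript
// Hypothetical state-to-color table; the yellow value is a guess.
const MASK_COLORS = {
  initializing: "rgba(255, 255, 0, 0.15)",
  recording: "rgba(0, 255, 0, 0.15)",
};

// Return the overlay color for a state, or null when no overlay should show.
function maskColorFor(state, indicationEnabled) {
  if (!indicationEnabled) return null; // indication disabled in settings
  return MASK_COLORS[state] ?? null;   // no overlay for other states
}

// Apply the color through a ui object with showMask/changeMaskColor/hideMask.
function updateIndicator(ui, state, indicationEnabled) {
  const color = maskColorFor(state, indicationEnabled);
  if (color === null) {
    ui.hideMask();
    return;
  }
  ui.showMask();
  ui.changeMaskColor(color);
}
```

In the extension the first argument would presumably be window.page.ui.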

API key verification:

When attempting to start recording, the presence and validity of the API key is checked
If the key is missing or invalid, the settings page opens automatically
The key input field is automatically shown and focused when an error occurs
Recording will not start until a valid API key is provided
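
The check above could be structured as follows. This is a sketch with hypothetical names (ensureApiKey, validateWithElevenLabs, and the injected getApiKey/openSettings helpers). Validating a key by requesting GET https://api.elevenlabs.io/v1/user with the xi-api-key header is an assumption on our part; a key with restricted permissions might fail that request even though it can transcribe. Note that chrome.runtime.openOptionsPage() is available to the background script and popup but not to content scripts, which would have to ask the background script via messaging.

```javascript
// Hypothetical pre-recording key check with injected helpers.
async function ensureApiKey({ getApiKey, validateKey, openSettings }) {
  const key = await getApiKey();
  if (!key) {
    await openSettings(); // e.g. chrome.runtime.openOptionsPage in the background script
    return { ok: false, reason: "missing" };
  }
  if (!(await validateKey(key))) {
    await openSettings();
    return { ok: false, reason: "invalid" };
  }
  return { ok: true, key };
}

// One possible validator (assumption): any non-2xx response means "invalid".
async function validateWithElevenLabs(key, fetchImpl = fetch) {
  const response = await fetchImpl("https://api.elevenlabs.io/v1/user", {
    headers: { "xi-api-key": key },
  });
  return response.ok;
}
```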

Code Working Principles

Prefer splitting large functions into smaller, specialized ones
Each function should have a single responsibility
Minimize code duplication
Improve readability and maintainability
Maximum refactoring size: up to 100 lines of code at a time

Code Management

Use git for version control
Make each change as a separate commit with an informative message
Preferred commit format: "Refactoring: [brief description of changes]"
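
If you want to check the preferred format automatically, a small predicate like the hypothetical one below could back a git commit-msg hook (git passes the hook the path of the file holding the message as its first argument). Only the "Refactoring:" prefix comes from this README; since it is a preference rather than a rule, a hook might warn instead of rejecting.

```javascript
// Hypothetical check for "Refactoring: [brief description of changes]".
const REFACTORING_COMMIT = /^Refactoring: \S.*$/;

function isPreferredCommitMessage(message) {
  const subject = message.split("\n", 1)[0]; // only the subject line is checked
  return REFACTORING_COMMIT.test(subject);
}
```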

Limitations

Does not work on Chrome system pages (chrome://)
Requires microphone access permission
Free ElevenLabs account limitations may apply to speech recognition
Clipboard copying may be restricted on some websites for security reasons

Troubleshooting

If you experience issues with the extension:

Check if microphone access is enabled for the extension
Ensure your ElevenLabs API key is valid
Check the developer console for errors (F12 -> Console tab)
Try reloading the extension on the chrome://extensions/ page
If text is not copied to the clipboard:
- Make sure the page is loaded via HTTPS
- Check if copying is allowed on the current site
- Try using a different browser or update your current one

API Documentation

The extension uses the ElevenLabs Speech-to-Text API. Full documentation is available at: https://elevenlabs.io/docs/api-reference/speech-to-text/convert
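
For orientation, a request to that endpoint might be assembled as below. The endpoint, the xi-api-key header, and the form fields (model_id, file, language_code, tag_audio_events, timestamps_granularity, diarize, num_speakers) follow our reading of the linked reference and should be verified there; the "scribe_v1" model id, the settings object shape, and the function names are assumptions.

```javascript
// Hypothetical request builder for ElevenLabs Speech-to-Text.
function buildSpeechToTextRequest(apiKey, audioBlob, settings = {}) {
  const form = new FormData();
  form.append("model_id", settings.modelId ?? "scribe_v1"); // assumed model id
  form.append("file", audioBlob, "recording.webm");
  if (settings.languageCode) form.append("language_code", settings.languageCode);
  form.append("tag_audio_events", String(Boolean(settings.tagAudioEvents)));
  form.append("timestamps_granularity", settings.timestamps ?? "none");
  if (settings.diarize) {
    form.append("diarize", "true");
    if (settings.numSpeakers) form.append("num_speakers", String(settings.numSpeakers));
  }
  return {
    url: "https://api.elevenlabs.io/v1/speech-to-text",
    // No Content-Type header: fetch sets the multipart boundary itself.
    init: { method: "POST", headers: { "xi-api-key": apiKey }, body: form },
  };
}

// Send the request and return the transcript text.
async function transcribe(apiKey, audioBlob, settings, fetchImpl = fetch) {
  const { url, init } = buildSpeechToTextRequest(apiKey, audioBlob, settings);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`ElevenLabs API error ${response.status}`);
  }
  const data = await response.json();
  return data.text; // per-word details, when requested, are reported separately
}
```

The audio Blob would typically come from a MediaRecorder; the fetchImpl parameter exists only so the function can be exercised without network access.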
