Implement Push-to-Talk feature and enhance audio input settings in AssistantView and CustomizeView. Update README for API key instructions and improve audio processing logic in OpenAI SDK. Adjust pnpm-lock.yaml for dependency updates.

Илья Глазунов 2026-01-16 00:41:58 +03:00
parent 669c019fd8
commit 656e8f0932
10 changed files with 5190 additions and 4095 deletions

View File

@@ -1,10 +1,5 @@
<img width="1299" height="424" alt="cd (1)" src="https://github.com/user-attachments/assets/b25fff4d-043d-4f38-9985-f832ae0d0f6e" /> <!-- <img width="1299" height="424" alt="cd (1)" src="https://github.com/user-attachments/assets/b25fff4d-043d-4f38-9985-f832ae0d0f6e" /> -->
# Mastermind
## Recall.ai - API for desktop recording
If you're looking for a hosted desktop recording API, consider checking out [Recall.ai](https://www.recall.ai/product/desktop-recording-sdk/?utm_source=github&utm_medium=sponsorship&utm_campaign=sohzm-cheating-daddy), an API that records Zoom, Google Meet, Microsoft Teams, in-person meetings, and more.
This project is sponsored by Recall.ai.
--- ---
@@ -14,33 +9,36 @@ This project is sponsored by Recall.ai.
> [!NOTE] > [!NOTE]
> During testing it won't answer if you ask it something directly; you need to simulate the interviewer asking a question, which it will answer > During testing it won't answer if you ask it something directly; you need to simulate the interviewer asking a question, which it will answer
A real-time AI assistant that provides contextual help during video calls, interviews, presentations, and meetings using screen capture and audio analysis. A real-time AI assistant that provides contextual help during video calls, interviews, presentations, and meetings using screen capture and audio analysis. It is a fork of the [Cheating Daddy](https://github.com/sohzm/cheating-daddy) project.
## Features ## Features
- **Live AI Assistance**: Real-time help powered by Google Gemini 2.0 Flash Live - **Live AI Assistance**: Real-time help powered by the Gemini API, OpenAI SDK, or OpenAI Realtime API, so you can choose which one you want to use
- **Screen & Audio Capture**: Analyzes what you see and hear for contextual responses - **Screen & Audio Capture**: Analyzes what you see and hear for contextual responses
- **Multiple Profiles**: Interview, Sales Call, Business Meeting, Presentation, Negotiation - **Multiple Profiles**: Interview, Sales Call, Business Meeting, Presentation, Negotiation
- **Transparent Overlay**: Always-on-top window that can be positioned anywhere - **Transparent Overlay**: Always-on-top window that can be positioned anywhere; if something goes wrong you can hide it without stopping the session or losing context!
- **Click-through Mode**: Make window transparent to clicks when needed - **Click-through Mode**: Make window transparent to clicks when needed
- **Cross-platform**: Works on macOS, Windows, and Linux (kinda, don't use, just for testing rn) - **Cross-platform**: Works on macOS, Windows, and Linux (kinda, don't use, just for testing rn)
## Setup ## Setup
1. **Get a Gemini API Key**: Visit [Google AI Studio](https://aistudio.google.com/apikey) 1. **Get an API Key**: Visit [Google AI Studio](https://aistudio.google.com/apikey) or [OpenAI](https://platform.openai.com/docs/api-reference), or any other OpenAI-compatible provider!
2. **Install Dependencies**: `npm install` 2. **Install Dependencies**: `pnpm install`
3. **Run the App**: `npm start` 3. **Run the App**: `pnpm start`
## Usage ## Usage
1. Enter your Gemini API key in the main window 1. Enter your API key in the main window, then select the provider and model you want to use in preferences
2. Choose your profile and language in settings 2. Choose your profile and language in settings
3. Click "Start Session" to begin 3. Click "Start Session" to begin; if you want to use push-to-talk mode, you can enable it in preferences
4. Position the window using keyboard shortcuts 4. Position the window using keyboard shortcuts, or use your mouse to move it
5. The AI will provide real-time assistance based on your screen and what interview asks 5. The AI will provide real-time assistance based on your screen and system audio/microphone input; you can also send text messages to the AI by pressing Enter
## Keyboard Shortcuts ## Keyboard Shortcuts
> [!NOTE]
> All keyboard shortcuts are customizable in settings. The default shortcuts are listed below.
- **Window Movement**: `Ctrl/Cmd + Arrow Keys` - Move window - **Window Movement**: `Ctrl/Cmd + Arrow Keys` - Move window
- **Click-through**: `Ctrl/Cmd + M` - Toggle mouse events - **Click-through**: `Ctrl/Cmd + M` - Toggle mouse events
- **Close/Back**: `Ctrl/Cmd + \` - Close window or go back - **Close/Back**: `Ctrl/Cmd + \` - Close window or go back
@@ -48,13 +46,13 @@ A real-time AI assistant that provides contextual help during video calls, inter
## Audio Capture ## Audio Capture
- **macOS**: [SystemAudioDump](https://github.com/Mohammed-Yasin-Mulla/Sound) for system audio - **macOS**: [SystemAudioDump](https://github.com/Mohammed-Yasin-Mulla/Sound) for system audio capture; microphone input can be used as well
- **Windows**: Loopback audio capture - **Windows**: Loopback audio capture; microphone input can be used as well
- **Linux**: Microphone input - **Linux**: Microphone input
## Requirements ## Requirements
- Electron-compatible OS (macOS, Windows, Linux) - Electron-compatible OS (macOS, Windows, Linux)
- Gemini API key - AI provider API key
- Screen recording permissions - Screen recording permissions
- Microphone/audio permissions - Microphone/audio permissions
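Since the README advertises "any other OpenAI-compatible API", here is a minimal sketch of what that compatibility usually means with the standard `openai` npm client; the endpoint URL is hypothetical, and how Mastermind wires the key and base URL internally is not shown in this diff:

```js
// Sketch: pointing the standard `openai` npm client at an OpenAI-compatible server.
// Assumption: the server implements the usual /v1 chat and transcription routes.
const OpenAI = require('openai');

const client = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY, // the key you enter in the main window
    baseURL: 'https://api.example.com/v1', // hypothetical OpenAI-compatible endpoint
});
```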

pnpm-lock.yaml generated

File diff suppressed because it is too large

View File

@@ -366,6 +366,57 @@ export class AssistantView extends LitElement {
.region-select-btn span { .region-select-btn span {
margin-left: 4px; margin-left: 4px;
} }
.ptt-toggle-btn {
display: flex;
align-items: center;
justify-content: center;
background: transparent;
color: var(--text-secondary);
border: 1px solid var(--border-color);
padding: 6px 12px;
border-radius: 20px;
font-size: 12px;
cursor: pointer;
transition: all 0.15s ease;
}
.ptt-toggle-btn:hover {
background: var(--hover-background);
color: var(--text-color);
border-color: var(--text-color);
}
.ptt-toggle-btn.active {
color: var(--error-color);
border-color: var(--error-color);
}
.ptt-indicator {
display: flex;
align-items: center;
gap: 8px;
font-size: 11px;
color: var(--text-secondary);
margin-bottom: 6px;
}
.ptt-dot {
width: 8px;
height: 8px;
border-radius: 50%;
background: var(--border-color);
box-shadow: 0 0 0 1px var(--border-color);
}
.ptt-dot.active {
background: var(--error-color);
box-shadow: 0 0 0 1px var(--error-color);
}
.ptt-label {
font-family: 'SF Mono', Monaco, monospace;
}
`; `;
static properties = { static properties = {
@@ -377,6 +428,9 @@ export class AssistantView extends LitElement {
flashCount: { type: Number }, flashCount: { type: Number },
flashLiteCount: { type: Number }, flashLiteCount: { type: Number },
aiProvider: { type: String }, aiProvider: { type: String },
pushToTalkActive: { type: Boolean },
audioInputMode: { type: String },
pushToTalkKeybind: { type: String },
}; };
constructor() { constructor() {
@@ -388,6 +442,9 @@ export class AssistantView extends LitElement {
this.flashCount = 0; this.flashCount = 0;
this.flashLiteCount = 0; this.flashLiteCount = 0;
this.aiProvider = 'gemini'; this.aiProvider = 'gemini';
this.pushToTalkActive = false;
this.audioInputMode = 'auto';
this.pushToTalkKeybind = '';
} }
getProfileNames() { getProfileNames() {
@@ -507,6 +564,7 @@ export class AssistantView extends LitElement {
// Load limits on mount // Load limits on mount
this.loadLimits(); this.loadLimits();
this.loadPushToTalkKeybind();
// Set up IPC listeners for keyboard shortcuts // Set up IPC listeners for keyboard shortcuts
if (window.require) { if (window.require) {
@@ -532,10 +590,17 @@ export class AssistantView extends LitElement {
this.scrollResponseDown(); this.scrollResponseDown();
}; };
this.handlePushToTalkState = (event, state) => {
this.pushToTalkActive = state?.active ?? false;
this.audioInputMode = state?.inputMode ?? 'auto';
this.requestUpdate();
};
ipcRenderer.on('navigate-previous-response', this.handlePreviousResponse); ipcRenderer.on('navigate-previous-response', this.handlePreviousResponse);
ipcRenderer.on('navigate-next-response', this.handleNextResponse); ipcRenderer.on('navigate-next-response', this.handleNextResponse);
ipcRenderer.on('scroll-response-up', this.handleScrollUp); ipcRenderer.on('scroll-response-up', this.handleScrollUp);
ipcRenderer.on('scroll-response-down', this.handleScrollDown); ipcRenderer.on('scroll-response-down', this.handleScrollDown);
ipcRenderer.on('push-to-talk-state', this.handlePushToTalkState);
} }
} }
@@ -557,6 +622,9 @@ export class AssistantView extends LitElement {
if (this.handleScrollDown) { if (this.handleScrollDown) {
ipcRenderer.removeListener('scroll-response-down', this.handleScrollDown); ipcRenderer.removeListener('scroll-response-down', this.handleScrollDown);
} }
if (this.handlePushToTalkState) {
ipcRenderer.removeListener('push-to-talk-state', this.handlePushToTalkState);
}
} }
} }
@@ -584,6 +652,15 @@ export class AssistantView extends LitElement {
} }
} }
async loadPushToTalkKeybind() {
if (window.cheatingDaddy?.storage?.getKeybinds) {
const defaultKeybind = 'Ctrl+Space'; // same default on every platform
const keybinds = await window.cheatingDaddy.storage.getKeybinds();
this.pushToTalkKeybind = keybinds?.pushToTalk || defaultKeybind;
}
}
getTotalUsed() { getTotalUsed() {
return this.flashCount + this.flashLiteCount; return this.flashCount + this.flashLiteCount;
} }
@@ -608,6 +685,14 @@ export class AssistantView extends LitElement {
} }
} }
handlePushToTalkToggle() {
if (!window.require) {
return;
}
const { ipcRenderer } = window.require('electron');
ipcRenderer.send('push-to-talk-toggle');
}
scrollToBottom() { scrollToBottom() {
setTimeout(() => { setTimeout(() => {
const container = this.shadowRoot.querySelector('.response-container'); const container = this.shadowRoot.querySelector('.response-container');
@@ -649,10 +734,26 @@ export class AssistantView extends LitElement {
render() { render() {
const responseCounter = this.getResponseCounter(); const responseCounter = this.getResponseCounter();
const showPushToTalk = this.aiProvider === 'openai-sdk' && this.audioInputMode === 'push-to-talk';
const keybindLabel = this.pushToTalkKeybind || 'Hotkey';
const pushToTalkLabel = this.pushToTalkActive
? 'Recording...'
: `Press ${keybindLabel} to start/stop`;
const pushToTalkButtonLabel = this.pushToTalkActive ? 'Stop' : 'Record';
return html` return html`
<div class="response-container" id="responseContainer"></div> <div class="response-container" id="responseContainer"></div>
${showPushToTalk
? html`
<div class="ptt-indicator">
<span class="ptt-dot ${this.pushToTalkActive ? 'active' : ''}"></span>
<span>Push-to-Talk:</span>
<span class="ptt-label">${pushToTalkLabel}</span>
</div>
`
: ''}
<div class="text-input-container"> <div class="text-input-container">
<button class="nav-button" @click=${this.navigateToPreviousResponse} ?disabled=${this.currentResponseIndex <= 0}> <button class="nav-button" @click=${this.navigateToPreviousResponse} ?disabled=${this.currentResponseIndex <= 0}>
<svg width="24px" height="24px" stroke-width="1.7" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"> <svg width="24px" height="24px" stroke-width="1.7" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
@@ -671,6 +772,17 @@ export class AssistantView extends LitElement {
<input type="text" id="textInput" placeholder="Type a message to the AI..." @keydown=${this.handleTextKeydown} /> <input type="text" id="textInput" placeholder="Type a message to the AI..." @keydown=${this.handleTextKeydown} />
<div class="capture-buttons"> <div class="capture-buttons">
${showPushToTalk
? html`
<button
class="ptt-toggle-btn ${this.pushToTalkActive ? 'active' : ''}"
@click=${this.handlePushToTalkToggle}
title="Toggle Push-to-Talk recording"
>
${pushToTalkButtonLabel}
</button>
`
: ''}
<button class="region-select-btn" @click=${this.handleRegionSelect} title="Select region to analyze (like Win+Shift+S)"> <button class="region-select-btn" @click=${this.handleRegionSelect} title="Select region to analyze (like Win+Shift+S)">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor"> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor">
<path <path

View File

@@ -537,6 +537,7 @@ export class CustomizeView extends LitElement {
color: var(--error-color); color: var(--error-color);
border-left: 2px solid var(--error-color); border-left: 2px solid var(--error-color);
} }
`; `;
static properties = { static properties = {
@@ -549,6 +550,7 @@ export class CustomizeView extends LitElement {
backgroundTransparency: { type: Number }, backgroundTransparency: { type: Number },
fontSize: { type: Number }, fontSize: { type: Number },
theme: { type: String }, theme: { type: String },
audioInputMode: { type: String },
onProfileChange: { type: Function }, onProfileChange: { type: Function },
onLanguageChange: { type: Function }, onLanguageChange: { type: Function },
onImageQualityChange: { type: Function }, onImageQualityChange: { type: Function },
@@ -587,6 +589,7 @@ export class CustomizeView extends LitElement {
// Audio mode default // Audio mode default
this.audioMode = 'speaker_only'; this.audioMode = 'speaker_only';
this.audioInputMode = 'auto';
// Custom prompt // Custom prompt
this.customPrompt = ''; this.customPrompt = '';
@@ -795,6 +798,7 @@ export class CustomizeView extends LitElement {
this.backgroundTransparency = prefs.backgroundTransparency ?? 0.8; this.backgroundTransparency = prefs.backgroundTransparency ?? 0.8;
this.fontSize = prefs.fontSize ?? 20; this.fontSize = prefs.fontSize ?? 20;
this.audioMode = prefs.audioMode ?? 'speaker_only'; this.audioMode = prefs.audioMode ?? 'speaker_only';
this.audioInputMode = prefs.audioInputMode ?? 'auto';
this.customPrompt = prefs.customPrompt ?? ''; this.customPrompt = prefs.customPrompt ?? '';
this.theme = prefs.theme ?? 'dark'; this.theme = prefs.theme ?? 'dark';
this.aiProvider = prefs.aiProvider ?? 'gemini'; this.aiProvider = prefs.aiProvider ?? 'gemini';
@@ -820,6 +824,7 @@ export class CustomizeView extends LitElement {
this.updateBackgroundTransparency(); this.updateBackgroundTransparency();
this.updateFontSize(); this.updateFontSize();
this.notifyPushToTalkSettings();
this.requestUpdate(); this.requestUpdate();
} catch (error) { } catch (error) {
console.error('Error loading settings:', error); console.error('Error loading settings:', error);
@@ -832,6 +837,10 @@ export class CustomizeView extends LitElement {
resizeLayout(); resizeLayout();
} }
disconnectedCallback() {
super.disconnectedCallback();
}
getProfiles() { getProfiles() {
return [ return [
{ {
@@ -944,6 +953,28 @@ export class CustomizeView extends LitElement {
this.requestUpdate(); this.requestUpdate();
} }
async handleAudioInputModeChange(e) {
this.audioInputMode = e.target.value;
await cheatingDaddy.storage.updatePreference('audioInputMode', e.target.value);
this.notifyPushToTalkSettings();
this.requestUpdate();
}
notifyPushToTalkSettings() {
if (!window.require) {
return;
}
try {
const { ipcRenderer } = window.require('electron');
ipcRenderer.send('update-push-to-talk-settings', {
inputMode: this.audioInputMode,
});
// Re-send keybinds so the main process re-registers global shortcuts;
// the push-to-talk hotkey is only registered while the mode is enabled
ipcRenderer.send('update-keybinds', this.keybinds);
} catch (error) {
console.error('Failed to notify push-to-talk settings:', error);
}
}
async handleThemeChange(e) { async handleThemeChange(e) {
this.theme = e.target.value; this.theme = e.target.value;
await cheatingDaddy.theme.save(this.theme); await cheatingDaddy.theme.save(this.theme);
@@ -965,6 +996,7 @@ export class CustomizeView extends LitElement {
nextResponse: isMac ? 'Cmd+]' : 'Ctrl+]', nextResponse: isMac ? 'Cmd+]' : 'Ctrl+]',
scrollUp: isMac ? 'Cmd+Shift+Up' : 'Ctrl+Shift+Up', scrollUp: isMac ? 'Cmd+Shift+Up' : 'Ctrl+Shift+Up',
scrollDown: isMac ? 'Cmd+Shift+Down' : 'Ctrl+Shift+Down', scrollDown: isMac ? 'Cmd+Shift+Down' : 'Ctrl+Shift+Down',
pushToTalk: 'Ctrl+Space', // intentionally Control+Space on every platform
}; };
} }
@@ -1050,6 +1082,11 @@ export class CustomizeView extends LitElement {
name: 'Scroll Response Down', name: 'Scroll Response Down',
description: 'Scroll the AI response content down', description: 'Scroll the AI response content down',
}, },
{
key: 'pushToTalk',
name: 'Push-to-Talk',
description: 'Activate audio recording (OpenAI SDK only)',
},
]; ];
} }
@@ -1319,6 +1356,9 @@ export class CustomizeView extends LitElement {
} }
renderAudioSection() { renderAudioSection() {
const isPushToTalkAvailable = this.aiProvider === 'openai-sdk';
const pushToTalkDisabled = !isPushToTalkAvailable;
return html` return html`
<div class="content-header">Audio Settings</div> <div class="content-header">Audio Settings</div>
<div class="form-grid"> <div class="form-grid">
@@ -1331,6 +1371,28 @@ export class CustomizeView extends LitElement {
</select> </select>
<div class="form-description">Choose which audio sources to capture for the AI.</div> <div class="form-description">Choose which audio sources to capture for the AI.</div>
</div> </div>
<div class="form-group">
<label class="form-label">Audio Input Mode</label>
<select
class="form-control"
.value=${this.audioInputMode}
@change=${this.handleAudioInputModeChange}
?disabled=${pushToTalkDisabled}
>
<option value="auto">Automatic (Always Listening)</option>
<option value="push-to-talk">Push-to-Talk (Hotkey Activated)</option>
</select>
<div class="form-description">
${pushToTalkDisabled
? 'Push-to-Talk is available only with the OpenAI SDK provider.'
: this.audioInputMode === 'auto'
? 'Audio is continuously recorded and transcribed when silence is detected.'
: 'Audio recording starts when you press and hold/toggle the hotkey.'}
</div>
</div>
${this.audioInputMode === 'push-to-talk'
? html`<div class="form-description">Use the Push-to-Talk hotkey (toggle) to start/stop recording.</div>`
: ''}
</div> </div>
`; `;
} }
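Taken together, the settings round-trip added by this commit looks like the following; every channel and function name comes from the hunks in this diff:

```js
// CustomizeView.handleAudioInputModeChange(e)
//   -> cheatingDaddy.storage.updatePreference('audioInputMode', e.target.value)
//   -> ipcRenderer.send('update-push-to-talk-settings', { inputMode })
// main: ipcMain.on('update-push-to-talk-settings')
//   -> openaiSdkProvider.updatePushToTalkSettings(inputMode)
//   -> notifyPushToTalkState() -> sendToRenderer('push-to-talk-state', { active, inputMode })
// AssistantView: ipcRenderer.on('push-to-talk-state')
//   -> updates pushToTalkActive / audioInputMode and re-renders the ptt-indicator
```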

View File

@@ -295,7 +295,7 @@ export class HelpView extends LitElement {
<span>Community & Support</span> <span>Community & Support</span>
</div> </div>
<div class="community-links"> <div class="community-links">
<div class="community-link" @click=${() => this.handleExternalLinkClick('https://cheatingdaddy.com')}> <!-- <div class="community-link" @click=${() => this.handleExternalLinkClick('https://cheatingdaddy.com')}>
<svg <svg
viewBox="0 0 24 24" viewBox="0 0 24 24"
fill="none" fill="none"
@@ -312,8 +312,8 @@ export class HelpView extends LitElement {
></path> ></path>
</svg> </svg>
Website Website
</div> </div> -->
<div class="community-link" @click=${() => this.handleExternalLinkClick('https://github.com/sohzm/cheating-daddy')}> <div class="community-link" @click=${() => this.handleExternalLinkClick('https://github.com/ShiftyX1/Mastermind')}>
<svg <svg
viewBox="0 0 24 24" viewBox="0 0 24 24"
fill="none" fill="none"
@@ -329,7 +329,7 @@ export class HelpView extends LitElement {
</svg> </svg>
GitHub GitHub
</div> </div>
<div class="community-link" @click=${() => this.handleExternalLinkClick('https://discord.gg/GCBdubnXfJ')}> <!-- <div class="community-link" @click=${() => this.handleExternalLinkClick('https://discord.gg/GCBdubnXfJ')}>
<svg <svg
viewBox="0 0 24 24" viewBox="0 0 24 24"
fill="none" fill="none"
@@ -353,7 +353,7 @@ export class HelpView extends LitElement {
></path> ></path>
</svg> </svg>
Discord Discord
</div> </div> -->
</div> </div>
</div> </div>

View File

@@ -33,6 +33,7 @@ const DEFAULT_PREFERENCES = {
selectedImageQuality: 'medium', selectedImageQuality: 'medium',
advancedMode: false, advancedMode: false,
audioMode: 'speaker_only', audioMode: 'speaker_only',
audioInputMode: 'auto',
fontSize: 'medium', fontSize: 'medium',
backgroundTransparency: 0.8, backgroundTransparency: 0.8,
googleSearchEnabled: false, googleSearchEnabled: false,
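For reference, the two values the new `audioInputMode` preference takes, as implemented in the openai-sdk provider later in this diff:

```js
// audioInputMode values:
// 'auto'         -> capture continuously, transcribe when silence is detected
// 'push-to-talk' -> buffer audio only while the hotkey has toggled recording on
```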

View File

@@ -186,6 +186,7 @@ async function initializeAISession(customPrompt = '', profile = 'interview', lan
try { try {
await openaiSdkProvider.initializeOpenAISDK(providerConfig); await openaiSdkProvider.initializeOpenAISDK(providerConfig);
openaiSdkProvider.setSystemPrompt(systemPrompt); openaiSdkProvider.setSystemPrompt(systemPrompt);
openaiSdkProvider.updatePushToTalkSettings(prefs.audioInputMode || 'auto');
sendToRenderer('update-status', 'Ready (OpenAI SDK)'); sendToRenderer('update-status', 'Ready (OpenAI SDK)');
return true; return true;
} catch (error) { } catch (error) {
@@ -325,6 +326,16 @@ function setupAIProviderIpcHandlers(geminiSessionRef) {
saveConversationTurn(transcription, response); saveConversationTurn(transcription, response);
}); });
ipcMain.on('push-to-talk-toggle', () => {
if (currentProvider === 'openai-sdk') {
openaiSdkProvider.togglePushToTalk();
}
});
ipcMain.on('update-push-to-talk-settings', (event, { inputMode } = {}) => {
openaiSdkProvider.updatePushToTalkSettings(inputMode || 'auto');
});
ipcMain.handle('initialize-ai-session', async (event, customPrompt, profile, language) => { ipcMain.handle('initialize-ai-session', async (event, customPrompt, profile, language) => {
return await initializeAISession(customPrompt, profile, language); return await initializeAISession(customPrompt, profile, language);
}); });

View File

@@ -14,6 +14,8 @@ let openaiClient = null;
let currentConfig = null; let currentConfig = null;
let conversationMessages = []; let conversationMessages = [];
let isProcessing = false; let isProcessing = false;
let audioInputMode = 'auto';
let isPushToTalkActive = false;
// macOS audio capture // macOS audio capture
let systemAudioProc = null; let systemAudioProc = null;
@@ -294,6 +296,18 @@ async function processAudioChunk(base64Audio, mimeType) {
const now = Date.now(); const now = Date.now();
const buffer = Buffer.from(base64Audio, 'base64'); const buffer = Buffer.from(base64Audio, 'base64');
if (audioInputMode === 'push-to-talk') {
if (!isPushToTalkActive) {
return { success: true, ignored: true };
}
// In push-to-talk mode we only buffer while active
audioChunks.push(buffer);
lastAudioTime = now;
return { success: true, buffering: true };
}
// Track first chunk time for duration-based flushing // Track first chunk time for duration-based flushing
if (audioChunks.length === 0) { if (audioChunks.length === 0) {
firstChunkTime = now; firstChunkTime = now;
@@ -380,6 +394,97 @@ async function flushAudioAndTranscribe() {
} }
} }
function notifyPushToTalkState() {
sendToRenderer('push-to-talk-state', {
active: isPushToTalkActive,
inputMode: audioInputMode,
});
}
function resetRealtimeAudioBuffer() {
audioChunks = [];
firstChunkTime = 0;
lastAudioTime = 0;
if (silenceCheckTimer) {
clearTimeout(silenceCheckTimer);
silenceCheckTimer = null;
}
if (windowsTranscriptionTimer) {
clearInterval(windowsTranscriptionTimer);
windowsTranscriptionTimer = null;
}
}
function updateTranscriptionTimerForPushToTalk() {
if (audioInputMode === 'push-to-talk') {
stopTranscriptionTimer();
return;
}
if (systemAudioProc && !transcriptionTimer) {
startTranscriptionTimer();
}
}
async function setPushToTalkActive(active) {
const wasActive = isPushToTalkActive;
isPushToTalkActive = active;
if (active) {
// Starting recording - clear any old buffers
resetRealtimeAudioBuffer();
audioBuffer = Buffer.alloc(0);
console.log('Push-to-Talk: Recording started');
sendToRenderer('update-status', 'Recording...');
}
notifyPushToTalkState();
// When user stops recording in PTT mode, send audio for transcription
if (!active && wasActive && audioInputMode === 'push-to-talk') {
console.log('Push-to-Talk: Recording stopped, transcribing...');
sendToRenderer('update-status', 'Transcribing...');
// For browser-based audio (Windows)
if (audioChunks.length > 0) {
await flushAudioAndTranscribe();
}
// For macOS SystemAudioDump
if (audioBuffer.length > 0) {
await transcribeBufferedAudio(true); // Force transcription
}
sendToRenderer('update-status', 'Listening...');
}
}
async function togglePushToTalk() {
if (isPushToTalkActive) {
await setPushToTalkActive(false);
} else {
await setPushToTalkActive(true);
}
}
function updatePushToTalkSettings(inputMode) {
if (inputMode) {
audioInputMode = inputMode;
}
if (audioInputMode !== 'push-to-talk' && isPushToTalkActive) {
isPushToTalkActive = false;
}
if (audioInputMode !== 'push-to-talk') {
resetRealtimeAudioBuffer();
audioBuffer = Buffer.alloc(0);
}
notifyPushToTalkState();
updateTranscriptionTimerForPushToTalk();
}
function clearConversation() { function clearConversation() {
const systemMessage = conversationMessages.find(m => m.role === 'system'); const systemMessage = conversationMessages.find(m => m.role === 'system');
conversationMessages = systemMessage ? [systemMessage] : []; conversationMessages = systemMessage ? [systemMessage] : [];
@@ -403,6 +508,7 @@ function closeOpenAISDK() {
conversationMessages = []; conversationMessages = [];
audioChunks = []; audioChunks = [];
isProcessing = false; isProcessing = false;
isPushToTalkActive = false;
// Clear timers // Clear timers
if (silenceCheckTimer) { if (silenceCheckTimer) {
@@ -414,6 +520,7 @@ function closeOpenAISDK() {
windowsTranscriptionTimer = null; windowsTranscriptionTimer = null;
} }
notifyPushToTalkState();
sendToRenderer('update-status', 'Disconnected'); sendToRenderer('update-status', 'Disconnected');
} }
@@ -461,11 +568,16 @@ function hasSpeech(buffer, threshold = 500) {
return rms > threshold; return rms > threshold;
} }
async function transcribeBufferedAudio() { async function transcribeBufferedAudio(forcePTT = false) {
if (audioBuffer.length === 0 || isProcessing) { if (audioBuffer.length === 0 || isProcessing) {
return; return;
} }
// In push-to-talk mode, only transcribe when explicitly requested (forcePTT=true)
if (audioInputMode === 'push-to-talk' && !forcePTT) {
return;
}
// Calculate audio duration // Calculate audio duration
const bytesPerSample = 2; const bytesPerSample = 2;
const audioDurationMs = (audioBuffer.length / bytesPerSample / SAMPLE_RATE) * 1000; const audioDurationMs = (audioBuffer.length / bytesPerSample / SAMPLE_RATE) * 1000;
@@ -475,7 +587,8 @@ async function transcribeBufferedAudio() {
} }
// Check if there's actual speech in the audio (Voice Activity Detection) // Check if there's actual speech in the audio (Voice Activity Detection)
if (!hasSpeech(audioBuffer)) { // Skip VAD check in PTT mode - user explicitly wants to transcribe
if (!forcePTT && !hasSpeech(audioBuffer)) {
// Clear buffer if it's just silence/noise // Clear buffer if it's just silence/noise
audioBuffer = Buffer.alloc(0); audioBuffer = Buffer.alloc(0);
return; return;
@@ -487,7 +600,9 @@ async function transcribeBufferedAudio() {
try { try {
console.log(`Transcribing ${audioDurationMs.toFixed(0)}ms of audio...`); console.log(`Transcribing ${audioDurationMs.toFixed(0)}ms of audio...`);
sendToRenderer('update-status', 'Transcribing...'); if (!forcePTT) {
sendToRenderer('update-status', 'Transcribing...');
}
const transcription = await transcribeAudio(currentBuffer, 'audio/wav'); const transcription = await transcribeAudio(currentBuffer, 'audio/wav');
@@ -497,12 +612,18 @@ async function transcribeBufferedAudio() {
// Send to chat // Send to chat
await sendTextMessage(transcription); await sendTextMessage(transcription);
} else if (forcePTT) {
console.log('Push-to-Talk: No speech detected in recording');
} }
sendToRenderer('update-status', 'Listening...'); if (!forcePTT) {
sendToRenderer('update-status', 'Listening...');
}
} catch (error) { } catch (error) {
console.error('Transcription error:', error); console.error('Transcription error:', error);
sendToRenderer('update-status', 'Listening...'); if (!forcePTT) {
sendToRenderer('update-status', 'Listening...');
}
} }
} }
@@ -598,6 +719,10 @@ async function startMacOSAudioCapture() {
// Convert stereo to mono // Convert stereo to mono
const monoChunk = CHANNELS === 2 ? convertStereoToMono(chunk) : chunk; const monoChunk = CHANNELS === 2 ? convertStereoToMono(chunk) : chunk;
if (audioInputMode === 'push-to-talk' && !isPushToTalkActive) {
continue;
}
// Add to audio buffer for transcription // Add to audio buffer for transcription
audioBuffer = Buffer.concat([audioBuffer, monoChunk]); audioBuffer = Buffer.concat([audioBuffer, monoChunk]);
@@ -643,7 +768,7 @@ async function startMacOSAudioCapture() {
}); });
// Start periodic transcription // Start periodic transcription
startTranscriptionTimer(); updateTranscriptionTimerForPushToTalk();
sendToRenderer('update-status', 'Listening...'); sendToRenderer('update-status', 'Listening...');
@@ -651,6 +776,10 @@ async function startMacOSAudioCapture() {
} }
function startTranscriptionTimer() { function startTranscriptionTimer() {
// Don't start auto-transcription timer in push-to-talk mode
if (audioInputMode === 'push-to-talk') {
return;
}
stopTranscriptionTimer(); stopTranscriptionTimer();
transcriptionTimer = setInterval(transcribeBufferedAudio, TRANSCRIPTION_INTERVAL_MS); transcriptionTimer = setInterval(transcribeBufferedAudio, TRANSCRIPTION_INTERVAL_MS);
} }
@@ -682,6 +811,8 @@ module.exports = {
sendImageMessage, sendImageMessage,
processAudioChunk, processAudioChunk,
flushAudioAndTranscribe, flushAudioAndTranscribe,
togglePushToTalk,
updatePushToTalkSettings,
clearConversation, clearConversation,
closeOpenAISDK, closeOpenAISDK,
startMacOSAudioCapture, startMacOSAudioCapture,
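The `hasSpeech` gate referenced above is a plain RMS energy check over 16-bit PCM. Below is a self-contained sketch of that logic; the signature and the final `rms > threshold` comparison appear in the hunk above, while the loop body is a reconstruction:

```js
// RMS-based voice activity check over 16-bit little-endian PCM (Node.js Buffer).
function hasSpeech(buffer, threshold = 500) {
    const samples = Math.floor(buffer.length / 2); // 2 bytes per 16-bit sample
    if (samples === 0) return false;
    let sumSquares = 0;
    for (let i = 0; i < samples; i++) {
        const sample = buffer.readInt16LE(i * 2);
        sumSquares += sample * sample;
    }
    const rms = Math.sqrt(sumSquares / samples);
    return rms > threshold; // pure silence/noise stays below the threshold
}
```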

View File

@@ -186,6 +186,10 @@ ipcRenderer.on('update-status', (event, status) => {
cheatingDaddy.setStatus(status); cheatingDaddy.setStatus(status);
}); });
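// Relay: the global Push-to-Talk shortcut is registered in the main process, which
// forwards 'push-to-talk-toggle' to this renderer; echoing the same channel back
// lets the ipcMain.on('push-to-talk-toggle') handler in the AI-provider setup react.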
ipcRenderer.on('push-to-talk-toggle', () => {
ipcRenderer.send('push-to-talk-toggle');
});
async function startCapture(screenshotIntervalSeconds = 5, imageQuality = 'medium') { async function startCapture(screenshotIntervalSeconds = 5, imageQuality = 'medium') {
// Store the image quality for manual screenshots // Store the image quality for manual screenshots
currentImageQuality = imageQuality; currentImageQuality = imageQuality;

View File

@@ -9,6 +9,7 @@ let windowResizing = false;
let resizeAnimation = null; let resizeAnimation = null;
const RESIZE_ANIMATION_DURATION = 500; // milliseconds const RESIZE_ANIMATION_DURATION = 500; // milliseconds
function createWindow(sendToRenderer, geminiSessionRef) { function createWindow(sendToRenderer, geminiSessionRef) {
// Get layout preference (default to 'normal') // Get layout preference (default to 'normal')
let windowWidth = 1100; let windowWidth = 1100;
@@ -155,6 +156,7 @@ function getDefaultKeybinds() {
scrollUp: isMac ? 'Cmd+Shift+Up' : 'Ctrl+Shift+Up', scrollUp: isMac ? 'Cmd+Shift+Up' : 'Ctrl+Shift+Up',
scrollDown: isMac ? 'Cmd+Shift+Down' : 'Ctrl+Shift+Down', scrollDown: isMac ? 'Cmd+Shift+Down' : 'Ctrl+Shift+Down',
emergencyErase: isMac ? 'Cmd+Shift+E' : 'Ctrl+Shift+E', emergencyErase: isMac ? 'Cmd+Shift+E' : 'Ctrl+Shift+E',
pushToTalk: 'Ctrl+Space', // intentionally Control+Space on every platform
}; };
} }
@@ -164,6 +166,10 @@ function updateGlobalShortcuts(keybinds, mainWindow, sendToRenderer, geminiSessi
// Unregister all existing shortcuts // Unregister all existing shortcuts
globalShortcut.unregisterAll(); globalShortcut.unregisterAll();
const prefs = storage.getPreferences();
const audioInputMode = prefs.audioInputMode || 'auto';
const enablePushToTalk = audioInputMode === 'push-to-talk';
const primaryDisplay = screen.getPrimaryDisplay(); const primaryDisplay = screen.getPrimaryDisplay();
const { width, height } = primaryDisplay.workAreaSize; const { width, height } = primaryDisplay.workAreaSize;
const moveIncrement = Math.floor(Math.min(width, height) * 0.1); const moveIncrement = Math.floor(Math.min(width, height) * 0.1);
@@ -343,6 +349,18 @@ function updateGlobalShortcuts(keybinds, mainWindow, sendToRenderer, geminiSessi
console.error(`Failed to register emergencyErase (${keybinds.emergencyErase}):`, error); console.error(`Failed to register emergencyErase (${keybinds.emergencyErase}):`, error);
} }
} }
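// Electron accelerators treat 'Ctrl' as the Control key on every platform, so the
// 'Ctrl+Space' default behaves the same on macOS, Windows, and Linux. Also note that
// globalShortcut.register returns false (without throwing) when the accelerator is
// already held by another app, so the try/catch below only guards invalid keybinds.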
// Register push-to-talk shortcut (OpenAI SDK only, gated by preferences)
if (keybinds.pushToTalk && enablePushToTalk) {
try {
globalShortcut.register(keybinds.pushToTalk, () => {
sendToRenderer('push-to-talk-toggle');
});
console.log(`Registered pushToTalk (toggle): ${keybinds.pushToTalk}`);
} catch (error) {
console.error(`Failed to register pushToTalk (${keybinds.pushToTalk}):`, error);
}
}
} }
function setupWindowIpcHandlers(mainWindow, sendToRenderer, geminiSessionRef) { function setupWindowIpcHandlers(mainWindow, sendToRenderer, geminiSessionRef) {
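For completeness, the hotkey round-trip for a single Push-to-Talk press; every step below is named in this diff:

```js
// window.js: globalShortcut callback -> sendToRenderer('push-to-talk-toggle')
// preload: ipcRenderer.on('push-to-talk-toggle') -> ipcRenderer.send('push-to-talk-toggle')
// main: ipcMain.on('push-to-talk-toggle') -> openaiSdkProvider.togglePushToTalk()
// provider: setPushToTalkActive(!active)
//   on start -> buffers are cleared, status becomes 'Recording...'
//   on stop  -> flushAudioAndTranscribe() / transcribeBufferedAudio(true), then 'Listening...'
// provider: notifyPushToTalkState() -> 'push-to-talk-state' refreshes the AssistantView UI
```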