PRD DOCUMENT:
This dossier compiles all research, insights, and specifications into a single, all-in-one document that includes:
- Executive Summary & Branding (DeepType by Empathy Labs, under the DeepHustle.ai project)
- Ultimate Pitch & Pitch Deck (Both in Markdown format and separate file if possible)
- Comprehensive PRD (Product Requirements Document)
- SWOT Analysis & Gap Analysis
- Feature Breakdown & Roadmap (Token-based deep research breakdown instead of timeframes)
- Accessibility-first UX/UI Guidelines (ARIA descriptions, alt-text, voice-first UI details)
- Full Codebase (Repository structure, written-out files, ASCII art headers for inclusivity, inline comments for easy comprehension)
The dossier is structured with clear Markdown formatting for headers, sections, and technical clarity.
Vision Statement: Empower every individual – regardless of vision or ability – to master typing and digital communication through empathetic AI. DeepType’s mission is to be an AI-powered typing tutor that enables blind and visually-impaired users to achieve keyboard proficiency, opening doors to education, employment, and independence. With a focus on “accessible by design”, DeepType bridges the digital divide by turning the keyboard into an inclusive gateway for all.
Naming & Brand Identity: DeepType is the product name, reflecting the use of deep learning (“Deep”) to revolutionize touch typing (“Type”). It exists under the DeepHustle.ai project umbrella – an innovation initiative by Empathy Labs. Empathy Labs is the organization’s brand, emphasizing user-centric design and emotional intelligence in tech. The naming strategy ensures clarity: Empathy Labs conveys trust and compassion, DeepHustle.ai signals cutting-edge AI innovation, and DeepType itself clearly describes the product’s function. Together, they present a unified brand with a heart (Empathy) and a brain (Deep AI technology).
Market Positioning: DeepType is positioned at the intersection of assistive technology and ed-tech. It targets a significant underserved market: the millions of people with visual impairments who require effective tools to learn keyboard skills. Globally, an estimated 43 million people are blind and 295 million have moderate-to-severe visual impairment (Bridging the Digital Disability Divide: Determinants of Internet Use among Visually Impaired Individuals in Thailand). Many of these individuals rely on screen readers (like JAWS or NVDA) to use computers, yet accessible typing training tools are scarce and outdated. DeepType aims to be the premier digital solution for accessible typing education, much like a “Duolingo for typing” tailored to blind and low-vision users. By leveraging AI and modern UX practices, it stands out from legacy offerings. The brand promise is empowerment: DeepType gives users the confidence to navigate the digital world through touch typing. This resonates strongly in a market where independence and employment are tightly linked to technological skills – especially when over 70% of blind and visually-impaired adults face unemployment, partly due to limited access to training (Employment Barriers for the Blind and Visually Impaired — World Services for the Blind). DeepType will be marketed as an inclusion-driven innovation: not just another typing app, but a social impact tool that combines empathy with technology to transform lives.
Imagine pressing a key and unlocking a world of opportunity. DeepType is an AI-powered tutor that speaks, listens, and understands, turning the once tedious task of learning to type into an empowering journey for those who need it most. DeepType by Empathy Labs is the first accessibility-first typing coach, designed for the blind and visually impaired, but delightful for everyone. It’s like having a personal coach who never gets tired, never judges, and is available 24/7 – “the Swiss Army Knife of typing tutors” for all abilities.
“Because everyone deserves a voice and a keyboard.”
- 🎯 Built for Accessibility from Day One: Unlike generic typing programs, DeepType is built ground-up for blind and low-vision users. Every feature – from audio guidance to high-contrast visuals – follows accessibility best practices (WCAG 2.1, ARIA roles, etc.). This isn’t a retrofit; it’s a revolution. We don’t just meet compliance, we embrace “accessibility-first” design as our core ethos.
- 🗣 Voice-First and Hands-On: DeepType is a voice-interactive tutor. It speaks instructions and encouragement in natural language, and listens for commands. Users can navigate the entire learning experience with speech or a single button. This voice-first interaction means eyes-free, hassle-free learning – perfect for blind users and also convenient for anyone (think learning while keeping your eyes on another task).
- 🤖 Adaptive AI Coach: At the heart of DeepType is cutting-edge AI (including large language models) that adapts in real-time to the learner. Make a mistake? DeepType’s AI detects the pattern and offers personalized feedback and exercises. Struggling with certain letters? The AI dynamically adjusts the lesson to give extra practice where needed. No static lessons or one-size-fits-all curriculum – DeepType learns the learner.
- 🌐 Cross-Platform Convenience: Available on web, desktop, and mobile without separate development silos. Our tech stack enables writing code once and deploying everywhere, ensuring a consistent experience whether the user is on a Windows PC with a screen reader, a Mac, or a mobile device. DeepType even works offline for desktop (via an app) so it’s reliable in classrooms and areas with limited internet.
- 💡 Competitive Edge over Legacy Tools: Traditional solutions like Talking Typing Teacher rely on pre-recorded voices and fixed scripts (Talking Typing Teacher | BoundlessAT.com). DeepType, however, uses AI voices and natural language generation to provide conversational, context-aware guidance. Unlike screen readers (JAWS/NVDA) that simply echo keys, DeepType teaches with a curriculum, tracks progress, and gamifies the experience. And unlike costly enterprise software, DeepType aims to be affordable or free for end-users (with a sustainable business model backing it – see below).
For a visually impaired person, learning to type is not just a skill – it’s a lifeline to the digital world. Yet current options are bleak: decades-old software with robotic feedback, expensive licenses, or relying on general screen readers that don’t teach. The result? Many blind individuals struggle with slow typing or never learn, limiting their potential in school and the workplace. Unmet need: an engaging, effective, and affordable way to learn keyboarding without sight.
Opportunity: DeepType sits at the convergence of two rising tides – the advancement of AI in education, and the push for digital inclusion. Recent breakthroughs show the promise of AI in assistive tech (e.g., Be My Eyes integrating OpenAI’s GPT-4 to describe images for blind users (Introducing Be My AI (formerly Virtual Volunteer) for People who are Blind or Have Low Vision, Powered by OpenAI’s GPT-4)). Yet, no one has applied such AI prowess to typing education for the blind. We have the first-mover advantage to capture this niche and expand it to a broader audience (sighted users also benefit from voice-assisted, hands-free learning – think driving, or dyslexic learners using multi-sensory feedback).
DeepType can become the gold standard for accessible skills training, starting with typing and potentially expanding to other digital literacy skills. By solving a deeply specific problem with excellence, we build trust and brand loyalty in the assistive tech community.
DeepType’s business model balances social impact with sustainability:
- Freemium Core: The base product (core typing lessons and accessibility features) is free for individual users, ensuring that cost is never a barrier for those who need it most. This echoes the strategy of NVDA, the free screen reader that rapidly gained global adoption – now used by roughly 65% of screen reader users (WebAIM: Screen Reader User Survey #10 Results), surpassing its expensive rival JAWS.
- B2B and Institutional Licensing: Revenue is generated by offering premium plans to schools, rehabilitation centers, and enterprises. For example:
- Education Edition: Schools for the blind or K-12 special education programs can subscribe to a managed DeepType Classroom package, which includes teacher dashboards, student progress analytics, and custom lesson creation. These institutions often have funding or grants for assistive technology and will pay for a solution that demonstrably improves student outcomes.
- Corporate Training & CSR: Companies aiming to hire and upskill visually-impaired employees (or retrain workers who lost vision) can license DeepType for professional use. Additionally, corporations could sponsor DeepType deployments as part of their Corporate Social Responsibility programs, effectively sponsoring free licenses for users in developing regions (a model similar to how some companies sponsor NVDA development via donations).
- Premium Features for Power Users: While the basic tutor is free, advanced features could be behind a modest subscription. Examples: an AI “Tutor Plus” that users can converse with to get career advice or advanced typing drills, cloud sync of personal progress across devices, or the ability to generate custom practice content (e.g., “I want to practice typing this specific book or code snippet” – the AI prepares a lesson). These power features cater to enthusiasts or professionals and can justify a monthly fee.
- Grants and Partnerships: Given its mission-driven nature, DeepType will aggressively seek partnerships with nonprofits (e.g., American Foundation for the Blind) and technology grants. These partnerships not only provide funding but also endorsements and user base. For instance, a foundation might fund the development of a new Braille-display integration feature, or a government agency might deploy DeepType in digital literacy initiatives.
- Open-Source Community Edition: A portion of DeepType’s code (especially accessibility utilities) could be open-sourced to encourage community contributions and transparency. This fosters goodwill and potentially free improvements, while the core AI tutoring logic and premium services remain proprietary for monetization. It’s similar to how some companies open-source their SDKs but sell hosted services.
Monetization with Empathy: At all times, our strategy ensures that the end-user who most needs DeepType (a blind learner) is never left out due to cost. Revenue streams target those who can pay (schools, orgs, sponsors) to subsidize those who cannot. This aligns with our brand values (Empathy Labs) and creates a positive feedback loop: more users -> more data -> better AI -> more compelling product -> more paying partners.
DeepType’s voice and tone are motivational, friendly, and inclusive. We use microcopy (short pieces of guiding text or audio) to keep users engaged and encouraged:
- Onboarding greeting: “Welcome to DeepType – where your fingers learn to sing on the keyboard. Let’s unlock your potential, one key at a time!”
- When the user makes a mistake: “Whoops, that didn’t match. No worries – try again, I’m right here with you.” (No scolding, always supportive.)
- Success message: “Great job! You nailed that. Ready for the next challenge?”
- Idle encouragement (if user is inactive): “I’m still here. Whenever you’re ready, press the space bar and we’ll continue your journey.”
- Interface labels use empowering language: The “Start” button might say “Begin Your Journey”, the help section: “Need a hand? (Press H)” spoken as “You can press H anytime for help – I’ve got tips for you.”
Persuasive messaging also highlights outcomes: After a progress milestone, DeepType might say, “You’ve improved your speed by 20%! Imagine writing emails or coding with this speed. You’re on fire!” This connects the practice to real-life benefits, sustaining motivation.
Throughout our copy, we emphasize independence, confidence, and fun. Typing is framed not as a tedious skill, but as liberation. “Your voice in the digital world” is a recurring theme – since typing enables one to communicate just like sighted peers. Our branding tagline could be: “DeepType: Touch the keys, touch the world.” This resonates emotionally and sticks in memory.
DeepType proudly wears the “accessibility-first” badge. In pitches and materials, we make it clear this is not an afterthought or add-on. For example, the product website and pitch deck prominently state: “Designed for accessibility from scratch – not retrofitted.” We highlight features like:
- Voice Guided Learning: All lessons are delivered through clear speech, so a user with zero vision can participate without any setup. “If you can hear, you can learn with DeepType.”
- One-Button Navigation: The entire UI can be driven with a single key or switch device. This means even users with motor challenges or cognitive overload can navigate step-by-step. (For instance, an on-screen highlight moves through options and the user hits the one button to select – a common approach for switch accessibility.)
- High Contrast & Large Print: For low-vision users, DeepType offers bold, large text and high-contrast color themes out of the box. Screenshots in the pitch deck demonstrate a stark black-and-white interface with >AAA contrast, showing our dedication to usable design for low vision.
- Screen-Reader Friendly: DeepType works harmoniously with screen readers. However, it often won’t need a screen reader’s assistance because it is the screen reader for its own interface. (E.g., the app announces “Menu: Start, Settings, Exit – use arrow keys or say ‘Start’” etc.) This dual approach (integrating with or without external screen reader) means flexibility for user preference.
- Multimodal Input: The user can speak commands, type responses, or even use a Braille display in future iterations. DeepType positions itself as device-agnostic. If you can press any button or utter a word, you can control it. Such flexibility is rare and a strong selling point in assistive tech.
By positioning accessibility at the forefront, we also capture the interest of allies: educators, parents, occupational therapists, and diversity officers who seek tools that champion inclusive design. DeepType isn’t just a tool; it’s a statement that technology should serve everyone. Our pitch emphasizes this higher purpose, which not only differentiates us but also often sways decision-makers (e.g., a school district choosing between a generic typing software vs. DeepType will see that only DeepType was built for their blind students’ needs).
(Below is a markdown representation of the Pitch Deck for DeepType. Each slide is described with its title and key bullet points.)
Slide 1: Title & Vision
DeepType by Empathy Labs
The AI-Powered Typing Tutor for All
- Vision: Empower every individual to communicate digitally, regardless of vision or ability.
- Tagline: Touch the keys, touch the world.
Slide 2: The Problem
- 43 million people are blind and 295 million have moderate-to-severe visual impairment worldwide (Bridging the Digital Disability Divide: Determinants of Internet Use among Visually Impaired Individuals in Thailand).
- Typing = essential skill for education & jobs, yet no effective way to learn if you can’t see the keyboard.
- Legacy solutions are outdated, hard to access, or extremely expensive. (E.g., over 70% unemployment among blind adults, partly due to lack of access to tech training (Employment Barriers for the Blind and Visually Impaired — World Services for the Blind).)
- Opportunity: Huge gap in assistive education – time to innovate.
Slide 3: The Solution
- DeepType – an AI tutor that speaks, listens, and adapts to the user.
- Learn to type through interactive audio lessons, real-time feedback, and personalized practice.
- Use it on any device: phone, tablet, computer – no sight required.
- Outcome: Blind/low-vision users gain digital independence (sending emails, coding, writing) by mastering typing.
Slide 4: Why Now? (Market Timing)
- AI & Accessibility Renaissance: AI is enabling new assistive tech (e.g., GPT-4 Vision in Be My (Introducing Be My AI (formerly Virtual Volunteer) for People who are Blind or Have Low Vision, Powered by OpenAI’s GPT-4)1-L4】, Seeing AI by Microsoft combining visio (Seeing AI: New Technology Research to Support the Blind and Visually Impaired Community - Microsoft Accessibility Blog)L137】). It’s proven that AI can make tech more inclusive.
- Tech Convergence: Speech recognition, text-to-speech, and language models are all advanced and affordable in 2025 – enabling our solution.
- Remote Learning Boom: Post-2020, digital learning tools are mainstream. Accessibility in e-learning is a highlighted need. Institutions seek solutions like DeepType to include all learners.
Slide 5: Key Features (What makes DeepType special)
- 🗣 Voice-Guided Lessons: Friendly voice instructions and feedback. Hands-free learning.
- 🤖 Adaptive Learning AI: Adjusts difficulty in real-time, generates custom exercises on the fly. No one gets left behind or bored.
- 🎮 Gamified & Motivating: Progress badges, fun sound cues, and challenges keep learners engaged. It’s fun!
- 💻 Cross-Platform: Use it via web browser, dedicated desktop app, or on mobile. Consistent experience, cloud-synced progress.
- ♿ Accessibility at Core: High-contrast UI, ARIA labels, one-switch mode, works with screen readers – built to WCAG standards.
Slide 6: Competitive Landscape
- Talking Typing Teacher (Legacy software): Audio-based lessons with recorded voice. Cons: static content, no AI, Windows-only, no longer actively supported (Talking Typing Teacher | BoundlessAT.com).
- TypeAbility (JAWS add-on): Teaches typing within JAWS screen reader. Cons: requires expensive JAWS, dated curriculum.
- NVDA/JAWS Screen Readers: Not actually teaching tools – they identify keys but don’t provide structured lessons (only a “keyboard help” mode). Also, JAWS costs $90/year (A New Way to Obtain JAWS and ZoomText | Accessworld | American Foundation for the Blind) or $1,200 perpetual, pricing many out.
- Mainstream Typing Apps (e.g., Mavis Beacon): Visual-centric, unusable for blind users; no voice guidance.
- Others (Seeing AI, Be My Eyes): Solve different problems (environment perception, not typing).
DeepType’s Edge: No one offers an AI-driven, accessibility-first typing tutor. We combine the strengths of these (voice output, structured curriculum) and add modern AI adaptivity and multi-platform support. We’re in a league of our own – a blue ocean in assistive ed-tech.
Slide 7: Business Model
- Free for Users: Core app free for end-users (removing adoption barriers, like NVDA did with screen readers).
- B2B/Institutional Sales: Revenue from schools, rehab centers, libraries – annual licenses including admin dashboards & priority support.
- Sponsored Programs: Partner with nonprofits/corporates to sponsor deployments (CSR funding covers costs for communities in need).
- Premium Add-ons: Optional subscription for advanced personal features (e.g., conversational practice buddy, specialty courses like “coding keyboard shortcuts” or “Excel navigation”).
- Scaling Plan: Start in assistive tech niche -> expand features for general audience (e.g., sighted people using voice tutor while multitasking) -> position as a universal typing tutor (with accessibility as our differentiator and moral backbone).
Slide 8: Roadmap & Milestones
- Q1: MVP launch (Web app beta) – Core lessons A-Z, basic voice feedback. Collect user feedback.
- Q2: Desktop/Mobile apps release (wrapped PWA). Add voice command navigation, cloud sync via Supabase.
- Q3: AI Adaptive Engine v2 – integrate GPT-based error analysis and live coaching, plus initial multi-language support (type in English, Spanish, etc.).
- Q4: Institutional Dashboard – analytics for classrooms, content editor for teachers. Begin pilot programs with 3 blind schools.
- Year 2: Scale to 10K+ users, GPT-4 (or Gemini) powered conversational tutor (“Ask DeepType anything”), Braille display integration, and pursue Series A funding for growth.
Slide 9: Team & Empathy Labs
- Founders: [Your Name] – (Background in AI and personal connection to accessibility), [Other Name] – (EdTech veteran, created learning curricula).
- Empathy Labs – Innovation lab focusing on human-centric AI. DeepHustle.ai is our project incubator blending deep learning with human empathy.
- Advisors include accessibility experts (e.g., a blind tech lead, special-ed teacher) and AI researchers. Backed by [Mentors/Accelerator if any].
- Our superpower: We combine technical expertise with lived experience insights – building with the community, not just for them.
Slide 10: The Ask & Closing
- Seeking: [If pitching to investors: $X seed funding] OR [If pitching for partnership: pilot opportunities, introductions, etc.].
- This will fuel development of advanced features (real-time voice AI, more languages) and allow us to distribute DeepType to those who need it globally.
- Impact: A successful DeepType means thousands of people will gain skills, confidence, and jobs that were previously out of reach. It’s not just an investment in a product, it’s an investment in digital equality.
- Join us in typing a new chapter of inclusion.
- Thank You.
(Contact: your.email@empathylabs.ai | www.deephustle.ai/deeptype)
(End of Pitch Deck)
Note: Images and graphics (previously planned) are omitted in this text version, but would include screenshots of the app interface, icons representing voice/AI, and an illustrative user persona to humanize the story.
DeepType’s product requirements focus on delivering a fully accessible, intelligent typing tutor. The core functionalities include:
- Interactive Audio Lessons: The system provides step-by-step typing lessons through audio prompts. User Story: “As a blind beginner, I want the tutor to tell me which fingers and keys to use so I can learn touch typing without sight.”
- The lesson content ranges from learning home row keys to complex sentences. Each lesson is articulated by a pleasant human-like voice (text-to-speech).
- The user can control playback: say “repeat” or press a key to hear instructions again. The lesson flows at the user’s pace, waiting for input and giving feedback.
- Real-time Feedback and Error Correction: As the user types, DeepType instantly checks input. If the user presses the correct key, they get positive feedback (a chime sound or voice “Good!”). If wrong, the system signals it (buzz sound or “Oops, try again, that was X, not Y”). User Story: “If I mistype, I want immediate correction so I know what to fix.”
- The system identifies which key was pressed in error and can speak its name (“You pressed K, but the target was F”). This reinforces learning of key positions.
- Errors are logged to adapt difficulty (e.g., if user consistently struggles with a particular hand or letter, the AI will introduce extra practice for it).
- Voice Commands & Navigation: DeepType supports a set of voice commands to navigate the app hands-free. User Story: “As a user who can’t see the screen, I want to be able to speak commands like ‘next lesson’ or ‘main menu’ to control the app.”
- Key commands: “Start lesson one”, “Pause”, “Resume”, “Repeat”, “Menu”, “Help”. Alternatively, a single keyboard key (like the spacebar or a special “DeepType Key”) can cycle through options and select them, accommodating one-switch users.
- The app includes a voice-controlled onboarding/tutorial where it teaches the user how to use these voice or single-key controls (with practice, e.g., “Press the spacebar to select an option now”).
- User Progress Tracking: Each user has a profile with their lesson progress, typing speed (WPM), and accuracy stats. User Story: “I want to track my improvement over time and resume where I left off.”
- Achievements or badges are unlocked for milestones (e.g., 5 lessons completed, first 30 WPM speed, etc.) to encourage progress.
- Data like last lesson completed, current difficulty level, and custom preferences (voice speed, theme) are saved (likely in a cloud database if logged in, or locally if offline).
- Adaptive Lesson Planning (AI-driven): The content adapts to the user’s performance. User Story: “If I’m finding something too easy or too hard, I want the tutor to adjust so I’m always appropriately challenged.”
- If the user breezes through lessons with high accuracy, the AI might skip some redundant practice or suggest a more advanced exercise (e.g., move from letters to words sooner).
- If the user struggles, the AI can inject an extra practice session focusing on troublesome keys, or slow down the pace of new key introductions.
- This is powered by a rules engine enhanced with machine learning: initially simple (if accuracy < X, repeat lesson), later more complex using pattern recognition (e.g., “user often swaps S and A – maybe do a drill contrasting those letters”). A sketch of this logic appears after this feature list.
- Multimodal Teaching Aids: While primarily audio, DeepType can optionally display visual aids for those who have some vision:
- An on-screen keyboard graphic highlighting the key to press (with high contrast colors).
- Large-print text of the current exercise (e.g., the word or sentence to type) in case low-vision users want to follow along visually.
- These aids have proper alt-text or ARIA descriptions so even if a blind user accidentally focuses them, the screen reader will say something like “Visual keyboard illustration, current key: F” rather than leaving them in the dark.
- Content Variety & Gamification: To keep engagement, the PRD includes mini-games and varied content:
- Typing games that are audio-based (for example, an audio version of “falling words”: the voice says a random word and the user must type it before another word “falls” – represented by a ticking timer or rising tone; this becomes a score challenge).
- Fun exercises like typing the lyrics of a song (with the music playing in the background if they want), or typing to control a simple audio game (e.g., “hit the spaceship by typing the letter that corresponds to its coordinate” – this would be described via audio).
- These are stretch goals, but mentioned in PRD to ensure extensibility of the content engine to support game modes in addition to linear lessons.
- Accessibility Compliance & Settings: DeepType will meet or exceed all relevant accessibility guidelines.
- Settings include: TTS voice selection (if multiple voices available), speech rate adjustment, verbosity level (novice mode might speak very verbosely including each character, while advanced mode might assume more and speak less), high contrast toggle, and an option to integrate with existing screen reader (for those who prefer their JAWS voice for consistency – the app could output via screen reader instead of its own TTS).
- It will also have an “Audio-only mode” which ensures that all necessary information is spoken without requiring any visual element (for pure blind usage with no screen).
- Conversely a “Visual Assist mode” can provide on-screen hints for sighted or low-vision supporters (like a parent or teacher watching can see what’s going on). This dual-output approach ensures both blind users and sighted helpers get what they need.
These core features are driven by our guiding principle: teach typing in a way that feels natural and supportive to someone who cannot see. The PRD ensures that any feature that involves output must have an audible form, and any input must be possible via keyboard or voice (not just mouse/touch).
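To make the adaptive engine concrete, below is a minimal sketch of the rule-based first pass described in the Adaptive Lesson Planning feature above. The names (`LessonStats`, `planNextStep`) and the thresholds are illustrative assumptions, not a finalized specification.

```typescript
// Minimal sketch of the rule-based adaptive planner (names and thresholds are assumptions).
interface LessonStats {
  lessonId: string;
  accuracy: number;                    // 0..1 for the lesson just completed
  errorsByKey: Record<string, number>; // e.g. { s: 4, a: 3 }
}

type NextStep =
  | { kind: "repeat"; lessonId: string }
  | { kind: "drill"; keys: string[] }        // extra practice on persistently missed keys
  | { kind: "advance"; toLessonId: string };

const ACCURACY_TO_ADVANCE = 0.9;  // assumed accuracy bar for moving on
const DRILL_ERROR_THRESHOLD = 3;  // assumed per-key error count that triggers a focused drill

export function planNextStep(stats: LessonStats, nextLessonId: string): NextStep {
  // Keys the learner keeps missing get a focused drill before anything else.
  const weakKeys = Object.entries(stats.errorsByKey)
    .filter(([, count]) => count >= DRILL_ERROR_THRESHOLD)
    .map(([key]) => key);
  if (weakKeys.length > 0) {
    return { kind: "drill", keys: weakKeys };
  }

  // Below the accuracy bar: repeat the lesson rather than introduce new keys.
  if (stats.accuracy < ACCURACY_TO_ADVANCE) {
    return { kind: "repeat", lessonId: stats.lessonId };
  }

  // Otherwise advance; a later version could also skip ahead on very high accuracy.
  return { kind: "advance", toLessonId: nextLessonId };
}
```

A later ML-driven planner can replace `planNextStep` behind the same `NextStep` contract, so the lesson player itself never has to change.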
To implement the above, DeepType leverages a modern and flexible technical stack, stitching together AI services, cross-platform frameworks, and accessible UI libraries:
- Frontend: We plan to use Web technologies (HTML, CSS, TypeScript/JavaScript), likely within a framework like React for modularity. The UI will be a web app at its core for easy cross-platform reach. We might employ frameworks or libraries that aid in building accessible components (e.g., Reach UI or ARIA toolkit for React, which provide pre-built accessible widgets).
- Lovable.dev & Bolt.new (AI-assisted development): To accelerate development, we will experiment with AI-powered coding tools. Lovable.dev can scaffold a project quickly – generating a React/TypeScript app with our described (Lovable.dev - AI Web App Builder | Refine)†L71-L79】. It acts as an “AI co-engineer,” setting up the skeleton (routing, basic pages, API integrations with Supabase, etc.) from natural language specs. Bolt.new, an AI-powered development en (Bolt vs. Cursor: Which AI Coding App Is Better?)†L62-L70】, will be used to prototype interface components rapidly. For example, by prompting “Create a high-contrast landing page with a ‘Start Lesson’ button and our logo,” we can get boilerplate that we then refine. These tools don’t replace coding but speed up initial setup and repetitive tasks, allowing the team to focus on complex logic. Rationale: This fits our lean startup approach – we leverage AI to build AI software faster.
- Backend: The backend is relatively lightweight – mostly to handle persistent data and AI integration:
- Supabase (PostgreSQL database + Auth): We choose Supabase as our primary backend platform. It provides an open-source Firebase alternative with a Postgres database, RESTful APIs, real-time subscriptions, and user authentication out of the box (Best backends for FlutterFlow: Firebase vs Supabase vs Xano). This means we can store user profiles, progress logs, achievements, etc., with minimal backend code. Supabase’s auth will manage sign-ups (with email/password or OAuth if needed) and we can secure data (each user’s data is private, etc.).
- Supabase’s real-time capabilities might be used to sync live progress (for instance, if a teacher is remotely monitoring a student’s lesson, the keystrokes per minute or errors could stream to a dashboard in real-time).
- We also plan to use Supabase’s storage (for any audio clips or if we allow users to upload custom text content to practice on).
- AI Services: A cornerstone is integration of advanced AI:
- OpenAI API: We will utilize OpenAI’s APIs for a couple of purposes. One is the GPT-4 (or GPT-3.5) model for language tasks – e.g., generating dynamic practice sentences or engaging trivia about what the user is typing (imagine as they practice, the AI shares a fun fact: “Did you know the word ‘TYPE’ originated from…” to keep things interesting). Another is potential use of OpenAI’s new multimodal/voice features. For instance, OpenAI’s Whisper model for speech-to-text and their text-to-speech for voice output. The OpenAI Whisper API can transcribe audio (like the user’s spoken commands) with high accuracy. While it doesn’t yet support true streaming real-time transcription in the API, we can chunk audio every few seconds to simulate it (Transcribe via Whisper in real-time / live - API - OpenAI Developer Community). We’ll design our voice command system around this limitation (short commands which can be captured in <5s chunks, so the delay is negligible); a sketch appears after the stack summary below.
- Google Gemini 2.0 (Flash API): As we future-proof, we note Google’s upcoming Gemini 2.0 from DeepMind, which promises multimodal support and lightning-fast responses (Google | Gemini 2.0 Flash API - Kaggle). Once available, this could power the adaptive coach in DeepType, possibly providing even quicker or more nuanced feedback than GPT. For example, Gemini might be used to analyze a user’s pattern of mistakes in depth or run a real-time conversation mode with the user about their progress. The “Flash API” suggests real-time or streaming capabilities, which could help in creating an interactive agent that feels live. The PRD includes support to integrate Gemini as an alternative or supplement to OpenAI (keeping our architecture model-agnostic so we can plug in the best AI).
- Text-to-Speech (TTS): We have options here: use the Web Speech API in browsers for on-device TTS (which leverages system voices) and/or use cloud TTS (e.g., Google Cloud Text-to-Speech or Amazon Polly) for consistent high-quality voices across platforms. For simplicity and offline support on desktop, we will use on-device TTS where available. On mobile, we can use the platform’s native screen reader voice via an API or our own integrated voice. The PRD requirement is that the voice must be clear and preferably natural (neural voices). We’ll research which approach yields the best combination of latency and quality. It might be viable to ship a pre-trained lightweight TTS model for offline (for example, Coqui TTS or similar open-source) for an offline desktop mode, while using cloud for online mode.
- Cross-platform Frameworks: To achieve “web, desktop, mobile-native without re-coding”, we outline a cross-platform strategy:
- The core is a Progressive Web App (PWA) built in React/TypeScript. This runs in any modern browser (fulfilling the web requirement).
- Desktop: We will use Tauri or Electron to wrap the web app into a desktop application for Windows/Mac/Linux. Tauri is preferred for its small footprint, security, and ability to create native binaries from a web frontend (tauri-apps/tauri: Build smaller, faster, and more secure desktop and ...). With Tauri, our web code becomes a desktop app that can access local resources (e.g., the microphone for voice input) and run offline. Tauri allows building installers and an app experience that integrates with the OS (start menu, etc.), all while reusing 99% of our code.
- Mobile: For mobile native, we have a couple of paths. We can package the PWA as an app using something like Capacitor (from Ionic) which allows deploying web code as native iOS/Android apps with access to native APIs (for mic, vibration, etc.). Alternatively, React Native could be used with the same business logic, but that’s a separate codebase unless we unify via something like Expo’s web support. Given our resource constraints, we lean towards using the web PWA directly on mobile (modern iOS and Android browsers support many necessary APIs). We ensure the PWA is installable (Add to Home Screen) and works offline after first load (using Service Workers for caching). This way, a user can “install” DeepType from the browser and use it like a native app. In parallel, if needed for App Store presence, we can wrap the PWA in a minimal native shell (Capacitor) and publish it. This still avoids rewriting logic.
- Shared Code: We will maintain a single codebase for logic. For any platform-specific code (like file access on desktop, or different speech APIs on web vs. mobile), we use conditional wrappers or services. E.g., an `AudioInputService` interface with implementations: one uses the browser’s `webkitSpeechRecognition` (for Chrome), another uses the Cordova/Capacitor plugin for speech on mobile, another perhaps uses an Electron node module or the OS’s Speech API for desktop. The app picks the appropriate one at runtime. This design is specified in the PRD to ensure we prep for multi-environment deployment.
- Cursor AI & Vercel v0.dev (Dev Tools): While not part of the product delivered to users, internally we incorporate tools like Cursor (an AI-assisted code editor) for improving our development speed and code quality. Cursor can help write repetitive code or tests by conversing with the codebase. V0.dev by Vercel is another tool that can generate UI components from descriptions, particularly for React + Tailwi (Vercel v0.dev: A hands-on review · Reflections - Ann Catherine Jose)3†L5-L13】. We will use v0.dev for designing accessible UI components; for example, describe “a large high-contrast toggle switch with ARIA roles” and refine the output. This ensures even the development process keeps accessibility in mind (the AI is likely to include ARIA attributes if we specify).
- Realtime Collaboration & API: If we allow a teacher or remote volunteer to monitor or assist a session, we might use WebSockets or Supabase’s realtime channels. This is a potential feature for v2, but the architecture includes the ability (maybe via a Node.js server using Socket.io or Supabase realtime) to broadcast events (like “user completed Lesson 2”) to a connected dashboard.
- Testing and CI: The PRD requires that we integrate automated accessibility testing in our CI pipeline. Tools like axe-core (by Deque Systems) can be used to run accessibility audits on our interface to catch issues (like missing labels) early. Also, we plan to include unit tests for critical logic (especially the adaptive algorithm – we can write tests simulating sequences of mistakes and ensuring the lesson adaptation logic does what we expect).
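As one way to wire those axe-core audits into CI, the sketch below uses jest-axe with React Testing Library; the `LessonMenu` component is a hypothetical placeholder and the exact test setup is an assumption.

```tsx
// lesson-menu.a11y.test.tsx – example automated accessibility check for CI.
import { render } from "@testing-library/react";
import { axe, toHaveNoViolations } from "jest-axe";
import { LessonMenu } from "./LessonMenu"; // hypothetical component under test

expect.extend(toHaveNoViolations);

test("lesson menu has no detectable accessibility violations", async () => {
  const { container } = render(<LessonMenu />);
  const results = await axe(container);   // runs axe-core against the rendered DOM
  expect(results).toHaveNoViolations();   // fails the build on missing labels, roles, etc.
});
```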
In summary, the technical stack is a blend of AI services (OpenAI, possibly DeepMind), cloud backend (Supabase), and web-centric cross-platform frameworks (React, Tauri/Capacitor). This allows us to meet the broad requirements: intelligent behavior, data persistence, and deployment on multiple platforms with one codebase. By using AI coding aids (Lovable.dev, Bolt.new, Cursor), we also significantly reduce development time, enabling a small team to achieve results comparable to a much larger team – a critical advantage for a startup.
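To illustrate the short-command Whisper approach noted under AI Services, here is a minimal server-side sketch assuming the official `openai` Node SDK; the command vocabulary, file handling, and function name are simplified placeholders.

```typescript
// Server-side sketch: transcribe one short (<5 s) voice-command chunk with Whisper.
// Assumes the official `openai` Node SDK and OPENAI_API_KEY in the environment.
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI(); // picks up OPENAI_API_KEY

const COMMANDS = ["start lesson", "pause", "resume", "repeat", "menu", "help"];

export async function transcribeCommandChunk(chunkPath: string): Promise<string | null> {
  const transcription = await client.audio.transcriptions.create({
    file: fs.createReadStream(chunkPath), // one short audio chunk uploaded by the client
    model: "whisper-1",
  });

  // Map the free-form transcript onto the small, known command vocabulary.
  const heard = transcription.text.trim().toLowerCase();
  return COMMANDS.find((cmd) => heard.includes(cmd)) ?? null;
}
```

Keeping the vocabulary small is what makes the non-streaming API acceptable: each chunk is short, and an unrecognized transcript simply returns null so the app can ask the user to repeat.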
To ensure DeepType runs seamlessly on web, desktop, and mobile without duplicating effort, the architecture is carefully planned:
Overall Architecture:
At its core, DeepType is a single-page application (SPA) that communicates with cloud services as needed. Think of it as a layered design:
- UI Layer: React components (or similar) render the interface and manage interactions. This layer cares about things like showing a virtual keyboard, displaying text, capturing keypress or microphone input.
- Logic Layer: This is the heart – lesson logic, state management (could use something like Redux or React Context for state), and the adaptive algorithm. It’s abstracted so it doesn’t directly depend on browser APIs – meaning it can run in any JS environment (browser, Node, React Native).
- Platform Services Layer: These are small modules that handle platform-specific functions: e.g.,
SpeechInputService
,SpeechOutputService
,StorageService
. The app, instead of callingwindow.speechSynthesis.speak
directly, will call an interface methodspeak(text)
. In the web build, that maps towindow.speechSynthesis
; in a mobile build, it could call a native plugin; in desktop, maybe it calls an OS-level TTS or uses the web one as well. Similarly for listening: on web, useSpeechRecognition
API if available; on desktop, possibly the OS dictation API or route audio to our server’s Whisper. We design these as swappable modules. - Backend API Layer: When the app needs to fetch or save data (login, pulling down a new set of practice sentences, updating progress), it calls our backend via REST or GraphQL (Supabase provides RESTful endpoints and client libraries). There’s also calls to AI APIs: those might be direct from the frontend (for less sensitive requests and if CORS allows) or via our backend proxy (for secure calls that involve API secrets). For instance, generating a custom exercise using GPT might be done by sending a request to our backend function
generateExercise
which then calls OpenAI and returns result, so we don’t expose the API key on the client.
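As a sketch of the swappable platform-service pattern just described (the web implementation assumes the standard Web Speech API; the alternative class names are hypothetical):

```typescript
// The logic layer depends only on this interface, never on a platform API directly.
export interface SpeechOutputService {
  speak(text: string): Promise<void>;
  cancel(): void;
}

// Web build: back the interface with the browser's speechSynthesis.
export class WebSpeechOutput implements SpeechOutputService {
  speak(text: string): Promise<void> {
    return new Promise((resolve, reject) => {
      const utterance = new SpeechSynthesisUtterance(text);
      utterance.onend = () => resolve();
      utterance.onerror = (event) => reject(event.error);
      window.speechSynthesis.speak(utterance);
    });
  }

  cancel(): void {
    window.speechSynthesis.cancel();
  }
}

// Desktop or mobile builds would ship e.g. a TauriSpeechOutput or CapacitorSpeechOutput
// implementing the same interface, selected once at startup.
```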
Web App:
- Runs entirely in the browser. On modern browsers, even voice is possible: Chrome’s implementation of the Web Speech API allows real-time-ish speech recognition (though not standard across all browsers yet). The web app will detect capabilities: if using Chrome, can enable full voice commands locally; if not, it might fall back to sending audio to server or require keyboard control for commands.
- The web app is the primary development target (fast iteration with hot-reload, devtools, etc.). It will be responsive (using CSS flexbox or grid) so it can fit small mobile screens up to large desktop screens with reflow. We’ll likely implement a simplified layout for mobile (bigger buttons, maybe hide visuals) versus desktop (which could show more info at once).
- We ensure PWA compliance: a service worker for offline caching of assets and perhaps an offline mode where a set of lessons are available without internet. The app manifest will allow it to be installed on home screen.
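A minimal sketch of that offline caching (the cache name and asset list are placeholders, and event typings are loosened for brevity):

```typescript
// sw.ts – cache-first service worker so core lessons keep working offline.
const CACHE = "deeptype-v1";
const ASSETS = ["/", "/index.html", "/app.js", "/lessons/core.json"]; // placeholder list

self.addEventListener("install", (event: any) => {
  event.waitUntil(caches.open(CACHE).then((cache) => cache.addAll(ASSETS)));
});

self.addEventListener("fetch", (event: any) => {
  // Serve from cache first, fall back to the network when online.
  event.respondWith(
    caches.match(event.request).then((hit) => hit ?? fetch(event.request))
  );
});
```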
Desktop App:
- Using Tauri, we package the same web app. Tauri essentially provides a Rust core that loads our HTML/CSS/JS and presents it in a native window (using the system’s webview). It also allows calling native code via an API bridge. We will use that for deeper integration: for example, file system access if we want to log data locally, or to use system-level TTS if needed. But we’ll minimize divergence – ideally the app behaves the same as in a browser.
- Building for desktop will produce .exe for Windows, .app for Mac, etc. All logic still runs in JS in the webview, but with optional assists from Rust. Tauri’s security model means we must explicitly allow any API calls from JS to Rust (limiting risk). According to the Tauri docs, you can build small (a few MB) apps since it doesn’t bundle a full Chromium the way Electron does (tauri-apps/tauri: Build smaller, faster, and more secure desktop and ...).
- We’ll include auto-update in the desktop version (Tauri has mechanisms or we can roll our own checking our server for updates).
- For desktop, one goal is working fully offline. With everything packaged and using on-device TTS, a user could install and run DeepType on a computer with no internet (good for secure environments or those without connectivity).
- One challenge: speech recognition offline. If the PC has no internet, Chrome’s engine won’t work (it sends to Google servers). We may implement a setting “offline mode” where voice commands are disabled or limited to what we can do locally. There are local speech recognition projects (Vosk, Coqui STT) that could be integrated in the desktop via the Rust side for offline STT. That’s an advanced feature – PRD marks it as a possibility if we target truly offline voice input.
Mobile App:
- If running as pure PWA: On Android, Chrome will allow install and mic usage easily. On iOS, Safari PWA has gotten better and does allow some offline caching and even speech synthesis via Web Speech. Speech recognition on iOS Safari might not be available (as Apple hasn’t enabled the API as of iOS 16/17), so on iPhone the voice commands might be limited unless we use a hack with an external service. For the best experience on mobile, a native wrapper with Capacitor is ideal:
- Capacitor wraps our web code into a WebView inside a minimal native app. We get access to Cordova/Capacitor plugins for things like SpeechRecognition (which under the hood could use Siri’s transcription or just present a native prompt). We’d use the capacitor-community speech recognition plugin, and text-to-speech plugin for consistency.
- With that, we can publish to App Store/Play Store. The code remains the same, just including capacitor JS bridge scripts and some config.
- Mobile considerations: touches and gestures. A blind user on mobile might use VoiceOver or TalkBack screen reader. We need to ensure compatibility (the app’s elements must be properly labeled so if VoiceOver is running, it can read the “Start Lesson” button, etc.). Alternatively, the user might rely solely on our in-app voice and not the system screen reader. We must handle both gracefully:
- Possibly detect if a screen reader is active (some OS allow detection) and then adjust (for example, if VoiceOver is on, we might not use our custom gestures to avoid conflict).
- The single-switch idea on mobile: a Bluetooth switch or just tapping anywhere on screen as the one button. We can implement a full-screen invisible button for “select”, and have an automated scanning focus (which the user hears). This is complex but doable (a minimal sketch follows these mobile notes); however, voice commands largely cover it too.
- Performance: Mobile devices have limited processing for AI. But since heavy AI (like GPT) calls are cloud-based, and TTS can be offloaded, the client just needs to handle audio playback/recording and UI. We’ll test on mid-range devices to ensure smooth audio and no lag in typing feedback.
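To show the single-switch scanning idea is tractable on the same web codebase, here is a minimal sketch; the scan interval and the set of scannable selectors are assumptions.

```typescript
// Single-switch scanning sketch: focus cycles through actionable elements on a timer,
// and any tap or keypress "selects" the currently focused element.
const SCAN_INTERVAL_MS = 1500; // assumed dwell time per option

export function startSwitchScanning(root: HTMLElement): () => void {
  const targets = Array.from(
    root.querySelectorAll<HTMLElement>("button, [role='button'], a[href]")
  );
  if (targets.length === 0) return () => {};

  let index = -1;
  const timer = window.setInterval(() => {
    index = (index + 1) % targets.length;
    targets[index].focus(); // the focused control is announced by TTS / screen reader
  }, SCAN_INTERVAL_MS);

  const select = () => targets[index]?.click();
  window.addEventListener("pointerdown", select);
  window.addEventListener("keydown", select);

  // Cleanup function to stop scanning when the user switches to voice or keyboard.
  return () => {
    window.clearInterval(timer);
    window.removeEventListener("pointerdown", select);
    window.removeEventListener("keydown", select);
  };
}
```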
Security & Privacy: Given we’re dealing with potentially personal data (user progress, maybe voice recordings), architecture includes:
- Using secure communications (HTTPS for all API calls).
- Storing minimal personal data (maybe just email and performance metrics). All sensitive AI processing (like speech or adaptive suggestions) can be done on the fly and not stored, or if stored, anonymized.
- If we allow cloud sync of user data, we’ll ensure compliance with privacy laws (GDPR etc.), provide data export/delete options.
- The voice data: if we do send voice to our servers (like for Whisper transcription), we’ll do it via secure WebSocket or HTTPS and not retain the raw audio after transcription (unless user opts in to share for improvement).
- These details would be in the PRD under a “Non-functional Requirements” section: performance (e.g., “The app should have <200 ms latency for keystroke feedback”), security (“User data encrypted at rest and in transit”), and accessibility (which, given the product’s nature, we treat as a functional requirement).
The cross-platform plan in summary: One codebase, modular design, deploy everywhere. This approach minimizes redundant work and ensures feature parity across platforms. Users can start on one device and continue on another seamlessly. For instance, a student practices on a PC at school, then later on their phone at home – DeepType will sync their progress via Supabase so the experience continues. This ubiquity is a strong point in our PRD because many existing tools are limited to one platform (e.g., legacy typing software on Windows only). We’re essentially making DeepType available wherever the user is, ensuring consistency and convenience.
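As a sketch of that Supabase-backed progress sync (table and column names are illustrative; credentials are assumed to be injected at build time):

```typescript
// Cross-device progress sync via Supabase (the schema shown here is illustrative).
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

export interface Progress {
  user_id: string;
  lesson_id: string;
  wpm: number;
  accuracy: number;
}

export async function saveProgress(p: Progress): Promise<void> {
  // One row per user; newer results overwrite older ones.
  const { error } = await supabase.from("progress").upsert(p, { onConflict: "user_id" });
  if (error) throw error;
}

export async function loadProgress(userId: string): Promise<Progress | null> {
  const { data, error } = await supabase
    .from("progress")
    .select("*")
    .eq("user_id", userId)
    .maybeSingle();
  if (error) throw error;
  return (data as Progress | null) ?? null;
}
```

Because rows are keyed by the authenticated user, a student can practice on a school PC and pick up the same lesson later on a phone, as described above.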
To assess DeepType’s strategic position, we conduct a SWOT analysis (Strengths, Weaknesses, Opportunities, Threats) along with a Gap Analysis of the current market solutions versus user needs.
Strengths:
- Accessibility Expertise: DeepType is designed with accessibility at its core. This specialized focus is a strong differentiator – few competitors can claim the same level of built-in support for blind users. We have first-mover advantage in this niche.
- AI-Powered Adaptivity: Our use of AI (GPT-4, adaptive algorithms) provides a personalized learning experience that static software cannot match. This not only improves effectiveness but also gives us a tech prestige (“the most advanced tutor out there”).
- Cross-Platform Reach: Being available on web, desktop, and mobile greatly expands our user base. Schools that use Windows PCs, individuals on Macs, and users in developing countries who primarily have Android phones – all can use DeepType. This ubiquity is a strength.
- Backed by Empathy Labs Vision: The strong branding and vision (Empathy + Deep Tech) builds trust. Stakeholders are more likely to support a mission-driven product. It’s not just software, it’s a cause. That can galvanize community support, volunteer contributions, etc.
- Community and Cost Advantage: If we keep the core free, we align with the NVDA approach, which saw massive community adoption (WebAIM: Screen Reader User Survey #10 Results). This can turn users into evangelists. Also, open-sourcing parts can harness community development (strengthening the product beyond our internal capacity).
Weaknesses:
- Limited Initial Content/Scope: As a new product, we might launch with a limited curriculum or features compared to mature competitors. For example, Talking Typing Teacher has a whole suite of lessons, games, and a word processor; we might not have all that on day one. Users might find content not deep enough if we don’t rapidly expand it.
- Reliance on AI/Internet: Some features (like voice AI or advanced adaptivity) rely on internet connectivity and third-party APIs. In scenarios of no internet, our experience might degrade compared to offline software that’s fully self-contained. Also, API costs (OpenAI etc.) could become a burden if usage scales and isn’t monetized proportionally.
- User Adoption Hurdle: Ironically, reaching the target users can be challenging. Many visually impaired learners depend on instructors or institutions to recommend tools. Convincing these gatekeepers (teachers, rehab specialists) to try a new product is a slow process. We also must support users who are not tech-savvy – the onboarding must be foolproof. Any small usability issue for a blind user could turn them away. So our margin for error is slim, especially with such a discerning audience that has been trained to rely on known solutions.
- Small Team & Resources: Initially, we are likely a small team. Implementing and maintaining multi-platform software with heavy AI might stretch our resources. We’ll need to prioritize carefully. Lack of certain domain expertise (e.g., if none of us are blind, we might misjudge some UX aspects) could be a weakness – that’s why involving beta users and experts early is crucial.
- Unproven Effectiveness: We believe our approach is superior, but we will need data to prove learning outcomes. Until we gather success stories or studies, some educators might be skeptical if AI adaptivity truly yields better results than, say, structured repetition. We might also face scrutiny: “Does this actually teach touch typing effectively, or is it too fancy?” Overcoming that initial doubt is a challenge.
Opportunities:
- Market Void / Blue Ocean: The niche of accessible typing tutors is under-served. The main products are few and dated (TypeAbility, Talking Typer, etc.), leaving a large global user base with needs not fully met. This void is our opportunity to become the de facto solution worldwide, much like how JAWS became synonymous with screen reader in the ’90s, or how Be My Eyes became essential on smartphones for visual assistance. We can define the category.
- Institutional Partnerships: Schools for the blind, vocational rehab centers, libraries, and disability organizations are actively looking for tools to improve digital literacy. For instance, partnering with national blindness federations or ministries of education could give DeepType an official channel to thousands of users. There are also grants in assistive tech we can tap into (e.g., governments funding tech for inclusive education).
- Technology Trends: Voice assistants and voice UIs are trending. People are increasingly comfortable talking to devices (Alexa, Siri, etc.). DeepType rides this wave as a voice-first educational app. We can piggyback on this trend in marketing, and even possibly integrate with those ecosystems (imagine an Alexa skill “typing practice” that is a subset of DeepType’s functionality – a stretch idea, but possible).
- Expansion to Other Skills: Once we establish ourselves in typing, the underlying tech (voice interaction + adaptive learning) could apply to other skills for visually impaired users: using a touchscreen, learning braille through audio drills, coding tutorials (teaching programming with spoken guidance), etc. DeepType could evolve into a suite (“DeepSkills” platform). The opportunity is to leverage our tech platform for multiple products, increasing ROI on our R&D.
- Corporate Accessibility Compliance: Companies are under pressure to be more inclusive. DeepType could be marketed to employers to help upskill blind employees or as a tool for diversity training. It might even be used by sighted employees to practice screen reader mode or keyboard-only usage as empathy training. That’s a tangential use case, but an opportunity for B2B sales beyond the obvious.
- Competitive Weaknesses: Our competitors have known weaknesses we can exploit:
- JAWS and TypeAbility: Expensive and thus not accessible to many (JAWS is $90/year or $1,200 perpetual (A New Way to Obtain JAWS and ZoomText | Accessworld | American Foundation for the Blind); TypeAbility requires JAWS to run, doubling the cost). We can undercut by being free/low-cost.
- Talking Typing Teacher: No longer actively supported (as noted, the manufacturer offers no technical support (Talking Typing Teacher - Standard)). If it’s abandonware, users will gladly switch to a modern, supported product.
- General Typing Software: They ignore blind users entirely – an open goal for us to score. Also, even for sighted users, many find traditional typing tutors boring – our unique voice-interactive approach might attract a subset of mainstream users (like those who prefer auditory learning or have ADHD and enjoy a more interactive style).
- Global Reach & Localization: Many developing countries have a growing population of visually impaired individuals with increasing access to tech (cheap Android phones, etc.), but absolutely no local-language typing tutors. We can lead in localization – using our AI to generate lessons in various languages quickly. The opportunity to become the go-to solution in non-English markets is huge (and the competition there is basically zero beyond English). This not only is good mission-wise but also opens avenues for funding from international agencies focusing on literacy and disability.
Threats:
- Competition from Big Tech: If our idea proves the market, big players might step in. For instance, Microsoft could enhance their Seeing AI app or Windows’ Narrator to include a typing tutor mode. They have resources and an existing user base (Seeing AI is free and popular (Seeing AI - Apps on Google Play); JAWS or NVDA could implement a tutorial feature in future versions). If an OS-level feature appears, it’s a serious threat (why install DeepType if Windows teaches typing out-of-the-box?). Our defense is to move quickly and establish brand loyalty and superiority before that happens.
- Rapid Tech Changes: The AI we rely on (OpenAI, etc.) is evolving. There’s a risk of API price increases or policy changes (for example, if OpenAI changes how their educational use licenses work or if a free tier is removed, etc.). Also, new AI models might outshine our approach, requiring continuous integration. If we fail to keep up with the latest (like not adopting Google’s superior model once out), a competitor could and get an edge.
- Funding Risks: As a mission product, if we don’t get the right funding, development could stall. It’s a threat that the project might not sustain purely on revenue early on (because we plan to subsidize users). We mitigate this by pursuing partnerships and proving value to paying customers ASAP.
- User Adoption Risks: The visually impaired community often relies on word-of-mouth and is careful with new tech (due to many products over-promising and under-delivering). If our early version has bugs or inaccessible pieces, word could spread and tarnish our reputation. We could get labeled as “not truly accessible” which is hard to recover from. So quality and community engagement are crucial to avoid the threat of negative perception.
- Legal/Compliance: If we were to collect voice data, there are privacy concerns. Mishandling user data could lead to legal issues or distrust. Also, as an educational tool, we need to be careful with claims – if we operate in the EU, we might need to comply with GDPR, etc. Not doing so could shut us out of key markets. While not an immediate competitor threat, regulatory issues can threaten the project’s reach (for instance, needing to ensure our TTS voices are licensed for our use case, etc.).
- Competition from Adjacent Fields: An indirect threat: a mainstream typing tutor might add an “accessibility mode” cheaply (like just adding voice prompts). Even if it’s not as good as DeepType, if it’s part of a product that’s already deployed widely, it could suck away potential users. For example, if Typing.com (a popular free web tutor) decided to add a screenreader-friendly mode, schools might just use that rather than try something new. We have to stay ahead by truly outperforming any half-measures others might do.
The gap analysis compares user needs (especially those of blind/low-vision learners) against what existing solutions provide, highlighting where those solutions fall short and how DeepType fills the gap:
- Need: Accessible, Non-Visual Guidance.
Gap in current solutions: Traditional typing software assumes you can see on-screen instructions or hands. Blind users instead use specialized programs or screen readers, but these often provide minimal guidance (maybe just spoken letters). Tools like Talking Typing Teacher addressed this with recorded (Talking Typing Teacher | BoundlessAT.com)137-L146】, but it’s not interactive beyond fixed lessons. Modern screen readers have a “keyboard learn mode” (press a key and it announces it), but no structured lessons or progression. DeepType’s Fill: We provide a full curriculum via audio, not just isolated key feedback. It’s interactive and context-aware voice guidance, which currently no mainstream or assistive tool fully offers (existing ones either have voice but no AI interactivity, or interactivity but not voice-first). We also allow voice input for control, which none of the old tutors do – they required keyboard navigation through menus, which is cumbersome. - Need: Adaptive Learning Pace.
Gap: Current teaching programs for the blind are one-size-fits-all. For example, TypeAbility has 99 lessons that everyone goes through the same way. If a student already knows some touch typing, they can’t easily skip ahead without manually picking lessons. Conversely, if they struggle, the software doesn’t custom-tailor more practice; the teacher would have to manually repeat lessons. DeepType’s Fill: The adaptive AI engine ensures each learner gets a customized experience – essentially a personal tutor adjusting on the fly. This is a major gap we fill; none of the existing products leverage AI or adaptive algorithms. We cite how modern AI can do this, e.g., “NVDA and JAWS don’t teach, and older tutors don’t adapt – DeepType is the first to personalize typing education in real-time.” - Need: Engagement & Motivation.
Gap: Learning to type by touch is tedious, more so when done with bland software. Many visually impaired students lose interest in current tutors because they involve repetitive drills with monotonous feedback. Gamification is minimal (Talking Typer might have some games, but audio games were limited due to tech of its time). Also, immediate encouragement is something a human teacher gives – software often just says “incorrect” in a flat tone. DeepType’s Fill: We use gamified elements and lively, human-like encouragement. The AI voice can vary phrases, crack a mild joke, or reference earlier progress (“You consistently get F right now – fantastic improvement!”). By tracking progress and awarding badges, we introduce a gaming element. There’s currently a gap in turning typing practice into something fun for blind users; DeepType intends to fill that with audio games and challenges. Even small touches like different sound effects for successes vs. mistakes can improve engagement, and our design documents emphasize a rich audio feedback scheme, whereas older software might just beep or say “wrong” in the same tone. - Need: Modern Platform Support.
Gap: A huge practical gap is that many existing solutions run only on certain platforms (mostly Windows). For instance, Talking Typing Teacher is a Windows program from decades ago (does it run on Windows 10/11 easily? Possibly with compatibility mode). Mac users or mobile users have zero options in that category. In today’s world, a learner might want to practice on their phone or tablet – currently not possible with specialized typing tutors. DeepType’s Fill: Cross-platform availability. We meet users where they are. No current competitor offers a mobile app for typing for the blind. DeepType will likely be the first to have that. This is a critical gap we fill: accessibility of the tutor itself on different devices. - Need: Support & Community.
Gap: As noted, some older products are no longer supported (no updates, no support lines). If a user hits a bug or compatibility issue, they’re stuck. Also, there’s no community around them (perhaps some mailing lists, but nothing active). DeepType’s Fill: As a new, mission-driven product, we plan to build a community forum where users can share experiences, ask questions, and where we (developers) actively respond. This community aspect plus active support (even if via email or chat) addresses the frustration gap. Also, if DeepType is offered partly open-source or free, communities of volunteers (like translators, or those making lesson content) can form, which doesn’t exist currently for closed old software. - Need: Affordability.
Gap: The cost of some solutions (JAWS + TypeAbility combo, or even just JAWS’s own training material) is high. Many individuals in low-income settings cannot afford these. There is a clear gap for a free or low-cost solution. NVDA filled that gap in screen readers, but for typing tutors, NVDA only goes so far with its basic help mode. DeepType’s Fill: Free core offering – we will ensure a student can learn touch typing without paying. This is filling a direct economic gap. By citing JAWS’ cost and NVDA’s free model (WebAIM: Screen Reader User Survey #10 Results) (A New Way to Obtain JAWS and ZoomText | Accessworld | American Foundation for the Blind), we strengthen the argument that free, quality tools see widespread adoption in this space. - Need: Integrations & Extendability.
Gap: Current tools are pretty closed. For example, a teacher can’t easily add custom lessons in Talking Typing Teacher beyond what’s built-in (maybe they have some minor customization, but likely limited). And those don’t integrate with learning management systems or other tools. DeepType’s Fill: Because it’s modern and API-driven, we could integrate with other systems. For instance, a teacher could download a report of a student’s progress or we could integrate with a braille display (to output the text being typed in braille for deaf-blind users – a future possibility). Being built on web tech, we also can update content continuously, push new lessons, etc., which older software cannot unless you install an update. So we fill the gap of an evolving platform versus a static product.
In summary, the gap analysis reveals that DeepType addresses multiple unmet needs: a truly accessible interface, personalized learning, engaging experience, multi-device support, and affordability – none of which are collectively present in any one existing product. This analysis reinforces our value proposition: DeepType isn’t just an incremental improvement, it’s a generational leap in assistive typing education. Our strategy is to communicate this clearly to users and stakeholders: we’re solving problems that have lingered for years in this domain.
To build DeepType efficiently, we break down the development into features and modules, and allocate “deep research” resources (like focused R&D tasks or use of large context AI analysis, measured in notional tokens) to each. We also consider the complexity (for debugging and planning).
Below is the feature development order (roughly in priority) along with an estimation of research and debugging complexity:
1. Audio Lesson Engine – Priority: Highest (MVP)
Description: The core system to play audio instructions, accept keystroke input, and give immediate audio feedback. This includes the basics of lesson scripting (sequences of prompts and expected inputs).
Research Allocation: 🔎🔎 (2 tokens) – Requires researching best practices in teaching touch typing (education domain research) and tuning TTS for clarity. The basic approach is known (many tutors do this), so minimal deep AI research needed beyond selecting a pleasant voice and designing effective prompt wording. Some research into phonetics may help (ensuring letters are pronounced distinctly, e.g., “F” vs “S”).
Complexity/Debugging: Medium. Handling keyboard events and timing feedback is straightforward, but we must fine-tune for different typing speeds and ensure no input is missed. Debugging involves making sure fast typists don’t break it and slow typists aren’t rushed. Also must debug how it behaves if user presses wrong keys multiple times, etc. -
2. Voice Command & Control – Priority: High (MVP adjunct)
Description: Implementing the ability to navigate menus or trigger actions with speech (and alternatively with a single key). For MVP, focus on a few essential commands like “repeat”, “next”, “pause”.
Research Allocation: 🔎🔎🔎 (3 tokens) – We need to do a deep dive on speech recognition integration. Specifically, research how to use the Web Speech API or Whisper for near-real-time command recognition. This might involve experimenting with different recognition approaches (browser vs server) and measuring latency. There’s also research on keyword spotting (like maybe always listening for the word “DeepType” or a wake word). We might allocate one token for investigating Web Speech API constraints, one for Whisper API chunking for near-live transcription (Transcribe via Whisper in real-time / live - API - OpenAI Developer Community), and one for designing a command vocabulary that’s easy to recognize (e.g., avoiding hard-to-distinguish words).
Complexity/Debugging: High. Speech input can be unpredictable (accents, noise). Debugging this involves lots of testing with different voices and ensuring false positives/negatives are minimized. We must also ensure that the voice recognition doesn’t interfere with the lesson audio (it might pick up the tutor’s voice – we likely have to mute the recognizer when the tutor is talking, etc.). One-button control also needs careful state management (scanning menus etc., where timing can be tricky to debug).
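To make the browser-side option concrete, here is a minimal sketch of keyword spotting with the Web Speech API. It is not the shipped implementation: the command list, the `onVoiceCommand` callback, and the fallback behaviour are illustrative assumptions, and a Whisper-based server path would replace this where `SpeechRecognition` is unsupported.

```js
// Minimal sketch: continuous keyword spotting with the Web Speech API.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

function initCommandListener(onVoiceCommand) {
  if (!SpeechRecognition) {
    console.warn("SpeechRecognition not supported; keyboard-only control remains available.");
    return null;
  }
  const recognizer = new SpeechRecognition();
  recognizer.continuous = true;       // keep listening between commands
  recognizer.interimResults = false;  // act only on final results
  recognizer.lang = 'en-US';

  recognizer.onresult = (event) => {
    const phrase = event.results[event.results.length - 1][0].transcript.trim().toLowerCase();
    // Simple keyword spotting against a small, easy-to-distinguish vocabulary.
    const commands = ['repeat', 'next', 'pause', 'menu', 'start', 'help'];
    const match = commands.find((c) => phrase.includes(c));
    if (match) onVoiceCommand(match);
  };
  recognizer.onerror = (e) => console.warn('Recognition error:', e.error);
  recognizer.start();
  return recognizer;
}
```

In practice the recognizer would also be paused while the tutor voice is speaking, for the echo reasons noted above.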
3. Curriculum Content & Progression – Priority: High
Description: Develop the actual lessons from beginner to advanced. Define each lesson’s content (keys introduced, practice words, etc.), and the progression logic (# of exercises before unlocking next lesson, criteria to pass).
Research Allocation: 🔎 (1 token) – Mostly educational design rather than technical. We might use one deep research cycle to review existing touch typing curricula (like what order keys are taught in QWERTY, proven methods such as home row first, etc.). Perhaps consult educational research on typing for blind learners (maybe one exists from APH or Perkins). Use AI to generate a bank of practice sentences that are phonetically diverse and relevant. This is not heavy on algorithm research, more on content quality.
Complexity/Debugging: Low/Medium. Content itself doesn’t “bug out” in the software sense, but ensuring it’s effective is key. We will need user testing to adjust difficulty. The main debugging is ensuring the system properly loads and transitions between lessons according to the curriculum definitions (state machine bugs, etc.). That’s manageable. -
4. Adaptive Learning Algorithm – Priority: Medium-High (after basic version works)
Description: The AI that adjusts difficulty and practice based on user performance. Initially maybe rule-based (if error rate > 20%, repeat lesson), eventually ML-driven (pattern recognition).
Research Allocation: 🔎🔎🔎🔎 (4 tokens) – This is a core differentiator, so we allocate significant deep research. We’d research:
- Optimal adaptive learning techniques (perhaps look at how Khan Academy or language-learning apps do it).
- Possibly use a reinforcement learning or Bayesian model to decide when a user is ready to progress. An AI research token might go into prototyping a model that predicts “mastery” of a key.
- If using GPT to analyze mistakes, allocate a token to prompt engineering: e.g., feeding it the sequence of user inputs and asking it to suggest which keys to focus on.
- Research also includes user modeling: how to keep a profile of user strengths/weaknesses. We might dedicate one token to reading academic papers on personalized learning for keyboard or similar.
Complexity/Debugging: High. This feature can get complex, as it introduces a lot of condition branches and data handling. Debugging means verifying the adaptation does what we expect – e.g., does it properly detect patterns? We’ll create simulation tests (feed in a fake user who always messes up certain keys, see if it adapts accordingly). Also need to ensure it doesn’t make the experience erratic (too easy or too hard jumps). The complexity is both in design and in testing the machine learning components. We will likely implement a simpler heuristic first (easier to debug), then gradually hand over to AI suggestions.
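Before any ML is involved, the “simpler heuristic first” idea could look like the sketch below. The per-key error-rate thresholds (20% error rate, five-attempt minimum) and the function names are illustrative assumptions, not the final adaptive engine.

```js
// Minimal sketch: rule-based first pass for adaptivity.
const keyStats = {}; // e.g., { f: { attempts: 12, errors: 3 } }

function recordAttempt(key, wasCorrect) {
  const s = keyStats[key] ?? (keyStats[key] = { attempts: 0, errors: 0 });
  s.attempts += 1;
  if (!wasCorrect) s.errors += 1;
}

function errorRate(key) {
  const s = keyStats[key];
  return s && s.attempts > 0 ? s.errors / s.attempts : 0;
}

// Repeat the lesson if any of its keys is still shaky after enough attempts.
function shouldRepeatLesson(lessonKeys) {
  return lessonKeys.some((k) => {
    const s = keyStats[k];
    return s && s.attempts >= 5 && errorRate(k) > 0.2;
  });
}

// Pick the weakest keys to emphasize in the next practice set.
function weakestKeys(limit = 3) {
  return Object.keys(keyStats)
    .sort((a, b) => errorRate(b) - errorRate(a))
    .slice(0, limit);
}
```

A rules layer like this is also easy to simulate against the “fake user who always messes up certain keys” test described above, before any AI-generated suggestions are layered on top.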
5. Multi-language & Localization Support – Priority: Medium
Description: Allow the tutor to teach typing in different languages (different keyboard layouts) and localize the interface and voice prompts to other languages.
Research Allocation: 🔎🔎 (2 tokens) – We’d do research on how different keyboard layouts (AZERTY, etc.) should be taught – order of keys might differ. Also research on language-specific considerations (for example, some languages have accented characters, we must handle speaking those). One token for investigating TTS and STT capabilities in target languages (ensuring our chosen APIs have good Spanish, French, etc. voices and recognition). Another token for internationalization framework (how to structure our content so it’s translation-friendly). Possibly use AI to assist in translating content or generating language-specific practice text.
Complexity/Debugging: Medium. Internationalization in code can cause bugs (e.g., text not appearing if locale switch fails, or right-to-left languages messing layout). We’ll need to test each language environment thoroughly. Also if we support switching keyboard layout, the input key codes vs expected letters mapping must be handled – not too hard, but details matter (e.g., on a French keyboard hitting the key labeled ‘A’ actually sends a different scancode). We should plan a robust key mapping module. Debugging that requires physically testing or emulating different keyboard layouts.
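One way the key-mapping concern could be handled in the browser is sketched below: `KeyboardEvent.key` already reports the character produced by the active layout, while `KeyboardEvent.code` reports the physical key, which is what finger-placement hints need. The `getKeyLabel` helper and the partial AZERTY table are illustrative assumptions.

```js
// Minimal sketch: layout-aware keystroke checks and physical-key labels.

// event.key respects the OS keyboard layout, so correctness checks can compare it directly.
function checkKeystroke(event, expectedChar) {
  return event.key.toLowerCase() === expectedChar.toLowerCase();
}

// event.code names the physical key (e.g., 'KeyQ'); map it to the label printed
// on that key for the chosen layout when giving finger-placement guidance.
const physicalKeyLabels = {
  azerty: { KeyQ: 'A', KeyA: 'Q', KeyW: 'Z', KeyZ: 'W' }, // partial, illustrative
  qwerty: {} // on QWERTY the label matches the code suffix, e.g. KeyA -> 'A'
};

function getKeyLabel(code, layout = 'qwerty') {
  const overrides = physicalKeyLabels[layout] || {};
  return overrides[code] || code.replace('Key', '');
}
```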
6. Gamified Modules (Audio Games) – Priority: Medium (Enhancement)
Description: Develop game-like exercises (e.g., a “typing race” where the user must type words quickly to win, or an audio target practice as described).
Research Allocation: 🔎🔎 (2 tokens) – Research audio game design for the blind. There’s a niche community and some existing games (like audio shoot ’em ups). We’d spend tokens on learning what makes audio games fun and how to convey game state via sound. Also maybe research using spatial audio cues (e.g., a sound panning left/right to indicate something). If we incorporate scoring, research psychological aspects of reward systems.
Complexity/Debugging: Medium. Game logic can be complex but contained. The main debugging is making sure the games remain accessible (no unintended need for vision) and that timing loops are correct (e.g., in a fast-paced game, ensure performance on different devices doesn’t lag). There might be interesting edge cases like pausing a game, or what if the user speaks a command during a game unintentionally. We’ll sandbox games so they don’t break the main flow. They are optional modules, so they won’t hold up core launch if issues arise (they can be beta features toggled off if needed).
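For the spatial-audio idea mentioned above, the Web Audio API’s `StereoPannerNode` is one plausible building block. This is a hedged sketch; the frequencies, durations, and the “target approaching from the left” usage are illustrative assumptions.

```js
// Minimal sketch: a left/right audio cue for an audio game target.
const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
// Note: browsers may require audioCtx.resume() after a user gesture before sound plays.

function playPannedTone(pan = 0, frequency = 440, durationMs = 150) {
  const osc = audioCtx.createOscillator();
  const panner = audioCtx.createStereoPanner();
  const gain = audioCtx.createGain();

  osc.frequency.value = frequency;  // pitch of the cue
  panner.pan.value = pan;           // -1 = hard left, 1 = hard right
  gain.gain.value = 0.3;            // keep game cues quieter than speech

  osc.connect(panner).connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + durationMs / 1000);
}

// Example: a target "approaching from the left" pans gradually toward center.
playPannedTone(-1);
setTimeout(() => playPannedTone(-0.5), 300);
setTimeout(() => playPannedTone(0), 600);
```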
7. User Management & Cloud Sync – Priority: Medium
Description: Enabling user accounts, saving progress to cloud, and syncing across devices. Also includes the teacher dashboard (for multiple users).
Research Allocation: 🔎 (1 token) – Most of this is straightforward use of Supabase (which we already know). Minimal research needed beyond reading Supabase docs thoroughly and perhaps looking into data structures for storing metrics. One token might be spent on security/privacy best practices, ensuring we implement auth securely and consider data encryption. If doing a teacher portal, research compliance with student data regulations (FERPA in US schools, etc.).
Complexity/Debugging: Medium. Integrating auth can introduce edge cases (password resets, offline mode for non-logged in, etc.). Data sync bugs might include duplicate records or conflicts if offline edits are allowed. The teacher dashboard is essentially a separate interface filtering through data – complexity depends on how fancy we get (if just viewing stats, it’s simpler; if real-time monitoring or control, more complex). We will likely phase this: initially just personal accounts, later the teacher view. Testing multi-user scenarios and data privacy (ensuring user A can’t see user B’s data) is crucial.
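A minimal sketch of what the Supabase wiring could look like with supabase-js v2 follows. The table and column names (`progress`, `lesson_id`, `accuracy`) and the credential placeholders are assumptions; the “user A can’t see user B’s data” requirement would be enforced server-side with Supabase row-level security policies rather than in this client code.

```js
// Minimal sketch: account sign-in and progress sync via supabase-js v2.
import { createClient } from '@supabase/supabase-js';

const SUPABASE_URL = 'https://YOUR-PROJECT.supabase.co'; // placeholder project URL
const SUPABASE_ANON_KEY = 'public-anon-key';              // placeholder anon key
const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY);

async function signIn(email, password) {
  const { data, error } = await supabase.auth.signInWithPassword({ email, password });
  if (error) throw error;
  return data.user;
}

async function saveProgress(lessonId, accuracy) {
  const { error } = await supabase.from('progress').insert({ lesson_id: lessonId, accuracy });
  if (error) console.error('Progress sync failed, keeping local copy:', error.message);
}

async function loadProgress() {
  // With row-level security, only the signed-in user's rows come back.
  const { data, error } = await supabase.from('progress').select('*');
  return error ? [] : data;
}
```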
8. Accessibility QA & Polishing – Priority: Continuous
Description: This is not a single feature but an ongoing task: conducting thorough accessibility testing (with screen readers, various assistive tech) and addressing any gaps.
Research Allocation: 🔎 (1 token) – We allocate “deep research” for keeping up with accessibility best practices. That means reading updated WCAG guidelines, ARIA techniques, or consulting with accessibility experts. Possibly use an AI to audit our UI code for accessibility issues.
Complexity/Debugging: Medium-High. Debugging accessibility is a bit different: it might be fixing an ARIA label or adjusting focus order – not complex algorithmically, but requires meticulous attention. We will treat any accessibility issue as a high priority bug. For example, if during testing we find that a screen reader announces something incorrectly, that needs fixing. It’s an iterative process with each UI component. Also making sure our one-switch navigation doesn’t trap a user or cause an endless loop is something to carefully test (simulate a user only hitting spacebar and ensure they can do everything, albeit slowly). -
9. Advanced AI Tutor (“Conversational Mode”) – Priority: Low (Future enhancement)
Description: A mode where the user can ask the AI questions like “What fingers should I use for this?” or even have a conversational lesson summary (“What did I do wrong today?” – and the AI explains). This leverages the LLM to act like a tutor you can talk to.
Research Allocation: 🔎🔎🔎 (3 tokens) – This is a bleeding-edge feature, so heavy research. We’d experiment with prompting GPT-4 to act as a tutor given a transcript of the session. One token on figuring out how to condense session data for the prompt (maybe we use a summary), one on natural language understanding of user questions (maybe some fine-tuning or prompt library like “if user asks something about typing technique, answer from our knowledge base”), and one on voice interaction specifics (maintaining context in a voice conversation, possibly using OpenAI’s conversation mode or multi-turn handling). Also research if any similar conversational tutors exist (for other subjects) to learn from their approach.
Complexity/Debugging: High. This combines many systems: ASR (speech to text) to get the question, the LLM to answer, TTS to speak back. There’s potential for error at each stage (misheard question, or nonsensical AI answer). We need to constrain the AI to be accurate and not give harmful advice. Debugging requires a lot of testing with various questions. We may need to implement a fallback if AI fails (like a static FAQ database as backup). This feature would likely be introduced carefully as a beta with disclaimers.
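To make the session-summary prompting idea concrete, here is a hedged sketch of the LLM step only (the ASR and TTS stages sit around it). It assumes the OpenAI Chat Completions REST endpoint; the model name, prompt wording, and fallback message are placeholders, and in production the call would be proxied through server/server.js so the API key never reaches the client.

```js
// Minimal sketch: answer a learner's question from a condensed session summary.
async function askTutor(question, sessionSummary, apiKey) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini', // placeholder model choice
      messages: [
        {
          role: 'system',
          content:
            'You are DeepType, a patient touch-typing tutor for blind learners. ' +
            'Answer briefly, in spoken-friendly sentences, using only the session data provided.'
        },
        { role: 'user', content: `Session summary: ${sessionSummary}\n\nQuestion: ${question}` }
      ]
    })
  });
  if (!response.ok) {
    // Fall back to a static FAQ lookup (not shown) if the API call fails.
    return "Sorry, I couldn't reach the tutor service. Please try again later.";
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```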
Each feature/module will be tackled in roughly the above order, though some can be parallel (e.g., content development can happen alongside coding the engine, voice command R&D can parallel basic engine coding).
The term “token-based deep research” implies we will use our access to AI (large context models, web research) in measured chunks to answer key unknowns for each feature. For instance, before implementing the voice command, we might spend one “token” (a dedicated session) with GPT-4 browsing literature on best voice UX practices, and another token to prototype the code using an AI coding assistant. By allocating these tokens, we ensure we do not go in blind on complex features – we first gather insights or even let an AI help structure the solution.
Debugging Complexity Allocation: We acknowledge some features will consume more debugging time (e.g., speech-related features, adaptive logic). We allocate our development sprints accordingly:
- The initial Audio Lesson Engine and basic UI gets a large chunk of initial debugging allocation since that must be rock-solid (this is our foundation).
- Voice commands and adaptivity we will release in beta phases, expecting to gather bug reports and iterate.
- We’ll maintain a testing matrix (different OS, browsers, with/without screen readers, etc.) and allocate time to run through it for each major release.
Finally, the roadmap in terms of timeline (as partially outlined in the pitch deck’s slide 8) is:
- Phase 1 (Months 0-3): Implement features 1, 2, 3 to deliver a functional MVP for English on web/desktop. Conduct deep research for adaptivity but implement basic logic first. Begin user testing with a small group (maybe from a local blind community or online forum).
- Phase 2 (Months 4-6): Implement adaptivity (feature 4) and polish based on feedback. Add more lessons to cover full keyboard. Introduce user accounts (part of 7) if needed for testers. Beta release to broader audience, possibly in partnership with an organization to pilot in a class.
- Phase 3 (Months 6-9): Add some gamified exercises (feature 6) to increase engagement. Start mobile packaging. Incorporate localization groundwork (partial feature 5) so non-English testers can try Spanish or other languages by the end of this phase.
- Phase 4 (Months 9-12): Focus on robust cloud sync and teacher dashboard (rest of 7) for institutional use. Expand language support. Hardening accessibility compliance (8) in preparation for an official 1.0 launch. Maybe start work on the conversational AI tutor (9) in R&D, though that might go into next year’s plan.
- Beyond: After core product is stable and widely adopted, roll out the advanced AI tutor as a premium add-on and explore other skill modules.
This feature breakdown and roadmap ensures that we tackle critical needs first (so users get value early), while also setting aside time to delve into complex features with adequate research. By assigning “tokens” of deep research to the hairy problems, we leverage AI and existing knowledge to de-risk our development path. Each completed feature moves us closer to the vision of a comprehensive, intelligent typing tutor that leaves no learner behind.
DeepType’s UX/UI is governed by a simple rule: if it’s not accessible, it doesn’t go in the product. Every interface element and interaction is crafted to be usable by our target users. Here we outline key guidelines and principles that our design and development must follow, ensuring an accessibility-first experience:
Designing for a voice-first interface means assuming the primary mode of user interaction is through spoken dialogue and audio feedback:
- Everything is Announced: The user should never have to guess what’s happening. Whenever the app state changes, a concise announcement is made. For example, when a new lesson starts, it might say, “Lesson 5: Typing words with S and D. Press any key to begin.” When a lesson is completed, it announces the result (“Lesson complete! Accuracy: 90%, Speed: 12 WPM. Great job!”).
- Conversational Tone: The voice interactions should feel natural. Instead of robotic commands, we use conversational language. This means using first person (“I will now show you...”) or second person (“You can try that again”) to create a rapport. According to usability research, voice interfaces should be conversational and human-like to be effective (3 Reasons Why It's Time to Talk about Voice UI - Frog Design). We apply this by giving our TTS prompts some personality (while staying professional and clear).
- Short Prompts & Confirmations: Users can’t see a long list of options, so voice prompts should be brief and not overload memory. For example, in a menu, don’t read all options at once if there are many; instead, present them one at a time or in small groups. Use auditory icons if helpful (like a subtle tone indicating more options available). After a voice command is given by the user, the system should confirm it understood (e.g., User: “repeat”, System: “Repeating the instruction: [then repeats].”). This confirmation principle prevents confusion in case the speech recognition misheard something.
- Tolerance and Recovery: If the system doesn’t catch a voice command or the user says something unexpected, it should handle it gracefully. Maybe provide a gentle reprompt: “Sorry, I didn’t get that. You can say ‘repeat’ or ‘menu’.” This ensures the user never feels stuck. Designing these flows is critical: every voice prompt in our flowchart has an error handling branch.
- No Unnecessary Voice Input: We keep required voice input minimal. While we support voice commands, we don’t force the user to speak at any time if they’re not comfortable. There’s always an alternative like pressing a key or clicking. Voice is offered as a convenience and necessity for some, but not mandatory. This principle is inclusive (some blind users are non-verbal or simply shy to speak commands, and some have speech impairments).
- Environmental Awareness: Recognize that users might be in noisy environments or around others. We allow the user to use headphones and still operate. For privacy or quiet settings, we might include an option for vibrational feedback or subtle sounds instead of spoken feedback (for example, a user who is deaf-blind might rely on vibrations – although that’s a very niche case, advanced but possible with a Braille display’s vibrate feature or phone vibration on key events). Our UI should consider an option “mute voice output” which then outputs to braille display or just uses beeps for correct/wrong (for those who can’t hear). This is part of being voice-first but not voice-only.
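The announcements described in this section could be produced with the browser’s SpeechSynthesis API; the sketch below is one plausible shape for such a helper (the shipped voice.js module might instead wrap a cloud TTS voice, and the `interrupt` option is an assumption).

```js
// Minimal sketch: concise state-change announcements via SpeechSynthesis.
function announce(text, { interrupt = false } = {}) {
  if (!('speechSynthesis' in window)) return; // beep/braille fallback would go here
  if (interrupt) window.speechSynthesis.cancel(); // cut off stale speech for urgent updates
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.0;   // user-adjustable speaking rate
  utterance.pitch = 1.0;
  window.speechSynthesis.speak(utterance);
}

// Usage: announce each state change briefly, as described above.
announce('Lesson 5: Typing words with S and D. Press any key to begin.');
announce('Lesson complete! Accuracy: 90 percent, speed: 12 words per minute.', { interrupt: true });
```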
We design DeepType such that a user with only one switch or one finger available can still navigate the entire application:
- Focus Highlight and Scan: At any given screen (or menu), one element is in focus (virtually). For example, on startup, “Start Lesson” might be the focused item. If the user presses the “Select” button (space bar or a hardware switch), that item activates. If the user does nothing for a moment or presses a different special key (or the same button depending on config), the focus will move to the next item and announce it. This is a sequential scan mechanism. It’s similar to how many switch-accessibility interfaces work, and also how TV interfaces or old Nokia phones worked. This will be an optional mode if voice commands aren’t used.
- Timing vs Manual Advance: Some scanning UIs auto-advance focus after X seconds. We likely prefer manual advance (press to move focus) because timing can be stressful and users might miss the window. Manual gives users full control: e.g., press Tab or a special “next” key to cycle focus, and Space or “select” to activate. We ensure that both the physical keyboard and an on-screen single big button can do these (on touchscreen, perhaps tap = select, long press = next, or vice versa — we have to pick an intuitive mapping).
- Consistent Layout: The number of interactive elements at any time is kept minimal to aid scanning. For example, the main menu might have 3 options (Start, Settings, Exit). In a lesson, perhaps only one or two (maybe a “stop” button). By limiting choices, we reduce how much the user must scan through. This is a guideline: keep UI screens simple. If a complex input is needed (like entering an email for signup), we handle it with a special flow that is still single-switch friendly (e.g., an onscreen keyboard that scans through letters group by group, although typing an email might be easier if we allow voice dictation or just let a sighted assistant do that step).
- Visible Focus Indicator: For low-vision and sighted support observers, we will have a high-contrast focus ring or highlight around the currently focused element. Perhaps a thick yellow outline or a bright glow. This is an ARIA best practice for keyboard navigation – ensure focus is not hidden. We also might enlarge the focused item or put a subtle animation (like pulsing) to catch attention. This helps users who can see a bit to follow along, and it’s also good for any keyboard user.
- Auditory Focus Indicator: In addition to the voice reading the focused element (“Settings”), we could have a sound cue when moving focus. For instance, a tick sound each time focus advances, and a slightly different tone when wrapping around back to first item. These auditory cues help users understand the interface structure (like “there were 3 items, I heard 3 ticks and now a wrap-around sound, so I know I’m back at top”). This concept comes from existing screen reader behaviors and auditory UI design.
- ARIA Roles for Widgets: We will use ARIA roles to inform assistive tech that our custom scanning UI is a list of menu items. For example, in HTML we might mark the menu container with `role="menubar"` and each option as `role="menuitem"`. This way, if a screen reader is running, it knows to treat them as a menu. Even though we have our own voice, this ensures compatibility. According to MDN, ARIA roles provide semantic meaning so screen readers can present content consistently (WAI-ARIA Roles - Accessibility | MDN). We implement roles like `button`, `menuitem`, and `slider` (if we had any slider controls) so that any built-in AT recognizes DeepType’s components. Even our custom “focus highlight” that moves could be conveyed via ARIA focus changes (like moving a real invisible focus).
- No hover dependency, large click targets: We design for keyboard, not mouse hover. So all interactions must be triggered by focus+activate, never just hover. This is important as many blind users can’t hover, and for motor-impaired users, hovering might not be possible. We also ensure buttons are large and well spaced (easy to hit with a gaze or a shaky hand if using a switch). This ties into high contrast (next section) – large, distinct buttons.
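The auditory focus cues described above (a tick on each focus advance, a distinct tone on wrap-around) could be produced with the Web Audio API. This is a hedged sketch; the frequencies, durations, and function names are illustrative assumptions.

```js
// Minimal sketch: auditory focus cues for the scanning UI.
const uiAudio = new (window.AudioContext || window.webkitAudioContext)();

function playTick(frequency = 880, durationMs = 60) {
  const osc = uiAudio.createOscillator();
  const gain = uiAudio.createGain();
  osc.frequency.value = frequency;
  gain.gain.value = 0.2;                       // quieter than the tutor voice
  osc.connect(gain).connect(uiAudio.destination);
  osc.start();
  osc.stop(uiAudio.currentTime + durationMs / 1000);
}

function onFocusAdvanced(newIndex, itemCount) {
  if (newIndex === 0 && itemCount > 1) {
    playTick(440, 120); // wrapped back to the first item: lower, longer tone
  } else {
    playTick();         // normal advance: short high tick
  }
}
```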
While DeepType can be used without looking, we still provide a visual interface that is optimized for low vision and color-blind users:
- Color and Contrast: We adhere to at least WCAG 2.1 AA contrast ratio (4.5:1 for normal text, 3:1 for large text) and aim for AAA (7:1) wherever feasible. The default theme likely will be white text on black background or vice versa, which yields very high contrast. Our palette will be limited to a few colors, used consistently (e.g., one accent color for focus or correct input indicators). We’ll avoid color combinations known to be problematic (like red/green together, since many have red-green color blindness). Any color coding will also have a secondary indicator (like a symbol or text). For example, if we use green text for correct and red for wrong on a visible scoreboard, we’ll also prefix with a “✔” or “✖” symbol so color isn’t the only cue.
- Font and Size: All text is in a clear, sans-serif font (for readability, e.g., Arial, Verdana, or a specifically accessible font like APHont or Atkinson Hyperlegible). We’ll use a large base font size (at least 18px for body, larger for headings). Users can also customize text size in settings. For dyslexic users, maybe allow a font choice (though our main audience is visually impaired, some low-vision users might also have tracking difficulties, so we consider fonts accordingly).
- UI Element Design: Buttons and other controls will have strong outlines and filled shapes to distinguish them. For example, instead of a thin outline checkbox, we might use a big toggle switch with labels “On/Off”. We’ll ensure the focus state of a button is very visually obvious (often default focus indicators are tiny dotted lines – we’ll override that with something bolder).
- Reduced Clutter: A sparse design benefits low-vision users who might zoom in. We keep screens uncluttered so that zooming doesn’t hide critical info off-screen. Also, fewer elements means easier high-contrast styling (no need to differentiate many shades). We try to use text and simple icons, avoiding background images or patterns that could reduce contrast or introduce confusion.
- Dark Mode / Light Mode: Likely, a dark background with light text is best for many visually impaired (less glare). But some prefer light background. We’ll offer at least these two high-contrast modes out of the box, possibly more (like yellow on black, which some with tunnel vision prefer). The user can choose the scheme that suits their vision. All our color choices will be stored as variables so that switching theme is seamless.
- Testing: We will test the interface with common color-blind filters and contrast checkers. Also test on a monochrome setting (imagine someone using a device in high contrast mode where everything is forced black & white – our design should still function). We also consider Windows High Contrast mode (which overrides app colors). Ideally, our app still works if OS forces its palette (which it will if running as a Windows app with high contrast enabled – we should ensure our text doesn’t disappear or something under those conditions).
- No Reliance on Vision for Key Info: Even though we make visuals as clear as possible, we still adhere to a rule: anything indicated visually (like “this letter on the on-screen keyboard is highlighted”) is also conveyed via audio. This overlaps with voice-first, but it’s a guideline to ensure equal access. For instance, if a visual user sees a progress bar, a blind user hears a progress percentage spoken. The high-contrast visual is for those who use it, but not mandatory to understand.
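As the Dark Mode / Light Mode item above notes, color choices will be stored as variables so switching theme is seamless. A minimal sketch of that idea using CSS custom properties is below; the theme names and hex values are illustrative assumptions, not the final palette.

```js
// Minimal sketch: high-contrast theme switching via CSS custom properties.
const themes = {
  'dark-high-contrast':  { '--bg': '#000000', '--fg': '#ffffff', '--accent': '#ffd400' },
  'light-high-contrast': { '--bg': '#ffffff', '--fg': '#000000', '--accent': '#0000cc' },
  'yellow-on-black':     { '--bg': '#000000', '--fg': '#ffff00', '--accent': '#ffffff' }
};

function applyTheme(name) {
  const theme = themes[name];
  if (!theme) return;
  for (const [variable, value] of Object.entries(theme)) {
    document.documentElement.style.setProperty(variable, value);
  }
  localStorage.setItem('deeptype-theme', name); // persist the user's choice
}

// Respect a saved preference on startup, defaulting to the dark theme.
applyTheme(localStorage.getItem('deeptype-theme') || 'dark-high-contrast');
```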
Using proper ARIA (Accessible Rich Internet Applications) attributes is essential to make our custom UI understandable by screen readers and assistive tech:
- Role Attribution: Every interactive element gets an appropriate `role`. Buttons will be `<button>` HTML elements or `role="button"` if a custom element. Links use `<a>` or `role="link"`. If we have a custom control (like a toggle or a non-standard widget), we’ll find the closest ARIA role (e.g., `role="switch"` for an on/off toggle, which screen readers announce as a toggle and include state). ARIA roles ensure that assistive tools present and support interaction in a consistent way (WAI-ARIA Roles - Accessibility | MDN).
- Labels and Descriptions: We must provide labels for controls that have no visible text. For example, an icon-only button (if we had one) needs `aria-label="Pause lesson"` so screen readers know what it is. Even if there is visible text, sometimes we may want a clearer screen reader label. For instance, a “Next >” button might be clearer as “Next Lesson” for a screen reader, so we’d do `aria-label="Next Lesson"` on it. We’ll also use `aria-describedby` where longer help text is relevant. For example, an input field for a profile name might have a description like “We use this to greet you in the app”, and we link that via `aria-describedby` so the user can hear it if they navigate for more info.
- Live Regions: During lessons, there may be dynamically updating text (like a live WPM speed or an error counter). We will utilize ARIA live regions (`aria-live="polite"` or `aria-live="assertive"` depending on importance) to announce changes. E.g., if we display “Errors: 3”, each time it increments we might announce “Mistake count: 3”. Using a live region allows screen readers to automatically read changes without losing focus context. We’d likely set it to “polite” so it waits until the user is not in the middle of something to announce (see the sketch after this list).
- Focus Management: ARIA alone is not enough; we also manage keyboard focus. After certain actions, we might need to programmatically move focus to a logical place. For instance, after closing a modal dialog (like a help popup), return focus to the element that opened it. We’ll follow WAI-ARIA authoring practices for dialogs, menus, etc., which specify where focus should go, and use `aria-modal` and `aria-expanded` on toggles, etc. If we make a custom dropdown or pop-up, we ensure to trap focus within it while open and restore on close (these details matter for screen reader and keyboard-only users).
- Testing with Screen Readers: We will test with NVDA, JAWS, and VoiceOver to ensure our ARIA labels make sense in the actual announcement. Sometimes what we think is clear might be verbose or awkward when spoken. We might fine-tune labels accordingly. For example, if a screen reader already says “Button”, we don’t need to include the word “button” in our label (to avoid “Start button button”). We’ll use accessible name computation rules to get optimal output.
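A minimal sketch of the live-region pattern from the Live Regions item above follows; the element id, wording, and `aria-atomic` choice are illustrative assumptions.

```js
// Minimal sketch: a polite live region for the running mistake count.
const status = document.createElement('div');
status.id = 'errorStatus';
status.setAttribute('aria-live', 'polite'); // announced when the user is idle
status.setAttribute('aria-atomic', 'true'); // read the whole message, not just the changed part
document.getElementById('app').appendChild(status);

let errorCount = 0;
function reportMistake() {
  errorCount += 1;
  // Screen readers pick this change up automatically without moving focus.
  status.textContent = `Mistake count: ${errorCount}`;
}
```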
While DeepType is not image-heavy, any non-text content will have a text equivalent:
- Logo and Branding: Our logo on the app (if present) will have `alt="DeepType"` or `aria-label="DeepType logo"`; if it’s decorative we might mark it `alt=""` (null alt) to skip it for screen readers. But likely we want the name read.
- Illustrations: If there are any illustrations (maybe on a welcome screen or documentation), they will have descriptive alt text. E.g., `alt="A person typing on a keyboard with eyes closed, representing touch typing confidence."` Something that conveys the idea, unless purely decorative, in which case null alt.
- Charts/Graphs: Not likely in the user interface, but if, say, a teacher dashboard has a progress chart, we’ll provide a summary (e.g., an ARIA live region summary: “Student’s speed increased from 10 to 20 WPM last week”). If we had to include a graph in a report, we’d ensure a textual table or summary is available.
- Audio Descriptions: Since our primary output is audio, not visual video, we might not have videos requiring captions. But if there were any instructional videos, we’d need captions and audio descriptions. However, our content is mostly self-voicing interactive, so that covers it.
- Iconography: All icons used in buttons will have proper labels. If an icon is purely decorative (like a decorative flourish), we mark it hidden from assistive tech (`aria-hidden="true"` or CSS).
- ARIA for Non-Text Elements: If we use a canvas or custom element (imagine a game visualizer or something), we’d use `role="img"` with an `aria-label` describing it, or provide a textual alternative adjacent. But likely, we’ll avoid complex graphics.
- Tab Order & Logical Navigation: The tab order (focus order) of elements in the DOM will match the logical order we read them in voice. This prevents confusion for those using keyboard nav. We ensure modals and popups appear logically after their triggers in the DOM, or use appropriate `aria-*` attributes to inform AT.
- ARIA Alerts for Important Events: If something critical happens (like connection lost or an error), we use a `role="alert"` region to immediately notify the user via screen reader. E.g., if the internet goes out and cloud sync fails, an alert might say “Warning: offline, progress will save locally.”
- No Flashing / Seizure Risks: We avoid any flashing visuals that could trigger seizures (WCAG guideline to not flash >3 times/sec in high intensity). Our interface is mostly static with subtle transitions, so likely fine.
- Keyboard Shortcuts: For advanced users (especially sighted power users or those who prefer keyboard shortcuts to voice), we might implement shortcuts (like press L to go to the Lesson menu, P to pause, etc.). If we do, we’ll ensure to document them and add the `aria-keyshortcuts` attribute so screen readers can announce them. This is a nice-to-have that can speed up use without conflicting with normal typing (maybe only active when in menus, not during typing practice, to avoid catching normal typing as shortcuts).
- Form Inputs: If we have any forms (sign up, etc.), each input has a `<label>` or `aria-label`. Error messages on forms are linked via `aria-describedby` and we use ARIA `role="alert"` on them so they’re announced when appearing (see the sketch after this list).
- Testing with Diverse Users: Ultimately, guidelines are validated by user testing. We plan to test the UI with blind users (using screen readers), low-vision users (maybe those with partial sight), and possibly users with cognitive disabilities for simplicity feedback. Their feedback will inform tweaks to our UI wording, timing, etc. For example, an ADHD user might want the option to turn off voice chit-chat and get more direct instructions (so maybe a “concise mode” vs “friendly mode”). We can accommodate that.
By following these guidelines, we ensure DeepType’s UI is inclusive and user-friendly for our entire audience. Adhering to ARIA and accessibility standards isn’t just about compliance; it’s about creating a smoother experience. As one accessibility expert mantra says: “Build it right for the extreme users, and it will work even better for everyone.” We believe DeepType’s accessible design will not only empower blind users but also result in a generally well-designed product that anyone could use (for instance, a sighted person might appreciate the voice feedback when they switch to another window but still practice typing by listening).
All these considerations will be documented in our design system, and every developer on the project will be versed in using them. Accessibility is not a separate module but a thread running through every feature – from color choices to code structure with ARIA roles. This guarantees that when DeepType is launched, it sets a benchmark for accessible educational software.
Below is an outline of the DeepType codebase structure and the key files with their content. The code is heavily commented to explain functionality, following our style guidelines (including ASCII art headers for each file, and console logs for clarity during debugging and learning). This should serve as both the actual code and a learning aid for developers (including those who are visually impaired or have ADHD, as requested, through clear comments and structured sections).
Repository Structure:
DeepType/
├── README.md
├── package.json
├── public/
│ └── index.html # Main HTML file
├── src/
│ ├── app.js # Main application logic (front-end)
│ ├── styles.css # Global styles (high contrast, etc.)
│ ├── lessons.js # Lesson content and curriculum definitions
│ ├── voice.js # Voice input/output module
│ ├── adaptive.js # Adaptive learning logic
│ ├── ui.js # UI rendering and navigation (menus, focus management)
│ ├── storage.js # Data storage and Supabase integration
│ └── assets/ # (if any audio/image assets)
├── desktop/
│ └── tauri.conf.json # Config for Tauri desktop app (if applicable)
└── server/
└── server.js # Backend server (for API calls to OpenAI, if used)
(Note: Some of these files/modules could be combined in implementation, but are separated here for clarity of roles.)
Now, we’ll present the content of key files with explanatory comments and ASCII art headers.
This is the entry HTML page that loads the app. It sets up basic structure and accessibility attributes (lang, meta).
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>DeepType – AI Typing Tutor</title>
<!-- High contrast dark theme by default -->
<link rel="stylesheet" href="styles.css" />
</head>
<body>
<!-- Main application container -->
<div id="app" role="application" aria-label="DeepType application">
<!-- We will dynamically inject content here via app.js -->
</div>
<!-- Including the main script -->
<script src="app.js" type="module"></script>
</body>
</html>
Notes:
- The `role="application"` on the #app container tells screen readers that this is a complex web app, potentially adjusting how keyboard events are handled (for some SR, it might switch off virtual cursor mode when focused inside, expecting the app to manage focus).
- We use `type="module"` for app.js, meaning we can use ES6 imports (assuming a bundler or modern browser environment).
- The app content will be rendered dynamically by our JS (to allow easy updates of content as user navigates).
- We set `lang="en"`; if we support multiple languages, we might adjust that attribute dynamically or have separate pages per locale.
This is the main application script that ties together modules, handles initial load and global events.
/***********************************************
* _____ ______
* | __ \ | ___|
* | | | |_ _ ___ ___| |_ _ __ ___ ______
* | | | | | | |/ __/ _ \ _| '__/ _ \|______|
* | |__| | |_| | (_| __/ | | | | (_) |
* |_____/ \__,_|\___\___\_| |_| \___/
*
* File: app.js
* Description: Main entry point for DeepType front-end.
* Initializes the application, handles navigation flow,
* and ties together UI, voice, lessons, and adaptivity.
***********************************************/
import * as UI from './ui.js';
import * as Voice from './voice.js';
import * as Lessons from './lessons.js';
import * as Adaptive from './adaptive.js';
import * as Storage from './storage.js';
// Global state
let currentLesson = null;
let userProfile = null;
let singleSwitchMode = false;
// Initialize application
document.addEventListener('DOMContentLoaded', () => {
console.log("DeepType app initializing...");
Storage.init(); // initialize storage (e.g., Supabase or local)
userProfile = Storage.loadProfile(); // attempt to load user profile if exists
// Determine if single switch mode should be default (could be from profile or query param)
singleSwitchMode = userProfile?.preferences?.singleSwitch || false;
// Setup voice output system
Voice.initTTS();
// Attempt to init voice recognition (will check for support)
Voice.initSTT(onVoiceCommand);
// Render the main menu
UI.renderMainMenu(singleSwitchMode);
Voice.speak("Welcome to DeepType. Press Enter or say 'start' to begin your first lesson.");
// Setup global keyboard handlers
setupKeyboardControls();
});
// Handle voice commands globally
function onVoiceCommand(command) {
console.log("Voice command heard:", command);
// Normalize command text
if (!command) return;
command = command.toLowerCase();
if (command.includes('start')) {
UI.startLessonFlow();
} else if (command.includes('repeat')) {
UI.repeatCurrentPrompt();
} else if (command.includes('menu') || command.includes('exit')) {
UI.renderMainMenu(singleSwitchMode);
} else if (command.includes('next')) {
// This could be used to skip or simulate pressing the "next" button in single-switch scanning
UI.focusNextOption();
} else if (command.includes('help')) {
UI.showHelp();
} else {
Voice.speak("Sorry, I didn't catch that. Try saying 'repeat' or 'menu'.");
}
}
// Setup keyboard controls for single-switch and general navigation
function setupKeyboardControls() {
document.body.addEventListener('keydown', (e) => {
if (UI.isInputActive()) {
// If the user is currently typing in a lesson input field, don't intercept normal typing
return;
}
// Keyboard shortcuts and navigation
if (e.key === ' ' && singleSwitchMode) {
// Space in single-switch mode: select or progress
if (!UI.handleSwitchSelect()) {
UI.focusNextOption();
}
e.preventDefault();
} else if (e.key === 'ArrowRight' || e.key === 'ArrowDown') {
UI.focusNextOption();
e.preventDefault();
} else if (e.key === 'ArrowLeft' || e.key === 'ArrowUp') {
UI.focusPreviousOption();
e.preventDefault();
} else if (e.key === 'Enter') {
// Enter triggers default action (like activating focused button)
UI.activateFocusedOption();
e.preventDefault();
} else if (e.key === 'Escape') {
UI.renderMainMenu(singleSwitchMode);
e.preventDefault();
} else if (e.key === 'F1') {
// F1 for help
UI.showHelp();
e.preventDefault();
}
// (We could add more shortcuts if needed)
});
}
Highlights & Explanation:
- We import modules for UI, Voice, Lessons, Adaptive logic, and Storage (Supabase or local).
- We keep a global `singleSwitchMode` flag. This can be set from user preferences.
- On DOMContentLoaded, we initialize storage (which might set up Supabase or retrieve from localStorage if offline). `Voice.initTTS()` might load voices, etc., and `Voice.initSTT(onVoiceCommand)` will set up speech recognition and call `onVoiceCommand` when a phrase is recognized.
- We immediately render the main menu via `UI.renderMainMenu` and prompt the user with a welcome voice. (We assume the UI module manipulates the DOM to show some menu.)
- We attach a global keydown listener. If an input is active (meaning the user is typing in a practice exercise, which we detect via UI state), we let keys through (so the user can type letters normally).
- We define keyboard navigation: spacebar triggers either select or next in single-switch mode. We check `UI.handleSwitchSelect()` – maybe it returns true if it selected an item, false if it just opened a menu, then we call focusNext for scanning.
- Arrow keys allow navigation too (for users who have a full keyboard).
- Enter triggers activation of a focused element (like pressing a button).
- Escape brings back the main menu (like a way to cancel or exit the current context).
- F1 triggers a help overlay.
- `onVoiceCommand` normalizes recognized text and routes commands. We look for keywords like 'start', 'repeat', 'menu', etc.:
  - For 'start', we call `UI.startLessonFlow()` – that presumably picks the appropriate lesson and transitions to the lesson interface.
  - 'repeat' triggers UI to repeat the current prompt (like if the user says repeat during a lesson, it re-voices the instruction).
  - 'menu' or 'exit' goes back to the main menu.
  - 'next' might simulate focus next for scanning or skip something.
  - 'help' shows help.
  - If a command is unrecognized, we ask again.
- This design allows adding more voice commands easily by extending that if-else (or changing to a mapping).
We included lots of console.log
statements for debugging (to track commands etc.). Those logs are helpful in development or if a coder wants to see what’s happening (especially for an ADHD coder who might benefit from stepping through logs to maintain focus).
ASCII art at the top identifies the file and provides a quick summary.
This module manages the DOM updates and navigation. It uses ARIA roles and keeps track of focus for keyboard/switch control.
/***********************************************
* _ _ _
* | | | (_)
* | | | |_ ___ ___
* | | | | / __|/ __|
* | |__| | \__ \ (__
* \____/|_|___/\___|
*
* File: ui.js
* Description: Handles User Interface rendering and navigation.
* Manages menus, focus (for single-switch), and lesson display.
***********************************************/
// Import the modules used below (lesson state is kept local to avoid a circular dependency on app.js)
import { speak } from './voice.js';
import * as Lessons from './lessons.js';
import * as Adaptive from './adaptive.js';
import * as Storage from './storage.js';
let currentLesson = null; // lesson currently in progress (module imports are read-only, so we own this state here)
// Track focused element index for menus
let focusIndex = 0;
let currentMenuOptions = [];
// Helper: remove existing content
function clearAppContainer() {
const app = document.getElementById('app');
app.innerHTML = '';
focusIndex = 0;
currentMenuOptions = [];
}
// Render main menu
export function renderMainMenu(singleSwitchMode) {
clearAppContainer();
const app = document.getElementById('app');
// Create title
const title = document.createElement('h1');
title.innerText = 'DeepType';
title.className = 'title';
app.appendChild(title);
// Menu container with role=menu
const menu = document.createElement('div');
menu.setAttribute('role', 'menu');
menu.id = 'mainMenu';
app.appendChild(menu);
// Define menu options
const options = [
{ text: 'Start Lesson', action: startLessonFlow },
{ text: 'Settings', action: openSettings },
{ text: 'Help', action: showHelp },
{ text: 'Exit', action: exitApp }
];
currentMenuOptions = options;
options.forEach((opt, index) => {
const btn = document.createElement('button');
btn.innerText = opt.text;
btn.className = 'menuItem';
btn.setAttribute('role', 'menuitem');
btn.setAttribute('tabindex', index === 0 ? '0' : '-1'); // Only first is tab-able initially
// When focused or clicked, call the action
btn.addEventListener('click', opt.action);
btn.addEventListener('focus', () => { focusIndex = index; });
menu.appendChild(btn);
});
// Set initial focus
const firstButton = menu.querySelector('button');
if (firstButton) firstButton.focus();
console.log("Main menu rendered. Options:", options.map(o => o.text));
speak("Main menu. Options: Start Lesson, Settings, Help, Exit.");
}
// Called when user activates "Start Lesson"
export function startLessonFlow() {
console.log("Starting lesson flow...");
// Determine which lesson to start: if userProfile exists and has progress, pick next; else Lesson 1.
const lessonToStart = /* logic to pick lesson, e.g., Lessons.getNextLesson(userProfile.progress) */ Lessons.getLesson(0);
if (lessonToStart) {
currentLesson = lessonToStart;
renderLesson(currentLesson);
} else {
speak("No lesson available.");
}
}
// Render a lesson interface
export function renderLesson(lesson) {
clearAppContainer();
const app = document.getElementById('app');
// Create heading for lesson (for low-vision users)
const lh = document.createElement('h2');
lh.innerText = lesson.title;
lh.className = 'lessonTitle';
app.appendChild(lh);
// Create a prompt area
const prompt = document.createElement('div');
prompt.id = 'prompt';
prompt.setAttribute('aria-live', 'polite'); // so updates are read
prompt.innerText = ''; // will be filled
app.appendChild(prompt);
// Create an input area if needed (hidden text input to capture typing)
const hiddenInput = document.createElement('input');
hiddenInput.type = 'text';
hiddenInput.id = 'typingInput';
hiddenInput.setAttribute('aria-label', 'Typing input'); // label for SR
hiddenInput.style.position = 'absolute';
hiddenInput.style.opacity = '0';
app.appendChild(hiddenInput);
hiddenInput.focus(); // focus so keystrokes go here
// Listen for keystrokes in the lesson input
hiddenInput.addEventListener('input', onUserType);
// Kick off the first prompt of the lesson
speak(lesson.intro); // e.g., "Lesson 1. Place your fingers on home row..."
setTimeout(() => {
presentNextPrompt();
}, 1000);
}
// Keep track of current step within lesson
let currentStepIndex = 0;
function presentNextPrompt() {
const lesson = currentLesson;
if (!lesson) return;
const promptEl = document.getElementById('prompt');
const step = lesson.steps[currentStepIndex];
if (!step) {
lessonComplete();
return;
}
// Show the prompt text (if any) in large font for those who can see
promptEl.innerText = step.promptText || '';
// Speak the prompt (if it's a letter, maybe speak differently vs word)
const toSpeak = step.audioPrompt || step.promptText;
speak(toSpeak);
// Expect the user's input (the onUserType handler will check it)
}
// Handle user typing in lesson
function onUserType(e) {
const input = e.target;
const lesson = currentLesson;
const step = lesson.steps[currentStepIndex];
const expected = step.expectedInput; // e.g., 'F' or 'cat'
const userInput = input.value;
// For single-character inputs, we check immediately.
if (step.type === 'char') {
const char = userInput.slice(-1); // last typed char
if (!char) return;
if (char.toLowerCase() === expected.toLowerCase()) {
console.log(`Correct key pressed: ${char}`);
speak("Correct"); // positive feedback
currentStepIndex++;
input.value = '';
setTimeout(presentNextPrompt, 500);
} else {
console.log(`Incorrect key. Expected ${expected}, got ${char}`);
speak(`Oops, that was ${char}. Try again.`);
input.value = '';
// (We don't advance stepIndex, user will try again same prompt)
}
} else if (step.type === 'word') {
// If expecting a word, we might wait for space or enter
if (userInput.endsWith(' ')) {
const attempt = userInput.trim();
if (attempt.toLowerCase() === expected.toLowerCase()) {
console.log(`Correct word typed: ${attempt}`);
speak("Good job, that was correct.");
currentStepIndex++;
} else {
console.log(`Incorrect word. Expected ${expected}, got ${attempt}`);
speak(`You typed ${attempt}, but the word was ${expected}. We'll practice it again.`);
// maybe repeat same step or push it later for practice
Adaptive.recordMistake(expected);
}
input.value = '';
setTimeout(presentNextPrompt, 500);
}
}
// else: other types (sentences etc.) similar approach
}
// Repeat current prompt (for voice command or button)
export function repeatCurrentPrompt() {
const lesson = currentLesson;
if (!lesson) return;
const step = lesson.steps[currentStepIndex];
if (step) {
const toSpeak = step.audioPrompt || step.promptText;
speak(`I said: ${toSpeak}`);
}
}
// Mark lesson as complete
function lessonComplete() {
speak("Lesson complete! Great work.");
// Ideally record progress
Storage.saveProgress(currentLesson.id);
// Then return to menu or next lesson
currentLesson = null;
setTimeout(() => {
renderMainMenu(false);
}, 2000);
}
// Navigation helpers
export function focusNextOption() {
const menu = document.getElementById('mainMenu');
if (!menu) return;
const items = menu.querySelectorAll('button');
if (items.length === 0) return;
focusIndex = (focusIndex + 1) % items.length;
items[focusIndex].focus();
speak(items[focusIndex].innerText); // announce the newly focused option
}
export function focusPreviousOption() {
const menu = document.getElementById('mainMenu');
if (!menu) return;
const items = menu.querySelectorAll('button');
if (items.length === 0) return;
focusIndex = (focusIndex - 1 + items.length) % items.length;
items[focusIndex].focus();
speak(items[focusIndex].innerText);
}
export function activateFocusedOption() {
const menu = document.getElementById('mainMenu');
if (!menu) return;
const items = menu.querySelectorAll('button');
if (items[focusIndex]) {
items[focusIndex].click();
}
}
// For single switch:
// handleSwitchSelect returns true if an action was taken (like a menu item activated)
export function handleSwitchSelect() {
const menu = document.getElementById('mainMenu');
if (menu) {
// if menu open, Space acts as Enter on the focused option
activateFocusedOption();
return true;
}
// If no menu (maybe in lesson), we could treat Space as some other action, but by default:
return false;
}
// Dummy implementations of other menu actions:
export function openSettings() {
speak("Settings is not implemented yet.");
// would render settings UI with options (like toggle singleSwitchMode)
}
export function showHelp() {
speak("Help: You can say commands like start, repeat, menu. During lessons, type the requested keys. Press Escape to return to menu.");
}
export function exitApp() {
speak("Exiting. Goodbye!");
console.log("Exiting DeepType app.");
// In a web app, we might just reload or clear, in a desktop app, possibly close window.
// For now:
clearAppContainer();
const msg = document.createElement('p');
msg.innerText = "You have exited the app. Refresh to restart.";
document.getElementById('app').appendChild(msg);
}
Highlights & Explanation:
- `renderMainMenu` creates a menu with `role="menu"` and each button as `role="menuitem"`. We manage `tabindex` so that initially only the first item is focusable (to avoid tabbing into the others without our control).
- We store `currentMenuOptions` to map each index to its action.
- We focus the first button and speak the menu options, calling `speak()` with a summary of the options.
- For each button, we attach focus and click handlers. On focus, we update `focusIndex` (so we know which index is focused for the arrow-key logic).
- `startLessonFlow` gets a lesson (from the Lessons module) and calls `renderLesson`.
- `renderLesson` builds the lesson UI: it shows a title and a prompt area (with `aria-live` so a screen reader will read any changes politely).
- It also adds a hidden text input to capture typing. This input is positioned off-screen (opacity 0; we could also use `left: -999px`, etc.) so it does not distract sighted users, but it is needed to capture key events reliably. We label it for screen readers as "Typing input" (the user will primarily rely on our voice output, but if they run their own screen reader it may announce the field as they type).
- We immediately focus this hidden input so that all keypresses go there without the user needing to click anything (they just start typing).
- We attach `onUserType` to the input's `input` event to handle typing.
- We then start the lesson by speaking an intro and calling `presentNextPrompt()`.
- `presentNextPrompt` sets the prompt text (for low-vision users to see) and speaks it. It walks the `lesson.steps` array, each step having `promptText`, `audioPrompt`, `expectedInput`, etc. If there are no more steps, it calls `lessonComplete()`.
- `onUserType` logic when expecting a character:
  - It grabs the last typed character from the input (since we keep the input field's value).
  - If correct, give feedback and move to the next step.
  - If wrong, clear the input, speak feedback, and stay on the same step so the user can try again; we may also record the mistake for adaptivity.
- When expecting a word:
  - We wait until the user types a space (meaning they finished the word), then check correctness, give an appropriate response, and move on or repeat.
  - Note: this simplistic approach uses space as the terminator. Alternatively, we might have the user press Enter to submit a word; since space also separates words when typing a sentence, this logic needs refinement for multi-word inputs (see the sketch at the end of this section). For now, single-word training is fine.
- The code calls `Adaptive.recordMistake(expected)` when a word is mistyped, logging it for later practice (adaptive.js, shown further below, provides that function).
- `repeatCurrentPrompt` re-speaks the current prompt with the prefix "I said: ...".
- `lessonComplete` speaks a completion message, saves progress via Storage, and returns to the main menu after a short delay.
- Navigation helpers: `focusNextOption` and `focusPreviousOption` move `focusIndex` and focus the corresponding menu button. They also call `speak()` to announce the option text, so a blind user knows which option is now focused.
- `activateFocusedOption` simulates pressing Enter on the current focus by triggering the button's click event.
- `handleSwitchSelect` ties into our app-level key handler: if the menu is open, Space triggers activation (and returns true, meaning it consumed the press). If no menu is open (e.g., during a lesson), other logic could go here (not implemented).
- `openSettings`, `showHelp`, and `exitApp` are stubs for now. `showHelp` voices some instructions; ideally it would present an accessible overlay listing all commands.
- We include console logs to track what is being rendered, input correctness, and so on. This is useful for debugging, and a developer running the app with dev tools open can follow the program flow from these logs.
We ensure important interactive elements have ARIA roles and labels:
- The prompt area has `aria-live="polite"` to announce changes.
- The hidden input has a label (though users running their own screen reader may not need it, since they won't focus it consciously).
- Buttons have roles and text (text is visible so acts as label as well).
- If we had any icon-only or dynamic text, we'd label them.
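For the multi-word refinement flagged above, here is a minimal sketch (our own addition, not part of ui.js) of how a hypothetical 'sentence' step could be handled with Enter as the terminator. Unlike `onUserType`, it would be attached to the hidden input's `keydown` event, since Enter never reaches the `input` event; `onSentenceKeyDown` and `onAdvance` are illustrative names that assume the same `speak` and `Adaptive` helpers used above.

```js
// Hypothetical handler for a 'sentence' step, submitted with Enter instead of Space.
function onSentenceKeyDown(e, step, onAdvance) {
  if (step.type !== 'sentence' || e.key !== 'Enter') return;
  const attempt = e.target.value.trim();
  e.target.value = '';
  if (attempt.toLowerCase() === step.expectedInput.toLowerCase()) {
    speak("Well done, that sentence was correct.");
    onAdvance(); // caller advances currentStepIndex and presents the next prompt
  } else {
    speak(`You typed ${attempt}. The sentence was ${step.expectedInput}. We'll practice it again.`);
    Adaptive.recordMistake(step.expectedInput);
  }
}
```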
Manages text-to-speech (TTS) and speech-to-text (STT). We abstract these so we can swap between the Web Speech API and other backends. For simplicity, we try the Web Speech API and fall back to no voice support if it is unavailable.
/***********************************************
* __ ___ ____ _____ _ ____ ___
* \ \ / / | | _ \_ _|/ \ / ___|_ _|
* \ \ / /| | | |_) || | / _ \| | | |
* \ V / | |__| __/ | |/ ___ \ |___ | |
* \_/ |_____|_| |_/_/ \_\____|___|
*
* File: voice.js
* Description: Voice input (STT) and output (TTS) management.
* Provides speak() for TTS and sets up speech recognition for voice commands.
***********************************************/
// Text-to-Speech (TTS)
let synth;
let voice;
export function initTTS() {
synth = window.speechSynthesis;
if (!synth) {
console.warn("TTS not supported in this browser.");
return;
}
// Optional: choose a specific voice (e.g., a female English voice)
const voices = synth.getVoices();
// find an English voice
voice = voices.find(v => v.lang.startsWith('en') && v.name.includes('Google')) || voices[0];
console.log("TTS initialized. Using voice:", voice ? voice.name : 'default');
}
export function speak(text) {
if (!window.speechSynthesis) {
return; // no TTS available
}
if (synth.speaking) {
synth.cancel(); // stop current speech if any (to avoid overlap)
}
const utter = new SpeechSynthesisUtterance(text);
utter.rate = 0.9; // slightly slower for clarity
utter.pitch = 1;
utter.volume = 1;
if (voice) utter.voice = voice;
synth.speak(utter);
}
// Speech-to-Text (STT) for voice commands
let recognition;
export function initSTT(commandCallback) {
if (!('webkitSpeechRecognition' in window || 'SpeechRecognition' in window)) {
console.warn("Speech Recognition not supported in this browser.");
return;
}
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.continuous = true; // continuous listening
recognition.interimResults = false;
recognition.maxAlternatives = 1;
recognition.onresult = (event) => {
const transcript = event.results[event.results.length - 1][0].transcript.trim();
console.log("STT result:", transcript);
commandCallback(transcript);
};
recognition.onerror = (event) => {
console.error("Speech Recognition error:", event.error);
// If continuous listening stops due to error, attempt restart
if (event.error === 'not-allowed' || event.error === 'service-not-allowed') {
console.warn("Microphone access denied or STT not allowed.");
speak("Voice commands are not available.");
} else {
// try to restart recognition on network or other errors
try { recognition.start(); } catch(e) {}
}
};
recognition.onend = () => {
console.log("STT onend fired, restarting...");
// automatically restart listening unless intentionally stopped
try { recognition.start(); } catch(e) {}
};
try {
recognition.start();
console.log("Speech recognition started.");
// Provide an audible cue that voice commands are active (if desired)
} catch (e) {
console.error("Speech recognition couldn't start:", e);
}
}
Highlights & Explanation:
- `initTTS()` sets up `speechSynthesis` and picks a voice. `getVoices()` may return an empty list until the `voiceschanged` event fires; in real code we would wait for that event or retry after a short delay. For brevity, we take whatever is available.
- `speak(text)` uses the Web Speech API TTS to speak. It cancels any ongoing speech to avoid overlap, so if several calls happen in quick succession, only the latest is spoken.
- Rate is set to 0.9 (slightly slower than the default 1.0) for clarity. This could be made user-adjustable later.
- `initSTT(commandCallback)` uses `webkitSpeechRecognition` where available. The API is experimental but works in Chrome.
- We set continuous listening with no interim results (we only care about final commands).
- On result, we extract the transcript and pass it to the provided callback (which in app.js is `onVoiceCommand`).
- On error:
  - If access is not allowed, we notify the user (missing microphone permission).
  - For other errors (e.g., network), we attempt to restart with `recognition.start()`.
- On end, we auto-restart so listening continues unless it was intentionally stopped. This keeps voice commands available at all times.
- We wrap `recognition.start()` in try/catch, since it can throw if called at the wrong time or before the user has interacted with the page (Chrome may require a user gesture before audio capture).
- We log when recognition starts; we might also beep or speak to indicate that voice commands are ready. (For now, just a log.)
- We keep `recognition` at module scope so a stop function could be added later (not shown, but useful if the user wants to pause voice commands).
- This design keeps the app listening for voice commands in the background while the user practices, which is what we want for "repeat" and similar commands. Caution: the recognizer may pick up the TTS voice and erroneously transcribe the tutor's own instructions as commands. Possible mitigations include temporarily stopping recognition while we speak (see the sketch after this list), using voice activity detection to pause recognition while TTS is active, or keeping command phrases distinct enough that instructions would not trigger them. A real implementation should address this.
- The ASCII art spells "VOICE".
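Because the recognizer can pick up the tutor's own TTS output, here is a hedged sketch (our addition, not part of voice.js) of one mitigation: pause recognition while speaking and resume when the utterance ends. It assumes the module-level `recognition` created in `initSTT`, whose auto-restart in `onend` would need to check the `ttsActive` flag.

```js
let ttsActive = false;

export function speakWithoutEcho(text) {
  if (!window.speechSynthesis) return;
  ttsActive = true;
  if (recognition) {
    try { recognition.stop(); } catch (e) { /* ignore if not running */ }
  }
  const utter = new SpeechSynthesisUtterance(text);
  utter.rate = 0.9;
  utter.onend = () => {
    // Resume listening once the tutor has finished speaking.
    ttsActive = false;
    if (recognition) {
      try { recognition.start(); } catch (e) { /* ignore if already running */ }
    }
  };
  window.speechSynthesis.speak(utter);
}

// In initSTT, the auto-restart would then check the flag:
//   recognition.onend = () => { if (!ttsActive) { try { recognition.start(); } catch (e) {} } };
```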
Contains the lesson definitions and logic to fetch the next lesson.
/***********************************************
* _ _
* | | ___ ___ ___| |_
* | |__/ _ \/ __/ __| __|
* |____\___/\__\__ \ |_
* |___/\__|
*
* File: lessons.js
* Description: Defines lesson content and provides accessors.
* Each lesson contains a title, intro text, and a sequence of steps.
***********************************************/
export const lessons = [
{
id: 0,
title: "Lesson 1: Home Row (F and J)",
intro: "Lesson 1. Let's start with the home row keys F and J. " +
"Place your left index on F, right index on J. Now, type the letter F when you hear it.",
steps: [
{ type: 'char', promptText: "Type F", audioPrompt: "Press the F key.", expectedInput: "F" },
{ type: 'char', promptText: "Type J", audioPrompt: "Now press the J key.", expectedInput: "J" },
{ type: 'char', promptText: "Type F", audioPrompt: "Again, press F.", expectedInput: "F" },
{ type: 'char', promptText: "Type J", audioPrompt: "Again, press J.", expectedInput: "J" },
// ... more practice
{ type: 'word', promptText: "Type 'fj'", audioPrompt: "Now type F J as a two-letter word.", expectedInput: "fj" }
]
},
{
id: 1,
title: "Lesson 2: Home Row (D and K)",
intro: "Lesson 2. Next, we'll add D and K.",
steps: [
// Similar structure...
]
}
// Additional lessons...
];
// Function to get a lesson by id or index
export function getLesson(id) {
return lessons.find(lsn => lsn.id === id) || null;
}
// Placeholder: pick next lesson for a user given their progress
export function getNextLesson(progress) {
// If progress is an array of completed lesson IDs:
for (let lsn of lessons) {
if (!progress || !progress.includes(lsn.id)) {
return lsn;
}
}
return null;
}
Highlights & Explanation:
- We define a simple array of lesson objects. Each has `id`, `title`, `intro` (what to speak at the start), and `steps`.
- Each step has a `type` (such as 'char', 'word', or 'sentence'), `promptText` for display, `audioPrompt` for what the voice says (if we want different wording from what is displayed), and `expectedInput`.
- In Lesson 1, we practice F and J individually and then as a pair.
- `getLesson(id)` returns the lesson object (to start a specific lesson).
- `getNextLesson(progress)` would normally use stored progress to find the next uncompleted lesson. Here, we simply loop through the lessons and return the first one not in the completed-progress array (see the usage sketch after this list).
- For progress tracking, `Storage.saveProgress` pushes the completed lesson ID into an array and saves it for the user.
- More lessons can be added following this structure.
- We might later use adaptivity to alter steps or insert new ones, but the base curriculum lives here.
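As a usage sketch (ours; the file paths are assumptions), this is how app code might combine `Storage.loadProfile` and `getNextLesson` to resume where the learner left off:

```js
import * as Storage from './storage.js';
import { getNextLesson } from './lessons.js';

const profile = Storage.loadProfile();                      // e.g., { progress: [0] } or null
const next = getNextLesson(profile ? profile.progress : []);
if (next) {
  console.log(`Next up: ${next.title}`);
  // ui.js would then call renderLesson(next) to start it
} else {
  console.log("All lessons completed!");
}
```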
Handles adaptive logic (for now it just collects mistakes; later it could adjust lesson order or inject extra practice).
/***********************************************
* _ _ _
* / \ ___ ___| |_(_)_ __
* / _ \ / __/ __| __| | '_ \
* / ___ \ (__\__ \ |_| | |_) |
* /_/ \_\___|___/\__|_| .__/
* |_|
* File: adaptive.js
* Description: Adaptive learning utilities.
* Records mistakes and can modify or suggest extra practice.
***********************************************/
// Simple structure to record mistakes frequency
const mistakeCount = {};
// Record a mistake for a particular key or word
export function recordMistake(item) {
mistakeCount[item] = (mistakeCount[item] || 0) + 1;
console.log(`Adaptive: recorded mistake for "${item}". Total mistakes: ${mistakeCount[item]}`);
// (We could decide to dynamically insert a review step for this item)
}
// Suggest an extra practice step if certain threshold reached
export function maybeInjectPractice(currentLesson) {
// For example, if any key has >3 mistakes, inject a practice step after current one:
for (let item in mistakeCount) {
if (mistakeCount[item] >= 3) {
// Find if not already in current lesson steps soon:
const practiceStep = {
type: item.length === 1 ? 'char' : 'word',
promptText: `Type ${item}`,
audioPrompt: `Let's practice ${item}`,
expectedInput: item
};
console.log(`Adaptive: injecting extra practice for "${item}"`);
// Insert practice step after current index (assuming we have access to the index here or handle differently).
// This is a simplistic approach; a more robust system might queue practice for next lesson.
return practiceStep;
}
}
return null;
}
Highlights & Explanation:
- We use a simple dictionary, `mistakeCount`, to count errors per item (a letter or a word).
- `recordMistake(item)` increments the count and logs it.
- `maybeInjectPractice(currentLesson)` is a hook that could be called at certain points (e.g., after finishing a lesson, or between steps) to decide whether to insert an extra step.
- Here, if any item has three or more mistakes, we prepare a practice step focusing on it.
- This injection logic is simplistic: inserting into the lesson flow on the fly could complicate `presentNextPrompt`. It may be easier to schedule the practice for the next lesson, or simply to alert the user.
- The function returns a `practiceStep`, which the calling code could splice into `lesson.steps` or queue for later.
- Right now nothing in the UI calls `maybeInjectPractice`. A natural integration point is `lessonComplete` (see the sketch after this list).
- The adaptivity is deliberately minimal for now. A more complex version might adjust difficulty or skip ahead when the user is doing well.
Manages saving/loading from localStorage or Supabase (depending on environment). For simplicity, a local stub is shown:
/***********************************************
* ____ _______ _ ____ _____ ___ ___
* / ___|_ _\ \ / \ / ___|_ _|_ _/ _ \
* | | | | \ \/ _ \| | | | | | | | |
* | |___ | | | | (_) | |___ | | | | |_| |
* \____| |_| |_|\___/ \____| |_| |___\___/
*
* File: storage.js
* Description: Handles saving/loading user data.
* If online, uses Supabase; otherwise falls back to localStorage.
***********************************************/
// Supabase client if needed (assuming we have included supabase script or using @supabase/supabase-js)
let supabase = null;
const SUPABASE_KEY = '<your-supabase-key>';
const SUPABASE_URL = '<your-supabase-url>';
export function init() {
if (window.supabase) {
supabase = window.supabase.createClient(SUPABASE_URL, SUPABASE_KEY);
console.log("Supabase initialized.");
} else {
console.log("Supabase not loaded, using local storage only.");
}
}
// Load user profile from local storage (or supabase if logged in)
export function loadProfile() {
try {
const data = localStorage.getItem('deeptype_profile');
if (data) {
const profile = JSON.parse(data);
console.log("Loaded profile from local storage:", profile);
return profile;
}
} catch (e) {
console.error("Failed to load profile:", e);
}
return null;
}
// Save progress (simple: record lesson completed)
export function saveProgress(lessonId) {
let profile = loadProfile() || { progress: [] };
if (!profile.progress.includes(lessonId)) {
profile.progress.push(lessonId);
}
try {
localStorage.setItem('deeptype_profile', JSON.stringify(profile));
console.log(`Progress saved. Completed lessons: ${profile.progress}`);
} catch (e) {
console.error("Failed to save progress:", e);
}
// If using Supabase and logged in, we'd also send update to DB here
}
// Other storage functions like saveSettings, etc., could be added similarly.
Highlights & Explanation:
- On `init`, if the Supabase library is available, we create a client (the URL and key placeholders must be configured).
- `loadProfile` reads from localStorage and parses the JSON. In a real scenario, if the user is logged in with Supabase Auth, we would fetch the profile from the database instead.
- `saveProgress` updates the local profile's progress array, writes it back to localStorage, and logs it.
- If Supabase were connected and the user logged in, we would also push the update to the database, e.g., `supabase.from('profiles').update({ progress: profile.progress }).eq('id', user.id)`.
- Offline usage works fine with localStorage alone.
- Other settings (such as the single-switch preference) can be stored the same way (see the sketch after this list).
- This module is simplified for demonstration.
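As an example of storing "other settings similarly", here is a minimal sketch (ours; the key name and settings shape are assumptions) of a `saveSettings`/`loadSettings` pair following the same localStorage pattern as `saveProgress`:

```js
export function saveSettings(settings) {
  try {
    localStorage.setItem('deeptype_settings', JSON.stringify(settings));
    console.log("Settings saved:", settings);
  } catch (e) {
    console.error("Failed to save settings:", e);
  }
}

export function loadSettings() {
  try {
    const data = localStorage.getItem('deeptype_settings');
    if (data) return JSON.parse(data);
  } catch (e) {
    console.error("Failed to load settings:", e);
  }
  // Assumed defaults: single-switch mode off, speech rate 0.9
  return { singleSwitchMode: false, speechRate: 0.9 };
}
```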
With these modules, the codebase provides a full loop:
- App starts -> main menu -> lesson -> interactive typing -> adaptive feedback -> back to menu.
- The code is annotated so that developers can understand what each part does.
- ASCII art headers help a developer locate files quickly (especially in a printed or linear reading scenario, the big text stands out).
- Console logs and comments provide a narrative of the program flow, which can help someone learning or debugging the system.
Console Logs & Debugging Enhancements:
We have used `console.log` generously to mark major events (initialization, rendering, correctness of input, etc.). In a debug mode, these logs could be turned on to follow what's happening step by step:
- E.g., "Correct key pressed: F" tells us input handling logic worked for that case.
- "Main menu rendered. Options: ..." confirms UI built properly.
- "STT result: repeat" shows what the speech recognizer picked up.
From an educational perspective, one could run this in a browser console and watch these logs to see the internal state changes. This is helpful for developers new to the code, or for advanced users curious about the internals. A small sketch of a debug-mode gate for these logs follows.
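A small sketch (ours; the flag and wrapper names are illustrative, not part of the code above) of how such a debug mode could gate the logs:

```js
const DEBUG = localStorage.getItem('deeptype_debug') === 'true';

export function debugLog(...args) {
  if (DEBUG) console.log('[DeepType]', ...args);
}

// Usage inside the modules above:
//   debugLog("Correct key pressed:", char);
```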
We can easily extend or adapt this structure:
- If building with a framework (React), the structure might differ (e.g., using components and state instead of directly manipulating the DOM), but the logical separation (UI, voice, data) remains similar; see the sketch below.
- We kept it framework-agnostic for clarity.
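For illustration only, a hedged sketch of how the main menu might look as a React component while keeping the speak-on-focus behavior; the component and prop names are ours, not part of the codebase above.

```jsx
import { useEffect } from 'react';
import { speak } from './voice.js';

export function MainMenu({ options, onSelect }) {
  // Announce the menu once when it appears, mirroring renderMainMenu's behavior.
  useEffect(() => {
    speak(`Main menu. Options are: ${options.map(o => o.label).join(', ')}.`);
  }, [options]);

  return (
    <div role="menu" aria-label="Main menu">
      {options.map(opt => (
        <button
          key={opt.label}
          role="menuitem"
          onFocus={() => speak(opt.label)}
          onClick={() => onSelect(opt)}
        >
          {opt.label}
        </button>
      ))}
    </div>
  );
}
```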
Finally, note that we have omitted image embedding and charts as requested. All output here is purely textual or spoken (for the app itself). If images were included, each would carry descriptive alt text.
Conclusion: The above code and documentation present a complete picture of DeepType’s implementation approach. From the executive vision down to code level, we’ve aligned everything with the goal of an accessible, intelligent typing tutor. The structure can be built upon as we add more features (for instance, integrating the conversational AI in the future might add another module, or hooking up Supabase in earnest once keys are set, etc.).
Each code file starts with an ASCII art banner to make it fun and to break the monotony (which helps ADHD coders by chunking sections clearly). Screen reader users can skip over the ASCII art (since it’s in a comment, it won’t be read aloud) and get straight to the content; the art is more for visual appeal in code editors.
We have thus provided a comprehensive dossier that covers vision, design, strategy, and technical implementation for DeepType by Empathy Labs. By following this plan, the DeepType project is well-positioned to succeed and make a meaningful impact in assistive education, demonstrating how thoughtful use of AI and UX design can create inclusive technology for all.
URL: https://lovable.dev/projects/6c141cc3-da58-4115-8f0a-4af77342f723
There are several ways of editing your application.
Use Lovable
Simply visit the Lovable Project and start prompting.
Changes made via Lovable will be committed automatically to this repo.
Use your preferred IDE
If you want to work locally using your own IDE, you can clone this repo and push changes. Pushed changes will also be reflected in Lovable.
The only requirement is having Node.js & npm installed - install with nvm
Follow these steps:
# Step 1: Clone the repository using the project's Git URL.
git clone <YOUR_GIT_URL>
# Step 2: Navigate to the project directory.
cd <YOUR_PROJECT_NAME>
# Step 3: Install the necessary dependencies.
npm i
# Step 4: Start the development server with auto-reloading and an instant preview.
npm run dev
Edit a file directly in GitHub
- Navigate to the desired file(s).
- Click the "Edit" button (pencil icon) at the top right of the file view.
- Make your changes and commit the changes.
Use GitHub Codespaces
- Navigate to the main page of your repository.
- Click on the "Code" button (green button) near the top right.
- Select the "Codespaces" tab.
- Click on "New codespace" to launch a new Codespace environment.
- Edit files directly within the Codespace and commit and push your changes once you're done.
This project is built with:
- Vite
- TypeScript
- React
- shadcn-ui
- Tailwind CSS
Simply open Lovable and click on Share -> Publish.
We don't support custom domains (yet). If you want to deploy your project under your own domain then we recommend using Netlify. Visit our docs for more details: Custom domains