# AI-Multimodal-Automation

This AI assistant integrates real-time speech recognition using Whisper ASR with dynamic visual input captured via OpenCV and mss to interpret and respond to user context. It uses LangChain together with GPT-4o to generate intelligent, context-aware responses, while a Text-to-Speech (TTS) system delivers natural voice output. A multimodal fusion step synchronizes the audio and visual data streams, enabling a richer understanding of user intent, and conversational memory sustains coherent multi-turn interactions for fluid, natural human-computer communication. The architecture is designed for real-time operation, with the flexibility to scale into applications such as workflow automation, context-driven task execution, and adaptive user assistance.
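The memory-plus-fusion step described above can be sketched in plain Python. This is an illustrative sketch only, not the repository's actual code: the names `ConversationMemory`, `Turn`, and `fuse`, and the prompt layout, are assumptions; in the real assistant the transcript would come from Whisper and the frame caption from the OpenCV/mss capture path.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical sketch of conversation memory + multimodal fusion.
# All names here are illustrative, not taken from assistant.py.

@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str


class ConversationMemory:
    """Keeps the last `max_turns` exchanges so replies stay coherent
    across a multi-turn conversation."""

    def __init__(self, max_turns: int = 10):
        self.turns: deque = deque(maxlen=max_turns)

    def add(self, role: str, content: str) -> None:
        self.turns.append(Turn(role, content))

    def as_prompt(self) -> str:
        return "\n".join(f"{t.role}: {t.content}" for t in self.turns)


def fuse(transcript: str, frame_caption: str, memory: ConversationMemory) -> str:
    """Merge the audio transcript with a description of the current
    screen/camera frame into a single prompt for the LLM."""
    return (
        f"{memory.as_prompt()}\n"
        f"[screen] {frame_caption}\n"
        f"user: {transcript}"
    )


memory = ConversationMemory(max_turns=4)
memory.add("user", "What app is open?")
memory.add("assistant", "A code editor.")
prompt = fuse("Summarize what I'm working on.",
              "VS Code showing assistant.py", memory)
print(prompt)
```

The bounded `deque` is one simple way to keep memory from growing without limit while still giving the model recent context alongside the fused visual caption.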

To run: create a virtual environment, install the dependencies, then run `python assistant.py`.
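Expanded, those steps might look like the following on a Unix-like shell. The README does not ship a dependency list, so the package names in the comment are guesses from the description, not a confirmed requirements file:

```shell
# Sketch of the setup steps; assumes Python 3 is on PATH.
# Install whatever assistant.py imports before launching it --
# likely candidates per the description: openai-whisper, opencv-python,
# mss, langchain (names are assumptions, check the script's imports).
python3 -m venv .venv        # create an isolated environment
. .venv/bin/activate         # activate it (Windows: .venv\Scripts\activate)
```

After activating the environment and installing the dependencies, start the assistant with `python assistant.py`.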
