This repository provides a comprehensive guide and resources related to the integration of specialized AI tools, including OpenAI's Whisper, ChatGPT, and Amazon Polly, as discussed in our research paper.
- Overview
- System Architecture
- Integration Process
- API Details
- Configuration and Preprocessing
- Sequence Diagram
- Codebase
- License
The iSafeCom system uses AI tools for transcription, translation, and text-to-speech. This repository documents how those tools are integrated so that the system can be reproduced.
The system primarily integrates three tools:
- Whisper: Developed by OpenAI, it transcribes and translates user voice inputs.
- ChatGPT: Generates text responses to the transcribed user input.
- Amazon Polly: Converts text responses into natural-sounding voice outputs.
The integration chains the three tools: Whisper turns the user's voice input into text, ChatGPT generates a text response to it, and Amazon Polly converts that response into speech. The sequence diagram shows this flow of data and the interactions between the tools.
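The data flow can be sketched as a three-stage pipeline. This is a minimal illustration in plain .NET, not the code in this repository: the `VoicePipeline` class and its delegate parameters are hypothetical names, and each stage is passed in as a function so the sketch makes no assumption about how `Whisper.cs`, `ChatGPT.cs`, or `TextToSpeech.cs` are actually written.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper: chains speech-to-text, chat, and text-to-speech.
public sealed class VoicePipeline
{
    private readonly Func<byte[], Task<string>> _transcribe;  // Whisper stage
    private readonly Func<string, Task<string>> _reply;       // ChatGPT stage
    private readonly Func<string, Task<byte[]>> _synthesize;  // Amazon Polly stage

    public VoicePipeline(
        Func<byte[], Task<string>> transcribe,
        Func<string, Task<string>> reply,
        Func<string, Task<byte[]>> synthesize)
    {
        _transcribe = transcribe ?? throw new ArgumentNullException(nameof(transcribe));
        _reply = reply ?? throw new ArgumentNullException(nameof(reply));
        _synthesize = synthesize ?? throw new ArgumentNullException(nameof(synthesize));
    }

    // Runs one conversational turn: recorded voice in, synthesized voice out.
    public async Task<byte[]> RunTurnAsync(byte[] userAudio)
    {
        string userText = await _transcribe(userAudio);
        string replyText = await _reply(userText);
        return await _synthesize(replyText);
    }
}
```

Passing the stages as delegates keeps each service replaceable, for example with a stub during testing.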
- Whisper API: Used for transcription and translation functionalities.
- ChatGPT API: Handles text-based responses and interactions.
- Amazon Polly API: Manages text-to-speech conversion.
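As an illustration of the ChatGPT step, the sketch below builds a JSON request body in the shape accepted by OpenAI's public chat completions endpoint. It assumes plain .NET with `System.Text.Json`; the `ChatRequestBuilder` class is a hypothetical name, and the model and prompts are supplied by the caller because the values this repository uses are not stated here.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the repository's ChatGPT.cs.
public static class ChatRequestBuilder
{
    // Public OpenAI endpoint for chat completions.
    public const string Endpoint = "https://api.openai.com/v1/chat/completions";

    // Builds the JSON body for one chat turn: a system prompt plus the user's text.
    public static string BuildBody(string model, string systemPrompt, string userText)
    {
        var payload = new Dictionary<string, object>
        {
            ["model"] = model,
            ["messages"] = new[]
            {
                new Dictionary<string, string> { ["role"] = "system", ["content"] = systemPrompt },
                new Dictionary<string, string> { ["role"] = "user", ["content"] = userText },
            },
        };
        return JsonSerializer.Serialize(payload);
    }
}
```

The resulting string would be sent as the body of a POST request to `Endpoint`, with the API key in an `Authorization: Bearer` header.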
Each tool has its own configuration settings and preprocessing steps:
- Whisper: Configuration settings are provided in `WhisperConfig.json`. Voice data undergoes specific formatting before processing.
- ChatGPT: Configuration is detailed in `ChatGPTConfig.json`.
- Amazon Polly: Audio data is formatted in WAV/MP3 for processing.
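A configuration file like these could be read as sketched below. The key names (`model`, `mode`, `language`) and the `WhisperSettings` and `ConfigLoader` types are assumptions made for illustration; the actual schema of `WhisperConfig.json` is not described in this README, so adjust the properties to match the real file.

```csharp
using System;
using System.Text.Json;

// Hypothetical shape: these properties are NOT taken from the real WhisperConfig.json.
public sealed class WhisperSettings
{
    public string Model { get; set; } = "whisper-1";
    public string Mode { get; set; } = "transcribe"; // assumed values: "transcribe" or "translate"
    public string Language { get; set; } = "";       // empty means "not specified"
}

public static class ConfigLoader
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true, // accept "model" as well as "Model"
    };

    // Parses the JSON text and rejects a configuration without a model name.
    public static WhisperSettings Parse(string json)
    {
        var settings = JsonSerializer.Deserialize<WhisperSettings>(json, Options)
            ?? throw new InvalidOperationException("Configuration is empty.");
        if (string.IsNullOrWhiteSpace(settings.Model))
            throw new InvalidOperationException("Configuration must name a model.");
        return settings;
    }
}
```

For example, `ConfigLoader.Parse(System.IO.File.ReadAllText("WhisperConfig.json"))` would load the file from the working directory; keys missing from the file keep the defaults shown above.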
The sequence diagram provides a visual representation of the data flow and interactions between the integrated tools. It serves as a practical guide for understanding the system's architecture and functionalities.
The repository contains the following C# scripts detailing the system's functionalities:
- ChatGPT.cs: Handles interactions with the ChatGPT API, managing text-based responses.
- NpcDialog.cs: Manages dialogues and interactions with NPCs (Non-Player Characters) within the game environment.
- NpcInfo.cs: Contains information and attributes related to NPCs.
- TextToSpeech.cs: Interfaces with Amazon Polly for text-to-speech conversion.
- Whisper.cs: Manages voice input transcription and translation using the Whisper tool.
- WorldInfo.cs: Contains information about the game world, including environmental details and world attributes.
This work is licensed under a Creative Commons Attribution 4.0 International License. For more details, see the LICENSE file or the Creative Commons website.