Technical background
Used technologies
The application combines several technologies to implement its parts. For example, the main app and the AI module communicate over localhost sockets. This page explains the technical approaches used during development.
Used tech | Explanation
App |
 | Main backend for app behavior
 | Main code runtime
 | Language used for backend
 | Used for resource-intensive computations
AI module |
 | ASR *
 | NLP *
 | TTS *
 | Inference for Whisper model
 | Inference for VITS model
* ASR - Automatic Speech Recognition
* NLP - Natural Language Processing
* TTS - Text To Speech synthesis
Project schematic
The project is divided into several branches and structures so that it stays modular and easy to extend. It relies on several providers for documentation and versioning: Hugging Face (repository for the SEDAS-whisper model weights), GitHub (repositories and the organisation page for the whole SEDAS ecosystem), and ReadTheDocs (hosting for the project documentation). All of the code is open-source, and every part of it is available on the GitHub organisation page. The major subprojects of the ecosystem are listed below.
Major SEDAS projects and their licenses
App schematic
The app is divided into several modules that are connected through several communication mechanisms (see below):
IPC (Inter Process Communication) - A protocol for communication between the frontend and the backend. It is a key mechanism because it lets the app send signals to the backend when they are triggered in the user GUI, and vice versa.
Worker threads - These give the app its non-blocking architecture. They are used mainly for simulation time management and backup saving, that is, for methods that could take a long time and block the app from responding properly.
MSC (Module Socket Communication) - A protocol for communication between the app modules and the main backend. Most of the modules are written in C++ and run independently. The motivation for this design was to make module testing easier (CMake configurations + invoke library) and to let the app run smoothly without a module blocking it.
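To illustrate the idea behind socket-based module communication, here is a minimal Python sketch of a request/reply exchange over a localhost socket. It does not reproduce the actual MSC protocol: the newline-delimited JSON framing, the `action`/`status` fields, and the names `run_module`, `send_request`, and `demo` are all assumptions made for this example. The module runs in a thread only to keep the sketch self-contained; the real modules are described above as running independently.

```python
import json
import socket
import threading

HOST = "127.0.0.1"  # localhost only; an assumption for this sketch


def run_module(server: socket.socket) -> None:
    """Hypothetical module side: answer one newline-delimited JSON request."""
    conn, _ = server.accept()
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        # Illustrative payload: echo the action back with a status field.
        reply = {"action": request["action"], "status": "ok"}
        stream.write(json.dumps(reply) + "\n")
        stream.flush()


def send_request(port: int, message: dict) -> dict:
    """Hypothetical backend side: send one message and wait for the reply."""
    with socket.create_connection((HOST, port)) as conn:
        with conn.makefile("rw", encoding="utf-8") as stream:
            stream.write(json.dumps(message) + "\n")
            stream.flush()
            return json.loads(stream.readline())


def demo() -> dict:
    # Port 0 lets the OS pick a free port, so the sketch never collides.
    with socket.create_server((HOST, 0)) as server:
        port = server.getsockname()[1]
        module = threading.Thread(target=run_module, args=(server,))
        module.start()
        reply = send_request(port, {"action": "ping"})
        module.join()
        return reply


if __name__ == "__main__":
    print(demo())
```

Because each side only reads and writes lines on a socket, a module written this way can be started and tested on its own, which matches the testing motivation given above.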
Neural networks
The AI module is structured as follows. A PTT (Push To Talk) signal, invoked from the ATCo GUI, starts the ATCo voice recording. A second PTT signal stops the recording, which is then converted to a Wave file and sent to the ASR model (Whisper).
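The PTT toggle and the Wave file conversion can be sketched roughly as below. This is an illustration only, not the project's recorder: the `PttRecorder` class, its method names, the mono 16-bit format, and the 16 kHz sample rate are assumptions chosen for the example, and the audio capture itself is replaced by a `feed` method that accepts raw samples.

```python
import io
import struct
import wave
from typing import List, Optional


class PttRecorder:
    """Hypothetical recorder toggled by PTT signals."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate  # assumed rate for this sketch
        self.recording = False
        self.samples: List[int] = []

    def on_ptt(self) -> Optional[bytes]:
        """First signal starts recording; the second stops it and returns WAV bytes."""
        if not self.recording:
            self.samples = []
            self.recording = True
            return None
        self.recording = False
        return self.to_wav()

    def feed(self, chunk: List[int]) -> None:
        """Store 16-bit PCM samples, but only while recording."""
        if self.recording:
            self.samples.extend(chunk)

    def to_wav(self) -> bytes:
        """Pack the collected samples into an in-memory mono Wave file."""
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 2 bytes = 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(struct.pack(f"<{len(self.samples)}h", *self.samples))
        return buffer.getvalue()
```

The bytes returned by the second `on_ptt()` call are what would be handed to the ASR step in this sketch.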
The raw transcription is then processed programmatically (removing timestamps, etc.) and passed to the rule-based NLP mechanism, which separates the callsign, command, and value from the transcription. The callsign is then checked against the pseudopilot database (i.e. whether that pseudopilot exists). If it does, a signal is sent to the plane database to set the new heading according to the command and value. The pseudopilot then generates a response, which is sent to the TTS model to produce a Wave file. That file is then played using the system player.
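The processing chain described above (clean the transcription, extract callsign, command, and value, check the callsign, update the heading, build a response) could look roughly like the following sketch. The real rules are not shown in this page, so everything here is an assumption: the bracketed timestamp format, the callsign pattern, the single `heading` command, the response wording, and the plain dictionary standing in for the pseudopilot and plane databases.

```python
import re
from typing import Dict, Optional

# Assumed timestamp format, e.g. "[00:00.000 --> 00:02.000]".
TIMESTAMP = re.compile(r"\[[^\]]*\]")
# Assumed phrase shape: "<callsign> [fly] heading <value>".
COMMAND = re.compile(
    r"(?P<callsign>[A-Z]{3}\d{1,4})\s+(?:fly\s+)?(?P<command>heading)\s+(?P<value>\d{1,3})",
    re.IGNORECASE,
)


def clean(raw: str) -> str:
    """Drop bracketed timestamps and collapse whitespace."""
    return " ".join(TIMESTAMP.sub(" ", raw).split())


def parse(text: str) -> Optional[Dict[str, str]]:
    """Split a cleaned transcription into callsign, command, and value."""
    match = COMMAND.search(text)
    if match is None:
        return None
    return {
        "callsign": match["callsign"].upper(),
        "command": match["command"].lower(),
        "value": match["value"],
    }


def handle(raw: str, planes: Dict[str, Dict[str, int]]) -> Optional[str]:
    """Apply a heading command and return the pseudopilot response text."""
    parsed = parse(clean(raw))
    if parsed is None or parsed["callsign"] not in planes:
        return None  # unknown callsign: no pseudopilot answers
    heading = int(parsed["value"]) % 360
    planes[parsed["callsign"]]["heading"] = heading
    return f"Heading {heading:03d}, {parsed['callsign']}"
```

In this sketch, the string returned by `handle` is what would be passed on to the TTS step, and a `None` result means no response is generated.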
Note
The system currently supports only the PipeWire audio system. Porting to a generic, cross-platform audio system is still in development (see the GitHub issue).
Plane/Environment calculations
Note
Add some explanation