From 7dfa55926b4db1ef63ef7341ceea8e08bcb947f1 Mon Sep 17 00:00:00 2001
From: pluja <64632615+pluja@users.noreply.github.com>
Date: Sat, 21 Mar 2026 09:45:43 +0200
Subject: [PATCH] Move speech under artificial intelligence

---
 README.md | 53 ++++++++++++++++++++---------------------------------
 1 file changed, 20 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index 8c898dc..2b1085c 100644
--- a/README.md
+++ b/README.md
@@ -32,6 +32,7 @@
 - [ChatGPT](#chatgpt)
 - [AI Coding](#ai-coding)
 - [Text To Speech](#text-to-speech)
+ - [Speech To Text](#speech-to-text)
 - [Image Generation](#image-generation)
 - [Bookmarking](#bookmarking)
 - [Book and web annotations](#book-and-web-annotationshighlights-management)
@@ -109,9 +110,7 @@
 - [Wikipedia](#wikipedia)
 - [YouTube](#youtube)
 - [Screen Recording](#screen-recording)
-- [Speech to Text](#speech-to-text)
 - [Teamworking Tools](#teamworking-tools)
-- [Text To Speech](#text-to-speech)
 - [Translation](#translation)
 - [Uncategorized](#uncategorized)
 - [Utilities](#utilities)
@@ -272,9 +271,25 @@ When using cloud-based AI services, the data you input is often collected and st
 - [RooCode](https://github.com/RooCodeInc/Roo-Code) - Cline fork with some improvements.
 - [OpenCode](https://github.com/anomalyco/opencode/) - The open source coding agent. Connect local models or any providers of your choice.
 
-#### Text To Speech
+#### Text to Speech
 
-Go to the [Text To Speech](#text-to-speech) section.
+- [Kokoro FastAPI](https://github.com/remsky/Kokoro-FastAPI) - Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model w/CPU, ONNX and NVIDIA GPU support, handling, and auto-stitching.
+- [MeloTTS](https://github.com/myshell-ai/MeloTTS) - a high-quality multi-lingual text-to-speech library by MIT and MyShell.ai.
+- [Piper](https://github.com/rhasspy/piper) - A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4.
+- [Espeak](https://github.com/espeak-ng/espeak-ng) - eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents. Voices will sound rather robotic.
+
+#### Speech to Text
+
+- **Models**
+ - [Moonshine](https://github.com/moonshine-ai/moonshine) - Fast and accurate automatic speech recognition (ASR) for edge devices.
+ - [OpenAI Whisper](https://github.com/openai/whisper) - Whisper is a general-purpose speech recognition model that can be run locally offline. It can transcribe audio from and to multiple languages.
+ - [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model.
+ - [ParakeetTDT](https://parakeettdt.com/) - Efficient audio transcription. Convert speech to text with unprecedented speed and accuracy using NVIDIA advanced AI speech recognition model.
+
+- **Apps and services**
+ - [OpenWhispr](https://github.com/OpenWhispr/openwhispr) - Voice-to-text dictation and productivity app with AI agents, meeting transcription, notes, and local/cloud speech recognition. Privacy-first and available cross-platform. Open source alternative to wisprflow.
+ - [Sasayaki](https://github.com/pluja/sasayaki) - Tiny android dictation app that turns speech into clear writing.
+ - [Speaches](https://github.com/speaches-ai/speaches) - OpenAI API-compatible server supporting streaming transcription, translation, and speech generation.
 
 #### Image Generation
 
@@ -1437,35 +1452,6 @@ Odysee website contains some trackers and is a heavy site. You can use these alt
 
 [Back to top 🔝](#contents)
 
-## Speech to Text
-
-### Models
-
-- [Moonshine](https://github.com/moonshine-ai/moonshine) - Fast and accurate automatic speech recognition (ASR) for edge devices.
-- [OpenAI Whisper](https://github.com/openai/whisper) - Whisper is a general-purpose speech recognition model that can be run locally offline. It can transcribe audio from and to multiple languages.
- - [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model.
-- [ParakeetTDT](https://parakeettdt.com/) - Efficient audio transcription. Convert speech to text with unprecedented speed and accuracy using NVIDIA advanced AI speech recognition model.
-
-### Apps and services
-
-- [OpenWhispr](https://github.com/OpenWhispr/openwhispr) - Voice-to-text dictation and productivity app with AI agents, meeting transcription, notes, and local/cloud speech recognition. Privacy-first and available cross-platform. Open source alternative to wisprflow.
-- [Sasayaki](https://github.com/pluja/sasayaki) - Tiny android dictation app that turns speech into clear writing.
-- [Speaches](https://github.com/speaches-ai/speaches) - OpenAI API-compatible server supporting streaming transcription, translation, and speech generation.
-
-[Back to top 🔝](#contents)
-
-## Text to Speech
-
-⛔ **Avoid** using tools that run on a 3rd party cloud. Generally you are sending your text and voice data to a 3rd party to process them, which could lead to leaking biometric data such as your voice, or sharing private and / or unnecessary text with the 3rd party.
-
-✅ **Instead use**
-- [Kokoro FastAPI](https://github.com/remsky/Kokoro-FastAPI) - Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model w/CPU, ONNX and NVIDIA GPU support, handling, and auto-stitching.
-- [MeloTTS](https://github.com/myshell-ai/MeloTTS) - a high-quality multi-lingual text-to-speech library by MIT and MyShell.ai.
-- [Piper](https://github.com/rhasspy/piper) - A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4.
-- [Espeak](https://github.com/espeak-ng/espeak-ng) - eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents. Voices will sound rather robotic.
-
-[Back to top 🔝](#contents)
-
 ## Translation
 ⛔ **Avoid**
 - Google Translate [![](https://shields.tosdr.org/en_217.svg)](https://tosdr.org/en/service/217)
@@ -1569,6 +1555,7 @@ Such programs come filled with trackers and telemetry. You can get a full list o
 [Back to top 🔝](#contents)
 
 ## VPNs
+
 ⛔ **Avoid**
 - [Free VPNs](https://techcrunch.com/2020/09/24/free-vpn-bad-for-privacy/) from Google Play or any appstore. These services are not free as they will suck your connections' data, keep logs and profile you to [sell your data to advertisers](https://thenextweb.com/news/be-cautious-free-vpns-are-selling-your-data-to-3rd-parties). If a government wants to track someone, such apps will be the first ones to fall.