Meet any problem? Chat with our free online AI agent [**here**](https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh) to help you.

> **Note:** For Windows users with NVIDIA GPU, follow these steps before installation:

Docker deployment with GPU support:

```bash
docker run -d -p 8501:8501 --gpus all videolingo
```
## APIs
VideoLingo supports the OpenAI-like API format and various TTS interfaces:

- LLM: `claude-3-5-sonnet`, `gpt-4.1`, `deepseek-v3`, `gemini-2.0-flash`, ... (sorted by performance; be cautious with gemini-2.5-flash...)
- WhisperX: Run whisperX (large-v3) locally or use the 302.ai API
- TTS: `azure-tts`, `openai-tts`, `siliconflow-fishtts`, **`fish-tts`**, `GPT-SoVITS`, `edge-tts`, `*custom-tts` (you can plug in your own TTS in `custom_tts.py`; see the sketch below)
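
For `*custom-tts`, the idea is that `custom_tts.py` exposes a function that receives the text to synthesize and the path to write the audio to. The sketch below is a hypothetical illustration only; the actual function name, signature, and engine call in `custom_tts.py` may differ:

```python
from pathlib import Path

def custom_tts(text: str, save_path: str) -> None:
    """Hypothetical custom TTS hook: synthesize `text` and write the audio to `save_path`."""
    Path(save_path).parent.mkdir(parents=True, exist_ok=True)
    # Replace this with a call to your own TTS engine or API, e.g.:
    #   audio_bytes = my_engine.synthesize(text, voice="my-voice")  # hypothetical engine
    #   Path(save_path).write_bytes(audio_bytes)
    raise NotImplementedError("plug your own TTS engine in here")
```
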
> **Note:** VideoLingo works with **[302.ai](https://gpt302.saaslink.net/C2oHR9)** - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!
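
To illustrate what "OpenAI-like" means in practice, here is a minimal sketch using the official `openai` Python client against a compatible endpoint. The key, base URL, and model below are placeholders; VideoLingo itself reads these values from its own configuration, so this is only to show the request shape:

```python
from openai import OpenAI

# Placeholders: point these at whichever OpenAI-compatible provider you use
# (e.g. 302.ai, a local Ollama server, or the official API).
client = OpenAI(
    api_key="sk-...",                       # your provider's key
    base_url="https://api.example.com/v1",  # provider's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="claude-3-5-sonnet",  # any model name the provider accepts
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```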
For detailed installation, API configuration, and batch mode instructions, please refer to the documentation.

1. WhisperX transcription performance may be affected by video background noise, as it uses the wav2vec2 model for alignment. For videos with loud background music, please enable Voice Separation Enhancement. Additionally, subtitles ending with numbers or special characters may be truncated early, because wav2vec2 cannot map numeric characters (e.g., "1") to their spoken form ("one").
2. Using weaker models can lead to errors during intermediate steps due to the strict JSON format required of responses (I've tried my best with the prompts 😊). If this error occurs, please delete the `output` folder and retry with a different LLM; otherwise, repeated runs will read the previous erroneous response and hit the same error.
3. The dubbing feature may not be 100% perfect due to differences in speech rates and intonation between languages, as well as the impact of the translation step. However, this project has put extensive engineering work into speech-rate handling to ensure the best possible dubbing results.
4. **Multilingual video transcription will only retain the main language.** This is because whisperX uses a language-specific model when force-aligning word-level subtitles, and it discards unrecognized languages.
5. **For now, multiple characters cannot be dubbed separately**, as whisperX's speaker distinction capability is not sufficiently reliable.

Related settings from `config.yaml`:

```yaml
# *🔬 h264_nvenc GPU acceleration for ffmpeg, make sure your GPU supports it
ffmpeg_gpu: false

# *Youtube settings
youtube:
  cookies_path: ''

# *Summary length, set low to 2k if using local LLM
summary_length: 8000

# *Maximum number of words for the first rough cut, below 18 will cut too finely affecting translation, above 22 is too long and will make subsequent subtitle splitting difficult to align
```
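
As a reference for scripting around these options, here is a minimal sketch of reading them with PyYAML. The file path and fallback defaults are assumptions; VideoLingo's own config loader may behave differently:

```python
import yaml  # PyYAML

# Path assumed to be the project-root config file; adjust to your checkout.
with open("config.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# Read a few of the options shown above, with the documented values as fallbacks.
ffmpeg_gpu = cfg.get("ffmpeg_gpu", False)               # h264_nvenc acceleration toggle
cookies_path = cfg.get("youtube", {}).get("cookies_path", "")
summary_length = cfg.get("summary_length", 8000)        # lower to ~2000 for local LLMs

print(ffmpeg_gpu, cookies_path, summary_length)
```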