Balabolka :: Utility for Online Services

This command line application lets you use online text-to-speech services: text files or subtitles can be converted to audio files. The utility can also be used for testing: it will help you choose a cloud service that meets your needs. A separate application for Yandex SpeechKit is available for download, because Yandex is a Russian IT company with close government ties.

Online services with speech technologies:

Google Cloud TTS;
Amazon Polly;
Baidu TTS;
CereVoice Cloud;
Descript TTS;
IBM Watson TTS;
Iciba TTS;
iTranslate TTS;
Microsoft Azure;
Naver TTS;
OpenAI TTS;
Youdao TTS;
Yandex SpeechKit.

Download Balabolka (Online TTS Utility)

License: Freeware

Command Line Utility for Yandex SpeechKit: Download
The program converts text or subtitles to audio files by using the Yandex service.
To perform operations via the Yandex API, you must authenticate with an API key.

Command Line

The utility accepts various command line parameters to read text aloud or save it as an audio file. The command line uses the syntax "bal4web [options ...]"; all parameters must be separated by spaces. Options can appear in any order as long as they are paired with their related parameters. Use the "bal4web -?" command to get help on the command line syntax and parameters.
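
As an illustration of the pairing rule above, the following Python sketch assembles a bal4web argument list from (option, value) pairs. The helper name build_command is hypothetical and not part of the utility; only options documented on this page are used, and running the result assumes that bal4web is installed and on the PATH.

```python
def build_command(options, executable="bal4web"):
    """Return an argument list in which each option is followed by its value.

    `options` is a sequence of (option, value) pairs; use None as the value
    for switches that take no parameter (for example "-m" or "-lrc").
    build_command is a hypothetical helper, not part of bal4web.
    """
    args = [executable]
    for option, value in options:
        args.append(option)
        if value is not None:
            args.append(str(value))
    return args


cmd = build_command([
    ("-s", "google"),
    ("-l", "en-US"),
    ("-t", "Hello, world"),
    ("-w", "hello.wav"),
])
print(cmd)

# To run it (assumes bal4web is available on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids manual quoting of values that contain spaces, such as the text given to -t.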

-s service_name: Sets the name of the online TTS service ("google" or "g", "amazon" or "a", "baidu" or "b", "cerevoice" or "c", "descript" or "d", "ibm" or "i", "iciba" or "k", "itranslate" or "t", "microsoft" or "m", "naver" or "n", "openai" or "o", "youdao" or "y"). The default is "google".
-l language_name: Sets the language name for the online TTS service. The name is a combination of an ISO 639 two-letter lowercase culture code associated with a language and an ISO 3166 two-letter uppercase subculture code associated with a country or region. For example: es-ES, de-DE, fr-FR. The default is "en-US".
Note: Descript TTS and OpenAI TTS identify the language of the input text themselves, so these services currently ignore this option. They can recognize several dozen languages on their own.
-g gender: Sets the gender for the online TTS service (if supported). The available values are "female" or "f", and "male" or "m". The default value is not defined. This parameter is supported by the following services: Amazon Polly, CereProc TTS, Descript TTS, Google TTS, IBM Watson TTS, iTranslate TTS, Microsoft Azure, Naver TTS, OpenAI TTS. If a voice name is specified, there is no need to set its gender.
-n voice_name: Sets the voice name for the online TTS service (if supported). The default value is not defined. This parameter is supported by the following services: Amazon Polly, CereProc TTS, Descript TTS, Google Cloud TTS, IBM Watson TTS, Microsoft Azure, Naver TTS, OpenAI TTS.
-r speech_rate: Sets the rate of the synthesized speech (if supported).
The default is "1.0" (the average speech rate).
Amazon Polly: from "0.20" to "2.00".
CereProc TTS: from "0.30" to "4.00".
Descript TTS, Naver TTS, OpenAI TTS, Youdao TTS: from "0.70" to "2.00".
Google TTS, IBM Watson TTS, Microsoft Azure: from "0.10" to "3.00".
Google Cloud: from "0.25" to "4.00".
iTranslate TTS: from "0.50" to "2.00".
-p integer: Sets the speaking pitch in a range of -20 to 20 (if supported). The default is 0.
This option is supported by Amazon Polly, CereProc TTS, Google Cloud TTS, IBM Watson TTS, Microsoft Azure.
-v integer: Sets the volume in a range of 0 to 200 (the default is 100).
-st style: Sets the voice-specific speaking style. The voice can express emotions such as cheerfulness, empathy or calmness. This option is supported by some voices in Microsoft Azure. Styles are not available if the WebSocket protocol for Microsoft Azure is used.
--style-degree style_degree or -sd style_degree: Sets the intensity of the speaking style in a range of "0.01" to "2.00" (for styles supported by Microsoft Azure). The default is "1.00". The option lets you specify a stronger or softer style to make the speech more expressive or more subdued.
-m: Prints the list of supported languages (genders and voices' names, if available) for the online TTS service.
-f text_file: Sets the name of the input text file. The command line may contain several -f options.
-fl file_name: Sets the name of the text file with the list of input files (one file name per line). The command line may contain several -fl options.
-w wave_file: Sets the name of the output file in WAV format.
-c: Takes the input text from the clipboard.
-t text: Takes the input text from the command line. The command line may contain several -t options.
-i: Takes the input text from STDIN.
-o: Writes the sound data to STDOUT. If this option is specified, the -w option is ignored.
--encoding encoding or -enc encoding: Sets the encoding of the input text ("ansi", "utf8" or "unicode"). If the option is not specified, the program will detect the text encoding.
--silence-begin integer or -sb integer: Sets the length of silence at the beginning of the audio file (in milliseconds).
The default is 0.
--silence-end integer or -se integer: Sets the length of silence at the end of the audio file (in milliseconds).
The default is 0.
-ln integer: Selects a line from the text file by its line number. Line numbering starts at "1". To select more than one line, use a range of numbers (for example, "26-34"). The command line may contain several -ln options.
-e integer: Sets the length of pauses between sentences (in milliseconds). The value should be less than 5000. If the option is not specified, the service will use its default pauses between sentences. This parameter is supported by Microsoft Azure only.
-d file_name: Uses a dictionary for pronunciation correction (*.BXD, *.REX or *.DIC). The command line may contain several -d options.
-lrc: Creates an LRC file from the input text. The text will be synchronized with the speech in the audio file.
-srt: Creates an SRT file from the input text. The subtitles will be synchronized with the speech in the audio file.
-sub: The text will be processed as subtitles. The option may be useful when the -i or -c option is specified.
-host host_name: Sets the hostname of the proxy server.
-port integer: Sets the port number of the proxy server.
-fr integer: Sets the sampling frequency of the audio output in kHz (8, 11, 16, 22, 24, 32, 44, 48). If the option is not specified, the default value of the selected voice will be used.
-ae audio_encoding: Sets the audio encoding for data returned by Google Cloud or Microsoft Azure ("linear16", "mp3" or "oggopus"). This setting can improve the sound quality. The option is available if the API key is specified. Using it without a specific need is not recommended: apply it for testing purposes only.
--ignore-square-brackets or -isb: Ignores text in [square brackets].
--ignore-curly-brackets or -icb: Ignores text in {curly brackets}.
--ignore-angle-brackets or -iab: Ignores text in <angle brackets>.
--ignore-round-brackets or -irb: Ignores text in (round brackets).
--ignore-url or -iu: Ignores URLs in the text.
--ignore-comments or -ic: Skips comments. Single-line comments start with // and continue to the end of the line. Multi-line comments start with /* and end with */.
-dp: Displays progress information in a console window.
-cfg file_name: Sets the name of the configuration file with command line options (a text file in which each line contains one option). If the option is not specified, the file bal4web.cfg located in the same folder as the utility will be used.
-h: Displays the list of available command line options.
--lrc-length integer: Sets the maximum line length for the LRC file (in characters).
--lrc-fname file_name: Sets the name of the LRC file. The option may be useful when the -o option is specified.
--lrc-enc encoding: Sets the encoding of the LRC file ("ansi", "utf8" or "unicode"). The default is "ansi".
--lrc-offset integer: Sets the time offset for the LRC file (in milliseconds).
--lrc-artist text: Sets the ID tag for the LRC file: artist.
--lrc-album text: Sets the ID tag for the LRC file: album.
--lrc-title text: Sets the ID tag for the LRC file: title.
--lrc-author text: Sets the ID tag for the LRC file: author.
--lrc-creator text: Sets the ID tag for the LRC file: creator of the LRC file.
--lrc-sent: Inserts blank lines after sentences in the LRC file.
--lrc-para: Inserts blank lines after paragraphs in the LRC file.
--srt-length integer: Sets the maximum line length for the SRT file (in characters).
--srt-fname file_name: Sets the name of the SRT file. The option may be useful when the -o option is specified.
--srt-enc encoding: Sets the encoding of the SRT file ("ansi", "utf8" or "unicode"). The default is "ansi".
--raw: Writes the audio data in RAW PCM format; the data does not contain the WAV format header. The option is used together with the -o option.
--ignore-length or -il: Does not write the size of the audio data to the WAV format header. The option is used together with the -o option.
--wss: Uses the WebSocket protocol for Microsoft Azure. It improves the sound quality of audio files (24 kHz instead of 16 kHz). The option is ignored if the subscription key for the Microsoft Azure Cognitive Services is defined. Use the -m option to check whether a voice supports the WebSocket protocol.
--sub-format text: Sets the subtitle format ("srt", "lrc", "ssa", "ass", "smi" or "vtt"). If the option is not specified, the format will be determined from the file extension.
--sub-fit or -sf: Automatically increases the speech rate to fit the time intervals (when the program converts subtitles to an audio file). The SoundTouch library will be used for changing the tempo.
--sub-max integer or -sm integer: Sets the maximum speech rate in a range of 110% to 200% (when the program converts subtitles to an audio file). The program will automatically increase the speech rate without exceeding the set value.
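
The per-service limits listed for the -r option can be checked before calling the utility. The Python sketch below copies those ranges into a table; the names RATE_RANGES, rate_is_valid and clamp_rate are hypothetical, and how bal4web itself reacts to an out-of-range value is not stated on this page.

```python
# Rate limits copied from the description of the -r option above.
# The service labels are the ones used on this page, not bal4web identifiers.
RATE_RANGES = {
    "Amazon Polly": (0.20, 2.00),
    "CereProc TTS": (0.30, 4.00),
    "Descript TTS": (0.70, 2.00),
    "Naver TTS": (0.70, 2.00),
    "OpenAI TTS": (0.70, 2.00),
    "Youdao TTS": (0.70, 2.00),
    "Google TTS": (0.10, 3.00),
    "IBM Watson TTS": (0.10, 3.00),
    "Microsoft Azure": (0.10, 3.00),
    "Google Cloud": (0.25, 4.00),
    "iTranslate TTS": (0.50, 2.00),
}


def rate_is_valid(service, rate):
    """Return True if `rate` lies inside the documented range for `service`."""
    low, high = RATE_RANGES[service]
    return low <= rate <= high


def clamp_rate(service, rate):
    """Return `rate` limited to the documented range for `service`."""
    low, high = RATE_RANGES[service]
    return max(low, min(high, rate))


print(rate_is_valid("Amazon Polly", 1.1))
print(clamp_rate("iTranslate TTS", 3.0))
```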

--aws-keyid text or -ak text: Sets the AWS access key ID for Amazon Polly. Using such a key is recommended if you have one.
--aws-secret text or -as text: Sets the AWS secret access key for Amazon Polly.
--aws-region text or -ar text: Sets the AWS region for Amazon Polly.
--crv-email text or -ce text: Sets the email address used when registering on the CereProc website. This information is necessary for CereVoice Cloud API authorization. Using this email address is recommended if you have one.
--crv-pwd text or -cp text: Sets the password used when registering on the CereProc website. This information is necessary for CereVoice Cloud API authorization. Using this password is recommended if you have one.
--gc-apikey text or -gk text: Sets the API key for Google Cloud. Using such a key is recommended if you have one.
--ms-apikey text or -mk text: Sets the subscription key for the Microsoft Azure Cognitive Services. Using such a key is recommended if you have one.
--ms-region text or -mr text: Sets the subscription region for the Microsoft Azure Cognitive Services.
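
One way to keep keys out of scripts is to read them from environment variables and append the matching options only when a value is present. This is a minimal Python sketch: the environment variable names and the credential_args helper are assumptions made for illustration, and only the option names come from the list above.

```python
import os

# Hypothetical environment variable names mapped to the documented options.
ENV_TO_OPTION = {
    "BAL4WEB_AWS_KEYID": "--aws-keyid",
    "BAL4WEB_AWS_SECRET": "--aws-secret",
    "BAL4WEB_AWS_REGION": "--aws-region",
    "BAL4WEB_GC_APIKEY": "--gc-apikey",
    "BAL4WEB_MS_APIKEY": "--ms-apikey",
    "BAL4WEB_MS_REGION": "--ms-region",
}


def credential_args(env=None):
    """Return option/value arguments for every credential found in `env`.

    `env` defaults to the process environment; unset or empty variables
    are skipped so that the utility falls back to its default behaviour.
    """
    env = os.environ if env is None else env
    args = []
    for name, option in ENV_TO_OPTION.items():
        value = env.get(name)
        if value:
            args += [option, value]
    return args


# Example with an explicit mapping instead of the real environment:
print(credential_args({"BAL4WEB_MS_APIKEY": "my-key", "BAL4WEB_MS_REGION": "westeurope"}))
```

Note that values passed on a command line may still be visible to other processes on the same machine.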

Command Examples

Create the text file LANGUAGE.TXT with the list of all supported languages and genders for the Google TTS service:

bal4web -s Google -m > language.txt

Convert text from BOOK.TXT to speech and save as BOOK.WAV:

bal4web -f "d:\Text\book.txt" -w "d:\Sound\book.wav" -s Google -l en-US -g female

Convert subtitles to speech and save as MOVIE.WAV:

bal4web -f "d:\Subtitles\movie.srt" -w "d:\Sound\movie.wav" -s m -l de-DE -n Conrad -r 1.1

bal4web -f "d:\Subtitles\movie.srt" -w "d:\Sound\movie.wav" -s m -l de-DE -n Conrad --sub-fit

An example of use together with LAME.EXE:

bal4web -f d:\book.txt -s Baidu -l en-US -o --raw | lame -r -s 16 -m m -h - d:\book.mp3

An example of use together with OGGENC2.EXE:

bal4web -f d:\book.txt -s Baidu -l en-US -o -il | oggenc2 --ignorelength - -o d:\book.ogg
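
The LAME pipeline above can also be driven from a script. The following Python sketch connects the STDOUT of one process to the STDIN of another with the standard subprocess module; the function names are hypothetical, the arguments are copied from the LAME example, and both bal4web and lame are assumed to be on the PATH.

```python
import subprocess


def lame_pipeline_commands(text_file, mp3_file):
    """Build the two argument lists used in the LAME example above."""
    bal = ["bal4web", "-f", text_file, "-s", "Baidu", "-l", "en-US", "-o", "--raw"]
    lame = ["lame", "-r", "-s", "16", "-m", "m", "-h", "-", mp3_file]
    return bal, lame


def run_pipeline(producer_cmd, consumer_cmd):
    """Pipe the producer's STDOUT into the consumer's STDIN; return both exit codes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer notices if the consumer exits early.
    producer.stdout.close()
    consumer_code = consumer.wait()
    producer_code = producer.wait()
    return producer_code, consumer_code


bal_cmd, lame_cmd = lame_pipeline_commands(r"d:\book.txt", r"d:\book.mp3")
print(bal_cmd)
print(lame_cmd)

# To run the conversion (assumes both programs are installed):
# run_pipeline(bal_cmd, lame_cmd)
```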

Configuration File

The configuration file "bal4web.cfg" can be saved in the same folder as the console application.

An example of the file contents:

-f d:\Text\book.txt
-w d:\Sound\book.wav
-s Google
-l de-DE
-g female
-d d:\Dict\rules.bxd
-lrc
--lrc-length 75
--lrc-enc utf8

The program can combine options from the configuration file and the command line.
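
A configuration file like the one above can be generated from a script. This Python sketch writes one option per line, as the format requires; the helper names are hypothetical, and both the UTF-8 encoding and the absence of quoting around values are assumptions, since this page does not specify them.

```python
from pathlib import Path


def config_text(options):
    """Return configuration file contents: one option (with its value) per line.

    `options` is a sequence of (option, value) pairs; use None as the value
    for switches without a parameter, such as "-lrc".
    """
    lines = []
    for option, value in options:
        lines.append(option if value is None else f"{option} {value}")
    return "\n".join(lines) + "\n"


def write_config(path, options):
    """Write the options to `path` (assumes the utility accepts UTF-8 text)."""
    Path(path).write_text(config_text(options), encoding="utf-8")


text = config_text([
    ("-s", "Google"),
    ("-l", "de-DE"),
    ("-g", "female"),
    ("-lrc", None),
    ("--lrc-length", 75),
])
print(text)

# To save it next to the utility (the folder is an assumption):
# write_config("bal4web.cfg", [("-s", "Google"), ("-l", "de-DE")])
```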

License

You are free to use and distribute the software for noncommercial purposes. For commercial use or distribution, you need to get permission from the copyright holder.