Balabolka :: Program for online text-to-speech services

This command-line application gives access to online text-to-speech services: text files or subtitles can be converted to audio files. The utility can also be used for testing: it helps you choose a cloud service that meets your needs. A separate application for Yandex SpeechKit is available for download, because Yandex is a Russian IT company with close government ties.

Online services with speech technologies:

Google Cloud TTS;
Amazon Polly;
Baidu TTS;
CereVoice Cloud;
Descript TTS;
IBM Watson TTS;
Iciba TTS;
iTranslate TTS;
Microsoft Azure;
Naver TTS;
OpenAI TTS;
Youdao TTS;
Yandex SpeechKit.

Download BAL4WEB

Size: MB

Version: Changelog

License: Freeware

Operating system:

Command-line utility for Yandex SpeechKit: Download ( MB)
The program converts text or subtitles to audio files using the Yandex service.
To perform operations via the Yandex API, you must authenticate with an API key.

Command line

The utility accepts various command-line parameters for reading text aloud or saving it as an audio file. The command line uses the syntax "bal4web [options ...]"; all parameters must be separated by spaces. Options can appear in any order, as long as each is paired with its related parameter. Run "bal4web -?" to get help on the command-line syntax and parameters.

-s service_name: Sets the name of the online TTS service ("google" or "g", "amazon" or "a", "baidu" or "b", "cerevoice" or "c", "descript" or "d", "ibm" or "i", "iciba" or "k", "itranslate" or "t", "microsoft" or "m", "naver" or "n", "openai" or "o", "youdao" or "y"). The default is "google".
-l language_name: Sets the language name for the online TTS service. The name is a combination of an ISO 639 two-letter lowercase culture code associated with a language and an ISO 3166 two-letter uppercase subculture code associated with a country or region. For example: pt-BR, de-DE, fr-FR. The default is "en-US".
Note: Descript TTS and OpenAI TTS identify the language of the input text themselves, so these services currently ignore this option. They can recognize several dozen languages on their own.
-g gender: Sets the gender for the online TTS service (if supported). The available values are "female" or "f" and "male" or "m". The default value is not defined. This parameter is supported by the following services: Amazon Polly, CereProc TTS, Descript TTS, Google TTS, IBM Watson TTS, iTranslate TTS, Microsoft Azure, Naver TTS, OpenAI TTS. If a voice name is specified, there is no need to set the gender.
-n voice_name: Sets the voice name for the online TTS service (if supported). The default value is not defined. This parameter is supported by the following services: Amazon Polly, CereProc TTS, Descript TTS, Google Cloud TTS, IBM Watson TTS, Microsoft Azure, Naver TTS, OpenAI TTS.
-r speech_rate: Sets the rate of the synthesized speech (if supported).
The default is "1.00" (average human speech rate).
Amazon Polly: from "0.20" to "2.00".
CereProc TTS: from "0.30" to "4.00".
Descript TTS, Naver TTS, OpenAI TTS, Youdao TTS: from "0.70" to "2.00".
Google TTS, IBM Watson TTS, Microsoft Azure: from "0.10" to "3.00".
Google Cloud: from "0.25" to "4.00".
iTranslate TTS: from "0.50" to "2.00".
-p number: Sets the speaking pitch in a range of -20 to 20 (if supported). The default is 0.
This option is supported by Amazon Polly, CereProc TTS, Google Cloud TTS, IBM Watson TTS, Microsoft Azure.
-v number: Sets the volume in a range of 0 to 200 (the default is 100).
-st style: Sets the voice-specific speaking style. The voice can express emotions like cheerfulness, empathy or calmness. This option is supported by some voices in Microsoft Azure. Styles are not available if the WebSocket protocol for Microsoft Azure is used.
--style-degree style_degree or -sd style_degree: Sets the intensity of the speaking style in a range of "0.01" to "2.00" (for styles supported by Microsoft Azure). The default is "1.00". The option lets you specify a stronger or softer style to make the speech more expressive or more subdued.
-m: Prints the list of supported languages (with genders and voice names, if available) for the online TTS service.
-f file_name: Sets the name of the input text file. The command line may contain several -f options.
-fl file_name: Sets the name of a text file containing a list of input files (one file name per line). The command line may contain several -fl options.
-w file_name: Sets the name of the output file in WAV format.
-c: Gets the text input from the clipboard.
-t text_line: Gets the text input from the command line. The command line may contain several -t options.
-i: Gets the text input from STDIN.
-o: Writes sound data to STDOUT; if the option is specified, the option -w is ignored.
--encoding encoding or -enc encoding: Sets the encoding of the text read from standard input ("ansi", "utf8" or "unicode"). If the option is not specified, the program detects the text encoding.
--silence-begin number or -sb number: Sets the length of the pause at the beginning of the audio file (in milliseconds). The default is 0.
--silence-end number or -se number: Sets the length of the pause at the end of the audio file (in milliseconds). The default is 0.
-ln number: Selects a line from the text file by its line number. Line numbering starts at "1". A range of numbers can be used to select more than one line (for example, "26-34"). The command line may contain several -ln options.
-e number: Sets the length of pauses between sentences (in milliseconds). The value should be less than 5000. If the option is not specified, the service uses its default pauses between sentences. This parameter is supported by Microsoft Azure only.
-d file_name: Applies a dictionary for pronunciation correction (*.BXD, *.DIC or *.REX). The command line may contain several -d options.
-lrc: Creates the LRC file. Lyrics will be synchronized with the speech in the output audio file.
-srt: Creates the SRT file. Subtitles will be synchronized with the speech in the output audio file.
-sub: Input text will be processed as subtitles. The option may be useful when the options -i or -c are specified.
-host host_name: Sets the hostname of the proxy server.
-port number: Sets the port number of the proxy server.
-fr number: Sets the output audio sampling frequency in kHz (8, 11, 16, 22, 24, 32, 44, 48). If the option is not specified, the default value for the selected service will be used.
-ae audio_encoding: Sets the audio encoding for data returned by Google Cloud or Microsoft Azure ("linear16", "mp3" or "oggopus"). This setting can improve the sound quality. The option is available if the API key is specified. Using it without a specific need is not recommended: apply it for testing purposes only.
--ignore-square-brackets or -isb: Ignores text in [square brackets].
--ignore-curly-brackets or -icb: Ignores text in {curly brackets}.
--ignore-angle-brackets or -iab: Ignores text in <angle brackets>.
--ignore-round-brackets or -irb: Ignores text in (round brackets).
--ignore-url or -iu: Ignores URLs.
--ignore-comments or -ic: Ignores comments in text. Single-line comments start with // and continue until the end of the line. Multiline comments start with /* and end with */.
-dp: Displays progress information in a console window.
-cfg file_name: Sets the name of the configuration file with the command line options (a text file where each line contains one option). If the option is not specified, the file bal4web.cfg in the same folder as the utility will be used.
-h: Shows the description of the command-line options.
--lrc-length number: Sets the maximum line length for the LRC file (in characters).
--lrc-fname file_name: Sets the name of the LRC file. The option may be useful when the option -o is specified on the command line.
--lrc-enc encoding: Sets the encoding of the LRC file ("ansi", "utf8" or "unicode"). The default is "ansi".
--lrc-offset number: Sets the time shift for the LRC file (in milliseconds).
--lrc-artist text: Sets a tag for the LRC file: the artist of the work.
--lrc-album text: Sets a tag for the LRC file: the album.
--lrc-title text: Sets a tag for the LRC file: the title of the work.
--lrc-author text: Sets a tag for the LRC file: the author.
--lrc-creator text: Sets a tag for the LRC file: the creator of the file.
--lrc-sent: Inserts blank lines after sentences when creating the LRC file.
--lrc-para: Inserts blank lines after paragraphs when creating the LRC file.
--srt-length number: Sets the maximum line length for the SRT file (in characters).
--srt-fname file_name: Sets the name of the SRT file. The option may be useful when the option -o is specified on the command line.
--srt-enc encoding: Sets the encoding of the SRT file ("ansi", "utf8" or "unicode"). The default is "ansi".
--raw: Writes audio in raw PCM format; the data contain no WAV header. This option is used together with -o.
--ignore-length or -il: Omits the size of the audio data from the WAV header. This option is used together with -o.
--wss: Uses the WebSocket protocol for Microsoft Azure. It improves the sound quality of audio files (24 kHz instead of 16 kHz). The option is ignored if the subscription key for the Microsoft Azure Cognitive Services is defined. Use the option -m to check whether a voice supports the WebSocket protocol.
--sub-format text: Sets the subtitle format ("srt", "lrc", "ssa", "ass", "smi" or "vtt"). If the option is not specified, the format is determined by the extension of the subtitle file name.
--sub-fit or -sf: Automatically increases the speech rate to fit time intervals (when the program converts subtitles to an audio file). The SoundTouch library is used for changing the tempo.
--sub-max number or -sm number: Sets the maximum speech rate in a range of 110% to 200% (when the program converts subtitles to an audio file). The program will automatically increase the speech rate without exceeding this value.
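
The value ranges listed above for -l, -r and -p can be checked before the utility is called. The following is a minimal Python sketch, not part of bal4web: the function names are hypothetical, the rate table is copied from the -r description, and the language check tests only the two-letter shape described for -l.

```python
import re

# Rate limits copied from the -r option description above.
RATE_LIMITS = {
    "Amazon Polly": (0.20, 2.00),
    "CereProc TTS": (0.30, 4.00),
    "Descript TTS": (0.70, 2.00),
    "Naver TTS": (0.70, 2.00),
    "OpenAI TTS": (0.70, 2.00),
    "Youdao TTS": (0.70, 2.00),
    "Google TTS": (0.10, 3.00),
    "IBM Watson TTS": (0.10, 3.00),
    "Microsoft Azure": (0.10, 3.00),
    "Google Cloud": (0.25, 4.00),
    "iTranslate TTS": (0.50, 2.00),
}

# Shape described for -l: two lowercase letters, a hyphen, two uppercase letters.
LANGUAGE_NAME = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_language_name(value: str) -> bool:
    """Return True if the value looks like pt-BR, de-DE or fr-FR."""
    return LANGUAGE_NAME.fullmatch(value) is not None


def rate_in_range(service: str, rate: float) -> bool:
    """Return True if the rate lies inside the range listed for the service."""
    low, high = RATE_LIMITS[service]
    return low <= rate <= high


def pitch_in_range(pitch: int) -> bool:
    """Return True if the pitch lies inside the -20 to 20 range of -p."""
    return -20 <= pitch <= 20
```

A script could call these checks and refuse to start a conversion when a value falls outside the documented range.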

--aws-keyid text or -ak text: Sets the AWS access key ID for Amazon Polly. It is recommended to supply this key if you have one.
--aws-secret text or -as text: Sets the AWS secret access key for Amazon Polly.
--aws-region text or -ar text: Sets the AWS region for Amazon Polly.
--crv-email text or -ce text: Sets the email address used when registering on the CereProc website. This information is necessary for CereVoice Cloud API authorization. It is recommended to supply this email address if you have one.
--crv-pwd text or -cp text: Sets the password used when registering on the CereProc website. This information is necessary for CereVoice Cloud API authorization. It is recommended to supply this password if you have one.
--gc-apikey text or -gk text: Sets the API key for Google Cloud. It is recommended to supply this key if you have one.
--ms-apikey text or -mk text: Sets the subscription key for the Microsoft Azure Cognitive Services. It is recommended to supply this key if you have one.
--ms-region text or -mr text: Sets the subscription region for the Microsoft Azure Cognitive Services.
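
When the utility is called from a script, the keys above do not have to be written into the script itself. The sketch below, in Python, reads them from environment variables and turns them into command-line options. The environment variable names and the function name are hypothetical and chosen only for illustration; the option names are the ones listed above.

```python
import os

# Hypothetical environment variable names mapped to the bal4web options above.
ENV_TO_OPTION = {
    "BAL4WEB_AWS_KEYID": "--aws-keyid",
    "BAL4WEB_AWS_SECRET": "--aws-secret",
    "BAL4WEB_AWS_REGION": "--aws-region",
    "BAL4WEB_CRV_EMAIL": "--crv-email",
    "BAL4WEB_CRV_PWD": "--crv-pwd",
    "BAL4WEB_GC_APIKEY": "--gc-apikey",
    "BAL4WEB_MS_APIKEY": "--ms-apikey",
    "BAL4WEB_MS_REGION": "--ms-region",
}


def credential_args(env=None):
    """Build option/value pairs for every credential variable that is set."""
    env = os.environ if env is None else env
    args = []
    for name, option in ENV_TO_OPTION.items():
        value = env.get(name)
        if value:  # skip variables that are missing or empty
            args += [option, value]
    return args
```

The returned list can be appended to the other arguments of a bal4web call.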

Command examples

Create the text file LANGUAGE.TXT with the list of all supported languages and genders for the Google TTS service:

bal4web -s Google -m > language.txt

Convert text from BOOK.TXT to speech and save as BOOK.WAV:

bal4web -f "d:\Text\book.txt" -w "d:\Sound\book.wav" -s Google -l en-US -g female
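
The same conversion can be started from a script. This is a minimal Python sketch that assumes bal4web is on the PATH; the function names are hypothetical. It passes -n when a voice name is given and -g otherwise, since the gender is not needed once a voice is named.

```python
import subprocess


def build_command(text_file, wav_file, service="google", language="en-US",
                  gender=None, voice=None):
    """Assemble the argument list for a text-to-WAV conversion."""
    command = ["bal4web", "-f", text_file, "-w", wav_file,
               "-s", service, "-l", language]
    if voice:
        command += ["-n", voice]   # a named voice makes -g unnecessary
    elif gender:
        command += ["-g", gender]
    return command


def convert(text_file, wav_file, **options):
    """Run bal4web and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(text_file, wav_file, **options),
                          check=True)
```

Passing the arguments as a list avoids manual quoting of paths that contain spaces.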

Convert subtitles to speech and save as MOVIE.WAV:

bal4web -f "d:\Subtitles\movie.srt" -w "d:\Sound\movie.wav" -s m -l de-DE -n Conrad -r 1.1

bal4web -f "d:\Subtitles\movie.srt" -w "d:\Sound\movie.wav" -s m -l de-DE -n Conrad --sub-fit

An example of use together with LAME.EXE:

bal4web -f d:\book.txt -s Baidu -l en-US -o --raw | lame -r -s 16 -m m -h - d:\book.mp3
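
The pipe in the command above can also be set up from a script. The following Python sketch assumes that bal4web and lame are both on the PATH; the function names are hypothetical, and the arguments are the same as in the example.

```python
import subprocess


def bal4web_raw_command(text_file, service="Baidu", language="en-US"):
    """Arguments that make bal4web write raw PCM data to STDOUT."""
    return ["bal4web", "-f", text_file, "-s", service, "-l", language,
            "-o", "--raw"]


def lame_command(mp3_file):
    """Arguments from the example: lame reads raw PCM data from STDIN."""
    return ["lame", "-r", "-s", "16", "-m", "m", "-h", "-", mp3_file]


def text_to_mp3(text_file, mp3_file):
    """Connect bal4web's STDOUT to lame's STDIN and wait for both to finish."""
    producer = subprocess.Popen(bal4web_raw_command(text_file),
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen(lame_command(mp3_file), stdin=producer.stdout)
    producer.stdout.close()  # lets bal4web see a closed pipe if lame exits early
    consumer.wait()
    producer.wait()
    return consumer.returncode
```

Calling text_to_mp3(r"d:\book.txt", r"d:\book.mp3") would correspond to the command line shown above.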

An example of use together with OGGENC2.EXE:

bal4web -f d:\book.txt -s Baidu -l en-US -o -il | oggenc2 --ignorelength - -o d:\book.ogg

Configuration file

A configuration file named "bal4web.cfg" can be saved in the same folder as the console application.

An example of the file content:

-f d:\Text\book.txt
-w d:\Sound\book.wav
-s Google
-l de-DE
-g female
-d d:\Dict\rules.bxd
-lrc
--lrc-length 75
--lrc-enc utf8

The program can combine options from the configuration file and the command line.
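
The one-option-per-line layout shown above is easy to read from a script, for example to inspect a configuration before running the utility. This Python sketch is only an illustration, with a hypothetical function name; it assumes that the option name and its value are separated by the first space on the line, which matches the sample but is not confirmed for every case bal4web accepts.

```python
def parse_config(text):
    """Split configuration text into (option, value) pairs.

    Options without a value, such as -lrc, get None as their value.
    """
    options = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the first space separates the option from its value.
        name, _, value = line.partition(" ")
        options.append((name, value.strip() or None))
    return options


SAMPLE = r"""
-f d:\Text\book.txt
-w d:\Sound\book.wav
-s Google
-lrc
--lrc-length 75
"""

for option, value in parse_config(SAMPLE):
    print(option, value)
```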

License

You are free to use and distribute the software for noncommercial purposes. For commercial use or distribution, you need permission from the copyright holder.