AudioTranscribeWhisperSettings interface

Settings for an Audio Transcribe operation using the Whisper SDK (whisper.cpp). See: NorskTransform.audioTranscribeWhisper()

Signature:

```typescript
export interface AudioTranscribeWhisperSettings extends ProcessorNodeSettings<AudioTranscribeWhisperNode>
```

Properties

| Property | Type | Description |
| --- | --- | --- |
| contextPrompt? | boolean | (Optional) Whether to supply a prompt consisting of the tokens recognised in the previous chunk. By default no prompt is supplied, relying only on the overlap of transcription chunks to resolve partial words occurring at the start/end of a chunk. |
| initialPrompt? | string | (Optional) Initial prompt to prime the model, in addition to prompting based on past transcription history. |
| keepMs? | number | (Optional) Duration of audio to keep when clearing the buffer, to allow for partial-word recognition. Default 400ms. |
| language? | string | (Optional) Language setting for the Whisper model. Leave unset to auto-detect (with a multi-language model). |
| maxTokens? | number | (Optional) Maximum number of tokens per segment. |
| model | string | The file name of the GGML-format Whisper model. Information: https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md Model downloads: https://huggingface.co/ggerganov/whisper.cpp/tree/main |
| noFallback? | boolean | (Optional) |
| numThreads? | number | (Optional) Number of threads to use. Note that using a large number of threads rarely improves performance. |
| outputStreamId | number | Stream ID of the output subtitles. |
| samplingStrategy? | WhisperSamplingStrategy | (Optional) Greedy (default) or beam sampling strategy. |
| speedUp? | boolean | (Optional) Experimental: speed up the audio by 2x using a phase vocoder. Can significantly reduce the quality of the output. |
| stepMs? | number | (Optional) The duration of audio accumulated before performing one transcription step. Decreasing this value decreases latency but also decreases performance. Visualiser metrics are available to monitor the duration of each "step" operation; if a step is not clearly faster than the audio duration, real-time output will not be attained and the workflow will back up. Default 3000ms. |
| suppressNonSpeechTokens? | boolean | (Optional) Whether to suppress non-speech tokens. |
| tinyDiarize? | boolean | (Optional) Enable tiny-diarize if supported by the given model. |
| translate? | boolean | (Optional) Whether to translate non-English input to English, or leave the transcription in the source language. |
| useGpu? | boolean | (Optional) Use the GPU if available. Where a GPU is available this may not necessarily increase performance, but instead moves load from the CPU to the GPU. |
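
The sketch below shows how these settings might be assembled and passed to the transcription transform. It is illustrative only: the package name, the `id` field (assumed to come from ProcessorNodeSettings), the `norsk.processor.transform.audioTranscribeWhisper()` access path, the model file name, and the stream ID are assumptions, not values taken from this page.

```typescript
import { Norsk, AudioTranscribeWhisperSettings } from "@norskvideo/norsk-sdk"; // package name assumed

// All concrete values below are placeholders chosen for illustration.
const whisperSettings: AudioTranscribeWhisperSettings = {
  id: "whisperTranscribe", // assumed to be inherited from ProcessorNodeSettings
  model: "ggml-base.en.bin", // GGML-format model file, see the whisper.cpp model README linked above
  outputStreamId: 512, // stream ID carried by the generated subtitles
  language: "en", // omit to auto-detect with a multi-language model
  stepMs: 3000, // accumulate 3000ms of audio per transcription step (the documented default)
  keepMs: 400, // keep 400ms of audio across steps for partial-word recognition
  translate: false, // leave the transcription in the source language
  useGpu: true, // offload work to the GPU where one is available
};

async function run() {
  const norsk = await Norsk.connect();

  // Assumes NorskTransform is reached via norsk.processor.transform and that an
  // audio source node has been created elsewhere in the workflow.
  const transcribe = await norsk.processor.transform.audioTranscribeWhisper(whisperSettings);

  // The node would then be subscribed to the audio source, and downstream nodes can
  // consume the subtitle stream identified by outputStreamId.
}
```

Of these options, stepMs is the main latency/throughput trade-off: smaller steps reduce latency, but each step must still complete faster than the audio it covers or the workflow will back up.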