AudioTranscribeWhisperSettings interface

Settings for an audio transcription operation using the Whisper SDK (whisper.cpp). See: NorskTransform.audioTranscribeWhisper()

Signature:

export interface AudioTranscribeWhisperSettings extends ProcessorNodeSettings<AudioTranscribeWhisperNode>

Extends: ProcessorNodeSettings<AudioTranscribeWhisperNode>

Properties

Property	Type	Description
initialPrompt?	string	(Optional) Initial prompt to prime the model - this is in addition to prompting based on past transcription history.
keepMs?	number	(Optional) Duration of audio to keep when clearing the buffer, to allow for partial-word recognition. Default 400ms.
language?	string	(Optional) Language setting for the Whisper model. Leave unset to auto-detect the language (with a multi-language model).
maxTokens?	number	(Optional) Maximum number of tokens per segment.
model	string	The file name of the GGML-format whisper model. Information: https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md Model downloads: https://huggingface.co/ggerganov/whisper.cpp/tree/main
noFallback?	boolean	(Optional)
numThreads?	number	(Optional) Number of threads to use. Note that using a large number of threads rarely improves performance.
outputStreamId	number	Stream ID of the output subtitles
samplingStrategy?	WhisperSamplingStrategy	(Optional) Greedy (default) or beam sampling strategy
speedUp?	boolean	(Optional) Experimental: speed up the audio by 2x using a phase vocoder. Can significantly reduce the quality of the output.
stepMs?	number	(Optional) The duration of audio that is accumulated before performing one transcription step. Decreasing this value decreases latency but also decreases performance. Visualiser metrics are available to monitor the duration of each "step" operation; if this is not clearly faster than the audio duration, real-time output will not be attained and the workflow will back up. Default 3000ms.
suppressNonSpeechTokens?	boolean	(Optional) Whether to suppress non-speech tokens
tinyDiarize?	boolean	(Optional) Enable tiny-diarize if supported in the given model
translate?	boolean	(Optional) Whether to translate a non-English input to English, or leave the transcription in the source language.
useGpu?	boolean	(Optional) Use the GPU if available. Where a GPU is available, using it may not increase performance; it may simply move load from the CPU to the GPU.
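
The properties above can be combined into a settings object as in the following sketch. It is illustrative only: `WhisperSettingsSketch` is a local stand-in that mirrors a subset of the documented properties (in a real project the interface would come from the Norsk SDK, and any properties inherited from ProcessorNodeSettings are not shown), the model file name and the chosen values are examples, and `keepsUpWithRealTime` with its 0.8 headroom factor is a hypothetical helper that puts a number on the "clearly faster than the audio duration" guidance for stepMs.

```typescript
// Local stand-in mirroring a subset of the documented properties.
// Assumption: real code would use AudioTranscribeWhisperSettings from the SDK.
type WhisperSettingsSketch = {
  model: string;          // file name of the GGML-format whisper model
  outputStreamId: number; // stream ID of the output subtitles
  language?: string;      // leave unset to auto-detect
  translate?: boolean;
  stepMs?: number;
  keepMs?: number;
  numThreads?: number;
  useGpu?: boolean;
};

// Defaults as stated in the property table.
const DEFAULT_STEP_MS = 3000;
const DEFAULT_KEEP_MS = 400;

const settings: WhisperSettingsSketch = {
  model: "ggml-base.en.bin", // example file name, not a recommendation
  outputStreamId: 1,
  language: "en",
  stepMs: 2000, // lower latency than the default, at some cost in performance
  numThreads: 4,
  useGpu: true,
};

// Fill in the documented defaults for anything left unset.
const effective = {
  stepMs: DEFAULT_STEP_MS,
  keepMs: DEFAULT_KEEP_MS,
  ...settings,
};

// Hypothetical check: a step keeps up with real time when it takes no more
// than `headroom` times the audio duration it covers. The 0.8 factor is an
// arbitrary illustrative margin, not a value taken from the SDK.
function keepsUpWithRealTime(
  stepDurationMs: number,
  stepMs: number = DEFAULT_STEP_MS,
  headroom: number = 0.8,
): boolean {
  return stepDurationMs <= stepMs * headroom;
}
```

A measured step duration (for example, one read from the visualiser metrics) could be passed to `keepsUpWithRealTime` together with `effective.stepMs` to decide whether stepMs needs to be raised.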
