Get started using the Soniox audio transcription loader in LangChain.
Install the package:
npm install @soniox/langchain
Get your Soniox API key from the Soniox Console and set it as an environment variable:
export SONIOX_API_KEY=your_api_key
Basic transcription
The following example transcribes an audio file with the SonioxAudioTranscriptLoader and then generates a summary of the transcript with an LLM.
import { SonioxAudioTranscriptLoader } from "@soniox/langchain";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const audioFileUrl = "https://soniox.com/media/examples/coffee_shop.mp3";

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en"],
    // Any other transcription parameters are listed here:
    // https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription
  }
);

console.log(`Transcribing ${audioFileUrl}...`);
const docs = await loader.load();
const transcriptText = docs[0].pageContent;
console.log(`Transcript: ${transcriptText}`);

// Create a chain to summarize the transcript
const prompt = ChatPromptTemplate.fromTemplate(
  "Write a concise summary of the following speech:\n\n{transcript}"
);
const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());
const summary = await chain.invoke({ transcript: transcriptText });
console.log(summary);
You can also transcribe audio from binary data:
// Fetch the file
const response = await fetch(
  "https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3"
);
const audioBuffer = await response.bytes(); // Uint8Array

const loader = new SonioxAudioTranscriptLoader({
  audio: audioBuffer,
});
const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text
Translation
Translate from any detected language to a target language:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    translation: {
      type: "one_way",
      target_language: "fr",
    },
    language_hints: ["en"],
  }
);
const docs = await loader.load();

// Split the tokens into original and translated text
let originalText = "";
let translatedText = "";
for (const token of docs[0].metadata.tokens) {
  if (token.translation_status === "translation") {
    translatedText += token.text;
  } else {
    originalText += token.text;
  }
}
console.log(originalText);
console.log(translatedText);
You can also transcribe and translate between two languages simultaneously by using the two_way translation type. Learn more about Soniox translation.
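With two-way translation, translated tokens can target either language, so a single translatedText string is no longer enough. The sketch below collects translated text per language. It assumes translated tokens carry a language field (tokens without one are filed under "unknown"), and collectTranslations is a hypothetical helper, not part of @soniox/langchain. The twoWayOptions shape, including language_a and language_b, is also an assumption; check it against the Soniox translation documentation.

```typescript
// Token shape limited to the fields this sketch reads from docs[0].metadata.tokens.
type TranslationToken = {
  text: string;
  language?: string | null;
  translation_status?: string | null;
};

// Hypothetical helper: concatenates translated text, keyed by token language.
function collectTranslations(
  tokens: TranslationToken[] | null | undefined
): Record<string, string> {
  const byLanguage: Record<string, string> = {};
  for (const token of tokens ?? []) {
    // Skip tokens that belong to the original speech.
    if (token.translation_status !== "translation") continue;
    const language = token.language ?? "unknown";
    byLanguage[language] = (byLanguage[language] ?? "") + token.text;
  }
  return byLanguage;
}

// ASSUMED option shape for two-way translation; verify the field names
// against the Soniox translation documentation before relying on them.
const twoWayOptions = {
  translation: { type: "two_way", language_a: "en", language_b: "es" },
};
```

In practice you would pass twoWayOptions as the second constructor argument and then call collectTranslations(docs[0].metadata.tokens) on the loaded document.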
Language hints
Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages.
Language hints do not restrict recognition—they only bias the model toward the specified languages, while still allowing other languages to be detected if present.
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en", "es"],
  }
);
const docs = await loader.load();
For more details, see the Soniox language hints documentation.
Speaker diarization
Enable speaker identification to distinguish between different speakers:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_speaker_diarization: true,
  }
);
const docs = await loader.load();

// Access speaker information in the metadata
let currentSpeaker = null;
let output = "";
for (const token of docs[0].metadata.tokens) {
  if (currentSpeaker !== token.speaker) {
    // Start a new line whenever the speaker changes
    currentSpeaker = token.speaker;
    output += `\nSpeaker ${currentSpeaker}: ${token.text.trimStart()}`;
  } else {
    output += token.text;
  }
}
console.log(output);

// Analyze the conversation
const prompt = ChatPromptTemplate.fromTemplate(
  `Analyze the following conversation between speakers.
Identify the intent of each speaker.
Conversation:
{conversation}`
);
const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());
const analysis = await chain.invoke({ conversation: output });
console.log(analysis);
Language identification
Enable automatic language detection and identification:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_language_identification: true,
  }
);
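Once language identification is enabled, each token can carry a language value (see the token type in the API reference). One way to use it is to split the transcript into consecutive runs of the same language. The following is a minimal sketch; splitByLanguage and the "unknown" fallback label are illustrative choices, not part of @soniox/langchain.

```typescript
// Token shape limited to the fields this sketch reads.
type LanguageToken = {
  text: string;
  language?: string | null;
};

type LanguageRun = { language: string; text: string };

// Hypothetical helper: merges consecutive tokens that share a language.
function splitByLanguage(
  tokens: LanguageToken[] | null | undefined
): LanguageRun[] {
  const runs: LanguageRun[] = [];
  for (const token of tokens ?? []) {
    // Tokens without a detected language are grouped under "unknown".
    const language = token.language ?? "unknown";
    const last = runs[runs.length - 1];
    if (last && last.language === language) {
      last.text += token.text;
    } else {
      runs.push({ language, text: token.text });
    }
  }
  return runs;
}
```

Calling splitByLanguage(docs[0].metadata.tokens) after loader.load() would give one entry per language switch, which is convenient for labelling mixed-language audio.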
Context for improved accuracy
Provide domain-specific context to improve transcription accuracy:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    context: {
      general: [
        { key: "industry", value: "healthcare" },
        { key: "meeting_type", value: "consultation" },
      ],
      terms: ["hypertension", "cardiology", "metformin"],
      translation_terms: [
        { source: "blood pressure", target: "presión arterial" },
        { source: "medication", target: "medicamento" },
      ],
    },
  }
);
For more details, see the Soniox context documentation.
API reference
Constructor parameters
SonioxLoaderParams (required)
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audio | Uint8Array \| string | Yes | Audio file as buffer or URL |
| audioFormat | SonioxAudioFormat | No | Audio file format |
| apiKey | string | No | Soniox API key (defaults to SONIOX_API_KEY env var) |
| apiBaseUrl | string | No | API base URL (defaults to https://api.soniox.com/v1) |
| pollingIntervalMs | number | No | Polling interval in ms (min: 1000, default: 1000) |
| pollingTimeoutMs | number | No | Polling timeout in ms (default: 180000) |
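When these parameters are set explicitly, it can help to validate them before constructing the loader. The sketch below assembles a params object using the parameter names, minimum, and defaults listed for SonioxLoaderParams. buildLoaderParams and LoaderParamsSketch are hypothetical names, and the checks are an assumption about what is worth failing fast on; the loader itself may validate differently.

```typescript
// Local stand-in for the documented params; not the library's own type.
type LoaderParamsSketch = {
  audio: Uint8Array | string;
  apiKey: string;
  pollingIntervalMs: number;
  pollingTimeoutMs: number;
};

// Hypothetical helper: reads the API key from an env-like object and
// rejects polling values outside the documented minimum.
function buildLoaderParams(
  audio: Uint8Array | string,
  env: Record<string, string | undefined>,
  pollingIntervalMs: number = 1000, // documented default
  pollingTimeoutMs: number = 180000 // documented default
): LoaderParamsSketch {
  const apiKey = env["SONIOX_API_KEY"];
  if (!apiKey) {
    throw new Error("SONIOX_API_KEY is not set");
  }
  if (pollingIntervalMs < 1000) {
    throw new Error("pollingIntervalMs must be at least 1000");
  }
  if (pollingTimeoutMs < pollingIntervalMs) {
    throw new Error("pollingTimeoutMs must not be shorter than pollingIntervalMs");
  }
  return { audio, apiKey, pollingIntervalMs, pollingTimeoutMs };
}
```

In a Node.js script you would pass process.env as the second argument and hand the result to new SonioxAudioTranscriptLoader(...). Raising pollingTimeoutMs may be useful for long recordings.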
SonioxLoaderOptions (optional)
| Parameter | Type | Description |
| --- | --- | --- |
| model | SonioxTranscriptionModelId | Model to use (default: "stt-async-v4") |
| translation | object | Translation configuration |
| language_hints | string[] | Language hints for transcription |
| language_hints_strict | boolean | Enforce strict language hints |
| enable_speaker_diarization | boolean | Enable speaker identification |
| enable_language_identification | boolean | Enable language detection |
| context | object | Context for improved accuracy |
Browse the documentation for a full list of supported options.
Supported audio formats
aac - Advanced Audio Coding
aiff - Audio Interchange File Format
amr - Adaptive Multi-Rate
asf - Advanced Systems Format
flac - Free Lossless Audio Codec
mp3 - MPEG Audio Layer III
ogg - Ogg Vorbis
wav - Waveform Audio File Format
webm - WebM Audio
Return value
The load() method returns an array containing a single Document object:
type Document = {
  pageContent: string; // The transcribed text
  metadata: SonioxTranscriptResponse; // Full transcript with metadata
};
The metadata includes transcribed text, speaker information (if diarization enabled), language information (if identification enabled), translation data (if translation enabled), and timing information.
type SonioxTranscriptResponse = {
  id: string;
  text?: string | null;
  tokens?: SonioxTranscriptToken[] | null;
};
Token type:
type SonioxTranscriptToken = {
  text: string;
  start_ms?: number | null;
  end_ms?: number | null;
  confidence?: number | null;
  speaker?: number | string | null;
  language?: string | null;
  translation_status?: string | null;
};
You can learn more about the SonioxTranscriptResponse type in the Soniox REST API Reference.
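Because tokens and most token fields are optional or nullable, code that consumes the metadata needs to guard against missing values. As a rough sketch, the helper below groups tokens into per-speaker segments with start and end times taken from start_ms and end_ms. toTimedSegments, formatMs, and the "--:--" and "unknown" placeholders are hypothetical choices for illustration.

```typescript
// Token shape limited to the fields this sketch reads.
type TimedToken = {
  text: string;
  start_ms?: number | null;
  end_ms?: number | null;
  speaker?: number | string | null;
};

type TimedSegment = {
  speaker: string;
  startMs: number | null;
  endMs: number | null;
  text: string;
};

// Formats milliseconds as mm:ss, or a placeholder when timing is missing.
function formatMs(ms: number | null): string {
  if (ms === null) return "--:--";
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return (minutes < 10 ? "0" : "") + minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// Hypothetical helper: merges consecutive tokens from the same speaker
// and tracks the time span each segment covers.
function toTimedSegments(
  tokens: TimedToken[] | null | undefined
): TimedSegment[] {
  const segments: TimedSegment[] = [];
  for (const token of tokens ?? []) {
    const speaker = String(token.speaker ?? "unknown");
    const last = segments[segments.length - 1];
    if (last && last.speaker === speaker) {
      last.text += token.text;
      // Extend the segment only when the token has an end time.
      if (token.end_ms != null) last.endMs = token.end_ms;
    } else {
      segments.push({
        speaker,
        startMs: token.start_ms ?? null,
        endMs: token.end_ms ?? null,
        text: token.text.trimStart(),
      });
    }
  }
  return segments;
}
```

A typical use would be to map toTimedSegments(docs[0].metadata.tokens) to lines such as `[${formatMs(s.startMs)}] Speaker ${s.speaker}: ${s.text}` before passing the conversation to an LLM.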