Audio Requirements

Follow these guidelines to get the most accurate analysis results from each recording. This page covers supported formats, duration, encoding, recording environment best practices, and audio quality issue codes.

Supported formats

Format	Notes
`wav`	Lossless quality; larger file sizes
`flac`	Lossless compression; smaller than WAV with no quality loss
`mp3`	Accepted; often smaller than WAV; helps you stay under the 32 MB upload limit
`m4a`	Accepted; common from mobile devices; often smaller than WAV

The API detects the format from the uploaded file. Submitting an unsupported format returns UNSUPPORTED_FORMAT with HTTP 400.

File size

Limit	Value
Maximum per file	32 MB

For longer recordings, MP3, M4A, or FLAC usually keep file size under the limit while preserving good analysis quality.
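
The format and size limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: `check_upload` is a hypothetical helper name, the extension check is only a heuristic (the API detects the format from the file itself), and reading "32 MB" as 32 × 1024 × 1024 bytes is an assumption.

```python
import os

# Formats listed in the "Supported formats" table.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a"}

# Assumption: "32 MB" is interpreted as 32 * 1024 * 1024 bytes.
# The API may count in decimal megabytes instead.
MAX_UPLOAD_BYTES = 32 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES} bytes"
        )
    return problems
```

For a file on disk, `check_upload(path, os.path.getsize(path))` gives a quick signal before the request is sent. The API's own response remains the authority on whether a file is accepted.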

Duration

Duration	Outcome
Less than 15 seconds	Returns `400 Bad Request`; submit a recording of at least 15 seconds
15–19 seconds	Processed; check `audio_quality` for guidance on use
20–120 seconds	Recommended range for best results
Over 2 minutes, up to 20 minutes	Accepted; processing takes longer for longer recordings (up to several minutes)
Greater than 20 minutes (1200 seconds)	Returns `400 Bad Request`; trim to 20 minutes or fewer before submitting

Recordings on the shorter end of the 15–19 second range are more likely to produce insufficient_speech issues.
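
As an illustration, the duration table can be expressed as a small client-side check. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and durations between 19 and 20 seconds are treated as part of the short range, which the table does not state explicitly.

```python
def classify_duration(seconds: float) -> str:
    """Map a recording duration to the outcome described in the duration table."""
    if seconds < 15:
        return "rejected_too_short"   # API returns 400 Bad Request
    if seconds < 20:
        return "accepted_short"       # processed; check audio_quality
    if seconds <= 120:
        return "recommended"          # best results
    if seconds <= 1200:
        return "accepted_long"        # accepted; longer processing time
    return "rejected_too_long"        # API returns 400 Bad Request
```

A client could refuse to upload anything classified as rejected, and prompt the speaker to keep talking when a recording falls in the short range.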

Sample rate

Parameter	Value
Minimum	8000 Hz
Recommended	16000 Hz
Maximum	No hard upper limit; higher rates are downsampled internally

Record and submit at 16000 Hz where possible for best results across all models.

Channels

Mono recordings are recommended. Stereo recordings are accepted, and the API automatically downmixes them to mono before analysis. For best results, mix to mono before submitting.
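
For WAV files, the sample rate and channel count can be read from the file header before submitting. The sketch below uses Python's standard `wave` module, which reads PCM WAV files only, so it does not cover FLAC, MP3, or M4A. The function name and the returned dictionary keys are hypothetical.

```python
import wave

MIN_SAMPLE_RATE_HZ = 8000           # minimum from the sample rate table
RECOMMENDED_SAMPLE_RATE_HZ = 16000  # recommended value from the same table


def inspect_wav(path: str) -> dict:
    """Read sample rate, channel count, and duration from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate

    notes = []
    if rate < MIN_SAMPLE_RATE_HZ:
        notes.append("sample rate is below the 8000 Hz minimum")
    elif rate < RECOMMENDED_SAMPLE_RATE_HZ:
        notes.append("sample rate is below the recommended 16000 Hz")
    if channels > 1:
        notes.append("recording is not mono; the API will downmix it")

    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_seconds": duration,
        "notes": notes,
    }
```

An empty `notes` list means the file meets the recommendations in the sample rate and channels sections. The duration value can be compared against the duration table.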

Speaker diarization

Speaker diarization identifies and separates individual speakers in a multi-speaker recording, then analyzes each speaker independently. Use diarization when the recording contains two or more speakers — for example, a clinician-patient interview or a group therapy session.

When to use diarization

Single speaker — diarize=false (default). Recommended for most scenarios.
Multiple speakers — diarize=true. The API identifies each speaker and runs analysis separately on their voice segments.

Single-speaker recordings submitted with diarize=true still work: the diarizer detects one speaker and processes the recording normally. However, diarization adds a flat 50 tokens on top of the base rate (see Billing & cost).

How to enable

V2 API: Pass diarize=true as a form field on POST /v2/models/{model_name}/analyze or POST /v2/signs/{sign_name}/analyze.

curl -X POST https://api.amplifierhealth.com/v2/models/apex/analyze \
  -H "X-Account-ID: your-account-id" \
  -H "X-API-Key: your-api-key" \
  -F "diarize=true" \
  -F "audio=@recording.wav;type=audio/wav"

V1 API: Use the dedicated diarization endpoints — POST /api/v1/{condition}/analyze-and-diarize-audio or POST /api/v1/{condition}/analyze-and-diarize-audio-url. V1 diarization is async only. See Legacy API — Conditions for details.

Processing time

Diarization adds a speaker separation step before analysis, which increases total processing time. For longer recordings, processing can take several minutes. The submit request returns a job immediately; retrieve the results via GET /v2/jobs/{job_id} or a webhook.

Submitting audio

Audio is submitted as multipart/form-data with the file in the audio field. For a complete working example with all parameters, see Model API.
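
To make the multipart layout concrete, here is a minimal sketch that builds (but does not send) a V2 analyze request using only the Python standard library. The endpoint path, the `audio` and `diarize` form fields, and the two header names come from this page; the function name and its parameters are hypothetical, and the response format is not covered here.

```python
import urllib.request
import uuid

API_BASE = "https://api.amplifierhealth.com"


def build_analyze_request(model_name, audio_bytes, filename, content_type,
                          account_id, api_key, diarize=False):
    """Build a multipart/form-data POST for /v2/models/{model_name}/analyze."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii") + b"\r\n"

    body = b""
    if diarize:
        # Only send the field when enabling diarization; false is the default.
        body += delimiter
        body += b'Content-Disposition: form-data; name="diarize"\r\n\r\n'
        body += b"true\r\n"

    # The audio file goes in the "audio" form field.
    body += delimiter
    body += (
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    body += audio_bytes + b"\r\n"
    body += b"--" + boundary.encode("ascii") + b"--\r\n"

    return urllib.request.Request(
        f"{API_BASE}/v2/models/{model_name}/analyze",
        data=body,
        method="POST",
        headers={
            "X-Account-ID": account_id,
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. In practice an HTTP client library that handles multipart encoding for you is usually the simpler choice; the manual encoding here is only meant to show which fields and headers are involved.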

Recording environment guidelines

The acoustic environment affects every model. Poor recording conditions are the leading cause of quality issues. Follow the guidelines below to get the best results.

Environment:

Record in a quiet space with minimal background noise. Avoid open offices, public spaces, or rooms with significant echo.
Eliminate sources of steady or intermittent noise (fans, air conditioning, notifications) where possible.

Microphone:

Use a close-proximity microphone. Headset microphones, lavaliers, and phone handsets held close to the mouth produce consistently better results than speakerphone or far-field microphones.
Avoid clipping. If the speaker's voice is loud, lower the input gain until audio levels peak around -6 dBFS.
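
The -6 dBFS guideline can be checked on a test recording. The sketch below measures the peak level of a 16-bit PCM WAV file with the Python standard library, using peak dBFS = 20 · log10(peak / 32768). The function name is hypothetical, and other sample widths or formats are outside the scope of this example.

```python
import array
import math
import sys
import wave


def peak_dbfs(path: str) -> float:
    """Return the peak level of a 16-bit PCM WAV file in dBFS."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")    # digital silence
    return 20 * math.log10(peak / 32768)
```

A result at or very near 0 dBFS suggests clipping; a sample at half of full scale gives roughly -6 dBFS.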

Speaker:

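Capture a single speaker per recording where possible. For recordings with two or more speakers, use diarization (see Speaker diarization).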
The speaker should be the subject being assessed, not a clinician or interviewer.

Tip

Before deploying in a new environment — a clinic room, a mobile app, an enterprise desktop — test recording levels with a sample recording and check audio_quality.issues in the response. Fix issues before going live.

Free-form speech

Most models work well with free-form conversational speech. Ask the speaker to talk naturally about any topic: how their day is going, how they have been feeling, or anything they are comfortable sharing.

Language support

Amplifier's models are currently trained on English-language speech. Support for additional languages is in active development.

Before deploying in a multilingual environment:

Confirm that each model you intend to use supports the languages in your deployment.
Contact support@amplifierhealth.com for the current language support matrix before deploying in any multilingual context.

Warning

Submitting audio in an unsupported language can produce signal values without a quality warning. Always verify language compatibility for your target population before deploying in production.

Audio quality issue codes

When the API detects recording problems, it returns issue codes in audio_quality.issues. Results may still be returned alongside issues. Use the issue codes to decide whether to use the result or re-record.

Code	Effect on results	Recommended action
`poor_voice_quality`	Recording may be below optimal for analysis	Re-record in a quieter environment with a better microphone
`insufficient_speech`	Limited speech detected; more continuous speech may improve results	Re-record with more active continuous speech; eliminate long pauses
`high_background_noise`	Background noise may affect analysis quality	Re-record in a quieter space or with a closer microphone
`invalid_speaker`	No clear single-speaker human speech detected	Verify the recording contains the intended speaker's voice

When audio_quality.issues contains invalid_speaker, request a new recording before routing on the result. See Interpreting Results for guidance on how to handle quality conditions in your routing logic.
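
One way to act on the issue codes is a small lookup keyed by code. This sketch assumes that `audio_quality.issues` is a list of code strings, which this page does not confirm; the function name and the returned keys are hypothetical. The recommended actions are copied from the table above.

```python
RECOMMENDED_ACTIONS = {
    "poor_voice_quality":
        "Re-record in a quieter environment with a better microphone",
    "insufficient_speech":
        "Re-record with more active continuous speech; eliminate long pauses",
    "high_background_noise":
        "Re-record in a quieter space or with a closer microphone",
    "invalid_speaker":
        "Verify the recording contains the intended speaker's voice",
}


def review_issues(issues):
    """Summarize issue codes (assumed to be a list of strings)."""
    actions = [
        RECOMMENDED_ACTIONS.get(code, f"Unrecognized issue code: {code}")
        for code in issues
    ]
    return {
        # invalid_speaker calls for a new recording before routing.
        "needs_new_recording": "invalid_speaker" in issues,
        "actions": actions,
    }
```

How to treat the other three codes (use the result or re-record) is a product decision; this sketch only surfaces the recommended action for each.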