
Audio Requirements

Follow these guidelines to get the most accurate analysis results from each recording. This page covers supported formats, duration, encoding, recording environment best practices, and audio quality issue codes.

Supported formats

| Format | Notes |
| --- | --- |
| wav | Lossless quality; larger file sizes |
| flac | Lossless compression; smaller than WAV with no quality loss |
| mp3 | Accepted; often smaller than WAV; helps you stay under the 32 MB upload limit |
| m4a | Accepted; common from mobile devices; often smaller than WAV |

The API detects the format from the uploaded file. Submitting an unsupported format returns UNSUPPORTED_FORMAT with HTTP 400.

File size

| Limit | Value |
| --- | --- |
| Maximum per file | 32 MB |

For longer recordings, MP3, M4A, or FLAC usually keep file size under the limit while preserving good analysis quality.
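The format and size limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: `check_upload` is a hypothetical helper name, the extension check is only a heuristic (the API detects the format from the file itself), and the byte value used for "32 MB" is an assumption that takes the stricter decimal reading.

```python
from pathlib import Path

# Formats listed in the "Supported formats" table.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a"}

# Assumption: "32 MB" is read as 32,000,000 bytes, the stricter interpretation.
# Confirm the exact byte limit before relying on this value.
MAX_FILE_BYTES = 32 * 1000 * 1000


def check_upload(path: str) -> list[str]:
    """Return a list of problems found before upload; an empty list means none."""
    problems = []
    file = Path(path)

    # Heuristic only: the API inspects the file content, not the extension.
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file.suffix or '(none)'}")

    size = file.stat().st_size
    if size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes; limit is {MAX_FILE_BYTES}")

    return problems
```

A caller might refuse to upload when the returned list is non-empty, and suggest re-encoding to MP3, M4A, or FLAC when the size check fails.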

Duration

| Duration | Outcome |
| --- | --- |
| Less than 15 seconds | Returns 400 Bad Request; submit a recording of at least 15 seconds |
| 15–19 seconds | Processed; check audio_quality for guidance on use |
| 20–120 seconds | Recommended range for best results |
| 2–20 minutes | Accepted; longer recordings have longer processing times (up to several minutes) |
| Greater than 20 minutes (1200 seconds) | Returns 400 Bad Request; trim to 20 minutes or fewer before submitting |

Recordings on the shorter end of the 15–19 second range are more likely to produce insufficient_speech issues.
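The duration bands above map naturally onto a small client-side check. This is a sketch under stated assumptions: the function name and the returned labels are hypothetical (they are not API values), and fractional durations between 19 and 20 seconds are assumed to fall into the 15–19 second band.

```python
def classify_duration(seconds: float) -> str:
    """Map a recording length to the outcome bands in the Duration table.

    The returned labels are illustrative names, not values returned by the API.
    """
    if seconds < 15:
        return "rejected_too_short"      # API returns 400 Bad Request
    if seconds < 20:
        return "accepted_check_quality"  # processed; inspect audio_quality
    if seconds <= 120:
        return "recommended"
    if seconds <= 1200:
        return "accepted_long"           # expect longer processing times
    return "rejected_too_long"           # API returns 400 Bad Request
```

For example, `classify_duration(45)` returns `"recommended"`, while a 25-minute recording (`classify_duration(1500)`) returns `"rejected_too_long"` and should be trimmed before submitting.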

Sample rate

| Parameter | Value |
| --- | --- |
| Minimum | 8000 Hz |
| Recommended | 16000 Hz |
| Maximum | No hard upper limit; higher rates are downsampled internally |

Record and submit at 16000 Hz where possible for best results across all models.

Channels

Mono recordings are recommended. Stereo recordings are accepted, and the API automatically downmixes them to mono before analysis, but mixing to mono before submitting is still preferred for best results.
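For WAV input, sample rate and channel count can be read locally before submitting. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only; other formats would need a different tool. `inspect_wav` and the keys of the returned dictionary are hypothetical names chosen for illustration.

```python
import wave

# Thresholds taken from the Sample rate table.
MIN_SAMPLE_RATE_HZ = 8000
RECOMMENDED_SAMPLE_RATE_HZ = 16000


def inspect_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file and compare them to the guidelines."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_seconds": frames / rate,
        "below_minimum_rate": rate < MIN_SAMPLE_RATE_HZ,
        "below_recommended_rate": rate < RECOMMENDED_SAMPLE_RATE_HZ,
        "is_mono": channels == 1,
    }
```

A recording that reports `is_mono` as false is still accepted, but could be mixed down to mono ahead of upload if you prefer to control the downmix yourself.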

Speaker diarization

Speaker diarization identifies and separates individual speakers in a multi-speaker recording, then analyzes each speaker independently. Use diarization when the recording contains two or more speakers — for example, a clinician-patient interview or a group therapy session.

When to use diarization

  • Single speaker: diarize=false (default). Recommended for most scenarios.
  • Multiple speakers: diarize=true. The API identifies each speaker and runs analysis separately on their voice segments.

Single-speaker recordings analyzed with diarize=true still work — the diarizer detects one speaker and processes normally — but diarization adds a flat 50 tokens on top of the base rate (see Billing & cost).

How to enable

V2 API: Pass diarize=true as a form field on POST /v2/models/{model_name}/analyze or POST /v2/signs/{sign_name}/analyze.

curl -X POST https://api.amplifierhealth.com/v2/models/apex/analyze \
  -H "X-Account-ID: your-account-id" \
  -H "X-API-Key: your-api-key" \
  -F "diarize=true" \
  -F "audio=@recording.wav;type=audio/wav"

V1 API: Use the dedicated diarization endpoints — POST /api/v1/{condition}/analyze-and-diarize-audio or POST /api/v1/{condition}/analyze-and-diarize-audio-url. V1 diarization is async only. See Legacy API — Conditions for details.

Processing time

Diarization adds a speaker separation step before analysis, which increases total processing time. For longer recordings, processing can take several minutes. The submit request returns a job immediately; retrieve the results via GET /v2/jobs/{job_id} or a webhook.

Submitting audio

Audio is submitted as multipart/form-data with the file in the audio field. For a complete working example with all parameters, see Model API.

Recording environment guidelines

The acoustic environment affects every model. Poor recording conditions are the leading cause of quality issues. Follow the guidelines below to get the best results.

Environment:

  • Record in a quiet space with minimal background noise. Avoid open offices, public spaces, or rooms with significant echo.
  • Eliminate sources of intermittent noise (fans, air conditioning, notifications) where possible.

Microphone:

  • Use a close-proximity microphone. Headset microphones, lavaliers, and phone handsets held close to the mouth produce consistently better results than speakerphone or far-field microphones.
  • Avoid clipping. If the speaker's voice is loud, lower the input gain until audio levels peak around -6 dBFS.
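To check whether a test recording peaks near the -6 dBFS target, the peak level can be measured offline. This is a minimal sketch for 16-bit PCM WAV files only, using the standard library; `peak_dbfs` is a hypothetical helper, and the measurement is a simple sample peak, not a loudness or true-peak measurement.

```python
import array
import math
import sys
import wave


def peak_dbfs(path: str) -> float:
    """Return the sample peak of a 16-bit PCM WAV file in dBFS (0.0 is full scale)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak / 32768)
```

A result at or very close to 0.0 suggests clipping; lowering the input gain until the peak sits around -6 is consistent with the guidance above.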

Speaker:

  • Capture a single speaker per recording.
  • The speaker should be the subject being assessed, not a clinician or interviewer.

Tip

Before deploying in a new environment — a clinic room, a mobile app, an enterprise desktop — test recording levels with a sample recording and check audio_quality.issues in the response. Fix issues before going live.

Free-form speech

Most models work well with free-form conversational speech. Ask the speaker to talk naturally about any topic: how their day is going, how they have been feeling, or anything they are comfortable sharing.

Language support

Amplifier's models are currently trained on English-language speech. Support for additional languages is in active development.

Before deploying in a multilingual environment:

  • Confirm that each model you intend to use supports the languages in your deployment. Contact Amplifier for the current language support matrix.
  • Contact support@amplifierhealth.com before deploying in any multilingual context.

Warning

Submitting audio in an unsupported language can produce signal values without a quality warning. Always verify language compatibility for your target population before deploying in production.

Audio quality issue codes

When the API detects recording problems, it returns issue codes in audio_quality.issues. Results may still be returned alongside issues. Use the issue codes to decide whether to use the result or re-record.

| Code | Effect on results | Recommended action |
| --- | --- | --- |
| poor_voice_quality | Recording may be below optimal for analysis | Re-record in a quieter environment with a better microphone |
| insufficient_speech | Limited speech detected; more continuous speech may improve results | Re-record with more active continuous speech; eliminate long pauses |
| high_background_noise | Background noise may affect analysis quality | Re-record in a quieter space or with a closer microphone |
| invalid_speaker | No clear single-speaker human speech detected | Verify the recording contains the intended speaker's voice |

When issues contains invalid_speaker, request a new recording before routing on the result. See Interpreting Results for guidance on how to handle quality conditions in your routing logic.
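One way to apply this guidance in client code is sketched below. It assumes the parsed response is a dictionary in which `audio_quality.issues` is a list of issue code strings; that shape, the function name, and the returned labels are assumptions made for illustration, so check them against an actual response. How to treat the non-blocking codes is a policy choice for your application, not something this page prescribes.

```python
def quality_decision(response: dict) -> str:
    """Decide how to handle a result based on audio_quality.issues.

    Assumes issues is a list of code strings. The returned labels are
    illustrative, not API values.
    """
    audio_quality = response.get("audio_quality") or {}
    issues = set(audio_quality.get("issues") or [])

    # invalid_speaker: request a new recording before routing on the result.
    if "invalid_speaker" in issues:
        return "re_record"

    # Other codes: a result may still be usable; flag it for a closer look.
    if issues:
        return "review"

    return "use"
```

For example, `quality_decision({"audio_quality": {"issues": ["high_background_noise"]}})` returns `"review"`, leaving the final call on re-recording to your own routing logic.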