Audio Requirements
Follow these guidelines to get the most accurate analysis results from each recording. This page covers supported formats, duration, encoding, recording environment best practices, and audio quality issue codes.
Supported formats
| Format | Notes |
|---|---|
| wav | Lossless quality; larger file sizes |
| flac | Lossless compression; smaller than WAV with no quality loss |
| mp3 | Accepted; often smaller than WAV; helps you stay under the 32 MB upload limit |
| m4a | Accepted; common from mobile devices; often smaller than WAV |
The API detects the format from the uploaded file. Submitting an unsupported format returns UNSUPPORTED_FORMAT with HTTP 400.
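A client-side pre-check can catch obviously unsupported files before upload. The sketch below is a convenience only: it looks at the file extension, whereas the API inspects the uploaded file itself, so a matching extension does not guarantee acceptance. The function name and the extension set are illustrative assumptions derived from the table above.

```python
from pathlib import Path

# Extensions corresponding to the formats listed in the table above.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a"}


def has_supported_extension(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    This checks the name only. The API detects the format from the file
    contents, so it may still return UNSUPPORTED_FORMAT for a renamed file.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `has_supported_extension("visit.WAV")` is true and `has_supported_extension("notes.ogg")` is false.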
File size
| Limit | Value |
|---|---|
| Maximum per file | 32 MB |
For longer recordings, MP3, M4A, or FLAC usually keep file size under the limit while preserving good analysis quality.
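The size limit can also be checked before upload. This is a minimal sketch, assuming "32 MB" means 32,000,000 bytes; the page does not say whether the limit is decimal or binary, so the smaller interpretation is used here to stay on the safe side. The helper names are hypothetical.

```python
import os

# Assumption: 32 MB is read as 32,000,000 bytes (the more conservative reading).
MAX_UPLOAD_BYTES = 32 * 1000 * 1000


def fits_upload_limit(size_bytes: int) -> bool:
    """Return True if a non-empty file of this size is within the limit."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES


def file_fits_upload_limit(path: str) -> bool:
    """Check a file on disk against the upload limit."""
    return fits_upload_limit(os.path.getsize(path))
```

If a WAV recording fails this check, re-encoding to FLAC, MP3, or M4A is usually enough to bring it under the limit.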
Duration
| Duration | Outcome |
|---|---|
| Less than 15 seconds | Returns 400 Bad Request; submit a recording of at least 15 seconds |
| 15–19 seconds | Processed; check audio_quality for guidance on use |
| 20–120 seconds | Recommended range for best results |
| More than 2 minutes, up to 20 minutes | Accepted; longer recordings take longer to process (up to several minutes) |
| Greater than 20 minutes (1200 seconds) | Returns 400 Bad Request; trim to 20 minutes or fewer before submitting |
Recordings on the shorter end of the 15–19 second range are more likely to produce insufficient_speech issues.
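The duration table can be expressed as a small client-side classifier. This is a sketch under stated assumptions: the returned labels are hypothetical, not API values, and fractional durations between 19 and 20 seconds are treated as belonging to the short range.

```python
def classify_duration(seconds: float) -> str:
    """Map a recording duration to the outcome described in the table above."""
    if seconds < 15:
        return "rejected_too_short"   # API returns 400 Bad Request
    if seconds < 20:
        return "accepted_short"       # processed; check audio_quality
    if seconds <= 120:
        return "recommended"          # best results
    if seconds <= 1200:
        return "accepted_long"        # accepted; slower processing
    return "rejected_too_long"        # API returns 400 Bad Request
```

Running this check before upload lets a client prompt for a longer recording, or trim a long one, without spending a request on a guaranteed 400 response.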
Sample rate
| Parameter | Value |
|---|---|
| Minimum | 8000 Hz |
| Recommended | 16000 Hz |
| Maximum | No hard upper limit; higher rates are downsampled internally |
Record and submit at 16000 Hz where possible for best results across all models.
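For WAV files, the sample rate can be read from the header with Python's standard `wave` module. The sketch below covers uncompressed PCM WAV only; other formats need a different reader. The function name and return labels are illustrative assumptions.

```python
import wave

MIN_SAMPLE_RATE_HZ = 8000
RECOMMENDED_SAMPLE_RATE_HZ = 16000


def check_wav_sample_rate(source) -> str:
    """Classify the sample rate of a PCM WAV file.

    `source` may be a file path or a binary file object opened for reading.
    """
    with wave.open(source, "rb") as wav_file:
        rate = wav_file.getframerate()
    if rate < MIN_SAMPLE_RATE_HZ:
        return "below_minimum"
    if rate < RECOMMENDED_SAMPLE_RATE_HZ:
        return "below_recommended"
    # Rates above 16000 Hz are fine: the API downsamples them internally.
    return "ok"
```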
Channels
Mono recordings are recommended. Stereo recordings are accepted — the API automatically downmixes stereo input to mono before analysis. For best results, mixing to mono before submitting is still preferred.
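One way to mix to mono before submitting is to average the left and right channels. The sketch below assumes 16-bit PCM samples interleaved as left, right, left, right. Simple averaging is an assumption made for illustration; the page does not specify how the API performs its own downmix.

```python
from array import array


def downmix_stereo_to_mono(samples: array) -> array:
    """Average interleaved 16-bit stereo samples (L, R, L, R, ...) into mono."""
    left = samples[0::2]
    right = samples[1::2]
    # Averaging keeps the result within the 16-bit range without clipping.
    return array("h", ((l + r) // 2 for l, r in zip(left, right)))
```

The input samples can be loaded from a WAV file with `array("h", wav_file.readframes(n))` after confirming the file is 16-bit and two-channel.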
Speaker diarization
Speaker diarization identifies and separates individual speakers in a multi-speaker recording, then analyzes each speaker independently. Use diarization when the recording contains two or more speakers — for example, a clinician-patient interview or a group therapy session.
When to use diarization
- Single speaker: diarize=false (default). Recommended for most scenarios.
- Multiple speakers: diarize=true. The API identifies each speaker and runs analysis separately on their voice segments.
Single-speaker recordings analyzed with diarize=true still work — the diarizer detects one speaker and processes normally — but diarization adds a flat 50 tokens on top of the base rate (see Billing & cost).
How to enable
V2 API: Pass diarize=true as a form field on POST /v2/models/{model_name}/analyze or POST /v2/signs/{sign_name}/analyze.
curl -X POST https://api.amplifierhealth.com/v2/models/apex/analyze \
  -H "X-Account-ID: your-account-id" \
  -H "X-API-Key: your-api-key" \
  -F "diarize=true" \
  -F "audio=@recording.wav;type=audio/wav"
V1 API: Use the dedicated diarization endpoints: POST /api/v1/{condition}/analyze-and-diarize-audio or POST /api/v1/{condition}/analyze-and-diarize-audio-url. V1 diarization is async only. See Legacy API — Conditions for details.
Processing time
Diarization adds a speaker separation step before analysis, which increases total processing time. For longer recordings, processing can take several minutes. The submit request returns a job immediately; retrieve the results via GET /v2/jobs/{job_id} or a webhook.
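When a webhook is not available, a client can poll the job endpoint until the job finishes. The sketch below is transport-agnostic: `fetch_job` is a caller-supplied function that performs the GET /v2/jobs/{job_id} request and returns the parsed body, and `is_finished` is a caller-supplied predicate. Both are hypothetical, because this page does not describe the job response fields; consult the job reference for the actual completion indicator.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch_job: Callable[[str], dict[str, Any]],
    is_finished: Callable[[dict[str, Any]], bool],
    job_id: str,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll a job until `is_finished` accepts it or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        if is_finished(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} did not finish within {timeout_seconds} seconds"
            )
        sleep(interval_seconds)
```

The default interval and timeout are arbitrary starting points; diarized or long recordings may need a longer timeout.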
Submitting audio
Audio is submitted as multipart/form-data with the file in the audio field. For a complete working example with all parameters, see Model API.
Recording environment guidelines
The acoustic environment affects every model. Poor recording conditions are the leading cause of quality issues. Follow the guidelines below to get the best results.
Environment:
- Record in a quiet space with minimal background noise. Avoid open offices, public spaces, or rooms with significant echo.
- Eliminate sources of steady or intermittent noise (fans, air conditioning, notifications) where possible.
Microphone:
- Use a close-proximity microphone. Headset microphones, lavaliers, and phone handsets held close to the mouth produce consistently better results than speakerphone or far-field microphones.
- Avoid clipping. If the speaker's voice is loud, lower the input gain until audio levels peak around -6 dBFS.
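The -6 dBFS guideline can be checked on a test recording by measuring its peak level. The sketch below assumes 16-bit PCM samples, where full scale corresponds to a magnitude of 32768; the function name is illustrative.

```python
import math
from array import array

FULL_SCALE_16BIT = 32768  # magnitude of the most negative 16-bit sample


def peak_dbfs(samples: array) -> float:
    """Return the peak level of 16-bit PCM samples in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak / FULL_SCALE_16BIT)
```

A peak at half of full scale (sample value 16384) measures about -6.02 dBFS. Values at or very near 0 dBFS suggest the input gain is too high and the recording may be clipping.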
Speaker:
- Capture a single speaker per recording.
- The speaker should be the subject being assessed, not a clinician or interviewer.
Tip
Before deploying in a new environment — a clinic room, a mobile app, an enterprise desktop — test recording levels with a sample recording and check audio_quality.issues in the response. Fix issues before going live.
Free-form speech
Most models work well with free-form conversational speech. Ask the speaker to talk naturally about any topic: how their day is going, how they have been feeling, or anything they are comfortable sharing.
Language support
Amplifier's models are currently trained on English-language speech. Support for additional languages is in active development.
Before deploying in a multilingual environment:
- Confirm that each model you intend to use supports the languages in your deployment.
- Contact support@amplifierhealth.com for the current language support matrix before deploying in any multilingual context.
Warning
Submitting audio in an unsupported language can produce signal values without a quality warning. Always verify language compatibility for your target population before deploying in production.
Audio quality issue codes
When the API detects recording problems, it returns issue codes in audio_quality.issues. Results may still be returned alongside issues. Use the issue codes to decide whether to use the result or re-record.
| Code | Effect on results | Recommended action |
|---|---|---|
| poor_voice_quality | Recording may be below optimal for analysis | Re-record in a quieter environment with a better microphone |
| insufficient_speech | Limited speech detected; more continuous speech may improve results | Re-record with more active continuous speech; eliminate long pauses |
| high_background_noise | Background noise may affect analysis quality | Re-record in a quieter space or with a closer microphone |
| invalid_speaker | No clear single-speaker human speech detected | Verify the recording contains the intended speaker's voice |
When issues contains invalid_speaker, request a new recording before routing on the result. See Interpreting Results for guidance on how to handle quality conditions in your routing logic.
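The handling described above can be sketched as a small decision function. It assumes the parsed response is a dictionary in which `audio_quality.issues` is a list of string codes. The returned labels are hypothetical, and treating every issue other than invalid_speaker as "review" is an illustrative policy rather than documented API behavior.

```python
def recording_decision(response: dict) -> str:
    """Decide how to treat a result based on its audio quality issue codes."""
    issues = response.get("audio_quality", {}).get("issues", [])
    if "invalid_speaker" in issues:
        # The page advises requesting a new recording in this case.
        return "re_record"
    if issues:
        # Other codes: the result may still be usable; let the caller decide.
        return "review"
    return "use"
```

For example, a response whose issues list contains only high_background_noise yields "review", while an empty list yields "use".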
