Models
AudioShake’s Models define the specific type of audio processing applied to your content.
Each model represents a distinct audio processing operation—such as isolating vocals, removing background music, or transcribing dialogue—and can be combined within a single Tasks API request to produce multiple outputs from the same source file.
Models are organized by use case:
- Instrument Stem Separation — break down songs into individual components like vocals, drums, and bass.
- Dialogue, Music, and Effects — isolate voices or remove background elements for film, TV, and dubbing.
- Transcription and Alignment — convert spoken content into synchronized text and timestamps.
Use these models to design flexible workflows for music production, post-production, accessibility, and AI data preparation.
Instrument Stem Separation
These models isolate or extract musical components from a mixed track.
They’re useful for remixing, immersive audio, gaming, and music education.
All models in this section can be called via the /tasks route and support standard formats like WAV, MP3, or FLAC.
| Name | Model Key | Description | Credits / Minute | Max Length |
|---|---|---|---|---|
| Vocals | vocals | Extracts vocal elements from a mix. | 1.0 | 3 Hours |
| Lead Vocals | vocals_lead | Extracts the vocal performance carrying the primary melodic or lyrical content of the track. Excellent for karaoke. | 1.0 | 3 Hours |
| Backing Vocals | vocals_backing | Extracts only the backing vocals including harmonies, chants, ad-libs, and choirs. | 1.0 | 3 Hours |
| Instrumental | instrumental | Generates an instrumental-only version by removing vocals. | 1.0 | 3 Hours |
| Drums | drums | Isolates percussion and rhythmic elements. | 1.0 | 3 Hours |
| Bass | bass | Separates bass instruments and low-frequency sounds. | 1.0 | 3 Hours |
| Guitar | guitar | Isolates guitar stems (acoustic, electric, classical). | 1.0 | 3 Hours |
| Electric Guitar | guitar_electric | Isolates electric guitar stems. | 1.0 | 3 Hours |
| Acoustic Guitar | guitar_acoustic | Isolates acoustic guitar stems (including classical guitar). | 1.0 | 3 Hours |
| Piano | piano | Extracts only acoustic piano. | 1.0 | 3 Hours |
| Keys | keys | Extracts all keyboard instruments including piano, electric piano, organ, etc. | 1.0 | 3 Hours |
| Strings | strings | Isolates orchestral string instruments like violin, cello, and viola. | 1.0 | 3 Hours |
| Wind | wind | Extracts wind instruments such as flute and saxophone. | 1.0 | 3 Hours |
| Other | other | Captures remaining instrumentation after main stems are removed. | 1.0 | 3 Hours |
| Other-x-Guitar | other-x-guitar | Residual instrumentation after removing vocals, drums, bass, and guitar. | 1.0 | 3 Hours |
To include a residual stem in your results, set "residual": true in the target metadata when creating your task. For more information, contact support@audioshake.ai.
Example — Using Models in a Tasks API Request
curl -sS -X POST "https://api.audioshake.ai/tasks" \
  -H "x-api-key: $AUDIOSHAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "targets": [
      { "model": "vocals", "formats": ["wav"] },
      { "model": "instrumental", "formats": ["wav"] },
      { "model": "transcription", "formats": ["json"], "language": "en" }
    ]
  }'
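For readers scripting this outside a shell, the same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are copied from the curl example; the helper names `build_task_payload` and `build_request` are hypothetical, and the response format is not described on this page, so the sketch simply prints whatever the server returns.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.audioshake.ai/tasks"


def build_task_payload(source_url, targets):
    """Return the JSON body for a task: one source URL plus a list of targets."""
    if not targets:
        raise ValueError("at least one target is required")
    return {"url": source_url, "targets": list(targets)}


def build_request(payload, api_key):
    """Wrap the payload in a POST request with the same headers as the curl call."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Three outputs from one source file, mirroring the curl example.
payload = build_task_payload(
    "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    [
        {"model": "vocals", "formats": ["wav"]},
        {"model": "instrumental", "formats": ["wav"]},
        {"model": "transcription", "formats": ["json"], "language": "en"},
    ],
)

if __name__ == "__main__":
    api_key = os.environ.get("AUDIOSHAKE_API_KEY")
    if api_key:
        # Sends the request only when a key is configured.
        with urllib.request.urlopen(build_request(payload, api_key)) as resp:
            print(resp.read().decode("utf-8"))
    else:
        # Dry run: show the body that would be sent.
        print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from sending makes it easy to inspect or log the body before any credits are spent.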
Dialogue, Music, & Effects
| Name | Model Key | Description | Credits / Minute | Max Length |
|---|---|---|---|---|
| Dialogue | dialogue | Isolates speech or vocals from any other sound. | 1.5 | 3 Hours |
| Speech Clarity | speech_clarity | Isolates and improves intelligibility of all speech in highly noisy or low-resolution environments. | 1.5 | 3 Hours |
| Effects | effects | Removes dialogue and music but retains the ambience, sound effects, and environmental noise. | 1.5 | 3 Hours |
| Music Removal | music_removal | Removes music from audio while retaining dialogue, background effects, and natural sound. | N/A | 1 Hour |
| Background (Music & FX) | music_fx | Removes dialogue to extract a clean background stem of music and effects. | 1.5 | 3 Hours |
| Music Detection | music_detection | Detects the portions of an audio file that contain music. | 0.5 | 3 Hours |
| Multi-Voice | multi_voice | Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker. | N/A | 1 Hour |
Music Removal and Multi-Voice are not currently available via the /tasks route. Please contact support@audioshake.ai for access.
Transcription & Alignment
| Name | Model Key | Description | Credits / Minute | Max Length |
|---|---|---|---|---|
| Transcription | transcription | Text representation of spoken words or audio content. | 1.0 | 1 Hour |
| Alignment | alignment | Synchronization of audio and corresponding text or captions. | 1.0 | 1 Hour |
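The Credits / Minute and Max Length columns can be turned into a rough cost estimate before submitting a task. The sketch below copies the rates and limits from the tables on this page and assumes, without confirmation from this page, that each target is billed separately, that cost scales linearly with duration, and that no rounding is applied. The function name `estimate_credits` is hypothetical. Music Removal and Multi-Voice are left out because their rate is listed as N/A.

```python
# Stem separation models: 1.0 credit per minute, 3-hour (180-minute) limit.
STEM_KEYS = [
    "vocals", "vocals_lead", "vocals_backing", "instrumental", "drums",
    "bass", "guitar", "guitar_electric", "guitar_acoustic", "piano",
    "keys", "strings", "wind", "other", "other-x-guitar",
]

# Model key -> (credits per minute, max length in minutes), from the tables.
RATES = {key: (1.0, 180) for key in STEM_KEYS}
RATES.update({
    "dialogue": (1.5, 180),
    "speech_clarity": (1.5, 180),
    "effects": (1.5, 180),
    "music_fx": (1.5, 180),
    "music_detection": (0.5, 180),
    "transcription": (1.0, 60),
    "alignment": (1.0, 60),
})


def estimate_credits(duration_minutes, model_keys):
    """Sum rate * duration over the requested models.

    Assumption: linear, per-target billing with no rounding. Actual billing
    may differ, so treat the result as an estimate only.
    """
    total = 0.0
    for key in model_keys:
        if key not in RATES:
            raise KeyError(f"no published per-minute rate for model {key!r}")
        rate, max_minutes = RATES[key]
        if duration_minutes > max_minutes:
            raise ValueError(
                f"{key} accepts at most {max_minutes} minutes, got {duration_minutes}"
            )
        total += rate * duration_minutes
    return total


# A 4-minute song with three targets at 1.0 credit per minute each.
print(estimate_credits(4, ["vocals", "instrumental", "transcription"]))  # 12.0
```

Checking the length limit in the same pass catches cases such as a 90-minute file sent to transcription, which is capped at 1 hour while the separation models accept up to 3 hours.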
Alignment-Only Targets
You can run Alignment as a standalone target. If you don't provide a transcript, alignment will automatically generate one for you. If you already have an accurate transcript, you can provide it to skip transcription and only generate synchronized timestamps.
Provide either a transcriptUrl (public URL to your transcript file) or a transcriptAssetId (if you've already uploaded the transcript as an asset).
curl -sS -X POST "https://api.audioshake.ai/tasks" \
  -H "x-api-key: $AUDIOSHAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "assetId": "",
    "targets": [
      {
        "model": "alignment",
        "formats": ["json", "txt", "srt"],
        "transcriptUrl": "",
        "transcriptAssetId": ""
      }
    ]
  }'
Use url or assetId for your source audio/video file, and transcriptUrl or transcriptAssetId within the target for your transcript. You only need to provide one of each pair.
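The one-of-each-pair rule can be enforced on the client before the request is sent. The following is a minimal sketch: the field names come from the example above, while the function `build_alignment_task` and the `example.com` transcript URL are hypothetical placeholders. Rejecting a request that sets both members of a pair is a conservative choice made here, not documented API behavior.

```python
def build_alignment_task(url=None, asset_id=None, transcript_url=None,
                         transcript_asset_id=None, formats=("json",)):
    """Build a task body with a single alignment target.

    Exactly one of url / asset_id identifies the source media. The transcript
    is optional; if given, only one of transcript_url / transcript_asset_id
    may be set.
    """
    if bool(url) == bool(asset_id):
        raise ValueError("provide exactly one of url or assetId")
    if transcript_url and transcript_asset_id:
        raise ValueError("provide at most one of transcriptUrl or transcriptAssetId")

    target = {"model": "alignment", "formats": list(formats)}
    if transcript_url:
        target["transcriptUrl"] = transcript_url
    if transcript_asset_id:
        target["transcriptAssetId"] = transcript_asset_id

    # Only the chosen source field is included, so no empty strings are sent.
    body = {"url": url} if url else {"assetId": asset_id}
    body["targets"] = [target]
    return body


# No transcript supplied: alignment generates one automatically.
auto = build_alignment_task(
    url="https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    formats=("json", "txt", "srt"),
)

# Existing transcript supplied by URL (placeholder address).
with_transcript = build_alignment_task(
    url="https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    transcript_url="https://example.com/transcript.txt",
    formats=("json", "srt"),
)
```

Omitting unused fields rather than sending them as empty strings keeps the body unambiguous about which source and transcript the task should use.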