Models

AudioShake’s Models define the specific type of audio processing applied to your content.
Each model represents a distinct audio processing operation—such as isolating vocals, removing background music, or transcribing dialogue—and can be combined within a single Tasks API request to produce multiple outputs from the same source file.

Models are organized by use case:

Instrument Stem Separation — break down songs into individual components like vocals, drums, and bass.
Dialogue, Music, and Effects — isolate voices or remove background elements for film, TV, and dubbing.
Transcription and Alignment — convert spoken content into synchronized text and timestamps.

Use these models to design flexible workflows for music production, post-production, accessibility, and AI data preparation.

Instrument Stem Separation

These models isolate or extract musical components from a mixed track.
They’re useful for remixing, immersive audio, gaming, and music education.
All of the models in this section can be called via the /tasks route and support standard formats like WAV, MP3, or FLAC.

Name	Model Key	Description	Credits / Minute	Max Length
Vocals	`vocals`	Extracts vocal elements from a mix.	1.0	3 Hours
Lead Vocals	`vocals_lead`	Extracts the vocal performance carrying the primary melodic or lyrical content of the track. Well suited to karaoke.	1.0	3 Hours
Backing Vocals	`vocals_backing`	Extracts only the backing vocals including harmonies, chants, ad-libs, and choirs.	1.0	3 Hours
Instrumental	`instrumental`	Generates an instrumental-only version by removing vocals.	1.0	3 Hours
Drums	`drums`	Isolates percussion and rhythmic elements.	1.0	3 Hours
Bass	`bass`	Separates bass instruments and low-frequency sounds.	1.0	3 Hours
Guitar	`guitar`	Isolates guitar stems (acoustic, electric, classical).	1.0	3 Hours
Electric Guitar	`guitar_electric`	Isolates electric guitar stems.	1.0	3 Hours
Acoustic Guitar	`guitar_acoustic`	Isolates acoustic guitar stems (including classical guitar).	1.0	3 Hours
Piano	`piano`	Extracts only acoustic piano.	1.0	3 Hours
Keys	`keys`	Extracts all keyboard instruments including piano, electric piano, organ, etc.	1.0	3 Hours
Strings	`strings`	Isolates orchestral string instruments like violin, cello, and viola.	1.0	3 Hours
Wind	`wind`	Extracts wind instruments such as flute and saxophone.	1.0	3 Hours
Other	`other`	Captures remaining instrumentation after main stems are removed.	1.0	3 Hours
Other-x-Guitar	`other-x-guitar`	Residual instrumentation after removing vocals, drums, bass, and guitar.	1.0	3 Hours

Residual Stems

To include a residual stem in your results, set "residual": true in the target metadata when creating your task. For more information, contact support@audioshake.ai.
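As a rough illustration, the Python sketch below builds a request body that asks for the drums stem together with its residual. It assumes that "target metadata" means a key placed directly on the target object; the text above does not show the exact placement, so confirm it with support before relying on it. `build_residual_payload` is a hypothetical helper, and nothing is sent over the network.

```python
import json


def build_residual_payload(source_url, model, fmt="wav"):
    """Return a /tasks request body that requests `model` plus its residual.

    Assumption: "residual": true sits directly on the target object.
    """
    target = {"model": model, "formats": [fmt], "residual": True}
    return {"url": source_url, "targets": [target]}


payload = build_residual_payload(
    "https://demos.audioshake.ai/demo-assets/shakeitup.mp3", "drums"
)
# json.dumps renders Python's True as the JSON literal true.
print(json.dumps(payload, indent=2))
```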

Example — Using Models in a Tasks API Request

curl -sS -X POST "https://api.audioshake.ai/tasks" \
  -H "x-api-key: $AUDIOSHAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "targets": [
      { "model": "vocals", "formats": ["wav"] },
      { "model": "instrumental", "formats": ["wav"] },
      { "model": "transcription", "formats": ["json"], "language": "en" }
    ]
  }'
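The same request can be prepared from Python using only the standard library. This is a minimal sketch rather than an official client: `build_task_request` is a hypothetical helper name, and the endpoint, header names, and body fields are copied from the curl example above. The request is only constructed here; the lines that would send it are left commented out.

```python
import json
import os
import urllib.request

API_URL = "https://api.audioshake.ai/tasks"


def build_task_request(api_key, source_url, targets):
    """Build (but do not send) a POST request for the /tasks route."""
    body = json.dumps({"url": source_url, "targets": targets}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


targets = [
    {"model": "vocals", "formats": ["wav"]},
    {"model": "instrumental", "formats": ["wav"]},
    {"model": "transcription", "formats": ["json"], "language": "en"},
]
request = build_task_request(
    os.environ.get("AUDIOSHAKE_API_KEY", "YOUR_API_KEY"),
    "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    targets,
)

# To submit the task, uncomment the lines below.
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```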

Dialogue, Music, & Effects

Name	Model Key	Description	Credits / Minute	Max Length
Dialogue	dialogue	Isolates speech or vocals from any other sound	1.5	3 Hours
Speech Clarity	speech_clarity	Isolates and improves intelligibility of all speech in highly noisy or low-resolution environments	1.5	3 Hours
Effects	effects	Removes dialogue and music but retains the ambience, sound effects, and environmental noise	1.5	3 Hours
Music removal	music_removal	Removes music from audio while retaining dialogue, background effects, and natural sound	N/A	1 Hour
Background (Music & FX)	music_fx	Removes dialogue to extract a clean background stem of music and effects	1.5	3 Hours
Music detection	music_detection	Detects the portions of an audio file that contain music	0.5	3 Hours
Multi-Voice	multi_voice	Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker.	N/A	1 Hour

Music Removal & Multi-Voice Availability

Currently, Music Removal and Multi-Voice separation are not available via the /tasks route. Please contact support@audioshake.ai for access.
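Because these two models cannot be requested through /tasks, a client may want to filter them out before building a request. The sketch below shows one way to do that; `split_by_availability` is a hypothetical helper, and the set of blocked keys reflects only what this page states today.

```python
# Model keys this page lists as not yet available via the /tasks route.
UNAVAILABLE_VIA_TASKS = {"music_removal", "multi_voice"}


def split_by_availability(model_keys):
    """Return (usable, blocked) lists, preserving the input order."""
    usable = [key for key in model_keys if key not in UNAVAILABLE_VIA_TASKS]
    blocked = [key for key in model_keys if key in UNAVAILABLE_VIA_TASKS]
    return usable, blocked


usable, blocked = split_by_availability(
    ["dialogue", "music_removal", "effects", "multi_voice"]
)
print(usable)   # ['dialogue', 'effects']
print(blocked)  # ['music_removal', 'multi_voice']
```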

Transcription & Alignment

Name	Model Key	Description	Credits / Minute	Max Length
Transcription	transcription	Text representation of spoken words or audio content	1	1 Hour
Alignment	alignment	Synchronization of audio and corresponding text or captions	1	1 Hour
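The Credits / Minute and Max Length columns in the tables on this page can be combined into a quick cost estimate. The sketch below assumes billing is linear in duration, with no rounding, minimum charge, or extra cost for residual stems; none of that is confirmed here, so treat the result as a rough guide. `estimate_credits` is a hypothetical helper, and models listed with N/A credits are deliberately left out.

```python
# Rates and limits transcribed from the tables on this page.
CREDITS_PER_MINUTE = {
    # Instrument stem separation: 1.0 credit per minute each.
    **{key: 1.0 for key in (
        "vocals", "vocals_lead", "vocals_backing", "instrumental", "drums",
        "bass", "guitar", "guitar_electric", "guitar_acoustic", "piano",
        "keys", "strings", "wind", "other", "other-x-guitar",
    )},
    "dialogue": 1.5, "speech_clarity": 1.5, "effects": 1.5, "music_fx": 1.5,
    "music_detection": 0.5,
    "transcription": 1.0, "alignment": 1.0,
}
MAX_MINUTES = {"transcription": 60, "alignment": 60}  # 1 Hour
DEFAULT_MAX_MINUTES = 180  # 3 Hours for the other listed models


def estimate_credits(model_keys, duration_minutes):
    """Sum per-minute rates over the requested models (linear billing assumed)."""
    total = 0.0
    for key in model_keys:
        if key not in CREDITS_PER_MINUTE:
            raise ValueError(f"no published rate for model {key!r}")
        limit = MAX_MINUTES.get(key, DEFAULT_MAX_MINUTES)
        if duration_minutes > limit:
            raise ValueError(f"{key!r} accepts at most {limit} minutes")
        total += CREDITS_PER_MINUTE[key] * duration_minutes
    return total


print(estimate_credits(["vocals", "instrumental", "transcription"], 4))  # 12.0
print(estimate_credits(["dialogue", "music_detection"], 10))             # 20.0
```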

Alignment-Only Targets

You can run Alignment as a standalone target. If you don't provide a transcript, alignment will automatically generate one for you. If you already have an accurate transcript, you can provide it to skip transcription and only generate synchronized timestamps.

Provide either a transcriptUrl (public URL to your transcript file) or a transcriptAssetId (if you've already uploaded the transcript as an asset).

curl -sS -X POST "https://api.audioshake.ai/tasks" \
  -H "x-api-key: $AUDIOSHAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "assetId": "",
    "targets": [
      {
        "model": "alignment",
        "formats": ["json", "txt", "srt"],
        "transcriptUrl": "",
        "transcriptAssetId": ""
      }
    ]
  }'

Tip

Use url or assetId for your source audio/video file, and transcriptUrl or transcriptAssetId within the target for your transcript. You only need to provide one of each pair.
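The pairing described above can be enforced in client code before a request is sent. In the sketch below, `build_alignment_task` is a hypothetical helper that emits only the fields that were actually supplied, so no empty placeholder strings end up in the body. Treating "both members of a pair supplied" as an error is an assumption made here for safety, not documented API behavior, and the transcript URL in the usage example is a placeholder.

```python
def build_alignment_task(source_url=None, asset_id=None,
                         transcript_url=None, transcript_asset_id=None,
                         formats=("json",)):
    """Return a /tasks body for an alignment-only target."""
    # Source file: exactly one of url / assetId.
    if (source_url is None) == (asset_id is None):
        raise ValueError("provide exactly one of source_url or asset_id")
    # Transcript: optional, but at most one reference (assumed constraint).
    if transcript_url is not None and transcript_asset_id is not None:
        raise ValueError("provide at most one transcript reference")

    target = {"model": "alignment", "formats": list(formats)}
    if transcript_url is not None:
        target["transcriptUrl"] = transcript_url
    elif transcript_asset_id is not None:
        target["transcriptAssetId"] = transcript_asset_id

    body = {"targets": [target]}
    if source_url is not None:
        body["url"] = source_url
    else:
        body["assetId"] = asset_id
    return body


task = build_alignment_task(
    source_url="https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    transcript_url="https://example.com/transcript.txt",  # placeholder URL
    formats=("json", "srt"),
)
print(task)
```

Omitting both transcript arguments yields a target with no transcript field, which matches the case where alignment generates the transcript itself.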
