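Resemble AI supports four ways to upload a custom dataset to the platform: a single audio file, a single audio file with a transcript, multiple audio files with transcripts, and multiple audio files alone.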

In every scenario, we recommend uploading at least 20 minutes of audio.

Single Audio File

Upload a single audio file in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz.
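
The format requirements above can be checked locally before uploading. The following is a minimal sketch using Python's standard wave module, not an official Resemble AI tool. The function name check_wav_format is hypothetical, and the sketch assumes that "22 kHz" and "44 kHz" refer to the standard 22,050 Hz and 44,100 Hz rates.

```python
import wave

# Assumption: "22 kHz" and "44 kHz" mean the standard 22,050 Hz and 44,100 Hz
# rates. Adjust this set if the platform expects different values.
ALLOWED_RATES = {8000, 16000, 22050, 44100, 48000}
ALLOWED_SAMPLE_WIDTHS = {2, 3}  # bytes per sample: 16-bit or 24-bit


def check_wav_format(path):
    """Return a list of problems found in the WAV file (empty if none)."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # The wave module raises an error for files it cannot parse
        # as an uncompressed RIFF/WAVE file.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if rate not in ALLOWED_RATES:
        problems.append(f"unsupported sampling rate: {rate} Hz")
    if width not in ALLOWED_SAMPLE_WIDTHS:
        problems.append(f"unsupported bit depth: {width * 8}-bit")
    return problems
```

Calling check_wav_format("recording.wav") returns an empty list when the file passes all three checks, so the result can be used directly in an if statement.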

Single Audio File + Transcript

Upload a zip or tarball that contains a single audio file and a transcript file. The audio file must be in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz. The transcript must be a .txt file.

Multiple Audio Files + Transcripts

Upload a zip or tarball that contains multiple audio files and a transcript file in CSV format. Each audio file must be in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz.

Each audio file should be between 1.5 and 12 seconds in duration.
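
To find clips that fall outside the 1.5 to 12 second range before uploading, a short script along these lines may help. It is a sketch that relies on Python's standard wave module. The function names clip_duration_seconds and clips_outside_range are hypothetical.

```python
import wave

MIN_CLIP_SECONDS = 1.5
MAX_CLIP_SECONDS = 12.0


def clip_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clips_outside_range(paths):
    """Return {path: duration} for clips shorter than 1.5 s or longer than 12 s."""
    flagged = {}
    for path in paths:
        duration = clip_duration_seconds(path)
        if not MIN_CLIP_SECONDS <= duration <= MAX_CLIP_SECONDS:
            flagged[path] = duration
    return flagged
```

Summing clip_duration_seconds over all files also gives the total length of the dataset, which can be compared against the recommended 20 minutes.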

Folder Structure

The folder you upload must contain a wavs folder and a metadata.csv file. For example:

data/
  metadata.csv
  wavs/
    wav1.wav
    wav2.wav
    wav3.wav
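
One way to package this layout is with Python's standard zipfile module. The sketch below is illustrative only. The function name build_dataset_zip is hypothetical, and the sketch assumes the top-level folder (data/ in the example) should be kept inside the archive, which the text above does not state explicitly.

```python
import zipfile
from pathlib import Path


def build_dataset_zip(data_dir, zip_path):
    """Zip data_dir as <name>/metadata.csv plus <name>/wavs/*.wav.

    Returns the list of entry names written to the archive.
    """
    data_dir = Path(data_dir)
    metadata = data_dir / "metadata.csv"
    wav_dir = data_dir / "wavs"

    if not metadata.is_file():
        raise FileNotFoundError(f"missing {metadata}")
    wav_files = sorted(wav_dir.glob("*.wav"))
    if not wav_files:
        raise FileNotFoundError(f"no .wav files found in {wav_dir}")

    # Assumption: the top-level folder name is preserved in the archive.
    top = data_dir.name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(metadata, f"{top}/metadata.csv")
        for wav in wav_files:
            archive.write(wav, f"{top}/wavs/{wav.name}")
        return archive.namelist()
```

For example, build_dataset_zip("data", "dataset.zip") writes dataset.zip next to the data folder and fails early if metadata.csv or the wavs folder is missing.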

Transcript Details

The metadata.csv file must use | as the field delimiter, and each row should contain the base filename followed by the transcription. Only the base filename is used, so remove the extension. For example:

wav1|this is what is in my file
wav2|please remove the extensions
wav3|each file should be between 1.5 to 12 seconds long
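
A transcript in this format can be cross-checked against the wavs folder before uploading. The following sketch uses Python's standard csv module with | as the delimiter. The function name check_metadata is hypothetical, and the sketch assumes that metadata.csv is UTF-8 encoded and has no header row, as in the example above.

```python
import csv
from pathlib import Path


def check_metadata(data_dir):
    """Compare metadata.csv rows with wavs/*.wav and return a list of problems."""
    data_dir = Path(data_dir)
    problems = []
    listed = set()

    # Assumption: the file is UTF-8 encoded and has no header row.
    with open(data_dir / "metadata.csv", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                problems.append(f"line {line_no}: expected 2 fields, got {len(row)}")
                continue
            name, text = row
            if name.lower().endswith(".wav"):
                problems.append(f"line {line_no}: remove the extension from {name!r}")
                name = name[:-4]  # compare by base filename below
            if not text.strip():
                problems.append(f"line {line_no}: empty transcription")
            listed.add(name)

    on_disk = {path.stem for path in (data_dir / "wavs").glob("*.wav")}
    for name in sorted(on_disk - listed):
        problems.append(f"wavs/{name}.wav has no row in metadata.csv")
    for name in sorted(listed - on_disk):
        problems.append(f"metadata.csv lists {name!r} but wavs/ has no matching file")
    return problems
```

An empty list from check_metadata("data") means every audio file has exactly the kind of row shown in the example and every row points at an existing file.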

Multiple Audio Files

Upload a zip or tarball that contains multiple audio files. Each audio file must be in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz.