Resemble AI supports four ways to upload custom datasets to the platform, described below.
In each scenario, we recommend uploading at least 20 minutes of audio data.
Upload a single audio file in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz.
Upload a zip or tarball that contains a single audio file and a transcript file. The audio file must be in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz. The transcript must be a .txt file.
Upload a zip or tarball that contains multiple audio files and a transcript file in CSV format. Each audio file must be in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz. The transcript must be a CSV file.
Each audio file should be between 1.5 and 12 seconds in duration.
Folder Structure
The folder structure you upload must contain a wavs folder and a metadata.csv file. For example:
data/
    metadata.csv
    wavs/
        wav1.wav
        wav2.wav
        wav3.wav
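As a rough illustration, the layout above could be packaged into a zip with Python's standard library. The function name build_dataset_zip is hypothetical, and keeping the top-level folder name inside the archive is an assumption based on the example layout, not a documented requirement.

```python
import zipfile
from pathlib import Path


def build_dataset_zip(data_dir: str, zip_path: str) -> list:
    """Zip metadata.csv and wavs/*.wav from data_dir; return the archive member names."""
    root = Path(data_dir)
    metadata = root / "metadata.csv"
    wavs_dir = root / "wavs"

    # Fail early if the expected layout is not present.
    if not metadata.is_file():
        raise FileNotFoundError(f"missing {metadata}")
    if not wavs_dir.is_dir():
        raise FileNotFoundError(f"missing {wavs_dir}")

    wav_files = sorted(wavs_dir.glob("*.wav"))
    if not wav_files:
        raise ValueError(f"no .wav files found in {wavs_dir}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumption: the archive mirrors the example, with the folder name on top.
        archive.write(metadata, f"{root.name}/metadata.csv")
        for wav in wav_files:
            archive.write(wav, f"{root.name}/wavs/{wav.name}")
        return archive.namelist()
```

For example, build_dataset_zip("data", "dataset.zip") would write dataset.zip next to the data folder and return the list of paths stored in it.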
Transcript Details
Each line of metadata.csv must be delimited by a pipe character (|) and should contain the base filename followed by the transcription. See the following example (note that only the base filename is used, without the extension):
wav1|this is what is in my file
wav2|please remove the extensions
wav3|each file should be between 1.5 to 12 seconds long
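The transcript rules above can be checked locally before uploading. The following is a minimal sketch, not the platform's own validation: check_metadata is a hypothetical helper, and splitting on the first pipe only (so that a transcription could itself contain a pipe) is an assumption.

```python
from pathlib import Path


def check_metadata(data_dir: str) -> list:
    """Return a list of problems found in data_dir/metadata.csv (empty if none)."""
    root = Path(data_dir)
    wav_stems = {p.stem for p in (root / "wavs").glob("*.wav")}
    problems = []
    seen = set()

    lines = (root / "metadata.csv").read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: split on the first pipe only.
        name, sep, text = line.partition("|")
        name = name.strip()
        if not sep:
            problems.append(f"line {number}: no '|' delimiter")
            continue
        if name.lower().endswith(".wav"):
            problems.append(f"line {number}: '{name}' should not include the extension")
        elif name not in wav_stems:
            problems.append(f"line {number}: no wavs/{name}.wav")
        if not text.strip():
            problems.append(f"line {number}: empty transcription")
        seen.add(name)

    # Every audio file should have a matching transcript line.
    for stem in sorted(wav_stems - seen):
        problems.append(f"wavs/{stem}.wav has no transcript line")
    return problems
```

Calling check_metadata("data") on a folder that matches the example returns an empty list; each mismatch between metadata.csv and the wavs folder adds one message.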
Upload a zip or tarball that contains multiple audio files. Each audio file must be in RIFF (.wav) PCM format, 16-bit or 24-bit, at a sampling rate of 8 kHz, 16 kHz, 22 kHz, 44 kHz, or 48 kHz.
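The audio requirements shared by all four options can be checked with Python's built-in wave module, which reads PCM WAV files. This is a sketch under stated assumptions: check_wav is a hypothetical helper, and "22 kHz" and "44 kHz" are interpreted as the conventional 22050 Hz and 44100 Hz rates, which the text does not confirm. A file the module cannot parse is reported as unreadable, which does not by itself prove the platform would reject it.

```python
import wave

# Assumption: 22 kHz and 44 kHz are read as 22050 Hz and 44100 Hz.
ALLOWED_RATES = {8000, 16000, 22050, 44100, 48000}
ALLOWED_SAMPLE_WIDTHS = {2, 3}  # bytes per sample: 16-bit and 24-bit


def check_wav(path, min_seconds=None, max_seconds=None):
    """Return a list of problems with a WAV file (empty if none were found)."""
    try:
        with wave.open(path, "rb") as wav:
            width = wav.getsampwidth()
            rate = wav.getframerate()
            duration = wav.getnframes() / rate if rate else 0.0
    except (wave.Error, EOFError) as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if width not in ALLOWED_SAMPLE_WIDTHS:
        problems.append(f"sample width is {width * 8}-bit, expected 16-bit or 24-bit")
    if rate not in ALLOWED_RATES:
        problems.append(f"sampling rate is {rate} Hz, not one of the listed rates")
    # Duration limits are optional; they apply to the multi-file datasets.
    if min_seconds is not None and duration < min_seconds:
        problems.append(f"duration {duration:.2f}s is shorter than {min_seconds}s")
    if max_seconds is not None and duration > max_seconds:
        problems.append(f"duration {duration:.2f}s is longer than {max_seconds}s")
    return problems
```

For a multi-file dataset, check_wav("data/wavs/wav1.wav", min_seconds=1.5, max_seconds=12) also applies the per-file duration range; for a single-file upload the duration arguments can be left out.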