Build your Dataset

This section lists tips and recommendations for ensuring the quality and consistency of the audio in your dataset. Following them will help you compile high-quality recordings for training your voice model and get more accurate results from training.

This list assumes that you have already recorded your files and are about to start pre-processing the audio. If you don't have recordings yet, see the list of recommendations in this link. If your files contain instrumentals, we are currently developing an API to separate instrumentals and vocals using UVR; you can also use the UVR program directly (you can download it in this link).

At least 15 minutes of dry (no effects) and monophonic (one note at a time) vocal recordings are required.
It's best to have examples that cover your entire range: chest, blend, and falsetto; large and small intervals; high and clean notes; and so on. The more variety, the better.
Recordings should be:
Cleaned up with subtractive EQ to reduce muddy or harsh frequencies in the recording
Subtly pitch-corrected (slow attack, moderate strength), unless pitch correction is itself a key part of the vocal style
De-essed to reduce harsh sibilance
Lightly compressed to even out the dynamic range and reduce peaks (about 4-5 dB of gain reduction at most)
Boosted with additive EQ to fit the style of the vocal
Limited to a peak of -6 dB, with overall levels between -6 and -12 dB
High-pass and low-pass filtered to remove frequencies below 40-100 Hz and above 20 kHz
Phase re-balanced
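
Two of the guidelines above are easy to verify automatically: the 15-minute minimum and the -6 dB peak ceiling. The following is a minimal sketch, not an official tool. It assumes your recordings are 16-bit PCM WAV files in a single folder and treats the -6 dB figure as dB relative to full scale. The names `inspect_wav` and `check_dataset` are hypothetical helpers written for this illustration, and the sketch does not check dryness, pitch correction, EQ, or any of the other points.

```python
import math
import sys
import wave
from array import array
from pathlib import Path

MIN_TOTAL_MINUTES = 15.0  # minimum amount of audio, from the guideline above
MAX_PEAK_DBFS = -6.0      # peak ceiling, assuming "dB" means dB relative to full scale


def inspect_wav(path):
    """Return (duration_seconds, peak_dbfs) for one 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: this sketch only handles 16-bit PCM")
        frames = wav.getnframes()
        rate = wav.getframerate()
        samples = array("h")
        samples.frombytes(wav.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    # 32768 is full scale for signed 16-bit audio; silence maps to -inf.
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return frames / rate, peak_dbfs


def check_dataset(folder):
    """Sum the duration of every .wav in folder and list files above the peak ceiling."""
    total_seconds = 0.0
    too_loud = []
    for path in sorted(Path(folder).glob("*.wav")):
        seconds, peak_dbfs = inspect_wav(path)
        total_seconds += seconds
        if peak_dbfs > MAX_PEAK_DBFS:
            too_loud.append(path.name)
    total_minutes = total_seconds / 60
    return {
        "total_minutes": total_minutes,
        "enough_audio": total_minutes >= MIN_TOTAL_MINUTES,
        "too_loud": too_loud,
    }
```

Calling `check_dataset("my_recordings")` (the folder name is a placeholder) returns a dictionary with the total minutes found, whether that meets the minimum, and the names of any files whose peak exceeds the ceiling.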

