Setting Up for Local Training (WIP)
This guide is a work in progress, so I'm not 100% sure I wrote everything down; if there's a mistake or an update, let us know on Uberduck's Discord.
So you chose local training because of your GPU? Just kidding, I wasn't going to judge your GPU... unless... Anyway, now that you have installed Python/conda and some dependencies, it's time for the real job.
EDIT: We'll be using Uberduck's training pipeline, though, because, well, it's easier.
First things first, as it's written on this GitHub page:
We must create (another) environment for Python 3.8.
But Awesome, didn't we create this already for Tacotron 2?
You might be right; you can delete that environment if you want to, or keep it.
The reason is that this won't work on Python 3.7, and I remind you: if you use THAT environment you'll get yourself some errors, so good luck fixing those.
Now we need to create an environment, so open the conda console and type this command:
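For reference, that create step usually looks like this; the environment name "uberduck" here is just my pick, not something the pipeline requires:

```shell
# Create a fresh conda environment pinned to Python 3.8
# (the name "uberduck" is my choice, use whatever you like)
conda create -n uberduck python=3.8 -y
```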
Alright, cool that you did that. Next up: typing the source command!
But wait a second... I'm on Windows, not Linux, right?
So let's put it this way: if you're on Windows, you can probably skip this part; otherwise, if you're on Linux, you'd better type a command like this:
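A sketch of that Linux activation step, assuming you named the environment "uberduck" (swap in your own name):

```shell
# Activate the environment in the current shell on Linux
source activate uberduck
# On newer conda versions this form works on both Linux and Windows:
conda activate uberduck
```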
As it's written here:
The source command reads and executes commands from the file specified as its argument in the current shell environment. It is useful to load functions, variables, and configuration files into shell scripts.
Alright, and the last thing to do is run a pip command inside the environment; this will download everything it needs:
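The exact command is on the GitHub page linked above; as a sketch, it typically looks something like this (the repository URL here is an assumption based on the uberduck_ml_dev package name):

```shell
# Install the training pipeline and its dependencies into the
# active environment; repo URL assumed, check the GitHub page above
pip install git+https://github.com/uberduck-ai/uberduck-ml-dev.git
```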
Cool, now that we've got git and some dependencies, it's time to move on. BUT before we do that, we need to take a peek at that training thing. So let's take a peek at Google's Colab training page, shall we?
What's interesting is that we'll need to adapt a few things. You may notice that those commands work on Linux, right? Yeah, right. HOWEVER, we'll also do them on Windows, since you're using Windows too, right...? Right? Everyone's using Windows.
So let's start step-by-step on how it goes.
First up is the nvidia-smi
command. You might already know how it works, but if you don't: it shows you info about what GPU you have and which CUDA version it supports. This is very important.
For example, you may have an RTX 30-series card with 24 GB of VRAM; that's cool and all, you might be able to train smoothly. But again, that depends on the VRAM, so pay attention to it.
Oh yeah, by the way: before you CAN do that training, we need to install the CUDA Toolkit. I know how to get it, but I'll tell you later. Moving on...
We've already installed Uberduck's training pipeline, so let's just skip to the next step.
Next we have to create the projects folder. Navigate to wherever you want it (Desktop, external hard drive, whatever), then do it like this: right click -> New folder, type whatever name you need, and inside that folder create another one named project.
Simple, right?
On Linux, once you're inside that folder, just type mkdir project.
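Both folders can also be made in one go; here I'm using "projects" as the outer folder name, matching the text above:

```shell
# Create the outer "projects" folder and the "project" folder
# inside it in a single command (-p creates parents as needed)
mkdir -p projects/project
```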
The next thing to do: take everything you have in your dataset, your transcript text and your wavs folder, and go on, put it into the projects folder.
Next is mounting the drive; that's specific to Google Colab, so let's just move on.
Now we specifically need to download the tacotron2_statedict.pt file. On Linux you can go and use wget, but again, if you're on Windows, don't worry: just download this file and save it into your project folder:
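For the Linux route, the wget equivalent looks like this; I'm leaving the URL as a placeholder since the actual link is the one above:

```shell
# Save the pretrained checkpoint into the project folder;
# replace the placeholder with the link given above
wget -O project/tacotron2_statedict.pt "<link-from-above>"
```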
The next step is the ffmpeg stuff. Basically, it converts your audio files into what Tacotron 2 needs: if you have MP3s, FLACs, or WAVs that don't meet the requirements, it will convert them. That script probably requires either a working Linux setup or it might run through Python; I'm not 100% sure, I haven't tested it. But again, if you haven't converted your audio files on Windows yet, here's your chance:
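If you'd rather convert by hand, something like this works anywhere ffmpeg is installed; 22050 Hz mono 16-bit PCM WAV is what NVIDIA's Tacotron 2 recipe expects, and the file names here are just examples:

```shell
# Convert a single file to 22050 Hz, mono, 16-bit PCM WAV
ffmpeg -i input.mp3 -ar 22050 -ac 1 -c:a pcm_s16le output.wav

# Or batch-convert every mp3 in the current folder
for f in *.mp3; do
  ffmpeg -i "$f" -ar 22050 -ac 1 -c:a pcm_s16le "${f%.mp3}.wav"
done
```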
Okay, now this is what you need. It's very important that you run this script, or else you might run into errors. What you want to do is copy the code I provide here, paste it into your IDE or Notepad++, and save that Python file in the project folder:
Now, you might want to edit your transcript for this, because if you're going to do multi-speaker training, you'd better account for it.
Let's say one speaker is |0, the next speaker is |1, and so on.
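To make that concrete, a multi-speaker transcript in the usual pipe-separated Tacotron 2 filelist style looks like this (paths and lines are made up):

```
wavs/clip_0001.wav|Hello there, this is the first speaker.|0
wavs/clip_0002.wav|And this line belongs to the second speaker.|1
wavs/clip_0003.wav|Back to the first speaker again.|0
```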
Now you have to download the TorchMoji models; basically, they help predict emotions. But again, can't use wget on Windows? No worries, just use these links and save the files into your project folder:
Alright, cool. Our last step is to write a config file. Basically, let's examine this config file and what it does:
Wow, what a long config file. *COUGH* Anyway, I've explained everything in the comments, so go ahead: copy it, edit it, and save it in your project folder.
Alright, FINALLY, now we're getting to the fun part: training.
BUT, WAIT A SECOND!!!!
Remember when I said we need the CUDA Toolkit? Yeah... we need to download it, because without it, training will run on your CPU instead, and we don't want that.
So first things first, you have to go to the NVIDIA Developer website:
And the next thing you have to do is click that big JOIN button at the top:
They won't just give you the download without a developer account; you must have one, or you won't get the CUDA Toolkit just like that.
Provide an email, then you'll have to write out what you want to do with the NVIDIA Developer program, and maybe a reason why you need the account.
Now once that's done, click DOWNLOADS
Then what we are looking for is this:
If you scroll down you'll find this menu; once you do, click on it.
After that, click Download Now, then choose which version to install:
If you have an older graphics card, however, you'll have to go through the archived versions and select which CUDA Toolkit version to install.
If your card supports CUDA 11+ (RTX 30/20 series), then go for it.
If your card supports CUDA 10 (e.g. GTX 1080 Ti or GTX 1660 Ti), then go for 10.
Alright, now that we've got that, install it (with examples and documentation if you want to) and boom: you have the CUDA Toolkit installed on your computer, and it will be added to your PATH automatically. Alright, moving on to training locally.
Execute with this command: python -m uberduck_ml_dev.exec.train_tacotron2 --config "tacotron2_config.json"
To actually see what's going on, you'll need TensorBoard. If you don't have TensorBoard yet, type this:
After that, do not close the training terminal; open another conda/cmd window, activate the virtual environment you just created, and type this command:
The thing is, it saves the logs inside the checkpoints folder by default, but if you used a different directory, change the log directory to the one you chose.
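Putting those two steps together, assuming the logs ended up in the default checkpoints folder (adjust the path if yours differ):

```shell
# Install TensorBoard into the same environment...
pip install tensorboard
# ...then point it at the log directory and open the printed URL
tensorboard --logdir checkpoints
```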
And now you should see this window:
The only things we should pay attention to are training.loss and validation.loss.
validation.loss should get down to at least 0.15 (lower is better).
The alignment graph should also do its usual stonks, aka keep going up.
When it's good to go, press CTRL+C in the terminal to stop.
To give it a listen, go ahead and share your model via Google Drive, then go to this Colab:
Now once that's done you'll see this page:
You'll figure out yourself, SEE YA-
But Awesome, I cannot do it myself...
... *sigh* okay here's a thing:
Click play on the setup cell.
Then select which inference model to use; I suggest nvidia_taco2.
Paste the Google Drive ID of the model you uploaded. REMEMBER: THE ID, NOT THE LINK ITSELF. Read more on the "Testing Voice Model" page.
You can also check the torchmoji option; that way it will have some emotions.
Now go ahead, type something in the input box and AYYYYY YA DID IT!
So see, it's not that complicated.
But... What about this?
Alright... there's the "Testing Voice Model" page that you can go to:
Then you'll see the Google Drive tutorial, okay? Cool, now you're getting it.
Well, you're done with that. Jeez, what a long stage that was.
Thanks for reading this long guide and hopefully you'll be able to train it locally on your machine.
YOU STILL HAVEN'T FIGURED OUT YET?!?