Setting Up for Local Training (WIP)
This guide is a work in progress, so I'm not 100% sure I wrote everything down; if there's a mistake or an update, let us know on Uberduck's Discord.
So you chose local training because of your GPU? Just kidding, I wasn't going to judge your GPU... unless... Anyway, now that you have installed Python/conda and some dependencies, it's time for the real job.
EDIT: We'll be using Uberduck's training pipeline, though, because, well, it's easier.
First things first, as it's written on this GitHub page:
We must create (another) environment for Python 3.8.
But Awesome, didn't we create this already for Tacotron 2?
You might be right; you can delete that environment if you want to, or keep it.
The reason is that this won't work on Python 3.7, and I remind you: if you use THAT environment you'll get yourself some errors, so good luck fixing those.
Now we need to create an environment, so open the conda console and type this command:
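For reference, that create step usually looks like this; the environment name "uberduck" here is just my pick, not something the pipeline requires:

```shell
# Create a fresh conda environment pinned to Python 3.8
# (the name "uberduck" is my choice, use whatever you like)
conda create -n uberduck python=3.8 -y
```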
Alright, cool that you did that. Next up: typing the source command!
But wait a second... I'm on Windows, not Linux, right?
So let's put it this way: if you're on Windows, you can probably skip this part; otherwise, if you're on Linux, you'd better type a command like this:
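A sketch of that Linux activation step, assuming you named the environment "uberduck" (swap in your own name):

```shell
# Activate the environment in the current shell on Linux
source activate uberduck
# On newer conda versions this form works on both Linux and Windows:
conda activate uberduck
```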
As it's written here:
The source command reads and executes commands from the file specified as its argument in the current shell environment. It is useful to load functions, variables, and configuration files into shell scripts.
Alright, and the last thing to do is run a pip command inside the environment; this will download everything it needs:
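The exact command is on the GitHub page linked above; as a sketch, it typically looks something like this (the repository URL here is an assumption based on the uberduck_ml_dev package name):

```shell
# Install the training pipeline and its dependencies into the
# active environment; repo URL assumed, check the GitHub page above
pip install git+https://github.com/uberduck-ai/uberduck-ml-dev.git
```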
Cool, now that we've got git and some dependencies, it's time to move on. BUT before we do that, we need to take a peek at that training thing. So let's take a peek at Google's Colab training page, shall we?
What's interesting is that we'll need to adapt a few things. You may notice that those commands work on Linux, right? Yeah, right. HOWEVER, we'll also do them on Windows, since you're using Windows too, right...? Right? Everyone's using Windows.
So let's start step-by-step on how it goes.
First up is the nvidia-smi
command. You might already know how it works, but if you don't: it shows you info about what GPU you have and which CUDA version it supports. This is very important.
For example, you may have an RTX 30-series card with 24 GB of VRAM; that's cool and all, you might be able to train smoothly. But again, that depends on the VRAM, so pay attention to it.
Oh yeah, by the way: before you CAN do that training, we need to install the CUDA Toolkit. I know how to get it, but I'll tell you later. Moving on...
We've already installed Uberduck's training pipeline, so let's just skip to the next step.
Next we have to create the projects folder. Navigate to wherever you want it (Desktop, external hard drive, whatever), then do it like this: right click -> New folder, type whatever name you need, and inside that folder create another one named project.
Simple, right?
On Linux, once you're inside that folder, just type mkdir project.
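Both folders can also be made in one go; here I'm using "projects" as the outer folder name, matching the text above:

```shell
# Create the outer "projects" folder and the "project" folder
# inside it in a single command (-p creates parents as needed)
mkdir -p projects/project
```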
The next thing to do: take everything you have in your dataset, your transcript text and your wavs folder, and go on, put it into the projects folder.
Next is mounting the drive; that's specific to Google Colab, so let's just move on.
Now we specifically need to download the tacotron2_statedict.pt file. On Linux you can go and use wget, but again, if you're on Windows, don't worry: just download this file and save it into your project folder:
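For the Linux route, the wget equivalent looks like this; I'm leaving the URL as a placeholder since the actual link is the one above:

```shell
# Save the pretrained checkpoint into the project folder;
# replace the placeholder with the link given above
wget -O project/tacotron2_statedict.pt "<link-from-above>"
```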
The next step is the ffmpeg stuff. Basically, it converts your audio files into what Tacotron 2 needs: if you have MP3s, FLACs, or WAVs that don't meet the requirements, it will convert them. That script probably requires either a working Linux setup or it might run through Python; I'm not 100% sure, I haven't tested it. But again, if you haven't converted your audio files on Windows yet, here's your chance:
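If you'd rather convert by hand, something like this works anywhere ffmpeg is installed; 22050 Hz mono 16-bit PCM WAV is what NVIDIA's Tacotron 2 recipe expects, and the file names here are just examples:

```shell
# Convert a single file to 22050 Hz, mono, 16-bit PCM WAV
ffmpeg -i input.mp3 -ar 22050 -ac 1 -c:a pcm_s16le output.wav

# Or batch-convert every mp3 in the current folder
for f in *.mp3; do
  ffmpeg -i "$f" -ar 22050 -ac 1 -c:a pcm_s16le "${f%.mp3}.wav"
done
```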
Okay, now this is what you need. It's very important that you run this script, or else you might run into errors. What you want to do is copy the code I provide here, paste it into your IDE or Notepad++, and save that Python file in the project folder:
Now, you might want to edit your transcript for this, because if you're going to do multi-speaker training, you'd better account for it.
Let's say one speaker is |0, the next speaker is |1, and so on.
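To make that concrete, a multi-speaker transcript in the usual pipe-separated Tacotron 2 filelist style looks like this (paths and lines are made up):

```
wavs/clip_0001.wav|Hello there, this is the first speaker.|0
wavs/clip_0002.wav|And this line belongs to the second speaker.|1
wavs/clip_0003.wav|Back to the first speaker again.|0
```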
Now you have to download the TorchMoji models; basically, they help predict emotions. But again, can't use wget on Windows? No worries, just use these links and save the files into your project folder:
Alright, cool. Our last step is to write a config file. Basically, let's examine this config file and what it does:
Wow, what a long config file. *COUGH* Anyway, I've explained everything in the comments, so go ahead: copy it, edit it, and save it in your project folder.
Alright, FINALLY, now we're getting to the fun part: training.
BUT, WAIT A SECOND!!!!
Remember when I said we need the CUDA Toolkit? Yeah... we need to download it, because without it, training will run on your CPU instead, and we don't want that.
So first things first, you have to go to the NVIDIA Developer website:
And the next thing you have to do is click that big JOIN button at the top:
They won't just give you the download without a developer account; you must have one, or you won't get the CUDA Toolkit just like that.
Provide an email, then you'll have to write out what you want to do with the NVIDIA Developer program, and maybe a reason why you need the account.
Now once that's done, click DOWNLOADS
Then what we are looking for is this:
If you scroll down you'll find this menu; once you do, click on it.
After that, click Download Now, then choose which version to install:
If you have an older graphics card, however, you'll have to go through the archived versions and select which CUDA Toolkit version to install.
If your card supports CUDA 11+ (RTX 30/20 series), then go for it.
If your card supports CUDA 10 (e.g. GTX 1080 Ti or GTX 1660 Ti), then go for 10.
Alright, now that we've got that, install it (with examples and documentation if you want to) and boom: you have the CUDA Toolkit installed on your computer, and it will be added to your PATH automatically. Alright, moving on to training locally.
Execute with this command: python -m uberduck_ml_dev.exec.train_tacotron2 --config "tacotron2_config.json"
To actually see what's going on, you'll need TensorBoard. If you don't have TensorBoard yet, type this:
After that, do not close the training terminal; open another conda/cmd window, activate the virtual environment you just created, and type this command:
The thing is, it saves the logs inside the checkpoints folder by default, but if you used a different directory, change the log directory to the one you chose.
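Putting those two steps together, assuming the logs ended up in the default checkpoints folder (adjust the path if yours differ):

```shell
# Install TensorBoard into the same environment...
pip install tensorboard
# ...then point it at the log directory and open the printed URL
tensorboard --logdir checkpoints
```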
And now you should see this window:
The only things we should pay attention to are training.loss and validation.loss.
validation.loss should get down to at least 0.15 (lower is better).
The alignment graph should also do its usual stonks, aka keep going up.
When it's good to go, press CTRL+C in the terminal to stop.
To give it a listen, go ahead and share your model via Google Drive, then go to this Colab:
Now once that's done you'll see this page:
You'll figure out yourself, SEE YA-
But Awesome, I cannot do it myself...
... *sigh* okay here's a thing:
Click play on the setup cell.
Then select which inference model to use; I suggest nvidia_taco2.
Paste the Google Drive ID of the model you uploaded. REMEMBER: THE ID, NOT THE LINK ITSELF. Read more on the "Testing Voice Model" page.
You can also check the torchmoji option; that way it will have some emotions.
Now go ahead, type something in the input box and AYYYYY YA DID IT!
So see, it's not that complicated.
But... What about this?
Alright... there's the "Testing Voice Model" page that you can go to:
Then you'll see the Google Drive tutorial, okay? Cool, now you're getting it.
Well, you're done with that. Jeez, what a long stage that was.
Thanks for reading this long guide and hopefully you'll be able to train it locally on your machine.
YOU STILL HAVEN'T FIGURED OUT YET?!?