Simplified Training Page
Last updated
Alright, so you've probably come across this page, and things are hard with the advanced page. But no worries, one of the Uberduck mods has made a simplified page that's easier to understand and more user-friendly.
Go ahead and open the Colab page.
Once you've clicked on it, you'll be met with this page:
Okay, here's what you should do: copy it to Drive, so that you won't have to start over again and you won't touch the original author's Colab page.
The next thing you should do is click on these 3 hidden cells:
First things first, we'll need to check the GPU. If you have one of these:
P100
V100
T4
then you should be good to go. Otherwise, if you have a K80 or P4, it will slow your training down. Here's how we can fix this.
Go to Runtime and select Factory reset runtime:
You should get this pop-up window:
If this window didn't pop up, the factory reset won't go through, because it needs that confirmation. In that case, clear some cookies in your browser, disable your adblocker, or open the page in a different browser.
Once you've clicked Yes and run the check GPU cell again, and it shows one of the GPUs listed above, you should be good to go.
Colab update: Apparently there's only a small chance (0.05%, I think) that they'll give you a good GPU, otherwise you may get stuck on the K80, so try not to factory reset it too much.
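If you'd rather check the GPU from code than read the cell output by eye, here is a minimal sketch of the rule above. The function names `classify_gpu` and `current_gpu_name` are made up for this example and are not part of the Colab notebook. It assumes `nvidia-smi` is available on the runtime.

```python
import shutil
import subprocess

# GPU tiers as listed in this guide.
GOOD_GPUS = ("P100", "V100", "T4")
SLOW_GPUS = ("K80", "P4")


def classify_gpu(name: str) -> str:
    """Return 'good', 'slow', or 'unknown' for a GPU name string."""
    # Split into words so that "P4" does not accidentally match "P40".
    words = name.upper().replace("-", " ").split()
    if any(gpu in words for gpu in GOOD_GPUS):
        return "good"
    if any(gpu in words for gpu in SLOW_GPUS):
        return "slow"
    return "unknown"


def current_gpu_name() -> str:
    """Ask nvidia-smi for the GPU name; return '' if no GPU is attached."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if result.returncode == 0 and lines else ""


if __name__ == "__main__":
    name = current_gpu_name()
    print(name or "no GPU found", "->", classify_gpu(name))
```

If it reports "slow", that's your cue to try the factory reset described above.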
Then we'll click this cell:
This will prevent you from being disconnected automatically.
Finally:
If you have a lot of data, and I mean A LOT of speech sound files, then I suggest compressing them into a zip file. I made a tutorial that covers this, not only for sending a dataset to the Discord server but also for your training.
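If you'd rather zip the files with a few lines of Python, something like this sketch could work. `zip_wavs` is a hypothetical helper (it is not part of the notebook or the tutorial), and it assumes all your .wav files sit directly in one folder.

```python
import zipfile
from pathlib import Path


def zip_wavs(wav_dir: str, zip_path: str) -> int:
    """Zip every .wav in wav_dir into zip_path; return how many were added."""
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for wav in wav_files:
            # Store only the file name so the archive unpacks flat.
            archive.write(wav, arcname=wav.name)
    return len(wav_files)
```

For example, `zip_wavs("my_dataset", "my_dataset.zip")` would pack the .wav files from a folder called `my_dataset` (a placeholder name) and skip everything else in it.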
Next, you'll run into this section:
Mounting your Google Drive is pretty obvious: you just need your Google account and that's all.
Once it's mounted, you'll want to run the 2nd cell. This will install Tacotron and create the wavs folder.
If you come across this problem:
That's probably fine. You don't have to worry about it. If someone says those files are important, I'll edit this guide.
Alright, now before you run everything else, go ahead and click this folder icon on the left side:
You'll see this open up:
Here's what you want to do. Open tacotron2, and then you should see the wavs folder:
What you want to do is import your .wav files into the wavs folder. You can either drag all the .wav files into the wavs folder, or just right-click and select Upload.
Also, do the same for the filelists folder with the text document you transcribed, or else you'll end up with an error.
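In case it helps to see what that text document usually looks like: the sketch below assumes the common Tacotron 2 filelist layout of one `path|transcript` pair per line, which this guide doesn't spell out, so double-check it against the notebook. `build_filelist` is a hypothetical helper, and the file names are placeholders.

```python
def build_filelist(transcripts: dict, wav_dir: str = "wavs") -> str:
    """Turn {wav file name: transcript} into 'wavs/name.wav|transcript' lines."""
    lines = []
    for wav_name, text in sorted(transcripts.items()):
        text = " ".join(text.split())  # collapse stray spaces and newlines
        if not text:
            raise ValueError(f"{wav_name} has an empty transcript")
        if "|" in text:
            raise ValueError(f"{wav_name}: transcript must not contain '|'")
        lines.append(f"{wav_dir}/{wav_name}|{text}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_filelist({"1.wav": "Hello there.", "2.wav": "How are you?"}), end="")
```

You'd save the returned text as a .txt file and upload it to filelists.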
Now here comes the main part:
Here's what you should do:
model_filename is the name of your model, which you'll upload to Uberduck.
Training_file is where you should put the path to your text file, which is "filelists/*textname*.txt".
hparams.batch_size is how many audio files Tacotron will try to train on at once, depending on GPU power. Please don't set it too high or you'll end up with an out-of-memory error. For free users I recommend 20.
output_directory: keep it as it is, or if you want to change the file location, go for it.
One more thing I'd like to point out is that you can edit epochs now. An epoch is one full pass over your training files, so the right number probably depends on your .wav files. If the result doesn't sound good to you, you can increase epochs a bit.
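To double-check those settings before pressing play, here is a small sketch of the rules above as code. `check_settings` is a hypothetical helper, not something the notebook provides. The limit of 20 is just this guide's recommendation for free users, not a hard rule.

```python
def check_settings(model_filename: str, training_file: str,
                   batch_size: int, epochs: int) -> list:
    """Return a list of problems found in the training settings (empty if none)."""
    problems = []
    if not model_filename.strip():
        problems.append("model_filename is empty")
    if not (training_file.startswith("filelists/") and training_file.endswith(".txt")):
        problems.append("training_file should look like filelists/<name>.txt")
    if batch_size < 1:
        problems.append("batch_size must be at least 1")
    elif batch_size > 20:
        problems.append("batch_size above 20 may run out of memory on a free GPU")
    if epochs < 1:
        problems.append("epochs must be at least 1")
    return problems


if __name__ == "__main__":
    # "my_voice" and 100 epochs are placeholder values for illustration.
    print(check_settings("my_voice", "filelists/my_voice.txt", 20, 100))
```

An empty list means nothing looks off. Otherwise, each entry tells you which field to fix.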
Now that that's out of the way, click play on this cell and move on to the next section:
The first cell will convert all your wavs into mel-spectrograms, which are a representation of how the frequencies in the audio change over time. Basically, it turns your .wav files into .npy files (NumPy's array format), which is what Tacotron needs for training.
Now click the 2nd cell. This will check what's missing, whether that's in your text file or your audio files.
Check the troubleshooting page for whatever is missing or has errors:
If everything's done correctly and it says it has finished checking, you may now begin training by clicking the third cell.
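To get a feel for what a check like this looks for, here is a rough sketch. It is not the notebook's actual code: `check_dataset` is a hypothetical function, and it assumes the filelist uses one `path|transcript` pair per line with the audio in a wavs folder.

```python
from pathlib import Path


def check_dataset(filelist_path: str, base_dir: str = ".") -> dict:
    """Compare a 'path|transcript' filelist against the .wav files on disk."""
    base = Path(base_dir)
    bad_lines, missing_wavs, listed = [], [], set()

    text = Path(filelist_path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        wav_path, sep, transcript = line.partition("|")
        if not sep or not wav_path.strip() or not transcript.strip():
            bad_lines.append(number)  # no "|" or an empty side
            continue
        listed.add(Path(wav_path.strip()).name)
        if not (base / wav_path.strip()).is_file():
            missing_wavs.append(wav_path.strip())

    # Audio files that exist but never appear in the filelist.
    untranscribed = sorted(
        wav.name for wav in (base / "wavs").glob("*.wav") if wav.name not in listed
    )
    return {"bad_lines": bad_lines, "missing_wavs": missing_wavs,
            "untranscribed": untranscribed}
```

If all three lists come back empty, every line of the text file has a matching .wav and every .wav has a transcript.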
Now you wait, since training like this takes time. For now you'll see this picture:
Yeah, let Tacotron train, and once it has trained for a while you should have something like this:
That's what your graph should look like. In particular, your validation loss should probably be less than 0.1 (0.07, 0.05, etc.).
Don't let it train for too long or the model will be over-trained, and that's gonna suck.
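As a rough way to think about when to stop, here is a sketch of that rule of thumb. `should_stop` and its `patience` parameter are assumptions made up for this example: the guide only gives the 0.1 target, and treating "no improvement for a few checks" as a sign of over-training is a common heuristic, not something the notebook does for you.

```python
def should_stop(val_losses: list, target: float = 0.1, patience: int = 3) -> bool:
    """Suggest stopping once validation loss is low enough or has stalled."""
    if not val_losses:
        return False
    # Good enough: the latest validation loss is under the target.
    if val_losses[-1] < target:
        return True
    # Stalled: the best loss was seen `patience` or more checks ago.
    best_index = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_index >= patience


if __name__ == "__main__":
    print(should_stop([0.5, 0.3, 0.2]))    # still improving
    print(should_stop([0.5, 0.3, 0.09]))   # below the 0.1 target
```

You'd still read the values off the graph yourself. This only shows the decision, and you may well want to keep going a bit below 0.1 if the loss is still dropping.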
So once that's done, click the training cell again to stop it, BUT PRESS IT ONLY ONCE, OR ELSE YOU'LL SCREW THIS UP!
It should end up like this:
The keyboard interrupt is fine: it means you have stopped training and your model is ready to be tested with synthesis.
Congratulations, you have now finished training the model. You are now ready to listen to what your model sounds like.