Categories: Ubuntu / Hardware / Python
Training a voice synthesis model by running the MYCOEIROINK build code on Ubuntu Server 22.04 on the TSUKUMO multi-GPU PC WA9J-X211/XT
2024-04-18

Installing Ubuntu Server 20.04.2 LTS on the TSUKUMO multi-GPU PC WA9J-X211/XT [1]
Installing Ubuntu Server 20.04.2 LTS on the TSUKUMO multi-GPU PC WA9J-X211/XT [2]
Installing Ubuntu Server 22.04 and VOICEVOX Engine on the TSUKUMO multi-GPU PC WA9J-X211/XT
Installing Ubuntu Server 22.04 and COEIROINK Engine on the TSUKUMO multi-GPU PC WA9J-X211/XT

Last time I successfully installed COEIROINK on the TSUKUMO GPU PC running Ubuntu, so now I finally want to build an original voice model on this machine's dual GPUs. Officially such a model is called a MYCOEIROINK, and build code that runs on Google Colab has been published:

https://colab.research.google.com/drive/1BqaB-Zv5RuaQp-OW0effsFVGCYwvaJ4R?usp=sharing

In short, if you reproduce what this notebook does as Python commands on Ubuntu, you should in theory be able to train using the local machine's GPUs. It worked, so here are my notes.

Creating the working directory

To keep the Colab code usable with as few changes as possible, create the same folder layout.

sudo -s
cd /
mkdir content
chown hogeuser:hogegroup content

You could also create it on a partition with plenty of free space and point a symbolic link at it, but here I created the content folder directly under the root / and changed its owner to a regular user. All work from here on is done as that regular user.

cd /content
mkdir -p drive/MyDrive/MYCOEIROINK_WORK
cd drive/MyDrive/MYCOEIROINK_WORK
mkdir downloads voices

Performing the "download the required data" step

wget 'https://www.dropbox.com/scl/fi/1rklec40ro74lbr0udgcr/dummy_speaker_info_2.zip?rlkey=dr984tgbbw8ur26rgkagc3kc1&dl=1' -O '/content/drive/MyDrive/MYCOEIROINK_WORK/downloads/dummy_speaker_info_2.zip'
wget 'https://www.dropbox.com/s/uph2t4e19t4bvr9/mycoe_pretrain_model.zip?dl=1' -O '/content/drive/MyDrive/MYCOEIROINK_WORK/downloads/mycoe_pretrain_model.zip'
wget 'https://www.dropbox.com/s/fkv8hvt1y82hsh6/mycoe_pretrain_model_2.zip?dl=1' -O '/content/drive/MyDrive/MYCOEIROINK_WORK/downloads/mycoe_pretrain_model_2.zip'
wget 'https://www.dropbox.com/s/vq9cfizgufdxvry/mycoe_pretrain_model_3.zip?dl=1' -O '/content/drive/MyDrive/MYCOEIROINK_WORK/downloads/mycoe_pretrain_model_3.zip'

Downloading sample recorded speech data

Since I just want to try training for now, I download Amitaro's corpus recordings, which she has already kindly recorded, into voices.

wget 'https://amitaro.net/download/corpus/ITAcorpus_amitaro_forMYCOEIROINK%202.2.zip' -O '/content/drive/MyDrive/MYCOEIROINK_WORK/voices/ITA_CORPUS.zip'
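
Before running the check script in a later section, it can help to peek inside the downloaded zip. This is a small sketch of my own (not part of the Colab notebook) that lists the wav entries:

import zipfile

# List the wav files inside the downloaded corpus zip to confirm the
# ITA-corpus naming before the check script validates it for real.
zip_path = '/content/drive/MyDrive/MYCOEIROINK_WORK/voices/ITA_CORPUS.zip'
with zipfile.ZipFile(zip_path) as zf:
    wavs = [n for n in zf.namelist() if n.endswith('.wav')]
    print(f"{len(wavs)} wav files found, for example:")
    for name in wavs[:5]:
        print(' ', name)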

Creating and activating a Python 3.9 virtual environment

MYCOEIROINK appears to assume Python 3.9, so set that up. Get venv working under a Python 3.9 environment first, for example by following an earlier article on this site.

cd /content/drive/MyDrive/MYCOEIROINK_WORK
python3.9 -m venv --without-pip venv
source ./venv/bin/activate
(venv) curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
(venv) python3 get-pip.py
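
As a quick sanity check (my own addition, not in the notebook), confirm inside the activated venv that Python 3.9 and the venv's interpreter are in effect:

import sys

print(sys.version)     # should report 3.9.x
print(sys.executable)  # should point into .../MYCOEIROINK_WORK/venv/bin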

Creating and running the "check zip contents" and "audio preprocessing" code

Clone corpus_manager manually beforehand

cd /content
git clone https://github.com/shirowanisan/coeiroink-corpus-manager.git

Install the modules used in the source

(venv) pip install soundfile librosa numpy

Save the following modified code as /content/drive/MyDrive/MYCOEIROINK_WORK/mycoeiro_check_voice_zip.py

#@title Press the play button on the left to run "check zip contents"
import glob
import shutil
import os
import soundfile as sf

colab_workspace = '/content/tmp'

if os.path.exists(colab_workspace):
  shutil.rmtree(colab_workspace)
os.makedirs(colab_workspace, exist_ok=True)

zip_folder = '/content/drive/MyDrive/MYCOEIROINK_WORK/voices'
if len(glob.glob(zip_folder + '/*.zip')) == 0:
  raise Exception(f"zipが見つかりません。")
zip_path = glob.glob(zip_folder + '/*.zip')[0]
workspace_zip_path = colab_workspace + '/wavs.zip'
wavs_folder = colab_workspace + '/wavs'
shutil.copyfile(zip_path, workspace_zip_path)
shutil.unpack_archive(workspace_zip_path, wavs_folder)

wav_paths = sorted(glob.glob(wavs_folder + '/**/*.wav', recursive=True))
wav_names = [wav_path.split('/')[-1].replace('.wav', '') for wav_path in wav_paths]

#!git clone https://github.com/shirowanisan/coeiroink-corpus-manager.git
with open('/content/coeiroink-corpus-manager/marged-corpus.txt', encoding='utf-8') as f:
  text = f.readlines()
ita_corpus_keys = [s.split(':')[0] for s in text]
ita_corpus_values = [s.split(':')[-1] for s in text]
ita_corpus_dict = dict(zip(ita_corpus_keys, ita_corpus_values))

for wav_name in wav_names:
  if wav_name not in ita_corpus_keys:
    raise Exception(f"「{wav_name}」というwavファイルが含まれており、このファイル名はMYCOEIROINK対象のコーパスに含まれていません。")

if len(wav_names) < 10:
  raise Exception(f"wavファイルの数が「{len(wav_names)}」ですが、wavファイルは10以上必要です。")

incorrect_fs_flag = False
incorrect_fs_list = []
for wav_path in wav_paths:
  wav, original_fs = sf.read(wav_path)
  if len(wav.shape) == 2:
    raise Exception(f"「{wav_path.split('/')[-1]}」が、ステレオの可能性があります。モノラルにしてください。")
  if original_fs != 44100:
    incorrect_fs_list.append(wav_path.split('/')[-1])
    incorrect_fs_flag = True
if incorrect_fs_flag:
  print("WARNING: 44.1kHz以外の音声が含まれています。MYCOEでは44.1kHz以外の音声は44.1kHzに変換して利用されます。")
  print(incorrect_fs_list)

#@title Set this to "ON" to automatically trim leading/trailing silence. (Audio without leading/trailing silence tends to synthesize faster.)

trim_flag = 'ON' #@param ["ON", "OFF"] {type: "string"}

import librosa
from librosa.util import normalize

MAX_WAV_VALUE = 32768.0
sampling_fs = 44100

normalized_wavs_path = '/content/normalized_wavs'
if os.path.exists(normalized_wavs_path):
  shutil.rmtree(normalized_wavs_path)
os.makedirs(normalized_wavs_path, exist_ok=True)

text = ''
# adjust the wavs' sampling rate and volume, and build the transcript text
for wav_path in wav_paths:
  wav_name = wav_path.split('/')[-1].replace('.wav', '')
  text += wav_name + ':' + ita_corpus_dict[wav_name]
  wav, original_fs = sf.read(wav_path)
  if original_fs != sampling_fs:
    wav = librosa.resample(wav, orig_sr=original_fs, target_sr=sampling_fs)
  if trim_flag == "ON":
    wav = librosa.effects.trim(wav, top_db=30)[0]
  normalized_wav = normalize(wav) * 0.90
  sf.write(normalized_wavs_path + '/' + wav_path.split('/')[-1], normalized_wav, sampling_fs, 'PCM_16')

corpus_path = '/content/corpus'
if os.path.exists(corpus_path):
  shutil.rmtree(corpus_path)
os.makedirs(corpus_path, exist_ok=True)

with open('/content/corpus/transcripts_utf8.txt', 'w', encoding='UTF-8') as f:
  f.write(text)

print(f"今回の学習に使われる音声の数は全部で「{len(wav_paths)}」個となっています。ご確認ください。")

Run it

(venv) python3 mycoeiro_check_voice_zip.py
A total of 424 audio files will be used for this training run. Please confirm.

Creating and running the "create speaker_info folder" code

Manually unzip one zip beforehand

cd /content
unzip /content/drive/MyDrive/MYCOEIROINK_WORK/downloads/dummy_speaker_info_2.zip

Save the following modified code as /content/mycoeiro_speaker_info.py

#@title Press the play button on the left to run "create speaker_info folder"
#%cd /content/
#!unzip /content/drive/MyDrive/MYCOEIROINK_WORK/downloads/dummy_speaker_info_2.zip

import glob
import uuid
import random
import shutil
import json
import os

speaker_info_contents = sorted(glob.glob('/content/drive/MyDrive/MYCOEIROINK_WORK/speaker_info/*'))
if len(speaker_info_contents) == 0:
  speaker_uuid = str(uuid.uuid1())
  speaker_id = random.randint(10001, 2147483647)

  shutil.move('/content/dummy_speaker_info/icons/<speaker_id>.png', f"/content/dummy_speaker_info/icons/{speaker_id}.png")
  shutil.move('/content/dummy_speaker_info/voice_samples/<speaker_id>_001.wav', f"/content/dummy_speaker_info/voice_samples/{speaker_id}_001.wav")
  shutil.move('/content/dummy_speaker_info/voice_samples/<speaker_id>_002.wav', f"/content/dummy_speaker_info/voice_samples/{speaker_id}_002.wav")
  shutil.move('/content/dummy_speaker_info/voice_samples/<speaker_id>_003.wav', f"/content/dummy_speaker_info/voice_samples/{speaker_id}_003.wav")
  shutil.move('/content/dummy_speaker_info/model/<speaker_id>', f"/content/dummy_speaker_info/model/{speaker_id}")

  metas = {
      "speakerName": "MYCOEIROINK",
      "speakerUuid": speaker_uuid,
      "styles": [
          {
              "styleName": "のーまる",
              "styleId": speaker_id
          }
      ]
  }

  with open('dummy_speaker_info/metas.json', mode='w', encoding='utf-8') as f:
    json.dump(metas, f, indent=4, ensure_ascii=False)

  os.makedirs('/content/drive/MyDrive/MYCOEIROINK_WORK/speaker_info', exist_ok=True)
  shutil.copytree('/content/dummy_speaker_info', f"/content/drive/MyDrive/MYCOEIROINK_WORK/speaker_info/{speaker_uuid}")
#!rm -rf dummy_speaker_info

Run it

(venv) python3 mycoeiro_speaker_info.py

/content/drive/MyDrive/MYCOEIROINK_WORK/speaker_info/5bef76ca-fd2b-11ee-ad21-ffd84b7356b6

Confirm that the directory above has been set up, then delete dummy_speaker_info.

cd /content
rm -Rf dummy_speaker_info
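
Both the UUID and the styleId are generated randomly by the script, so yours will differ from the values in this article. Here is a small sketch of my own to look them up from the generated metas.json (the styleId is also the folder name under model/ that we copy checkpoints into later):

import glob
import json

# Read every generated metas.json and print the speaker UUID and styleId.
for metas_path in glob.glob('/content/drive/MyDrive/MYCOEIROINK_WORK/speaker_info/*/metas.json'):
    with open(metas_path, encoding='utf-8') as f:
        metas = json.load(f)
    print('speakerUuid:', metas['speakerUuid'])
    print('styleId:', metas['styles'][0]['styleId'])  # also the model/ subfolder name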

Performing "environment setup for model creation 1" manually

Assuming the Python 3.9 environment from the first half is already in place, set up the second half manually.

cd /content/drive/MyDrive/MYCOEIROINK_WORK

(venv) pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio===0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

git clone https://github.com/shirowanisan/espnet.git
cd espnet
git checkout 0.10.3
git branch
cd ..

(venv) pip install Cython==0.29.32
(venv) pip install pyopenjtalk==0.1.3 --no-build-isolation
(venv) pip install pysptk==0.2.0 --no-build-isolation
(venv) pip install -r espnet/requirements.txt

git clone https://github.com/kaldi-asr/kaldi
cd espnet/tools
ln -s ../../kaldi .
cd ../../
mkdir -p espnet/tools/venv/bin && touch espnet/tools/venv/bin/activate
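
Before going further, it's worth confirming that the cu113 build of PyTorch actually sees both RTX 3090s (a check of my own, run inside the venv):

import torch

# Verify the CUDA build and that both GPUs are visible to PyTorch.
print(torch.__version__)          # expect 1.10.2+cu113
print(torch.cuda.is_available())  # expect True
print(torch.cuda.device_count())  # expect 2 on this machine
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))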

Creating and running the "environment setup for model creation 2" code

Save the following modified code as /content/mycoeiro_build_env_2.py

#@title Press the play button on the left to run "environment setup for model creation 2"

import os
import glob
import shutil

pretrained_model_tag = 'model_2' #@param ["model_1", "model_2", "model_3"] {type: "string"}

espnet_wavs_dir = '/content/espnet/egs2/mycoe/tts1/downloads/wavs'
os.makedirs(espnet_wavs_dir, exist_ok=True)

wav_paths = sorted(glob.glob('/content/normalized_wavs/*.wav'))

for wav_path in wav_paths:
  shutil.copyfile(wav_path, espnet_wavs_dir + '/' + wav_path.split('/')[-1])

#%cd /content/
if pretrained_model_tag == 'model_1':
  shutil.unpack_archive('/content/drive/MyDrive/MYCOEIROINK_WORK/downloads/mycoe_pretrain_model.zip', '/content/')
if pretrained_model_tag == 'model_2':
  shutil.unpack_archive('/content/drive/MyDrive/MYCOEIROINK_WORK/downloads/mycoe_pretrain_model_2.zip', '/content/')
if pretrained_model_tag == 'model_3':
  shutil.unpack_archive('/content/drive/MyDrive/MYCOEIROINK_WORK/downloads/mycoe_pretrain_model_3.zip', '/content/')

shutil.copyfile('/content/mycoe_pretrain_model/100epoch.pth', '/content/espnet/egs2/mycoe/tts1/downloads/100epoch.pth')
shutil.copyfile('/content/mycoe_pretrain_model/tokens.txt', '/content/espnet/egs2/mycoe/tts1/downloads/tokens.txt')
shutil.copyfile('/content/corpus/transcripts_utf8.txt', '/content/espnet/egs2/mycoe/tts1/downloads/transcripts_utf8.txt')

Run it

(venv) python3 mycoeiro_build_env_2.py

Confirm that /content/espnet and /content/mycoe_pretrain_model have been created.

Performing "environment setup for model creation 3" manually

Run the preparation script: since we are already inside the virtual environment, setup_python.sh is the right choice (not setup_venv.sh).

(venv) cd /content/drive/MyDrive/MYCOEIROINK_WORK/espnet/tools
(venv) ./setup_python.sh $(which python3)
Warning: Setting PYTHONUSERBASE
Requirement already satisfied: pip in /content/drive/MyDrive/MYCOEIROINK_WORK/venv/lib/python3.9/site-packages (24.0)
Requirement already satisfied: wheel in /content/drive/MyDrive/MYCOEIROINK_WORK/venv/lib/python3.9/site-packages (0.43.0)

If activate_python.sh has been generated, you're all set.

cat activate_python.sh
#!/usr/bin/env bash
# THIS FILE IS GENERATED BY tools/setup_python.sh
export PYTHONUSERBASE="/content/drive/MyDrive/MYCOEIROINK_WORK/espnet/tools/python_user_base"
export PATH="/content/drive/MyDrive/MYCOEIROINK_WORK/espnet/tools/python_user_base/bin":${PATH}
export PATH=/content/drive/MyDrive/MYCOEIROINK_WORK/venv/bin:${PATH}

Then go to the run directory, set the existing downloads aside, and copy the one under /content with rsync.

cd /content/drive/MyDrive/MYCOEIROINK_WORK/espnet/egs2/mycoe/tts1
mv downloads downloads.org
rsync -a /content/espnet/egs2/mycoe/tts1/downloads .

Run the run script.

(venv) ./run.sh --stage 1 --stop-stage 5 --ngpu 1 --fs 44100 --n_fft 2048 --n_shift 512 --win_length null --dumpdir dump/44k --expdir /content/drive/MyDrive/MYCOEIROINK_WORK/exp --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --train_config ./conf/finetune.yaml

The output came out like this.

2024-04-18T12:57:12 (tts.sh:201:main) ./tts.sh --lang jp --feats_type raw --fs 24000 --n_fft 2048 --n_shift 300 --win_length 1200 --token_type phn --cleaner jaconv --g2p pyopenjtalk_prosody --train_config conf/finetune.yaml --inference_config conf/decode.yaml --train_set tr_no_dev --valid_set dev --test_sets dev eval1 --srctexts data/tr_no_dev/text --audio_format wav --stage 1 --stop-stage 5 --ngpu 1 --fs 44100 --n_fft 2048 --n_shift 512 --win_length null --dumpdir dump/44k --expdir /content/drive/MyDrive/MYCOEIROINK_WORK/exp --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --train_config ./conf/finetune.yaml
2024-04-18T12:57:12 (tts.sh:297:main) Stage 1: Data preparation for data/tr_no_dev, data/dev, etc.
2024-04-18T12:57:12 (data.sh:16:main) local/data.sh
2024-04-18T12:57:12 (data.sh:39:main) stage 0: local/data_prep.sh
finished making wav.scp, utt2spk, spk2utt.
finished making text.
Successfully finished data preparation.
2024-04-18T12:57:14 (data.sh:44:main) stage 2: utils/subset_data_dir.sh
utils/copy_data_dir.sh: copied data from data/train to data/tr_no_dev
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/tr_no_dev
utils/subset_data_dir.sh: reducing #utt from 424 to 5
utils/subset_data_dir.sh: reducing #utt from 424 to 5
2024-04-18T12:57:14 (data.sh:51:main) Successfully finished. [elapsed=2s]
2024-04-18T12:57:15 (tts.sh:313:main) Stage 2: Format wav.scp: data/ -> dump/44k/raw/
utils/copy_data_dir.sh: copied data from data/tr_no_dev to dump/44k/raw/org/tr_no_dev
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory dump/44k/raw/org/tr_no_dev
2024-04-18T12:57:15 (format_wav_scp.sh:42:main) scripts/audio/format_wav_scp.sh --nj 32 --cmd run.pl --audio-format wav --fs 44100 data/tr_no_dev/wav.scp dump/44k/raw/org/tr_no_dev
2024-04-18T12:57:15 (format_wav_scp.sh:110:main) [info]: without segments
2024-04-18T12:57:16 (format_wav_scp.sh:142:main) Successfully finished. [elapsed=1s]
utils/copy_data_dir.sh: copied data from data/dev to dump/44k/raw/org/dev
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory dump/44k/raw/org/dev
2024-04-18T12:57:17 (format_wav_scp.sh:42:main) scripts/audio/format_wav_scp.sh --nj 32 --cmd run.pl --audio-format wav --fs 44100 data/dev/wav.scp dump/44k/raw/org/dev
2024-04-18T12:57:17 (format_wav_scp.sh:110:main) [info]: without segments
2024-04-18T12:57:17 (format_wav_scp.sh:142:main) Successfully finished. [elapsed=0s]
utils/copy_data_dir.sh: copied data from data/dev to dump/44k/raw/org/dev
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory dump/44k/raw/org/dev
2024-04-18T12:57:18 (format_wav_scp.sh:42:main) scripts/audio/format_wav_scp.sh --nj 32 --cmd run.pl --audio-format wav --fs 44100 data/dev/wav.scp dump/44k/raw/org/dev
2024-04-18T12:57:18 (format_wav_scp.sh:110:main) [info]: without segments
2024-04-18T12:57:18 (format_wav_scp.sh:142:main) Successfully finished. [elapsed=0s]
utils/copy_data_dir.sh: copied data from data/eval1 to dump/44k/raw/eval1
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory dump/44k/raw/eval1
2024-04-18T12:57:19 (format_wav_scp.sh:42:main) scripts/audio/format_wav_scp.sh --nj 32 --cmd run.pl --audio-format wav --fs 44100 data/eval1/wav.scp dump/44k/raw/eval1
2024-04-18T12:57:19 (format_wav_scp.sh:110:main) [info]: without segments
2024-04-18T12:57:19 (format_wav_scp.sh:142:main) Successfully finished. [elapsed=0s]
2024-04-18T12:57:19 (tts.sh:441:main) Stage 3: Remove long/short data: dump/44k/raw/org -> dump/44k/raw
utils/copy_data_dir.sh: copied data from dump/44k/raw/org/tr_no_dev to dump/44k/raw/tr_no_dev
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory dump/44k/raw/tr_no_dev
fix_data_dir.sh: kept all 424 utterances.
fix_data_dir.sh: old files are kept in dump/44k/raw/tr_no_dev/.backup
utils/copy_data_dir.sh: copied data from dump/44k/raw/org/dev to dump/44k/raw/dev
utils/validate_data_dir.sh: WARNING: you have only one speaker.  This probably a bad idea.
   Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
   for more information.
utils/validate_data_dir.sh: Successfully validated data-directory dump/44k/raw/dev
fix_data_dir.sh: kept all 5 utterances.
fix_data_dir.sh: old files are kept in dump/44k/raw/dev/.backup
2024-04-18T12:57:20 (tts.sh:500:main) Stage 4: Generate token_list from data/tr_no_dev/text
/content/drive/MyDrive/MYCOEIROINK_WORK/venv/bin/python3 /content/drive/MyDrive/MYCOEIROINK_WORK/venv/lib/python3.9/site-packages/espnet2/bin/tokenize_text.py --token_type phn -f 2- --input dump/44k/raw/srctexts --output dump/44k/token_list/phn_jaconv_pyopenjtalk_prosody/tokens.txt --non_linguistic_symbols none --cleaner jaconv --g2p pyopenjtalk_prosody --write_vocabulary true --add_symbol '<blank>:0' --add_symbol '<unk>:1' --add_symbol '<sos/eos>:-1'
Downloading: "https://downloads.sourceforge.net/open-jtalk/open_jtalk_dic_utf_8-1.11.tar.gz"
Extracting tar file /content/drive/MyDrive/MYCOEIROINK_WORK/venv/lib/python3.9/site-packages/pyopenjtalk/dic.tar.gz
2024-04-18 12:57:27,815 (tokenize_text:172) INFO: OOV rate = 0.0 %
2024-04-18T12:57:27 (tts.sh:529:main) Stage 5: TTS collect stats: train_set=dump/44k/raw/tr_no_dev, valid_set=dump/44k/raw/dev
2024-04-18T12:57:27 (tts.sh:616:main) Generate '/content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/run.sh'. You can resume the process from stage 5 using this script
2024-04-18T12:57:27 (tts.sh:620:main) TTS collect_stats started... log: '/content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/logdir/stats.*.log'
/content/drive/MyDrive/MYCOEIROINK_WORK/venv/bin/python3 /content/drive/MyDrive/MYCOEIROINK_WORK/venv/lib/python3.9/site-packages/espnet2/bin/aggregate_stats_dirs.py --input_dir /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/logdir/stats.1 --input_dir /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/logdir/stats.2 --input_dir /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/logdir/stats.3 --input_dir /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/logdir/stats.4 --input_dir /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/logdir/stats.5 --output_dir /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody
2024-04-18T12:57:33 (tts.sh:1142:main) Skip the uploading stages
2024-04-18T12:57:33 (tts.sh:1145:main) Successfully finished. [elapsed=21s]

Perform the post-processing commands manually.

cd dump/44k/token_list/phn_jaconv_pyopenjtalk_prosody
mv tokens.txt tokens.bak.txt
cp /content/drive/MyDrive/MYCOEIROINK_WORK/espnet/egs2/mycoe/tts1/downloads/tokens.txt .
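
My reading of why this swap is needed (the Colab code doesn't explain it): the token IDs baked into the pretrained 100epoch.pth follow the pretrained model's tokens.txt, so the token list regenerated in stage 4 from your own corpus has to be replaced with it. A quick sketch of my own, run from this token_list directory, to see how the two differ:

# Compare the stage-4 token list with the pretrained model's token list.
# Note: even identical sets would break fine-tuning if line order differed,
# since the line number is the token ID.
with open('tokens.bak.txt', encoding='utf-8') as f:
    generated = set(f.read().splitlines())
with open('tokens.txt', encoding='utf-8') as f:
    pretrained = set(f.read().splitlines())

print(len(generated), 'generated tokens /', len(pretrained), 'pretrained tokens')
print('only in generated:', sorted(generated - pretrained))
print('only in pretrained:', sorted(pretrained - generated))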

Performing "start model training" manually

Run the run script. Since the TSUKUMO PC has multiple GPUs, I changed ngpu to 2.

(venv) ./run.sh --stage 6 --stop-stage 6 --ngpu 2 --fs 44100 --n_fft 2048 --n_shift 512 --win_length null --dumpdir dump/44k --expdir /content/drive/MyDrive/MYCOEIROINK_WORK/exp --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --train_config ./conf/finetune.yaml --train_args "--init_param downloads/100epoch.pth:tts:tts" --tag mycoe_model

2024-04-18T13:06:51 (tts.sh:201:main) ./tts.sh --lang jp --feats_type raw --fs 24000 --n_fft 2048 --n_shift 300 --win_length 1200 --token_type phn --cleaner jaconv --g2p pyopenjtalk_prosody --train_config conf/finetune.yaml --inference_config conf/decode.yaml --train_set tr_no_dev --valid_set dev --test_sets dev eval1 --srctexts data/tr_no_dev/text --audio_format wav --stage 6 --stop-stage 6 --ngpu 2 --fs 44100 --n_fft 2048 --n_shift 512 --win_length null --dumpdir dump/44k --expdir /content/drive/MyDrive/MYCOEIROINK_WORK/exp --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --train_config ./conf/finetune.yaml --train_args --init_param downloads/100epoch.pth:tts:tts --tag mycoe_model
2024-04-18T13:06:51 (tts.sh:665:main) Stage 6: TTS Training: train_set=dump/44k/raw/tr_no_dev, valid_set=dump/44k/raw/dev
2024-04-18T13:06:51 (tts.sh:836:main) Generate '/content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model/run.sh'. You can resume the process from stage 6 using this script
2024-04-18T13:06:51 (tts.sh:841:main) TTS training started... log: '/content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model/train.log'
2024-04-18 13:06:52,067 (launch:94) INFO: /content/drive/MyDrive/MYCOEIROINK_WORK/venv/bin/python3 /content/drive/MyDrive/MYCOEIROINK_WORK/venv/lib/python3.9/site-packages/espnet2/bin/launch.py --cmd 'run.pl --name /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model/train.log' --log /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model/train.log --ngpu 2 --num_nodes 1 --init_file_prefix /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model/.dist_init_ --multiprocessing_distributed true -- python3 -m espnet2.bin.gan_tts_train --use_preprocessor true --token_type phn --token_list dump/44k/token_list/phn_jaconv_pyopenjtalk_prosody/tokens.txt --non_linguistic_symbols none --cleaner jaconv --g2p pyopenjtalk_prosody --normalize none --resume true --fold_length 150 --fold_length 409600 --output_dir /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model --config ./conf/finetune.yaml --feats_extract linear_spectrogram --feats_extract_conf n_fft=2048 --feats_extract_conf hop_length=512 --feats_extract_conf win_length=null --train_data_path_and_name_and_type dump/44k/raw/tr_no_dev/text,text,text --train_data_path_and_name_and_type dump/44k/raw/tr_no_dev/wav.scp,speech,sound --train_shape_file /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/train/text_shape.phn --train_shape_file /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/train/speech_shape --valid_data_path_and_name_and_type dump/44k/raw/dev/text,text,text --valid_data_path_and_name_and_type dump/44k/raw/dev/wav.scp,speech,sound --valid_shape_file /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/valid/text_shape.phn --valid_shape_file /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_prosody/valid/speech_shape --init_param downloads/100epoch.pth:tts:tts
2024-04-18 13:06:52,082 (launch:237) INFO: single-node with 2gpu on distributed mode
2024-04-18 13:06:52,086 (launch:348) INFO: log file: /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model/train.log

Training progress logs accumulate in /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model/train.log. Use nvidia-smi to check that both GPUs are being used.

nvidia-smi
Thu Apr 18 13:09:05 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:17:00.0 Off |                  N/A |
| 47%   55C    P2             228W / 350W |   8892MiB / 24576MiB |     54%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  | 00000000:65:00.0 Off |                  N/A |
| 46%   53C    P2             227W / 350W |   8470MiB / 24576MiB |     61%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     52767      C   ...e/MYCOEIROINK_WORK/venv/bin/python3     8884MiB |
|    1   N/A  N/A     52768      C   ...e/MYCOEIROINK_WORK/venv/bin/python3     8462MiB |
+---------------------------------------------------------------------------------------+

Wow, the two monster GPUs are being used in tandem! Thrilling.

As training progresses, model files named after the epoch number, such as 1epoch.pth, pile up in the same directory as the log; once you reach around 100 epochs, the result is said to be good enough for practical use.
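
A small helper of my own to list the checkpoints accumulated so far, sorted numerically by epoch (plain name sorting would put 10epoch.pth before 2epoch.pth):

import glob
import os
import re

# Collect <N>epoch.pth files from the experiment directory and sort by N.
exp_dir = '/content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model'
ckpts = []
for path in glob.glob(os.path.join(exp_dir, '*epoch.pth')):
    m = re.match(r'(\d+)epoch\.pth$', os.path.basename(path))
    if m:
        ckpts.append((int(m.group(1)), path))
for epoch, path in sorted(ckpts):
    print(f"epoch {epoch}: {os.path.basename(path)}")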

I want this running in the background, so stop it once with Ctrl + C. Note that ps aux shows quite a few zombie processes left behind, so kill them manually to clean up.

Re-run it with nohup and log redirection.

nohup ./run.sh --stage 6 --stop-stage 6 --ngpu 2 --fs 44100 --n_fft 2048 --n_shift 512 --win_length null --dumpdir dump/44k --expdir /content/drive/MyDrive/MYCOEIROINK_WORK/exp --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --train_config ./conf/finetune.yaml --train_args "--init_param downloads/100epoch.pth:tts:tts" --tag mycoe_model > run.log 2>&1 &

Execution status goes to run.log; training progress and result data are written under /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model.

Deploying the model into COEIROINK

All you need to do is copy an epoch file from a good point in training together with config.yaml into speaker_info, and drop that folder into the engine's own speaker_info.
For example, to deploy the training state at epoch 17:

cd /content/drive/MyDrive/MYCOEIROINK_WORK/exp/tts_mycoe_model
cp -p 17epoch.pth config.yaml ../../speaker_info/5bef76ca-fd2b-11ee-ad21-ffd84b7356b6/model/469203286/

This completes 5bef76ca-fd2b-11ee-ad21-ffd84b7356b6. Then run something like

cd /content/drive/MyDrive/MYCOEIROINK_WORK/speaker_info
rsync -a 5bef76ca-fd2b-11ee-ad21-ffd84b7356b6 $HOME/voicevox_engine/speaker_info

to copy it into the COEIROINK engine's speaker_info, and start the engine.
Then check the API from a browser: if this machine's IP is 172.21.16.39, open

http://172.21.16.39:50031/docs

and a test page appears. Call /speakers: Try it out → set core_version to 0.0.0 → Execute. If the response contains

[
  {
    "name": "リリンちゃん",
    "speaker_uuid": "cb11bdbd-78fc-4f16-b528-a400bae1782d",
    "styles": [
      {
        "name": "のーまる",
        "id": 90
      },
      {
        "name": "ささやき",
        "id": 91
      }
    ],
    "version": "1.0.3"
  },
  {
    "name": "MYCOEIROINK",
    "speaker_uuid": "5bef76ca-fd2b-11ee-ad21-ffd84b7356b6",
    "styles": [
      {
        "name": "のーまる",
        "id": 469203286
      }
    ],
    "version": "0.0.1"
  }
]

then your new model has made its debut and you're done!
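
The same check can of course be scripted instead of clicked through in a browser. A minimal sketch of my own, using only the standard library, that calls the /speakers endpoint shown above:

import json
import urllib.request

# Query the engine's /speakers endpoint and print each speaker's styles.
url = 'http://172.21.16.39:50031/speakers?core_version=0.0.0'
with urllib.request.urlopen(url) as resp:
    speakers = json.load(resp)
for s in speakers:
    print(s['name'], s['speaker_uuid'], [st['id'] for st in s['styles']])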
