云端炼丹,算力白嫖,基于云端GPU(Colab)使用So-vits库制作AI特朗普演唱《国际歌》

技术分享 3年前 (2023-05-17) 0 999+

人工智能AI技术早已深入到人们生活的每一个角落，君不见AI孙燕姿的歌声此起彼伏，不绝于耳，但并不是每个人都拥有一块N卡，没有GPU的日子总是不好过的，但是没关系，山人有妙计，本次我们基于Google的Colab免费云端服务器来搭建深度学习环境，制作AI特朗普，让他高唱《国际歌》。

Colab（全名Colaboratory ），它是Google公司的一款基于云端的基础免费服务器产品，可以在B端，也就是浏览器里面编写和执行Python代码，非常方便，贴心的是，Colab可以给用户分配免费的GPU进行使用，对于没有N卡的朋友来说，这已经远远超出了业界良心的范畴，简直就是在做慈善事业。

配置Colab

Colab是基于Google云盘的产品，我们可以将深度学习的Python脚本、训练好的模型、以及训练集等数据直接存放在云盘中，然后通过Colab执行即可。

首先访问Google云盘：drive.google.com

随后点击新建，选择关联更多应用：

接着安装Colab即可：

至此，云盘和Colab就关联好了，现在我们可以新建一个脚本文件my_sovits.ipynb文件，键入代码：

hello colab

随后，按快捷键 ctrl + 回车，即可运行代码：

这里需要注意的是，Colab使用的是基于Jupyter Notebook的ipynb格式的Python代码。

Jupyter Notebook是以网页的形式打开，可以在网页页面中直接编写代码和运行代码，代码的运行结果也会直接在代码块下显示。如在编程过程中需要编写说明文档，可在同一个页面中直接编写，便于作及时的说明和解释。

随后设置一下显卡类型：

接着运行命令，查看GPU版本：

!/usr/local/cuda/bin/nvcc --version      !nvidia-smi

程序返回：

nvcc: NVIDIA (R) Cuda compiler driver   Copyright (c) 2005-2022 NVIDIA Corporation   Built on Wed_Sep_21_10:33:58_PDT_2022   Cuda compilation tools, release 11.8, V11.8.89   Build cuda_11.8.r11.8/compiler.31833905_0   Tue May 16 04:49:23 2023          +-----------------------------------------------------------------------------+   | NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |   |-------------------------------+----------------------+----------------------+   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |   |                               |                      |               MIG M. |   |===============================+======================+======================|   |   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |   | N/A   65C    P8    13W /  70W |      0MiB / 15360MiB |      0%      Default |   |                               |                      |                  N/A |   +-------------------------------+----------------------+----------------------+                                                                                     +-----------------------------------------------------------------------------+   | Processes:                                                                  |   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |   |        ID   ID                                                   Usage      |   |=============================================================================|   |  No running processes found                                                 |   +-----------------------------------------------------------------------------+

这里建议选择Tesla T4的显卡类型，性能更突出。

至此Colab就配置好了。

配置So-vits

下面我们配置so-vits环境，可以通过pip命令安装一些基础依赖：

!pip install pyworld==0.3.2   !pip install numpy==1.23.5

注意jupyter语言是通过叹号来运行命令。

注意，由于不是本地环境，有的时候colab会提醒：

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/   Collecting numpy==1.23.5     Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 80.1 MB/s eta 0:00:00   Installing collected packages: numpy     Attempting uninstall: numpy       Found existing installation: numpy 1.22.4       Uninstalling numpy-1.22.4:         Successfully uninstalled numpy-1.22.4   Successfully installed numpy-1.23.5   WARNING: The following packages were previously imported in this runtime:     [numpy]   You must restart the runtime in order to use newly installed versions.

此时numpy库需要重启runtime才可以导入操作。

重启runtime后，需要再重新安装一次，直到系统提示依赖已经存在：

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/   Requirement already satisfied: numpy==1.23.5 in /usr/local/lib/python3.10/dist-packages (1.23.5)

随后，克隆so-vits项目，并且安装项目的依赖：

import os   import glob   !git clone https://github.com/effusiveperiscope/so-vits-svc -b eff-4.0   os.chdir('/content/so-vits-svc')   # install requirements one-at-a-time to ignore exceptions   !cat requirements.txt | xargs -n 1 pip install --extra-index-url https://download.pytorch.org/whl/cu117   !pip install praat-parselmouth   !pip install ipywidgets   !pip install huggingface_hub   !pip install pip==23.0.1 # fix pip version for fairseq install   !pip install fairseq==0.12.2   !jupyter nbextension enable --py widgetsnbextension   existing_files = glob.glob('/content/**/*.*', recursive=True)   !pip install --upgrade protobuf==3.9.2   !pip uninstall -y tensorflow   !pip install tensorflow==2.11.0

安装好依赖之后，定义一些前置工具方法：

os.chdir('/content/so-vits-svc') # force working-directory to so-vits-svc - this line is just for safety and is probably not required      import tarfile   import os   from zipfile import ZipFile   # taken from https://github.com/CookiePPP/cookietts/blob/master/CookieTTS/utils/dataset/extract_unknown.py   def extract(path):       if path.endswith(".zip"):           with ZipFile(path, 'r') as zipObj:              zipObj.extractall(os.path.split(path)[0])       elif path.endswith(".tar.bz2"):           tar = tarfile.open(path, "r:bz2")           tar.extractall(os.path.split(path)[0])           tar.close()       elif path.endswith(".tar.gz"):           tar = tarfile.open(path, "r:gz")           tar.extractall(os.path.split(path)[0])           tar.close()       elif path.endswith(".tar"):           tar = tarfile.open(path, "r:")           tar.extractall(os.path.split(path)[0])           tar.close()       elif path.endswith(".7z"):           import py7zr           archive = py7zr.SevenZipFile(path, mode='r')           archive.extractall(path=os.path.split(path)[0])           archive.close()       else:           raise NotImplementedError(f"{path} extension not implemented.")      # taken from https://github.com/CookiePPP/cookietts/tree/master/CookieTTS/_0_download/scripts      # megatools download urls   win64_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win64.zip"   win32_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win32.zip"   linux_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-linux-x86_64.tar.gz"   # download megatools   from sys import platform   import os   import urllib.request   import subprocess   from time import sleep      if platform == "linux" or platform == "linux2":           dl_url = linux_url   elif platform == "darwin":       raise NotImplementedError('MacOS not supported.')   elif platform == "win32":           dl_url = win64_url   else:       raise NotImplementedError ('Unknown Operating System.')      dlname = dl_url.split("/")[-1]   if dlname.endswith(".zip"):       binary_folder = dlname[:-4] # remove .zip   elif dlname.endswith(".tar.gz"):       binary_folder = dlname[:-7] # remove .tar.gz   else:       raise NameError('downloaded megatools has unknown archive file extension!')      if not os.path.exists(binary_folder):       print('"megatools" not found. Downloading...')       if not os.path.exists(dlname):           urllib.request.urlretrieve(dl_url, dlname)       assert os.path.exists(dlname), 'failed to download.'       extract(dlname)       sleep(0.10)       os.unlink(dlname)       print("Done!")         binary_folder = os.path.abspath(binary_folder)      def megadown(download_link, filename='.', verbose=False):       """Use megatools binary executable to download files and folders from MEGA.nz ."""       filename = ' --path "'+os.path.abspath(filename)+'"' if filename else ""       wd_old = os.getcwd()       os.chdir(binary_folder)       try:           if platform == "linux" or platform == "linux2":               subprocess.call(f'./megatools dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)           elif platform == "win32":               subprocess.call(f'megatools.exe dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)       except:           os.chdir(wd_old) # don't let user stop download without going back to correct directory first           raise       os.chdir(wd_old)       return filename      import urllib.request   from tqdm import tqdm   import gdown   from os.path import exists      def request_url_with_progress_bar(url, filename):       class DownloadProgressBar(tqdm):           def update_to(self, b=1, bsize=1, tsize=None):               if tsize is not None:                   self.total = tsize               self.update(b * bsize - self.n)              def download_url(url, filename):           with DownloadProgressBar(unit='B', unit_scale=True,                                    miniters=1, desc=url.split('/')[-1]) as t:               filename, headers = urllib.request.urlretrieve(url, filename=filename, reporthook=t.update_to)               print("Downloaded to "+filename)       download_url(url, filename)         def download(urls, dataset='', filenames=None, force_dl=False, username='', password='', auth_needed=False):       assert filenames is None or len(urls) == len(filenames), f"number of urls does not match filenames. Expected {len(filenames)} urls, containing the files listed below.n{filenames}"       assert not auth_needed or (len(username) and len(password)), f"username and password needed for {dataset} Dataset"       if filenames is None:           filenames = [None,]*len(urls)       for i, (url, filename) in enumerate(zip(urls, filenames)):           print(f"Downloading File from {url}")           #if filename is None:           #    filename = url.split("/")[-1]           if filename and (not force_dl) and exists(filename):               print(f"{filename} Already Exists, Skipping.")               continue           if 'drive.google.com' in url:               assert 'https://drive.google.com/uc?id=' in url, 'Google Drive links should follow the format "https://drive.google.com/uc?id=1eQAnaoDBGQZldPVk-nzgYzRbcPSmnpv6".nWhere id=XXXXXXXXXXXXXXXXX is the Google Drive Share ID.'               gdown.download(url, filename, quiet=False)           elif 'mega.nz' in url:               megadown(url, filename)           else:               #urllib.request.urlretrieve(url, filename=filename) # no progress bar               request_url_with_progress_bar(url, filename) # with progress bar      import huggingface_hub   import os   import shutil      class HFModels:       def __init__(self, repo = "therealvul/so-vits-svc-4.0",                model_dir = "hf_vul_models"):           self.model_repo = huggingface_hub.Repository(local_dir=model_dir,               clone_from=repo, skip_lfs_files=True)           self.repo = repo           self.model_dir = model_dir              self.model_folders = os.listdir(model_dir)           self.model_folders.remove('.git')           self.model_folders.remove('.gitattributes')          def list_models(self):           return self.model_folders          # Downloads model;       # copies config to target_dir and moves model to target_dir       def download_model(self, model_name, target_dir):           if not model_name in self.model_folders:               raise Exception(model_name + " not found")           model_dir = self.model_dir           charpath = os.path.join(model_dir,model_name)              gen_pt = next(x for x in os.listdir(charpath) if x.startswith("G_"))           cfg = next(x for x in os.listdir(charpath) if x.endswith("json"))           try:             clust = next(x for x in os.listdir(charpath) if x.endswith("pt"))           except StopIteration as e:             print("Note - no cluster model for "+model_name)             clust = None              if not os.path.exists(target_dir):               os.makedirs(target_dir, exist_ok=True)              gen_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,               filename = model_name + "/" + gen_pt) # this is a symlink                      if clust is not None:             clust_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,                 filename = model_name + "/" + clust) # this is a symlink             shutil.move(os.path.realpath(clust_dir), os.path.join(target_dir, clust))             clust_out = os.path.join(target_dir, clust)           else:             clust_out = None              shutil.copy(os.path.join(charpath,cfg),os.path.join(target_dir, cfg))           shutil.move(os.path.realpath(gen_dir), os.path.join(target_dir, gen_pt))              return {"config_path": os.path.join(target_dir,cfg),               "generator_path": os.path.join(target_dir,gen_pt),               "cluster_path": clust_out}      # Example usage   # vul_models = HFModels()   # print(vul_models.list_models())   # print("Applejack (singing)" in vul_models.list_models())   # vul_models.download_model("Applejack (singing)","models/Applejack (singing)")          print("Finished!")

这些方法可以帮助我们下载、解压和加载模型。

音色模型下载和线上推理

接着将特朗普的音色模型和配置文件进行下载，下载地址是：

https://huggingface.co/Nardicality/so-vits-svc-4.0-models/tree/main/Trump18.5k

随后模型文件放到项目的models文件夹，配置文件则放入config文件夹。

接着将需要转换的歌曲上传到和项目平行的目录中。

运行代码：

import os   import glob   import json   import copy   import logging   import io   from ipywidgets import widgets   from pathlib import Path   from IPython.display import Audio, display      os.chdir('/content/so-vits-svc')      import torch   from inference import infer_tool   from inference import slicer   from inference.infer_tool import Svc   import soundfile   import numpy as np      MODELS_DIR = "models"      def get_speakers():     speakers = []     for _,dirs,_ in os.walk(MODELS_DIR):       for folder in dirs:         cur_speaker = {}         # Look for G_****.pth         g = glob.glob(os.path.join(MODELS_DIR,folder,'G_*.pth'))         if not len(g):           print("Skipping "+folder+", no G_*.pth")           continue         cur_speaker["model_path"] = g[0]         cur_speaker["model_folder"] = folder            # Look for *.pt (clustering model)         clst = glob.glob(os.path.join(MODELS_DIR,folder,'*.pt'))         if not len(clst):           print("Note: No clustering model found for "+folder)           cur_speaker["cluster_path"] = ""         else:           cur_speaker["cluster_path"] = clst[0]            # Look for config.json         cfg = glob.glob(os.path.join(MODELS_DIR,folder,'*.json'))         if not len(cfg):           print("Skipping "+folder+", no config json")           continue         cur_speaker["cfg_path"] = cfg[0]         with open(cur_speaker["cfg_path"]) as f:           try:             cfg_json = json.loads(f.read())           except Exception as e:             print("Malformed config json in "+folder)           for name, i in cfg_json["spk"].items():             cur_speaker["name"] = name             cur_speaker["id"] = i             if not name.startswith('.'):               speakers.append(copy.copy(cur_speaker))          return sorted(speakers, key=lambda x:x["name"].lower())      logging.getLogger('numba').setLevel(logging.WARNING)   chunks_dict = infer_tool.read_temp("inference/chunks_temp.json")   existing_files = []   slice_db = -40   wav_format = 'wav'      class InferenceGui():     def __init__(self):       self.speakers = get_speakers()       self.speaker_list = [x["name"] for x in self.speakers]       self.speaker_box = widgets.Dropdown(           options = self.speaker_list       )       display(self.speaker_box)          def convert_cb(btn):         self.convert()       def clean_cb(btn):         self.clean()          self.convert_btn = widgets.Button(description="Convert")       self.convert_btn.on_click(convert_cb)       self.clean_btn = widgets.Button(description="Delete all audio files")       self.clean_btn.on_click(clean_cb)          self.trans_tx = widgets.IntText(value=0, description='Transpose')       self.cluster_ratio_tx = widgets.FloatText(value=0.0,          description='Clustering Ratio')       self.noise_scale_tx = widgets.FloatText(value=0.4,          description='Noise Scale')       self.auto_pitch_ck = widgets.Checkbox(value=False, description=         'Auto pitch f0 (do not use for singing)')          display(self.trans_tx)       display(self.cluster_ratio_tx)       display(self.noise_scale_tx)       display(self.auto_pitch_ck)       display(self.convert_btn)       display(self.clean_btn)        def convert(self):       trans = int(self.trans_tx.value)       speaker = next(x for x in self.speakers if x["name"] ==              self.speaker_box.value)       spkpth2 = os.path.join(os.getcwd(),speaker["model_path"])       print(spkpth2)       print(os.path.exists(spkpth2))          svc_model = Svc(speaker["model_path"], speaker["cfg_path"],          cluster_model_path=speaker["cluster_path"])              input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)        if f not in existing_files and         any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]       for name in input_filepaths:         print("Converting "+os.path.split(name)[-1])         infer_tool.format_wav(name)            wav_path = str(Path(name).with_suffix('.wav'))         wav_name = Path(name).stem         chunks = slicer.cut(wav_path, db_thresh=slice_db)         audio_data, audio_sr = slicer.chunks2audio(wav_path, chunks)            audio = []         for (slice_tag, data) in audio_data:             print(f'#=====segment start, '                 f'{round(len(data)/audio_sr, 3)}s======')                          length = int(np.ceil(len(data) / audio_sr *                 svc_model.target_sample))                          if slice_tag:                 print('jump empty segment')                 _audio = np.zeros(length)             else:                 # Padding "fix" for noise                 pad_len = int(audio_sr * 0.5)                 data = np.concatenate([np.zeros([pad_len]),                     data, np.zeros([pad_len])])                 raw_path = io.BytesIO()                 soundfile.write(raw_path, data, audio_sr, format="wav")                 raw_path.seek(0)                 _cluster_ratio = 0.0                 if speaker["cluster_path"] != "":                   _cluster_ratio = float(self.cluster_ratio_tx.value)                 out_audio, out_sr = svc_model.infer(                     speaker["name"], trans, raw_path,                     cluster_infer_ratio = _cluster_ratio,                     auto_predict_f0 = bool(self.auto_pitch_ck.value),                     noice_scale = float(self.noise_scale_tx.value))                 _audio = out_audio.cpu().numpy()                 pad_len = int(svc_model.target_sample * 0.5)                 _audio = _audio[pad_len:-pad_len]             audio.extend(list(infer_tool.pad_array(_audio, length)))                      res_path = os.path.join('/content/',             f'{wav_name}_{trans}_key_'             f'{speaker["name"]}.{wav_format}')         soundfile.write(res_path, audio, svc_model.target_sample,             format=wav_format)         display(Audio(res_path, autoplay=True)) # display audio file       pass        def clean(self):        input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)        if f not in existing_files and         any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]        for f in input_filepaths:          os.remove(f)      inference_gui = InferenceGui()

此时系统会自动在根目录，也就是content下寻找音乐文件，包含但不限于wav、flac、mp3等等，随后根据下载的模型进行推理，推理之前会自动对文件进行背景音分离以及降噪和切片等操作。

推理结束之后，会自动播放转换后的歌曲。

结语

如果是刚开始使用Colab，默认分配的显存是15G左右，完全可以胜任大多数训练和推理任务，但是如果经常用它挂机运算，能分配到的显卡配置就会渐进式地降低，如果需要长时间并且相对稳定的GPU资源，还是需要付费订阅Colab pro服务，另外Google云盘的免费使用空间也是15G，如果模型下多了，导致云盘空间不足，运行代码也会报错，所以最好定期清理Google云盘，以此保证深度学习任务的正常运行。

发表评论