AI Image Generation, Speech, and GPT

Author : zbzhen, Modified : Wed Nov 1 01:51:19 2023

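Text, images, and sound are the main media of communication, and AI is now moving into all three.

Whatever happens, AI should be a tool for self-improvement, not for doing harm.

Below is what I have learned from using these tools.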

1. AI text generation: GPT

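The technology is already quite good and can greatly improve efficiency at work and in study.

Because of various restrictions, though, using it involves a few small hassles.

The key issue is that deploying this kind of model locally is too expensive for an individual user.

So the only option is the cloud services offered by large vendors.

Individual users want to use it with no restrictions at all.

Large vendors and resellers, however, want individual users to pay more.

Of course, some generous people run free services as a labor of love, but these rarely last.

The end result: of permanent, unrestricted, and free of charge, you can pick only two.

Each choice also has its price. Free of charge is paid for with ads, promotions, in-app purchases, and so on.

Freedom is paid for with limits on request count, credits, word count, network access, and so on.

My strategy: find a reputable large vendor that offers a basic free tier and charges for premium service. It should also be easy to combine with other tools. Paying a moderate amount is fine, but the cost must stay low.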

1.1. Free GPT

Behind anything free there is always some plan to recoup the cost.

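For reference (the links below are not guaranteed to be reliable; judge for yourself):

Provides an api_key and api_host: https://github.com/chatanywhere/GPT_API_free
A collection of public sites: https://github.com/LiLittleCat/awesome-free-chatgpt
Slightly more trouble: the official site allows unlimited free use of gpt3.5 but has network restrictions, so once you obtain an Access Token you can use it for free for about half a month (the Access Token is only valid that long): https://github.com/zhile-io/pandora
(Squeezes public free endpoints to the limit; speed and stability are not great) https://github.com/xtekky/gpt4free
Provides an api_key and api_host: https://github.com/terobox/ChatGPT-API-Faucet. In practice the billing is somewhat expensive and the key lasts only one month, but it is fine for testing.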

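1.2. Ways to use GPT

In practice, the following setups have worked very well:

When browsing the web, use the ChatGPTBox browser extension.
When writing documents and code, a convenient setup is a Python script paired with Markdown, plus the Bito extension for vscode.

Each of these is described below.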

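1.2.1. With a vscode extension

This is the simplest method: just install the Bito.Bito extension in vscode.

After registering you can use the 3.5 model. It is advertised as free forever and currently has no network or usage-count limits; the 4 model is paid.

Its advantage is convenience: you can select code and ask about it directly.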

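1.2.2. With the browser

I recommend the open-source browser extension ChatGPTBox, which makes the browser more flexible. Once you start using it you will not want to stop: the browser is where you read web pages and PDFs, and with the extension you get an answer alongside them. Translation and page summaries are especially convenient.

It only provides the tooling, though. You have to supply the engine yourself (an api-key, plus a solution to the network problem).

An api-key can be registered for free and is usually valid for four months, but registration requires receiving an SMS on an overseas phone number and some manual steps. Where there is demand there is a market. Personally I find a cheap, reliable key acceptable, since whoever registers it for you deserves a small fee, but if it is expensive you are being ripped off.

The network problem is easy to solve: just change the default API address in the extension, under Advanced --> API address in ChatGPTBox.

The API path stays the same: /backend-api/conversation

For the API address, find one that can relay requests. A fairly simple way is to relay through cloudflare.

For reference (the links below are not guaranteed to be reliable; judge for yourself)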

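1.2.3. With Markdown in vscode (highly recommended)

Advantages of using gpt this way:

Editing is as convenient as it gets, because you are working in an editor
Conversation history is saved directly, and earlier turns can be edited at any time
It fits seamlessly into everyday document and code writing
It works alongside vscode extensions:
- preflower.vscode-translation (translation),
- yzhang.dictionary-completion (writing suggestions),
- Bito (free access to gpt3.5)
Several questions can run at the same time, each with a different model, which is safe and saves time
No VPN is needed at all, which is safe and worry-free
It can also be used directly in codespace, which has the advantage of truly having no network restrictions, so the official API can be used directly
It can be used completely free of charge
...

The result looks like this (you really get to enjoy having AI help write your documents and code):

The first-time setup is a bit of a hassle, but after that it is very convenient.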

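Install python and vscode, then install openai and g4f
- pip install openai (if the install is slow, try pip install openai -i https://pypi.tuna.tsinghua.edu.cn/simple). The script below uses the openai.ChatCompletion interface, which was removed in openai 1.0, so install a version below 1.0 (for example pip install "openai<1").
- pip install -U g4f

Create a file named gpt.py and paste in the content below.

The baseurl in the code can be the same API address entered in the ChatGPTBox browser extension; the default official address is subject to network restrictions. You can list several combs, and the program picks one at random. If you have no api_key (sk-xxxxxxxxxxxxxx), use '' and the program falls back to a free interface, which may be slow. By default it also creates a copy of the md file, which makes it easy to put the same question to a different version.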


import os
import sys
import random

g3 = "gpt-3.5-turbo"
g4 = "gpt-4"
api = "https://api.openai.com"
combs = [
[g3, '', ''],
[g4, '', ''],
[g3, api, 'sk-xxxxxxxxxxxxxx'],
[g4, api, 'sk-xxxxxxxxxxxxxx'],
]

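syst_label = r"情景"  # heading for the system prompt ("scenario")
user_label = r"提问"  # heading for a user question
asst_label = r"回答"  # heading for an assistant answer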
endl = '\n\n'
all_role_label = [syst_label, user_label, asst_label]

syst =  lambda s : {"role": "system", "content": s}
user =  lambda s : {"role": "user", "content": s}
assi =  lambda s : {"role": "assistant", "content": s}


user_md = '\n\n' + "## "+ user_label + '\n'
asst_md = '\n' + "## "+ asst_label + '\n'

all_gpt_decorate = [syst, user, assi]

def get_h2_message(mdfile):
    h2_title = []
    h2_line_num = []
    mk = -1
    stt = -1
    with open(mdfile, 'r', encoding='utf-8') as file:
        lines = file.readlines()
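        # Scan backwards (including the first line) to the most recent "# " heading
        for i in range(-1, -len(lines) - 1, -1):
            line = lines[i]
            if line[0] == "#":
                # line[1:2] avoids an IndexError when the file ends with a bare "#"
                if line[1:2] == " " or line.rstrip("\n") == "#":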
                    stt = i
                    mt = line[1:].strip()
                    if mt.isdigit():
                        mk = int(mt)
                    break
                elif line[1:3] == "# " :
                    tmp = line[3:].strip()  
                    h2_title.append(tmp)
                    h2_line_num.append(i)

        num_h2_title = len(h2_title)
        if num_h2_title == 0:
            return [], [], mk, stt

        h2_mark = [-1]*num_h2_title
        for i, title in enumerate(h2_title):
            for k, role in enumerate(all_role_label):
                if role == title:
                    h2_mark[i] = k
                    break
        message = []
        ind = []

        if h2_mark[0] != -1 and h2_line_num[0] != -1:
            tmp = lines[h2_line_num[0]+1:]
            tmp = ''.join(tmp).strip()
            if tmp != '':
                message.append(tmp)
                ind.append(h2_mark[0])

        for i in range(len(h2_mark)-1):
            if h2_mark[i+1] != -1:
                tmp = lines[h2_line_num[i+1]+1 : h2_line_num[i]]
                tmp = ''.join(tmp).strip()
                if tmp != '':
                    message.append(tmp)
                    ind.append(h2_mark[i+1])
    return message, ind, mk, stt

def gpt35_openai(inpt):
    messages, mdfile, comb = inpt
    gptmd, baseurl, key = comb
    if key == '' or baseurl == '':
        import g4f
        print('free ' + gptmd + ' is used')
        g4f.logging = True # enable logging
        g4f.check_version = False # Disable automatic version checking
        response = g4f.ChatCompletion.create(
            model = gptmd,
            messages = messages,
            stream=True,
        )
        for message in response:
            with open(mdfile, 'a', encoding='utf-8') as f:
                f.write(message)
        return
    import openai
    openai.api_key = key
    openai.api_base = baseurl+"/v1"
    print("        mdfile:", mdfile)
    print("py openai used:", baseurl[-8:])
    print("key=sk-xxxx..."+key[-3:])
    response = openai.ChatCompletion.create(
        model = gptmd,
        messages = messages,
        temperature = 0,
        stream=True, 
    )
    for chunk in response:
        one_out = chunk['choices'][0]['delta'].get('content','')
        with open(mdfile, 'a', encoding='utf-8') as f:
            f.write(one_out)
    return

class Gpt(object):
    def __init__(self, mdfile, gpt35, combs):
        self.message, self.ind, mk, stt = get_h2_message(mdfile)
        if self.message == []:
            return
        if self.ind[0] == -1:
            return
        gpt_in = []
        for i in range(-1, -len(self.ind)-1, -1):
            roll_decorate = all_gpt_decorate[self.ind[i]]
            gpt_in.append(roll_decorate(self.message[i]))

        Ncombs = len(combs)
        if mk == -1:
            ikx = random.choice(range(Ncombs))
        else:
            ikx = (abs(mk) + Ncombs) % Ncombs
        ikx1 = (ikx + 1 + Ncombs) % Ncombs
        ipt = [gpt_in, mdfile, combs[ikx]]
        inputs = [ipt]
        with open(mdfile, 'a', encoding='utf-8') as f:
            f.writelines(asst_md)
        ###########################################
        unn = 1
        if mdfile[-5:] != 'cp.md':
            unn = 2
            with open(mdfile, 'r', encoding='utf-8') as file:
                lines = file.readlines()
                mm = lines[stt+1:]
            folder_path = os.path.dirname(mdfile)
            file_name = os.path.basename(mdfile)
            # os.path.join keeps a relative mdfile from producing a path under the filesystem root
            cpmdfile = os.path.join(folder_path, file_name[:-3] + '_cp.md')
            with open(cpmdfile, 'a', encoding='utf-8') as f:
                f.writelines("\n# " + str(mk+1)+"\n")
                f.writelines(mm)
            inputs.append([gpt_in, cpmdfile, combs[ikx1]])
            
            import concurrent.futures  
            with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
                list(executor.map(gpt35, inputs))  # list() surfaces exceptions raised in the workers
        else:
            gpt35(ipt)
        ###### 
        current_file = os.path.abspath(__file__)
        current_folder = os.path.dirname(current_file)
        count_file = current_folder + '/count.txt'
        if os.path.exists(count_file):
            with open(count_file, 'r') as f:
                count = int(f.read().strip())
        else:
            count = 0
        with open(count_file, 'w') as f:
            f.write(str(count + unn))
        print('All usage count:', count + unn)
        with open(mdfile, 'a', encoding='utf-8') as f:
            f.writelines("\n# " +str(count + unn))
            f.writelines(user_md)
            f.writelines("\n")
        if unn == 2:
            with open(cpmdfile, 'a', encoding='utf-8') as f:
                f.writelines("\n# " +str(count + unn))
                f.writelines(user_md)
                f.writelines("\n")
        pass

Gpt(sys.argv[1], gpt35_openai, combs)

Install the vscode extension pucelle.run-on-save, then add the following to settings.json (assuming gpt.py is located at D:\\2023\\gpt\\gpt.py)

{

"runOnSave.commands": [
    {
        // Match .md files whose path contains `gpt`, except file names starting with `_`.
        "match": "gpt.*\\.md$",
        "notMatch": "[\\\\\\/]_[^\\\\\\/]*\\.md$",
        "command": "python  D:\\2023\\gpt\\gpt.py ${file}",
        "runIn": "backend",
        "runningStatusMessage": "Running python  ${fileDirname}/gpt.py ${file}",
        "finishStatusMessage": "$Finished"
    },
    ],
"files.autoSave": "off", 

}

The configuration above means that whenever a gptxx.md file is saved with Ctrl+S, the gpt.py script runs automatically and writes the gpt output into that gptxx.md file.
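
To see which files these two patterns select, they can be replayed in Python. This is only a sketch: it assumes the extension applies match and notMatch as unanchored regular-expression searches over the full file path, and the helper name triggers_script and the sample paths are made up for illustration.

```python
import re

# The two patterns from settings.json, written as Python raw strings
# (JSON string escaping removed).
MATCH = re.compile(r"gpt.*\.md$")
NOT_MATCH = re.compile(r"[\\/]_[^\\/]*\.md$")


def triggers_script(path: str) -> bool:
    """Return True if saving `path` would run gpt.py under these two patterns."""
    return bool(MATCH.search(path)) and not NOT_MATCH.search(path)


for p in ["D:/notes/gpt.md", "D:/notes/gpt_paper.md",
          "D:/notes/_gpt.md", "D:/notes/readme.md"]:
    print(p, triggers_script(p))
```

Under that assumption, any .md file inside a folder whose name contains gpt (such as D:/2023/gpt/notes.md) also triggers the script, because the pattern is matched against the whole path.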

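To open settings.json quickly, click the gear icon at the bottom of the vscode sidebar on the left and choose "Settings". On the settings page there is an icon in the upper-right corner that opens settings.json.

Open a folder in vscode, then create gpt.md.

In gpt.md, a single # starts a new question (the heading 提问 is the user_label from gpt.py and means "question"). For example: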
```
# 

## 提问
What time is it in Beijing right now?
```
Saving gpt.md with Ctrl+S triggers the gpt answer.
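
How such a file turns into a chat request can be sketched roughly as follows. This is a simplified, forward-reading illustration, not the gpt.py parser itself (which scans the file backwards); the function name markdown_to_messages and the sample text are assumptions. Only the text after the last single-# heading is used, and each 情景, 提问, or 回答 section becomes a system, user, or assistant message.

```python
# Heading labels used by gpt.py, mapped to chat roles.
ROLE_BY_LABEL = {"情景": "system", "提问": "user", "回答": "assistant"}


def markdown_to_messages(text):
    """Turn the last '#' block of a conversation file into chat messages."""
    lines = text.splitlines()
    # Find where the most recent level-1 heading ("#" or "# <n>") ends.
    start = 0
    for i, line in enumerate(lines):
        if line.rstrip() == "#" or line.startswith("# "):
            start = i + 1
    # Group the remaining lines under their "## label" headings.
    sections = []  # each entry is [label, body_lines]
    for line in lines[start:]:
        if line.startswith("## "):
            sections.append([line[3:].strip(), []])
        elif sections:
            sections[-1][1].append(line)
    # Keep only known labels with non-empty bodies.
    messages = []
    for label, body in sections:
        content = "\n".join(body).strip()
        role = ROLE_BY_LABEL.get(label)
        if role and content:
            messages.append({"role": role, "content": content})
    return messages


sample = (
    "# 0\n\n## 提问\nfirst question\n\n## 回答\nfirst answer\n\n"
    "# \n\n## 情景\nYou are terse.\n\n## 提问\nWhat time is it in Beijing?\n"
)
print(markdown_to_messages(sample))
```

Because the earlier block under "# 0" sits above the last single-# heading, it is not sent again; only the new system prompt and question are.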

If the # is followed by a space and a number, the combs entry with that index is used. For example:
```
#  2

## 提问
What time is it in Beijing right now?
```
This selects combs[2]; note that counting starts from 0. Without a number, an entry is chosen at random.
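
The selection rule can be sketched in a few lines. pick_comb_index is a hypothetical helper, not part of gpt.py; it assumes, as the modulo arithmetic in gpt.py suggests, that a number larger than the list wraps around.

```python
import random


def pick_comb_index(h1_line, n_combs, rng=random):
    """Map a level-1 heading such as '#', '# 2' or '#  2' to an index into combs."""
    tag = h1_line[1:].strip()
    if tag.isdigit():
        # An explicit number selects that entry, wrapping around the list length.
        return int(tag) % n_combs
    # No number: choose any entry at random.
    return rng.randrange(n_combs)


print(pick_comb_index("#  2", 4))
print(pick_comb_index("# 5", 4))
```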

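1.3. Paid GPT recommendations

Be sure to have more than one channel. There are two main concerns:

One is uninterrupted access, since free offers always come with usage limits.
The other is data security. If you polish a whole paper through a single account, that one account holds the entire paper, and a leak of the account alone would expose all of it. With several accounts across several channels this is much less of a worry, because each channel only sees part of your information.

Since only two of permanent, unrestricted, and free of charge are possible, these are the options:

Use a lower-spec substitute
- No network restrictions: vscode with the Bito extension
- With network restrictions: https://poe.com
Find a reliable reseller and buy a cheap official key. Up to 3 RMB is acceptable (for the same money it brings more joy than an ice cream or a cola), but paying more makes you a sucker. Then relay through cloudflare to get around the network restrictions. Consider getting two keys and two cloudflare relays. Combined with vscode and the Bito extension, this gives nearly unlimited use of gpt3.5, lets you ask several questions at once, and lets you put the same question to several channels.
Use a domestic AI platform directly, for example Baidu's AI, where you can ask questions after simply logging in.

At this point gpt3.5 can be used conveniently, safely, cheaply, and without limits. But gpt3.5 is only so smart, and for a better experience you need gpt4, although better service usually costs more. Below are the official GPT-4 prices from https://openai.com/pricing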

Model	Input	Output
8K	$0.03 / 1K tokens	$0.06 / 1K tokens
32K	$0.06 / 1K tokens	$0.12 / 1K tokens

Converted to RMB at a rate of 7:

Model	Input	Output
8K	￥0.21 / 1K tokens	￥0.42 / 1K tokens
32K	￥0.42 / 1K tokens	￥0.84 / 1K tokens

gpt4 costs roughly 20-30 times as much as gpt3.5.
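
As a worked example of what these prices mean per request, the sketch below multiplies token counts by the listed rates. The function name and the sample token counts are illustrative assumptions; the prices and the exchange rate of 7 are the ones quoted above.

```python
# USD per 1K tokens from the pricing table: (input, output)
GPT4_PRICES_USD = {"8K": (0.03, 0.06), "32K": (0.06, 0.12)}
USD_TO_RMB = 7  # the rough exchange rate used in the text


def gpt4_cost_rmb(model, input_tokens, output_tokens):
    """Estimate the cost of one request in RMB."""
    price_in, price_out = GPT4_PRICES_USD[model]
    usd = input_tokens / 1000 * price_in + output_tokens / 1000 * price_out
    return usd * USD_TO_RMB


print(f"{gpt4_cost_rmb('8K', 1500, 500):.3f} RMB")
```

With 1,500 input tokens and 500 output tokens on the 8K model, the cost is 0.045 + 0.03 = 0.075 USD, or about 0.525 RMB.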

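Some suggestions for buying gpt4 access:

Buy from the official source. A relatively simple and reliable way is to top up an Apple gift card through Taobao; the subscription is $20 per month. If that is too much, share an account with friends. Again, relaying through cloudflare is recommended, both to keep the account from being banned and to solve the network problem along the way.
With several gpt3.5 options already available, the demand for gpt4 is actually not that large, so a reseller that looks reasonably reliable is worth considering, though you have to vet it carefully yourself. There are many such resellers, but reliable ones are hard to find. They share one trait: they forward your traffic, and you can only use their own keys (presumably they hold many official accounts and rotate through them, forwarding the conversation and keeping track of billing). This means your conversations pass through their hands. The upside is that, with the right vendor, it really is convenient; they also remove the network and payment restrictions, and the minimum spend is low.

In theory an honest, reliable reseller charges more per unit than the official site, since it has to earn a handling fee.
Those that charge less than the official price tend to have these problems:
- unstable service,
- occasionally passing off the 3.5 model as the 4 model,
- counting more tokens than the official method does,
- worst of all, disappearing with your money.
In any case, use several vendors and do not top up much with any one of them. For example, pick a few reliable ones, spend 10-20 RMB on each, and rotate your questions among them.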
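
Rotating questions among several vendors is easy to automate. A minimal sketch, assuming each vendor is represented just by a name; assign_round_robin and the vendor names are hypothetical, and actually sending the request is left out.

```python
from itertools import cycle


def assign_round_robin(questions, providers):
    """Pair each question with the next provider in turn."""
    if not providers:
        raise ValueError("at least one provider is required")
    turn = cycle(providers)
    return [(next(turn), question) for question in questions]


# Hypothetical vendor names, for illustration only.
plan = assign_round_robin(["q1", "q2", "q3"], ["vendor_a", "vendor_b"])
print(plan)
```

Spreading questions this way also means that no single vendor sees the whole conversation history.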

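Whether you use the 3.5 or the 4 model:

Avoid using gpt on third-party sites with no reputation or backing. Privacy matters, and most of them collect your registration details and chat content. They also tend to start out free and then gradually add restrictions such as fees, usage caps, and ads. In short, such sites are best treated as toys.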

2. AI speech

Training data
https://github.com/AI-Hobbyist/Genshin_Datasets
https://www.bilibili.com/read/cv24180458/

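2.1. Text to speech

Small and CPU-capable: https://github.com/Plachtaa/VITS-fast-fine-tuning/

In practice the CPU experience is poor; it is too slow.

So I recommend deploying locally on a GPU and training on colab, where about 2-3 hours is enough.

My test used 100 plain wav clips of Chinese speech, each 5-10 seconds long, following the official colab steps.

Training on Chinese only, without auxiliary data, gives weights that produce an extra "zh" sound when used with the CPU version, and CPU inference is slow.

Training with mixed-language speech, on the other hand, gives the voice a noticeable Japanese accent.

In the end I deployed locally. Local deployment needs no local training, only inference; training was done on colab, Chinese only and without auxiliary data.

2.1.1. Data preparation and training on colab

The steps below cover Chinese training only, done on colab without auxiliary data.
The audio samples use a sampling rate of 16000.

I recommend preprocessing the audio locally. Once that is done,

in STEP 3 (automatically process all uploaded data) you only need to run !python scripts/resample.py

Here is the preprocessing script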

import os
import ffmpeg  # provided by the ffmpeg-python package; the ffmpeg binary must be on PATH

input_folder = './'
output_folder = './'
spk = 'ms'
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

files = os.listdir(input_folder)

alltx = []
i = 0
for k, file in enumerate(files):
    if file.startswith("processed_"):
        continue
    if file.endswith('.wav'):
        input_path = os.path.join(input_folder, file)
        labfile = input_path[:-4]+'.lab'
        output_path = os.path.join(output_folder, f"processed_{i}.wav")
        with open(labfile, 'r', encoding='utf-8') as lab:
            # The first line of the .lab file holds the transcript; drop its trailing newline
            ab = lab.readline().strip()
        ffmpeg.input(input_path).output(output_path, ar=16000).run()
        tx = f'./custom_character_voice/{spk}/processed_{i}.wav|{spk}|{ab}'
        i += 1
        alltx.append(tx)

with open("short_character_anno.txt", 'w', encoding='utf-8') as f:
    for line in alltx:
        f.write(line)
        f.write('\n')

File layout (one speaker is enough; several are not needed):

[VITS-fast-fine-tuning]
├───short_character_anno.txt
├───...
└───[custom_character_voice]
    └───[ms]
        └───processed_0.wav
        └───processed_1.wav
        └───...

The format of short_character_anno.txt is as follows (ms is the speaker name; choose your own):

./custom_character_voice/ms/processed_0.wav|ms|语音零
./custom_character_voice/ms/processed_1.wav|ms|语音一
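
Before uploading, the annotation file can be sanity-checked. The sketch below assumes the three-field path|speaker|text layout shown above; check_anno is a hypothetical helper and the checks are only the obvious ones.

```python
def check_anno(lines, speaker):
    """Return (line_number, problem) pairs for lines that break the expected layout."""
    problems = []
    for n, raw in enumerate(lines, start=1):
        parts = raw.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            problems.append((n, "expected path|speaker|text"))
            continue
        path, spk, text = parts
        if not path.endswith(".wav"):
            problems.append((n, "path does not end with .wav"))
        if spk != speaker:
            problems.append((n, f"speaker is {spk!r}, expected {speaker!r}"))
        if not text.strip():
            problems.append((n, "empty transcript"))
    return problems


sample = [
    "./custom_character_voice/ms/processed_0.wav|ms|语音零\n",
    "./custom_character_voice/ms/processed_1.wav|ms|\n",  # transcript missing
]
print(check_anno(sample, "ms"))
```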

Remove the [ZH] tags from short_character_anno.txt. This can be done before training (after the audio has been resampled) by running the code below

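scfile = 'short_character_anno.txt'
# Strip the [ZH] language tags; utf-8 is given explicitly because the transcripts are Chinese
with open(scfile, 'r', encoding='utf-8') as f:
    content = f.read()
content = content.replace('[ZH]', '')
with open(scfile, 'w', encoding='utf-8') as f:
    f.write(content)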
! cat 'short_character_anno.txt'

Once remote training has finished, run
!python scripts/rearrange_speaker.py

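2.1.2. Local deployment

I used Python 3.10 (a mid-range Python version is the safe choice); the latest versions of the third-party packages are fine.

Once the local setup is ready, download the two files produced by remote training (finetune_speaker.json and G_latest.pth) and put them in the VITS-fast-fine-tuning folder.

Then run the script below to try it out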

python VC_inference.py

To expose an API, a small change is needed.

In VC_inference.py (near the bottom of the file), change

btn.click(tts_fn,
            inputs=[textbox, char_dropdown, language_dropdown, duration_slider,],
            outputs=[text_output, audio_output]
            )

to the following (the only change is the added api_name="tts")

btn.click(tts_fn,
            inputs=[textbox, char_dropdown, language_dropdown, duration_slider,],
            outputs=[text_output, audio_output], 
            api_name="tts"
            )

With python VC_inference.py running, the script below (which requires pygame) plays the phrase 你好呀, 召唤师 ("Hello, summoner").

import requests
import pygame

def playsound(tx):
    response = requests.post("http://127.0.0.1:7860/run/tts", json={
        "data": [
            tx,
            "ms",
            "简体中文",
            1.5,
        ]
    }).json()
    audio = response["data"][-1]['name'] 
    pygame.init()
    sound = pygame.mixer.Sound(audio)
    sound.play()
    lens = sound.get_length()
    pygame.time.wait(int(1000*lens))
    pygame.quit()

tx = "你好呀, 召唤师" 
playsound(tx)

With a local API in place, tasks such as reading articles aloud become much simpler.
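
For instance, an article can be cut into sentences and passed to the local API one at a time. A minimal sketch: split_sentences and read_aloud are hypothetical helpers, and speak stands for any function that voices one string, such as the playsound function above.

```python
import re


def split_sentences(article):
    """Split text after Chinese or Western sentence-ending punctuation and line breaks."""
    # The zero-width lookbehind keeps the punctuation attached to its sentence
    # (splitting on empty matches needs Python 3.7 or newer).
    parts = re.split(r"(?<=[。！？!?；;\n])", article)
    return [p.strip() for p in parts if p.strip()]


def read_aloud(article, speak):
    """Call `speak` once per sentence, in order."""
    for sentence in split_sentences(article):
        speak(sentence)


# `print` stands in for a real text-to-speech call here.
read_aloud("你好呀。今天天气不错！Let's go? ok", print)
```

Sending short sentences instead of the whole article keeps each request small, which matters when inference is slow.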

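2.2. Text to speech, online version

Ready to use: https://github.com/LokerL/tts-vue/

Below is a Python version whose main job is to turn an srt file into speech.

It only works online, because it calls the Microsoft and Alibaba interfaces.

The Alibaba interface requires an api_key (keep it safe once issued and never leak it) and may incur a small charge.

The code below converts a .srt subtitle file into an audio file.

It requires ffmpeg and the third-party Python libraries pydub, dashscope, and edge-tts.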


import re
import numpy as np
import os
from pydub import AudioSegment
import subprocess
## pip install numpy pydub dashscope edge-tts -i https://pypi.tuna.tsinghua.edu.cn/simple

def read_srt(file_name):
    with open(file_name, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    # Match the start and end timestamps with a regular expression
    time_pattern = re.compile(r'(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})')
    allnn = len(lines)
    nn = int((allnn+1)/4)
    times = ['']*nn
    texts = ['']*nn
    for i in range(nn):
        times[i] = time_pattern.findall(lines[i*4+1])[0]
        texts[i] = lines[i*4+2].strip()
    # Compute the pauses between cues and the duration of each cue (milliseconds)
    tt = []
    for i in range(len(times)):
        t2 = int(times[i][4]) * 3600000 + int(times[i][5]) * 60000 + int(times[i][6]) * 1000 + int(times[i][7])
        t1 = int(times[i][0]) * 3600000 + int(times[i][1]) * 60000 + int(times[i][2]) * 1000 + int(times[i][3])
        tt.append(t1)
        tt.append(t2)
    tbb = np.array(tt) -  np.array([0]+tt[:-1])
    sumtbb = np.sum(tbb)
    print('Audio time for subtitles:', sumtbb)
    tbb = np.hstack((tbb, 10000000))
    return texts, tbb, nn, sumtbb

def tts_ali(input):
    txt, k = input
    # https://help.aliyun.com/zh/dashscope/developer-reference/model-list-old-version?spm=a2c4g.11186623.0.0.1fae52e5eE7ixj
    # mds = ['sambert-zhichu-v1', 'sambert-zhixiang-v1', 'sambert-zhiqian-v1']
    sample_rate=48000; model = 'sambert-zhiqian-v1'; volume = 60; rate = 1.0
    result1 = SpeechSynthesizer.call(model=model,
                                    text=txt,
                                    sample_rate=sample_rate,
                                    volume = volume,
                                    rate = rate,
                                    format='wav')

    audata1 = result1.get_audio_data()
    with open(f'./wav/{k}.wav', 'wb') as f:
        f.write(audata1)
    return 


def tts_ms(input):
    txt, k = input
    # https://github.com/rany2/edge-tts
    # edge-tts --list-voices lists the available voices
    # 
    # mds = ['zh-CN-XiaoxiaoNeural', 'zh-CN-YunxiNeural']
    # edge-tts --voice zh-CN-XiaoxiaoNeural --text 你好召唤师 --write-media 1.mp3 --rate=-0% --volume=+10% 
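    model = 'zh-CN-XiaoxiaoNeural'
    # Pass the arguments as a list so the call works without a shell and txt needs no quoting
    cmd = ["edge-tts", "--voice", model, "--text", txt,
           "--write-media", f"./mp3/{k}.mp3",
           "--write-subtitles", f"./mp3/{k}.vtt", "--rate=+20%"]
    subprocess.run(cmd, check=True)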
    return 

def dir_check():
    # Create the output folders used by tts_ms (./mp3) and tts_ali (./wav)
    for name in ('mp3', 'wav'):
        os.makedirs(os.path.join(os.getcwd(), name), exist_ok=True)
    return

def make(tt, nn, sumtt, fmt):
    audios = []
    newtt = []
    all_lenaudio = 0
    
    for i in range(nn):
        k = i*2
        # tts([texts[i], i])
        if fmt == 'mp3':
            audio = AudioSegment.from_mp3(f'./{fmt}/{i}.{fmt}')
        else:
            audio = AudioSegment.from_wav(f'./{fmt}/{i}.{fmt}')
        lenaudio = len(audio)
        dt = tt[k] + tt[k+1] - lenaudio
        if lenaudio <= tt[k+1] or dt >= 0:
            tt[k] += tt[k+1] - lenaudio
        elif dt + tt[k+2] >= 0:
            tt[k] = 0
            tt[k+2] += dt
        else:
            tt[k] = 0
            tt[k+2] = 0
        audios.append(audio)
        newtt.append(tt[k])
        all_lenaudio += lenaudio
    dtt = (all_lenaudio+np.sum(newtt)) - sumtt
    newtt = np.array(newtt)*(1-dtt/np.sum(newtt))
    newtt = np.array(newtt, int)
    combined = AudioSegment.silent(duration=0)
    for i in range(len(audios)):
        silence = AudioSegment.silent(duration=newtt[i])
        combined = combined + silence + audios[i]

    if sumtt > len(combined):
        dtt = sumtt - len(combined)
        silence = AudioSegment.silent(duration=dtt)
        combined = combined + silence
    print('Actual mp3 audio time   :', len(combined))
    combined.export(f'./{fmt}/0000000.{fmt}', format=fmt)
    print(f'please find ./{fmt}/0000000.{fmt}')

#######################################################################
dir_check()
texts, tt, nn, sumtbb = read_srt('1.srt')
tts = tts_ali
# tts = tts_ms

inputs = [[texts[i], i] for i in range(nn)]
if tts == tts_ali:
    print('tts_ali is used')
    import dashscope
    from dashscope.audio.tts import SpeechSynthesizer
    ## https://help.aliyun.com/zh/dashscope/developer-reference/activate-dashscope-and-create-an-api-key
    dashscope.api_key='sk-xxxxxxxxxxxxxxxxxxxxxxxx'
    fmt = 'wav'
if tts == tts_ms:
    print('tts_ms is used')
    fmt = 'mp3'
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(tts, inputs))  # list() surfaces exceptions raised in the workers

make(tt, nn, sumtbb, fmt)
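
The timing arithmetic in read_srt can be checked in isolation. The sketch below re-implements it without numpy, purely as an illustration; the helper names are assumptions. It converts each cue line to milliseconds and then produces the same alternating sequence of pauses and spoken durations that tbb holds (before the large sentinel value is appended).

```python
import re

TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert hour, minute, second, millisecond strings to milliseconds."""
    return int(h) * 3_600_000 + int(m) * 60_000 + int(s) * 1000 + int(ms)


def cue_times(line):
    """Return (start_ms, end_ms) for one SRT timing line."""
    groups = TIME_RE.search(line).groups()
    return to_ms(*groups[:4]), to_ms(*groups[4:])


def gaps_and_durations(cues):
    """For (start, end) pairs, return [pause, duration, pause, duration, ...]."""
    out = []
    prev_end = 0
    for start, end in cues:
        out.append(start - prev_end)  # silence before this cue
        out.append(end - start)       # time available for the spoken text
        prev_end = end
    return out


cues = [cue_times("00:00:01,500 --> 00:00:03,000"),
        cue_times("00:00:04,000 --> 00:00:06,500")]
print(gaps_and_durations(cues))
```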

2.3. Audio denoising and vocal/accompaniment separation

https://github.com/Anjok07/ultimatevocalremovergui/releases

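2.4. Speech to text

In essence this is adding subtitles to video.

Model weights used below: https://github.com/openai/whisper/discussions/63
Subtitles with Buzz (CPU-based): https://github.com/chidiwilliams/buzz
Subtitles with WhisperDesktop (GPU-based): https://github.com/Const-me/Whisper
Traditional/Simplified Chinese conversion: https://github.com/xiaoxinpro/ChineseSubtitleConversionTool
Other references: https://kz16.top/tutu/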
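
When scripting with the whisper Python package instead of these GUI tools, the transcription result has to be written out as subtitles. The sketch below assumes segments shaped like the ones whisper's transcribe returns (dicts with start and end in seconds and a text string); the helper names are hypothetical and the sample segment is made up.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments with 'start', 'end' (seconds) and 'text'."""
    blocks = []
    for n, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{n}\n{timing}\n{seg['text'].strip()}\n")
    # Blocks are separated by one blank line, as the SRT format expects.
    return "\n".join(blocks)


# A made-up segment, standing in for real transcription output.
print(segments_to_srt([{"start": 0.0, "end": 1.5, "text": " 你好呀"}]))
```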

2.5. Speech to speech (voice conversion)

I recommend this one; training is slow but the results are very good

https://github.com/voicepaw/so-vits-svc-fork

The following one is even better
https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

https://github.com/camenduru/Retrieval-based-Voice-Conversion-WebUI-colab

2.6. Text to music

Code: https://github.com/facebookresearch/audiocraft
Colab: https://github.com/camenduru/MusicGen-colab
Online demo: https://huggingface.co/spaces/facebook/MusicGen

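3. AI image generation

This one is fairly simple; even without a local graphics card it can be done easily on a colab cloud server.

For details see (the links below are not guaranteed to be reliable; judge for yourself):

Offline image generation: https://github.com/AUTOMATIC1111/stable-diffusion-webui
Online image generation on colab: https://github.com/camenduru/stable-diffusion-webui-colab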