Running the Open-Source Lip-Sync Engine SadTalker on Debian
Posted 2025-03-17

SadTalker is an open-source engine that animates the mouth of a person in a still image so that they appear to speak a given audio file, producing a video.

https://github.com/OpenTalker/SadTalker/tree/main

It would not run until I switched to a static FFmpeg build and swapped in fixed versions of a couple of Python scripts, so here are my notes.

Installing Python 3.8

[Debian] Installing Python 3.8.20 from Source [FreeBSD]
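Once that is in place, it's worth confirming that the interpreter the venv will be built from really is 3.8 (a minimal sketch; run it with python3.8):

import sys

# The venv created below inherits this interpreter, so it must be 3.8.x.
assert sys.version_info[:2] == (3, 8), f"unexpected interpreter: {sys.version}"
print(sys.version)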

Cloning the Git Repository and Basic Setup

Clone the repository, create a Python virtual environment inside it, and work from there:

git clone https://github.com/OpenTalker/SadTalker.git
cd SadTalker
python3.8 -m venv venv
source ./venv/bin/activate
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
bash -x scripts/download_models.sh
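Before moving on, a quick sanity check that the pinned PyTorch build sees the GPU and that the model downloads landed (a minimal sketch; run inside the venv, and treat the directory contents as simply whatever download_models.sh fetched):

import glob
import torch

print(torch.__version__)           # expect 1.12.1+cu113, per the pip install above
print(torch.cuda.is_available())   # True only if the NVIDIA driver supports CUDA 11.3
for f in sorted(glob.glob("checkpoints/*")):
    print(f)                       # downloaded model weights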

Setting Up a Static FFmpeg Build

The latest FFmpeg installed on the system did not work, so fetch a pinned version and put it first on the PATH inside the SadTalker directory:

cd SadTalker
wget https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.4-amd64-static.tar.xz
tar xvfJp ffmpeg-4.4-amd64-static.tar.xz
export PATH=./ffmpeg-4.4-amd64-static:$PATH
which ffmpeg
./ffmpeg-4.4-amd64-static/ffmpeg
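The same check from Python, if you want to assert that the static binary wins over the system one (a sketch; shutil.which honors the PATH exported above, so run it from the SadTalker directory):

import shutil
import subprocess

ffmpeg = shutil.which("ffmpeg")
print(ffmpeg)  # expect ./ffmpeg-4.4-amd64-static/ffmpeg
# The first banner line should report version 4.4.
print(subprocess.run([ffmpeg, "-version"], capture_output=True, text=True).stdout.splitlines()[0])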

Replacing Two Python Scripts

First, replace app_sadtalker.py in the repository root with the version below. The stock file fails with AttributeError: 'Row' object has no attribute 'style', presumably because newer Gradio releases removed the .style() helper.
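For reference, the failing pattern in the stock file is roughly this (a sketch inferred from the error message; the exact line varies by version):

# Stock layout code; .style() was dropped in newer Gradio, hence the AttributeError.
with gr.Row().style(equal_height=False):
    ...

The replacement below simply builds the same layout without any .style() calls.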

app_sadtalker.py (GitHub Source):
import os, sys
import gradio as gr
from src.gradio_demo import SadTalker  

# ... (rest of the code remains unchanged)

def sadtalker_demo(checkpoint_path='checkpoints', config_path='src/config', warpfn=None):

    sad_talker = SadTalker(checkpoint_path, config_path, lazy_load=True)

    with gr.Blocks(analytics_enabled=False) as sadtalker_interface:
        gr.Markdown("<div align='center'> <h2> 😭 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023) </h2> \
                    <a style='font-size:18px;color: #efefef' href='https://arxiv.org/abs/2211.12194'>Arxiv</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
                    <a style='font-size:18px;color: #efefef' href='https://sadtalker.github.io'>Homepage</a>  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
                    <a style='font-size:18px;color: #efefef' href='https://github.com/Winfredy/SadTalker'>Github</a> </div>")
        
        submit = gr.Button('Generate', elem_id="sadtalker_generate", variant='primary')  # Define the submit button here

        driven_audio = gr.Audio(label="Input audio", type="filepath", elem_id="driven_audio")  # Define the driven_audio variable here

        preprocess_type = gr.Radio(['crop', 'resize','full', 'extcrop', 'extfull'], value='crop', label='preprocess', info="How to handle input image?")  # Define the preprocess_type variable here

        is_still_mode = gr.Checkbox(label="Still Mode (fewer head motion, works with preprocess `full`)", elem_id="is_still_mode")  # Define the is_still_mode variable here

        enhancer = gr.Checkbox(label="GFPGAN as Face enhancer", elem_id="enhancer")  # Define the enhancer variable here

        batch_size = gr.Slider(label="batch size in generation", step=1, maximum=10, value=2, elem_id="batch_size")  # Define the batch_size variable here

        size_of_image = gr.Radio([256, 512], value=256, label='face model resolution', info="use 256/512 model?", elem_id="size_of_image")  # Define the size_of_image variable here

        pose_style = gr.Slider(minimum=0, maximum=46, step=1, label="Pose style", value=0, elem_id="pose_style")  # Define the pose_style variable here

        gen_video = gr.Video(label="Generated video", format="mp4", elem_id="gen_video")  # Define the gen_video variable here

        with gr.Row():
            with gr.Column(variant='panel'):
                with gr.Tabs(elem_id="sadtalker_source_image"):
                    with gr.TabItem('Upload image'):
                        with gr.Row():
                            source_image = gr.Image(label="Source image", type="filepath", elem_id="img2img_image")

            # ... (rest of the code remains unchanged)

            if warpfn:
                submit.click(
                            fn=warpfn(sad_talker.test), 
                            inputs=[source_image,
                                    driven_audio,
                                    preprocess_type,
                                    is_still_mode,
                                    enhancer,
                                    batch_size,                            
                                    size_of_image,
                                    pose_style
                                    ], 
                            outputs=[gen_video]
                            )
            else:
                submit.click(
                            fn=sad_talker.test, 
                            inputs=[source_image,
                                    driven_audio,
                                    preprocess_type,
                                    is_still_mode,
                                    enhancer,
                                    batch_size,                            
                                    size_of_image,
                                    pose_style
                                    ], 
                            outputs=[gen_video]
                            )

    return sadtalker_interface
 

if __name__ == "__main__":

    demo = sadtalker_demo()
    demo.queue()
    demo.launch()
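Note that launch() binds to 127.0.0.1 by default. If the Debian machine is headless and you browse from elsewhere, Gradio's standard launch options can expose it (an optional tweak, not part of the replacement file):

# Bind to all interfaces so the UI is reachable remotely; change the port if 7860 is taken.
demo.launch(server_name="0.0.0.0", server_port=7860)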

Next, replace src/test_audio2coeff.py with the version below. With the stock file, FFmpeg aborts with the same-output-file error "ffmpeg cannot edit existing files in place".
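The actual fix is tiny: only the name of the saved coefficient file changes, as the commented-out lines left at the bottom of the script still show:

# before (stock):  '%s##%s.mat' % (batch['pic_name'], batch['audio_name'])
# after (fixed):   '%s_%s.mat'  % (batch['pic_name'], batch['audio_name'])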

src/test_audio2coeff.py (GitHub Source):
import os 
import torch
import numpy as np
from scipy.io import savemat, loadmat
from yacs.config import CfgNode as CN
from scipy.signal import savgol_filter

import safetensors
import safetensors.torch 

from src.audio2pose_models.audio2pose import Audio2Pose
from src.audio2exp_models.networks import SimpleWrapperV2 
from src.audio2exp_models.audio2exp import Audio2Exp
from src.utils.safetensor_helper import load_x_from_safetensor  

def load_cpk(checkpoint_path, model=None, optimizer=None, device="cpu"):
    checkpoint = torch.load(checkpoint_path, map_location=torch.device(device))
    if model is not None:
        model.load_state_dict(checkpoint['model'])
    if optimizer is not None:
        optimizer.load_state_dict(checkpoint['optimizer'])

    return checkpoint['epoch']

class Audio2Coeff():

    def __init__(self, sadtalker_path, device):
        #load config
        fcfg_pose = open(sadtalker_path['audio2pose_yaml_path'])
        cfg_pose = CN.load_cfg(fcfg_pose)
        cfg_pose.freeze()
        fcfg_exp = open(sadtalker_path['audio2exp_yaml_path'])
        cfg_exp = CN.load_cfg(fcfg_exp)
        cfg_exp.freeze()

        # load audio2pose_model
        self.audio2pose_model = Audio2Pose(cfg_pose, None, device=device)
        self.audio2pose_model = self.audio2pose_model.to(device)
        self.audio2pose_model.eval()
        for param in self.audio2pose_model.parameters():
            param.requires_grad = False 
        
        try:
            if sadtalker_path['use_safetensor']:
                checkpoints = safetensors.torch.load_file(sadtalker_path['checkpoint'])
                self.audio2pose_model.load_state_dict(load_x_from_safetensor(checkpoints, 'audio2pose'))
            else:
                load_cpk(sadtalker_path['audio2pose_checkpoint'], model=self.audio2pose_model, device=device)
        except Exception as e:
            raise Exception("Failed in loading audio2pose_checkpoint") from e

        # load audio2exp_model
        netG = SimpleWrapperV2()
        netG = netG.to(device)
        for param in netG.parameters():
            param.requires_grad = False  # fix: the stock code set requires_grad on the module, not on each parameter
        netG.eval()
        try:
            if sadtalker_path['use_safetensor']:
                checkpoints = safetensors.torch.load_file(sadtalker_path['checkpoint'])
                netG.load_state_dict(load_x_from_safetensor(checkpoints, 'audio2exp'))
            else:
                load_cpk(sadtalker_path['audio2exp_checkpoint'], model=netG, device=device)
        except Exception as e:
            raise Exception("Failed in loading audio2exp_checkpoint") from e
        self.audio2exp_model = Audio2Exp(netG, cfg_exp, device=device, prepare_training_loss=False)
        self.audio2exp_model = self.audio2exp_model.to(device)
        for param in self.audio2exp_model.parameters():
            param.requires_grad = False
        self.audio2exp_model.eval()
 
        self.device = device

    def generate(self, batch, coeff_save_dir, pose_style, ref_pose_coeff_path=None):

        with torch.no_grad():
            #test
            results_dict_exp= self.audio2exp_model.test(batch)
            exp_pred = results_dict_exp['exp_coeff_pred']                         #bs T 64

            #for class_id in  range(1):
            #class_id = 0#(i+10)%45
            #class_id = random.randint(0,46)                                   #46 styles can be selected 
            batch['class'] = torch.LongTensor([pose_style]).to(self.device)
            results_dict_pose = self.audio2pose_model.test(batch) 
            pose_pred = results_dict_pose['pose_pred']                        #bs T 6

            pose_len = pose_pred.shape[1]
            if pose_len<13: 
                pose_len = int((pose_len-1)/2)*2+1
                pose_pred = torch.Tensor(savgol_filter(np.array(pose_pred.cpu()), pose_len, 2, axis=1)).to(self.device)
            else:
                pose_pred = torch.Tensor(savgol_filter(np.array(pose_pred.cpu()), 13, 2, axis=1)).to(self.device) 
            
            coeffs_pred = torch.cat((exp_pred, pose_pred), dim=-1)            #bs T 70

            coeffs_pred_numpy = coeffs_pred[0].clone().detach().cpu().numpy() 

            if ref_pose_coeff_path is not None: 
                 coeffs_pred_numpy = self.using_refpose(coeffs_pred_numpy, ref_pose_coeff_path)
        
            savemat(os.path.join(coeff_save_dir, '%s_%s.mat'%(batch['pic_name'], batch['audio_name'])),
                    {'coeff_3dmm': coeffs_pred_numpy})

            return os.path.join(coeff_save_dir, '%s_%s.mat'%(batch['pic_name'], batch['audio_name']))

            # Original naming, kept for reference -- the '##' separator is what
            # apparently led FFmpeg into the "cannot edit existing files in place" error:
            # savemat(os.path.join(coeff_save_dir, '%s##%s.mat' % (batch['pic_name'], batch['audio_name'])),
            #         {'coeff_3dmm': coeffs_pred_numpy})
            # return os.path.join(coeff_save_dir, '%s##%s.mat' % (batch['pic_name'], batch['audio_name']))
    
    def using_refpose(self, coeffs_pred_numpy, ref_pose_coeff_path):
        num_frames = coeffs_pred_numpy.shape[0]
        refpose_coeff_dict = loadmat(ref_pose_coeff_path)
        refpose_coeff = refpose_coeff_dict['coeff_3dmm'][:,64:70]
        refpose_num_frames = refpose_coeff.shape[0]
        if refpose_num_frames<num_frames:
            div = num_frames//refpose_num_frames
            re = num_frames%refpose_num_frames
            refpose_coeff_list = [refpose_coeff for i in range(div)]
            refpose_coeff_list.append(refpose_coeff[:re, :])
            refpose_coeff = np.concatenate(refpose_coeff_list, axis=0)

        #### relative head pose
        coeffs_pred_numpy[:, 64:70] = coeffs_pred_numpy[:, 64:70] + ( refpose_coeff[:num_frames, :] - refpose_coeff[0:1, :] )
        return coeffs_pred_numpy
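As an aside, the tiling logic in using_refpose() is easy to verify in isolation: a reference clip shorter than the generated one is repeated whole div times, then topped up with its first re frames (a standalone sketch with dummy data):

import numpy as np

num_frames = 10
refpose_coeff = np.arange(4 * 6).reshape(4, 6)   # 4 reference frames, 6 pose coefficients

div, re = num_frames // 4, num_frames % 4        # 2 full repeats, 2 leftover frames
tiled = np.concatenate([refpose_coeff] * div + [refpose_coeff[:re, :]], axis=0)
print(tiled.shape)                               # (10, 6), matching num_frames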


Launching the WebUI

With everything above in place, the browser test UI should now come up on port 7860:

python app_sadtalker.py
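If you would rather skip the WebUI, the repository also ships a command-line entry point; as far as I can tell the basic form is the following (verify the exact flags with python inference.py --help):

python inference.py --driven_audio <audio.wav> --source_image <image.png> --enhancer gfpgan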
