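Kokoro v0.19 is a recently released text-to-speech (TTS) model with only 82 million parameters, yet its output quality is very high. It is released under the Apache license and was trained on fewer than 100 hours of audio. It currently supports American English, British English, French, Korean, Japanese, and Mandarin Chinese, and ships with a number of good-quality voices.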
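Claudio Santini, an avid reader, had long dreamed of turning his e-book library into audiobooks, especially the less popular titles that are hard to find in audio form. The arrival of the Kokoro speech model put that goal within reach.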
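Kokoro is known for being fast, and Santini judged it up to the task. He built a small tool called Audiblez, a nod to the well-known audiobook platform Audible. Audiblez parses .epub files and converts the book's text into high-quality audio files.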
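According to Santini's tests, on his M2 MacBook Pro it takes about two hours to convert Richard Dawkins's The Selfish Gene to mp3. The book has about 100,000 words (or 600,000 characters), which works out to a conversion rate of about 80 characters per second. His successful experiment is good news for book lovers and opens up new uses for the Kokoro model.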
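As a quick sanity check, the quoted figures can be worked through directly. The numbers below are the approximate ones given above; the variable names are only illustrative.

```python
# Approximate figures quoted in the article.
total_chars = 600_000      # characters in the book
chars_per_second = 80      # reported conversion rate

seconds = total_chars / chars_per_second
hours, rem = divmod(int(seconds), 3600)
minutes = rem // 60
print(f"{seconds:.0f} s is about {hours} h {minutes:02d} min")
```

The result is consistent with the "about two hours" reported for the full book.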
How to install and run
If Python 3 is installed on your computer, you can install the tool with pip. Note that it does not work with Python 3.13.
pip install audiblez
You then also need to download a few more files into the same folder, about 360MB in total:
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
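Before starting a long conversion, it may help to confirm that these prerequisites are in place. The following is a minimal sketch, not part of Audiblez: the helper name `preflight` is hypothetical, and it only checks the two conditions stated above (the Python version and the two downloaded files).

```python
import sys
from pathlib import Path

# File names taken from the download commands above.
REQUIRED_FILES = ("kokoro-v0_19.onnx", "voices.json")


def preflight(folder=".", version=sys.version_info):
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    # The article reports that the tool does not work with Python 3.13.
    if tuple(version[:2]) == (3, 13):
        problems.append("Python 3.13 is reported as incompatible")
    for name in REQUIRED_FILES:
        if not (Path(folder) / name).exists():
            problems.append(f"missing file: {name}")
    return problems
```

Calling `preflight()` from the folder where you plan to run `audiblez` returns an empty list when nothing is missing.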
Then, to convert an epub file into an audiobook, just run:
audiblez book.epub -l en-gb -v af_sky
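It first creates a series of files named book_chapter_1.wav, book_chapter_2.wav, and so on in the same directory, and finally produces a complete audiobook file named book.m4b, which you can play in VLC or any audiobook player. The .m4b file is only generated if ffmpeg is installed on your machine.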
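The naming scheme can be sketched as follows. This mirrors how the script shown later derives output names from the epub's file name only, which means the files land in the current working directory. The helper name `expected_outputs` is hypothetical and the chapter count is something you would supply yourself.

```python
import shutil
from pathlib import Path


def expected_outputs(epub_path, chapter_count):
    """Return the per-chapter .wav names and the final .m4b name."""
    name = Path(epub_path).name
    wavs = [name.replace(".epub", f"_chapter_{i}.wav")
            for i in range(1, chapter_count + 1)]
    m4b = name.replace(".epub", ".m4b")
    return wavs, m4b


# The .m4b step is only attempted when ffmpeg is found on the PATH.
ffmpeg_available = shutil.which("ffmpeg") is not None
```

For example, `expected_outputs("book.epub", 2)` gives the two chapter file names plus `book.m4b`.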
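Supported languages
Use the -l option to specify the language. Supported language codes are: 🇺🇸 en-us (American English), 🇬🇧 en-gb (British English), 🇫🇷 fr-fr (French), 🇯🇵 ja (Japanese), 🇰🇷 ko (Korean), and 🇨🇳 cmn (Mandarin Chinese).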
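The script's own --lang help text lists the codes en-gb, en-us, fr-fr, ja, ko and cmn, but its argument parser accepts any string. A small sketch of how an unsupported code could be rejected early, using argparse's standard `choices` parameter, is shown below. The parser here is a stand-alone illustration, not the parser Audiblez actually ships.

```python
import argparse

# Codes as listed in the script's --lang help text.
LANG_CODES = ["en-us", "en-gb", "fr-fr", "ja", "ko", "cmn"]


def build_parser():
    parser = argparse.ArgumentParser(prog="audiblez-sketch")
    parser.add_argument("epub_file_path", help="Path to the epub file")
    # choices makes argparse exit with an error for any other value.
    parser.add_argument("-l", "--lang", default="en-gb", choices=LANG_CODES)
    return parser
```

With this in place, passing something like `-l xx` stops with a usage error before any model is loaded.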
Supported voices
Use the -v option to specify the voice. The available voices are af, af_bella, af_nicole, af_sarah, af_sky, am_adam, am_michael, bf_emma, bf_isabella, bm_george, and bm_lewis. You can try them out here: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
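The voice names share two-letter prefixes, so grouping them by prefix gives a quick overview. The sketch below only groups the names listed above. What the prefixes mean (they appear to encode accent and gender) is an assumption that the source does not confirm.

```python
from collections import defaultdict

# Voice names as listed in the article.
VOICES = ["af", "af_bella", "af_nicole", "af_sarah", "af_sky",
          "am_adam", "am_michael", "bf_emma", "bf_isabella",
          "bm_george", "bm_lewis"]


def group_by_prefix(voices):
    """Group voice names by the part before the first underscore."""
    groups = defaultdict(list)
    for voice in voices:
        groups[voice.split("_")[0]].append(voice)
    return dict(groups)


for prefix, names in group_by_prefix(VOICES).items():
    print(prefix, "->", ", ".join(names))
```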
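Chapter detection
Chapter detection is a bit unreliable, but for most of the .epub files Santini tried, Audiblez manages to find the core chapters and skip the cover, table of contents, appendices, and so on. If you find that it leaves out chapters you care about, try adjusting the is_chapter function in the code. It often skips the preface or introduction as well; it is not yet clear whether that is a bug or a feature.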
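One way to adjust is_chapter so that prefaces and introductions are kept is sketched below. It reuses the three checks from the script (part numbers, "ch" plus digits, and the word "chapter") and adds a keyword list. The function name `is_chapter_name` and the keywords are hypothetical choices; the right keywords depend on the file names inside your own .epub.

```python
import re

# Hypothetical extra keywords; adjust to the document names in your .epub.
EXTRA_KEYWORDS = ("intro", "preface", "foreword", "prologue")


def is_chapter_name(name, extra_keywords=EXTRA_KEYWORDS):
    """Decide from a document's file name whether it should be read aloud."""
    name = name.lower()
    # Same patterns as the script: part1..part999, ch1..ch999, 'chapter'.
    if re.search(r"part\d{1,3}", name) or re.search(r"ch\d{1,3}", name):
        return True
    if "chapter" in name:
        return True
    # Added: also accept names that contain one of the extra keywords.
    return any(keyword in name for keyword in extra_keywords)
```

To plug this into the script, is_chapter could simply return `is_chapter_name(c.get_name())`.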
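Source code
To learn more about the software, see the Audiblez project on GitHub. The tool still has some rough edges, but it is good enough for most people. Possible future improvements include:
- Better chapter detection, or letting the user include or exclude chapters.
- Adding chapter navigation to the m4b file (this seems hard, because ffmpeg doesn't do it).
- Using an image-to-text model to add narration for images.
The code is short enough to paste right here: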
#!/usr/bin/env python3
# audiblez - A program to convert e-books into audiobooks using
# Kokoro-82M model for high-quality text-to-speech synthesis.
# by Claudio Santini 2025 - https://claudio.uk
import argparse
import sys
import time
import shutil
import subprocess
import soundfile as sf
import ebooklib
import warnings
import re
from pathlib import Path
from string import Formatter
from bs4 import BeautifulSoup
from kokoro_onnx import Kokoro
from ebooklib import epub
from pydub import AudioSegment


def main(kokoro, file_path, lang, voice):
    filename = Path(file_path).name
    with warnings.catch_warnings():
        # Silence the warnings emitted while the epub is parsed.
        warnings.simplefilter('ignore')
        book = epub.read_epub(file_path)
    title = book.get_metadata('DC', 'title')[0][0]
    creator = book.get_metadata('DC', 'creator')[0][0]
    intro = f'{title} by {creator}'
    print(intro)
    chapters = find_chapters(book)
    print('Found chapters:', [c.get_name() for c in chapters])
    texts = extract_texts(chapters)
    has_ffmpeg = shutil.which('ffmpeg') is not None
    if not has_ffmpeg:
        print('\033[91m' + 'ffmpeg not found. Please install ffmpeg to create mp3 and m4b audiobook files.' + '\033[0m')
    total_chars = sum(len(t) for t in texts)
    print('Started at:', time.strftime('%H:%M:%S'))
    print(f'Total characters: {total_chars:,}')
    print('Total words:', len(' '.join(texts).split()))
    chapter_wav_files = []
    for i, text in enumerate(texts, start=1):
        chapter_filename = filename.replace('.epub', f'_chapter_{i}.wav')
        chapter_wav_files.append(chapter_filename)
        if Path(chapter_filename).exists():
            print(f'File for chapter {i} already exists. Skipping')
            continue
        print(f'Reading chapter {i} ({len(text):,} characters)...')
        if i == 1:
            # Read the title and author before the first chapter.
            text = intro + '.\n\n' + text
        start_time = time.time()
        samples, sample_rate = kokoro.create(text, voice=voice, speed=1.0, lang=lang)
        sf.write(chapter_filename, samples, sample_rate)
        end_time = time.time()
        delta_seconds = end_time - start_time
        chars_per_sec = len(text) / delta_seconds
        # Characters still to be read, excluding the chapter just finished.
        remaining_chars = sum(len(t) for t in texts[i:])
        remaining_time = remaining_chars / chars_per_sec
        print(f'Estimated time remaining: {strfdelta(remaining_time)}')
        print('Chapter written to', chapter_filename)
        print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
        progress = int((total_chars - remaining_chars) / total_chars * 100)
        print('Progress:', f'{progress}%')
    if has_ffmpeg:
        create_m4b(chapter_wav_files, filename)


def extract_texts(chapters):
    texts = []
    for chapter in chapters:
        xml = chapter.get_body_content()
        soup = BeautifulSoup(xml, features='lxml')
        chapter_text = ''
        html_content_tags = ['title', 'p', 'h1', 'h2', 'h3', 'h4']
        for child in soup.find_all(html_content_tags):
            inner_text = child.text.strip() if child.text else ""
            if inner_text:
                chapter_text += inner_text + '\n'
        texts.append(chapter_text)
    return texts


def is_chapter(c):
    name = c.get_name().lower()
    part = r"part\d{1,3}"
    if re.search(part, name):
        return True
    ch = r"ch\d{1,3}"
    if re.search(ch, name):
        return True
    if 'chapter' in name:
        return True
    # Return an explicit False instead of falling through to None.
    return False


def find_chapters(book, verbose=True):
    chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
    if verbose:
        for item in book.get_items():
            if item.get_type() == ebooklib.ITEM_DOCUMENT:
                # print(f"'{item.get_name()}'" + ', #' + str(len(item.get_body_content())))
                print(f'{item.get_name()}'.ljust(60), str(len(item.get_body_content())).ljust(15), 'X' if item in chapters else '-')
    if len(chapters) == 0:
        print('Not easy to find the chapters, defaulting to all available documents.')
        chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
    return chapters


def strfdelta(tdelta, fmt='{D:02}d {H:02}h {M:02}m {S:02}s'):
    remainder = int(tdelta)
    f = Formatter()
    desired_fields = [field_tuple[1] for field_tuple in f.parse(fmt)]
    possible_fields = ('W', 'D', 'H', 'M', 'S')
    constants = {'W': 604800, 'D': 86400, 'H': 3600, 'M': 60, 'S': 1}
    values = {}
    for field in possible_fields:
        if field in desired_fields and field in constants:
            values[field], remainder = divmod(remainder, constants[field])
    return f.format(fmt, **values)


def create_m4b(chapter_files, filename):
    tmp_filename = filename.replace('.epub', '.tmp.m4a')
    if not Path(tmp_filename).exists():
        # Concatenate all chapter .wav files into one AAC-encoded file.
        combined_audio = AudioSegment.empty()
        for wav_file in chapter_files:
            audio = AudioSegment.from_wav(wav_file)
            combined_audio += audio
        print('Converting to Mp4...')
        combined_audio.export(tmp_filename, format="mp4", codec="aac", bitrate="64k")
    final_filename = filename.replace('.epub', '.m4b')
    print('Creating M4B file...')
    proc = subprocess.run(['ffmpeg', '-i', f'{tmp_filename}', '-c', 'copy', '-f', 'mp4', f'{final_filename}'])
    Path(tmp_filename).unlink()
    if proc.returncode == 0:
        print(f'{final_filename} created. Enjoy your audiobook.')
        print('Feel free to delete the intermediary .wav chapter files, the .m4b is all you need.')


def cli_main():
    if not Path('kokoro-v0_19.onnx').exists() or not Path('voices.json').exists():
        print('Error: kokoro-v0_19.onnx and voices.json must be in the current directory. Please download them with:')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json')
        sys.exit(1)
    kokoro = Kokoro('kokoro-v0_19.onnx', 'voices.json')
    voices = list(kokoro.get_voices())
    voices_str = ', '.join(voices)
    epilog = 'example:\n' + \
             ' audiblez book.epub -l en-us -v af_sky'
    default_voice = 'af_sky' if 'af_sky' in voices else voices[0]
    parser = argparse.ArgumentParser(epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('epub_file_path', help='Path to the epub file')
    parser.add_argument('-l', '--lang', default='en-gb', help='Language code: en-gb, en-us, fr-fr, ja, ko, cmn')
    parser.add_argument('-v', '--voice', default=default_voice, help=f'Choose narrating voice: {voices_str}')
    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    args = parser.parse_args()
    main(kokoro, args.epub_file_path, args.lang, args.voice)


if __name__ == '__main__':
    cli_main()
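
On the chapter-navigation idea listed among the future improvements: the article considers it hard, but ffmpeg's documented metadata-file format does include [CHAPTER] sections, so one possible route is to generate such a file from the chapter durations and pass it to ffmpeg as a second input. The sketch below is untested against Audiblez itself, and the helper names `escape_value` and `build_ffmetadata` are hypothetical.

```python
def escape_value(value):
    """Backslash-escape the characters that are special in FFMETADATA files."""
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(chapters):
    """Build FFMETADATA text from (title, duration_in_seconds) pairs.

    Chapters are assumed to be in playback order with no gaps between them.
    """
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration in chapters:
        end_ms = start_ms + int(round(duration * 1000))
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",          # START and END are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape_value(title)}",
        ]
        start_ms = end_ms
    return "\n".join(lines) + "\n"


print(build_ffmetadata([("Chapter 1", 1.5), ("Chapter 2", 2.0)]))
```

The durations could come from the intermediate .wav files, and the resulting text file could then be supplied alongside the audio with something like `ffmpeg -i book.tmp.m4a -i chapters.txt -map_metadata 1 -c copy -f mp4 book.m4b`. Whether a given audiobook player honors the chapters is something to verify.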
Source: claudio