Chrome Speech Synthesis with longer texts

javascript google-chrome speech-synthesis
2021-03-14 12:48:27

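I am having a problem using the Speech Synthesis API in Chrome 33. It works perfectly with shorter text, but with longer text it just stops in the middle. Once it has stopped like that, speech synthesis does not work anywhere in Chrome until the browser is restarted.

Example code (http://jsfiddle.net/Mdm47/1/):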

function speak(text) {
    var msg = new SpeechSynthesisUtterance();
    var voices = speechSynthesis.getVoices();
    msg.voice = voices[10];
    msg.voiceURI = 'native';
    msg.volume = 1;
    msg.rate = 1;
    msg.pitch = 2;
    msg.text = text;
    msg.lang = 'en-US';

    speechSynthesis.speak(msg);
}

speak('Short text');
speak('Collaboratively administrate empowered markets via plug-and-play networks. Dynamically procrastinate B2C users after installed base benefits. Dramatically visualize customer directed convergence without revolutionary ROI. Efficiently unleash cross-media information without cross-media value. Quickly maximize timely deliverables for real-time schemas. Dramatically maintain clicks-and-mortar solutions without functional solutions.');
speak('Another short text');

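It stops speaking in the middle of the second text, and after that I can't get any other page to speak.

Is this a browser bug or some kind of security limitation?

6 Answers

I've had this issue with Google Chrome Speech Synthesis for a while now. After some investigation, I discovered the following:

  • The utterances only break off when the voice is not a native voice.
  • The cut-off usually occurs between 200 and 300 characters.
  • When it does break, you can unfreeze it by calling speechSynthesis.cancel();
  • The 'onend' event sometimes decides not to fire. A quirky workaround is to console.log() the utterance object before speaking it. I also found that wrapping the speak invocation in a setTimeout callback helps smooth these issues out.

In response to these problems, I have written a function that overcomes the character limit by chunking the text into smaller utterances and playing them one after another. You will sometimes hear odd pauses, because a sentence may be split into two separate utterances with a small delay between them. The code tries to split at punctuation marks to make those breaks less noticeable.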

Update

I've made this workaround publicly available at https://gist.github.com/woollsta/2d146f13878a301b36d7#file-chunkify-js. Many thanks to Brett Zamir for his contributions.

The function:

var speechUtteranceChunker = function (utt, settings, callback) {
    settings = settings || {};
    var newUtt;
    var txt = (settings && settings.offset !== undefined ? utt.text.substring(settings.offset) : utt.text);
    if (utt.voice && utt.voice.voiceURI === 'native') { // Not part of the spec
        newUtt = utt;
        newUtt.text = txt;
        newUtt.addEventListener('end', function () {
            if (speechUtteranceChunker.cancel) {
                speechUtteranceChunker.cancel = false;
            }
            if (callback !== undefined) {
                callback();
            }
        });
    }
    else {
        var chunkLength = (settings && settings.chunkLength) || 160;
        var pattRegex = new RegExp('^[\\s\\S]{' + Math.floor(chunkLength / 2) + ',' + chunkLength + '}[.!?,]{1}|^[\\s\\S]{1,' + chunkLength + '}$|^[\\s\\S]{1,' + chunkLength + '} ');
        var chunkArr = txt.match(pattRegex);

        if (!chunkArr || chunkArr[0] === undefined || chunkArr[0].length <= 2) { // match() returns null when nothing matches
            //call once all text has been spoken...
            if (callback !== undefined) {
                callback();
            }
            return;
        }
        var chunk = chunkArr[0];
        newUtt = new SpeechSynthesisUtterance(chunk);
        var x;
        for (x in utt) {
            if (utt.hasOwnProperty(x) && x !== 'text') {
                newUtt[x] = utt[x];
            }
        }
        newUtt.addEventListener('end', function () {
            if (speechUtteranceChunker.cancel) {
                speechUtteranceChunker.cancel = false;
                return;
            }
            settings.offset = settings.offset || 0;
            settings.offset += chunk.length - 1;
            speechUtteranceChunker(utt, settings, callback);
        });
    }

    if (settings.modifier) {
        settings.modifier(newUtt);
    }
    console.log(newUtt); //IMPORTANT!! Do not remove: Logging the object out fixes some onend firing issues.
    //placing the speak invocation inside a callback fixes ordering and onend issues.
    setTimeout(function () {
        speechSynthesis.speak(newUtt);
    }, 0);
};

How to use it...

//create an utterance as you normally would...
var myLongText = "This is some long text, oh my goodness look how long I'm getting, wooooohooo!";

var utterance = new SpeechSynthesisUtterance(myLongText);

//modify it as you normally would
var voiceArr = speechSynthesis.getVoices();
utterance.voice = voiceArr[2];

//pass it into the chunking function to have it played out.
//you can set the max number of characters by changing the chunkLength property below.
//a callback function can also be added that will fire once the entire text has been spoken.
speechUtteranceChunker(utterance, {
    chunkLength: 120
}, function () {
    //some code to execute when done
    console.log('done');
});

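Hope people find this useful.

You can split the text into chunks and load them into webSpeech. I implemented this at textfromtospeech.com/uk/text-to-voice
2021-04-23 12:48:27
Hi! Good work. We're using speechSynthesis on iOS via Cordova, where logging some utterances kills the logger. We found that storing a reference to the utterance somewhere in your JS (for example in an array) works too, without spamming the log! We clean up the array when speechSynthesis is cancelled, so there are no memory leaks.
2021-05-10 12:48:27
As mentioned in other answers, there is a simpler solution: setInterval(() => { speechSynthesis.pause(); speechSynthesis.resume(); }, 5000);
2021-05-14 12:48:27
By the way, this is why console.log fixes it: stackoverflow.com/questions/28839652/...
2021-05-16 12:48:27
@BrettZamir Does it fail mid-speech, or does it not speak at all? Try calling speechSynthesis.cancel(); before invoking the chunking pattern, to clear any queued speech. Also, if it breaks mid-speech, consider lowering the chunkLength property. Let me know if this helps.
2021-05-18 12:48:27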
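
One of the comments above reports that keeping a reference to each utterance (for example in an array) avoids the missing onend problem without logging to the console. Below is a minimal sketch of that idea. The names pendingUtterances, forget, speakAndRetain and cancelAll are made up for illustration; only speechSynthesis and SpeechSynthesisUtterance are real browser APIs. Whether this is enough in a given browser is an assumption to verify.

```javascript
// Hypothetical helpers: keep utterances reachable until they finish.
var pendingUtterances = [];

// Remove one utterance from the retained list.
function forget(utt) {
    var i = pendingUtterances.indexOf(utt);
    if (i !== -1) {
        pendingUtterances.splice(i, 1);
    }
}

function speakAndRetain(text) {
    var utt = new SpeechSynthesisUtterance(text);
    // Holding a reference keeps the utterance from being garbage collected
    // before its events have fired.
    pendingUtterances.push(utt);
    utt.addEventListener('end', function () { forget(utt); });
    utt.addEventListener('error', function () { forget(utt); });
    speechSynthesis.speak(utt);
    return utt;
}

function cancelAll() {
    speechSynthesis.cancel();
    // Drop every reference so nothing lingers after a cancel.
    pendingUtterances.length = 0;
}
```

Call speakAndRetain('some text') instead of calling speechSynthesis.speak() directly, and cancelAll() instead of speechSynthesis.cancel().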

I solved the problem with a timer function that calls pause() and resume() and then schedules itself again. In the onend event I clear the timer.

    var myTimeout;
    function myTimer() {
        window.speechSynthesis.pause();
        window.speechSynthesis.resume();
        myTimeout = setTimeout(myTimer, 10000);
    }
    ...
        window.speechSynthesis.cancel();
        myTimeout = setTimeout(myTimer, 10000);
        var toSpeak = "some text";
        var utt = new SpeechSynthesisUtterance(toSpeak);
        ...
        utt.onend =  function() { clearTimeout(myTimeout); }
        window.speechSynthesis.speak(utt);
    ...

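This seems to work well.

I found that on Chrome for Android, .pause() followed by .resume() can stop the speech. I replaced it with just a .resume() in a setInterval and it seems to work fine (as shown in @MhagnumDw's answer).
2021-04-25 12:48:27
Wow, I can't believe this has only 1 upvote. So far this answer works like a charm for me when reading fairly long strings.
2021-04-27 12:48:27
You made my day!
2021-05-09 12:48:27
This should have more upvotes. In my experience, this proved to be the only stable solution.
2021-05-10 12:48:27
This is the best solution. If you don't change the voice you don't have to do anything, but if you do, this is simpler and more stable than trying to chunk large texts.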
2021-05-10 12:48:27

A simple and effective solution is to call resume periodically.

var timeoutResumeInfinity; // declared explicitly so it is not an implicit global

function resumeInfinity() {
    window.speechSynthesis.resume();
    timeoutResumeInfinity = setTimeout(resumeInfinity, 1000);
}

You can tie this to the onstart and onend events, so that resume is only called while it is needed. Something like:

var utterance = new SpeechSynthesisUtterance();

utterance.onstart = function(event) {
    resumeInfinity();
};

utterance.onend = function(event) {
    clearTimeout(timeoutResumeInfinity);
};

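I discovered this by accident!

Hope this helps!

It works up to 1428 characters, and stops working beyond 1428 characters. Weird.
2021-04-27 12:48:27
I tried your solution and it works well for me. Thanks for sharing.
2021-04-29 12:48:27
I think what we are facing is not a character limit but a time limit (see the other answers to this question and elsewhere). I just tested the fiddle posted at stackoverflow.com/q/42875726/5025060 with more than 1800 characters and a non-native voice, and it works in Chrome 63.0.3239.132 (Official Build) (64-bit) under Windows 7/64 Pro.
2021-04-30 12:48:27
Does anyone know whether this can cause a memory leak, for example if the utterance is removed before onend fires?
2021-05-07 12:48:27
We have a commercial product and this issue was plaguing us. I found your suggestion to call .resume() every second, implemented it, and it seems to work great (so far). Thank you!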
2021-05-20 12:48:27
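
A comment above asks what happens to the timer if onend never fires. A minimal sketch that also stops the timer on the utterance's 'error' event is shown below. speakWithKeepAlive is a hypothetical name, the 1000 ms default mirrors the answer, and the assumption that a cancelled or failed utterance raises 'error' should be checked in the browsers you target.

```javascript
// Hypothetical wrapper around the periodic-resume workaround.
function speakWithKeepAlive(utterance, intervalMs) {
    var timer = null;

    function stop() {
        if (timer !== null) {
            clearInterval(timer);
            timer = null;
        }
    }

    utterance.addEventListener('start', function () {
        stop(); // guard against a duplicate start event
        timer = setInterval(function () {
            speechSynthesis.resume();
        }, intervalMs || 1000);
    });
    // Stop the timer whether the utterance finishes or fails.
    utterance.addEventListener('end', stop);
    utterance.addEventListener('error', stop);

    speechSynthesis.speak(utterance);
    return stop; // lets the caller stop the timer by hand
}
```

The returned function can be called after speechSynthesis.cancel() as an extra safeguard, so the timer never outlives the speech it was created for.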

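The problem with Peter's answer is that it doesn't work when you have a queue of speech synthesis requests set up. The script puts each new chunk at the end of the queue, so the chunks play out of order. Example: https://jsfiddle.net/1gzkja90/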

<script type='text/javascript' src='http://code.jquery.com/jquery-2.1.0.js'></script>
<script type='text/javascript'>    
    u = new SpeechSynthesisUtterance();
    $(document).ready(function () {
        $('.t').each(function () {
            u = new SpeechSynthesisUtterance($(this).text());

            speechUtteranceChunker(u, {
                chunkLength: 120
            }, function () {
                console.log('end');
            });
        });
    });
     /**
     * Chunkify
     * Google Chrome Speech Synthesis Chunking Pattern
     * Fixes inconsistencies with speaking long texts in speechUtterance objects 
     * Licensed under the MIT License
     *
     * Peter Woolley and Brett Zamir
     */
    var speechUtteranceChunker = function (utt, settings, callback) {
        settings = settings || {};
        var newUtt;
        var txt = (settings && settings.offset !== undefined ? utt.text.substring(settings.offset) : utt.text);
        if (utt.voice && utt.voice.voiceURI === 'native') { // Not part of the spec
            newUtt = utt;
            newUtt.text = txt;
            newUtt.addEventListener('end', function () {
                if (speechUtteranceChunker.cancel) {
                    speechUtteranceChunker.cancel = false;
                }
                if (callback !== undefined) {
                    callback();
                }
            });
        }
        else {
            var chunkLength = (settings && settings.chunkLength) || 160;
            var pattRegex = new RegExp('^[\\s\\S]{' + Math.floor(chunkLength / 2) + ',' + chunkLength + '}[.!?,]{1}|^[\\s\\S]{1,' + chunkLength + '}$|^[\\s\\S]{1,' + chunkLength + '} ');
            var chunkArr = txt.match(pattRegex);

            if (!chunkArr || chunkArr[0] === undefined || chunkArr[0].length <= 2) { // match() returns null when nothing matches
                //call once all text has been spoken...
                if (callback !== undefined) {
                    callback();
                }
                return;
            }
            var chunk = chunkArr[0];
            newUtt = new SpeechSynthesisUtterance(chunk);
            var x;
            for (x in utt) {
                if (utt.hasOwnProperty(x) && x !== 'text') {
                    newUtt[x] = utt[x];
                }
            }
            newUtt.addEventListener('end', function () {
                if (speechUtteranceChunker.cancel) {
                    speechUtteranceChunker.cancel = false;
                    return;
                }
                settings.offset = settings.offset || 0;
                settings.offset += chunk.length - 1;
                speechUtteranceChunker(utt, settings, callback);
            });
        }

        if (settings.modifier) {
            settings.modifier(newUtt);
        }
        console.log(newUtt); //IMPORTANT!! Do not remove: Logging the object out fixes some onend firing issues.
        //placing the speak invocation inside a callback fixes ordering and onend issues.
        setTimeout(function () {
            speechSynthesis.speak(newUtt);
        }, 0);
    };
</script>
<p class="t">MLA format follows the author-page method of in-text citation. This means that the author's last name and the page number(s) from which the quotation or paraphrase is taken must appear in the text, and a complete reference should appear on your Works Cited page. The author's name may appear either in the sentence itself or in parentheses following the quotation or paraphrase, but the page number(s) should always appear in the parentheses, not in the text of your sentence.</p>
<p class="t">Joe waited for the train.</p>
<p class="t">The train was late.</p>
<p class="t">Mary and Samantha took the bus.</p>

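In my case, the answer was to "chunk" the string before adding it to the queue. See here: http://jsfiddle.net/vqvyjzq4/

Many props to Peter for the idea as well as the regex (which I have yet to conquer). I'm sure the JavaScript can be cleaned up; this is more of a proof of concept.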

<script type='text/javascript' src='http://code.jquery.com/jquery-2.1.0.js'></script>
<script type='text/javascript'>    
    var chunkLength = 120;
    var pattRegex = new RegExp('^[\\s\\S]{' + Math.floor(chunkLength / 2) + ',' + chunkLength + '}[.!?,]{1}|^[\\s\\S]{1,' + chunkLength + '}$|^[\\s\\S]{1,' + chunkLength + '} ');

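    $(document).ready(function () {
        // Chunk each paragraph first, then queue its chunks in document order.
        $('.t').each(function () {
            var arr = [];
            // replaceBlank() was not defined in the post; collapsing whitespace here is an assumption.
            var txt = $(this).text().replace(/\s+/g, ' ').trim();
            while (txt.length > 0) {
                var m = txt.match(pattRegex);
                // Fall back to a hard cut when no boundary is found (e.g. one very long word).
                arr.push(m ? m[0] : txt.substring(0, chunkLength));
                txt = txt.substring(arr[arr.length - 1].length);
            }
            $.each(arr, function () {
                var u = new SpeechSynthesisUtterance(this.trim());
                window.speechSynthesis.speak(u);
            });
        });
    });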
</script>
<p class="t">MLA format follows the author-page method of in-text citation. This means that the author's last name and the page number(s) from which the quotation or paraphrase is taken must appear in the text, and a complete reference should appear on your Works Cited page. The author's name may appear either in the sentence itself or in parentheses following the quotation or paraphrase, but the page number(s) should always appear in the parentheses, not in the text of your sentence.</p>
<p class="t">Joe waited for the train.</p>
<p class="t">The train was late.</p>
<p class="t">Mary and Samantha took the bus.</p>

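It's 2017 and this bug is still around. As the developer of the award-winning Chrome extension Read Aloud, I happen to understand this problem quite well. OK, I'm kidding about the award-winning part.

  1. Your speech will get stuck if it runs longer than 15 seconds.
  2. I discovered that Chrome uses a 15-second idle timer to decide when to deactivate an extension's event/background page. I believe this is the culprit.

The workaround I use is a fairly complicated chunking algorithm that respects punctuation. For Latin languages, I set the maximum chunk size to 36 words. The code is open source, if you're interested: https://github.com/ken107/read-aloud/blob/315f1e1d5be6b28ba47fe0c309961025521de516/js/speech.js#L212

The 36-word limit works well most of the time, keeping each chunk under 15 seconds. But there will still be cases where it gets stuck. To recover from those, I use a 16-second timer.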

Nice! I came here because I'm working on a similar project and ran into this problem. Glad to see you already have working Chrome and Firefox add-ons!
2021-04-29 12:48:27