Text-to-speech mismatch occurs when you use text meant for reading as the basis for video scripts or voiceovers without adapting it for spoken delivery.
It’s an error that often shows up in SaaS videos: product descriptions, updates, demos, and more. Teams take text written to be read and use it as their narration scripts and voiceovers.
That’s a problem, because there’s a significant difference between text meant for reading and text meant for listening.
Here’s a real-life video narration script:
We have done some new updates to improve bulk action in Data Shares, which include, number one, the option to ‘select all’ and ‘clear’ on all possible selections when choosing which Data Shares to copy from another workspace. Number two, the Data Warehouse and Excel list pages now have bulk action to delete, restore, and enable, disable.
Read it first. You can probably understand it. But chances are you wouldn’t be able to make sense of it if you only heard it.
Written language doesn’t translate well when spoken. That’s the effect of text-to-speech mismatch. Let’s analyze why this happens.
There’s a set of conditions at play when people read a text. Just think about it.
When reading a text, people:
That’s why text meant for reading may include:
However, none of these elements translate well in spoken narration. Why is that?
When listening to a video, people:
People won’t enjoy your videos if you use text meant for reading as your narration scripts or voiceovers. Sentence complexity alone can hinder understanding, and that may cause viewers to disconnect from your videos.
To keep this from happening, never use text meant for reading as your video narration script or voiceover. Always adapt the text for listening. Even better, create your scripts and voiceovers from scratch, following these guidelines:
It may take some practice. But with time, you’ll learn to create easy-to-follow narration scripts and voiceovers for your SaaS videos.