Key advancements in subtitling and closed captioning, including AI and machine learning, are enabling content owners to bring their content to global audiences. Shawn Liew reports.
For broadcasters and media companies delivering content globally on a multitude of platforms, subtitling and closed captioning remain among the most important, and challenging, broadcast functions to execute.
Delivering closed captioning at scale is costly and the manual undertaking can be burdensome to production teams, David Kulczar, senior product manager, Watson Video Analytics, IBM Watson Media, points out.
Another major challenge with closed captioning, he tells APB, comes from a language and compliance standpoint. “Language is nuanced in a way that is sometimes difficult to capture and deliver in an automated caption. Programmes require context, and this is where machine learning can be incredibly valuable in increasing precision over time.”
Kulczar cites the example of the 2017 US Open tennis tournament, where IBM Watson Media powered closed captioning of the event and was able to navigate nuanced tennis lingo. “With a power artificial intelligence (AI) and machine learning combo, Watson differentiated between ‘love’ the emotion, and ‘love’ within the context of tennis.”
Compliance, he adds, is another area that is “tricky” to navigate globally. “Regulations surrounding captioning vary between industries, geographic locations and delivery methods, making it challenging to provide compliant captioning.”
With regard to compliance constraints, IBM’s offering is positioned as a tool to help companies reach compliance more efficiently, rather than as a tool that certifies compliance. Because of this, IBM’s products are not directly impacted by regional compliance regulations, although the company constantly reviews how its tools can better help customers adhere to compliance requirements regardless of region.
Product features such as Smart Layout were created to help with many of the compliance differences across regions, and in a bid to further remove the frictions associated with these processes, IBM launched Watson Captioning last month. This is a new standalone offering that leverages AI to automate the captioning process while ensuring increased accuracy over time through its machine learning capabilities. “By adding a layer of searchable, textual data to video libraries, Watson Captioning empowers media companies to more easily adapt generated captions to meet specific compliance standards,” Kulczar highlights.
Besides Watson Captioning, IBM Cloud Video recently introduced the ability to convert video speech to text. Is this an indication of a larger push towards automation in production processes for media and entertainment? Kulczar agrees: “AI and automation are powerful tools for saving time and expediting resource-intensive tasks within production workflows, and we’ll definitely see more investment in the area in the coming years.
“In terms of ensuring quality, we’ll see machine learning become increasingly integral in improving the accuracy of captioning over time.”
Specifically with Watson Captioning, IBM wanted to ensure that the captioning experience was catered to media companies’ needs. One of its standout features is a customisable glossary, which allows users to input a specific set of words and phrases that may be unique to their company or industry. “With the glossary in place and machine learning backing the solution, our customers can generate precise captions from the get-go that become even more accurate over time,” says Kulczar.
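To illustrate the glossary idea in general terms, the sketch below applies a user-supplied glossary as a post-processing step on speech-to-text output. It is an assumption-laden illustration, not a description of how Watson Captioning works internally: the function name `apply_glossary`, the sample glossary entries and the sample transcript are all hypothetical.

```python
import re


def apply_glossary(transcript: str, glossary: dict[str, str]) -> str:
    """Replace commonly misrecognised phrases with preferred glossary terms."""
    # Longest phrases first, so multi-word entries win over shorter ones.
    for heard in sorted(glossary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(heard) + r"\b", re.IGNORECASE)
        # A function replacement avoids treating backslashes in the term specially.
        transcript = pattern.sub(lambda _m, term=glossary[heard]: term, transcript)
    return transcript


# Hypothetical glossary: "what the recogniser tends to output" -> "preferred term".
glossary = {
    "i b m": "IBM",
    "watson caption in": "Watson Captioning",
    "forty love": "40-love",
}
raw = "i b m says watson caption in heard forty love at the open"
corrected = apply_glossary(raw, glossary)
print(corrected)  # IBM says Watson Captioning heard 40-love at the open
```

A production system would more likely feed such terms into the recogniser itself so that they influence recognition, rather than patching the text afterwards; the post-processing form is shown here only because it is simple to follow.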
As far as bona fide game-changers in the broadcast, media and entertainment industries go, does AI possess the potential to trump all comers? “Until recently, generating closed captions was quite a manual undertaking, and a costly one at that,” Kulczar says. “Now that AI has streamlined the process, production teams are freed up to work the editorial aspects of the production process.”
Another area in which AI is transforming captioning, he believes, is within live broadcasting. “With AI, broadcasters have the ability to generate closed captions in near real time, something that was previously a major pain point.”
This is important, Kulczar explains, because intelligent captioning helps broadcasters to streamline their own workflows, and deliver reliable captioning to audiences. “Caption is a vital part of media and entertainment, and ensuring proper access to accurate captioning helps optimise the viewing experience for diverse audiences worldwide,” he concludes.
One company that believes automation in closed captioning and subtitling still depends on the function in question is Cavena Image Products. “When it comes to subtitle file generation, the market demands high quality,” says Henrik Moberg, managing director, Cavena Image Products. “Automated subtitling and closed captioning has been big news for the past 10 years; maybe with AI, it is now slowly gaining ground.”
While subtitling service providers are, rightly, constantly looking for the best ways to create and generate subtitle text files, this should not be at the expense of quality, Moberg cautions. He cites the example of Asia: “When it comes to interfacing current playout systems with streaming platforms, a tightly integrated system that deals with Asian character sets is essential. This is why the Cavena protocol has become the de facto standard in Asia.”
A Swedish company, Cavena has been building subtitling systems for broadcasters and other users for the past 25 years, and is dedicated to building “functional and reliable” subtitling systems for translators, translation facilities and broadcasters. Through the addition of a suitable transmission protocol, the Cavena system can also be adopted for subtitle transmission with video over IP.
Malaysia’s Astro and Hong Kong’s TVB are among Cavena’s broadcast customers in Asia, where there are no constraints resulting from regulations, says Moberg. He also describes how Asia-Pacific skipped the legacy European teletext standard and went entirely with DVB. Today, there are multiple operators distributing video content with original audio, and a number of different languages available in the form of subtitles.
He continues: “If the spoken language of the audience is large enough, there may also be audio dubbing. And even if I personally do not like to see James Bond speaking anything else than English, dubbing or subtitles is driven by market demand, and that is the way it is.”
Cavena’s tagline, ‘Worldwide Subtitling Made Easy’, makes the message clear. Delving deeper, the company positions itself as a technology partner to operators distributing video with subtitles and/or closed captioning. “We assist with simplifying workflows with any subtitle file, on any platform,” Moberg emphasises. “Non-Latin characters between different operators is a challenge, where we assist with our knowledge of complex Asian character sets, right down to the details of each font and character. Non-Latin characters for over-the-top (OTT) platforms is a core competence for us.”
Closed captioning and subtitling are key tools for content providers to reach global audiences. However, translation of captions and subtitles will be vital to reach a wider audience, particularly in Asia-Pacific, suggests Hiren Hindocha, president and CEO of Digital Nirvana. “If you are not translating captions, you are missing out on a larger audience across the region. Integration of multilingual closed captions or subtitles on videos can expand viewership globally, especially in this part of the world.”
Where Asia-Pacific is concerned, he points out that there are audiences who are less proficient in English, and thus are watching their favourite local programmes with English closed captions or subtitles. Conversely, they may also be watching English programmes with local-language closed captions or subtitles.
As for video sharing platforms such as YouTube, the secondary-language translations of captions not only help the videos reach out to a wider audience, but also improve searchability and discoverability, as YouTube indexes secondary language captions and subtitles.
With so much content now residing on streaming services such as YouTube, Amazon, Netflix, Hulu, Vimeo and so on, automated caption services can help bring global awareness to content, says Hindocha.
He continues: “Captioning opens avenues for content providers to reach a global audience, and it creates a larger audience through online viewing.
“In environments where users on laptop and mobile devices don’t want to or can’t turn on their volume, closed captioning allows them to watch a show sound-free and increases the providers’ audience.”
Another factor driving the use of closed captioning is the creation of metadata, according to Hindocha. Closed captioning increases the searchability of an asset: for content owners, it raises the visibility of their videos, while users are able to locate the content they want with ease.
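The way caption text turns into searchable metadata can be sketched with a small inverted index that maps each word to the caption cues in which it appears. This is a minimal illustration under stated assumptions, not any vendor’s implementation: the cue tuples, video IDs and function names are hypothetical, and the tokeniser only handles basic Latin text.

```python
import re
from collections import defaultdict

# Simplistic tokeniser: ASCII letters, digits and apostrophes only.
# Non-Latin scripts would need a proper, language-aware tokeniser.
WORD = re.compile(r"[a-z0-9']+")


def build_caption_index(cues):
    """Map each lower-cased word to the (video_id, start_seconds) cues containing it."""
    index = defaultdict(list)
    for video_id, start_seconds, text in cues:
        for word in set(WORD.findall(text.lower())):
            index[word].append((video_id, start_seconds))
    return index


def search(index, query):
    """Return the cues that contain every word of the query, in sorted order."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical caption cues: (video_id, start time in seconds, caption text).
cues = [
    ("match-01", 12.0, "Second serve, fifteen love."),
    ("match-01", 95.5, "That is a break point."),
    ("news-07", 4.0, "Fans love the new stadium."),
]
index = build_caption_index(cues)
print(search(index, "love"))         # [('match-01', 12.0), ('news-07', 4.0)]
print(search(index, "break point"))  # [('match-01', 95.5)]
```

Because each hit carries a start time, a search result can point a viewer to the moment in the video where the words are spoken, not just to the asset as a whole.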
“Automated speech-to-text conversion, coupled with state-of-the-art workflow and experienced captioners, reduces the time and cost to publish, and provides better search engine discoverability — while complying with all legal guidelines,” says Hindocha, who is also a champion of cloud-based closed captioning.
With cloud-based caption synchronisation services, media clips are scanned for unsynchronised captions and sent to an application for automatic synchronisation and format conversion. This technology coordinates all pre-recorded and online video content through an automated process over the cloud, providing a suite of options for clipping, data transfer and caption formats, as well as integration directly to the customer’s video platform.
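The synchronisation and format-conversion step described above can be sketched as a conversion from SubRip (SRT) to WebVTT that shifts every timestamp by a fixed offset. This is an illustrative sketch only: it assumes the offset is already known, whereas a real service would have to measure it first, and the function names and sample captions are hypothetical.

```python
import re

# SRT timestamps look like 00:00:01,000 (comma before the milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift(match, offset_ms):
    """Shift one SRT timestamp and render it in WebVTT form (dot before ms)."""
    h, m, s, ms = (int(g) for g in match.groups())
    # Clamp at zero so a negative offset never produces a negative time.
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def srt_to_webvtt(srt_text, offset_ms=0):
    """Convert SubRip text to WebVTT, shifting every cue by offset_ms."""
    out = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        if "-->" in line:  # only timing lines are rewritten
            line = TIMESTAMP.sub(lambda m: _shift(m, offset_ms), line)
        out.append(line)
    return "\n".join(out) + "\n"


srt = (
    "1\n00:00:01,000 --> 00:00:03,500\nSecond serve.\n\n"
    "2\n00:00:04,000 --> 00:00:06,000\nFifteen love.\n"
)
vtt = srt_to_webvtt(srt, offset_ms=1500)
print(vtt)
```

With a 1,500ms offset, the first cue moves from 00:00:01,000 to 00:00:02.500. The SRT cue numbers are kept because WebVTT accepts them as optional cue identifiers.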
“As cloud-based technology does not lock a provider into a specific vendor, users can integrate this technology into their existing workflow via APIs,” says Hindocha. Digital Nirvana provides a cloud-based closed captioning service that uses audio fingerprinting to automate near-live synchronisation of live broadcast captions, with the ability to revise the text.
The company’s new automated closed captioning and subtitling service also includes pop-on and roll-up captioning services for all technology platforms, and offers high-quality caption generation for all pre-recorded and online video content through an automated process over the cloud. The service is designed to handle multiple SD and HD video formats, as well as a wide range of caption file formats. In addition to this multi-format capability, Digital Nirvana’s enterprise-level workflow is equipped with checkpoints to ensure content and caption quality.
Video and audio fingerprinting technology also improves the ability to find clips in a huge library. Digital Nirvana’s caption lookup, together with its synchronisation process, allows production houses and streaming video providers to re-sync captions after editing content for air in multiple countries. “As the original captions are retrieved using the lookup process, there’s no need to re-do the entirety of their corresponding subtitles and captions,” Hindocha explains.
He emphasises how content owners are becoming increasingly aware of the reach and return-on-investment (ROI) of their content with subtitles in various languages. This serves as a catalyst to provide textual representation for their video content, in contrast to prior practices, where entire movies were dubbed into another language at a high cost for presentation to other markets.
“With the increased use of subtitles, we’re seeing content gain traction in different regions,” Hindocha says, while noting the evolution of closed captioning. “It steadily evolved from conventional methods to voice writing, to what is a far more automated process currently.
“The applications of closed captioning have also evolved, as it now improves the discoverability of video content and the cognitive modelling (simulating human problem solving in a computerised model) for automated analysis of broadcast content,” he concludes.