Video artificial intelligence was a massive theme at NAB 2018, with a majority of the video publishing technology companies showing off some form of AI integration. As I covered in my previous post, How Artificial Intelligence Gives Back Time, time is money in the video publishing business, and AI is set to be a very important tool. That's why all the big guns like AWS (Elemental), Google (Video Intelligence), IBM (Watson), and Microsoft (Azure) had digital AI eye candy to share. There was a "me too" feeling as well, with all of them competing to weave their video annotation/labeling and speech-to-text APIs into a variety of video workflows.
Top Video AI use cases:
- Labeling – The ability to label the elements within a video for specific scene selection: people, places, and things.
- Editing – Segmenting by relevance, slicing the video up into logical parts for production.
- Discovery – Using both annotation and speech-to-text to expand metadata for finding specific scenes within video libraries.
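To make the discovery use case concrete, here is a minimal sketch of how label and speech-to-text metadata could combine into a searchable index. It is plain Python with made-up data; the `Segment` type and `find_segments` helper are illustrative, not any vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One annotated slice of a video (times in seconds)."""
    start: float
    end: float
    labels: list = field(default_factory=list)  # e.g. from a labeling API
    transcript: str = ""                        # e.g. from speech-to-text

def find_segments(segments, query):
    """Return segments whose labels or transcript mention the query term."""
    q = query.lower()
    return [s for s in segments
            if q in s.transcript.lower()
            or any(q in label.lower() for label in s.labels)]

# Hypothetical metadata for a two-segment video
video = [
    Segment(0.0, 12.5, labels=["stadium", "crowd"], transcript="welcome to the final"),
    Segment(12.5, 30.0, labels=["player", "tennis court"], transcript="what a serve"),
]

hits = find_segments(video, "tennis")
print(hits[0].start)  # 12.5 — the query matched the "tennis court" label
```

The point is that annotation and speech-to-text feed the same index, so a single query can match either what was seen or what was said.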
One of several challenges is the all-or-nothing situation. Video publishers' assets can be spread across many hard drives or encoded without much metadata. Companies like Axel provide services to index those videos and make them searchable with a mixed model of on-prem technology and cloud services. Dealing with live feeds requires hardware and bigger commitments, and most publishers are not willing to forklift their video encoding and library over to another provider without a clear ROI justification.

The other big ROI challenge is that video publishers don't have a lot of patience, and the pressure to increase profits on video is higher now with more competition in the digital space across all channels. Selling workflow efficiency won't be a big enough draw compared to AI generating substantial revenue by solving a specific problem; the pain isn't high enough to justify a big AI investment. There are lots of POCs in the market right now, but not one product creates a seamless flow within a video publisher's existing workflow. Avid and Adobe are well positioned for the edit team since their products are so widely used; the other cloud providers are enabling AI technology rather than delivering a specific solution.
Search and discovery was the biggest theme, using AI for image analysis and speech-to-text. Closed-caption compliance to make digital video accessible will be mandated, driving faster adoption. Editing video via AI is in its early phase; however, the technology is emerging fast. There are some exciting examples of AI-created video, but doing it at scale is another matter. Among the many talks at NAB, some exciting directions for AI in video were discussed around video asset management. Here are a few examples of what was demoed at NAB 2018 showing promise in the video intelligence field.
Adobe made a big splash with their new editing technologies, using AI to enhance the video editing process. Todd Burke presented Adobe Sensei, their AI entry into video intelligence. The video labeling and scene slicing demos were designed to help editors create videos faster and simplify the process. The segmenting was just a prototype, while the video labeling demonstrated the API extension integrated within Sensei.
IBM's demo was slick and pointed to the direction of using machine learning to process large amounts of video and pull out the interesting parts. Adding announcer and crowd response analysis provided another layer of segmentation. You can see a live demo of their AI highlights for the Masters. They did the same for Wimbledon, slicing up the live feed they were powering for the event and creating "cognitive highlights." It wasn't clear whether these highlights were used by the edit team or whether this was a POC. Regardless, both image and text analysis of the streams was occurring, demonstrating the power of AI in video.
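As a rough illustration of the crowd-response idea, the sketch below flags loud windows in an audio track by RMS energy. It is plain Python over synthetic audio; `crowd_peaks` is a hypothetical helper and a crude stand-in for IBM's actual analysis, which is not public here.

```python
import math

def crowd_peaks(samples, rate, window_s=1.0, threshold=0.5):
    """Flag fixed-size windows whose RMS energy exceeds the threshold.

    samples: audio amplitudes in [-1, 1]; rate: samples per second.
    Returns the start times (seconds) of loud windows.
    """
    win = int(window_s * rate)
    peaks = []
    for i in range(0, len(samples) - win + 1, win):
        chunk = samples[i:i + win]
        rms = math.sqrt(sum(x * x for x in chunk) / win)
        if rms >= threshold:
            peaks.append(i / rate)
    return peaks

# Synthetic audio: quiet, loud (a "roar"), quiet — 3 seconds at 100 Hz
rate = 100
audio = [0.05] * rate + [0.9] * rate + [0.05] * rate
print(crowd_peaks(audio, rate))  # [1.0] — the loud second starts at t = 1.0
```

A real system would combine signals like this with visual analysis of the stream, but the principle is the same: audio energy marks candidate highlight boundaries.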
The Avid demo was just that: a demo. They created a discovery UI on top of APIs like Google Vision to assist in video analysis for search and to support edit teams. Speech-to-text and annotation in one UI has its advantages. It wasn't clear how soon this would be made available beyond a development tool.
The team over at Zora had by far the slickest video hub approach. I believe the play for Google is more about their cloud strategy: attracting storage of the videos and leveraging their Video Intelligence API to enable search over all your video assets. Google's video intelligence is just getting started, and their open-sourcing of TensorFlow, the foundation of their AI, makes them one of the top companies committed to video AI. I like what Zora is doing and can see editing teams benefiting from this approach. There was a collaborative element, too.
GrayMeta's UI was slick, and their voice-to-text interface was amazing, all powered by Azure. Azure Video Indexer is the real deal, and its face identification capability has broad use cases. Indexing voice isn't new, but a fast, slick UI helps enable adoption of the technology. They can pinpoint parts of the video based on the text alone. There is a team collaboration element to the product that has a Slack feel. The approach is making all media assets searchable.
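Pinpointing video by text alone can be sketched very simply: given word-level timestamps like those a speech-to-text service returns, a lookup maps a query word to the moments it is spoken. The `word_hits` helper and transcript data below are hypothetical, not GrayMeta's or Azure's API.

```python
def word_hits(words, query):
    """words: list of (start_seconds, word) pairs, as a speech-to-text
    service with word-level timestamps might return. Returns the times
    at which the query word is spoken."""
    q = query.lower()
    return [t for t, w in words if w.lower().strip(".,!?") == q]

transcript = [(0.0, "Welcome"), (0.4, "to"), (0.6, "the"), (0.8, "highlights"),
              (2.1, "more"), (2.4, "highlights"), (2.9, "soon")]
print(word_hits(transcript, "highlights"))  # [0.8, 2.4]
```

A player UI can then jump straight to those timestamps, which is what makes the "pinpoint by text" experience feel fast.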
There were several cool examples of the possibilities with Amazon Rekognition: video analysis, facial recognition, and video segments. Elemental's (purchased by Amazon) core technology is video ad stitching, whereby ads are inserted directly into the video. They created a UI extension demonstrating some possibilities with Rekognition, though it wasn't clear what was in production beyond the demo. The facial recognition of celebrities looked solid. Elemental also had cool real-time object detection, with bounding boxes showing up on sports content. This has many use cases; however, it creates even more data for video publishers, and how much data they can manage needs to be addressed before adding another data firehose.
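Drawing those boxes on a frame typically means converting the normalized coordinates such APIs return (Rekognition's BoundingBox gives Left, Top, Width, and Height as ratios of the frame size) into pixels. A minimal sketch, with `to_pixels` as a hypothetical helper:

```python
def to_pixels(box, frame_w, frame_h):
    """Convert a normalized bounding box (ratios of frame size, in the
    style Rekognition returns) to integer pixel coordinates.

    box: dict with "Left", "Top", "Width", "Height" in [0, 1].
    Returns (x, y, w, h) in pixels.
    """
    return (
        int(box["Left"] * frame_w),
        int(box["Top"] * frame_h),
        int(box["Width"] * frame_w),
        int(box["Height"] * frame_h),
    )

# A detection covering the center of a 1920x1080 frame
box = {"Left": 0.25, "Top": 0.25, "Width": 0.5, "Height": 0.5}
print(to_pixels(box, 1920, 1080))  # (480, 270, 960, 540)
```

Normalized coordinates keep detections resolution-independent, so the same box data overlays correctly on any rendition of the stream.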
Video artificial intelligence is just getting started and will only improve with greater computing advancements and new algorithms. The guts of what's needed to achieve scale already exist. The major use cases around video discovery and search are set to improve dramatically as industry players open up more APIs. Video machine learning has great momentum, utilizing these APIs to crack open the treasure trove of data locked away inside video. The combination of video AI and text analysis creates massive metadata for the multitude of use cases where voice computing can play a role. Outside of all the AI eye candy, there needs to be more focus on clear business problems vs. "me too." In other words: what's the end product, and how will it make the video publisher more revenue?