Video publishers were caught off guard by the recent announcement that Apple will block video autoplay, and even Google is pushing back on bad web ads. The backlash against video autoplay has been festering for some time. If losing video ad revenue and turning consumers off with declining traffic isn't a wake-up call, what will be? Headlines like CNN's "Apple's plan to kill autoplay feature could leave publishers in the dust" should get video publishers' attention. This clampdown isn't a joke: Google and Apple are taking a hard line on cleaning up the web's video experience. Here we dive deep into how to get ahead of these changes by Apple and Google and increase your video lifetime value.
Facebook started the conversation
Since Facebook started force-feeding video autoplay to us, other publishers followed suit, knowing their video volume would go up. However, some major agencies flat out said they would pay only half the CPMs because of autoplay's viewability issues. One major advertiser, Heineken, is publicly struggling to get a six-second clip to stick. Publishers say the video relationship with Facebook is "complicated." The topic comes up constantly, and some players are opting out of video autoplay altogether in favor of a better consumer experience. The major catch-22 is that publishers driving their O&O strategy can't treat autoplay as a video strategy; it's a tactic that, in most cases, turns consumers off. If you want to see some of the consumer backlash, just search Google for "how to turn off autoplay" and you will see that this is most definitely a real consumer pain point. With Apple's latest release of iOS 11 specifically blocking video autoplay, a more thoughtful and intelligent approach is required.
Publishers are responding to consumer demand by giving users the option to turn OFF autoplay video.
A video strategy involves deciding to dominate a content category vertically and becoming the go-to source for the highest-value content in that space. Yes, video is content marketing. People watch video for information, enlightenment, and entertainment. Video is a very effective communication tool, and it is mobile and on demand. Being a tool, it is the publisher's responsibility to wield it surgically rather than as a blunt object that pushes video views without consumer consent or value for paid advertisers. Some publishers understand this. LittleThings Inc., for example, is disabling video autoplay completely and focusing on consumer experience. This has resulted in higher play rates (CTR) and higher CPMs that can be verified and justified to their customers. The other major benefit: consumers engaged more.
"We wanted video views to be on the consumer's terms. By running autoplay, you might [reach your desired] fill rate, but the user is not engaged with the brand the way they would be if they raised their hands to watch the video," said Justin Festa, chief digital officer of LittleThings, at JW Player's JW Insights event in New York.
The digital publisher today will have to engage consumers more intelligently. A surgical approach to gathering data and then acting on it is now a must-have. So what is the benefit of artificial intelligence in video? It is better to start with a different question: what is digital video? Broken down, digital video is just a series of images spliced into sequences. Humans are visual and respond emotionally to images and context. The story is a major driver of emotional response, beyond whatever affinity one may have for the people on screen. A computer that translates all of the above and puts it into context would have to be truly intelligent. This is not new; Netflix proved you get higher take rates by showing the right images, which results in higher consumer engagement.
In the Making
Three years ago, a technology called KRAKEN was introduced. It uses video machine learning to replace the static, non-intelligent thumbnail with interactive, dynamic thumbnails: the set of images best able to drive the highest possible play rates. Rotating images provide more visual information than a single image. Video clipping (GIF) came next; however, it is most effective for action shots. A new way of looking at video thumbnails was required. The solution was a real-time, responsive, dynamic intelligence that scores images by relevance. Finding the best images is one thing, but powering video recommendation was a natural next fit. Learning which collective visuals work together to extend time on site is a major deal for all publishers. We're living in exciting times: advances in machine learning and computer chip design have achieved amazing levels of image-processing capability, and a big leap forward in the code foundation (like deep learning) now powers platforms that segment out objects, images, places, and faces. We're in an artificial intelligence renaissance.
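The selection loop behind dynamic thumbnails can be sketched in a few lines. This is our simplified illustration, not KRAKEN's actual implementation; the frame IDs and engagement counts are hypothetical:

```python
# Minimal sketch: rank candidate thumbnail frames by measured play rate,
# assuming play/impression counts are collected per candidate frame.

def play_rate(plays, impressions):
    """Empirical click-to-play rate; 0 when a frame has no impressions yet."""
    return plays / impressions if impressions else 0.0

def top_frames(stats, k=4):
    """Return the k candidate frames with the highest play rates.

    stats: dict mapping frame_id -> (plays, impressions)
    """
    ranked = sorted(stats, key=lambda f: play_rate(*stats[f]), reverse=True)
    return ranked[:k]

# Hypothetical per-frame engagement counts for one video.
stats = {
    "frame_012": (120, 1000),   # 12.0% play rate
    "frame_045": (310, 2000),   # 15.5%
    "frame_090": (40, 800),     # 5.0%
    "frame_130": (200, 1000),   # 20.0%
    "frame_200": (90, 900),     # 10.0%
}

print(top_frames(stats, k=4))
# → ['frame_130', 'frame_045', 'frame_012', 'frame_200']
```

The winning frames would then rotate in place of the static thumbnail, and the counts keep updating as viewers respond.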
Show me the money
Video recommendation powered by KRAKEN video machine learning: going beyond metadata and play counts to the visuals within the video.
It's no secret that ads still drive the bulk of digital video revenue. For that very reason, each video play, and each additional minute on site, translates into cold hard cash. Making the site sticky and earning repeat visits requires video intelligence. Google and Apple are very serious about protecting the mobile web. Google AMP (Accelerated Mobile Pages) has clearly won out with publishers, while Facebook Instant Articles has fallen short; most publishers have abandoned it because it monetizes poorly compared to AMP. The perfect trifecta of real-time video analytics, intelligent image selection, and video recommendation is now a reality. We have the data and the processing power to predict which images excite you and which video is most relevant to watch next. Video discovery is key to increasing video lifetime value.
Are you ready for the do-not-track, non-autoplay world? Like it or not, Google and Apple are disabling video autoplay and intrusive ads. The digital broadcasting publisher has a grand opportunity to leverage machine learning in video. Tapping into visually relevant actions and the behavior they draw out is a competitive advantage. Machine learning linked with digital video maximizes your video assets, is a strategic advantage, and increases video lifetime value. The video recommendation example above was not possible before machine-learning-based video processing made it a reality. What possibilities can you imagine?
So, you’ve developed an OTT app and you’ve marketed it to your viewers. Now your focus is on keeping your viewers watching. How can machine learning drive more engagement? Let’s face it—they may have a favorite show or two, but to keep them engaged for the long term, they need to be able to discover new shows. Because OTT is watched on TVs, you have a lot of real estate to engage with your viewers. A video’s thumbnail has more of an impact on OTT than any other platform, so choose your thumbnails carefully!
Discovery is different on different platforms
On desktop, most videos start with either a search (e.g. Google) or a social share (e.g. Facebook). Headlines and articles provide additional information to get a viewer to cognitively commit to watching a video. Autoplay runs rampant, removing the decision to press "play" from the user.
TVs have a lot more real estate than smartphones
On a smartphone, small screen size is an issue. InfiniGraph’s machine learning data shows that more than three objects in a thumbnail will cause a reduction in play rates. Again, social plays a huge role in the discovery of new content, with some publishers reporting that almost half of their mobile traffic originates from Facebook.
OTT Discovery is Unique
The discovery process on OTT is unique because the OTT experience is unique. Most viewers already have something in mind when they turn on their OTT device. In fact, Hulu claims it can predict with 70% accuracy the top three shows each of its users is tuning in to see. But what about the other 30%? What about the discovery of new shows?
Netflix AB Test Example
Netflix has said that if a user can’t find something to watch in 30 seconds, they’ll leave the platform. They decided to start A/B testing their thumbnails to see what impact it would have, and discovered that different audiences engage with different images. They were able to increase view rates by 20-30% for some videos by using better images! In the on-demand world of OTT, the right image is the difference between a satisfied viewer and a user who abandons your platform. If you’re interested in increasing engagement on your OTT app, reach out to us at InfiniGraph to learn more about our machine learning technology named KRAKEN that chooses the best images for the right audience, every single time. Also, check out our post about increasing your video ad inventory!
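Thumbnail A/B testing of this kind is commonly framed as a multi-armed bandit problem. The epsilon-greedy sketch below is our assumption about how such a system could work, with simulated click rates; it is not Netflix's actual method:

```python
# Epsilon-greedy thumbnail test: mostly serve the best-performing image,
# but keep exploring the alternatives a fraction of the time.
import random

def run_bandit(true_ctrs, trials=20000, epsilon=0.1, seed=42):
    """Simulate serving one of several thumbnails; return serve counts."""
    rng = random.Random(seed)
    plays = [0] * len(true_ctrs)
    shows = [0] * len(true_ctrs)
    for _ in range(trials):
        if rng.random() < epsilon:                      # explore
            arm = rng.randrange(len(true_ctrs))
        else:                                           # exploit current best
            arm = max(range(len(true_ctrs)),
                      key=lambda a: plays[a] / shows[a] if shows[a] else 0.0)
        shows[arm] += 1
        if rng.random() < true_ctrs[arm]:               # simulated viewer click
            plays[arm] += 1
    return shows

# Three candidate thumbnails with (unknown to the system) true click rates.
shows = run_bandit([0.05, 0.12, 0.08])
print(shows)  # the 12%-CTR thumbnail ends up served far more than the others
```

The design choice matters: a fixed 50/50 split wastes impressions on the losing image for the whole test, while a bandit shifts traffic to the winner as evidence accumulates.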
Being a publisher is a tough gig these days. It’s become a complex world for even the most sophisticated companies. And the curve balls keep coming. Consider just a few of the challenges that face your average publisher today:
Declining display rates, compounded by audience migration to mobile, where CPMs are even lower.
Maturing traffic growth on O&O sites.
Pressure to build an audience on social platforms including adding headcount to do so (Snapchat) without any certainty that it will be sufficiently monetizable.
The sad realization that native ads (last year's savior!) are inefficient to produce, difficult to scale, and not easily renewable with advertising partners.
The list goes on…
Of course, the biggest opportunity, and challenge, for publishers is video. Nothing shows more promise for publishers from both a user engagement and a business perspective than (mobile) video. It's a simple formula: when users watch more video on a publisher's site, they are, by definition, more engaged. More video engagement drives better "time spent" numbers and, of course, higher CPMs.
But the barrier to entry is high, particularly for legacy print publishers. They struggle to convert readers to viewers because creating a consistently high volume of quality video content is expensive and not necessarily a part of their core DNA. Don’t get me wrong. They are certainly creating compelling video, but they have not yet been able to produce it at enough scale to satisfy their audiences. At the other end of the spectrum, video-centric publishers like TV networks that live and breathe video run out of inventory on a continuous basis.
The combined result of publishers struggling to keep up with consumer demand for quality video is a collective dearth of quality video supply in the market. To put it in culinary terms, premium publishers would sell more donuts if they could; they just can't bake enough to satisfy the demand.
So how can you make more donuts? Trust and empower the user!
Rise of Artificial Intelligence
The majority of the buzz at CES this year was about Artificial Intelligence and Machine Learning. The potential for Amazon’s Alexa to enhance the home experience was the shining example of this. In speaking with several seasoned media executives about the AI/machine learning phenomenon, however, I heard a common refrain: “The stuff is cool, but I’m not seeing any real applications for my business yet.” Everyone is pining to figure out a way to unlock user preferences through machine learning in practical ways that they can scale and monetize for their businesses. It is truly the new Holy Grail.
That's why we at InfiniGraph are so excited about our product KRAKEN. KRAKEN has an immediate and profound impact on video publishing: it lets users curate the thumbnails publishers serve and optimizes toward user preference through machine learning in real time. The result: KRAKEN increases click-to-play rates by 30% on average, with the corresponding additional inventory and revenue.
It is a revolutionary application of machine learning that, in execution, makes a one-way, dictatorial publishing style an instant relic. With KRAKEN, the users literally collaborate with the publisher on what images they find most engaging. KRAKEN actually helps you, the publisher, become more responsive to your audience. It’s a better experience and outcome for everyone.
In a world of cool gadgets and futuristic musings, KRAKEN works today in tangible and measurable ways to improve your engagement with your audience. Most importantly, KRAKEN accomplishes this with your current video assets. No disruptive change to your publishing flow. No need to add resources to create more video. Just a machine learning tool that maximizes your video footprint.
In essence, you don’t need to make more donuts. You simply get to serve more of them to your audience. And, KRAKEN does that for you!
Beyond the deep learning hype, digital video sequencing (clipping) powered by machine learning is driving higher profits. Video publishers use various images (thumbnails, or poster images) to attract readers to watch more video. These thumbnail images are critical, and their visual information has a great impact on video performance. The lead visual is, in many cases, more important than the headline. More views equal more revenue; it's that simple. Deep learning is having a significant impact on everything from video visual search to video optimization. Here we explore video sequencing and the power of deep learning.
Having great content is required, but if your audience isn’t watching the video then you’re losing money. Understanding what images resonate with your audience and produce higher watch rates is exactly what KRAKEN does. That’s right: show the right image, sequence or clip to your consumers and you’ll increase the number of videos played. This is proven and measurable behavior as outlined in our case studies. An image is really worth a thousand words.
Below are live examples of KRAKEN in action. Each form is powered by a machine learning selection process. Below we describe the use cases for apex image, image rotation and animation clip.
KRAKEN “clips” the video at the point of APEX. Sequences are put together creating a full animation of a scene(s). Boost rates are equal to those from image rotation and can be much higher depending on the content type.
Consumer created clipping points within video
Creates more visual information vs. a static image
Highlights action scenes
Great for mobile and OTT preview
More than one on page can cause distraction
Overuse can turn off consumers
Too many on page can slow page loading performance (due to size)
Mobile LTE is slow and can lead to choppy images instead of a smooth video
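One way to pick clipping points around the apex, assuming each frame already carries an engagement score, is a maximum-sum sliding window. This is an illustrative sketch of the idea, not KRAKEN's actual algorithm, and the scores below are hypothetical:

```python
# Choose the contiguous window of frames with the highest total
# engagement score; its boundaries become the clipping points.

def best_clip(scores, clip_len):
    """Return (start, end) frame indices of the highest-scoring window."""
    if clip_len > len(scores):
        raise ValueError("clip longer than video")
    window = sum(scores[:clip_len])
    best_sum, best_start = window, 0
    for i in range(1, len(scores) - clip_len + 1):
        window += scores[i + clip_len - 1] - scores[i - 1]  # slide by one frame
        if window > best_sum:
            best_sum, best_start = window, i
    return best_start, best_start + clip_len

# Hypothetical per-frame scores; the action peak sits in the middle.
scores = [1, 1, 2, 8, 9, 7, 2, 1, 1, 1]
print(best_clip(scores, clip_len=3))  # → (3, 6): the frames around the apex
```

The sliding-window update keeps the search linear in the number of frames, which matters when scoring every frame of a long video.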
Image rotation allows for a more complete visual story to be told when compared to a static image. This results in consumers having a better idea of the content in the video. KRAKEN determines the top four most engaging images and then cycles through them. We are seeing mobile video boost rates above 50%.
Smooth visual transition
Consumer selected top images
Creates a visual story vs. one image to engage more consumers
Ideal for mobile and OTT
Less bandwidth intensive (Mobile LTE)
Similar to animated clips, publishers should limit multiple placements on a single page
KRAKEN always finds the best lead image for any placement. This apex image alone creates high levels of play rates, especially in a click-to-launch placement. Average boost rates are between 20% to 30%.
Audience-chosen top image for each placement
Can be placed everywhere (including social media)
Ideal for desktop
Good with mobile and OTT
Static thumbnails have limited visual information
Once the apex is found, the image will never be substituted
Below are live KRAKEN animation clip examples. All three animations start with the audience choosing the apex image. Then KRAKEN identifies clipping points via deep learning and uses machine learning to optimize the clipping sequence.
HitFix video: deep learning clips the video to the action, and machine learning adjusts in real time.
Video players have transitioned to HTML5, and mobile consumption of video is the fastest-growing medium. Broadcasters that embrace advanced technologies that adapt to consumer preference will achieve higher returns while creating a better consumer experience. The value proposition is simple: for a video publisher doing 30 million video plays per month, boosting video performance by 30% with KRAKEN drives an additional $2.2 million in revenue (see the KRAKEN revenue calculator). This happens with existing video inventory and without additional headcount. KRAKEN creates a win-win scenario and will improve its performance as more insights feed prediction and recommendation for consumers, thereby increasing video lifetime value.
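The revenue claim above can be reproduced with back-of-the-envelope math. The $20 CPM used below is our assumed input, not a figure taken from the calculator; plug in your own rate:

```python
# Back-of-the-envelope version of the revenue uplift claim.
monthly_plays = 30_000_000
boost = 0.30                      # 30% lift in play rate
cpm = 20.0                        # assumed revenue per 1,000 plays, USD

extra_plays_per_month = monthly_plays * boost
extra_revenue_per_year = extra_plays_per_month / 1000 * cpm * 12
print(f"${extra_revenue_per_year:,.0f}")  # → $2,160,000 (~$2.2M per year)
```

At that assumed CPM, 9 million extra plays per month works out to roughly the $2.2M annual figure quoted.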
The elusive video search whereby you can search video image context is now possible with advanced technologies like deep learning. It’s very exciting to see video SEO becoming a reality thanks to amazing algorithms and massive computing power. We truly can say a picture is worth 1,000 words!
Content creators have long fantasized about video search. For many years, major engineering challenges were a roadblock to comprehending video images directly.
Video visual search opens up a whole new field where video is the new HTML. And, the new visual SEO is what’s in the image. We’re in exciting times with new companies dedicated to video visual search. In a previous post, Video Machine Learning: A Content Marketing Revolution, we demonstrated image analysis within video to improve video performance. After one year, we’re now embarking on video visual search via deep learning.
Behind the Deep Curtain
Video clipping powered by KRAKEN video deep learning. Identify relevance within video images to drive higher plays
Many research groups have collaborated to push the field of deep learning forward. Advanced image-labeling repositories like ImageNet have elevated the field. The ability to take video, identify what's in its frames, and apply descriptions opens up a huge set of visual keywords.
What is deep learning? It is probably the biggest buzzword around, along with AI (artificial intelligence). Deep learning grew out of advanced mathematics for processing large data sets, loosely inspired by the way the human brain works. The brain is made up of billions of neurons, and we have long attempted to mimic how they work. Previously, only humans and a few other animals could do what machines can now do. This is a game changer.
The evolution of what's called a Convolutional Neural Network, or CNN, the workhorse of deep learning, was driven by thought leaders like Yann LeCun (Facebook), Geoffrey Hinton (Google), Andrew Ng (Baidu), and Li Fei-Fei (director of the Stanford AI Lab and creator of ImageNet). Now the field has exploded, and all the major companies have open-sourced their deep learning platforms for running convolutional neural networks in various forms. In an interview with the New York Times, Fei-Fei said, "I consider the pixel data in images and video to be the dark matter of the Internet. We are now starting to illuminate it." That was back in 2014. For more on the history of machine learning, see Roger Parloff's post at Fortune.
KRAKEN video deep learning Images for high video engagement
Image reduction is key to video deep learning; image analysis is achieved through heavy number crunching. Image: Chase McMichael.
Think about this: video is a collection of images linked together and played back at 30 frames per second. Analyzing that massive number of frames is a major challenge.
As humans, we see video all the time, and our brains process those images in real time. Getting a machine to do the same task at scale is not trivial. Machines processing images is an amazing feat, and doing it on real-time video is even harder: you must decipher shapes, symbols, objects, and meaning. For robotics and self-driving cars, this is the holy grail.
Creating a video image classification system required a slightly different approach: you must first handle the enormous number of single frames in a video file to understand what's in the images.
On September 28th, 2016, a seven-member Google research team announced YouTube-8M, built with state-of-the-art deep learning models. YouTube-8M consists of 8 million YouTube videos, equivalent to 500K hours of video, all labeled against 4,800 Knowledge Graph entities. This is a big deal for the video deep learning space. YouTube-8M's scale required pre-processing the images to pull frame-level features first; the team used the Inception-V3 image annotation model trained on ImageNet. What makes this such a great thing is that we now have access to a very large video labeling system, and Google did the massive heavy lifting to create it.
Google 8M Stats Video Visual Search
Top-level numbers for YouTube-8M. Image: Chase McMichael.
The secret to handling all this big data was reducing the number of frames to be processed. Extracting frame-level features at one frame per second creates a manageable data set of 1.9 billion video frames. At that size you can train a TensorFlow model on a single graphics processing unit (GPU) in one day! In comparison, processing the full data set would have required a petabyte of video storage and 24 CPUs of computing power running for a year. It's easy to see why pre-processing was required: frame segmenting turned video image analysis into a manageable problem.
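The frame arithmetic is easy to check: sampling 500K hours of video at one frame per second lands right around the reported 1.9 billion frames, a 30x reduction versus processing every frame.

```python
# Sanity-checking the YouTube-8M sampling numbers.
hours_of_video = 500_000
frames_per_second = 1            # features extracted at 1 fps, not 30 fps

sampled_frames = hours_of_video * 3600 * frames_per_second
full_rate_frames = hours_of_video * 3600 * 30   # what 30 fps would require

print(f"{sampled_frames:,}")     # → 1,800,000,000 (≈ the reported 1.9B)
print(f"{full_rate_frames // sampled_frames}x reduction")  # → 30x reduction
```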
Google has beautifully delivered two parts of the video deep learning trifecta. First, it opened up a video-based labeling system (YouTube-8M), giving the whole industry a leg up in analyzing video; without a labeling system like ImageNet, you would have to do that insane visual analysis on your own. Second, Google open-sourced TensorFlow, its deep learning platform, creating a perfect storm for video deep learning to take off. This is why some call it an artificial intelligence renaissance. The third part of the trifecta is access to a big data pipeline. For Google this is easy, as it has YouTube. Companies that create large amounts of video, or host user-generated video, will benefit greatly.
The deep learning code and hardware are becoming democratized, and it's all about the visual pipeline. Having access to a robust data pipeline is the differentiator. Companies that have the data pipeline will create a competitive advantage from this trifecta.
Following Google's lead with TensorFlow, Facebook launched its own open AI effort, FAIR, followed by Baidu. What does this all mean? The visual information disruption is in full motion. We're at a unique moment when machines can see and think. This is the next wave of computing. Video SEO powered by deep learning is on track to be what keywords are to HTML.
Visual search is driving opportunity and lowering technology costs to propel innovation. Video discovery is no longer bound by what's in a video description (the meta layer). Use cases for deep learning range from medical image processing to self-flying drones, and that is just a start.
Deep learning will have a profound impact on our daily lives in ways we never imagined.
Both Instagram and Snapchat use sticker overlays based on facial recognition, and Google Photos sorts your photos better than any other app out there. Now we're seeing purchases linked with object recognition at Houzz, which leverages product identification powered by deep learning. The future is bright for deep learning and content creation. Very soon we'll see artificial intelligence producing and editing video.
How do you see video visual search benefiting you, and what exciting use cases can you imagine?
Feature image is a YouTube-8M web interface screenshot taken by Chase McMichael on September 30th.
Deep learning, image recognition, and object recognition are core elements of intelligent video visual analysis. Understanding context and classification within video creates a strong use case for video deep learning. Digital video is exploding, yet few are leveraging the wealth of data it contains or harnessing visual analysis. A truly reinforced deep learning system, using collective human intelligence linked with neural networks, provides the foundation for a new level of video insight. We're just at the beginning of intelligent video and of using this knowledge to improve video performance.
Chase McMichael talk at ACM on Hacking Video Via Deep Learning Photo: Sophia Viklund
How and why did "ad tech" become a bad word? Ad tech has become associated with, and blamed for, everything from damaging the user experience (slow load rates) to creating a series of tolls that the advertiser pays for, ultimately at the expense of publisher margins. Global warming has a better reputation. Even the VCs are investing more in marketing tech than in ad tech.
The Lumascape is denser than ever and, even with consolidation, it will take years before there is clarity. And the newest threats to the ad ecosystem, like viewability, bots, and ad blocking, will continue to motivate scores of new "innovative" companies to help solve these issues. This is in spite of the anemic valuations ad tech companies are currently seeing from Wall Street and venture firms. The problem is that the genesis of almost all of these technologies begins with the race for the marketing dollar while the user experience remains an afterthought. A wise man once said, "Improve the user experience and the ad dollars will follow." Few new companies are born out of this philosophy. The ones that are, like Facebook, Google, and Netflix (see how Netflix does A/B testing), are massively successful.
One of the initial promises for publishers to engage their readers on the web was to provide an “interactive” experience—a two-way conversation. The user would choose what they wanted to consume, and editors would serve up more of what they wanted resulting in a happier, more highly engaged user. Service and respect the user and you—the publisher—will be rewarded.
This is what my company does. We have been trying to understand why the vast majority of users don’t click on a video when, in fact, they are there to watch one! How can publishers make the experience better? Editors often take great care to select a thumbnail image that they believe their users will click on to start a video and then…nothing. On average, 85% of videos on publishers’ sites do not get started.
We believe that giving the user control and choice is the answer to this dilemma. So we developed a patented machine learning platform that responds to the wisdom of the crowds by serving up thumbnail images from publisher videos that the user—not the editor—determines are best. By respecting the user experience with our technology, users are 30% more likely to click on videos when the thumbnails are user-curated.
What does this mean for publishers? Their users have a better experience because they are actually consuming the most compelling content on the site. Nothing beats the sight, sound and motion of the video experience. Their users spend more time on the site and are more likely to return to the site in the future to consume video. Importantly from a monetization standpoint, InfiniGraph’s technology “KRAKEN” creates 30% more pre-roll revenue for the publisher.
We started our company with the goal of improving the user experience, and as a result, monetization has followed. This, by the way, enables publishers to create even more video for their users. There are no tricks. No additional load times. No videos that follow you down the page to satisfy the viewability requirements for proposals from the big holding companies. Just an incredibly sophisticated machine learning algorithm that helps consumers have a more enjoyable experience on their favorite sites. Our advice? Forget about “ad tech” solutions. Think about “User Tech”. The “ad” part will come.
The live example above demonstrates KRAKEN in action on the movie trailer "Interstellar," achieving a 16.8X improvement over the traditional static thumbnail image.
Deep Learning Methods Within Video: An End-Game Application. We'll explore the use cases of deep learning for driving higher video views. The coming Valhalla of video deep learning is being realized in visual object recognition and image classification within video. Mobile video has transformed, and continues to transform, the way video is distributed and consumed.
We're witnessing the largest digital land grab in video history. Mobile video advertising is the fastest-growing segment, projected to account for $25 billion in ad spend by 2021. Deep learning and artificial intelligence are also growing within the very companies jockeying for your cognitive attention. This confluence of video and deep learning has created a new standard of higher-performing video content, driving greater engagement, views, and revenue. In this post we'll dive deep into how video intelligence is changing the mobile video game. Studies show that tablet and smartphone viewing accounted for nearly 40 minutes of daily viewing in 2015, with mobile video continuing to dominate in 2016. Moreover, digital video is set to outpace TV for the first time, and social video (Instagram, Snapchat) is experiencing explosive growth.
The Interstellar trailer is a real example of KRAKEN in action, achieving a 16X improvement in video starts. Real-time A/B testing between the poster image (thumbnail) and images pulled from the visual training set provides simultaneous measurement of which images induce engagement. All data and actions feed the video machine learning (KRAKEN) algorithm, enabling real-time optimization and sequencing of the right images to achieve the maximum human engagement possible.
How it works
Processing video at large scale and learning from it requires advanced algorithms designed to ingest real-time data. We have now entered the next phase of data insights, going beyond the click and the video play. Video opens the door to consumption habits, and applying machine learning to them creates a competitive advantage.
Consumer experience and time on site are paramount when video is the primary revenue source, as it is for most broadcasting and over-the-top (OTT) sites today, including Netflix, Hulu, Comcast X1, and Amazon. Netflix has already put into production its own system for updating poster images to improve play starts, discovery, and completions.
It’s All Math
Images with higher object density have proven to drive higher engagement. The graph demonstrates that images with high entropy (explained in this video) generated the most attraction. Knowing which images produce a cognitive response is fundamental for video publishers looking to maximize their video assets.
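Entropy here is ordinary Shannon entropy over the pixel-value distribution: busier images spread probability over more pixel values and score higher. A minimal pure-Python version, as our illustration rather than the production metric:

```python
# Shannon entropy of a grayscale "image" (a flat list of 0-255 pixel values).
from collections import Counter
from math import log2

def image_entropy(pixels):
    """Entropy in bits of the pixel-value distribution (always >= 0)."""
    total = len(pixels)
    counts = Counter(pixels)
    return abs(sum(-(c / total) * log2(c / total) for c in counts.values()))

flat = [128] * 1024                      # a constant gray image
busy = list(range(256)) * 4              # every pixel value equally represented

print(image_entropy(flat))   # → 0.0 bits: no visual information
print(image_entropy(busy))   # → 8.0 bits: the maximum for 8-bit pixels
```

A real system would compute this per candidate frame and prefer high-entropy frames as thumbnail candidates.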
The top three video priorities we're hearing from customers:
1) Revenue is very important, and showing more video increases revenue (especially during peak hours when inventory is already sold out)
2) More video starts means more user time on site
3) Mobile is becoming very important. Increasing mobile video plays is a top priority.
While this is good news overall, it presents a number of new challenges for video publishers in 2016. One challenge is giving consumers access to content on their terms across many touchpoints. Video consumption increasingly happens through multiple entry points throughout the day, and those entry points, by their very nature, have context.
Broadcasters and publishers must treat consumer visual consumption as a key insight. These eyeballs (neurons firing) are worth billions of dollars, but it's no longer a game of looking at web logs. More advanced image analysis, determining which images work with customers, requires insight into consumers' video consumption habits. For digital broadcasters, enabling intelligence where the consumer engages isn't new. Deep convolutional neural networks power the image identification and other prioritization algorithms. More details are in the main video.
Visual consumer engagement tracking is not something random. Tracking engagement on video has been done for many years, but when it came to “what” within the video, there was a major void. InfiniGraph created KRAKEN to fill that void with video deep learning, enabling machine learning within the video to optimize which images are shown to achieve the best response rates. Interstellar’s 16X boost is a great example of using KRAKEN to drive higher click-to-launch for autoplay on desktop and click-to-play on mobile, resulting in higher revenue and greater video efficiency. Think of KRAKEN as the Optimizely for video.
One question that comes up often is: “Is the image rotation the only thing causing people to click play?” The short answer is NO. Rotating arbitrary images is annoying and distracting. KRAKEN finds what the customer likes first and then sequences the images based on measurable events. The right set of images is everything. Once you have the right images you can then find the right sequence and this combination makes all the difference in maximizing play rates. Not using the best visuals will cause higher abandonment rates.
Further advances in deep learning are opening the doors to continuous learning and self-improving systems. One area we’re very excited about is visual prediction and recommendation for video. We see a great future in mapping collective human cognitive response to visuals that stimulate and create excitement. Melding the human mind with video intelligence is the next phase for publishers to deliver a better consumer experience.
Chase McMichael, NAB VIDEO Intro – Top Video Platforms and Video Machine Learning made a big splash at NAB 2016.
The event was all about digital video, video production, VR, drones and every other technology you could imagine. Think of NAB as the CES of digital and video broadcasting. Everywhere you looked there was drone technology, robotics and even a full area dedicated to VR. The future of video publishing is bright for sure as new technology simplifies quality capture and distribution. We took the time to connect with some of our video platform partners at NAB. Our one-on-one interviews were with Ooyala, Brightcove, and Kaltura. Each video platform provided a comprehensive walkthrough of their latest development and demos. What stood out the most was the big push in Over The Top (OTT) support for broadcasters. OTT was a big theme for many video platforms, and all showed amazing on-demand video technology. Everyone has seen the Netflix and Hulu interfaces, and the platforms are now getting serious about OTT. Visuals are everything in OTT interfaces, and using the power of intelligence is a key differentiator. Netflix identifies this fact in “Selecting the best artwork for videos through A/B testing”.
The consumer has gone mobile in a big way, and digital video is taking on TV. Consumers want access to on-demand video wherever they are and on their terms. User experience was a big draw, too. There is no question that lines have been drawn, with rumblings of opening up the set-top box and unbundling the TV. Apple TV and Roku started to look like yesteryear technology compared with the OTT interfaces and mobile native app interfaces being demoed. Brightcove released OTT Flow, a very exciting interface for a video library, and we got a first-hand view of a super-slick mobile interface for digital video consumption. Kaltura also showed off what they built for Vodafone. The video platforms seem well positioned to service a TV Everywhere strategy and feed into the Apple TV and Roku devices.
Another part of the demonstrations on each platform was 360-degree video support. Each player had mouse controls, and Ooyala demonstrated a split-screen view supporting Google Cardboard. There is an exciting future in VR content, and everyone is waiting to see what comes out from a content perspective. Beyond linear video, immersive storytelling has a great future, and we hope that technology doesn’t encumber adoption or create friction in the experience. Player loading speed, streaming efficiency, and low buffer rates have always been major competitive advantages when video publishers evaluate platforms.
A big topic was Apple’s HLS streaming standard and the hls.js player library; MPEG-DASH was also discussed at various booths. All players support HTML5, with a focus on migrating customers away from the old Adobe Flash technology. Every platform demonstrated the use of hls.js/HTML5. Kaltura showed a real-time side-by-side comparison with an impressive 50% improvement in HTML5 player load speed. Improving load time and streaming will continue to benefit the mobile web and autoplay world. Video is everywhere, and customers are demanding more of it. All video publishing platforms had very well organized video management and publishing capabilities. The big takeaways are that the platforms are focused on simplifying publishing and on handling a large volume of video with greater intelligence built in. Obviously, this is important when serving video and creating a better video viewing experience. Here are the top 4 most-mentioned attributes across the platforms:
Availability - percentage of times video playback starts successfully
Start Up Time - time between the play button click and playback start
Rebuffers - number of times and the duration of interruptions due to re-buffering
Bitrate - average bits per second of video playback. The higher the bitrate, the better the experience
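The four metrics above can be computed directly from player beacon logs. A minimal sketch follows; the field names and sample numbers are our own assumptions, not any platform’s actual telemetry schema:

```python
from dataclasses import dataclass

@dataclass
class PlaybackSession:
    """One play attempt, as logged by a hypothetical player beacon."""
    started: bool          # did playback actually begin?
    startup_ms: int        # play click -> first frame (valid only if started)
    rebuffer_count: int    # number of stall events during playback
    rebuffer_ms: int       # total stalled duration
    avg_bitrate_kbps: int  # mean delivered bitrate

def qoe_summary(sessions):
    """Aggregate the four commonly cited quality-of-experience metrics."""
    started = [s for s in sessions if s.started]
    n = len(started)
    return {
        "availability_pct": 100.0 * n / len(sessions),
        "avg_startup_ms": sum(s.startup_ms for s in started) / n,
        "total_rebuffers": sum(s.rebuffer_count for s in started),
        "total_rebuffer_ms": sum(s.rebuffer_ms for s in started),
        "avg_bitrate_kbps": sum(s.avg_bitrate_kbps for s in started) / n,
    }

log = [
    PlaybackSession(True, 400, 1, 1500, 2800),
    PlaybackSession(True, 250, 0, 0, 3200),
    PlaybackSession(False, 0, 0, 0, 0),        # failed start
    PlaybackSession(True, 550, 2, 4000, 2400),
]
summary = qoe_summary(log)
```

With the sample log above, availability is 75% (3 of 4 attempts started) and average startup time is 400 ms across the successful sessions.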
All of our conversations centered around using intelligence in thumbnail selection and the process of integration. KRAKEN video machine learning has a bright future with the onslaught of OTT platforms offering more video carousels and indexes as part of the central interface for video discovery. Next up is video prediction (recommendation) and using data to make smarter decisions about what to watch next. There are some very positive results coming from companies like Iris.tv and JW Player. Look for our next post, coming from Streaming Media East, and catch more in our latest podcast, “Thumbnails are part of a Video Marketing Strategy”.
VIDEO – Better User Experience, Time on Site and Converting Readers into Viewers.
Video Optimization With Machine Learning is now a reality and publishers are intelligently making the most out of their O&O digital assets. The digital video industry is undergoing a transformation and machine learning is advancing the video user experience. Mobile, combined with video, is truly the definitive on-demand platform making it the fastest growing sector in digital content distribution.
Video machine learning is a new field. The ability to crowdsource massive human interactions on video content has created a new dataset; we’re tapping into a small part of the human collective consciousness for the first time. Publishers and media broadcasters are now going beyond video views, clicks, and completions to gain insight into the video objects, orientations, and types of movement that induce a positive cognitive response. This human cognitive response is the ultimate measure of relevance, with humans interacting with video in a much more profound way. In this article, we will dive deep into the four drivers of video machine learning.
Video by its nature is linear; however, several companies are working to personalize the video experience as well as make it live. We’re now in an age where the peak of hype around virtual reality and augmented reality promises the most immersive experience yet. All of these forms of video have two things in common: moving sights and sound. Humans by nature prefer video because this is how we see the world around us. The bulk of video consumed globally is designed as a linear body of work that tells a story. That a video is just a series of images connected together is not something people think much about. In the days of film, seeing a real film strip from a movie reel made it obvious that each frame was in fact a still image. Fast forward to today: digital video still has frames, but those frames are made up of 1’s and 0’s. “Digital” opens the door to advanced mathematics and image/object-recognition technologies that can process those frames into far more meaning than a static picture.
It’s hard to believe how important images really are. For videos placed “above the fold,” you have to wonder why so many videos have such a low play rate to begin with (video start CTR). Consumers process objects in images within 13 milliseconds (0.013 seconds). That’s FAST! Capturing cognitive attention has to happen extremely quickly for a human to commit to watching a video, so the first image is important, but not everything; more than one image is sometimes required to assure a positive cognitive response. The reality is that people are flat-out dismissive, and many decide not to play the video. This is evident when you have a 10% CTR, which means 90% of your audience OPTED OUT OF PLAYING THE VIDEO. What happened? The first image may have been great, but it didn’t create a full mental picture of what was possible in the linear body of work. You’re never going to get 100% play rates; however, providing greater cognitive stimulation that builds relevance gives consumers greater reason to commit time to watching a linear piece of video.
Machine Learning and Algorithms
In the last 4 years, machine learning / artificial intelligence has exploded with new algorithms and advanced computing power has greatly reduced the cost of complex computations. Machine learning is transforming the way information is being interpreted and used to gain actionable insights. With the recent open sourcing of TensorFlow from Google and advances in Torch from Facebook, these machine learning platforms have truly disrupted the entire artificial intelligence industry.
Feature extraction and classification are key to learning what, within the image, is achieving a positive response.
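Production systems use deep convolutional networks for this, but the core idea of feature extraction can be shown with a toy example: convolving a frame with fixed edge filters (crude stand-ins for a CNN’s learned first-layer filters) and reducing the response to a scalar feature. This is purely our illustration, not KRAKEN’s actual pipeline:

```python
import numpy as np

def conv2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 2-D sliding-window filter over a single channel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

# Sobel kernels approximate horizontal/vertical intensity gradients
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def edge_energy(gray: np.ndarray) -> float:
    """One scalar 'feature': mean gradient magnitude across the frame."""
    gx = conv2d(gray, SOBEL_X)
    gy = conv2d(gray, SOBEL_Y)
    return float(np.sqrt(gx ** 2 + gy ** 2).mean())

# A frame with a strong vertical edge vs. a featureless frame
edge_frame = np.zeros((32, 32)); edge_frame[:, 16:] = 255.0
flat_frame = np.full((32, 32), 128.0)
```

A real CNN learns thousands of such filters from data and stacks them in layers; the principle, turning raw pixels into response maps a classifier can act on, is the same.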
Major hardware providers, such as NVIDIA, have ushered in massive advancements in machine learning and AI that would otherwise have been out of reach. The democratization of machine learning is now opening the doors for many small teams to propel product development around meaningful algorithmic approaches.
The unique properties of digital video, specifically in a consumer’s mobile feed delivered from a video publishing site, create a perfect window into how consumers snack on content. If you want to see hyper-snacking, ride a train into a city or watch kids on their smartphones. Digital content consumption has never been as interactive as it is now. All digital publishers and broadcasters have to ask themselves, “How is my content going to get traction with this type of behavior?” If your audience is Snapchatters, YouTubers, or Instagrammers, you’re going to have to provide more value in your content V I S U A L L Y or you will lose them in a split second.
Graphs – Video views (mobile, KMView / desktop, KDView) vs. minutes in a day (1440 min = 24 hrs). Mobile dominates the weekend, whereas during the work week usage skyrockets during the commute and after work. Is your video content adapting to this behavior?
Video Publishing Conundrum
A big conundrum is why people are not playing videos, and it required further investigation. We found that the lead image (the old-school “thumbnail,” or “poster image”) had a huge impact on producing a cognitive response. In the mobile world, video play is still a consumer-driven response, and we hope it stays a click-to-play world; we believe consumer choice and control will always win the day. For video publishers under the revenue gun, the risk is that consumers quickly tire of native-ad content tricks, in-stream (autoplay) video, and the bludgeoning force-feeding of video on the desktop. No wonder ad blocking is at an all-time high! A whole industry is cropping up around blocking ads, and it’s an all-out war. The sad part is that the consumer is stuck in the middle.
Many publishers use desktop video autoplay to reduce friction; however, the front of the page, the video carousel, and the gallery are click-to-launch environments, making the images on the published page even more important. Those fronts drive more traffic than any social-share amplification. As for mobile video, it’s still a click-to-play world for the majority of broadcasters and publishers. Video is the most engaging vehicle at their disposal, which is why so many publishers are pushing themselves to create more video content. Publishing more video-oriented content is great; however, not knowing what consumers respond to emotionally has been a major gap. A post-and-pray, or post-and-measure-later, approach is currently prevalent throughout the publishing industry.
Video Quality matters
Creating a better consumer experience is everything if you want your content consumed in days when autoplay is rampant and content is force-fed, and brands increasingly demand measured engagement. Video engagement quality is measured by starts, time spent on the video, and physical actions taken. Capturing human attention is very hard amid so many distractions, especially on a mobile device. We’re in a phase where the majority of connected humans are digital natives swimming in a digital deluge, and attention deficit is at an all-time high. With less than 0.25 seconds to get consumers to engage before they have formulated the video’s story line in their mind, the task is hard. A quick peek at the video thumbnail, a fast read of a headline, and a glance at some keywords could be all that stands between you and a revenue-generating video play. People are pressed for time and unwilling to commit to a video play unless it induces a real cognitive response. Converting readers into video viewers is important; keeping them is even more important.
Mobile Video and Machine Learning
Mobile is becoming the prevalent method of on-demand video access. The combination of video and mobile is explosive, likely the most powerful marketing conduit ever created. Here we have investigated how machine learning algorithms applied to images can provide real-time insight and decision support to catch the consumer’s attention and capture video yield that would otherwise be lost. The big challenge with video is that it is created in a linear format, loaded into a CMS, put up for publishing, and then the publisher prays it gets traction. Promotion helps and placement matters; however, there is really nothing a publisher can do to adjust the video content once it’s out. Enter video intelligence. The ability to measure video engagement in real time is a game changer. Enabling intelligence within video seems intuitive; however, the complexity of encoding and decoding video has created a sufficient barrier to entry that this area of video intelligence has been otherwise untapped.
How and Why KRAKEN Works
Here we dive deep into how consumers interact with certain visual objects to create a positive response before a video is played. InfiniGraph invented a technology called KRAKEN that shows a series of images; this “image rotation” is not really new. What’s new is the selection of those images using machine learning algorithms, allowing us to adjust the images to achieve the highest human response possible.
GRAPH – Lift by KRAKEN on mobile (KMLIFT) vs. desktop (KDLIFT) on the same day. Note the groupings before and after lunch had an overall higher boost from KRAKEN; we attribute this behavior to less distraction.
As more images are processed by KRAKEN, the system becomes smarter, selecting better lead images and driving higher video efficiency. Choosing the best order in which to sequence those images is another part of the learning mechanism. An image sequence is a collection of 1 to 4 images, selected based on KRAKEN’s ranking linked to human actions. Visuals that achieve the highest degree of engagement receive a higher KRAKEN rank. The sequence itself also creates a visual story, maximizing the limited time available to capture a consumer’s attention.
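The ranking step described above can be sketched in a few lines: score each candidate frame by its observed play rate and keep the top 1-4 for the rotation. The image IDs and counts are hypothetical, and this is a deliberately simplified stand-in for KRAKEN’s actual (unpublished) ranking:

```python
def rank_thumbnails(stats: dict, max_images: int = 4) -> list:
    """Order candidate lead images by observed play rate
    (plays / impressions) and keep the top `max_images` for the sequence."""
    ranked = sorted(
        stats.items(),
        key=lambda item: item[1]["plays"] / item[1]["impressions"],
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:max_images]]

# Hypothetical engagement counts per candidate frame
stats = {
    "frame_012": {"plays": 90, "impressions": 1000},   # 9.0% play rate
    "frame_144": {"plays": 40, "impressions": 1000},   # 4.0%
    "frame_201": {"plays": 120, "impressions": 1000},  # 12.0%
    "frame_305": {"plays": 10, "impressions": 1000},   # 1.0%
    "frame_410": {"plays": 70, "impressions": 1000},   # 7.0%
}
sequence = rank_thumbnails(stats)  # best 4 frames, strongest first
```

A production system would also weight recency, placement, and audience segment rather than raw play rate alone, but the shape of the decision is the same.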
KRAKEN in Action
KRAKEN determines the best possible thumbnails for any video using machine learning and audience testing. Once it finds the top 1-4 images, it rotates through them to further increase click-to-play rates. It also A/B tests against the original thumbnail to continually show its benefits. Here are 2 real examples:
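The continuous A/B test against the original thumbnail can be sketched as a Thompson-sampling bandit, which shifts traffic toward the better variant as evidence accumulates. This is our illustration only: KRAKEN’s actual testing method isn’t published, and the 2% vs. 6% click-to-play rates below are simulated, not real data:

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated true click-to-play rates for each variant (made-up numbers)
true_ctr = {"original_thumb": 0.02, "kraken_rotation": 0.06}

# Beta(1, 1) priors over each variant's unknown click-to-play rate
wins = {arm: 1 for arm in true_ctr}
losses = {arm: 1 for arm in true_ctr}
shown = {arm: 0 for arm in true_ctr}

for _ in range(5000):  # one impression per round
    # Sample a plausible CTR for each variant; show the one that samples highest
    arm = max(true_ctr, key=lambda a: rng.beta(wins[a], losses[a]))
    shown[arm] += 1
    if rng.random() < true_ctr[arm]:   # simulated viewer clicked play
        wins[arm] += 1
    else:
        losses[arm] += 1
```

Unlike a fixed 50/50 split test, the bandit keeps proving the winner while wasting fewer impressions on the losing variant, which matters when every impression is potential ad revenue.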
KRAKEN thumbnails with 273% lift below. What makes a good video lead image unique? We’re asked this question all the time. Why would someone click on one image versus another? The answers are extremely context- and content-dependent. The number of visual objects in the frame has a great deal to do with how humans determine relevance and with inducing intrigue or desire. The human brain sees shapes first, in black and white; color is a later response, though red has its own visual alerting system. The brain can process vast sums of visual information quickly, and digital real estate on mobile and desktop can be vastly different. A great example is what we call information packaging: a smaller image on a mobile phone may support only 2 or 3 visual objects that a human will quickly recognize and respond to positively, whereas desktop can support up to 5. Remember, one size doesn’t fit all, especially in mobile video. KRAKEN thumbnails with 217% lift to the left. Trick your brain: black and white photo turns to colour! – Colour: The Spectrum of Science – BBC
4 drivers of video machine learning
Who benefits from video machine learning? The consumer benefits most, through a better experience: a more visually accurate compilation of the video content’s best moments. It’s critical that people get a sense of the video so they commit to playing it and stick around. The publisher or broadcaster, of course, benefits financially from more video consumption and higher social sharing.
Color depth: remember, bright colors don’t always yield the best results; visuals that depict action or motion elicit a higher response. The background can greatly alter color perception, so images with a complementary background let the human eye pick up the colors that best represent the subject, creating greater intrigue.
Image sequencing: Sequencing wrong or bad images together doesn’t help; it turns consumers off. The right collection is everything and can be 1 to 4 images. Knowing when to alter or shift the sequence is key to obtaining the highest degree of engagement. The goal is to create a visual story that improves the consumer experience.
Visual processing: The human brain processes vast amounts of visual information quickly, but digital real estate differs between mobile and desktop. A great example is what we call “information packaging”: a smaller image on a mobile phone screen may support only 2 or 3 visual objects in view that humans can quickly recognize and respond to positively, whereas desktop can support up to 5. One size doesn’t fit all, especially in mobile video.
Object classification: Understanding what’s in an image and classifying those images provides a library of top-performing images. With the right classification, these images create a unique dataset for use in recommendation and prediction. Knowing what’s in the image is just as important as knowing it was acted on.
The first impression is everything (or maybe the second or third, if you are showing a sequence of images). For publishers and digital broadcasters, adapting to their customers’ content consumption preferences and being on the platforms that yield the most will be an ongoing saga. Nurturing your audience and perpetuating their viewing experience will be key as more and more consumers move to mobile. KRAKEN is just the start of using machine learning to create a better user experience in mobile video. We see video intelligence expanding into prediction and VR/AR in the not-too-distant future. As this unique dataset expands, we look forward to getting your feedback on other exciting use cases and to finding ways to increase the overall yield on your existing video assets.
Tell us what you think and where you see mobile video going in your business.