Video discovery is one of the best ways to increase video lifetime value. Learning what video content is relevant increases time on site.
All video publishers are looking to increase their video’s lifetime value. Creating video can be expensive and the shelf life of most video is short. Maximizing those video assets and their lifetime value is a top priority. With the advent of new technologies such as Video Machine Learning, publishers can now increase their video’s lifetime value by intelligently generating more time on site. Identifying the best image to lead with (thumbnail) and recommending relevant videos drive higher lifetime value through user experience and discovery.
This combination of visual identification and recommendation is like the Reese’s of video. By linking technologies like artificial intelligence and real-time video analytics, we’re changing the video game through automated actionable intelligence.
Ryan Shane, our VP of Sales, describes the advantages of knowing what visual (video thumbnail) (context) produces the most engagement and what video business models benefit the most from video machine learning.
Hear from our CEO, Chase McMichael, who talks about the advanced use of machine learning and deep learning to improve video take rates by finding and recommending the right images consumers engage with the most.
Here are two examples of how video machine learning increases revenue on your existing video assets.
Yield Example #1: Pre-roll
If you run pre-roll on your video content, you likely fill it with a combination of direct sales and an RTB network. For this example, assume you have a 10% CTR, which translates to 1 million video plays each day. That means that you are showing 1,000,000 pre-roll ads each day. Now assume that you run KRAKEN on your videos, and engagement jumps by 30% to a 13% CTR. That means that you will be showing 1,300,000 pre-roll ads each day. KRAKEN has effectively added 300,000 pre-roll spots for you to fill! This is an example of increasing the video value from your existing consumers.
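The arithmetic in this example can be checked with a short sketch. The figure of 10 million daily thumbnail impressions is an assumption, inferred from a 10% CTR producing 1,000,000 plays; the function name is hypothetical and only illustrates the calculation.

```python
def additional_plays(impressions: int, base_ctr: float, lift: float) -> tuple[int, int]:
    """Return (baseline plays, extra plays gained from a relative CTR lift)."""
    base_plays = round(impressions * base_ctr)
    boosted_plays = round(impressions * base_ctr * (1 + lift))
    return base_plays, boosted_plays - base_plays


# Assumed: 10,000,000 daily impressions (implied by 10% CTR -> 1,000,000 plays).
daily_impressions = 10_000_000
base, extra = additional_plays(daily_impressions, base_ctr=0.10, lift=0.30)
print(f"baseline pre-roll spots: {base:,}")
print(f"additional pre-roll spots: {extra:,}")
```

Note that a 30% relative lift on a 10% CTR gives a 13% CTR, which is what yields the 300,000 additional spots.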
Yield Example #2: Premium Content
For our second example, assume you monetize with premium content. You have an advertising client who has given you a budget of $100,000 and expects their video to be shown 5 million times. With your current play rates, you determine it will take four days to achieve that KPI. Instead, you run KRAKEN on their premium content, and engagement jumps 2X. You will hit your client’s KPI in only two days. You now have freed up two days of premium content inventory that you can sell to another client! Maximizing your existing video consumers and increasing CTR reduces the need to sell off network.
Below is a side-by-side example of the Guardians of the Galaxy default thumbnail vs. KRAKEN rotation powered by Deep Learning. Boosting click rates generates more primary views, and it is logical to insert images known to induce a response into a video recommendation (Reese’s). The two together now drive primary and secondary video views.
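The same kind of back-of-the-envelope check works for the premium content example. This is a minimal sketch: the daily play rate is derived from the stated four-day estimate, and the function name is hypothetical.

```python
import math


def days_to_kpi(target_plays: int, plays_per_day: float) -> int:
    """Whole days needed to reach a play target at a given daily rate."""
    return math.ceil(target_plays / plays_per_day)


target = 5_000_000
current_rate = target / 4  # four days at current play rates, per the example

baseline_days = days_to_kpi(target, current_rate)
boosted_days = days_to_kpi(target, current_rate * 2)  # engagement jumps 2X
print(f"days freed for another client: {baseline_days - boosted_days}")
```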
As you can see from both examples, using KRAKEN actually increases lifetime value as well as advertising yield from your video assets. Displaying like base content sorted by Deep Learning and video analytics by category delivers greater relevance. Organizing video into context is key to increasing discovery. Harnessing artificial intelligence with image selection and recommendation brings together the best of both digital video intelligent worlds.
Bite into a Reese’s and see how you can increase your video lifetime value. Request a demo and we’ll show you.
Video viewability is a top priority for video publishers who are under pressure to verify that their audience is actually watching advertisers’ content. In a previous post How Deep Learning Video Sequence Drives Profits, we demonstrated why image sequences draw consumer attention. Advanced technologies such as Deep Learning are increasing video Viewability through identifying and learning which images make people stick to content. This content intelligence is the foundation for advancing video machine learning and improving overall video performance. In this post, we will explore some challenges in viewability and how deep learning is boosting video watch rates.
Side by Side Default Thumbnail vs. KRAKEN Rotation powered by Deep Learning
In the two examples above, which one do you think would increase viewability? The video on the right has images selected by deep learning and automatically adjusted image rotation. It delivered a whopping 120% more plays than the static image on the left, which was chosen by an editor. Higher viewability is validated by the fact that the same video with the same placement at the same time achieved a greater audience take rate with images chosen by machine learning.
This boost in video performance was powered by KRAKEN, a video machine learning technology. KRAKEN is designed to understand what visuals (contained in the video) consumers are more likely to engage with based on learning. More views equals more revenue.
A/B testing is required when looking to verify optimization. For decades, video players have been void of any intelligence. They have been a ‘dumb’ interface for displaying a video stream to consumers. Without intelligence, the video player was just a bit pipe. Only basic measurements were taken, such as video starts, completes, and views, along with a few more advanced metrics such as how long a user watched. New thinking was required to be more responsive to the audience and take advantage of the images people react to. Increasing reaction increases viewability.
So how does KRAKEN do its A/B testing? The goal was to create the most accurate measurement foundation possible to test for the visuals consumers are more likely to engage with and to measure the crowd’s response to one image vs. another. KRAKEN implemented a 90/10 split of traffic, whereby 10% of traffic is shown the default thumbnail image (the control) and 90% is shown the KRAKEN-selected images. It is easy to see why testing video performance through A/B testing is possible: now that HTML5 is the standard and Adobe Flash has been deprecated, running A/B tests within video players has been further simplified.
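A 90/10 split like this is often implemented by hashing a viewer identifier into a bucket, so that the same viewer always sees the same variant. The sketch below is only an illustration of that general technique, not KRAKEN's actual implementation; the function names, the viewer ID format, and the play rates in the usage lines are all assumptions.

```python
import hashlib

CONTROL_SHARE = 0.10  # 10% of traffic sees the default thumbnail (the control)


def assign_variant(viewer_id: str) -> str:
    """Deterministically map a viewer to 'control' or 'kraken'."""
    digest = hashlib.sha256(viewer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "control" if bucket < CONTROL_SHARE else "kraken"


def play_rate(plays: int, impressions: int) -> float:
    """Plays per impression; 0.0 when there is no traffic yet."""
    return plays / impressions if impressions else 0.0


def relative_lift(control_rate: float, test_rate: float) -> float:
    """Relative improvement of the test variant over the control."""
    return (test_rate - control_rate) / control_rate


# Hypothetical counts, for illustration only.
control = play_rate(plays=500, impressions=10_000)
test = play_rate(plays=9_900, impressions=90_000)
print(f"lift over default thumbnail: {relative_lift(control, test):.0%}")
```

Comparing rates rather than raw play counts matters here because the two groups receive very different amounts of traffic.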
Making sure a video is “in view” is one thing, but the experience has a great deal to do with legitimate viewability. A bigger question is: Will a person engage and really want to watch? People have a choice to watch content. It’s not that complex. If the content is bad, why would anyone want to watch it? If the site is known for identifying or creating great content then that box can be checked off.
Understanding what visual(s) makes people tick and get engaged is a key factor to increase viewability. Consumers have affinities to visuals and those affinities are core to them taking action. Tap into the right images and you will enhance the first impression and consumer experience.
What is Visual Cognitive Loading?
How the brain recognizes objects – MIT neuroscientists find evidence that the brain’s inferotemporal cortex can identify objects. Visuals induce a human response, and using the right visuals increases attraction and attention. Photo: MIT
It is very hard to convey a video story with a single image. Yes, an image is worth a thousand words, but some people need more information to get excited. Video is a linear body of work that tells a story. Humans are motivated by emotion, intrigue and actions. Senses of sight and motion create a visual story that can be a turn on or turn off. Finding the right turn-on images that tell a story is golden. Identifying what will draw them into a video is priceless.
The human visual cortex is connected to your eyes via the optic nerve; it’s like a supercomputer. Your ability to detect faces and objects at lightning speed is also how fast someone can get turned off to your video. Digital expectations are high in the age of digital natives. For this very reason, the right visual impression is required to get a video to stick, i.e. “sticky videos”. If your video isn’t sticky you will lose massive numbers of viewers and be effectively ignored, just like “Banner Blindness”. The more visual information shown to a person, the higher the probability of inducing an emotional response. Cognitive loading thereby gives them more information about what’s in the video. If you’re going to increase viewability you have to increase cognitive loading. It’s all about whether the content is worthy of their time.
Why Deep Learning
Deep Learning layers of object recognition. Understanding what’s in the images is as valuable as the metadata and title. Photo: VICOS
The ability to identify which images work, and why, is a big deal compared with the previous method of “plug and pray”. Systems can now recognize what’s in the image, and linking that information back in real time with consumer behavior creates a very powerful learning environment for video. It’s now possible to create a hierarchical shape vocabulary for multi-class object representation, further expanding a meaningful data layer.
Quality video and accurate measurement are paramount when optimizing video. Many ask, why are KRAKEN images better? They are better because using deep learning to select the right starting images increases the probability of nailing the images that consumers will want to engage with. Over time, the system gets smarter and optimizes faster. A real-time active feedback mechanism is created, continuously adjusting and sending information back into the algorithm to improve over time.
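One generic way to build a feedback loop of this kind is an epsilon-greedy selector: mostly show the image with the best observed play rate, occasionally try another, and update the counts after every impression. The sketch below illustrates that general idea only. It is an assumption for illustration, not a description of the KRAKEN algorithm, and the class and method names are hypothetical.

```python
import random


class ThumbnailSelector:
    """Epsilon-greedy choice among candidate thumbnails (illustrative only)."""

    def __init__(self, image_ids, epsilon=0.1, seed=None):
        self.epsilon = epsilon  # fraction of traffic used for exploration
        self.rng = random.Random(seed)
        self.impressions = {image_id: 0 for image_id in image_ids}
        self.plays = {image_id: 0 for image_id in image_ids}

    def rate(self, image_id):
        """Observed play rate for one image; 0.0 before it has been shown."""
        shown = self.impressions[image_id]
        return self.plays[image_id] / shown if shown else 0.0

    def choose(self):
        """Explore a random image with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.impressions))
        return max(self.impressions, key=self.rate)

    def record(self, image_id, played):
        """Feed one impression and its outcome back into the counts."""
        self.impressions[image_id] += 1
        if played:
            self.plays[image_id] += 1


selector = ThumbnailSelector(["default", "frame_a", "frame_b"], epsilon=0.1, seed=7)
shown = selector.choose()
selector.record(shown, played=True)
```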
Because KRAKEN consists of consumer-curated actions, proactive video image selection is made possible. We make the assertion that optimized thumbnails result in more engaged video watchers, as proven by the increase in video plays. KRAKEN drives viewability and enables publishers to move premium O&O rates as a result.
Viewability or go home
After the Facebook blunder of “miscalculating video plays” (if you want to believe this was just a “mistake”) and other measurement stumbles, major brands have taken notice. A 3-second play in autoplay isn’t a play in a feed environment when audio is off, according to Rob Norman of Group M. The big challenge is that there really isn’t a clear standard, just advice on handling viewability from the IAB. However, the big media buyers like Group M are demanding more and requiring that half the video plays be click-to-play to meet their viewability standard. This is a wake-up call for video publishers to get very serious about viewability and for advertisers to create better content. All agree viewability is a top KPI when judging a campaign’s effectiveness. 2017 is going to be an exciting year to watch how advertisers and publishers work together to increase video viewability. See The state of video Ad viewability in 5 charts as the conversation heats up.
Beyond the deep learning hype, digital video sequence (clipping) powered by machine learning is driving higher profits. Video publishers use various images (thumbnails – poster images) to attract readers to watch more video. These “Thumbnail Images” are critical, and the visual information has a great impact on video performance. The lead visual in many cases is more important than the headline. More views equal more revenue; it’s that simple. Deep learning is having a significant impact, from video visual search to video optimization. Here we explore video sequencing and the power of deep learning.
Having great content is required, but if your audience isn’t watching the video then you’re losing money. Understanding what images resonate with your audience and produce higher watch rates is exactly what KRAKEN does. That’s right: show the right image, sequence or clip to your consumers and you’ll increase the number of videos played. This is proven and measurable behavior as outlined in our case studies. An image is really worth a thousand words.
Below are live examples of KRAKEN in action. Each form is powered by a machine learning selection process. Below we describe the use cases for apex image, image rotation and animation clip.
KRAKEN “clips” the video at the point of APEX. Sequences are put together creating a full animation of a scene(s). Boost rates are equal to those from image rotation and can be much higher depending on the content type.
Pros:
Consumer-created clipping points within video
Creates more visual information vs. a static image
Highlights action scenes
Great for mobile and OTT preview
Cons:
More than one on page can cause distraction
Overuse can turn off consumers
Too many on page can slow page loading performance (due to size)
Mobile LTE is slow and can lead to choppy images instead of a smooth video
Image rotation allows for a more complete visual story to be told when compared to a static image. This results in consumers having a better idea of the content in the video. KRAKEN determines the top four most engaging images and then cycles through them. We are seeing mobile video boost rates above 50%.
Pros:
Smooth visual transition
Consumer-selected top images
Creates a visual story vs. one image to engage more consumers
Ideal for mobile and OTT
Less bandwidth intensive (Mobile LTE)
Cons:
Similar to animated clips, publishers should limit multiple placements on a single page
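The image rotation described above (pick the top four most engaging images, then cycle through them) can be sketched in a few lines. The frame identifiers and counts below are made up for illustration, and ranking by plays per impression is an assumption about what "most engaging" means.

```python
from itertools import cycle, islice


def top_images(stats: dict[str, tuple[int, int]], k: int = 4) -> list[str]:
    """Rank images by play rate; stats maps image id -> (plays, impressions)."""
    def rate(item):
        plays, impressions = item[1]
        return plays / impressions if impressions else 0.0

    ranked = sorted(stats.items(), key=rate, reverse=True)
    return [image_id for image_id, _ in ranked[:k]]


# Hypothetical per-image engagement counts.
stats = {
    "frame_012": (30, 1000),
    "frame_045": (80, 1000),
    "frame_101": (55, 1000),
    "frame_230": (10, 1000),
    "frame_377": (65, 1000),
}

rotation = cycle(top_images(stats))      # endless rotation over the top four
first_six = list(islice(rotation, 6))    # what a player would show, in order
print(first_six)
```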
KRAKEN always finds the best lead image for any placement. This apex image alone creates high levels of play rates, especially in a click-to-launch placement. Average boost rates are between 20% and 30%.
Pros:
Audience-chosen top image for each placement
Can be placed everywhere (including social media)
Ideal for desktop
Good with mobile and OTT
Cons:
Static thumbnails have limited visual information
Once the apex is found, the image will never be substituted
Below are live KRAKEN animation clip examples. All three animations start with the audience choosing the apex image. Then, KRAKEN identifies clipping points (via deep learning) and uses machine learning to adjust to the optimal clipping sequence.
HitFix Video Deep Learning Video Clipping to Action, Machine Learning adjusts in real time
Video players have transitioned to HTML5 and mobile consumption of video is the fastest growing medium. Broadcasters that embrace advanced technologies that adapt to the consumer preference will achieve higher returns, and at the same time create a better consumer experience. The value proposition is simple: If you boost your video performance by 30% (for a video publisher doing 30 million video plays per month), KRAKEN will drive an additional $2.2 million in revenue (See KRAKEN revenue calculator). This happens with existing video inventory and without additional head count. KRAKEN creates a win-win scenario and will improve its performance as more insights are used to bring prediction and recommendation to consumers, thereby increasing the video process.
The elusive video search whereby you can search video image context is now possible with advanced technologies like deep learning. It’s very exciting to see video SEO becoming a reality thanks to amazing algorithms and massive computing power. We truly can say a picture is worth 1,000 words!
Content creators have fantasized about doing video search. For many years, major engineering challenges were a roadblock to comprehending video images directly.
Video visual search opens up a whole new field where video is the new HTML. And, the new visual SEO is what’s in the image. We’re in exciting times with new companies dedicated to video visual search. In a previous post, Video Machine Learning: A Content Marketing Revolution, we demonstrated image analysis within video to improve video performance. After one year, we’re now embarking on video visual search via deep learning.
Behind the Deep Curtain
Video clipping powered by KRAKEN video deep learning. Identify relevance within video images to drive higher plays
Many research groups have collaborated to push the field of deep learning forward. Using an advanced image labeling repository like ImageNet has elevated the deep learning field. The ability to take video and identify what’s in the video frames and apply description opens up huge visual keywords.
What is deep learning? It is probably the biggest buzzword around, along with AI (Artificial Intelligence). Deep Learning came from advanced math on large data set processing, similar to the way the human brain works. The human brain is made up of tons of neurons and we have long attempted to mimic how these neurons work. Previously, only humans and a few other animals had the ability to do what machines can now do. This is a game changer.
The evolution of what’s called a Convolutional Neural Network, or CNN, aka deep learning, was driven by thought leaders like Yann LeCun (Facebook), Geoffrey Hinton (Google), Andrew Ng (Baidu) and Li Fei-Fei (Director of the Stanford AI Lab and creator of ImageNet). Now the field has exploded and all major companies have open sourced their deep learning platforms for running Convolutional Neural Networks in various forms. In an interview with the New York Times, Fei-Fei said “I consider the pixel data in images and video to be the dark matter of the Internet. We are now starting to illuminate it.” That was back in 2014. For more on the history of machine learning, see the post by Roger Parloff at Fortune.
KRAKEN video deep learning Images for high video engagement
Image reduction is key to video deep learning. Image analysis is achieved through big number crunching. Photo: Chase McMichael created image
Think about this: video is a collection of images linked together and played back at 30 frames a second. Analyzing a massive number of frames is a major challenge.
As humans, we see video all the time and our brains are processing those images in real-time. Getting a machine to do this very task at scale is not trivial. Machines processing images is an amazing feat and doing this task in real-time video is even harder. You must decipher shapes, symbols, objects, and meaning. For robotics and self-driving cars this is the holy grail.
To create a video image classification system required a slightly different approach. You must handle the enormous number of single frames in a video file first to understand what’s in the images.
On September 28th, 2016, the seven-member Google research team announced YouTube-8M, leveraging state-of-the-art deep learning models. YouTube-8M consists of 8 million YouTube videos, equivalent to 500K hours of video, all labeled with 4,800 Knowledge Graph entities. This is a big deal for the video deep learning space. YouTube-8M’s scale required some pre-processing on images to pull frame-level features first. The team used the Inception-V3 image annotation model trained on ImageNet. What makes this such a great thing is that we now have access to a very large video labeling system, and Google did massive heavy lifting to create 8M.
Google 8M Stats Video Visual Search
Top level numbers of YouTube 8M. Photo created by Chase McMichael.
The secret to handling all this big data was reducing the number of frames to be processed. The key is extracting frame-level features at 1 frame per second, creating a manageable data set. This resulted in 1.9 billion video frames, enabling a reasonable handling of data. With this size you can train a TensorFlow model on a single Graphics Processing Unit (GPU) in 1 day! In comparison, the 8M would have required a petabyte of video storage and 24 CPUs of computing power for a year. It’s easy to see why pre-processing was required to do video image analysis, and why frame segmenting created a manageable data set.
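The effect of sampling at 1 frame per second is easy to see in a small sketch. The function below picks which frame indices to keep from a clip; its name and parameters are hypothetical, and the 30 fps native rate is taken from the playback rate mentioned earlier. It is an illustration of the sampling idea, not Google's actual pre-processing code.

```python
def sampled_frame_indices(duration_s: float, native_fps: float,
                          sample_fps: float = 1.0) -> list[int]:
    """Indices of the frames kept when sampling a clip at sample_fps."""
    step = native_fps / sample_fps                # native frames per kept frame
    total_frames = int(duration_s * native_fps)
    n_samples = int(duration_s * sample_fps)
    return [min(int(i * step), total_frames - 1) for i in range(n_samples)]


# A 10-second clip at 30 fps keeps 10 of its 300 frames.
kept = sampled_frame_indices(duration_s=10, native_fps=30)
print(len(kept), kept[:3])

# Rough scale check using the rounded 500K-hours figure from the text.
hours = 500_000
frames_at_1fps = hours * 3600        # 1.8 billion frames
frames_at_30fps = frames_at_1fps * 30
print(f"{frames_at_1fps:,} vs {frames_at_30fps:,}")
```

The 1.8 billion result is in line with the 1.9 billion frames quoted above, given that 500K hours is itself a rounded number.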
Google has beautifully created two big parts of the video deep learning trifecta. First, they opened up a video-based labeling system (YouTube-8M). This will give all in the industry a leg up in analyzing video. Without a labeling system like ImageNet, you would have to do the insane visual analysis on your own. Second, Google opened TensorFlow, their deep learning platform, creating a perfect storm for video deep learning to take off. This is why some call it an artificial intelligence renaissance. Third, we have access to a big data pipeline. For Google this is easy, as they have YouTube. Companies that are creating large amounts of video or user-generated videos will greatly benefit.
The deep learning code and hardware are becoming democratized, and it’s all about the visual pipeline. Having access to a robust data pipeline is the differentiation. Companies that have the data pipeline will create a competitive advantage from this trifecta.
Following Google’s lead with TensorFlow, Facebook launched its own open AI platform FAIR, followed by Baidu. What does this all mean? The visual information disruption is in full motion. We’re in a unique time where machines can see and think. This is the next wave of computing. Video SEO powered by deep learning is on track to be what keywords are to HTML.
Visual search is driving opportunity and lowering technology costs to propel innovation. Video discovery is not bound by what’s in a video description (meta layer). The use cases around deep learning include medical image processing to self-flying drones, and that is just a start.
Deep learning will have a profound impact on our daily lives in ways we never imagined.
Both Instagram and Snapchat are using sticker overlays based on facial recognition, and Google Photos sorts your photos better than any app out there. Now we’re seeing purchases linked with object recognition at Houzz, leveraging product identification powered by deep learning. The future is bright for deep learning and content creation. Very soon we’ll be seeing artificial intelligence producing and editing video.
How do you see video visual search benefiting you, and what exciting use cases can you imagine?
Feature image is a YouTube 8M web interface screenshot taken by Chase McMichael on September 30th.
Deep learning, image recognition and object recognition are core elements of intelligent video visual analysis. Understanding context and classification within video creates a strong use case for video deep learning. Digital video is exploding; however, few are leveraging the wealth of data or know how to harness visual analysis. A true reinforced deep learning system using collective human intelligence linked with neural networks provides the foundation for a new level of video insights. We’re just at the beginning of intelligent video and of using this knowledge to improve video performance.
Chase McMichael talk at ACM on Hacking Video Via Deep Learning Photo: Sophia Viklund
Deep Learning Methods Within Video An End Game Application – We’ll explore the use cases of using deep learning to drive higher video views. The coming Valhalla of video Deep Learning is being realized in visual object recognition and image classification within video. Mobile video has and continues to transform the way video is being distributed and consumed.
We’re witnessing the largest digital land grab in video history. Mobile video advertising is the fastest growing segment, projected to account for $25 billion worth of ad spend by 2021. Deep Learning and artificial intelligence are also growing within the very same companies who are jockeying for your cognitive attention. This confluence of video and deep learning has created a new standard in higher performing video content, driving greater engagement, views, and revenue. In this post we’ll dive deep into how video intelligence is changing the mobile video game. Many studies show tablet and smartphone viewing accounted for nearly 40 minutes of daily viewing in 2015, with mobile video continuing to dominate in 2016. Moreover, digital video is set to outpace TV for the first time, and social video (Instagram, Snapchat) is experiencing explosive growth.
The Interstellar trailer is a real example of KRAKEN in action and achieved a 16X improvement in video starts. Real-time A/B testing between the poster image (thumbnail) and selected images pulled from a visual training set provides simultaneous measurement of which images induce engagement. All data and actions are linked with a Video Machine Learning (KRAKEN) algorithm, enabling real-time optimization and sequencing of the right images to achieve the maximum human engagement possible.
How it works
Processing video at large scale and learning requires advanced algorithms designed to ingest real-time data. We have now entered the next phase of data insights going beyond the click and video play. Video opens the door to video consumption habits and using machine learning enables a competitive advantage.
Consumer experience and time on site are paramount when video is the primary revenue source for most broadcasting and over-the-top (OTT) sites today, including Netflix, HULU, Comcast X1, and Amazon. Netflix has already put into production their version of updating poster images to drive higher play starts, discovery and completions.
It’s All Math
Images with higher object density have proven to drive higher engagement. The graph demonstrates that images with high entropy (explained in this video) generated the most attraction. Knowing what images produce a cognitive response is fundamental for video publishers looking to maximize their video assets.
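The text does not say how entropy is computed here. One common measure, used below purely as an assumption, is the Shannon entropy of an image's pixel-intensity histogram: a flat, single-tone image scores 0 bits, while an image with many evenly used intensity levels scores higher. The tiny pixel lists stand in for real grayscale images.

```python
import math
from collections import Counter


def shannon_entropy(pixels: list[int]) -> float:
    """Shannon entropy (in bits) of a flat list of grayscale pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


flat_image = [128] * 64         # one gray tone everywhere
busy_image = list(range(64))    # 64 distinct intensity values

print(shannon_entropy(flat_image))
print(shannon_entropy(busy_image))
```

Histogram entropy ignores where pixels sit in the frame, so it is only a rough proxy for the "object density" the paragraph describes.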
Top 3 video priorities we’re hearing from customers.
1) Revenue is very important, and showing more video increases revenue (especially during peak hours when inventory is already sold out)
2) More video starts means more user time on site
3) Mobile is becoming very important. Increasing mobile video plays is a top priority.
While this is good news overall, it does present a number of new challenges facing video publishers in 2016. One challenge is managing the consumer access to content on their terms and across many points. Video consumption is increasingly accessed through multiple entry-points throughout the day. These entry points, by their very nature, have context.
Broadcasters and publishers must consider consumer visual consumption as a key insight. These eyeballs (neurons firing) are worth billions of dollars, but it’s no longer a game of looking at web logs. More advanced image analysis to determine what images work with customers requires insights into consumers’ video consumption habits. For digital broadcasters, enabling intelligence where the consumer engages isn’t new. Deep convolutional neural networks power the image identification and other priority algorithms. More details are in the main video.
Visual consumer engagement tracking is not something random. Tracking engagement on video has been done for many years, but when it comes to “what” within the video there was a major void. InfiniGraph created KRAKEN to enable video deep learning and fill that void by enabling machine learning within the video to optimize what images are shown to achieve the best response rates. Interstellar’s 16X boost is a great example of using KRAKEN to drive higher click to launch for autoplay on desktop and click to play in mobile, resulting in higher revenue and greater video efficiency. Think of KRAKEN as the Optimizely for video.
One question that comes up often is: “Is the image rotation the only thing causing people to click play?” The short answer is NO. Rotating arbitrary images is annoying and distracting. KRAKEN finds what the customer likes first and then sequences the images based on measurable events. The right set of images is everything. Once you have the right images you can then find the right sequence and this combination makes all the difference in maximizing play rates. Not using the best visuals will cause higher abandonment rates.
Further advances in deep learning are opening the doors to continuous learning and self-improving systems. One area we’re very excited about is visual prediction and recommendation of video. We see a great future in mapping human collective cognitive response to visuals that stimulate and create excitement. Melding the human mind with video intelligence is the next phase for publishers to deliver a better consumer experience.