VIDEO – Better User Experience, Time on Site and Converting Readers into Viewers.
Video optimization with machine learning is now a reality, and publishers are intelligently making the most of their O&O digital assets. The digital video industry is undergoing a transformation, and machine learning is advancing the video user experience. Mobile, combined with video, is truly the definitive on-demand platform, which makes it the fastest growing sector in digital content distribution.
Video machine learning is a new field. The ability to crowdsource massive numbers of human interactions with video content has created a new data set. We’re tapping into a small part of the human collective consciousness for the first time. Publishers and media broadcasters are now going beyond video views, clicks, and completions to gain insight into the video objects, orientations, and types of movement that induce a positive cognitive response. This human cognitive response is the ultimate measure of relevance, because humans are interacting with video in a much more profound way. In this article, we will dive deep into the four drivers of video machine learning.
Video by its nature is linear; however, several companies are working to personalize the video experience as well as make it live. We’re now at the peak of hype around Virtual Reality / Augmented Reality, which promise the most immersive experience. All of these forms of video have two things in common: moving sights and sound. Humans by nature prefer video because this is how we see the world around us. The bulk of video consumed globally is designed around a linear body of work that tells a story. The fact that video is just a series of images connected together is not something people think much about. In the days of film, seeing a real film strip from a movie reel made it obvious that each frame was in fact a still image. Fast forward to today: digital video still has frames, but those frames are made up of 1’s and 0’s. “Digital” opens the door for advanced mathematics and image / object recognition technologies to process these images into more meaning than just a static picture.
Images are Important (Critical)
It’s hard to believe how important images really are. For videos placed “above the fold,” you have to wonder why so many have such a low play rate to begin with (Video Start CTR). Consumers can process objects in images in as little as 13 milliseconds (0.013 seconds). That’s FAST! Cognitive attention has to be captured extremely quickly for a human to commit to watching a video. The first image is important, but it is not everything: more than one image is sometimes required to assure a positive cognitive response. The reality is that people are flat out dismissive, and some decide not to play the video. This is evident when you have a 10% CTR, which means 90% of your audience OPTED OUT OF PLAYING THE VIDEO. What happened? The first image may have been great, but it didn’t create a full mental picture of what was possible in the linear body of work. You’re not going to get 100% play rates; however, providing greater cognitive stimulation that builds relevance gives viewers more reasons to commit time to watching a linear form of video.
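To make the arithmetic concrete, here is a minimal sketch of how a video start CTR and its complement, the opt-out rate, could be computed. The function name and the sample counts are hypothetical, chosen only to mirror the 10% example above.

```python
def video_start_ctr(plays: int, impressions: int) -> float:
    """Share of lead-image impressions that resulted in a video start."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return plays / impressions


# Hypothetical counts chosen to match the 10% example in the text.
impressions = 1_000
plays = 100

ctr = video_start_ctr(plays, impressions)
opt_out = 1.0 - ctr  # everyone who saw the image but did not press play

print(f"Video start CTR: {ctr:.0%}")
print(f"Opted out:       {opt_out:.0%}")
```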
Machine Learning and Algorithms
In the last 4 years, machine learning / artificial intelligence has exploded with new algorithms, and advanced computing power has greatly reduced the cost of complex computations. Machine learning is transforming the way information is interpreted and used to gain actionable insights. With the recent open sourcing of TensorFlow from Google and advances in Torch from Facebook, these machine learning platforms have truly disrupted the entire artificial intelligence industry.
Major hardware providers, such as NVIDIA, have ushered in massive advancements in the machine learning and AI fields that would otherwise have been out of reach. The democratization of machine learning is now opening the door for many small teams to propel product development around meaningful algorithmic approaches.
The unique properties of digital video, specifically in a consumer’s mobile feed delivered from a video publishing site, create a perfect window into how consumers snack on content. If you want to see hyper snacking, ride a train into a city or watch kids on their smartphones. Digital content consumption has never been as interactive as it is now. All digital publishers and broadcasters have to ask themselves this question: “How is my content going to get traction with this type of behavior?” If your audience is Snapchatters, YouTubers, or Instagrammers, you’re going to have to provide more value in your content V I S U A L L Y or you will lose them in a split second.
Video Publishing Conundrum
A big conundrum is why people are not playing videos. This required further investigation. We found that the lead image (i.e. the old school “thumbnail” or “poster image”) had a huge impact on inducing a cognitive response. In the mobile world, video is still a consumer-driven response, and we hope this will stay a click-to-play world. We believe consumer choice and control will always win the day. Video publishers under the revenue gun should remember that consumers will quickly tire of native ad content tricks, in-stream video (auto-play), and the bludgeoning and force feeding of video on the desktop. No wonder ad-blocking is at an all-time high! There is a whole industry cropping up around blocking ads, and it’s an all-out war. The sad part is that the consumer is stuck in the middle.
Many publishers are using desktop video auto-play to reduce friction; however, the FRONT of the page, video carousel, or gallery is a click-to-launch environment, making the images on the published page even more important. Those Fronts are the main traffic driver, ahead of possible social share amplification. As for mobile video, it’s still a click-to-play world for a majority of broadcasters and publishers. Video is the most engaging vehicle at their disposal, which is why so many publishers are pushing themselves to create more video content. Publishing more video-oriented content is great; however, the lack of knowledge about what consumers emotionally respond to has been a major gap. A “post and pray” or “post and measure later” approach is currently prevalent throughout the publishing industry.
Video Quality Matters
Creating a better consumer experience is everything if you want your content to be consumed at a time when auto-play is rampant and force-fed content is used to induce engagement. More brands demand measured engagement. Video engagement quality is measured by starts, length of time on video, and physical actions taken. Capturing human attention is very hard because of the many distractions, especially on a mobile device. We’re in a phase where the majority of connected humans are digital natives in this digital deluge. ADD is at an all-time high. Getting the consumer to engage in under 0.25 seconds, before they have formulated the video story line in their mind, is a hard task. A quick peek at the video thumbnail, a fast read of a headline, and a glance at some keywords could be all that stands between you and a revenue-generating video play. People are pressed for time and unwilling to commit to a video play unless it induces a real cognitive response. Converting readers into video viewers is important, and keeping them is even more important.
Mobile Video and Machine Learning
Mobile is becoming the prevalent method of on-demand video access. The combination of video and mobile is an explosive pair and most likely the most powerful marketing conduit ever created. Here we have investigated how machine learning algorithms applied to images can provide real-time insight and decision support to catch the consumer’s attention and recover video yield that would otherwise be lost. The big challenge with video is that it is created in a linear format, loaded into a CMS, and put up for publishing with a prayer that it gets traction. Promotion helps and placement matters; however, there is really nothing a publisher can do to adjust the video content once it is out. Enter video intelligence. The ability to measure video engagement in real time is a game changer. Enabling intelligence within video seems intuitive; however, the complexity of encoding and decoding video has created a sufficient barrier to entry that this area of video intelligence has otherwise been untapped.
How and Why KRAKEN Works
Here we dive deep into how consumers interact with certain visual objects that create a positive response before a video is played. InfiniGraph invented a technology called KRAKEN that shows a series of images. The series itself, which we call “image rotation,” is not really new. What’s new is the selection of those images using machine learning algorithms, which allows us to adjust the images to achieve the highest human response possible.
As more images are processed by KRAKEN, the system becomes smarter, selecting better lead images that drive higher video efficiency. Choosing the order in which to sequence the best images is another part of the learning mechanism. An image sequence is built from a collection of 1 to 4 images, selected based on a KRAKEN ranking linked to human actions. The visuals that achieve the highest degree of engagement receive a higher KRAKEN rank. The sequence also creates a visual story, making the most of the limited time available to capture a consumer’s attention.
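KRAKEN’s actual ranking is not spelled out here, so the following is only a rough sketch of the idea. It assumes the rank is simply the observed play rate of each candidate image and that the rotation is the top 1 to 4 candidates in rank order. The class, function, frame identifiers, counts, and the minimum-impressions threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateImage:
    frame_id: str      # hypothetical identifier for a frame taken from the video
    impressions: int   # times the image was shown as a lead image
    plays: int         # times a viewer started the video from it

    @property
    def play_rate(self) -> float:
        return self.plays / self.impressions if self.impressions else 0.0


def select_sequence(candidates, max_images=4, min_impressions=100):
    """Return up to `max_images` candidates ordered by observed play rate.

    Images with too few impressions are skipped so that a lucky
    early click does not dominate the ranking.
    """
    eligible = [c for c in candidates if c.impressions >= min_impressions]
    ranked = sorted(eligible, key=lambda c: c.play_rate, reverse=True)
    return ranked[:max_images]


# Hypothetical measurements for six candidate frames.
candidates = [
    CandidateImage("frame_012", impressions=500, plays=40),
    CandidateImage("frame_087", impressions=480, plays=72),
    CandidateImage("frame_143", impressions=510, plays=51),
    CandidateImage("frame_201", impressions=20, plays=10),  # too little data
    CandidateImage("frame_266", impressions=495, plays=25),
    CandidateImage("frame_310", impressions=505, plays=61),
]

sequence = select_sequence(candidates)
print([c.frame_id for c in sequence])
```

A production system would likely also weigh how images perform together as a sequence, which this sketch ignores.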
KRAKEN in Action
KRAKEN determines the best possible thumbnails for any video using machine learning and audience testing. Once it finds the top 1-4 images, it rotates through them to further increase click-to-play rates. It also A/B tests against the original thumbnail to continually show its benefits. Here are 2 real examples:
KRAKEN Thumbnails with 273% lift below.
What makes a good video lead image unique? We’re asked this question all the time. Why would someone click on one image versus another? The answers are extremely context and content dependent. The number of visual objects in the frame has a great deal to do with how humans determine relevance and whether the image induces intrigue or desire. The human brain sees shapes first, in black / white. Color is a third response; however, red has its own visual alerting system. The human brain can process vast amounts of visual information fast. The digital real estate, such as mobile or desktop, can be vastly different. A great example is what we call information packaging: a smaller image on a mobile phone may only support 2 or 3 visual objects that a human would quickly recognize and respond to positively, whereas the desktop could support up to 5. Remember, one size doesn’t fit all, especially in mobile video.
KRAKEN Thumbnails with 217% lift to the left.
Trick your brain: black and white photo turns to colour! – Colour: The Spectrum of Science – BBC
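The lift figures quoted for the two examples are relative improvements in click-to-play rate over the original thumbnail. As a minimal sketch, assuming lift is defined in the usual A/B testing sense, it could be computed as follows. The function name and the two rates are hypothetical and are not the measurements behind the examples.

```python
def lift(control_ctr: float, variant_ctr: float) -> float:
    """Relative improvement of the variant over the control, as a fraction."""
    if control_ctr <= 0:
        raise ValueError("control_ctr must be positive")
    return (variant_ctr - control_ctr) / control_ctr


# Hypothetical rates: original thumbnail (control) vs. rotated image set (variant).
control_ctr = 0.04
variant_ctr = 0.10

print(f"Lift: {lift(control_ctr, variant_ctr):.0%}")
```

Read this way, a 273% lift means the rotated images were played at 3.73 times the rate of the original thumbnail.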
4 Drivers of Video Machine Learning
Who benefits from video machine learning? The consumer benefits the most, because a more visually accurate compilation of the video’s best moments makes for a better experience. It’s critical that people get a sense of the video so they commit to playing it and sticking around. The publisher or broadcaster obviously benefits financially, since more video consumption yields more social shares.
- Color depth: Remember, bright colors don’t always yield the best results. Visuals that depict action or motion elicit a higher response. The background can greatly alter color perception, so images with a complementary background help the human eye pick up the colors that best represent the subject, creating greater intrigue.
- Image sequencing: Sequencing the wrong or bad images together doesn’t help; it turns viewers off. The right collection is everything and could be 1 to 4 images. Knowing when to alter or shift the sequence is key to obtaining the highest degree of engagement. The goal is to create a visual story that improves the consumer experience.
- Visual processing: The human brain can process vast amounts of visual information fast. The digital real estate, such as mobile or desktop, can differ. A great example is what we call “information packaging”: a smaller image on a mobile phone screen may only support 2 or 3 visual objects that a human can quickly recognize and respond to positively, whereas the desktop could support up to 5. One size doesn’t fit all, especially in mobile video.
- Object classification: Understanding what’s in an image and classifying it builds a library of top-performing images. These images, with the right classification, create a unique data set for uses ranging from recommendation to prediction. Knowing what’s in the image is just as important as knowing it was acted on.
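The visual processing and object classification drivers above can be sketched together. This is an illustration only: it assumes object labels have already been produced by some image classifier and that engagement is summarized as play rate per label. The object budgets use the upper bounds of the “information packaging” figures in the list; the records, labels, and function names are hypothetical.

```python
from collections import defaultdict

# Object budget per screen, from the "information packaging" figures above.
MAX_OBJECTS = {"mobile": 3, "desktop": 5}

# Hypothetical records: labels would come from an image classifier,
# plays and impressions from audience measurement.
records = [
    {"labels": ["person", "car"], "impressions": 400, "plays": 48},
    {"labels": ["person"], "impressions": 380, "plays": 57},
    {"labels": ["car", "road", "sky", "sign"], "impressions": 410, "plays": 29},
]


def play_rate_by_label(records):
    """Aggregate plays and impressions per object label."""
    totals = defaultdict(lambda: [0, 0])  # label -> [plays, impressions]
    for rec in records:
        for label in set(rec["labels"]):
            totals[label][0] += rec["plays"]
            totals[label][1] += rec["impressions"]
    return {label: p / i for label, (p, i) in totals.items() if i}


def fits_screen(labels, device):
    """True if the image's object count is within the device's budget."""
    return len(set(labels)) <= MAX_OBJECTS[device]


rates = play_rate_by_label(records)
for label, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:>6}: {rate:.1%}")

# The third image contains four objects.
print(fits_screen(records[2]["labels"], "mobile"))
print(fits_screen(records[2]["labels"], "desktop"))
```

Keeping the label alongside the engagement counts is what turns a pile of thumbnails into a data set that could later feed recommendation or prediction.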
The first impression is everything, or maybe the second or third if you are showing a sequence of images. For publishers and digital broadcasters, adapting to their customers’ content consumption preferences and being on the platforms that will yield the most will be an ongoing saga. Nurturing your audience and perpetuating their viewing experience will be key as more and more consumers move to mobile. KRAKEN is just the start of using machine learning to create a better user experience in mobile video. We see video intelligence expanding into prediction and VR / AR in the not too distant future. As this unique dataset expands, we look forward to getting your feedback on other exciting use cases and finding ways to increase the overall yield on your existing video assets.
Tell us what you think and where you see mobile video going in your business.