Paul Burns, Chief Data Scientist at InfiniGraph, shares what he has learned from massive video processing and video data analysis aimed at finding which images and clips work best with audiences. He spoke at the Idea to IPO event on Machine Learning, Video Deep Learning and Innovations in Big Data. Here is a quick preview of Paul’s insights and his approach to machine learning and big data.
I’m Paul Burns, Chief Data Scientist at InfiniGraph, a startup involved in mobile video intelligence. I’ve had a bit of a varied career, although a purely technical one. I started off in auto-sensing: that was 15 years of research in RF sensor signal and data processing algorithms. I took a bit of a turn in my career a number of years ago, got a PhD in bioinformatics, and worked in the life sciences, in the genomics and sequencing industry, for about three years. Now I have turned again, into video, so I have a range of experience working with large datasets and learning algorithms, and hopefully I can bring some insights that others here would find useful.
My own personal experience is one in which I’ve inhabited a space very close to the data source. So when I think about big data, I think about opportunities to discover patterns that are not necessarily apparent to an expert, or that could be found automatically and used for prediction, for analysis, or for tracking the health and status of sensors and their levels of effectiveness. There are a lot of differences in the perception of what big data really is; the one common thread seems to be that it is a way of thinking about data. And I hate the word data. It is so nondescript, so generic, that it has almost no meaning at all.
I think of data as just information that’s stockpiled. It could be useful if you knew how to go in and sort through the stockpile to find patterns, in particular patterns that persist and can be used for predictive purposes. I think progress was generally slow over many decades, and the explosion in recent years is primarily due to breakthroughs in computer vision and advances in multi-layer deep neural networks, particularly for processing image and video data.
This is something that’s taken place over the last ten years: first with the seminal paper authored by Geoffrey Hinton in 2006, which demonstrated breakthroughs in deep multi-layer neural networks, and then with the work published for the ImageNet competition in 2012, which made a significant advance in performance over more conventional methods.
I think the major reason why there’s all this excitement is that visual perception is so incredibly powerful. That’s been an area where we’ve really struggled to make computers relate to the world and to understand and process things that are happening around them. There’s this sense that we’re on the cusp of a major revolution in autonomy. You can look at all the autonomous vehicles and all the human effort and capital being put into those efforts.
Paul answers a question on privacy: Honestly, I think privacy has been dead for some time. The way it should be structured is the way Facebook works: I can choose to opt into Facebook and have a lot of the gory details of my life exposed to the world and to Facebook. What I get out of that is that I’m more closely connected to friends and family, so I choose to opt in because I want that reward. Privacy issues where I don’t have the opt-out choice are the most problematic. There was a government program I’m aware of in the Netherlands some years ago. They adopted a pilot program where people could opt out of having their hospital care data published in a government database, the purpose of which was to learn and find patterns in health outcomes. That’s a little controversial, because the public health benefits of having such a database could be enormous and transformational, so it’s a very complicated issue. I’m certainly probably not qualified to speak on this topic. I would say it (privacy) has long since been dead and we kind of have to do a postmortem.
We’re very fortunate that so much very high quality research has been published, and that so many excellent datasets and model parameters are available for free download. Starting out, we were working on very generic replication of open systems. Object recognition can be done with fairly high quality, free, open source code in a week. That was kind of our starting point for advertising mobile video by selecting thumbnails that are somehow more enticing for people to click on than the default ones the content owners provide.
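As a rough illustration of what choosing a thumbnail that outperforms the default could look like, here is a minimal Python sketch. It is not InfiniGraph’s method, which the talk does not describe. It assumes click and impression counts are logged per candidate thumbnail, and every name in it (`ThumbnailStats`, `smoothed_ctr`, `pick_thumbnail`, the frame names and the prior values) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThumbnailStats:
    """Hypothetical per-thumbnail counters; not from the talk."""
    name: str
    impressions: int
    clicks: int


def smoothed_ctr(stats, prior_clicks=3.0, prior_impressions=100.0):
    """Click-through rate pulled toward a 3% prior (assumed value),
    so thumbnails with very few impressions do not win on noise alone."""
    return (stats.clicks + prior_clicks) / (stats.impressions + prior_impressions)


def pick_thumbnail(candidates):
    """Return the candidate with the highest smoothed click-through rate."""
    return max(candidates, key=smoothed_ctr)


# Made-up example data: the publisher's default plus two extracted frames.
candidates = [
    ThumbnailStats("default", impressions=1000, clicks=30),
    ThumbnailStats("frame_0412", impressions=1000, clicks=55),
    ThumbnailStats("frame_0977", impressions=10, clicks=2),
]

best = pick_thumbnail(candidates)
print(best.name)
```

In this made-up data, `frame_0977` has the highest raw click rate (2 of 10), but the smoothing discounts it because ten impressions are too few to trust, so the sketch prefers `frame_0412`.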
Our co-founders came up with this idea (KRAKEN video machine learning: how to increase video lifetime value) about a couple of years ago. It’s amazing how bad humans are at predicting what other people want to click on. We are, as far as we know, the only startup that’s solely focused on this core idea. It sounds like a small business, but it is not, given all the mobile video volume and advertising revenue that’s out there and growing.
When I have a hard problem, I try to stockpile as much data as I can to create the most thorough training set I possibly can, and I think the most successful businesses will be the ones that are able to do that. It turns out there are actually companies where all they do is help you create training sets for your machine learning applications. We use a variety of methods to do that. Crowdsourcing is one common way, but it’s really expensive, far more expensive than I thought was even possible. Startups that find a way to harvest rich training sets that are valuable for inference have the potential to be huge winners. It just turns out to be very hard to do.
Another big area is wearable technology for personal health monitoring. I think that’s an area with tremendous potential, just because your physician is starving for data. You have to make a point of seeing your doctor, schedule it, and so on. So what do they do? They weigh you, take your blood pressure, and ask how old you are. That’s about it. I mean, that’s nothing, right? They do not know what’s going on with you. Maybe it’s personality dependent, but I would be very much in favor of disclosing all kinds of biometric information about myself, having it continuously recorded and stockpiled in a database, repeatedly scanned by intelligent agents for anomalies, and having doctor’s appointments automatically scheduled for me. The same goes for any complicated piece of machinery: it could be a car, it could be parts of your business. This kind of invasive monitoring will, I think, meet with resistance, but it could be unleashed as people see the value in disclosing.
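To make the idea of repeatedly scanning recorded biometrics for anomalies concrete, here is a minimal Python sketch under stated assumptions. It uses a simple rolling z-score, which is a stand-in technique and not something named in the talk. The function name `find_anomalies`, the window size, the threshold and the heart-rate readings are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against.
            continue
        z = (readings[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies


# Made-up resting heart-rate samples (beats per minute) with one spike.
heart_rate = [62, 64, 63, 61, 63, 62, 64, 118, 63, 62]

flagged = find_anomalies(heart_rate)
print(flagged)
```

A real monitoring agent would need far more than this, such as per-person baselines, handling of sensor dropouts, and clinical judgment about what is worth an appointment. The sketch only shows the basic shape of comparing each new reading against recent history.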
See the full panel here: Idea to IPO