MIT spinout aims to make data scientists heroes
Originally posted on The Horizons Tracker.
Data seems to be everywhere at the moment, and the challenge has progressed from collecting it to making sense of it, whether that’s ensuring the data is actually usable or deriving insights from it. This challenge is exacerbated by a general shortage of employees skilled in data science.
Even when you have the talent available, however, getting insights out of vast swathes of data can be incredibly time-consuming. This is the challenge that MIT spinout Endor set out to solve. Their platform aims to let any layperson upload raw data and then query that data with a simple, natural language business question. It’s essentially a Google for your data. The responses take around 15 minutes rather than 15 nanoseconds, but that is still a big improvement on the time it takes human beings to perform the same task.
The startup began life at the famed MIT Media Lab and counts social physics guru Alex “Sandy” Pentland as a co-founder. Social physics uses AI and maths to better understand and eventually predict crowd behaviors.
Finding the needle in the haystack
Of course, as with any search engine, it’s important to know the right question to ask, and indeed to account for your ‘unknown unknowns’, so the platform comes with a query-builder tool to help you create the right question.
“It’s just like Google. You don’t have to spend time thinking, ‘Am I going to spend time asking Google this question?’ You just Google it,” the team say. “It’s as simple as that.”
The company has already secured some major customers, including Walmart and Mastercard, and it has also been mining Twitter data to identify potential terrorists for a defence agency.
The tool aims to provide richer insight into data by ensuring that human behaviors are taken into account. Simply querying media is relatively easy because the problem is static, whereas human behavior changes all the time.
“In general, you need a lot of data to build accurate models for human behavior, and that means you have to rely on the past. Because you rely on the past, you cannot detect things that recently happened, and you can’t predict human behavior,” the team say.
By using social physics, however, they are able to imbue the system with a much better understanding of crowd dynamics. The technology extracts clusters of behavioral commonalities from the raw data and, what’s more, can do so much faster than traditional machine-learning-based approaches.
Suffice it to say, the tool isn’t designed to replace the work data scientists do, but rather to complement their work and empower them to perform faster and more effectively. Hopefully this will reduce the bottleneck that exists within many companies and truly allow them to get rapid insights from the data they hold.