Is Data Science Only Useful for Organizations with 'Big Data'?

May 3rd, 2016 by Brock

The promise of data-driven insights and optimization has enticed businesses of all sizes — from the two-person basement startup next door to icons of the Fortune 500 — to dive headfirst into data science. Whether they are building in-house data science teams or hiring consultancies, more and more businesses each year are choosing to leverage their data to build smarter products and make smarter decisions.

But not everyone is convinced that organizations should be so bullish on data science. Bilal Mahmood over at Bolt Data, for example, recently wrote that not every company needs data science. To summarize, he argues that data science is only helpful to companies that:

  1. collect lots of data,
  2. have already stored lots of historical data, and
  3. can benefit from short-term predictions.

The theme underlying these points is that unless you meet some kind of ‘big data’ threshold, data science is not going to be useful for your organization. While we agree that these features may accurately reflect the typical data science project, we disagree that only companies that meet these criteria should work with data scientists.

Make sure you’re prepared for big data

One issue with this emphasis on big data is that a company that waits to develop a data strategy until it has lots of data will inevitably wish it had turned to data science earlier. More often than not, organizations collect only a fraction of the data they should, and they scatter what they do collect across many locations and formats. While data scientists can overcome these hurdles, poor data storage will slow down later data science projects, and any missing data will place hard limits on the insights data scientists can offer.

This is why we recommend that every organization take a few simple steps to develop a data strategy as soon as possible. We are not saying that every startup should go out and hire five data scientists, put them in an office, and have them watch as data trickles drip-by-drip into a PostgreSQL database. Instead, most smaller organizations will do best by engaging a data science consultancy to consider the kinds of data science opportunities their organization has (now and in the future), and to integrate their data-generating properties into a highly functional data store in a way that meets their goals.

Once in place, an organization’s data strategy can operate on auto-pilot while the business grows. When the time comes, your organization will actually be able to do the heavy lifting Bilal describes in his blog post: you'll have saved the information that data scientists actually need in order to deliver insights. You'll get faster, clearer, and more accurate results and predictions than an organization that waited until it was too late.

Matching the approach to the dataset

A second issue with an argument that focuses only on "big" data is that well-trained data scientists can also do amazing things with small datasets. It just takes a different (i.e., more careful and thoughtful) approach.

Typically, with big data, one can deploy relatively naïve machine learning approaches that will learn a lot simply by allowing insights to arise ‘bottom-up’ from the data. For example, when training image recognition models using deep learning (e.g., when predicting movie quality), most of the time is spent designing an ample, balanced dataset. The main benefit here is that one need not impose strict assumptions on one's statistical models to learn useful things and make predictions; the sheer amount of data tends to overwhelm these assumptions anyway.

Small datasets, on the other hand, demand a different approach. Rather than relying on bottom-up data-mining to pull insights out of data, making sense of small data requires a deeper consideration of model assumptions and the hypotheses to be tested.

Working with small data also often requires a different suite of statistical tools. Bayesian models, for example, are typically too slow for big data but are perfect for making inferences from small datasets. They blend a priori intuitions about features that are likely to be important with the limited data you have to generate a current set of ‘best guesses’ about the best actions to take. These more ‘top-down’ insights into your customers or organization are often just as useful in strategizing as the completely ‘bottom-up’ insights from naïve pattern detection in big datasets.
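
To make the idea of blending prior intuition with limited data concrete, here is a minimal sketch of one of the simplest Bayesian models, a Beta-Binomial update of a conversion rate. Everything in it is an illustrative assumption rather than something from a real project: the `BetaBelief` class, the prior of "about 1 in 10 visitors converts," and the dataset of 30 visitors are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """A Beta(alpha, beta) belief about an unknown rate (hypothetical helper)."""

    alpha: float  # prior "pseudo-successes" plus observed successes
    beta: float   # prior "pseudo-failures" plus observed failures

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: observed counts are simply added to the prior counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Expected value of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


# Assumed prior intuition: roughly 1 in 10 visitors converts, held about as
# strongly as 20 earlier observations (2 conversions, 18 non-conversions).
prior = BetaBelief(alpha=2.0, beta=18.0)

# Assumed small dataset: 30 visitors, 6 of whom converted.
posterior = prior.update(successes=6, failures=24)

raw_rate = 6 / 30  # what the small dataset says on its own

print(f"prior mean:     {prior.mean:.3f}")
print(f"raw data rate:  {raw_rate:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

The posterior ‘best guess’ lands between the prior intuition and the raw rate from the small sample. If the same prior were updated with thousands of visitors instead of 30, the data would dominate and the prior would barely matter, which is the ‘big data’ regime described earlier.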

It’s never too soon for your organization to begin using data science. Whether you are preparing for growth by designing and implementing a data strategy, or leveraging the ‘small’ data you have right now, working with data scientists sooner rather than later will ensure that your organization gets the most from its data.