Five Things Every Startup Should Be Doing with its Data

After years of hard work and several pivots, your startup is finally seeing some product traction! Now, you’re looking to double down on your current product vision and grow faster than you ever have before.

This requires making several important strategic decisions. Where should you spend your marketing budget? What parts of your product should you focus on in the next release? And how should you tailor your company’s support to best address customers’ frustrations?

One way to address these questions is to use your intuitions. After all, you know your product better than anyone, right? The problem with this approach is that intuitions aren’t always correct, and startups don’t have the time or resources to waste on bad decisions. Customers don’t always see products the same way as their inventors.

A better way to address these questions is to ask more objective questions about your data. What marketing efforts have provided the greatest returns on investment? What product features are customers spending the most time using? And which features do they avoid, or use least effectively?

Time to turn to the data! If you're like most early startups, you have all the basics covered. You’re using Google Analytics for marketing. You have Stripe for payment processing. You use Intercom for customer communication. And all your app data is stored in a variety of S3 and RDS instances over at AWS.

The problem is that these products, while great independently, aren’t going to give you the answers you need. They’ll each answer a couple of basic questions, but none will offer the kinds of answers you need to make the biggest decisions facing your startup’s growth. For example, you might learn in Google Analytics that Facebook ads are really good at generating trial sign-ups, but do these trial sign-ups ever become paying customers? Moreover, maybe your product is usage-based: are these sign-ups high-usage (i.e., high-paying) customers? Or are they low-usage customers, who are taking all of your support team’s time without paying any of their salaries?

What you need is an integrated, scalable data strategy that you can turn to in situations like this to get answers to your most important questions: the kinds of questions that span every aspect of your business. If you don’t already have a data strategy, I offer five simple points below that will guide you in preparing your company to use data (and data science) most effectively.

1. Track Everything

Storage is cheap, and it’s been cheap for a long time. This means that there is no excuse for not tracking every user action and interaction possible. It’s easy to filter out what you don’t need (or learn isn’t valuable) later, but it’s impossible to retrieve what you never stored in the first place.

Lots of companies are going to get a good amount of data tracked for free simply by using popular services like those listed above. But what they’ll likely miss is the app-specific data (unless they have a proper data strategy). And if you’re bringing a new product to market, this is the data that matters most!

But some companies will miss even the basic stuff. For example, I once had to rebuild a Software-as-a-Service (SaaS) application’s entire revenue history from raw payment and user information just to determine their historical churn rate. Something that should have taken a single one-second query instead took a day of work building and validating a data re-creation setup.

The bottom line: track everything you can, whether you see it as being immediately relevant or not.
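
To make the idea concrete, here is a minimal sketch of app-level event tracking, assuming you simply append every user action to a JSON Lines file. The function names, the field names, and the file format are illustrative assumptions, not a prescribed setup; in practice you would more likely send these events to a database or a tracking service.

```python
import json
import time
from pathlib import Path


def track_event(log_path, user_id, event, properties=None, timestamp=None):
    """Append one user event to a JSON Lines file and return the stored record.

    All field names here are hypothetical; use whatever your app needs.
    """
    record = {
        "user_id": user_id,
        "event": event,
        "properties": properties or {},
        # Fall back to the current time when no timestamp is supplied.
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_events(log_path):
    """Load every stored event back into a list of dicts."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling something like track_event("events.jsonl", "u1", "clicked_export", {"format": "csv"}) on every meaningful action keeps the raw history around, so you can decide later which events turn out to matter.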

2. Link Everything Together

Let’s say you have all these external and internal services tracking lots of data. Your work isn’t done! What you really need is a way to integrate them easily or, preferably, instantly. Your customers interact with each of your business’s departments (e.g., marketing, product development, billing). Unsurprisingly, then, your questions about how to make them spend more, use the product more, and generally be happier often require you to integrate their experience across these departments.

This is where something like Segment (or a similar/custom service) can come in handy. It will help you integrate data from your standard web services (e.g., Google Analytics, AWS, Stripe) into a single data warehouse alongside your custom data. Then, if you know what you’re doing, you can query and model these data to help you make decisions and improve your product. Which leads me to…
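
As a rough illustration of why linking matters, the sketch below joins marketing data (which channel a trial sign-up came from) with billing data (who actually paid) on a shared user ID. The record layout, the field names, and the toy data are all assumptions made up for this example; they do not reflect the export format of any particular service.

```python
from collections import defaultdict


def conversion_by_channel(signups, payments):
    """Return {channel: (trial_signups, paying_customers, conversion_rate)}."""
    # Users with at least one positive payment count as paying customers.
    paying_users = {p["user_id"] for p in payments if p["amount"] > 0}
    totals = defaultdict(lambda: [0, 0])  # channel -> [signups, paying]
    for s in signups:
        totals[s["channel"]][0] += 1
        if s["user_id"] in paying_users:
            totals[s["channel"]][1] += 1
    return {ch: (n, paid, paid / n) for ch, (n, paid) in totals.items()}


# Made-up toy data: sign-ups from the marketing side, payments from billing.
signups = [
    {"user_id": "u1", "channel": "facebook"},
    {"user_id": "u2", "channel": "facebook"},
    {"user_id": "u3", "channel": "facebook"},
    {"user_id": "u4", "channel": "search"},
    {"user_id": "u5", "channel": "search"},
]
payments = [
    {"user_id": "u1", "amount": 20.0},
    {"user_id": "u4", "amount": 50.0},
    {"user_id": "u5", "amount": 35.0},
]
result = conversion_by_channel(signups, payments)
```

In this toy data, Facebook produces the most trial sign-ups but converts only one of three into a paying customer, which is exactly the kind of answer neither source can give you on its own.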

3. Build Models to Predict and Understand Key Customer Actions

With the data properly stored and integrated, you can get to the really fun stuff: predicting and understanding your customers’ actions.

The benefits of prediction are clear. By identifying an action a customer will take before they take it, you can tailor your interactions with them to maximize their value. For example, by identifying which free trial users are going to convert (and conversely, which are not), you might decide that unlikely-converters should receive a coupon and likely-converters should not (they might just pay full price!). Or by determining who is at risk of churn (leaving your company), you can trigger a series of helpful emails or a real-life phone call that might be enough to extend their time as your customer.

Conversions and churn are the most commonly predicted actions. But with the proper data, you can predict almost anything: Who is going to have the most trouble using your product? Who will recommend your product most? Who is going to take the most support time? Each of these is useful in its own way, and can turn your business from one that is primarily reactive to one that is proactive.

Prediction models typically rely on a variety of machine learning approaches, including gradient boosting, random forests, neural networks, or some Frankenstein-like ensemble of many models.
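
For a feel of what a churn prediction looks like, here is a deliberately tiny sketch. It uses plain logistic regression trained by gradient descent, which is much simpler than the approaches named above, and it runs on made-up data. The two features (logins in the last month divided by ten, and number of support tickets) are hypothetical choices for illustration only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Predicted churn probability for one customer (w[0] is the bias)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the average log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Made-up toy data: [logins_last_month / 10, support_tickets]; 1 = churned.
X = [[0.1, 3.0], [0.2, 2.0], [0.3, 4.0], [0.4, 3.0],
     [1.5, 0.0], [2.0, 1.0], [2.5, 0.0], [3.0, 1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(X, y)
at_risk = predict_proba(weights, [0.2, 3.0])   # rarely logs in, many tickets
healthy = predict_proba(weights, [2.5, 0.0])   # logs in often, no tickets
```

The output is a probability per customer, which is what lets you decide who gets the helpful email or the phone call. A real model would need far more data, a held-out validation set, and probably one of the more flexible methods mentioned above.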

Beyond prediction, understanding the underlying factors that drive customer actions is also important and, from a decision-making perspective, often more rewarding. At the heart of understanding is explanation: why did a customer leave? Why did they sign up? Why are they having trouble? Identifying the causal factors underlying an action gives you the potential for intervention: manipulating some factor in order to increase the likelihood of a desired outcome, or reduce the likelihood of an undesired outcome.

Common statistical models used for understanding actions include decision trees, linear models, principal components analysis, and Bayesian models. But true causal inference will only be licensed when these models are applied to experimental data (see point #5 below).

Building and implementing statistical models like these isn’t easy and, once they’re built, you’ll want a process for ongoing integration, updating, and validation. But doing so can be tremendously helpful to a startup looking to maximize efficiency and effectiveness for fast growth.

4. Communicate Key Metrics and Models

Things are looking up! You have your data stored and integrated, and if you’re lucky (or you hired us), you have killer data models offering live predictions and causal models of your customers’ key actions. Quick question: If a model makes a prediction, and no one is there to see it, did it ever really make a prediction? Answer: It doesn’t matter. Your model is as useless as that question.

Data science for business is like any science: it’s only as good as how well it’s communicated to the people who need to know about it. Historically, this might have meant a project report presented at a company meeting. But now, this might mean a daily, automated email to your team with key organizational metrics and historical trends, combined with a live dashboard to see exactly how your company is doing and how the models are performing. Or maybe you want to go even further, perhaps with live prediction metrics and auto-segmentation available to your sales and support staff in your CRM? Or automatically-triggered interactions with your customer?

This is where your data strategy really pays off. In fact, this is where it even becomes apparent that you have a data strategy — when what you learn and predict is integrated back into your operations to make your business more efficient.

5. Experiment

The biggest criticism of the ‘big data’ movement is that no amount of data can overcome the issue we all learned about on the first day of our statistical training: correlation does not equal causation. In some cases, like when you are generating predictions, this issue can be ignored. You just want to know whether someone is going to do something; you don’t care about the underlying causes.

But if you want to truly understand why something is happening in your organization or with your customers — which is typically the case — you’ll ultimately need to run some experiments, the simplest of which are called ‘A/B tests.’ This is the only way to determine whether some factor (e.g., seeing an advertisement, going through an onboarding process, interacting with your support team) actually causes someone to do something (e.g., subscribe longer, pay more, send you an angry email).

This is why we always recommend having a platform for experimentation set up, and a workflow for easily running valid analyses. (Believe me: if it’s not easy, you won’t run the analyses when it really matters.) For simple website-based experiments, Optimizely might be your answer. For more complex experiments, you may need a custom platform. But setting up an experimentation platform is critical to turning your best guesses into valid answers.
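
To show what a simple analysis of an A/B test can look like, here is a minimal sketch of a two-proportion z-test comparing conversion rates between two randomly assigned groups. It is one common way to analyze such a test, not the only valid one, and the counts in the example are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and users in group A (control).
    conv_b / n_b: conversions and users in group B (variant).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 100 of 1,000 control users converted vs. 130 of 1,000
# users who saw the new onboarding flow.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

Because users were assigned to the groups at random, a small p-value here supports a causal reading (the new flow changed conversions) in a way that no amount of purely observational data can.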

What’s Next?

These are the five most important components of any data strategy — and things we recommend any startup have in place as soon as possible. Not doing so well with your data strategy? Get in touch with us and we’ll show you how we can help make your next big decision easier than any to this point.