FAQ: What Is Data Science in the D2C Space Like?

This content is an excerpt from an interview I did with fellow SharpestMinds alum Amber Teng. I’ve also added a few more comments and observations below.

Please keep in mind that I was only in this role for about 6 months, and my thoughts may not be representative of the entire D2C space.

Also, if you’re interested in data science in D2C, check out the Locally Optimistic community - it’s a huge Slack group of data analysts, data scientists, and data/analytics engineers, all discussing various aspects of how to best do data collection, analysis, and modeling in the D2C world.

Q: Previously, you interned at Curology as a data science intern. Could you discuss what data science looks like in the skincare industry? What types of questions did you and your team seek to answer? And what are some of the most interesting projects you worked on in your time at Curology?

A: My experience at Curology was probably a good example of how data science looks in a D2C (direct-to-consumer) business in general, especially in a fast-paced startup context.

First, the earliest data need at most consumer-focused businesses (after data engineers, of course) is often just a lot of descriptive statistics, commonly known as “consumer insights.”

Since I was embedded in the user acquisition department, I was especially focused on answering questions that would help us make better marketing decisions across the many different acquisition channels.

80% of the time, I was writing SQL against our data warehouse to better understand the behavior of different customer segments and track how that behavior trended over time, and turning those findings into interpretable dashboards for use by the rest of the team.

The other 20% of the time, I used Python to analyze and visualize customers’ survey responses to better understand what they liked and needed from Curology.

So a few of the questions I got to ask and answer were:

  • How do customers’ skincare goals vary based on their demographics (gender, age, etc.)? What is most important to each segment of customers, and how can we make sure we serve each of their needs well?
  • Which of our channels have had the “stickiest” customers, i.e. customers who have tended to stay with us the longest? Do any other behaviors or preferences correlate with subscription length?
  • Can we build a model that will leverage the historical data we have on customer behavior to predict customer lifetime value (LTV) at time of signup? (This is actually very hard when your customer base is growing quickly, due to sampling considerations!)
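To make the "sampling considerations" in the last question a bit more concrete, here is a minimal sketch of one such issue, using made-up data. When most customers signed up recently, their observed subscription lengths are cut short by the analysis date, so a naive average understates how long customers stay. One common workaround is to measure retention at a fixed horizon using only customers who signed up at least that long ago. The customer records, the `as_of` date, and the function names are all hypothetical, and this is not a description of how Curology approached the problem.

```python
from datetime import date

# Hypothetical data: (signup_date, cancel_date), cancel_date is None if still active.
customers = [
    (date(2019, 1, 15), date(2019, 4, 15)),
    (date(2019, 2, 1), None),
    (date(2020, 6, 1), None),
    (date(2020, 7, 1), date(2020, 8, 1)),
    (date(2020, 8, 1), None),
    (date(2020, 8, 15), None),
]
as_of = date(2020, 9, 1)  # date the analysis is run


def mean_observed_tenure_months(rows, as_of):
    """Naive average tenure: active customers are counted only up to `as_of`,
    so recent signups pull this number down."""
    tenures = [((cancel or as_of) - signup).days / 30.0 for signup, cancel in rows]
    return sum(tenures) / len(tenures)


def retention_at(rows, as_of, months):
    """Share of customers still subscribed `months` after signup, computed only
    over customers who signed up at least `months` ago (30-day months)."""
    horizon_days = months * 30
    eligible = [(s, c) for s, c in rows if (as_of - s).days >= horizon_days]
    if not eligible:
        return None  # nobody has been observed long enough
    retained = sum(1 for s, c in eligible if c is None or (c - s).days >= horizon_days)
    return retained / len(eligible)


naive_tenure = mean_observed_tenure_months(customers, as_of)
six_month_retention = retention_at(customers, as_of, 6)
```

Note how few customers are eligible for the 6-month figure when the base is growing quickly: only the two 2019 signups qualify here, which is part of what makes modeling LTV at signup hard.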

I learned a ton in this role. Doing data analysis with SQL doesn’t just help you learn SQL; it actually helps you think analytically, as cliché as that sounds.

You first have to learn to translate someone’s natural-language question about customers into the appropriate metrics (which will often carry different filter conditions and assumptions, depending on the intended use case!), and then ALSO learn how to execute that in a mathematically and technically correct way in SQL.
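As a small illustration of that translation step, consider the question "how many customers do we have?" Depending on who is asking, it could map to at least three different metrics. The sketch below runs SQL through Python's built-in sqlite3 module against an in-memory table; the `subscriptions` table, its columns, and the metric names are invented for this example and do not reflect any real warehouse schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (
    customer_id   INTEGER PRIMARY KEY,
    channel       TEXT,
    signup_date   TEXT,
    cancel_date   TEXT,     -- NULL while the customer is still subscribed
    is_free_trial INTEGER   -- 1 if the customer has not paid yet
);
INSERT INTO subscriptions VALUES
    (1, 'instagram', '2020-01-10', NULL,         0),
    (2, 'instagram', '2020-02-03', '2020-03-03', 0),
    (3, 'podcast',   '2020-02-20', NULL,         1),
    (4, 'referral',  '2020-03-15', NULL,         0);
""")

# Same question, three different filter conditions (all hypothetical definitions).
METRICS = {
    "ever_signed_up": "SELECT COUNT(*) FROM subscriptions",
    "currently_subscribed":
        "SELECT COUNT(*) FROM subscriptions WHERE cancel_date IS NULL",
    "currently_paying":
        "SELECT COUNT(*) FROM subscriptions "
        "WHERE cancel_date IS NULL AND is_free_trial = 0",
}


def customer_counts(conn):
    """Run each metric definition and return {metric_name: count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in METRICS.items()}


counts = customer_counts(conn)
```

With this toy data the three definitions give 4, 3, and 2 customers respectively, which is why it helps to agree on the definition before building a dashboard.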

Sometimes you will even have to make sure that you are using the correct tables/data, because tables get deprecated, not all data makes it into the table due to bugs in the pipeline, or X metric only started being tracked 6 months ago, etc.
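One cheap sanity check for the "only started being tracked recently" case is to look at when a field first becomes populated and how much of the table it covers. The sketch below assumes a hypothetical list of survey responses with a `skin_goal` field; the field names and data are made up for illustration.

```python
from datetime import date

# Hypothetical rows from a survey-response table; None means not recorded.
responses = [
    {"submitted_on": date(2020, 1, 12), "skin_goal": None},
    {"submitted_on": date(2020, 2, 3), "skin_goal": None},
    {"submitted_on": date(2020, 4, 20), "skin_goal": "acne"},
    {"submitted_on": date(2020, 5, 2), "skin_goal": None},
    {"submitted_on": date(2020, 6, 9), "skin_goal": "anti-aging"},
]


def tracking_coverage(rows, column, date_column):
    """Return (first date on which `column` is populated, share of rows populated)."""
    populated_dates = [r[date_column] for r in rows if r[column] is not None]
    first_seen = min(populated_dates) if populated_dates else None
    share = len(populated_dates) / len(rows) if rows else 0.0
    return first_seen, share


first_seen, share = tracking_coverage(responses, "skin_goal", "submitted_on")
```

If `first_seen` is much later than the earliest row, any trend over time for that field should start from `first_seen`, not from the beginning of the table.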

There are many practical considerations you need to keep in the back of your mind when doing this kind of work. Doing rock-solid data analysis is just as challenging as machine learning IMO, albeit sometimes for different reasons.

A few additional observations:

In general, some of the most common problems people work on in the space are:

  • Churn analysis and modeling (figuring out what factors are associated with people canceling their subscriptions or not making further purchases)
  • Marketing attribution (figuring out which marketing campaigns are driving the most customer growth, so that marketing spend can be focused there)
  • Consumer insights (as mentioned, descriptive statistics about current and former customers - standard business intelligence stuff - sometimes also associated with some other measurement, such as how much $$$ the customer has spent with you thus far, aka LTV)
  • A/B testing (at Curology, this was mostly handled by the product team, but it involves testing changes to things like landing pages or signup / cancellation flows to see if you can change outcomes, such as increasing signups or decreasing cancellations)
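For the A/B testing item, a very simple way to compare signup rates between two landing pages is a two-proportion z-test. The sketch below is a generic textbook version with made-up counts, not the method the product team used; the function name and the example numbers are assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and visitors for the control page
    conv_b / n_b: conversions and visitors for the variant page
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both pages convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 200 of 4000 visitors sign up on the control page,
# 250 of 4000 on the variant.
z, p_value = two_proportion_z_test(200, 4000, 250, 4000)
```

This assumes visitors were randomly assigned and that the sample size was fixed in advance; checking the result repeatedly as data arrives would need a different procedure.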

In my experience, there is not a lot of machine learning to be done in the early stages of a D2C business. Whatever ML can be done is actually quite challenging to get right, since your customer composition is often changing very rapidly. For example, the customers you’re getting today from Instagram are going to look very different from your word-of-mouth customers from 2 years ago, in terms of their behavior, goals, loyalty, price sensitivity, LTV, and potentially other areas.

A second barrier to ML is that since the data org is likely to be more focused on analysis and business intelligence, it probably won’t have much infrastructure set up for deploying machine learning models. You will either have to tape it together from scratch yourself (which might mean working with Airflow, cron jobs, deploying your model to the cloud in a Docker container…) or work closely with a data engineer who can help to build out a basic starting solution.

Doing D2C data science well, especially when you’re new to the business, requires a lot of humility and willingness to ask questions. It will mean talking with people all across the org who understand things like the user experience flow, recent changes (like product launches) that might’ve caused spikes or anomalies in the data, and anything else that needs to be taken into consideration to do context-sensitive analysis that is actually useful to the business.

Frankly, at times I found this very daunting - there are so many opportunities to miss some critical piece of context, or to accidentally use the wrong table or metric for an analysis, or to make mistaken statistical assumptions about your customer data. This is why communication and humility are so important for these kinds of roles. If you’re looking for a role where you can code at your desk 90% of the time without ever having to talk to anyone, data analysis / data science in consumer-focused businesses is likely not a fit for you.

updated_at 31-10-2020