Building a Data-Driven Culture Through Business Experimentation

“Given a ten percent chance of a 100 times payoff, you should take that bet every time. But you’re still going to be wrong nine times out of 10.” –Jeff Bezos

Leading organizations like Amazon, Walmart, Uber, Netflix, Google X, Intuit and Instagram have all vigorously embraced the philosophy that rapid experimentation is the most efficient and effective path to meeting customer needs. In an interview with Metis Strategy’s Peter High, entrepreneur Peter Diamandis explains that the most nimble and innovative companies like Uber and Google X “are running over 1,000 experiments per year and are creating a culture that allows for rapid experimentation and constant failure and iteration.”

Traditional strategic planning taught us to study all the pieces on the chess board, develop a multi-year roadmap, and then launch carefully sculpted new products or services. Executives believed there was only one chance to “get it right,” which often led organizations to let perfect become the enemy of good.

However, in the digital era, decision velocity is more important than perfect planning.

Accelerating decision velocity through experimentation

The most successful organizations cede the hubris of believing they will always be able to perfectly predict customer or user demands, and instead let data—not opinions—guide decision making. The data that informs decision making is derived from a series of experiments that test a hypothesis against a small but representative sample of a broader population.

The experiment should examine three questions:

  • Can you use the offering?
  • Would you use the offering?
  • Would you pay (with money, data, time, etc.) to use the offering over an alternative?

And then lead to one of three conclusions (a simple decision sketch follows this list):

  • The hypothesis is overwhelmingly correct, and we should pursue the idea earnestly;
  • There is positive momentum for the hypothesis, and we should expand the sample size and continue testing and evolving the offering; or
  • The hypothesis is wrong, and the idea should either be scrapped or the approach dramatically recalibrated.
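As a rough illustration, here is a minimal sketch of how an experiment result might be routed to one of these three conclusions. The lift and significance thresholds are hypothetical and would be set per test; this is not a prescriptive rule set.

```python
# Hypothetical decision rules for routing an experiment result to one of the
# three conclusions described above. Thresholds are illustrative, not prescriptive.

def classify_experiment(lift: float, p_value: float,
                        strong_lift: float = 0.10,
                        significance: float = 0.05) -> str:
    """Map an observed lift and its p-value to a recommended next step."""
    if p_value < significance and lift >= strong_lift:
        return "Pursue earnestly: hypothesis overwhelmingly supported"
    if lift > 0:
        return "Expand the sample and keep iterating: positive momentum"
    return "Scrap or dramatically recalibrate: hypothesis not supported"


print(classify_experiment(lift=0.03, p_value=0.20))
# -> Expand the sample and keep iterating: positive momentum
```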

Often, experiments fall into the second category, in which case the organization has demonstrated enough viability to iterate on the idea and further hone product-market fit. The key is to gain this insight early and course-correct as necessary. It is easy to correct a two-degree deviation every ten feet, but being two degrees off course over a mile will cause you to miss your target considerably (+/- 0.35 feet vs. +/- 184 feet).
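The figures in parentheses follow from basic trigonometry (lateral offset = distance × tan of the heading error); a quick check:

```python
import math

# Lateral miss distance after traveling a given distance with a constant
# 2-degree heading error: offset = distance * tan(angle).
angle = math.radians(2)

for label, distance_ft in [("ten feet", 10), ("one mile", 5280)]:
    offset_ft = distance_ft * math.tan(angle)
    print(f"2 degrees off over {label}: ~{offset_ft:.2f} ft off target")

# 2 degrees off over ten feet: ~0.35 ft off target
# 2 degrees off over one mile: ~184.38 ft off target
```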

One simple example comes from Macy’s, which was evaluating whether to build a feature that would allow customers to search for a product based on a picture taken with their smartphone. Competitors had developed something similar, but before Macy’s invested significant sums of money, the retailer wanted to know if the idea was viable.

To test the idea, Macy’s placed a “Visual Product Search” icon on its homepage and monitored the click-through behavior. While Macys.com did not yet have the capability to allow for visual search, tens of thousands of customers clicked through, and Macy’s was able to capture emails of those that wanted to be notified when the feature was ready.

This was enough to begin pursuing the idea further. Yasir Anwar, the former CTO at Macy’s, said teams are “given empowerment to go and test what is best for our customers, to go and run multiple experiments, to test with our customers, (and) come back with the results.”
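For teams that want to run a similar teaser, the mechanics are simple. Below is a minimal, hypothetical sketch of a faux-link smoke test using Flask; the routes, copy, and in-memory storage are illustrative assumptions, not a description of Macy’s actual implementation.

```python
# A minimal, hypothetical sketch of a "faux-link" smoke test: the feature does
# not exist yet, but clicks and sign-ups are recorded to gauge demand.
from flask import Flask, request

app = Flask(__name__)
clicks = 0
waitlist_emails = []

@app.route("/visual-product-search")
def visual_search_teaser():
    global clicks
    clicks += 1  # every click is a signal of customer interest
    return ("Visual Product Search is coming soon! "
            "POST your email to /visual-product-search/notify to be notified.")

@app.route("/visual-product-search/notify", methods=["POST"])
def notify_me():
    waitlist_emails.append(request.form["email"])  # capture intent, not just curiosity
    return "Thanks! We'll let you know when it's ready.", 201

if __name__ == "__main__":
    app.run(port=5000)
```

Click counts measure curiosity; the captured emails measure intent, which is the stronger signal for a go/no-go decision.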

To accelerate decision velocity, we recommend that all companies develop a framework to create a “Business Experimentation Lab” similar to those at Amazon and Walmart. This Business Experimentation Framework (BEF) should outline how people with the right mindset, enabled by technology (though sometimes technology is not necessary), can use iterative processes to make better-informed decisions, faster. Doing so frees organizations from entrenched, bureaucratic practices and provides mechanisms for rapidly determining which of many possible options best improves the customer experience.

A Business Experimentation Framework is crucial to:

  • Rapidly accelerate test-and-learn cycles, allowing your organization to avoid stagnation or “analysis paralysis”
  • Mitigate risk by providing sample data that offers insight into the effort and cost of scaling a solution
  • Cost-effectively gather customer feedback, making solution-market fit easier to achieve, and
  • Create an environment that fosters quality ideas, rather than one that enforces the notion that ideas can only come from top executives

Business experimentation through A/B testing at Walmart

While nearly every department can introduce some flavor of experimentation into its operating model, a core component and example in eCommerce is A/B testing, or split testing. A/B testing compares two versions that differ in a single variable to determine which approach is more effective.
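To make the comparison concrete, here is a minimal sketch, using only the Python standard library, of how a two-variant conversion test might be evaluated with a two-proportion z-test; the visitor and conversion counts are invented for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided p-value
    return p_b - p_a, p_value

# Illustrative numbers: variant B converts 2,300 of 50,000 visitors vs. 2,100 of 50,000 for A.
lift, p = two_proportion_z_test(conv_a=2100, n_a=50_000, conv_b=2300, n_b=50_000)
print(f"Absolute lift: {lift:.4f}, p-value: {p:.4f}")
```

A small p-value indicates the observed difference is unlikely to be due to chance alone; whether the lift clears a pre-defined threshold is still a business decision.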

At a recent meetup at Walmart’s Bay Area office, eCommerce product and test managers discussed the investments, processes, and roles required to sustain A/B testing velocity while keeping experiments clean, accurate, and controllable. Walmart began its journey toward mass A/B testing with a top-down decree (“What we launch is what we test”) and can now run roughly 25 experiments at any given time, having grown its annual test count from 70 in 2016 to 253 in 2017.

To enable A/B testing at this velocity and quality, Walmart developed a Test Proposal process that organizes A/B tests and provides metrics for test governance, so teams can quickly make decisions at the end of a test. A Test Proposal defines the following (see the sketch after this list):

  • Test start date and stop date: When will the test start and how long should it run?
  • Mockup of the desired test-state: What will be experienced by users in the test sample?
  • Measurements: How will we measure the business outcome and success of the test? Do we evaluate company-wide metrics like revenue, visits or conversion, or do we prioritize test-specific operational metrics like click-through rate, viewed items, items in cart?
  • Go/No-Go Decision: Given the defined measurements, what threshold is required for a test to be deemed successful or unsuccessful? If the test is successful, what will be the next action? If unsuccessful, how will the release roadmap or initiative change?
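A Test Proposal can be captured as a lightweight structured record. Below is a minimal, hypothetical sketch; the field names mirror the elements above but are illustrative, not Walmart’s actual schema, and the example values are invented.

```python
# A hypothetical, minimal representation of a Test Proposal as a structured record.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TestProposal:
    name: str
    start_date: date
    stop_date: date
    mockup_url: str                      # what users in the test sample will experience
    primary_metric: str                  # e.g., conversion rate or click-through rate
    guardrail_metrics: list[str] = field(default_factory=list)  # e.g., revenue, visits
    success_threshold: float = 0.0       # minimum lift for a "go" decision
    go_action: str = ""                  # next step if the test succeeds
    no_go_action: str = ""               # roadmap change if the test fails

proposal = TestProposal(
    name="checkout-button-copy-test",
    start_date=date(2018, 6, 1),
    stop_date=date(2018, 6, 14),
    mockup_url="https://example.com/mockups/checkout-button.png",
    primary_metric="click-through rate",
    guardrail_metrics=["conversion", "visits"],
    success_threshold=0.02,
    go_action="Fund full feature development",
    no_go_action="Deprioritize and revisit next quarter",
)
```

Capturing proposals in a structured form also makes it easy to automate governance checks, such as flagging tests without a defined Go/No-Go threshold.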

To facilitate the lasting adoption of a Business Experimentation Framework, organizations must staff critical roles like test managers, development engineers, and test analysts. Walmart, for instance, has created the following roles to enable the launch and analysis of 250 tests per year:

  • 2 Test Managers (soon to be 3), responsible for coordinating test releases, ensuring no collisions (bias introduced by overlapping tests) occur, allocating the proper amount of site traffic to tests, and more
  • 8 Development Engineers, responsible for developing, enhancing and maintaining the Walmart in-house A/B testing platform, as well as working directly with project engineers to support building features that are capable of being A/B tested
  • 20 test analysts (yes, 20), whose responsibilities range from A/B-testing-specific tasks, such as gathering test-related metrics and concluding whether a test passes or fails, to broader tasks such as trend monitoring, pre-post analysis, and opportunity sizing. Launching roughly five tests per week, Walmart requires a team of this size to keep its testing funnel clear of bottlenecks

Creating an experimentation-oriented organization

Institutionalizing a bias for experimentation is not easy. We have seen several barriers to adopting a Business Experimentation Framework, such as:

  • A lack of an analytical, data-driven culture (and basic understanding of statistics)
  • A desire to “do what we know is right” over patience for testing
  • Over-rotation into paralysis when data is inconclusive and no clear action emerges
  • Testing teams turning into order-takers without a clear sense of prioritization
  • Technology that becomes a bottleneck to running simultaneous tests
  • A belief that a failed test is a failure and a waste of money, when in reality it is a learning that prevents further wasted resources

Typically, enthusiasm for experimentation gains momentum with one beachhead department. That department develops a test-approval process that is supported by the tools and data necessary to test, analyze, learn, and make accurate go/no-go decisions.

Here is a blueprint for introducing a test-first culture:

  • People: Create dedicated, value-stream-oriented test-manager roles responsible for test launch and accountable for business outcomes. Provide developers to maintain test environments, as well as analysts responsible for receiving test data and analyzing results
  • Process: Infuse testing into agile product-management practices, with customer centricity at the core. After a few experiments, empower test stakeholders to develop their own test proposals and take ownership over results
  • Technology: Construct mechanisms to segment target populations, serve a unique experience, and capture data on the performance of that experience (a bucketing sketch follows this list). Provide access to metrics associated with tests to accelerate decision making. While there is a wealth of tools available for experimentation, organizations can apply these principles to physical, in-person experiments as well
  • Business Experimentation Framework maturity: Developing a Business Experimentation Framework should be approached like building any other competency: As a journey of increasing maturity over time. Here are some ideas to get you started:
    • Digital tests can be manual (an email blast with two different versions), automated (A/B or multivariate testing following rules in a tool such as Adobe Target or Monetate), run in a customer research lab (customers physically interact with technology products while heat mapping and eye tracking gather data), based on paid search or faux-link smoke testing (bidding on search terms or placing a fake link on a webpage that tracks clicks and gauges interest), survey-based (scenario ranking, conjoint analysis, etc.), or something else entirely.
    • Physical tests include proactively sending free samples and monitoring repurchase rates, changing a process in a store and monitoring business outcomes like checkout times, or conducting customer-intercept user testing with one product over another.
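As noted in the Technology item above, the core digital mechanism is splitting traffic and serving each segment a distinct experience. Here is a minimal sketch of deterministic, hash-based bucketing, one common approach; the experiment name, user ID format, and 50/50 split are illustrative assumptions.

```python
# A minimal sketch of the "segment a target population and serve a unique
# experience" mechanism: deterministic, hash-based bucketing so the same user
# always sees the same variant. Names and the 50/50 split are illustrative.
import hashlib

def assign_variant(user_id: str, experiment: str, traffic_to_b: float = 0.5) -> str:
    """Deterministically assign a user to variant 'A' or 'B' for an experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # uniform value in [0, 1]
    return "B" if bucket < traffic_to_b else "A"

print(assign_variant("customer-12345", "homepage-hero-test"))  # stable across calls
```

Hashing on the experiment name plus the user ID keeps assignments stable across sessions and independent across experiments, which helps avoid the test collisions described earlier.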

If done well, establishing a Business Experimentation Framework will allow organizations to figure out what matters to most customers, within a limited amount of time, for a limited cost, and with a risk-reward tradeoff that will ultimately play to their favor.

As Bezos said, “We all know that if you swing for the fences, you’re going to strike out a lot, but you’re also going to hit some home runs. The difference between baseball and business, however, is that baseball has a truncated outcome distribution. When you swing, no matter how well you connect with the ball, the most runs you can get is four. In business, every once in a while, when you step up to the plate, you can score 1,000 runs. This long-tailed distribution of returns is why it’s important to be bold. Big winners pay for so many experiments.”