Why do 87% of data science projects fail? And are we sure that it is true?

user5724987
7 min read · Jul 9, 2021

Photo by Ales Nesetril on Unsplash

Presumably every Data Scientist, Machine Learning Engineer or any other person involved in the Data Science domain has already heard about MLOps. While it is still a relatively new term, more and more people are getting interested in what it is and how to apply MLOps practices and tools in their projects.

I bet you have noticed that interest too. The MLOps Community is still growing dynamically (and I’m really glad to be a part of it). MLOps talks and articles are now included in almost every Machine Learning conference. Recently even Andrew Ng and DeepLearning.ai decided to keep up and released their MLOps course on Coursera: Machine Learning Engineering for Production.

As an MLOps Engineer, I read and watch a lot of related content, and I recently started noticing that it has become repetitive: the same diagrams and the same statistics are used as the core of almost every presentation.

Apart from the diagram you see above, there’s another particularly interesting resource being repeatedly copied and pasted into these talks and posts, which I decided to examine. So in this post, I want to find out whether it’s true that 87% of data science projects never make it into production.

87% of data science…

If the presentation you are watching is meant to be, e.g., a showcase of a new MLOps product which could potentially be adopted by the community or bought by a client, be prepared to see this:

87% of data science projects never make it into production

But not only then. You can see this statement in a Forbes article, on the Stack Overflow blog and all over the Internet in blog posts and conference videos. This quote, sometimes paraphrased, is a must-have if you are targeting the business side of MLOps.

Where does that statement come from? It seems everybody cites a VentureBeat article entitled: Why do 87% of data science projects never make it into production? So let’s find out where its number comes from.

VentureBeat Article

Figure 1. VentureBeat Article | Source: VentureBeat.com

The article was written in July 2019 and, I have to point out, is a sponsored article which references a talk (panel) from the VentureBeat Transform 2019 conference. It is basically nothing but a short commentary mixed with quotations from the panel: What the heck does it even mean to “Do AI”?

But if this is a universal understanding, that AI empirically provides a competitive edge, why do only 13% of data science projects, or just one out of every 10, actually make it into production?

There are three ways to get started, and avoid becoming one of the 87%, Chapo said. Pick a small project to get started, he says — don’t try to boil the ocean, but choose a pain point to solve, where you can show demonstrable progress. Ensure you have the right team, cross-functionally, to solve this. And third, leverage third parties and folks like IBM and others to help accelerate your journey at the beginning.

Once again we can see the bold statement that 87% of data science projects fail or don’t make it into production. But where does this number come from? I didn’t find the answer in that article, so I decided to watch the presentation; it must be there. (By the way, the article had no link to the video, so I had to find it on YouTube.)

What the heck does it even mean to “Do AI”?

Figure 2. Transform 2019 Panel | Source: YouTube

Here I am, watching the recording of the panel from the Transform 2019 conference. I assume this is where it all comes from and that I will finally learn more about the magic number which is passed from one MLOps presentation to another.

By the way, I couldn’t help but notice that this video has only 353 views and 0 comments two years after it was uploaded. So I assume that not many people were interested in figuring out why almost 9 out of 10 machine learning projects fail. It’s okay, I’ll find out.

The video is 26 minutes long, and I listened for the moment one of the speakers mentions that 87% of data science projects fail (or that only 13% of projects succeed, or anything similar). I watched the video three times just to be sure I didn’t miss anything, and I got it. Around the 10th minute, you can hear:

I think CIO Dive Magazine says that only 13% of data science projects actually make it into production. 13%. I mean, that’s a staggering number…

Here it is! It was said by Deborah Leff, Global Leader and Industry CTO for Data Science and AI at IBM. Unfortunately, it’s just another breadcrumb that I need to follow, because apparently the Transform 2019 panel is not the source of the information I’m trying to confirm.

Let’s find that CIO Dive article then…

CIO Dive Magazine says that…

Figure 3. An article by James Roberts | Source: CIODive.com

In 2017, two years before the Transform 2019 conference, James Roberts (Chief Data Scientist at Quisitive at the time) wrote a guest article in CIO Dive Magazine. It’s called 4 reasons why most data science projects fail, and I expected it to finally reveal why 87% of data science projects fail and how somebody measured that, i.e. where that magic number comes from.

The article is relatively short and well structured, so I read it from top to bottom a couple of times, and here’s what I discovered:

Experts have called 2017 the year of data literacy and digital transformation. While data is a key component that drives true digital transformation, too often companies approach data and analytics projects the wrong way. In fact, a mere 13% of data and analytics projects reach completion, and of those that do, only 8% of company leadership report being completely satisfied with the outcome.

I already know this number (13%) very well. Deborah Leff was right: it was CIO Dive Magazine where she found that piece of information. But what’s the source? Where’s the explanation, or at least another breadcrumb?

Why do only 13% of “data and analytics projects” reach completion?

Unfortunately, we know nothing about the source of that statement. Maybe it was just made up for the purpose of the CIO Dive article; maybe the author simply forgot to cite yet another article which would finally explain how it was measured that 87% of DS projects fail.

While it’s entirely possible that 9 out of 10 ML projects fail, it is hardly possible to measure that reliably, or even to define “failure” or “making it into production”. First of all, what does it even mean for a machine learning model to be in production?

Is a single API endpoint, served e.g. with FastAPI, enough? Or do we need a whole CI/CD/CT pipeline and monitoring to be set up? More than that, some projects are simply not meant (planned) to be deployed into production. Do we count these as failures too?
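
To make the definition problem concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `Project` fields, the five-project portfolio and the `production_rate` helper are my own assumptions, not data from any of the articles above. It only shows that the same portfolio gives very different “success rates” depending on what we decide to count.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    has_endpoint: bool          # a model is served behind some API
    has_pipeline: bool          # CI/CD/CT and monitoring are in place
    meant_for_production: bool  # deployment was ever a goal


# Entirely hypothetical portfolio, for illustration only.
PROJECTS = [
    Project("churn-model", True, True, True),
    Project("demand-forecast", True, False, True),
    Project("fraud-poc", False, False, True),
    Project("one-off-analysis", False, False, False),
    Project("research-spike", False, False, False),
]


def production_rate(projects, in_production, eligible=lambda p: True):
    """Share of eligible projects that satisfy a given 'in production' rule."""
    pool = [p for p in projects if eligible(p)]
    return sum(in_production(p) for p in pool) / len(pool)


# Definition 1: any served endpoint counts as production.
lenient = production_rate(PROJECTS, lambda p: p.has_endpoint)

# Definition 2: an endpoint plus a full pipeline and monitoring.
strict = production_rate(PROJECTS, lambda p: p.has_endpoint and p.has_pipeline)

# Definition 3: like definition 1, but ignore projects never meant to ship.
planned_only = production_rate(
    PROJECTS,
    lambda p: p.has_endpoint,
    eligible=lambda p: p.meant_for_production,
)

print(f"lenient: {lenient:.0%}, strict: {strict:.0%}, planned only: {planned_only:.0%}")
```

With this made-up portfolio the three definitions give 40%, 20% and 67% respectively, so a single headline number tells us little unless we also know which definition was used.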

I don’t know and I’m a bit disappointed that I didn’t find anything.

So what does it mean?

Let’s wrap it up

In 2017, a Chief Data Scientist writes an “Opinion”-labeled guest article in CIO Dive Magazine where he states that “a mere 13% of data and analytics projects reach completion”. No source, no links leading to research papers, zero information about where that magic number comes from.

Then this article is brought up by Deborah Leff, Global Leader and Industry CTO for Data Science and AI at IBM, during one of the Transform 2019 panels, where she says: “I think CIO Dive Magazine says that only 13% of data science projects actually make it into production”.

This is then quoted by VentureBeat in their sponsored article, which promoted the Transform 2019 panel. The article doesn’t even provide a link to the video recording, though. What happens next?

Dozens or hundreds of ML and MLOps resources cite the same article and the same piece of information, that 87% of data science projects never make it into production, and use it as a backdrop for selling their tools and products.

I am genuinely disappointed that we spread such an unconfirmed piece of information so easily, especially in a community which is very close to R&D and academia and relies heavily on research.

What does it mean for MLOps? Probably nothing; we still need it. It’s just shocking that we built this community, these tools and these startups on a phrase which is nothing but a magic number from a single opinion piece.
