Want to learn how to lose $40,000? One company inadvertently did, by embracing infinite storage and compute without guardrails: an oversight that could have been prevented with MLOps.
Sometimes defined as “DevOps for ML”, MLOps encompasses everything from the hardware and software on which machine learning is done to the governance and ethics associated with deployment.
Demetrios Brinkmann, coordinator of the MLOps community and host of a number of fantastic AI podcasts, came in to discuss how we should be evaluating models throughout their life cycle. He also introduced a novel way to gamify feature stores that could change the way data scientists are remunerated. Demetrios shared some stories about how businesses can benefit from MLOps (and the astounding ways ML has gone wrong without it!).
Read on for the transcript!
Hi, I’m Monte Zweben, CEO of Splice Machine. You’re listening to ML Minutes, where cutting-edge thought leaders discuss Machine Learning in one minute or less.
This week, our guest is Demetrios Brinkmann, coordinator of the MLOps community and the host of a number of fantastic machine learning podcasts. Welcome, Demetrios!
Hello, what's going on, Monte? Very excited to be here. I appreciate you putting me in that group of thought leaders; I wasn't expecting that one.
Well, I gotta say that what you're doing from an MLOps community perspective is fantastic. This is a, this is an area that's hard to understand for many people on their machine learning journey, and you're really contributing quite a bit to our whole community. But speaking of journeys, tell us about your journey. How did you get to where you are now?
Yeah, it was all happenstance, really. I moved to Spain about 10 years ago, chasing after a girl that I had met in India, and I was teaching English. Then after about nine years of teaching English, so two years ago, my daughter was born. And I thought, teaching English is not the most reliable source of income for a newly minted father who's got a lot of things on his mind, especially in the way that they do it in Spain: I would teach for nine months, and then in the summertime, nobody wanted any English classes. So I went out and got a job in sales for a password management company, and that really didn't work out. While I was looking for another job, I got one in sales at a company called dotscience. dotscience was doing a bunch of stuff in the MLOps space, and then it went out of business. But before it did, I managed to start this community.
Oh, that's fantastic. I love that journey. And it seems like you've seen and experienced quite a bit. Talking about MLOps for a moment, one of the things that I find is that MLOps is an acronym. And few people even on the machine learning journey may know what that is. So could you define MLOps for our audience?
Yeah, that's a great question, too. That is still being decided, I think, because we talk about this a lot in the community. Obviously, it stands for machine learning operations; you may have heard people call it DevOps for ML. But I think it's so much more than that. I used to say it goes all the way down to what kind of hardware you're using, but there are also other pieces that encompass MLOps that aren't spoken about that much. The main meat of it is the operations side: getting machine learning models into production, and keeping them there, not just having them be there. Because we know now that there's a lot of fragility with these models. But it can also span to ethics; it can span to everything around data, like gathering the data. There's so much that's encompassed in the term MLOps.
And I agree this is an evolving field and an evolving community with an evolving set of technologies. Perhaps to make this real for our audience, can you give one or two examples of an MLOps challenge that a practitioner would experience on their journey in machine learning?
Oh, there are so many war stories that we hear about. And it's nice to hear the war stories, because you realize you're not the only one going through this kind of stuff. Data access, for one: I've heard stories about data scientists not being able to get access to the data until six months into a job, which is not that uncommon. Data poisoning, too, finding that out later; that's all on the data layer. But on the monitoring side, once your model is out there, we've heard about a recommender system that for 18 days straight was recommending the same product to every customer on an e-commerce website. Who knows how much money that lost the company. And that is something that is very difficult to actually monitor for. It's not like the system is on or off; it says that it's on, it's going well, but you don't recognize, unless you do a little more digging, that it's not recommending the right things to the right people.
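The stuck-recommender failure is exactly the kind of thing a cheap statistical check can catch: the service reports healthy, but the distribution of what it serves has collapsed. Here is a minimal sketch of such a check; the function name and threshold are made up for illustration, not from any real monitoring product.

```python
from collections import Counter

def slate_collapsed(served_items, min_unique=2):
    """True when a window of top recommendations has collapsed to
    (nearly) a single item, even though the service itself is 'up'."""
    return len(Counter(served_items)) < min_unique

# A healthy window serves varied products to different customers...
print(slate_collapsed(["shoes", "hat", "book", "shoes"]))  # False
# ...while a stuck model serves one product to everyone for days.
print(slate_collapsed(["hat"] * 1000))  # True
```

In practice the window would be fed from serving logs, and richer signals (entropy, click-through drift) would back it up, but even a check this simple would have flagged 18 days of identical recommendations on day one.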
Those are great war stories. And I hear very similar ones on the data access side. If we would move a little bit further along in the data science process, what are some of the war stories you hear on the feature engineering and model development side of the process?
I'm trying to think; I don't know if I have any great war stories of feature engineering. But I do have some fun war stories of what happens when you give people infinite compute and infinite storage, which is basically what we have these days with the cloud. I've heard a war story about someone who was working in SQL, and they wrote a simple join or something, I can't remember exactly what it was. But it ended up costing the company like 40 grand, because there were no guardrails in place. These are the kinds of things we live with now; this is the paradigm, because we're given this infinite storage and infinite compute. We can do so many cool things with it, but at the same time, we have to be very mindful of how we do these things, now that all of this compute is available to the data engineers, data scientists, and machine learning engineers who are responsible for MLOps as a team.
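A minimal sketch of the kind of guardrail that could have caught that runaway query: estimate the bytes a query would scan (many warehouses expose a dry-run facility for this) and refuse to execute past a budget. The pricing constant, function names, and estimator below are all hypothetical, not any particular warehouse's API.

```python
# Hypothetical cost guardrail: check a dry-run estimate before executing.
COST_PER_TB_USD = 5.0  # assumed on-demand price per terabyte scanned

class BudgetExceededError(Exception):
    pass

def estimated_cost_usd(bytes_scanned):
    """Convert a dry-run byte estimate into dollars."""
    return bytes_scanned / 1e12 * COST_PER_TB_USD

def run_with_guardrail(sql, estimate_bytes, execute, max_usd=50.0):
    """Run `sql` via `execute` only if the dry-run estimate fits the budget."""
    cost = estimated_cost_usd(estimate_bytes(sql))
    if cost > max_usd:
        raise BudgetExceededError(
            f"query would cost ~${cost:,.2f}; budget is ${max_usd:,.2f}")
    return execute(sql)

# An accidental cross join that scans ~8 PB works out to ~$40,000 at this
# rate, so the guardrail raises instead of silently billing the company.
```

In practice `estimate_bytes` would come from the warehouse's dry-run mode and `execute` from its client library; the point is simply that the check runs before any money is spent.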
I'm wondering if you can talk about some of the innovations in MLOps that you're intrigued by?
Yeah, I mean, the last conversation that really jumps to my mind is this idea of the evaluation store that Josh Tobin has been going around talking about a lot: really making sure that you evaluate the data and the model at each step of the way. That is really interesting, because it's taking monitoring to a whole new level. It's making sure you're not just monitoring once a model is in production; you're monitoring everything, because machine learning is so cyclical in nature. You don't just ship it, and then it's good, and you monitor it once it's out. We need to be monitoring so many pieces of the puzzle.
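One way to read the "evaluate every step" idea is as a registry of checks, one per pipeline stage, that all have to pass before and after shipping. Everything in this sketch, the stage names, metrics, and thresholds, is illustrative rather than the actual evaluation-store design.

```python
# Illustrative per-stage checks; stage names and thresholds are made up.
checks = {
    "data":       lambda m: m["null_fraction"] < 0.05,
    "features":   lambda m: m["train_serve_skew"] < 0.10,
    "model":      lambda m: m["val_auc"] > 0.80,
    "production": lambda m: m["live_auc"] > 0.75,
}

def failing_stages(stage_metrics):
    """Return the stages whose latest metrics violate their check."""
    return [s for s, ok in checks.items() if not ok(stage_metrics[s])]

metrics = {
    "data":       {"null_fraction": 0.01},
    "features":   {"train_serve_skew": 0.02},
    "model":      {"val_auc": 0.91},
    "production": {"live_auc": 0.70},  # live performance has drifted
}
print(failing_stages(metrics))  # ['production']
```

The model looked fine at training time; only the production-stage check catches the drift, which is the point of evaluating every stage rather than just the last one.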
The nature of machine learning projects, like you say, is so different from traditional IT projects, where we design them, implement and develop the code, test the code, and then push it into production, where it more or less lives on its own, as long as nothing goes wrong. But as you say, this machine learning community is just now really taking seriously the iterative process of not only going through many, many experiments to find a model that's worthy of being put into production, but also having to monitor each step of the way, from the data pipelines to the feature engineering all the way through to the actual models in production, because models have to change over time. I think our community is really taking that seriously. Would you agree with that?
Excellent. Well, how would you counsel companies who may only be beginning their machine learning journey to take MLOps seriously? I find that the citizen data scientist, or a company at the beginning of its machine learning journey, may not be considering what our community talks about. Should they?
Yeah, there are a lot of things that you need to keep in mind when you're starting out on machine learning. If you're looking for something that is going to produce business value quickly, you may not want to go into machine learning; that might be a little harsh to say, but there are so many ways that machine learning can fail, so many more than in traditional software engineering, like we talked about. And it is very research-based: you need to research whether you have the data, whether the data is good, and whether you can do anything with that data. That is really important to think about, too, because maybe you're asking something of a data scientist or machine learning engineer, but in all reality, they cannot find the answers because of poor data or the lack of data.
You gave a great example earlier of something that can go financially wrong when you talked about infinite compute. I'm wondering if you can give our audience a view of the financial implications of MLOps from the stories that you've heard. Are there any stories of what happens financially when you do it well? Or what are some of the financial challenges if you don't?