Want to learn how to lose $40,000? One company inadvertently did, by embracing infinite storage and compute without guardrails, an oversight that could have been solved with MLOps.
Sometimes defined as “DevOps for ML”, MLOps encompasses everything from the hardware and software on which machine learning is done to the governance and ethics associated with deployment.
Demetrios Brinkmann, coordinator of the MLOps community and host of a number of fantastic AI podcasts, came in to discuss how we should be evaluating models throughout their life cycle. He also introduced a completely unique way to gamify feature stores that could change the way data scientists are remunerated. Demetrios shared some stories about how businesses can benefit from MLOps (and the astounding times ML has gone wrong without it!)
Read on for the transcript!
Hi, I’m Monte Zweben, CEO of Splice Machine. You’re listening to ML Minutes, where cutting-edge thought leaders discuss Machine Learning in one minute or less.
This week, our guest is Demetrios Brinkmann, coordinator of the MLOps community and the host of a number of fantastic machine learning podcasts. Welcome, Demetrios!
Hello, what's going on, Monte? Very excited to be here. I appreciate you putting me up in that group of thought leaders; I wasn't expecting that one.
Well, I gotta say that what you're doing from an MLOps community perspective is fantastic. This is an area that's hard to understand for many people on their machine learning journey, and you're really contributing quite a bit to our whole community. But speaking of journeys, tell us about your journey. How did you get to where you are now?
Yeah, it was all happenstance really. I moved to Spain about 10 years ago, chasing after a girl that I had met in India, and I was teaching English. And then after about nine years of teaching English, so two years ago, my daughter was born. And I thought, teaching English is not the most reliable source of income for a father, a newly minted father who's got a lot of things on his mind, especially in the way that they do it in Spain: I would teach for nine months, and then in the summertime, nobody wanted any English classes. So I went out and I got a job in sales for this password management company. And then that really didn't work out. And I was looking for another job. And I got a job in sales at this company called dotscience. And dotscience was doing a bunch of stuff in the MLOps space, and then it went out of business. And so before it went out of business, I managed to start this community.
Oh, that's fantastic. I love that journey. And it seems like you've seen and experienced quite a bit. Talking about MLOps for a moment, one of the things that I find is that MLOps is an acronym. And few people even on the machine learning journey may know what that is. So could you define MLOps for our audience?
Yeah, that's a great question, too. That is still being decided, I think, because we talk about this a lot in the community. And obviously, it stands for machine learning operations; you may have heard people call it DevOps for ML. But I think it's so much more than that. I used to say, yeah, it goes all the way down to what kind of hardware you're using. But also, I think there are other pieces that encompass MLOps that aren't spoken about that much. But the main meat of it is the operations side. It's basically if you want to get machine learning models into production, and keep them there, not just have them be there. Because we know now that there's a lot of fragility with the models. But it can also span to ethics. It can span to everything around data, gathering the data. There's so much that's encompassed in the term MLOps.
And I agree this is an evolving field and an evolving community with an evolving set of technologies. Perhaps to make this real for our audience, can you give one or two examples of an MLOps challenge that a practitioner would experience on their journey in machine learning?
Oh, there's so many war stories that we hear about. And it's nice to hear the war stories, because you realize you're not the only one going through this kind of stuff. But I mean, data access: I've heard stories about data scientists not being able to get access to the data until six months into a job, which is not that uncommon. Data poisoning, and only finding that out later. I mean, this is all talking on the data layer, right? But monitoring once your model is out there, we've heard war stories about a recommender system that for 18 days straight was recommending the same product to every customer that was on this e-commerce website. And who knows how much money that lost the company. But that is something that is very difficult to actually monitor for. It's not like, oh, the system is on or off, because it says that it's on, it's going well, but you don't recognize, unless you do a little more digging, that it's not recommending the right things to the right people.
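As an illustration of the "little more digging" described above, here is a minimal sketch of one possible check: measuring how concentrated a batch of served recommendations is on a single item. The function names, the `max_share` threshold, and the sample data are all hypothetical and chosen only for illustration; the conversation does not describe how the company in the story monitored its system.

```python
from collections import Counter


def top_item_share(recommendations):
    """Return (item, share) for the most frequently recommended item.

    `recommendations` is a list of item identifiers served in some window,
    one entry per recommendation shown to a customer.
    """
    if not recommendations:
        raise ValueError("no recommendations to inspect")
    counts = Counter(recommendations)
    item, count = counts.most_common(1)[0]
    return item, count / len(recommendations)


def is_degenerate(recommendations, max_share=0.5):
    """Flag a window in which one item dominates the recommendations.

    `max_share` is an assumed threshold; a real system would tune it
    against its own historical traffic.
    """
    _, share = top_item_share(recommendations)
    return share > max_share


# Hypothetical sample windows.
healthy = ["shoes", "hat", "socks", "shoes", "belt", "scarf"]
stuck = ["shoes"] * 6

print(is_degenerate(healthy))
print(is_degenerate(stuck))
```

A check like this looks at what the model outputs rather than whether the service is up, which is the gap the story points to.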
Those are great war stories. And I hear very similar ones on the data access side. If we would move a little bit further along in the data science process, what are some of the war stories you hear on the feature engineering and model development side of the process?
I'm trying to think; I don't know if I have any great war stories of feature engineering. I do know that I have some fun war stories of, like, what happens when you give people infinite compute and infinite storage, which is basically what we have these days with the cloud. And I've heard a war story about someone who was working in SQL and they wrote a simple join or something, I can't remember exactly what it was. But it ended up costing the company like 40 grand, because there were no guardrails in place on that. And it's like, these are the kinds of things that we now live with; this is the paradigm, because we're given this infinite storage, infinite compute. And we can do so many cool things with it. But at the same time, we have to be very mindful of how we do these things.
Since we're talking about the infinite compute that's available now to data engineers, data scientists, and machine learning engineers who are responsible for MLOps as a team, I'm wondering if you can talk about some of the innovations in MLOps that you're intrigued by?
Yeah, I mean, the last conversation that really jumps to my mind is this idea of the evaluation store that Josh Tobin has been going around and talking about a lot. And really making sure that you evaluate your steps for the data and the model and everything, each step of the way. So that is really interesting, because it's taking monitoring to a whole new level. It's making sure that you're not just monitoring once a model is in production; you're monitoring everything, because we know that machine learning has such a cyclical nature. You don't have this "just ship it, and then it's good, and we monitor it once it's out." We need to be monitoring so many pieces of the puzzle.
The nature of machine learning projects, like you say, is so different from traditional IT projects, where we design them, we implement and develop the code, we test the code, and then we push the code into production. And then more or less, it lives on its own, as long as nothing goes wrong. But as you say, this machine learning community is just now really taking seriously this iterative process of having to not only go through many, many experiments to find a model that's worthy of being put into production, but also having to monitor, as you said, each step of the way, from the data pipelines to the feature engineering all the way through to the actual models in production, because models have to change over time. And I think our community is really taking that seriously. Would you agree with that?
Excellent. Well, how would you counsel companies who may only be beginning their machine learning journey to take MLOps seriously? I find the citizen data scientist, or the beginning part of a company's journey on machine learning, may not be considering what our community talks about in the beginning. Should they?
Yeah, there's a lot of things that you need to keep in mind when you're starting out on machine learning. So if you're looking for something that is going to produce business value, quickly, you may not want to go into machine learning, that might be a little harsh to say. But there are so many ways that machine learning can fail. So many more than traditional software engineering, like we talked about. And it is very research-based, you need to research if you have the data, if the data is good. And if you can do anything with that data, that is really important to think about too, because maybe you're asking something of someone like a data scientist or machine learning engineer. But in all reality, you cannot find the answers to that because of poor data or the lack of data.
You gave a great example earlier, of something that can go financially wrong when you talked about infinite compute. I'm wondering if you can give our audience a view of the financial implications of MLOps from the stories that you've heard, perhaps, are there any stories of what happens financially when you do it well? Or what are some of the financial challenges if you don't?
Well, I think you're always seeing this spectrum of, if you're doing it well, you're able to create so much value for the company. And it's like, the possibilities are endless. And then when you aren't doing it well, you can really lose a lot of money fast, especially because trying to get to doing it well is so difficult. And it may take a lot of time. And it's like R&D in a way; maybe it's not totally R&D. But there is a lot that goes into actually discovering what can add value to the company. And there's so many ways, again, that it can go wrong, whether it's just not knowing the problem that you're solving, or not being able to have access to the data, or not being able to use the data properly, or the model starts to drift and you don't realize it. I mean, there's so many ways that this can go wrong. But when you do use it correctly, it can really go right.
As you may know, I'm writing a book with Ben Epstein on feature stores. And feature stores make it easy for data engineers and data scientists to operationalize their feature engineering process. But one of the things that I'm really intrigued with is hearing about people's pursuit of interesting features that produce signal. I'm wondering, in your conversations with members of the community, have you heard any stories about pretty cool features out there that perhaps are not that intuitive, or just really interesting?
Oh, I haven't heard any that jumped out at me right away. But what I have always thought about when it comes to this is, I would love to see, with a feature store tool, the ability to have it gamified a little bit. So you can see who's created the most used features, or you get little badges along the way of, okay, this person's feature was the most expensive, or this person's feature used the most GPU, or this person's feature is never used, or this person's features saved the company X amount of money or the most amount of money, which would be hard to do in all reality. But I always have that funny idea in my head of, like, a really cool thing that would happen with feature stores. Although so far, and maybe you all will change this, nobody has implemented that. I think there's bigger fish to fry.
There may be bigger fish to fry, but it's an excellent idea. And it's kind of fun to think about where that can go. Perhaps one day the data scientist is able to be remunerated just like a salesperson, and I know you came from the sales world, where you're commissioned for the sales you do. Imagine if we were able to pay our data scientists based on how much revenue their features produced in the models that are making recommendations. It's kind of fun to think about that, isn't it?
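The gamification idea above could be sketched roughly as follows. This is a minimal illustration only: the feature names, authors, and the shape of the usage log are hypothetical, and it does not reflect the API of any real feature store, since the conversation notes that nobody has implemented this yet.

```python
from collections import Counter


def feature_leaderboard(feature_authors, usage_log):
    """Rank authors by how often models used their features.

    `feature_authors` maps feature name -> author (hypothetical metadata).
    `usage_log` is an iterable of feature names, one entry per use by a model.
    Returns a list of (author, total_uses), most used first.
    """
    uses = Counter(usage_log)
    per_author = Counter()
    for feature, author in feature_authors.items():
        per_author[author] += uses[feature]
    return per_author.most_common()


def unused_features(feature_authors, usage_log):
    """Return the features that never appear in the usage log, sorted by name."""
    used = set(usage_log)
    return sorted(f for f in feature_authors if f not in used)


# Hypothetical registry and usage log.
authors = {
    "days_since_last_order": "ana",
    "basket_size": "ana",
    "clicks_7d": "ben",
    "zip_code_hash": "ben",
}
log = ["basket_size", "clicks_7d", "basket_size", "days_since_last_order"]

print(feature_leaderboard(authors, log))
print(unused_features(authors, log))
```

The "most expensive" or "saved the most money" badges would need cost and revenue attribution data, which is the part described above as hard to do in reality.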
Exactly. That's what I was thinking.
Well, thank you, Demetrios. We really enjoyed you being on the show. Congratulations on your podcasts and on the MLOps community. And we look forward to participating with you in building those communities.
Likewise, I really appreciate you letting me on here to chat about this. I mean, I could do this all day. Cheers.
If you want to hear Demetrios talk about his favorite upcoming technological innovation, check out our bonus minutes. They're linked in the show notes below and on our website, mlminutes.com. To stay up to date on our upcoming guests and giveaways, you can follow our Twitter and Instagram @MLMinutes. This concludes our final episode for season one. Thanks so much for listening. We'll be returning with season two, where we'll discuss more about machine learning.