Back in the '90s, people said, "Just write code!" That is, until they realized how hard it was to update software once it had been released. DevOps practices were created to shorten the development life cycle and deliver high-quality software on a regular basis. The same thing is happening in the tech industry today, as we begin to operationalize Machine Learning with MLOps technologies.
Read on for the transcript!
Want to win an Apple Watch? It's as simple as texting a friend. Learn more at MLMinutes.com/giveaway.
Hi, I’m Monte Zweben, CEO of Splice Machine. You’re listening to ML Minutes, where cutting-edge thought leaders discuss Machine Learning in one minute or less.
This episode, our guest is Andrew Brust, founder and CEO of Blue Badge Insights. Andrew provides strategy and advisory services to data analytics, BI and AI companies. He also writes for ZDNet, and is a lead analyst for GigaOm. Welcome, Andrew.
Hey, thank you for having me. Glad to be here. And that intro is terrific. It's a polite way of saying I've just been doing database stuff for so long that now I get to talk about it.
Well, you've worked as a practitioner, a consultant, and a journalist in the data science and database space. Tell us about your journey, though: how did you get to where you are now?
I think the short version goes like this: a long time ago, in the horse-and-buggy days, I was a programmer and a database person going back to the mid-'80s. I got into consulting, and then I sort of made a hobby out of giving quotes to tech journalists who were writing articles about various things and needed explanations. And I decided to take that hobby and kind of pivot it into what I focus on now, where I'd be writing and pontificating about the industry. And somehow I wrangled my way into that.
Fantastic. I remember the first time you interviewed me, you asked me a question where I think I said, "Wait a minute, do you know how to code?" And that was how we started our relationship. So I'm really excited to move on to today's topic, perhaps you can explain to our audience what MLOps is.
Yeah, well, it's based on the idea of taking all of the good engineering that we've put into the development world, which once upon a time didn't really have it, right? Everything was kind of ad hoc. But now we have DevOps, and we have real productionized approaches and engineering approaches to getting software out and getting updates and releases out. The idea is to take some of that same discipline and apply it to the world of data science, artificial intelligence, and machine learning. And that is now becoming not just an elegant thing to have, but a necessity, because so many companies are now doing AI at a volume where they kind of need that discipline, or else it becomes unmanageable.
Okay, so I think you subscribe to the viewpoint that MLOps is sort of a merging of data science and machine learning with traditional DevOps principles, and that perhaps it's something that isn't just a nice-to-have, but a must-have. Why is it so important now? Why is it a necessity?
Well, there's the bare necessity that comes from the fact that once you have more than a few models in production, you really need automation and real technology to keep yourself organized and to keep releases managed, and so forth. But beyond those logistics, there's also kind of an ethical component to this, because machine learning models can really have an impact on people's lives. If their accuracy falls below a certain threshold, or if they're based on training data that no longer reflects the realities on the ground, they may inadvertently cause pain to people and actually disenfranchise them from certain entitlements; it depends on the domain of the model. And that needs to be monitored. And there needs to be automation around monitoring, around retraining the models, and around deploying them quickly. Without that, we have real ethical problems.
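The monitoring-and-retrain loop Andrew describes can be sketched in a few lines. This is a purely illustrative stand-in, not any vendor's implementation; the threshold value and function names are hypothetical, and a real system would pull live predictions and ground-truth labels from production telemetry.

```python
# Illustrative sketch: if live model accuracy drops below a threshold,
# flag the model for automated retraining. All names and the threshold
# are hypothetical stand-ins for a real monitoring pipeline.

ACCURACY_THRESHOLD = 0.85  # illustrative cutoff; in practice domain-dependent


def evaluate_live_model(predictions, actuals):
    """Compute accuracy of recent production predictions against ground truth."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(predictions)


def monitor(predictions, actuals):
    """Return the action the pipeline should take based on live accuracy."""
    accuracy = evaluate_live_model(predictions, actuals)
    if accuracy < ACCURACY_THRESHOLD:
        return "retrain"  # trigger automated retraining and redeployment
    return "ok"
```

In a real deployment this check would run on a schedule, and the "retrain" signal would kick off the automated retraining and deployment Andrew mentions, rather than returning a string.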
Excellent. So the viewpoint that MLOps is just something that helps us get models into production is really just a small subset; as you said, we need to look at models that are already in production and be able to monitor them over the long haul. Now, let's move to what's under the hood: tell me about the different approaches that you've heard people are using for MLOps.
Well, I kind of view it, and it's hard to do this over audio, but I view it as a series of concentric rings. Minimally, what you want to do is have some automation around picking algorithms and setting parameter values on those algorithms to get models to be as accurate as they can be. A lot of that can be automated, but again, we need to be careful. Every time we can automate something, we should automate away the real grunt work, but we should not remove monitoring and vigilance by real human beings, or we get ourselves in trouble. So we start with that. And then we can add on things like the monitoring that I mentioned, the automated deployment, and the ability to roll back to a previous version of the model if the new deployment doesn't work out so well; it's a form of A/B testing, really. And then there are more advanced pieces to this, like feature engineering and other things that are still emerging; explainability of models is part of that.
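The automation around picking algorithms and parameter values that Andrew describes is, at its core, a search over candidate configurations scored on validation data. The sketch below is a toy grid search; the candidate names and the scoring rule are invented stand-ins so the example runs, while real AutoML tools train and validate actual models at each step.

```python
# Toy grid-search sketch: try each candidate (algorithm, parameter) pair,
# score it on held-out data, and keep the best. The scoring rule below is
# a made-up stand-in for real training and validation.

def score(model_name, param, validation_data):
    """Stand-in for training a model and measuring validation accuracy."""
    # Toy rule so the example is runnable; a real system would fit a model here.
    return sum(validation_data) * param / 100.0


def select_best(candidates, validation_data):
    """Return the (model, param, score) triple with the highest score."""
    best = None
    for model_name, params in candidates.items():
        for p in params:
            s = score(model_name, p, validation_data)
            if best is None or s > best[2]:
                best = (model_name, p, s)
    return best


candidates = {"logistic": [0.1, 1.0], "tree": [3, 5]}
best = select_best(candidates, validation_data=[1, 0, 1, 1])
```

The "be careful" caveat Andrew adds maps naturally onto this loop: the search automates the grunt work, but a human should still review what `select_best` returns before it ships.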
Okay, great. So, first, you suggested that MLOps requires the ability to pick algorithms and parameters for those algorithms in somewhat sophisticated, if not automated, ways. You also talked about monitoring those algorithms while they're in production, and about automating the deployment of those models into production. And you talked about rollback. I think what you mean by rollback is the traditional database and systems view of rollback: once you have something live and running, perhaps making recommendations on a commerce site or detecting fraud, if monitoring shows that it might not be providing sufficiently accurate answers, that model can be rolled back, meaning removed, with an old one put in its place.
Right. And that should happen at the click of a button rather than being a fire drill where everyone panics and figures out how to manually put everything back the way it was. MLOps makes that so.
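The "click of a button" rollback the two discuss presupposes a registry that remembers prior model versions. Here is a minimal in-memory sketch of that idea; the class and version names are hypothetical, and a real model store would also track artifacts, stages, and metadata.

```python
# Minimal sketch of versioned deployment with one-call rollback.
# The in-memory registry is a stand-in for a real model store.

class ModelRegistry:
    def __init__(self):
        self.history = []  # deployed versions, newest last

    def deploy(self, version):
        """Record a new version as the live model."""
        self.history.append(version)

    def current(self):
        """Return the currently live version, or None."""
        return self.history[-1] if self.history else None

    def rollback(self):
        """Revert to the previous version, if one exists."""
        if len(self.history) > 1:
            self.history.pop()
        return self.current()


registry = ModelRegistry()
registry.deploy("fraud-model-v1")
registry.deploy("fraud-model-v2")  # suppose monitoring flags this version
registry.rollback()                # back to v1 with a single call
```

Because the registry keeps the full history, reverting is one operation instead of the manual fire drill Andrew describes.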
Excellent. Well, I think that's a great segue to talk about implementation. What are some of the great MLOps implementations that you know of today?
Well, it is still emerging, as I said before, and I think the answer to what the best one is depends a little bit on how you view things in terms of flexibility versus polish. Some of the platforms out there are still very open-ended. They're based on bringing lots of different open source frameworks together and making them all available under the umbrella of the same environment. For data scientists of a certain sophistication, I would argue that's probably the best, because it provides the most flexibility and isn't overly prescriptive in an area that is still kind of shaking out. On the other hand, for certain enterprise environments, where things may need to be more mechanized and more automated, having more of a machine and more polish may make more sense. And I don't know if you want me to name names, but there are vendors that are specifically focused on that more polished approach.
Would you like to expand on that?
Sure. I would say on the more open-ended side, where lots of different frameworks can come together, Cloudera, with the Cloudera Machine Learning component of their Cloudera Data Platform, is doing a good job. Full disclosure: they're a client of Blue Badge Insights, so I must have some bias there. Companies like Dataiku are doing a really nice job putting together an end-to-end, more polished platform. Algorithmia takes a very DevOps-y approach to MLOps, such that it can even integrate with source code control, including git-based frameworks like GitHub and so forth. And then, you know, I have a long affiliation with Microsoft and their technology; Azure Machine Learning is kind of a good middle ground, in that they've put a lot of open source stuff together, but they've also added some nice UI abstractions over it, and I think made those things easier to use in terms of explainability, experimentation, model deployment, and so forth.
Excellent. Andrew, while ML Minutes is not about Splice Machine, since you are talking about implementers of MLOps, where would you put Splice Machine?
Yeah, I would say you guys kind of go in your own category. I mean, you're using mainstream MLOps technology in that you've adapted MLflow, and obviously all the constructs of machine learning, in terms of features and experiments and runs and predictions and accuracy, are all there. But you've turned it on its head a little bit and made it very database-centric: it lives in the database. The MLOps happens as a consequence of what you do in the database. I'm gonna make up a term on the fly here and call it Ambient MLOps, in that it just happens as a matter of course; it's not even something that the practitioner necessarily needs to think about. And I would argue that's probably better, because, just as I said, it happens as a consequence of other practices, and it's not something that requires a detour. What I also like about what you guys did is that you took standard MLflow, but because it's open source software, you used it in the spirit it's designed for, I think, and have enhanced it and kind of tailored it to work within the paradigm of the Splice Machine database. So I think it's all pretty slick.
Oh, thank you. I think all of us in the space, like Dataiku and Cloudera and what Azure is doing, and Splice Machine, are all just really trying to make machine learning and data science much easier, so that data scientists can get quality models into production. So with that, perhaps we could move on and look at the group of us in general that are deploying technologies to deal with MLOps. What's one specific challenge MLOps developers face?
I think maybe the big challenge, and it's the same for any new technology area, is that the people mandating and commissioning the work, and funding it, may not understand all the intricacies and the importance of infrastructural things like MLOps. It's the kind of thing whose value you appreciate once you've had a problem and then needed to solve it. Once you've gone through the pain, then you appreciate the value. It takes more sophistication to have an a priori understanding of its value before something's gone wrong. And like I said, that's been the case with other technologies. Back in the day, in the '90s, when we were writing code, everybody said, "Just write code. What's this version control thing about? It seems, you know, overly engineered, and we don't really need that." And of course, today we know that to be folly, and we are where we are.
Yeah, I think it's often the case that emphasizing MLOps to a team that either hasn't deployed a model or has only deployed one model is probably not going to resonate. But once they deploy their second model and it's been in production for a while, they start to feel the pain. Well, let's look forward now. What do you see coming next in the MLOps world?
Well, my crystal ball's a little fuzzy, but there's this notion of feature stores: the idea of feature engineering and the data prep for features, factoring that out separately from doing things like experiment runs on the models. That's a pretty bleeding-edge concept, but once you familiarize yourself with it, you see it, after the fact, as pretty obvious and pretty necessary. So I have a feeling adoption will pick up rather quickly once knowledge of it really spreads. I also think more automation of the MLOps itself will happen. If you think about the way MLflow works, it requires you to use an API to log parameters and log assets and so forth. On the Databricks platform, there's now a mode where you can sort of auto-log, and that's still experimental. But more of that, I think, will take place, so that things just happen as a consequence of doing the work, not dissimilar from the way you've set things up inside the Splice Machine platform and environment.
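The explicit-logging pattern Andrew contrasts with auto-logging looks roughly like this. The tracker below is a pure-Python stand-in written for this transcript, not the real MLflow client; it just mirrors the shape of MLflow's `log_param`/`log_metric` calls, where real code would use `mlflow.start_run()` and the `mlflow` module's logging functions.

```python
# Pure-Python stand-in for an MLflow-style tracking API: the practitioner
# explicitly logs each run's parameters and metrics. (With real MLflow,
# these would be mlflow.log_param / mlflow.log_metric calls inside a run.)

class Run:
    def __init__(self):
        self.params = {}
        self.metrics = {}

    def log_param(self, key, value):
        """Record a configuration value for this run."""
        self.params[key] = value

    def log_metric(self, key, value):
        """Record a measured result for this run."""
        self.metrics[key] = value


run = Run()
run.log_param("learning_rate", 0.01)
run.log_param("max_depth", 5)
run.log_metric("accuracy", 0.91)
```

Auto-logging inverts this: instead of the practitioner making each call, the training library is instrumented so the equivalent records are captured as a side effect of doing the work, which is the direction Andrew expects the field to move.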