How do you know if something is sustainably sourced? It’s hard to tell down here on the ground, but cameras way up in space are capturing movement from all around the world to help answer this question. James Crawford, CEO of Orbital Insight, came in to explain how his company uses computer vision to make observations about the world we live in today, and ways we can use these observations to optimize sustainability in business supply chains. We also discuss the potential for unsupervised learning, including simulated imagery, to improve accuracy in computer vision models.
Read on for the transcript!
Hey, it’s Monte. You have two more days to enter our Apple Watch giveaway! Raffle closes December 25th at midnight. Enter at mlminutes.com/giveaway.
Hi, I’m Monte Zweben, CEO of Splice Machine. You’re listening to ML Minutes, where cutting-edge thought leaders discuss Machine Learning in one minute or less.
This episode, our guest is Jimi Crawford, CEO of Orbital Insight, a startup that uses satellite imagery to understand national and global socio-economic trends. Welcome, Jimi.
Jimi, you and I have been around similar circles: you were at NASA Ames, which was a great experience for me in my career; you've worked on supply chain initiatives at i2 Technologies and built data platforms at Composite. And you even worked on mining the moon! But in your own words, can you tell us about the journey that you're on right now and how you got there?
Certainly. So about six years ago, I was looking at the state of the space. And one of the things I noticed was that there were a tremendous number of startups working on launching rockets, launching satellites to look at the earth. And I'm thinking Skybox, Planet Labs, EarthCast, Black Sky, and they all had plans to build massive constellations. But nobody was thinking about what to do with the imagery when it comes down to the earth. And if you do some back-of-the-envelope math, you realize that if you have imagery of the land surface of the earth, and you wanted to look at it all every day, and see all the cars and all the trucks and all the ships and all the planes, all the roads and all the buildings, you would need 8 million people doing nothing all day, every day, but staring at satellite imagery. If you had those 8 million people, and you had them well organized, you could figure out the world's economy, right? What's going on, where the economy is going up, going down, and who's trading with whom, and who's farming well, and who's mining right. But that's a lot of people; nobody's ever gonna dedicate 8 million people to looking at satellite imagery all day, every day. So the answer, it seemed to me, was to use AI for this, especially with the tremendous advances we're seeing in computer vision, and set up the computer vision so that it can process the imagery, and count the cars and the trucks and the ships and the planes.
That's a fantastic discovery there, and I'm really excited about your venture. What I'd like to know is, how does your company build models from this satellite imagery and other complementary data?
Sure, that's actually relatively straightforward. If we have a type of object we want to find, let's say we want to identify trucks, we simply start with a tremendous amount of imagery. And we do labeling campaigns, you know, it's really very classic supervised labeling from a CV point of view. Now, it's a little bit different from other kinds of CV because we can do rotations. And we can do other kinds of transformations that don't make sense on terrestrial imagery. Because, you know, you never see an upside-down truck on a road. But from a satellite point of view, you can see the trucks in any orientation. So we do different kinds of perturbations on the imagery. But other than that, it's just a traditional labeling campaign: we build large labeled datasets, we try to get different lighting conditions, different parts of the world, different kinds of roads, different size trucks, and put it all together into a large training and test set. And then beyond that, it's a pretty straightforward, you know, training of a convolutional neural network.
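Editor's note: the rotation idea described above can be illustrated with a minimal sketch. It assumes a labeled tile is a small 2D grid of pixel values and generates its eight rotated and mirrored variants, all of which keep the same label. The function names are hypothetical and this is not Orbital Insight's actual pipeline.

```python
def rotate90(tile):
    """Rotate a 2D tile (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]


def flip_horizontal(tile):
    """Mirror a 2D tile left to right."""
    return [row[::-1] for row in tile]


def augment(tile):
    """Return the 8 rotation/mirror variants of a tile.

    Seen from overhead, a truck can point in any direction, so every
    variant is a plausible training example with the same label.
    """
    variants = []
    current = tile
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Hypothetical 2x2 "image" labeled as a truck.
tile = [[1, 2],
        [3, 4]]
training_examples = [(variant, "truck") for variant in augment(tile)]
```

One labeled tile becomes eight training examples here. Augmenting street-level photos this way would be less appropriate, since an upside-down truck is not something a ground camera sees.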
Fantastic. And so what the supervised learning models are doing is taking those labels and learning exactly which of the images and what features of those images are predicting those labels. One of my favorite applications that I've read about for your company, Orbital Insight, is increasing the transparency of business supply chains. Could you tell us a little bit about what your inspiration was for this? And was there a specific problem you were trying to solve?
Sure. There's a bunch of great problems in supply chain, but one of, I think, the most profound ones is around sustainability. And there are actually several problems that you have to solve. One of them is, what does deforestation mean? Because often supply chain sustainability is about: are you causing deforestation? Now, if you are managing a plantation and every 10 or 20 years you cut down the trees, that's not actually deforestation. It may be ugly, it may be unfortunate, but it's not deforestation. Deforestation is when you have a virgin rainforest that's been there for 1,000 years, and you cut it down. So we actually trained the deep learning algorithms, using the same kind of labeling approach, to learn the difference between a managed forest and a virgin rainforest, so that we could then look at incidences of deforestation, and then go back a couple of years and see whether the thing that got cut down was a virgin rainforest or a managed forest. So that's a part of sustainability that's really nice.
Excellent. So why is it important for you to be able to locate deforestation from a supply chain perspective, what do you do with that information?
So, let's say that you are a big company like Unilever or Bunge and you're buying a tremendous amount of palm oil that goes into the things you make. And you want to be able to put a sticker on your goods on the store shelf that says "Sustainably farmed." What that really means is that the process of building that product didn't cause serious environmental harm. And one of the major kinds of serious environmental harm you worry about, for chocolate, for palm oil, and for other products, is deforestation. So you want to look at the places where that product is farmed, and make sure that those places weren't virgin rainforests anytime in the last X number of years, where X is defined by your definition of sustainability.
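Editor's note: the "last X years" check can be sketched as follows. The sketch assumes a classifier has already assigned each farm location one land-cover label per year; the label strings, the function name, and the sample histories are hypothetical and only illustrate the lookback logic.

```python
def was_virgin_rainforest(history, current_year, lookback_years):
    """Return True if the location was virgin rainforest in any of the
    `lookback_years` years before `current_year`.

    `history` maps a year to a land-cover label (hypothetical labels:
    "virgin_rainforest", "managed_forest", "plantation").
    """
    for year in range(current_year - lookback_years, current_year):
        if history.get(year) == "virgin_rainforest":
            return True
    return False


# Hypothetical farm that replaced virgin rainforest in 2017.
cleared_farm = {2015: "virgin_rainforest", 2016: "virgin_rainforest",
                2017: "plantation", 2018: "plantation", 2019: "plantation"}
# Hypothetical farm on land that was already a managed forest.
managed_farm = {2015: "managed_forest", 2016: "managed_forest",
                2017: "plantation", 2018: "plantation", 2019: "plantation"}

flag_strict = was_virgin_rainforest(cleared_farm, 2020, lookback_years=5)
flag_lenient = was_virgin_rainforest(cleared_farm, 2020, lookback_years=3)
flag_managed = was_virgin_rainforest(managed_farm, 2020, lookback_years=5)
```

The same farm passes or fails depending on the lookback window, which is the point made above: X comes from whichever definition of sustainability is being applied.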
Thank you so much for that. Let me see if I understand. Then translating that into the machine learning models. Does that mean that for any one of these farms, you're creating models that predict the likelihood that the raw materials were sourced from a good location, versus a location that may have committed some sort of deforestation?
Yeah. And then the other part of this, of course, is knowing where the goods came from. So we know where the Unilever factories are. But then we have to figure out where the trucks are coming from that come into those factories and trace them back to the farms. And we actually use anonymized cell phone data for that. So we get very large amounts of anonymized cell phone pings. And we don't know whose phone it is. But we can tell it's the same phone that pings at this place and then pings at that place within the course of the day. So we can say these trucks all came from this plant, and they all went to this farm. So this is one of the farms that's supplying that mill. And we do that, you know, thousands and thousands and thousands of times, and we actually figure out the empirical structure of the supply chain. Right. So now you know where the stuff is coming from. And you look at those places, you go back in time, and look to see whether or not they were deforested sometime in the last few years. And that gives you a picture of the sustainability of that plant.
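Editor's note: a minimal sketch of the ping-matching idea described above. It assumes each anonymized ping has already been matched to a known site (a farm or a mill), and it counts how often the same device shows up at both a farm and a mill on the same day. The data layout, the function name, and the `min_trips` threshold are assumptions for illustration, not the company's method.

```python
from collections import defaultdict


def infer_links(pings, farms, mills, min_trips=2):
    """Infer farm-to-mill links from anonymized pings.

    `pings` is an iterable of (device_id, day, site_id) tuples.
    `farms` and `mills` are sets of site ids. A link is kept only if at
    least `min_trips` device-days connect the two sites, which filters
    out one-off coincidences.
    """
    # Group the sites each anonymous device visited on each day.
    visits = defaultdict(set)
    for device_id, day, site_id in pings:
        visits[(device_id, day)].add(site_id)

    # Count device-days that touch both a farm and a mill.
    counts = defaultdict(int)
    for sites in visits.values():
        for farm in sites & farms:
            for mill in sites & mills:
                counts[(farm, mill)] += 1

    return {link: n for link, n in counts.items() if n >= min_trips}


# Hypothetical pings: two devices travel farm1 -> mill1, one travels farm2 -> mill1.
pings = [
    ("a", "d1", "farm1"), ("a", "d1", "mill1"),
    ("b", "d1", "farm1"), ("b", "d1", "mill1"),
    ("c", "d1", "farm2"), ("c", "d1", "mill1"),
    ("d", "d2", "farm2"),
]
links = infer_links(pings, farms={"farm1", "farm2"}, mills={"mill1"})
```

With the default threshold, only the farm1 to mill1 link survives, because the farm2 trip was seen once. Repeating this over many devices and days is what yields the empirical structure of the supply chain.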
I see. So you've traced the supply chain, using machine learning models, and knowing the routes of both the source and the destination to tackle this very important problem for our planet and for business. What's one specific challenge you faced along the way?
In terms of the sustainability? Yeah, so one of the interesting problems that we ran into, that I mentioned earlier, is that really the hardest thing for the customer is tracking all the way back to the farms, because there are literally millions of farms. Many of these palm oil plantations are mom-and-pop operations. And so getting the details of that supply chain is quite hard. Monte, when you and I used to work on supply chain optimization, those were supply chains that were, compared to that, relatively small and short and well understood, right? These are in Indonesia, and they're hugely broad. So getting enough data, enough cell phone pings, to really elucidate that supply chain has been a real challenge. So we've been looking at multiple providers for that data, and we've been looking at having the drivers that are working in the supply chain for our customer install custom apps that just ping us, under a contract where we only use those pings to establish supply chain structure. Right. So just getting a dataset together that's rich enough to really understand that very complicated supply chain has been a really interesting challenge.
Fantastic. Maybe one more question: is there a specific challenge that you've had in building a business on machine learning, and you know, not just one particular use case, like the deforestation and sustainability use case we're talking about, but you're taking this very fast moving science that has really just emerged into the business world, and you're building a whole venture up on this--what's been a big challenge there?
I think the biggest challenge has been balancing the need for R&D against the need for business certainty. So if you go into a customer and the customer says, you know, we want to tr