2 - Krishna Rao, Stanford: Can machine learning help predict forest fires?
Krishna Rao, an Earth scientist and PhD student at Stanford University, came in to explain the factors that influence wildfire risk and to talk about a tool he created using machine learning that helps predict where wildfires might happen. You can test out Krishna's fuel moisture map here, read the academic paper he wrote, or read his summary in Towards Data Science.
Read on for the transcript!
Hi, I’m Monte Zweben, CEO of Splice Machine. You’re listening to ML Minutes, where we solve big problems in little time. Every two weeks, we invite a different thought leader on the show to talk about a problem they’re solving with Machine Learning, with an exciting twist: our guest has only one minute to answer each question we throw at them. Let’s get started!
This episode, our guest is Krishna Rao, an Earth scientist currently pursuing his PhD at Stanford University. In his research, Krishna develops technologies that measure forest health using remote sensing and machine learning. Outside of work, Krishna is an avid bike tourer and triathlete.
Welcome Krishna, so happy to have you here.
0:54 Krishna Rao
Happy to be here, and thank you for inviting me.
Recording today in smoky California, it seems like forest fires are a necessary part of life on the west coast. Unfortunately, however, it hasn't always been this way. Could you give our listeners a quick overview of why these disastrous forest fires are happening, in one minute or less?
Sure. Our general tendency is to think about fires as something negative: fires kill humans, fires are disastrous because they burn down our houses. But we need to keep in mind that fires are an entirely natural process. Fires were here even before humans were here, fires were here even before the Europeans colonized the United States, when Native Americans were managing the land. So it's a completely normal process, I want to tell you that. Over the last 100 years, we have had a tendency to suppress and kill every fire as soon as it begins, which has led to a tremendous increase in fuel loads. By fuel, I mean the substance that catches fire, which has increased tremendously in density. So currently, estimates range from two to four times the amount of fuel in the western United States and California. And that's one of the drivers we have seen behind the increase in forest fires.
Excellent. So if this fuel is what's increasing the likelihood of forest fires, how is that affecting the world in general?
If only the fuels were increasing, it would have been manageable. But the problem is that there are a few other variables that are also increasing, two of them being the hot and dry climate that we are experiencing in California, because of global warming and climate change. And something called the wildland-urban interface. So that is the territory that humans develop into while taking over natural forested landscapes. So as urban environments increase in size, they encroach upon forest land. And whenever there are these boundaries close to natural forest lands as well as semi-urban landscapes, there is a tremendous increase in fire risk over there, because humans are close to those regions which are catching fire.
Excellent. So what I take away from that is that it really is a combination of factors that are increasing the likelihood of fires. Why don't you talk a little more about what you're doing to help solve the problem? How did you build the neural network that predicts fire threats?
Fires thrive on four key ingredients: hot weather, dry weather, dry fuels, and some kind of ignition source. So when we have all four of these combined, there is going to be a recipe for disaster. We have become quite good at modeling precipitation and temperature and humidity, so those kinds of meteorological variables have already been factored in. But we have a very, very limited understanding of fuel dryness, that is, how dry or wet the actual plant components are: the branches and leaves. So my research is geared towards developing data-driven solutions for mapping forest dryness, which is a key indicator of wildfire risk.
So it sounds like what you see is a really important part of your research. What signals are you using for your observations?
Two fundamental observations go into measuring the dryness of forests. One is the color of the trees. This is something that everyone is familiar with; if you open Google Earth, you can see the different red, green, blue colors of the trees or even your houses. The second key input that we use, also a satellite-driven input, is something called microwave backscatter. Now, microwaves are something that we all experience in our daily lives; if you wake up in the morning and go to your kitchen, you heat up something in the microwave. That works because microwaves are very sensitive to the water in a medium. That's basically why it heats up. We use the same principle but in the real world, by shooting microwaves at trees from space, and we analyze the amount of microwave that is scattered by the tree, the principle being that the amount of scattering is directly proportional to the amount of water inside the tree.
Excellent. So, the observations that you're using as features include both color, as well as the microwave backscatter. And this is all coming from space.
5:48 Krishna
That's fascinating. And I'm wondering, given that that's your feature set, mostly, what approach are you taking to machine learning and what tools are you using to do your predictions?
Even though forest dryness is related to the color and the amount of microwave backscatter, well, it's not quite as simple as that. There are a lot of different latent processes which affect how color and microwave backscatter relate to forest dryness. So, in our machine learning method, we use a physics-based deep learning model to estimate forest dryness, in which we use a specific way of feature engineering as well as a specific loss function, which has some kind of physics information built into it; it is not a completely empirical model.
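To make the idea of a loss function with physics built into it a little more concrete, here is a minimal sketch in Python. It is not the loss Krishna used, which the episode does not spell out. It assumes, purely for illustration, that the model predicts a moisture content in percent and that the only physical rule is that this value cannot be negative; the function name and the `penalty_weight` parameter are hypothetical.

```python
def physics_informed_loss(predicted, observed, penalty_weight=1.0):
    """Mean squared error plus a penalty for physically impossible values.

    Assumption for illustration only: predictions are moisture contents
    in percent, which cannot be negative.
    """
    n = len(predicted)
    # Empirical term: how far the predictions are from the measurements.
    data_term = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    # Physics term: squared size of any negative (impossible) prediction.
    physics_term = sum(min(p, 0.0) ** 2 for p in predicted) / n
    return data_term + penalty_weight * physics_term


observed = [120.0, 5.0]
plausible = [110.0, 15.0]     # off by 10 at both points, physically possible
implausible = [110.0, -5.0]   # also off by 10, but one value is negative

loss_plausible = physics_informed_loss(plausible, observed)
loss_implausible = physics_informed_loss(implausible, observed)
```

Both sets of predictions miss the observations by the same amount, but the second is scored as worse because it breaks the assumed physical rule. That extra term is what separates this kind of loss from a purely empirical one.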
That's very interesting, because so much of deep learning is based on generic loss functions that everyone uses. You've derived a loss function that incorporates some physics theory directly in it. That's really the blending of computer science with earth science. Fantastic. What particular problems have you faced on the project?
Well, *laughs* there have been many problems while modeling forest dryness. But the first big challenge I faced was the data manipulation. Everyone who has worked with data knows how hard it is to come across clean data. So it was a monumental task to obtain, clean up and filter all the data into the rows and columns that we all love. The second key challenge that we faced was to balance out the different errors that different users might care about. So when we map forest dryness across space and time, there are going to be errors in space as well as errors in time. So in the way we set up our training optimization, we had to balance out these two errors.
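One way to picture the difference between errors in space and errors in time is to split a model's error into two parts: how wrong each site's long-term average is, and how wrong the ups and downs around that average are. The Python sketch below does that for made-up numbers. The decomposition, the function names and the site names are assumptions chosen for illustration, not the evaluation described in the episode.

```python
import math


def mean(values):
    return sum(values) / len(values)


def rmse(errors):
    return math.sqrt(mean([e * e for e in errors]))


def spatial_and_temporal_rmse(observed, predicted):
    """Split prediction error into a between-site and a within-site part.

    observed, predicted: dicts mapping a site name to a list of values
    over time (same length for a given site).
    """
    site_mean_errors = []  # error in each site's long-term average
    anomaly_errors = []    # error in departures from that average
    for site in observed:
        obs, pred = observed[site], predicted[site]
        obs_mean, pred_mean = mean(obs), mean(pred)
        site_mean_errors.append(pred_mean - obs_mean)
        anomaly_errors.extend(
            (p - pred_mean) - (o - obs_mean) for o, p in zip(obs, pred)
        )
    return rmse(site_mean_errors), rmse(anomaly_errors)


# Made-up example: site "a" is predicted 10 units too wet at every time
# step, while site "b" is predicted perfectly.
observed = {"a": [100.0, 120.0, 80.0], "b": [60.0, 70.0, 50.0]}
predicted = {"a": [110.0, 130.0, 90.0], "b": [60.0, 70.0, 50.0]}

spatial_error, temporal_error = spatial_and_temporal_rmse(observed, predicted)
```

In this example all of the error is spatial: the timing of wetting and drying is captured exactly, but one site's overall level is off. A user comparing locations would care about the first number, while a user tracking one location through a season would care about the second.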
I understand. I think the first problem you mentioned is a ubiquitous problem for all the machine learning research projects around the world: data manipulation, feature engineering, data prep, data filtering, data quality, sometimes can be 80% of the work, and I appreciate that you had to do that. And what did you do to overcome those specific spatial and temporal errors? How did you account for that?
To account for the fact that the errors in space can be different from the errors in time, and that different users might care about each of them, we used a recurrent neural network, but with a twist: in standard recurrent neural networks, we have some time series of information acting as inputs relating to a time series of outputs. But we also had some variables which were static in time and varied only in space, related to biogeographical characteristics, like the soil texture in a place and the topographic variables, which are generally static in time. So we tuned our model architecture so that the time series of inputs go into all the hidden nodes, but we also feed in the same static variables at each time step, to let the model train on not just the temporally varying information, but also the static information in space. This is the way we tuned the spatial as well as the temporal errors.
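The twist described above, feeding the same static variables into the recurrent network at every time step, can be sketched in a few lines of plain Python. This is a toy, untrained, single-layer recurrent network with hand-picked weights; the feature values, weight matrices and function name are all hypothetical, and the real model's architecture is not detailed in the episode.

```python
import math


def run_rnn(dynamic_seq, static_features, w_in, w_hidden):
    """Run a simple recurrent layer over the time series for one location.

    dynamic_seq: list of per-time-step feature lists (e.g. satellite signals).
    static_features: list of features that do not change over time.
    The static features are appended to the input at every time step.
    """
    hidden_size = len(w_hidden)
    hidden = [0.0] * hidden_size
    states = []
    for dynamic in dynamic_seq:
        x = dynamic + static_features  # same static values at each step
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hk for w, hk in zip(w_hidden[j], hidden))
            )
            for j in range(hidden_size)
        ]
        states.append(hidden)
    return states


# Three time steps with two time-varying features each (hypothetical values).
dynamic_seq = [[0.3, 0.7], [0.4, 0.6], [0.5, 0.2]]
# Input weights: one row per hidden unit, one column per input
# (two dynamic features followed by one static feature).
w_in = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6]]
w_hidden = [[0.2, 0.0], [0.0, 0.2]]

states_site_a = run_rnn(dynamic_seq, [1.0], w_in, w_hidden)  # e.g. steep slope
states_site_b = run_rnn(dynamic_seq, [0.0], w_in, w_hidden)  # e.g. flat terrain
```

The two hypothetical sites see identical time series but end up with different hidden states, because the static feature is part of the input at every step. That is how a single network can respond to differences between places as well as changes over time.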
Very good, I think I understand. You were able to take your static variables that did not change over time and feed them into the units with the temporal information, so that each piece of temporally changing information could take the static variables into account as well. What other applications do you want to pursue for machine learning and earth science?
The next problem that I am going to be working on is understanding the effect of fuel dryness on wildfires. So we already mentioned that fuel dryness affects wildfires, but we don't really have a sense of how much it affects them. And add to that the varying climate conditions because of global warming, and we have almost no idea how this relationship between wildfire danger and fuel dryness would change in the future. So my future research would be geared towards, again from a data-driven perspective, understanding how wildfire risk would change based on the changing fuel conditions and changing climate conditions.
Excellent. So, it's a natural extension of your current research to take some of the predictions about moisture and biomass and combine that with how the wildfire likelihood as well as climate change is impacting the overarching threat out there. Fantastic.
Well, it's been really fun. Krishna, thank you so much for coming in.
It was my pleasure. Thank you for having me.
If you want to hear Krishna’s thoughts on GPT-3 and bias in machine learning, you can check out our bonus minutes. They're linked in the show notes below, and on our website, mlminutes.com. Next episode, we'll be discussing the role multi-agent systems play in finance and robotics with Dr. Manuela Veloso. To stay up-to-date on our upcoming guests and giveaways, you can follow our Twitter and Instagram, @MLMinutes. Our intro music is Funkin' It by the Jazzual Suspects, and our outro music is Last Call by Shiny Objects, both on the Om Records label; ML Minutes is produced and edited by Morgan Sweeney. I’m your host, Monte Zweben, and this was an ML Minute.