Agriculture plays a critical role in the United States economy, and the Corn Belt region is at the heart of this industry. Accurate and timely predictions of crop yield are essential for farmers, agribusinesses, and policymakers to make informed decisions that optimize resources, minimize losses, and ensure food security. In our last case study, we examined the historical Normalized Difference Vegetation Index (NDVI) patterns for the top four Corn Belt states in the US. Building on that research, this case study explores the potential of NDVI-based metrics as predictors of corn crop yield across the entire Corn Belt region. We focus on two questions:
- To what extent do simple NDVI-based metrics (max, mean, cumulative NDVI across growing season) correlate with corn crop yield variations across the Corn Belt states?
- How do the relationships between these metrics and corn crop yields differ across the various Corn Belt states and sub-regions?
Extracting Locations from United States CDL
We once again used the United States Cropland Data Layer (CDL) to extract information about corn farm locations. For this analysis, however, we extracted a larger number (33,000) of corn farm locations from the GeoTIFF map.
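The sampling step can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: it assumes the CDL raster has already been read into a NumPy array of class codes (for example with a GeoTIFF reader), that corn is class code 1 as in the CDL legend, and that the raster is north-up with square pixels. The function name `sample_corn_pixels` and its parameters are hypothetical.

```python
import numpy as np

CORN_CLASS = 1  # CDL legend code for corn (assumed here)


def sample_corn_pixels(cdl, n_samples, x_origin, y_origin, pixel_size, seed=0):
    """Return (x, y) centers of up to n_samples randomly chosen corn pixels.

    cdl is a 2D array of CDL class codes; (x_origin, y_origin) is the
    top-left corner of the raster in the raster's own coordinate system.
    """
    rows, cols = np.nonzero(cdl == CORN_CLASS)
    rng = np.random.default_rng(seed)
    n = min(n_samples, rows.size)
    pick = rng.choice(rows.size, size=n, replace=False)
    # Pixel centers: x grows with column, y shrinks with row (north-up raster).
    x = x_origin + (cols[pick] + 0.5) * pixel_size
    y = y_origin - (rows[pick] + 0.5) * pixel_size
    return np.column_stack([x, y])


# Toy 4x4 raster standing in for a CDL tile (1 = corn, 5 = soybeans, 0 = other).
cdl = np.array([
    [1, 5, 1, 0],
    [5, 1, 1, 0],
    [0, 0, 5, 1],
    [1, 5, 5, 5],
])
points = sample_corn_pixels(cdl, n_samples=3, x_origin=0.0, y_origin=120.0, pixel_size=30.0)
```

The returned coordinates are in the raster's projection, so in practice they would still need to be reprojected to longitude and latitude before being used in a query.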
Visualize corn farm locations
As always, we sanity-checked the extracted locations by plotting them before using them in the query.
Querying Data from Streambatch
We then used the corn farm locations to query the Streambatch API for Sentinel-2 NDVI data.
First, we filtered the dataset to include only the growing season months (April through November). We then calculated the NDVI-based metrics of interest (max NDVI, mean NDVI, and cumulative NDVI across the growing season) for each farm location.
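A minimal pandas sketch of this step is shown below. The column names (`location_id`, `time`, `ndvi`) are assumptions about the shape of the query result, and cumulative NDVI is taken here as the plain sum of observations in the season; integrating NDVI over time would be an equally reasonable definition.

```python
import pandas as pd


def growing_season_metrics(ndvi: pd.DataFrame) -> pd.DataFrame:
    """Compute max, mean, and cumulative NDVI per location and year (April-November)."""
    df = ndvi.copy()
    df["time"] = pd.to_datetime(df["time"])
    # Keep only observations from April (4) through November (11), inclusive.
    season = df[df["time"].dt.month.between(4, 11)]
    return (
        season.assign(year=season["time"].dt.year)
        .groupby(["location_id", "year"])["ndvi"]
        .agg(max_ndvi="max", mean_ndvi="mean", cumulative_ndvi="sum")
        .reset_index()
    )


# Toy time series for one location; the February and December rows are filtered out.
sample = pd.DataFrame({
    "location_id": [1, 1, 1, 1, 1],
    "time": ["2021-02-15", "2021-05-15", "2021-07-15", "2021-09-15", "2021-12-01"],
    "ndvi": [0.9, 0.3, 0.8, 0.4, 0.95],
})
metrics = growing_season_metrics(sample)
```

Grouping by year as well as location keeps each growing season separate, which matters once the metrics are matched to annual yield figures.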
Next, we took the county-level boundaries from the Agricultural Census shapefile and used them to tag each corn farm location with its county.
We then merged the dataset with annual, county-level yield data sourced from the USDA NASS Quick Stats data portal for all Corn Belt states. We also calculated the average NDVI metrics and renamed the columns that would be used later for visualization.
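The merge and averaging can be sketched as below. All column names, the join keys, and the toy values are assumptions for illustration, and the sketch averages to the county level before joining, which may differ from the order used in the study.

```python
import pandas as pd

KEYS = ["state", "county", "year"]
NDVI_COLS = ["max_ndvi", "mean_ndvi", "cumulative_ndvi"]

# Toy farm-level metrics already tagged with state and county (hypothetical values).
farms = pd.DataFrame({
    "state": ["State A", "State A", "State A"],
    "county": ["A", "A", "B"],
    "year": [2021, 2021, 2021],
    "max_ndvi": [0.80, 0.90, 0.70],
    "mean_ndvi": [0.50, 0.60, 0.40],
    "cumulative_ndvi": [10.0, 12.0, 8.0],
})

# Toy county-level yields, one row per county and year (hypothetical values).
yields = pd.DataFrame({
    "state": ["State A", "State A"],
    "county": ["A", "B"],
    "year": [2021, 2021],
    "yield_bu_acre": [200.0, 180.0],
})

# Average the farm-level metrics within each county and year.
county_ndvi = farms.groupby(KEYS, as_index=False)[NDVI_COLS].mean()

# Join to yields; validate guards against duplicated county-year rows.
merged = county_ndvi.merge(yields, on=KEYS, how="inner", validate="one_to_one")

# Friendlier column names for plot labels.
merged = merged.rename(columns={
    "max_ndvi": "Max NDVI",
    "mean_ndvi": "Mean NDVI",
    "cumulative_ndvi": "Cumulative NDVI",
    "yield_bu_acre": "Yield (bu/acre)",
})
```

An inner join silently drops counties that lack either NDVI samples or a reported yield, so comparing row counts before and after the merge is a useful check.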
Visualizing Overall Relationships
We then drew scatterplots of each NDVI metric (max, mean, cumulative) against county-level corn yield to explore their relationships visually.
Next, we calculated the Spearman correlation coefficient and p-value between each NDVI metric and county-level corn yield, allowing us to quantify the strength and statistical significance of the relationships. Spearman's coefficient is rank-based, so it measures monotonic rather than strictly linear association.
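As an illustration, the correlation table could be built with `scipy.stats.spearmanr` along these lines. The helper `spearman_table`, the column names, and the toy values are all hypothetical and are not results from the study.

```python
import pandas as pd
from scipy.stats import spearmanr


def spearman_table(df: pd.DataFrame, metrics: list, target: str) -> pd.DataFrame:
    """Spearman rho and p-value of each metric column against the target column."""
    rows = []
    for metric in metrics:
        rho, p_value = spearmanr(df[metric], df[target])
        rows.append({"metric": metric, "spearman_rho": rho, "p_value": p_value})
    return pd.DataFrame(rows)


# Toy county-level table (made-up numbers, for illustration only).
county = pd.DataFrame({
    "max_ndvi": [0.62, 0.70, 0.75, 0.81, 0.88, 0.90],
    "mean_ndvi": [0.40, 0.45, 0.43, 0.50, 0.48, 0.55],
    "cumulative_ndvi": [9.0, 8.5, 10.0, 9.5, 11.0, 10.5],
    "yield": [120, 135, 150, 160, 175, 190],
})
table = spearman_table(county, ["max_ndvi", "mean_ndvi", "cumulative_ndvi"], "yield")
```

With real data the table would have one row per metric, making it easy to compare the three candidates side by side.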
Variability Between Corn Belt States
Given the predictive ability of max growing season NDVI for yield, we next wanted to determine the variability of this key metric between states. We started by visualizing the relationship between max NDVI and yield for all states individually. We kept the max NDVI vs Yield plot from above in the first grid square for comparison.
Next, we calculated the correlation for each state individually, creating a new dataframe and sorting it by correlation value.
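A sketch of the per-state calculation, assuming a county-level table with hypothetical `state`, `max_ndvi`, and `yield` columns. The state labels and values below are placeholders, not study data.

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy county-level table with two placeholder states (made-up numbers).
county = pd.DataFrame({
    "state": ["State A"] * 4 + ["State B"] * 4,
    "max_ndvi": [0.55, 0.65, 0.75, 0.85, 0.80, 0.82, 0.81, 0.83],
    "yield": [110, 130, 150, 170, 200, 190, 205, 210],
})

# One Spearman correlation per state, collected into a new dataframe.
rows = []
for state, group in county.groupby("state"):
    rho, p_value = spearmanr(group["max_ndvi"], group["yield"])
    rows.append({
        "state": state,
        "spearman_rho": rho,
        "p_value": p_value,
        "n_counties": len(group),
    })

# Strongest correlation first.
by_state = (
    pd.DataFrame(rows)
    .sort_values("spearman_rho", ascending=False)
    .reset_index(drop=True)
)
```

Keeping the number of counties alongside each coefficient helps when reading the ranking, since a correlation estimated from few counties is less reliable.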
Finally, we plotted the distributions of max NDVI for each state individually.
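A numeric summary can complement the distribution plots. The sketch below is an assumption about how one might quantify the spread, not part of the original analysis: it computes the median, standard deviation, and interquartile range of max NDVI per state, using placeholder state labels and made-up values.

```python
import pandas as pd

# Toy county-level max NDVI values for two placeholder states.
county = pd.DataFrame({
    "state": ["State A"] * 3 + ["State B"] * 3,
    "max_ndvi": [0.50, 0.70, 0.90, 0.84, 0.85, 0.86],
})

# Spread of max NDVI per state, most variable state first.
spread = (
    county.groupby("state")["max_ndvi"]
    .agg(
        median="median",
        std="std",
        iqr=lambda s: s.quantile(0.75) - s.quantile(0.25),
    )
    .sort_values("std", ascending=False)
)
```

Sorting by standard deviation puts the states with the widest max NDVI distributions at the top, which makes it straightforward to compare against the per-state correlation ranking.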
- Max NDVI during the growing season was found to be the best predictor (correlation = 0.63) of crop yield for corn farms across all Corn Belt states. This result is consistent with previous studies (e.g., Johnson et al., 2021; Roznik et al., 2022), which demonstrated the strong relationship between max NDVI values and crop yield for corn. The max NDVI typically represents the peak of crop growth, capturing the highest level of crop biomass and photosynthetic activity during the growing season.
- Mean NDVI (correlation = 0.31) and cumulative NDVI (correlation = 0.18) during the growing season were also found to correlate with overall yield, albeit to a lesser extent. These metrics capture the average and total vegetation activity throughout the growing season, respectively. According to existing research (e.g., Lobell et al., 2003; Sakamoto et al., 2005), these metrics can be indicative of the general health and productivity of crops but may not be as strongly associated with crop yield as max NDVI due to their sensitivity to transient weather events and other environmental factors.
- We found considerable variation in the predictive power of NDVI for different Corn Belt states, with North Dakota (0.73), South Dakota (0.69), and Nebraska (0.64) exhibiting the highest correlation values. Additionally, we observed notable variability in max NDVI distributions among these states, particularly in Kansas, North Dakota, and South Dakota. This variability may enhance the overall predictive strength of max NDVI for yield. In contrast, states like Illinois and Iowa displayed much lower variability in their max NDVI during the growing season. This observation is consistent with previous research, indicating that regions with more uniform agricultural landscapes and stable growing conditions tend to exhibit less variability in NDVI values (Wardlow et al., 2007). Interestingly, there appears to be some overlap between the states with low variability in their max NDVI distributions and the top states overall by production, suggesting a potential connection between consistent growing conditions and high agricultural productivity.
In conclusion, our analysis supports existing literature by demonstrating the strong predictive power of max NDVI during the growing season for corn crop yields across the Corn Belt states. While mean and cumulative NDVI also correlated with overall yield, they were weaker predictors than max NDVI. Additionally, states with lower variability in their max NDVI distributions, such as Illinois and Iowa, are also among the top corn-producing states. By better understanding the relationship between NDVI metrics and crop yield, stakeholders in the agricultural industry can make more informed decisions and optimize resource use, ultimately leading to increased productivity and sustainability.