Cropland-mapping extent and time intervals

The global boundaries for the cropland mapping were informed by the US Geological Survey (USGS) Global Food Security-Support Analysis Data at 30m (GFSAD)11. The cropland mapping extent was defined using the geographic 11 grid. We included every 11 grid cell that contains cropland area according to the GFSAD. Small islands were excluded due to the absence of Landsat geometrically corrected data (Supplementary Fig. 1).

The cropland mapping was performed at 4-year intervals (20002003, 20042007, 20082011, 20122015 and 20162019). Use of a long interval (rather than a single year) increased the number of clear-sky satellite observations in the time-series, which improves representation of land-surface phenology and the accuracy of cropland detection. For each 4-year interval, we mapped an area as cropland if a growing crop was detected during any of these years. In this way, we implemented the criterion of the maximum fallow length: if an area was not used as cropland for >4 years, it was not included in the cropland map for the corresponding time interval.

We employed the global 16-day normalized surface reflectance Landsat Analysis Ready Data (Landsat ARD19) as input data for cropland mapping. The Landsat ARD were generated from the entire Landsat archive from 1997 to 2019. The Landsat top-of-atmosphere reflectance was normalized using globally consistent MODIS surface reflectance as a normalization target. Individual Landsat images were aggregated into 16-day composites by prioritizing clear-sky observations.

For each 4-year interval, we created a single annualized gap-free 16-day observation time-series. For each 16-day interval, we selected the observation with the highest near-infrared reflectance value (to prioritize observations with the highest vegetation cover) from 4years of Landsat data. Observations contaminated by haze, clouds and cloud shadows, as indicated by the Landsat ARD quality layer, were removed from the analysis. If no clear-sky data were available for a 16-day interval, we filled the missing reflectance values using linear interpolation.

The annualized, 16-day time-series within each 4-year interval were transformed into a set of multitemporal metrics that provide consistent land-surface phenology inputs for global cropland mapping. Metrics include selected ranks, inter-rank averages and amplitudes of surface reflectance and vegetation index values, and surface reflectance averages for selected land-surface phenology stages defined by vegetation indices (that is, surface reflectance for the maximum and minimum greenness periods). The multitemporal metrics methodology is provided in detail19,38. The Landsat metrics were augmented with elevation data39. In this way, we created spatially consistent inputs for each of the 4-year intervals. The complete list of input metrics is presented in Supplementary Table 1.

Global cropland mapping included three stages that enabled extrapolation of visually delineated cropland training data to a temporally consistent, global cropland map time-series using machine learning. At all three stages, we employed bagged decision tree ensembles40 as a supervised classification algorithm that used class presence and absence data as the dependent variables, and a set of multitemporal metrics as independent variables at a Landsat ARD pixel scale. The bagged decision tree results in a per-pixel cropland probability layer, which has a threshold of 0.5 to obtain a cropland map.

The first stage consisted of performing individual cropland classifications for a set of 924 Landsat ARD 11 tiles for the 20162019 interval (Supplementary Fig. 1). The tiles were chosen to represent diverse global agriculture landscapes. Classification training data (cropland class presence and absence) were manually selected through visual interpretation of Landsat metric composites and high-resolution data from Google Earth. An individual supervised classification model (bagged decision trees) was calibrated and applied to each tile.

At the second stage, we used the 924 tiles that had been classified as cropland/other land and the 20162019 metric set to train a series of regional cropland mapping models. The classification was iterated by adding training tiles and assessing the results until the resulting map was satisfactory. We then applied the regional models to each of the preceding 4-year intervals, thus creating a preliminary time-series of global cropland maps.

At the third stage, we used the preliminary global cropland maps as training data to generate temporally consistent global cropland data. As the regional models applied at the second stage were calibrated using 20162019 data alone, classification errors may arise due to Landsat data inconsistencies before 2016. The goal of this third stage was to create a robust spatiotemporally consistent set of locally calibrated cropland detection models. For each 11 Landsat ARD tile (13,451 tiles total), we collected training data for each 4-year interval from the preliminary cropland extent maps within a 3 radius of the target tile, with preference to select stable cropland and non-cropland pixels as training. Training data from all intervals were used to calibrate a single decision tree ensemble for each ARD tile. The per-tile models were then applied to each time interval, and the results were post-processed to remove single cropland class detections and omissions within time-series and eliminate cropland patches <0.5ha. Manual masks to remove map artefacts (for example, cropland overestimation over temperate wetlands and flooded grasslands) were applied in some regions to improve the map quality. The final global cropland map time-series are available at https://glad.umd.edu/dataset/croplands.

The sample analysis had two objectives: to estimate cropland area and its associated uncertainty and to assess cropland map accuracy. Sample interpretation and sample-based analysis were done only for the start (2003) and the end (2019) of the cropland-mapping interval. Accuracies of intermediate cropland maps (2007, 2011 and 2015) were not assessed, but were considered to be similar to those of the 2003 and 2019 maps due to implementation of the same classification model and consistently processed Landsat data41. The analysis was performed separately for each of the seven regions outlined in Extended Data Fig. 1, as well as globally. The regional boundaries were aligned with national boundaries to enable comparison with national data. Only land pixels were considered; pixels labelled as permanent water and snow/ice in the Landsat ARD data quality layer were excluded. In each region, we selected five strata based on the map time-series corresponding to stable croplands, cropland gain and loss, possible cropland omission area and other lands (Supplementary Tables 2 and 3). The possible cropland omission stratum (stratum 4) includes areas where omission errors are probable, specifically pixels that were not mapped as cropland and either (1) were identified as crops by the GFSAD11 or (2) had the decision tree-based cropland probability between 0.1 and 0.5. We randomly selected 100 sample units (Landsat data pixels) from each stratum (500 samples pixels per region, 3,500 in total).

Sample interpretation was performed visually using available remotely sensed data time-series, including Landsat ARD 16-day data, composites of selected multitemporal metrics and high-resolution images provided by Google Earth (Supplementary Fig. 2). Each sample pixel was interpreted by two experts independently and the disagreements were discussed and resolved by the research team. The interpretation legend includes the 20032019 cropland dynamics categories and land-use transition types. The sample reference data and interpretation results are available at https://glad.umd.edu/dataset/croplands.

The sample-based area estimation was performed following previously published methods42,43. The 2003 and 2019 total cropland area, stable crops, gross cropland loss and gain, and net change were estimated within each region separately, and for the entire world using equation (1). The area and the total number of Landsat pixels for each region and each stratum are provided in Supplementary Table 3. For each of the 100 sample pixels sampled in each stratum, pu was defined by class presence, for example, for 2003 cropland, pu=0 (2003 cropland absence) or pu=1 (2003 cropland presence). The pu was defined similarly for the 2019 cropland, stable crop, gross cropland loss and gain classes. For the net cropland area change, pu had values of 1 (cropland gain), 1 (cropland loss) and 0 (no change).

$$hat A = mathop {sum}nolimits_{h = 1}^H {A_hbar p_h}$$

(1)

where (hat A) is the estimated cropland/cropland change area,

Ah the area of stratum h,

H the number of sampling strata,

(bar p_h = frac{{mathop {sum }nolimits_{u in h} p_u}}{{n_h}})the mean cropland/cropland change proportion of samples in stratum h; and

nh the sample size (number of sample pixels) in stratum h.

The s.e.m. of the area was estimated from the variances of cropland (or cropland dynamics category) class values of pu for sample pixels in each stratum using equation (2). The 95% CI was obtained by multiplying s.e.m. by 1.96:

$${mathrm{s.e.}}left( {hat A} right) = sqrt {mathop {sum }limits_{h = 1}^H A_h^2left( {1 - frac{{n_h}}{{N_h}}} right)frac{{s_{ph}^2}}{{n_h}}}$$

(2)

where ({mathrm{s.e.}}left( {hat A} right)) is the s.e.m. of cropland/cropland change class area and

(s_{ph}^2 = frac{{mathop {sum }nolimits_{u in h} left( {p_u - bar p_h} right)^2}}{{n_h - 1}}) the sample variance for stratum h.

We analysed the land-use trajectories of cropland loss and gain using reference sample data within cropland gain and loss strata only. Inclusion of sample pixels from other strata where cropland change was detected would have inflated the area of land-use trajectories that these pixels represent (that is, if a sample pixel from a stable cropland stratum was interpreted as cropland gain due to forest clearing, including the proportion of forest clearing from this large stratum, it will dominate the total regional estimate). The proportion of each land-use trajectory (within cropland gain and loss separately) was estimated from the sample and reported as the percentage of the total gain or loss along with its s.e.m. (Table 2). A combined ratio estimator for stratified random sampling43 was employed to estimate the percentages (equation (3)).

$$hat R = frac{{mathop {sum }nolimits_{h = 1}^H A_hbar y_h}}{{mathop {sum }nolimits_{h = 1}^H A_hbar x_h}} times 100$$

(3)

where: (hat R) is the estimated class proportion expressed as a percentage;

H the number of sampling strata;

Ah the area of stratum h;

(bar y_h = frac{{mathop {sum}nolimits_{u in h} {y_u} }}{{n_h}}) the sample mean of the yu values in stratum h, where yu=1 if pixel u is classified as belonging to a specific transition in the reference sample interpretation, and yu=0 otherwise; and

(bar x_h = frac{{mathop {sum}nolimits_{u in h} {x_u} }}{{n_h}}) the sample mean of the xu values in stratum h, where xu=1 if pixel u is classified as any cropland loss/gain in the reference sample interpretation, and xu=0 otherwise.

The s.e.m. of the estimated ratio of class proportion expressed as percentage was calculated using equation (4):

$${mathrm{s.e.}}left( {hat R} right) = sqrt {frac{1}{{hat X^2}}mathop {sum }limits_{h = 1}^H A_h^2left( {1 - frac{{n_h}}{{N_h}}} right)left( {s_{yh}^2 + hat R^2s_{xh}^2 - 2hat Rs_{xyh}} right)/n_h} times 100$$

(4)

where: ({mathrm{s.e.}}left( {hat R} right)) is the s.e.m. of the estimated proportion expressed as a percentage;

Nh the total number of pixels in stratum h;

nh number of sample pixels in stratum h;

(hat X = mathop {sum }limits_{h = 1}^H A_hbar x_h) the estimated total area of cropland loss/gain expressed in area units; and

(s_{yh}^2) and (s_{xh}^2) the sample variances in stratum h; and sxyh the sample covariance in stratum h estimated as follows:

$$s_{yh}^2 = mathop {sum}nolimits_{u in } {left( {y_u - bar y_h} right)^2/left( {n_h - 1} right)}$$

$$s_{xh}^2 = mathop {sum}nolimits_{u in h} {left( {x_u - bar x_h} right)^2/left( {n_h - 1} right)}$$

$$s_{xyh} = mathop {sum}nolimits_{u in h} {left( {y_u - bar y_h} right)} left( {x_u - bar x_h} right)/left( {n_h - 1} right).$$

The map accuracy metrics include overall accuracy (the proportion of correctly mapped sample pixels), users accuracy of the cropland class (which reflects the cropland class commission) and producers accuracy of the cropland class (which reflects the cropland class omission)42. All accuracy metrics and respective s.e.m.s are presented as percentages (Table 3).

To estimate overall accuracy, we defined yu=1 if pixel u is classified correctly and yu=0 if pixel u is classified incorrectly. The estimator for overall accuracy is then expressed by equation (5), and s.e.m. for overall accuracy is computed using equation (6).

$$hat O = frac{{mathop {sum }nolimits_{h = 1}^H N_hbar y_h}}{N} times 100$$

(5)

where: (hat O) is the estimated overall accuracy, expressed as a percentage; H the number of sampling strata; Nh the total number of pixels in stratum h; N the total number of pixels in the reporting region; and (bar y_h = mathop {sum }limits_{u in h} y_u/n_h) the sample mean of the yu values in stratum h.

$${mathrm{s.e.}}left( {hat O} right) = sqrt {frac{1}{{N^2}}mathop {sum}nolimits_{h = 1}^H {N_h^2left( {1 - n_h/N_h} right)s_{yh}^2/n_h} } times 100$$

(6)

where ({mathrm{s.e.}}left( {hat O} right)) is the s.e.m. of the overall accuracy, expressed as percentage; nh the number of sample pixels in stratum h; and (s_{yh}^2) the sample variance: (s_{yh}^2 = mathop {sum }limits_{u in h} left( {y_u - bar y_h} right)^2/(n_h - 1).) For estimating users accuracy of the croplands class, we defined yu=1 if sample pixel u is correctly mapped as cropland, otherwise yu=0, and xu=1 if sample pixel u is mapped cropland, otherwise xu=0. For the producers accuracy, we defined yu=1 if sample pixel u is correctly mapped as cropland, otherwise yu=0, and xu=1 if sample pixel u is interpreted as cropland, otherwise xu=0. The estimator of the users accuracy and producers accuracy was then expressed as a ratio estimator (equation (7)) and their s.e.m. calculated using equation (8), which are similar to equations (3) and (4), except that the strata were weighted by their total number of pixels (Nh) rather than the areas (Ah) for the purposes of map accuracy assessment (with pixel being the primary mapping unit):

$$hat R = frac{{mathop {sum }nolimits_{h = 1}^H N_hbar y_h}}{{mathop {sum }nolimits_{h = 1}^H N_hbar x_h}} times 100$$

(7)

where (hat R) is the estimated users/producers accuracy, expressed as a percentage.

$${mathrm{s.e.}}left( {hat R} right) = sqrt {frac{1}{{hat X^2}}mathop {sum }limits_{h = 1}^H N_h^2left( {1 - frac{{n_h}}{{N_h}}} right)left( {s_{yh}^2 + hat R^2s_{xh}^2 - 2hat Rs_{xyh}} right)/n_h} times 100$$

(8)

where ({mathrm{s.e.}}left( {hat R} right)) is the s.e.m. of the estimated users/producers accuracy, expressed as a percentage.

(hat X = mathop {sum }limits_{h = 1}^H N_hbar x_h.)

The cropland NPP was evaluated using the globally consistent Collection 6 MODIS-based, annual year-end gap-filled NPP product (MOD17A3HGF20). The product provides the sum of total daily NPP through the year at a 500-m spatial resolution (kgCm2year1). The annual NPP data were resampled to our Landsat ARD data grid and were overlaid with the corresponding 4-year cropland maps to calculate total and per-unit area NPP for each region and each year. We used average annual NPP for each 4-year interval, except for the 20002003 interval, where a 3-year average was used instead to avoid using the year 2000 when MODIS data were incomplete. The s.d. of the annual estimates is provided as an uncertainty metric.

For the national cropland area analysis, we used public geographic information systems (GIS) country boundaries from GADM (https://gadm.org).

We employed the 2019 Revision of World Population Prospects21 to calculate global, regional and national population for years 2003 and 2019. As the boundaries of analysis regions (Extended Data Fig. 1) are aligned with country boundaries, we were able to summarize the regional population totals from national data. The population data were related to our sample-based (for global and regional estimates) and map-based (for national estimates) cropland area to estimate per-capita cropland area and change. Similarly, we related regional cropland NPP to population data to estimate per-capita cropland NPP for 2003 and 2019.

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Continue reading here:
Global maps of cropland extent and change show accelerated cropland expansion in the twenty-first century - Nature.com

Related Posts
December 28, 2021 at 2:06 am by Mr HomeBuilder
Category: Land Clearing