Method

A 6 km × 6 km region of interest (ROI) was selected within each floodplain area to keep data volumes manageable. In summary, remote sensing data, comprising airborne LiDAR and Sentinel-1 and Sentinel-2 imagery, were obtained to develop a method for deriving FTCC. LiDAR data were used to provide a three-dimensional representation of the ROIs and to derive high-resolution FTCC. Acting as a surrogate for in-situ data, the LiDAR-derived FTCC was used to calibrate a Random forest model based on Sentinel-1 and Sentinel-2 bands. As Sentinel data are open-access, freely accessible and available globally, the technique can be implemented at regional or continental scales wherever training data (direct in-situ FTCC measurements or, as here, a LiDAR surrogate) are available.

LiDAR data

LiDAR (Light Detection And Ranging) remote sensing uses pulsed laser light emitted from an airborne platform to measure the distance between the sensor and objects on the Earth's surface from the travel time of the reflected pulses (Dubayah and Drake, 2000). The return time of each pulse, combined with the sensor position and orientation, allows a three-dimensional representation of the reflecting surface to be constructed. When LiDAR is collected over natural environments, this 3-D reconstruction of canopy structure provides a fine-resolution representation of the study location. Airborne LiDAR data for each ROI were obtained from the ELVIS spatial data portal (https://elevation.fsdf.org.au/). Each tile covered 2 km × 2 km. Acquisition dates were September 2009 and 2015 for Yanga and Barmah, respectively, using two different LiDAR sensors (Leica ALS50-II, Yanga; Trimble AX60, Barmah).
The Yanga dataset was collected from 0.50 km above the ground surface with a swath width of 1.6 km and a swath overlap of 20%. The Barmah dataset was collected from a height of 0.85 km with a swath width of 1 km and a swath overlap of 30%. The sensors recorded average point densities of approximately 4.0 and 4.4 points per m² for Yanga and Barmah, respectively.
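The ranging principle underlying both acquisitions can be stated explicitly: the distance \(R\) from the sensor to a reflecting surface follows from the two-way travel time \(\Delta t\) of each pulse as \(R = c\,\Delta t/2\), where \(c\) is the speed of light.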

Retrieving vegetation height and fractional tree canopy cover from LiDAR data

Tree structural information for both ROIs was retrieved from the LiDAR tiles. Each tile comprises a dense point cloud, with each point carrying georeferencing information (x and y coordinates), height and return 'type' (related to the time at which each pulse returns to the sensor and the height of the reflecting object). FUSION software (http://forsys.cfr.washington.edu/fusion.html) was used to derive a digital surface model and a digital terrain model from the raw LiDAR data (Boehm et al., 2013). The digital surface model approximates the elevation of the highest reflecting surface in each grid cell, while the digital terrain model estimates the elevation of the ground surface. A canopy height model (Koukoulas and Blackburn, 2005) was created at 1 m spatial resolution by subtracting the digital terrain model from the digital surface model, thereby converting the point clouds to a raster of canopy heights. An FTCC product (referred to as LiDAR FTCC) was derived from the canopy height model using all LiDAR returns more than 2 m above the ground surface; as the objective of the study was to map tree canopy cover, smaller shrubs and bushes were thereby excluded (Equation 1). The R package ForestTools was used to identify dominant treetops and tree crown radii from the canopy height model: a moving window scanned the canopy height model and tagged the highest point within each window as a treetop, and the 'watershed' method was then used to delineate tree crowns (Beucher and Meyer, 1993). Finally, tree number, tree height and crown radius were derived from the canopy height model.
\(FTCC=\frac{\text{number of pixels with height} > 2\ \text{m}}{\text{total number of pixels in the given area}}\) (Equation 1.)
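The following is a minimal sketch of this workflow in R, assuming the 1 m canopy height model has been exported as a GeoTIFF; the file name and the crown-radius window function are illustrative, and function calls follow the raster and ForestTools packages (recent ForestTools versions may expect terra SpatRaster inputs).

```r
# Minimal sketch: derive FTCC (Equation 1) and tree attributes from a 1 m canopy height model.
library(raster)
library(ForestTools)

chm <- raster("chm_1m.tif")              # 1 m canopy height model (DSM minus DTM); illustrative file

# FTCC: proportion of 1 m cells taller than 2 m within each 20 m cell (Equation 1).
canopy <- chm > 2                        # 1 = tree canopy, 0 = ground/shrub
ftcc   <- aggregate(canopy, fact = 20, fun = mean)

# Dominant treetops via a moving (variable) window, crowns via watershed segmentation.
win_fun  <- function(x) x * 0.05 + 0.6   # illustrative crown radius vs height relation (m)
treetops <- vwf(CHM = chm, winFun = win_fun, minHeight = 2)
crowns   <- mcws(treetops = treetops, CHM = chm, minHeight = 2, format = "polygons")

n_trees  <- nrow(treetops)               # tree count; heights and crown radii are in the attributes
```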

Sentinel-1 data

The Sentinel-1A and 1B satellites carry C-band Synthetic Aperture Radar (SAR) sensors. They are part of the European Space Agency's Copernicus mission and were launched in 2014 (Sentinel-1A) and 2016 (Sentinel-1B). They constitute the first SAR mission to acquire data systematically at a global scale, providing dual-polarised (VV and VH) C-band SAR images with a 12-day repeat cycle per satellite. Over land, Interferometric Wide swath is the default imaging mode, with a nominal spatial resolution of 20 m (azimuth) by 5 m (range).
A Sentinel-1 Ground Range Detected image acquired in May 2016 was obtained from the Sentinel Australasia Regional Access hub (SARA; https://copernicus.nci.org.au/). Processing was performed with the Sentinel Application Platform (SNAP) and included applying updated orbit metadata, thermal noise removal, border noise removal, radiometric calibration, Range Doppler terrain correction and conversion to decibels (Filipponi, 2019). The VV and VH bands were converted to Sigma Nought backscattering coefficients, which accounts for variations in the line-of-sight (range) direction.
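As a minimal sketch of the final step, assuming the calibrated Sigma Nought VV and VH bands are exported from SNAP as GeoTIFFs (file names here are illustrative):

```r
# Convert calibrated Sigma Nought backscatter (linear power) to decibels.
library(raster)

vv <- raster("S1_sigma0_VV.tif")   # illustrative file names for the SNAP exports
vh <- raster("S1_sigma0_VH.tif")

vv_db <- 10 * log10(vv)            # sigma0 [dB] = 10 * log10(sigma0 [linear])
vh_db <- 10 * log10(vh)
```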

Sentinel-2 data

The Sentinel-2 mission consists of two satellites, launched in 2015 (Sentinel-2A) and 2017 (Sentinel-2B). Each carries a multispectral sensor with 13 spectral bands covering the visible, near-infrared and short-wave infrared regions of the electromagnetic spectrum. The revisit time of each Sentinel-2 satellite is 10 days (5 days for the two-satellite constellation).
Sentinel-2 Level-1C (L1C) top-of-atmosphere data with less than 10% cloud cover, collected in May 2016, were downloaded from SARA. The original tile (100 km × 100 km) was cropped to the ROIs. Sen2cor was applied to convert the data from L1C to atmospherically corrected Level-2A bottom-of-atmosphere reflectance (Main-Knorn et al., 2015). Ten bands representing vegetation functional and structural information were selected (Verrelst et al., 2012): B2-B8, B8a and B11-B12. Where relevant, bands were resampled to 20 m × 20 m (Table 1).
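A minimal sketch of the band selection and resampling step, assuming the Sen2cor Level-2A output is available band-by-band as GeoTIFFs (file names are illustrative, and bilinear interpolation is assumed for the 10 m bands):

```r
# Stack the ten selected Sentinel-2 L2A bands on a common 20 m grid.
library(raster)

bands_10m <- stack("B02.tif", "B03.tif", "B04.tif", "B08.tif")            # native 10 m bands
bands_20m <- stack("B05.tif", "B06.tif", "B07.tif", "B8A.tif",
                   "B11.tif", "B12.tif")                                  # native 20 m bands

# Resample the 10 m bands onto the 20 m grid (bilinear interpolation assumed).
s2_stack <- stack(resample(bands_10m, bands_20m, method = "bilinear"), bands_20m)
names(s2_stack) <- c("B2", "B3", "B4", "B8", "B5", "B6", "B7", "B8A", "B11", "B12")
```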

Random forest regression analysis

Random forest regression, proposed by Breiman (2001), is an ensemble machine learning algorithm well suited to the analysis of high-dimensional spatial datasets. Random forest begins by drawing random subsets of the training data and growing a decision tree for each subset. For classification, the most frequent ('voted') prediction among the individual trees is selected as the final result; for regression, the predictions of the individual trees are averaged (Gislason et al., 2006).
Random forest regression was employed to determine the relationship between LiDAR FTCC and the Sentinel-1 and Sentinel-2 bands. Before the regression, the Sentinel-1 and Sentinel-2 bands and the canopy height model were brought to a common spatial resolution of 20 m. The VV and VH bands from Sentinel-1 were resampled to the 20 m Sentinel-2 grid using bilinear interpolation. To retrieve FTCC from the canopy height model at the Sentinel-2 resolution, a 20 m fishnet grid was created and FTCC was calculated from the canopy height model for each grid cell using Equation 1. After resampling of Sentinel-1, Sentinel-2 and LiDAR FTCC, 733,800 pixels across the Yanga and Barmah ROIs were available for Random forest training and validation.
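A minimal sketch of assembling the 20 m training table, continuing from the sketches above (object names such as vv_db, s2_stack and ftcc are carried over and remain illustrative):

```r
# Align predictors and LiDAR FTCC on the 20 m Sentinel-2 grid and build a training table.
library(raster)

# Sentinel-1 VV/VH (dB) resampled to the 20 m Sentinel-2 grid using bilinear interpolation.
s1_20 <- resample(stack(vv_db, vh_db), s2_stack, method = "bilinear")

# LiDAR FTCC aggregated to 20 m (Equation 1), snapped to the same grid (the fishnet equivalent).
ftcc_20 <- resample(ftcc, s2_stack, method = "ngb")

# One row per 20 m pixel: ten Sentinel-2 bands, two Sentinel-1 bands and the LiDAR FTCC response.
training <- as.data.frame(stack(s2_stack, s1_20, ftcc_20), na.rm = TRUE)
names(training)[ncol(training)] <- "FTCC"
```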
Three models were created using the R package 'randomForest': two single-site models trained and evaluated for the Yanga and Barmah ROIs (RFYanga and RFBarmah, where RF is Random forest) and a model combining data from both ROIs (RFall). For each model, the dataset was split by random sampling into 70% for training and 30% for validation. Ten-fold cross-validation was implemented to retain the best-performing configuration of each Random forest model.
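A minimal sketch of one model (here RFall), assuming a training table like the one above; the random seed and number of trees are illustrative defaults rather than the settings used in the study, and the ten-fold cross-validation step is omitted for brevity:

```r
# 70/30 random split and Random forest regression of FTCC on the Sentinel-1/2 predictors.
library(randomForest)

set.seed(42)                                            # illustrative seed
idx      <- sample(nrow(training), size = 0.7 * nrow(training))
train_df <- training[idx, ]
valid_df <- training[-idx, ]

rf_all <- randomForest(FTCC ~ ., data = train_df, ntree = 500)

# Predict FTCC for the held-out 30% validation pixels.
valid_df$FTCC_pred <- predict(rf_all, newdata = valid_df)
```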

Statistical analysis

Root Mean Square Error (RMSE) was used to assess the performance of the Random forest prediction models. The RMSE is defined as:
\(RMSE=\sqrt{\frac{\sum_{i=1}^{N}\left(x_{ret,i}-x_{pre,i}\right)^{2}}{N}}\) (Equation 2.)
Here \(x_{ret,i}\) and \(x_{pre,i}\) are the LiDAR FTCC and the FTCC predicted by the Random forest model, respectively, and N is the number of pixels used for prediction. The coefficient of determination (R²) was used to assess the relationship between LiDAR FTCC and predicted FTCC for the ROIs; a higher R² indicates a closer fit to the LiDAR FTCC, and a lower RMSE indicates more accurate predictions from the Random forest models. Data processing, statistical analysis and visualisation were conducted in the R scientific computing environment (R Core Team, version 3.6) with packages obtained from the Comprehensive R Archive Network (http://cran.r-project.org).
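A minimal sketch of these metrics, continuing from the validation predictions above:

```r
# RMSE (Equation 2) and coefficient of determination between LiDAR and predicted FTCC.
rmse <- sqrt(mean((valid_df$FTCC - valid_df$FTCC_pred)^2))
r2   <- cor(valid_df$FTCC, valid_df$FTCC_pred)^2
```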