Traditional neural networks use only a single time step as the input for
prediction, which can be a shortcoming for complex dynamical systems
like the atmosphere. We therefore used a specific recurrent neural network
(RNN) called a Long Short-Term Memory1,2 (LSTM)
network, in which the ability to retain trends and past behavior is
inherent in the structure of the memory module. The simplest way to
understand the LSTM process is to walk the data through a single LSTM
cell. The first operation is a forget gate shown in figure S1 as theσ neural operator in the lower left of the LSTM cell. The
forget gate uses a sigmoid operator to assess how much of the cell state
is forgotten. The second stage of the LSTM is a combination ofσ and tanh neural operators where the new data at timet , Xt , is evaluated with the output of the
previos cell, ht-1 , for updating the cell state.
The combination of these two neural operators allows the memory to be
updated for new conditions, such as a change in the symmetry of the
eyewall or a crossing into cooler ocean water, without a complete loss of
information about the prior storm state. The forget gate and the update
are then combined to produce a new candidate state, shown in
the top row of figure S1. The final σ gate then decides whether
the candidate state is accepted over the previous cell state, which
gives the LSTM enough flexibility to reduce the chance of becoming stuck
in a local minimum.
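The cell walkthrough above can be sketched numerically. The following is a minimal NumPy illustration of a single LSTM cell step, with the forget, input, candidate, and output operations laid out as described; the weight shapes and the tiny random example are illustrative only and are not the weights of any trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold stacked weights for the forget (f),
    input (i), candidate (g), and output (o) operations."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # stacked pre-activations, shape (4n,)
    f = sigmoid(z[0:n])                # forget gate: how much of c_prev to keep
    i = sigmoid(z[n:2 * n])            # input gate: how much new information to admit
    g = np.tanh(z[2 * n:3 * n])        # candidate values from x_t and h_{t-1}
    o = sigmoid(z[3 * n:4 * n])        # output gate: what to expose as h_t
    c_t = f * c_prev + i * g           # updated cell state (the top row of figure S1)
    h_t = o * np.tanh(c_t)             # new hidden state / cell output
    return h_t, c_t

# tiny worked example with fixed random weights (illustrative only)
rng = np.random.default_rng(0)
n_x, n_h = 3, 2                        # hypothetical input and hidden sizes
W = rng.standard_normal((4 * n_h, n_x))
U = rng.standard_normal((4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)    # initial hidden and cell states
h, c = lstm_cell_step(rng.standard_normal(n_x), h, c, W, U, b)
```

Because the output gate and the tanh both bound their arguments, h_t is always confined to (-1, 1), while the cell state c_t can accumulate over many steps; this is the mechanism by which the cell retains past behavior.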
Figure S1: An LSTM cell in a chain, where h_{t'} is the maximum wind speed from the COAMPS model at
timestep t' and is the input at the bottom of each new cell, t' is the
time at prediction, and X_t is the vector of
“features”, or variables of interest, from which we hope to predict the
behavior of h_t. Inside the LSTM cell is the
process of updating the predictive values (the top through-arrow) based
on the neural layers, sigmoid (σ) and tanh, for the forget and cell-state
memory, respectively.
The LSTM model was developed in Python using the modules
TensorFlow3 and Keras4. The open-source
nature of these toolkits makes it possible to build a functional
statistical model with relative ease. For our build, the memory spanned
the previous 6 hours of data, and the forecast prediction
(h_{t'}) of maximum winds was 24 hours in the
future. In the language of TensorFlow, and with hourly data output, this
translated to a “lookback” of 6 and a “delay” of 24 in the code.
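A minimal sketch of such a build is given below, assuming hourly data: a windowing helper that pairs the previous 6 hours of features with the maximum wind 24 hours ahead, feeding a single-layer Keras LSTM. The layer width, feature count, and random data are hypothetical; the source excerpt does not specify the authors' architecture.

```python
import numpy as np
from tensorflow import keras

LOOKBACK = 6     # hours of past data held in memory ("lookback" of 6)
DELAY = 24       # forecast lead time in hours ("delay" of 24)
N_FEATURES = 8   # hypothetical number of predictor variables in X_t

def make_windows(data, target, lookback=LOOKBACK, delay=DELAY):
    """Pair each LOOKBACK-hour window of features with the target DELAY hours later."""
    X, y = [], []
    for i in range(lookback, len(data) - delay):
        X.append(data[i - lookback:i])   # past LOOKBACK hours of features
        y.append(target[i + delay])      # maximum wind DELAY hours ahead
    return np.array(X), np.array(y)

# illustrative single-layer LSTM regressor predicting h_{t'}
model = keras.Sequential([
    keras.layers.Input(shape=(LOOKBACK, N_FEATURES)),
    keras.layers.LSTM(32),               # illustrative width
    keras.layers.Dense(1),               # predicted maximum wind
])
model.compile(optimizer="adam", loss="mse")

# hypothetical hourly record: 100 hours of N_FEATURES variables
data = np.random.rand(100, N_FEATURES).astype("float32")
target = data[:, 0]
X, y = make_windows(data, target)        # X: (70, 6, 8), y: (70,)
```

Each training sample is thus a (6, 8) window of recent storm features, and the fitted network maps it to a scalar wind forecast a full day ahead.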