Traditional neural networks use only a single time as the input for prediction, which can be a shortcoming for complex dynamical systems like the atmosphere. Thus, we used a specific recurrent neural network (RNN) called a Long Short Term Memory1,2 (LSTM) network, in which the ability to retain trends and past behavior is inherent in the structure of the memory module. The simplest way to understand the LSTM process is to walk the data through a single LSTM cell. The first operation is a forget gate, shown in figure S1 as the σ neural operator in the lower left of the LSTM cell. The forget gate uses a sigmoid operator to assess how much of the cell state is forgotten. The second stage of the LSTM is a combination of σ and tanh neural operators, where the new data at time t, Xt, is evaluated together with the output of the previous cell, ht-1, to update the cell state. The combination of the two neural operators allows the memory to be updated for new conditions, such as a change in the symmetry of the eyewall or crossing into cooler ocean waters, without a complete loss of information about the prior storm state. The forget gate and update are then combined to produce a new candidate state, shown in the top row of figure S1. The final σ gate then decides whether the candidate state is accepted over the previous cell state, which gives the LSTM enough flexibility to reduce the chance of becoming stuck in a local minimum.
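The gate sequence described above corresponds to the standard LSTM update equations1,2. As an illustrative sketch only (the stacked weight matrices W and U and bias b below are hypothetical gate parameters, not those of our model), a single cell step can be written in NumPy as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell with hidden size n.

    W (4n x m), U (4n x n), and b (4n,) stack the parameters for the
    four gates: forget, input, candidate, and output.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # pre-activations for all four gates
    f = sigmoid(z[0:n])               # forget gate: how much of c_prev is kept
    i = sigmoid(z[n:2*n])             # input gate: how much new information is admitted
    g = np.tanh(z[2*n:3*n])           # candidate values for updating the cell state
    o = sigmoid(z[3*n:4*n])           # output gate: what part of the state is exposed
    c_t = f * c_prev + i * g          # updated cell state (the memory)
    h_t = o * np.tanh(c_t)            # hidden state passed along the chain
    return h_t, c_t
```

Because the forget and input gates act multiplicatively and independently, the cell can discount stale information (e.g., a prior eyewall configuration) while still admitting the new state Xt, which is the behavior described above.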
Figure S1: An LSTM cell in a chain, where ht' is the maximum wind speed from the COAMPS model at timestep t' and is the input at the bottom of each new cell, t' is the time at prediction, and Xt is the vector of "features," or variables of interest, from which we hope to predict the behavior of ht. Inside the LSTM gate is the process of updating the predictive values (the top through-arrow) based on the neural layers, sigmoid (σ) and tanh, for the forget and cell-state memory operations respectively.
The LSTM model was developed using the Python modules TensorFlow3 and Keras4. The open-source nature of this Python toolkit makes it relatively easy to build a functional statistical model. For our build, the time used in the memory was 6 hours, and the forecast prediction (ht') of maximum winds was 24 hours in the future. In the language of TensorFlow, and with hourly data output, this translates to a "lookback" of 6 and a "delay" of 24 hours in the code.
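As an illustrative sketch of such a configuration (the layer width, optimizer, and feature count below are placeholders, not the settings used in this study; only the lookback of 6 hours and delay of 24 hours come from the text above):

```python
import numpy as np
from tensorflow import keras

LOOKBACK = 6   # hours of history fed to the memory
DELAY = 24     # forecast lead time in hours (hourly data, so 24 steps)

def make_windows(features, target, lookback=LOOKBACK, delay=DELAY):
    """Slice hourly data into (samples, lookback, n_features) inputs,
    each paired with the maximum wind speed `delay` hours later."""
    X, y = [], []
    for t in range(lookback, len(features) - delay):
        X.append(features[t - lookback:t])
        y.append(target[t + delay])
    return np.array(X), np.array(y)

n_features = 8  # placeholder: number of storm-state variables in Xt
model = keras.Sequential([
    keras.layers.Input(shape=(LOOKBACK, n_features)),
    keras.layers.LSTM(32),       # memory module spanning the 6-hour window
    keras.layers.Dense(1),       # 24-hour maximum wind speed prediction
])
model.compile(optimizer="adam", loss="mse")
```

The windowing function mirrors the lookback/delay framing: each training sample is a 6-hour sequence of features, and the target is the maximum wind 24 hours beyond the end of that window.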