Data and Method
Recent research has developed the idea of earthquake nowcasting, whose goal is to estimate the current state of seismic hazard using state ("proxy") variables that track the earthquake cycle (Rundle et al., 2016, 2018, 2019, 2021a,b, 2022; Rundle and Donnellan, 2020; Pasari and Mehta, 2018; Pasari, 2019, 2020; Pasari and Sharma, 2020; Luginbuhl et al., 2019, 2020), and a number of authors have now begun to apply these methods in a variety of settings. An approach of this kind is needed because the cycle of stress accumulation and release is not directly observable (Rundle et al., 2021b; Scholz, 2019). These first approaches to nowcasting have been based on the concept of natural time (Varotsos et al., 2001, 2002, 2011, 2013, 2014, 2020a,b; Sarlis et al., 2018).
More specifically, in this work we analyze the result of applying a filter that, when applied to a time series of small earthquakes, reveals the cycle of large-earthquake occurrence and recovery. Details of the process of building, optimizing, and applying the filter are indicated in Figure 1 and discussed elsewhere (Rundle et al., 2022). The Python code used to compute the filter is also available on the ESSOAR site. In this section we sketch the process; details can be found in the cited references.
A critical component of the current approach is that the information is encoded in earthquake clusters, or bursts: series of events closely spaced in time (Rundle et al., 2020; Rundle and Donnellan, 2020). Bursts are temporal clusterings of highly correlated seismicity, typically confined to a small spatial region.
Data. Referring to Figure 1, we begin with the seismicity in a regional box of size 10° latitude by 10° longitude centered on Los Angeles, CA (Figure 1a). The time series of earthquakes in that region since 1970, having magnitudes M > 3.29, is shown in Figure 1b as a series of vertical lines. Also shown as a blue curve is the exponential moving average (EMA) with number of weights N = 36 [1]. Note that the blue curve shows an "inverted" cycle of large earthquakes, which is the primary basis for the nowcast filter. This inverted cycle shows a sudden decrease at the time of a large earthquake, due to the occurrence of the aftershocks.
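To illustrate the smoothing step, the following is a minimal sketch of an exponential moving average with N weights, assuming the conventional smoothing factor \(\alpha = 2/(N+1)\); the variable names and sample counts are illustrative, not taken from the paper's actual code or catalog.

```python
def ema(values, N=36):
    """Exponentially weighted moving average with N weights.

    Assumes the standard smoothing factor alpha = 2 / (N + 1); the
    published code (Rundle et al., 2022) may differ in detail.
    """
    alpha = 2.0 / (N + 1.0)
    out = []
    avg = values[0]  # initialize at the first observation
    for v in values:
        avg = alpha * v + (1.0 - alpha) * avg
        out.append(avg)
    return out

# Hypothetical monthly counts of small earthquakes in the regional box:
monthly_counts = [12, 15, 9, 30, 8, 11, 10]
smoothed = ema(monthly_counts, N=36)
```

With N = 36 the filter weights roughly the past three years of monthly counts, which is why the smoothed curve responds gradually to aftershock sequences.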
In Figure 1c, we show a time series for the mean number \(\mu(t)\) of small earthquakes as a function of time. The mean is computed beginning in 1960, but is shown only since 1970. It can be seen that the mean does not indicate a steady state. Instead, the mean number of events generally increases up to about 1993, after which it shows a cycle similar to that in Figure 1b.
This catalog behavior may be due either to actual tectonic processes, or perhaps to changes in methods of earthquake detection and magnitude assignment in the early 1990s, when the network was fully automated and digital (Hutton et al., 2010). In fact, it is interesting that the temporal trends in Figure 1c seem to somewhat mirror the general historical change in the number of seismic stations in California, as shown in Figure 3 of Hutton et al. (2010).
State Variable \(\Theta(t)\). The data in Figures 1b and 1c are then combined to form the state variable time series \(\Theta(t)\) shown in Figure 1d. The state variable is the EMA of the small earthquakes, adjusted using the current mean number \(\mu(2022)\) of small earthquakes with a constant of proportionality \(\lambda\). The N-value and \(\lambda\)-value are obtained by optimizing the ROC skill.
The adjustment corresponds to an assumption that a minimum number of small earthquakes occurs each month. An important component of this adjustment is the assumption that there is a transition from unstable seismic slip, observable with seismometers, to stable sliding that is observable only with geodetic instruments such as GNSS or InSAR. Figure 1d thus represents an "inverted," adjusted, and EMA-averaged time series of the small earthquakes.
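The text does not give the explicit formula combining the EMA and the mean adjustment; as one plausible reading, a baseline proportional to the current mean is subtracted from the smoothed counts. The sketch below is therefore a hypothetical illustration only, not the published definition from Rundle et al. (2022).

```python
def state_variable(ema_series, current_mean, lam):
    """Hypothetical form of Theta(t): the EMA of small-earthquake counts
    with a baseline lam * mu(2022) removed. The actual combination used
    in the published filter may differ."""
    return [e - lam * current_mean for e in ema_series]

# Illustrative values: smoothed counts, current mean mu(2022), and lambda.
theta = state_variable([10.0, 12.0, 9.0], current_mean=5.0, lam=0.5)
```

In this reading, \(\lambda\) controls how much of the ever-present background rate of small events is treated as baseline rather than signal.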
Receiver Operating Characteristic (ROC). To calculate the EMA N-value, and the contribution of the mean number \(\mu(t)\) of small earthquakes, we construct the temporal Receiver Operating Characteristic (ROC) for a forward time window \(T_{W} = 3\) years beyond a given time t (Rundle et al., 2022). The ROC curve [2] is constructed by establishing a series of increasing thresholds \(T_{H}\) on the state variable \(\Theta(t)\), from low values to high values. We then consider all values of time and apply a series of four classification rules. A review of these methods can be found in Jolliffe and Stephenson (2003).
For each time t: if \(\Theta(t)\geq T_{H}\) and a large earthquake occurs within the next \(T_{W}\) years, we classify the result as a true positive (TP); if \(\Theta(t)\geq T_{H}\) and no large earthquake occurs within the next \(T_{W}\) years, as a false positive (FP); if \(\Theta(t)<T_{H}\) and no large earthquake occurs within the next \(T_{W}\) years, as a true negative (TN); and if \(\Theta(t)<T_{H}\) and a large earthquake does occur within the next \(T_{W}\) years, as a false negative (FN). We then repeat this procedure over all values of the threshold \(T_{H}\).
Having TP, FP, FN, and TN, we then define the true positive rate (TPR), or "hit rate," TP/(TP + FN), and the false positive rate (FPR), or "false alarm rate," FP/(FP + TN). A plot of TPR against FPR defines the ROC curve, which is the red curve in Figure 1e. For future consideration, we also define the Precision, or positive predictive value, TP/(TP + FP): the fraction of positive predictions that turn out to be accurate. These and other quantities are described in [2].
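The threshold sweep and the classification rules above can be sketched as follows; this is a generic ROC construction under the stated definitions, with illustrative names (`theta`, `event_in_window`) standing in for the state variable and for a precomputed flag marking whether a large earthquake occurs within \(T_W\) years of each time step.

```python
def roc_points(theta, event_in_window, thresholds):
    """Return (FPR, TPR) points for each threshold T_H.

    theta:            state-variable values, one per time step
    event_in_window:  True if a large earthquake occurs within T_W years
                      after that time step (assumed precomputed)
    """
    points = []
    for th in thresholds:
        tp = fp = tn = fn = 0
        for value, event in zip(theta, event_in_window):
            if value >= th:       # alarm is "on"
                if event: tp += 1
                else:     fp += 1
            else:                  # alarm is "off"
                if event: fn += 1
                else:     tn += 1
        tpr = tp / (tp + fn) if (tp + fn) else 0.0  # hit rate
        fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false alarm rate
        points.append((fpr, tpr))
    return points
```

Sweeping \(T_H\) from below the minimum of \(\Theta(t)\) to above its maximum traces the full curve from (1, 1) down to (0, 0).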
Supervised Machine Learning. The area under the ROC curve is the Skill, which specifies the ability of the method to discriminate between true and false signals. The diagonal line in Figure 1e is the no-skill line, equivalent to a random predictor. Note that the area under the no-skill line is 0.5.
For a method to have skill, the ROC curve must lie either above or below the diagonal line; correspondingly, the area under the ROC curve ranges up to a maximum of 1.0 or down to a minimum of 0.0. For future reference, we define a skill index SKI, in percent, as:
\(SKI = 100\left|\frac{\text{Skill}}{0.5} - 1.0\right|\) (1)
SKI can then range from 0% (when Skill = 0.5) to 100% (when Skill is either 1.0 or 0.0). In Figure 1e, the no-skill area is indicated by the darker shaded area. The skill of the nowcasting method that we discuss here is indicated by the total shaded area.
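As a concrete check, Eq. (1) is a direct transformation of the ROC area:

```python
def ski(skill):
    """Skill index (Eq. 1): SKI = 100 * |Skill/0.5 - 1|, where Skill is
    the area under the ROC curve. A random predictor (Skill = 0.5) gives
    0%; a perfect or perfectly inverted predictor gives 100%."""
    return 100.0 * abs(skill / 0.5 - 1.0)
```

The absolute value reflects the point made above: a curve consistently below the diagonal is just as informative as one above it, since its predictions can be inverted.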
As discussed in Rundle et al. (2022), we find the optimal value of N for the EMA, and the constant \(\lambda\) for the mean-earthquake \(\mu(t)\) adjustment, by maximizing the skill. This is indicated in Figures 1d and 1e as a feedback between the state-variable curve and the ROC skill calculation. This procedure results in a filter that has been optimized by well-understood, reliable methods. We note that the code is available on the AGU preprint archive ESSOAR (Rundle et al., 2022, supplemental tab).
Shannon Information. Claude Shannon’s famous paper on statistical communication theory (Shannon, 1948) describes a measure of the information content of a message between communicating parties. It is based on the idea of viewing a message consisting of a bit string as a series of intermixed 1’s and 0’s, with an associated entropy of mixing. Examples of the use of these methods can be found, e.g., in Cover and Thomas (1991), and Stone (2015).
The usual interpretation of Shannon information entropy is the number of binary yes/no questions that must be asked in order to determine the information in the message being sent. If more questions are required, the entropy is higher and the information communicated is more surprising. Conversely, if fewer questions are required, the entropy is lower and the information communicated is less surprising.
For a message in which the symbols of an alphabet (here, the 1's and 0's) occur with probability mass function \(p(\omega)\), where \(\omega\in\{0,1\}\), the self-information \(I_{\text{self}}\) associated with a given symbol is:
\(I_{\text{self}} = -\log_{2} p(\omega)\) (2)
The Shannon information of a string of symbols is then given by the expectation of the self-information:
\(I_{S} = -\sum_{\omega} p(\omega)\log_{2} p(\omega)\) (3)
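Eq. (3) can be computed directly from the empirical symbol frequencies of a bit string; a minimal sketch:

```python
import math

def shannon_information(bits):
    """Shannon information (Eq. 3) of a symbol string: the expectation
    of the self-information -log2 p(w) over the empirical probabilities
    of the symbols that occur in the string."""
    n = len(bits)
    info = 0.0
    for symbol in set(bits):
        p = bits.count(symbol) / n  # empirical p(w)
        info -= p * math.log2(p)
    return info
```

An evenly mixed string such as "0101" yields 1 bit per symbol, the maximum for a binary alphabet, while a constant string yields 0 bits: a maximally mixed message is maximally surprising, in the sense of Eq. (2).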