We are developing a new approach to earthquake nowcasting based on science transformers (GC Fox et al., Geohazards, 2022). As explained in the seminal paper by Vaswani et al. (NIPS, 2017), a transformer is a type of deep learning model that learns context by tracking the relationships among the elements of a sequence of data, such as the words in a sentence or the values of a time series. Transformers extend deep learning by adopting a context-sensitive mechanism, "attention", which is used to tag important sequences of data and to identify relationships between those tagged data. Pretrained transformers are the foundational technology that underpins the new AI models ChatGPT (Generative Pretrained Transformers) from openAI.com, and Bard, from Google.com. In our case, we hypothesize that a transformer might be able to learn the sequence of events leading up to a major earthquake. Typically, the data used to train such models number in the billions or more, so these models, when applied to earthquake problems, need data sets of a size that only long numerical earthquake simulations can provide. In this research, we are developing the Earthquake Generative Pretrained Transformer model, "QuakeGPT", in a similar vein. For simulations, we are using catalogs from the physics-based model Virtual Quake, the statistical model ETAS, and a statistical physics model based on invasion percolation. Observed data, which are the data to be anticipated with nowcasting, are taken from the USGS online catalog for California. In this talk, we discuss the architecture of QuakeGPT and report first results. We also report results using other types of simulated seismicity, such as slider block models, to quantify how well a

Wednesday, 13 December 2023 14:45-14:55 2016-West (Level 2, West, Moscone Center)

Nowcasting Earthquakes with QuakeGPT: An AI-Enhanced Earthquak.

John B. Rundle

and 2 more

Earthquake nowcasting has been proposed as a means of tracking the change in large earthquake potential in a seismically active area. The method was developed using observable seismic data, from which probabilities of future large earthquakes can be computed using Receiver Operating Characteristic (ROC) methods. Furthermore, analysis of the Shannon information content of the earthquake catalogs has been used to show that there is information contained in the catalogs, and that it can vary in time. An important question therefore remains: where does the information originate? In this paper, we examine this question using statistical simulations of earthquake catalogs computed with Epidemic Type Aftershock Sequence (ETAS) models. ETAS earthquake simulations are currently in widespread use for a variety of tasks in modeling, analysis and forecasting. After examining several of the standard ETAS models, we propose a version of the ETAS model that conforms to the standard ETAS statistical relations of magnitude-frequency scaling, aftershock scaling, Båth’s law, and the productivity relation, but with an additional property. We modify the model to introduce variable non-Poisson aftershock clustering, in order to test the hypothesis that the information in the catalogs originates from aftershock clustering. We find that significant information in the catalogs arises from the non-Poisson aftershock clustering, implying that the common practice of de-clustering catalogs may remove information that would otherwise be useful in forecasting and nowcasting. We also show that the nowcasting method provides similar results with the ETAS models as it does with observed seismicity.
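To make the standard ETAS ingredients named above concrete, the following is a minimal, time-only sketch of a generic ETAS branching simulation. It is not the modified model proposed in the abstract: it has no spatial component, does not enforce Båth's law, and does not include the variable non-Poisson clustering. All function names and parameter values (`mu`, `k_prod`, `alpha`, `b`, `c`, `p`, the magnitude cap) are illustrative assumptions.

```python
import math
import random

def sample_magnitude(rng, b=1.0, m_min=3.0, m_max=7.0):
    """Gutenberg-Richter magnitude via inverse-transform sampling, capped at m_max."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log10(u) is finite
    return min(m_min - math.log10(u) / b, m_max)

def sample_omori_delay(rng, c=0.01, p=1.2):
    """Waiting time (days) drawn from the normalized Omori-Utsu density (needs p > 1)."""
    u = rng.random()  # u lies in [0, 1)
    return c * ((1.0 - u) ** (-1.0 / (p - 1.0)) - 1.0)

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_etas(t_end=365.0, mu=0.2, k_prod=0.1, alpha=0.8, m_min=3.0, seed=0):
    """Return a time-sorted list of (time_in_days, magnitude) tuples."""
    rng = random.Random(seed)
    m_max = m_min + 4.0  # assumed cap; keeps the Poisson rates below ~160
    # Background events: homogeneous Poisson process with rate mu per day.
    events, t = [], 0.0
    while True:
        t += rng.expovariate(mu)
        if t > t_end:
            break
        events.append((t, sample_magnitude(rng, m_min=m_min, m_max=m_max)))
    # Aftershock cascade: every event, background or triggered, can trigger more.
    pending = list(events)
    while pending:
        t_parent, m_parent = pending.pop()
        # Productivity relation: expected offspring grows exponentially with magnitude.
        rate = k_prod * 10.0 ** (alpha * (m_parent - m_min))
        for _ in range(sample_poisson(rng, rate)):
            t_child = t_parent + sample_omori_delay(rng)
            if t_child <= t_end:
                child = (t_child, sample_magnitude(rng, m_min=m_min, m_max=m_max))
                events.append(child)
                pending.append(child)
    return sorted(events)

catalog = simulate_etas(seed=1)
print(len(catalog), "events; largest magnitude", max(m for _, m in catalog))
```

With these assumed values the branching ratio is below one, so each cascade dies out. A catalog produced this way could then be passed through the same nowcasting analysis as an observed catalog.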

John B. Rundle

and 6 more

Robert A Granat

and 9 more

We present a data-driven approach to clustering or grouping Global Navigation Satellite System (GNSS) stations according to their observed velocities, displacements or other selected characteristics. Clustering GNSS stations has the potential to identify useful scientific information, and is a necessary initial step in other analyses, such as detecting aseismic transient signals (Granat et al., 2013). Desired features of the data can be selected for clustering, including some subset of displacement or velocity components, uncertainty estimates, station location, and other relevant information. Based on those selections, the clustering procedure autonomously groups the GNSS stations according to a selected clustering method. We have implemented this approach as a Python application, allowing us to draw upon the full range of open-source clustering methods available in Python’s scikit-learn package (Pedregosa et al., 2011). The application returns the stations labeled by group as a table and a color-coded KML file. It is designed to work with the GNSS information available from GeoGateway (Heflin et al., 2020; Donnellan et al., 2021) but is easily extensible. We focused on California and western Nevada. The results show partitions that follow faults or geologic boundaries, including for recent large earthquakes and post-seismic motion. The San Andreas fault system is most prominent, reflecting Pacific-North American plate boundary motion. Deformation reflected as class boundaries is distributed north and south of the central California creeping section. For most models the southernmost San Andreas fault connects with the Eastern California Shear Zone (ECSZ) rather than continuing through the San Gorgonio Pass.
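The grouping step described above can be sketched as follows. This is a dependency-free stand-in using a small k-means (Lloyd's algorithm), not the authors' application; in practice one would call a scikit-learn estimator such as `sklearn.cluster.KMeans`. The station IDs and velocity values are made up for illustration, and the helper names are hypothetical.

```python
import math

def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        # Next center: the row farthest from all centers chosen so far.
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical stations: (east, north) velocity in mm/yr; values are invented.
stations = {
    "STA1": (-25.0, 30.0), "STA2": (-24.0, 31.0), "STA3": (-26.0, 29.5),
    "STA4": (-2.0, 1.0), "STA5": (-1.5, 0.5), "STA6": (-2.5, 1.5),
}
features = standardize([list(v) for v in stations.values()])
groups = dict(zip(stations, kmeans(features, k=2)))
for name, group in groups.items():
    print(name, group)
```

Standardizing first keeps any one feature (for example a velocity component with a larger range) from dominating the distance. Additional columns such as uncertainties or station coordinates could be appended to each feature row in the same way.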

John Rundle

and 3 more

The earthquake cycle of stress accumulation and release is associated with the elastic rebound hypothesis proposed by H.F. Reid following the M7.9 San Francisco earthquake of 1906. However, the actual values of the time- and space-dependent tectonic stress cannot be observed in detail at present. In previous research, we have proposed two methods to image the earthquake cycle in California by means of proxy variables. These variables are based on correlations in patterns of small earthquakes that occur nearly continuously in time. One of these is based on the construction of a time series by the unsupervised detection of small earthquake clusters. The other is based on expanding earthquake seismicity in PCA-derived patterns, to construct a weighted correlation time series. The purpose of the present research is to compare these two methods by evaluating their information content using decision thresholds and Receiver Operating Characteristic methods together with Shannon information entropy. Using seismic data from 1940 to present in California, we find that both methods provide nearly equivalent information on the rise and fall of earthquake correlations associated with major earthquakes in the region. We conclude that the resulting time series can be viewed as proxies for the cycle of stress accumulation and release associated with major tectonic activity. The figure shows the PCA patterns of small earthquakes associated with five major M>7 earthquakes in California since 1950.

John B. Rundle

and 1 more

Seismic bursts in Southern California are sequences of small earthquakes strongly clustered in space and time, and include seismic swarms and aftershock sequences. A readily observable property of these events, the radius of gyration, allows us to connect the bursts to the temporal occurrence of the largest M≥7 earthquakes in California since 1984. In the Southern California earthquake catalog, we identify hundreds of these potentially coherent space-time structures in a region defined by a circle of radius 600 km around Los Angeles. We compute the radius of gyration for each cluster, then filter the clusters to identify those bursts with large numbers of events closely clustered in space, which we call “compact” bursts. Our basic assumption is that these compact bursts reflect the dynamics associated with large earthquakes. Once we have filtered the burst catalog, we apply an exponential moving average to construct a time series for the Southern California region. We observe that the radius of gyration of these bursts systematically decreases prior to large earthquakes, in a process that we might term “radial localization.” The radius of gyration then rapidly increases during an aftershock sequence, and a new cycle of “radial localization” then begins. These time series display cycles of recharge and discharge reminiscent of seismic stress accumulation and release in the elastic rebound process. The complex burst dynamics we observe are evidently a property of the region as a whole, rather than being associated with individual faults. This new method allows us to improve earthquake nowcasting in a seismically active region.
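The three steps described above (radius of gyration per burst, filtering for compact bursts, exponential moving average) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes event locations are already projected to local Cartesian coordinates in km, and the filter thresholds, the smoothing span, and the sample bursts are invented.

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of the events from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

def compact_bursts(bursts, min_events, max_radius_km):
    """Keep bursts with many events packed into a small radius (assumed criteria)."""
    return [b for b in bursts
            if len(b) >= min_events and radius_of_gyration(b) <= max_radius_km]

def exponential_moving_average(values, n_steps):
    """EMA with smoothing factor 2 / (n_steps + 1), seeded with the first value."""
    alpha = 2.0 / (n_steps + 1)
    smoothed, ema = [], None
    for v in values:
        ema = v if ema is None else alpha * v + (1.0 - alpha) * ema
        smoothed.append(ema)
    return smoothed

# Invented bursts in time order: each is a list of (x_km, y_km) event locations.
bursts = [
    [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)],
    [(0.0, 0.0), (40.0, 0.0), (0.0, 40.0), (40.0, 40.0)],  # too spread out
    [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)],
]
kept = compact_bursts(bursts, min_events=4, max_radius_km=5.0)
series = exponential_moving_average([radius_of_gyration(b) for b in kept], n_steps=3)
print(series)
```

The resulting series is the smoothed radius of gyration of successive compact bursts, which is the quantity the abstract describes as decreasing before large earthquakes.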

John B. Rundle

and 4 more

We propose a new machine learning-based method for nowcasting earthquakes to image the time-dependent earthquake cycle. The result is a timeseries that may correspond to the process of stress accumulation and release. The timeseries is constructed by using Principal Component Analysis of regional seismicity. The patterns are found as eigenvectors of the cross-correlation matrix of a collection of seismicity timeseries in a coarse-grained regional spatial grid (pattern recognition via unsupervised machine learning). The eigenvalues of this matrix represent the relative importance of the various eigenpatterns. Using the eigenvectors and eigenvalues, we then compute the weighted correlation timeseries (WCT) of the regional seismicity. This timeseries has the property that the weighted correlation generally decreases prior to major earthquakes in the region, and increases suddenly just after a major earthquake occurs. As in a previous paper (Rundle and Donnellan, 2020), we find that this method produces a nowcasting timeseries that resembles the hypothesized regional stress accumulation and release process characterizing the earthquake cycle. We then address the problem of whether the timeseries contains information regarding future large earthquakes. For this we compute a Receiver Operating Characteristic and determine the decision thresholds for several future time periods of interest (optimization via supervised machine learning). We find that signals can be detected that can be used to characterize the information content of the timeseries. These signals may be useful in assessing present and near-future seismic hazard.
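The ROC step described above can be illustrated with a short sketch. It assumes a nowcast score oriented so that larger values mean higher hazard (for a series that decreases before large earthquakes, its negative could serve as the score), labels each time step by whether a large event occurs within a chosen future window, and picks a decision threshold by Youden's J statistic. That criterion is one common choice and is an assumption here, not necessarily the one used by the authors; all names and sample values are illustrative.

```python
def label_future_events(event_flags, horizon):
    """labels[t] = 1 if a large event occurs in the next `horizon` steps after t."""
    n = len(event_flags)
    return [int(any(event_flags[t + 1:t + 1 + horizon])) for t in range(n)]

def roc_points(scores, labels):
    """Return (threshold, false_positive_rate, true_positive_rate) triples."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        # An alarm is declared wherever the score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve, starting from the origin."""
    xs = [0.0] + [fpr for _, fpr, _ in points]
    ys = [0.0] + [tpr for _, _, tpr in points]
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
               for i in range(len(xs) - 1))

def best_threshold(points):
    """Threshold maximizing Youden's J = TPR - FPR (an assumed criterion)."""
    return max(points, key=lambda p: p[2] - p[1])[0]

# Invented example: four time steps, one large event at step 2, 2-step horizon.
event_flags = [0, 0, 1, 0]
labels = label_future_events(event_flags, horizon=2)
scores = [0.9, 0.8, 0.3, 0.1]
points = roc_points(scores, labels)
print(labels, area_under_curve(points), best_threshold(points))
```

Repeating the labeling with different `horizon` values gives one ROC curve and one decision threshold per future time period of interest. An area near 0.5 would indicate a score with no skill, while values closer to 1 indicate that the score separates pre-event intervals from the rest.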