John B. Rundle - Authorea

John B. Rundle

Professor

University of California, Davis

Member of: ESS Open Archive AGU Fall Meeting 2022

Public Documents 12

Nowcasting Earthquakes with QuakeGPT An AI-Enhanced Earthquake Generative Pretrained...

John B. Rundle

November 18, 2023

We are developing a new approach to earthquake nowcasting based on science transformers (GC Fox et al., Geohazards, 2022). As explained in the seminal paper by Vaswani et al. (NIPS, 2017), a transformer is a type of deep learning model that learns the context of a set of time series values by tracking the relationships in a sequence of data, such as the words in a sentence. Transformers extend deep learning by adopting a context-sensitive protocol, "attention", which is used to tag important sequences of data and to identify relationships between those tagged data. Pretrained transformers are the foundational technology that underpins the new AI models ChatGPT (Generative Pretrained Transformers) from openAI.com, and Bard, from Google.com. In our case, we hypothesize that a transformer might be able to learn the sequence of events leading up to a major earthquake. Typically, the data used to train such a model is in the billions or larger, so these models, when applied to earthquake problems, need data sets of a size that only long numerical earthquake simulations can provide. In this research, we are developing the Earthquake Generative Pretrained Transformer model, "QuakeGPT", in a similar vein. For simulations, we are using simulation catalogs from the physics-based model Virtual Quake, the statistical model ETAS, and a statistical physics model based on invasion percolation. Observed data, which is the data to anticipate with nowcasting, is taken from the USGS online catalog for California. In this talk, we discuss the architecture of QuakeGPT and report first results. We also report results using other types of simulated seismicity such as slider block models, to quantify how well a ...

Wednesday, 13 December 2023, 14:45-14:55, 2016-West (Level 2, West, Moscone Center)
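The attention step mentioned in the abstract above can be illustrated with a minimal sketch of scaled dot-product attention, the form introduced by Vaswani et al. This is not the QuakeGPT architecture, which the abstract does not specify; the function names and the toy sequence are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


# Hypothetical toy sequence: three time steps with two features each.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention(sequence, sequence, sequence)
```

Here the sequence attends to itself (self-attention), so each row of `context` is a mixture of all time steps, weighted by how strongly that step relates to the others.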

A Data-Driven Approach to Deformation Forecasting: Machine Learning on InSAR Data

Joe Yazbeck

and 1 more

December 27, 2023

Anthropogenic activities such as fluid injection, fluid extraction, mining, and hydraulic fracturing can all cause induced seismicity which can in turn result in land subsidence. This latter phenomenon is devastating to local infrastructure as well as underlying aquifers. It is for this reason that monitoring and predicting land deformation is of utmost importance. We relied on Interferometric Synthetic Aperture Radar (InSAR) images captured by Sentinel-1 to monitor deformation in the line-of-sight of the satellite. The Geysers geothermal field, where injection plays a direct role in induced seismicity, was used as the area of study and a deformation time series was built using LiCSBAS [1]. Two machine learning models (model A and model B) that included Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) layers were built to predict future deformation maps. The only difference between the models was the incorporation of geothermal injection and production data in model B. While both models outperformed a baseline linear model, it was model B that performed the best based on a mean squared error metric.

Nowcasting ETAS Earthquakes: Information Entropy of Earthquake Catalogs

John B. Rundle

and 2 more

October 25, 2023

Earthquake nowcasting has been proposed as a means of tracking the change in large earthquake potential in a seismically active area. The method was developed using observable seismic data, in which probabilities of future large earthquakes can be computed using Receiver Operating Characteristic (ROC) methods. Furthermore, analysis of the Shannon information content of the earthquake catalogs has been used to show that there is information contained in the catalogs, and that it can vary in time. So an important question remains: where does the information originate? In this paper, we examine this question using statistical simulations of earthquake catalogs computed using Epidemic Type Aftershock Sequence (ETAS) simulations. ETAS earthquake simulations are currently in widespread use for a variety of tasks in modeling, analysis and forecasting. After examining several of the standard ETAS models, we propose a version of the ETAS model that conforms to the standard ETAS statistical relations of magnitude-frequency scaling, aftershock scaling, Båth’s law, and the productivity relation, but with an additional property. We modify the model to introduce variable non-Poisson aftershock clustering, inasmuch as we test the hypothesis that the information in the catalogs originates from aftershock clustering. We find that significant information in the catalogs arises from the non-Poisson aftershock clustering, implying that the common practice of de-clustering catalogs may remove information that would otherwise be useful in forecasting and nowcasting. We also show that the nowcasting method provides similar results with the ETAS models as it does with observed seismicity.
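As a rough illustration of the Shannon information idea referred to above, the sketch below computes the entropy, in bits, of a discrete probability distribution. It is the generic textbook definition, not the specific information measure applied to the catalogs in the paper; the function name and the example distributions are assumptions.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Terms with zero probability contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


# Hypothetical distributions over four outcomes.
uniform = [0.25, 0.25, 0.25, 0.25]   # least predictable
peaked = [0.85, 0.05, 0.05, 0.05]    # mostly one outcome

entropy_uniform = shannon_entropy(uniform)
entropy_peaked = shannon_entropy(peaked)
```

The uniform case has the highest entropy for four outcomes, while the peaked case has lower entropy, meaning its outcomes are more predictable.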

Optimizing Earthquake Nowcasting with Machine Learning: The Role of Strain Hardening...

John B. Rundle

and 6 more

August 22, 2022

Nowcasting is a term originating from economics, finance and meteorology. It refers to the process of determining the uncertain state of the economy, markets or the weather at the current time by indirect means. In this paper we describe a simple 2-parameter data analysis that reveals hidden order in otherwise seemingly chaotic earthquake seismicity. One of these parameters relates to a mechanism of seismic quiescence arising from the physics of strain-hardening of the crust prior to major events. We observe an earthquake cycle associated with major earthquakes in California, similar to what has long been postulated. An estimate of the earthquake hazard revealed by this state variable timeseries can be optimized by the use of machine learning in the form of the Receiver Operating Characteristic skill score. The ROC skill is used here as a loss function in a supervised learning mode. Our analysis is conducted in the region of 5° x 5° in latitude-longitude centered on Los Angeles, a region which we used in previous papers to build similar timeseries using more involved methods (Rundle and Donnellan, 2020; Rundle et al., 2021). Here we show that not only does the state variable timeseries have forecast skill, the associated spatial probability densities have skill as well. In addition, use of the standard ROC and Precision (PPV) metrics allows probabilities of current earthquake hazard to be defined in a simple, straightforward and rigorous way.
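A Receiver Operating Characteristic analysis of a state variable timeseries can be sketched as follows. This is a generic construction under stated assumptions, not the paper's implementation: the skill is taken here as the area under the ROC curve, and the function names, the toy timeseries and the labels are all hypothetical.

```python
def roc_curve(state, labels):
    """Return (false-positive rate, true-positive rate) points, one per threshold.

    state  : state-variable value at each time step
    labels : 1 if a large earthquake follows within the forecast window, else 0
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Sweep the decision threshold from the highest state value downward.
    for threshold in sorted(set(state), reverse=True):
        tp = sum(1 for s, y in zip(state, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(state, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def roc_area(points):
    """Area under the ROC curve by the trapezoid rule (0.5 means no skill)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: four time steps, two followed by a large event.
state = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
skill = roc_area(roc_curve(state, labels))
```

An area of 0.5 corresponds to a random forecast, so a tunable parameter of the state variable could in principle be chosen to maximize this area.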

Does the Catalog of California Earthquakes, with Aftershocks Included, Contain Inform...

John B. Rundle

and 4 more

August 31, 2022

Clustering Analysis Methods for GNSS Observations: A Data-Driven Approach to Identify...

Robert A Granat

and 9 more

February 03, 2021

We present a data-driven approach to clustering or grouping Global Navigation Satellite System (GNSS) stations according to their observed velocities, displacements or other selected characteristics. Clustering GNSS stations has the potential for identifying useful scientific information, and is a necessary initial step in other analyses, such as detecting aseismic transient signals (Granat et al., 2013). Desired features of the data can be selected for clustering, including some subset of displacement or velocity components, uncertainty estimates, station location, and other relevant information. Based on those selections, the clustering procedure autonomously groups the GNSS stations according to a selected clustering method. We have implemented this approach as a Python application, allowing us to draw upon the full range of open source clustering methods available in Python’s scikit-learn package (Pedregosa et al., 2011). The application returns the stations labeled by group as a table and a color-coded KML file, and is designed to work with the GNSS information available from GeoGateway (Heflin et al., 2020; Donnellan et al., 2021), but is easily extensible. We focused on California and western Nevada. The results show partitions that follow faults or geologic boundaries, including for recent large earthquakes and post-seismic motion. The San Andreas fault system is most prominent, reflecting Pacific-North American plate boundary motion. Deformation reflected as class boundaries is distributed north and south of the central California creeping section. For most models the southernmost San Andreas fault connects with the Eastern California Shear Zone (ECSZ) rather than continuing through the San Gorgonio Pass.

Constrained Invasion Percolation Model: Growth via Leath Bursts and the origin of Sei...

John Rundle

and 3 more

February 11, 2020

We analyze a new model for growing networks, the constrained Leath invasion percolation (CLIP) model. Cluster dynamics are characterized by bursts in space and time. The model quantitatively reproduces the observed frequency-magnitude scaling of earthquakes in the limit that the occupation probability approaches the critical bond percolation probability in d=2. The model may have application to other systems characterized by burst dynamics.

Nowcasting Earthquakes by Visualizing the Earthquake Cycle with Machine Learning: A Co...

John Rundle

and 3 more

August 22, 2022

The earthquake cycle of stress accumulation and release is associated with the elastic rebound hypothesis proposed by H.F. Reid following the M7.9 San Francisco earthquake of 1906. However, observing details of the actual values of time- and space-dependent tectonic stress is not possible at the present time. In previous research, we have proposed two methods to image the earthquake cycle in California by means of proxy variables. These variables are based on correlations in patterns of small earthquakes that occur nearly continuously in time. One of these is based on the construction of a time series by the unsupervised detection of small earthquake clusters. The other is based on expanding earthquake seismicity in PCA-derived patterns, to construct a weighted correlation time series. The purpose of the present research is to compare these two methods by evaluating their information content using decision thresholds and Receiver Operating Characteristic methods together with Shannon information entropy. Using seismic data from 1940 to present in California, we find that both methods provide nearly equivalent information on the rise and fall of earthquake correlations associated with major earthquakes in the region. We conclude that the resulting time series can be viewed as proxies for the cycle of stress accumulation and release associated with major tectonic activity. The figure shows the PCA patterns of small earthquakes associated with 5 major M>7 earthquakes in California since 1950.

Nowcasting Earthquakes in Southern California with Machine Learning: Bursts, Swarms an...

John B. Rundle

and 1 more

January 18, 2020

Seismic bursts in Southern California are sequences of small earthquakes strongly clustered in space and time, and include seismic swarms and aftershock sequences. A readily observable property of these events, the radius of gyration, allows us to connect the bursts to the temporal occurrence of the largest M≥7 earthquakes in California since 1984. In the Southern California earthquake catalog, we identify hundreds of these potentially coherent space-time structures in a region defined by a circle of radius 600 km around Los Angeles. We compute the radius of gyration for each cluster, then filter the clusters to identify those bursts with large numbers of events closely clustered in space, which we call “compact” bursts. Our basic assumption is that these compact bursts reflect the dynamics associated with large earthquakes. Once we have filtered the burst catalog, we apply an exponential moving average to construct a time series for the Southern California region. We observe that the radius of gyration of these bursts systematically decreases prior to large earthquakes, in a process that we might term “radial localization.” The radius of gyration then rapidly increases during an aftershock sequence, and a new cycle of “radial localization” then begins. These time series display cycles of recharge and discharge reminiscent of seismic stress accumulation and release in the elastic rebound process. The complex burst dynamics we observe are evidently a property of the region as a whole, rather than being associated with individual faults. This new method allows us to improve earthquake nowcasting in a seismically active region.
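The two computations named above, a radius of gyration per burst and an exponential moving average over the resulting series, can be sketched as below. This is an illustrative version only: it assumes event locations already projected to flat (x, y) coordinates in km, takes the radius of gyration as the root-mean-square distance from the burst centroid, and uses the common smoothing factor 2 / (span + 1). The function names and sample values are hypothetical, and the paper's filtering of "compact" bursts is not reproduced.

```python
import math


def radius_of_gyration(points):
    """RMS distance of events from their centroid; points are (x, y) in km."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def exponential_moving_average(values, span):
    """EMA with smoothing factor 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical bursts, each a list of event locations in km.
bursts = [
    [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)],
    [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)],
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
]
radii = [radius_of_gyration(b) for b in bursts]
series = exponential_moving_average(radii, span=3)
```

In this toy sequence the bursts become progressively more compact, so both `radii` and the smoothed `series` decrease over time.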

Tsunami Squares Implementation Changes to Improve Wave Resolution and Accuracy

David Grzan

and 4 more

October 20, 2021

Tsunami Squares is a computationally lightweight tsunami and inundation simulator which utilizes a unique cellular automata technique. We make modifications to the underlying algorithm which result in increased accuracy and enhanced waveform resolution. These improvements leave Tsunami Squares well suited for machine learning applications where large pre-computed tsunami simulation databases are required. Previous implementations relied heavily on a smoothing algorithm which acts as a moving average applied to the water surface heights and velocities to eliminate anomalies at every time step. Although this allowed the simulation to function properly, it brought several unwanted effects such as reduced wave detail and lowered energy. A solution is found by shifting the location at which the water surface gradient is calculated, reducing the number of anomalies in the simulation, and thus lowering the amount of smoothing needed by a factor of $\sim 10$. Also introduced is a new method to conserve energy locally, compared to previous methods which reference a simulation-wide energy calculation. Comparison tests were made using the 2011 Tohoku tsunami along with the 2010 Maule tsunami to demonstrate the improvements.

Simultaneous Inversion of Multiple Faults' Parameters From InSAR Data Using a Genetic...

Cameron Saylor

and 2 more

August 28, 2020

Interferometric synthetic-aperture radar (InSAR) interferograms contain valuable information about the fault systems hidden beneath the surface of the Earth. In a new approach, we aim to fit InSAR ground deformation data using a volumetric distribution of multiple seismic point sources whose parameters are found by a genetic algorithm. The resulting source distribution could provide another useful tool in solving the difficult problem of accurately mapping earthquake faults. To test the algorithm, we first apply it to synthetic data, followed by applications to an ALOS-2 InSAR interferogram. We report first results and discuss advantages and disadvantages of this approach.

Nowcasting Earthquakes: Imaging the Earthquake Cycle in California with Machine Learn...

John B. Rundle

and 4 more

March 27, 2021

We propose a new machine learning-based method for nowcasting earthquakes to image the time-dependent earthquake cycle. The result is a timeseries which may correspond to the process of stress accumulation and release. The timeseries is constructed by using Principal Component Analysis of regional seismicity. The patterns are found as eigenvectors of the cross-correlation matrix of a collection of seismicity timeseries in a coarse grained regional spatial grid (pattern recognition via unsupervised machine learning). The eigenvalues of this matrix represent the relative importance of the various eigenpatterns. Using the eigenvectors and eigenvalues, we then compute the weighted correlation timeseries (WCT) of the regional seismicity. This timeseries has the property that the weighted correlation generally decreases prior to major earthquakes in the region, and increases suddenly just after a major earthquake occurs. As in a previous paper (Rundle and Donnellan, 2020), we find that this method produces a nowcasting timeseries that resembles the hypothesized regional stress accumulation and release process characterizing the earthquake cycle. We then address the problem of whether the timeseries contains information regarding future large earthquakes. For this we compute a Receiver Operating Characteristic and determine the decision thresholds for several future time periods of interest (optimization via supervised machine learning). We find that signals can be detected that can be used to characterize the information content of the timeseries. These signals may be useful in assessing present and near-future seismic hazard.