Building Foundational and Surrogate Models for Experiment Steering at LCLS
(LCLS/Oak Ridge)
Experiments at LCLS enable the direct observation and characterization of materials and molecular assemblies critical for energy research. Ongoing facility enhancements are exponentially increasing the rate and volume of data collected, opening new frontiers of scientific research but also necessitating advances in computing, algorithms, and analysis to exploit these data effectively. As data rates surge, analysis methods must scale to mine continuously streamed data: selecting interesting events, rejecting poor data, and adapting to changing experimental conditions. Real-time data analysis can offer immediate feedback to users or directly drive instrument controls for self-driving experiments. Autonomous experiment steering, in turn, is poised to maximize the efficiency and quality of data collection by connecting the user's intent in collecting the data, the results of data analysis, and algorithms capable of driving intelligent data collection and guiding the instrument to optimal operating regimes.

As part of the Integrated Research Infrastructure vision, this project is developing AI models that will ultimately enable autonomous steering of LCLS experiments while leveraging Leadership Computing power that is not directly available at the data source. The AI models share a common end-to-end self-supervised learning approach and an auto-encoding strategy for transforming raw data into actionable information, and fall into one of two categories: “foundational models,” which will provide generalist tools for working with LCLS-produced images, and “surrogate models,” which will provide highly tailored tools for specific types of LCLS experiments. The project will also address a critical challenge to the development of such models, namely the data loading bottleneck, by upgrading the libraries currently in use at LCLS, historically CPU-only, so that they can efficiently feed deep learning code running on massively parallel GPU infrastructures.

Ultimately, these capabilities will significantly enhance experimental output and enable groundbreaking scientific exploration. Rapid data analysis and autonomous experiment steering will shed light on some of the most challenging scientific research areas facing the nation, including structural biology, materials science, quantum materials, environmental science, nanoscience, nanotechnology, additive manufacturing, and condensed matter physics.
Research Objectives and Milestones:
The overarching objective of this project is to develop AI models that transform raw diffraction images from LCLS into interpretable features, which can then be used downstream for applications ranging from data visualization and filtering to molecular reconstruction and experiment steering. The requirement common to these applications is time sensitivity: each must run live or shortly after the image is generated at LCLS. Another key requirement for successful autonomous experiment steering, following the Integrated Research Infrastructure vision, is to leverage the seamless integration of computing and data made possible by dedicated workflows. While this project does not address workflow development per se, it is designed around the premise that data and model flows across facilities will ultimately be seamless, which informs how the models developed in this project will eventually be used in production (see Goal 4 - Putting It Together). A schematic sketch of the encode-then-decide pattern shared by all the models follows.
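As a concrete illustration of this shared pattern, the minimal PyTorch sketch below shows a convolutional encoder compressing a raw detector frame into a latent vector that a lightweight downstream head consumes for live data filtering. The module names, layer sizes, and the hit/miss veto head are illustrative assumptions, not project code.

```python
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Convolutional encoder: raw diffraction image -> latent feature vector."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 256 -> 128
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 128 -> 64
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)

encoder = ImageEncoder()
image = torch.randn(8, 1, 256, 256)   # batch of simulated detector frames
z = encoder(image)                    # latent features usable downstream

# Example downstream use (hypothetical): a hit/miss score for live filtering.
veto_head = nn.Sequential(nn.Linear(128, 1), nn.Sigmoid())
keep_probability = veto_head(z)       # scores near 0 -> reject the frame
```

The same latent could equally feed a visualization embedding or a steering policy; only the head changes, which is what makes the pattern reusable across applications.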
Goal 1 - Solving the data loading challenge
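Goal 1 is motivated by the CPU-only history of LCLS's data-access libraries noted above. As one generic illustration, not the project's actual plan, the PyTorch sketch below overlaps CPU-side event reading with GPU compute: DataLoader workers parallelize decoding, and pinned memory enables asynchronous host-to-device copies. `read_events` is a hypothetical stand-in for a facility reader, not an LCLS API.

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

def read_events(num_events: int = 1000):
    """Placeholder for a CPU-side event reader (stand-in for a facility library)."""
    for _ in range(num_events):
        yield torch.randn(1, 256, 256)   # one detector frame

class EventStream(IterableDataset):
    # NOTE: with multiple workers, a real reader must shard events across
    # workers (see torch.utils.data.get_worker_info()); omitted here for brevity.
    def __iter__(self):
        return read_events()

loader = DataLoader(
    EventStream(),
    batch_size=64,
    num_workers=4,       # parallel CPU-side reading/decoding
    pin_memory=True,     # enables asynchronous host-to-device copies
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for batch in loader:
    batch = batch.to(device, non_blocking=True)  # copy overlaps with GPU compute
    # ... forward/backward pass of the model would go here ...
```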
Goal 2 - Developing surrogate models for specific experiment types
- Single Particle Imaging (X-RAI)
(Team: Shenoy, Levy, Chen, Turner, Wetzstein)
Heterogeneous/time-resolved 3D reconstruction using an implicit neural representation of the molecular volume. This work directly continues similar work in the field of cryoEM. Amortizing the pose estimation step, as compared to existing orientation-matching strategies, will ensure the method scales to massive datasets while providing live feedback on reconstruction quality during data collection; a schematic sketch of both ingredients follows.
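The sketch below illustrates the two ingredients named above under strong simplifying assumptions: a CNN that amortizes pose estimation (one network evaluation per image instead of a per-image orientation search) and a coordinate MLP serving as the implicit neural representation of the volume. All shapes and architectures are placeholders, and the actual diffraction physics (e.g., the Fourier-slice relationship used in cryoEM-style reconstruction) is omitted for brevity.

```python
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Amortized inference: predicts a rotation (quaternion) from one image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 4),
        )

    def forward(self, image):
        q = self.backbone(image)
        return q / q.norm(dim=-1, keepdim=True)   # unit quaternion

class VolumeINR(nn.Module):
    """Implicit neural representation: 3D coordinate -> scattering density."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):                        # (..., 3) coordinates
        return self.mlp(xyz)

def quaternion_to_matrix(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix, batched."""
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
    ], dim=-1).reshape(*q.shape[:-1], 3, 3)

# Training step (schematic): rotate a 2D slice of coordinates by the predicted
# pose, query the INR there, and compare against the observed image.
encoder, volume = PoseEncoder(), VolumeINR()
image = torch.randn(8, 1, 64, 64)
R = quaternion_to_matrix(encoder(image))                  # (8, 3, 3)
grid = torch.stack(torch.meshgrid(
    torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing="ij"
), dim=-1)
coords = torch.cat([grid, torch.zeros(64, 64, 1)], -1)    # central slice, z = 0
coords = torch.einsum("bij,hwj->bhwi", R, coords)         # apply predicted pose
prediction = volume(coords).squeeze(-1)                   # (8, 64, 64)
loss = ((prediction - image.squeeze(1)) ** 2).mean()      # self-supervised loss
```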
- Serial Femtosecond Crystallography (SFX)
(Team: Mendez, Dalton, Brewster, Sauter)
Refinement beyond integration using an implicit neural representation of structure factors and advanced simulation of diffraction physics. This work extends ongoing work [4] and combines ideas from similar work in the field and in cryoEM. Through careful modeling of the physics at play in the diffraction process, this method unlocks access to minute details about proteins, while the end-to-end differentiable modeling approach allows key experimental parameters, such as the detector geometry and calibration, to be refined concurrently, further improving the quality of the results; the sketch below illustrates the general pattern.
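The following schematic shows that pattern in PyTorch, with a deliberately toy forward model standing in for the real diffraction physics: an implicit network represents structure factor amplitudes over reciprocal space, and a learnable detector shift is refined in the same optimizer step. Every name and formula here is an illustrative assumption, not the project's physics model.

```python
import torch
import torch.nn as nn

class StructureFactorINR(nn.Module):
    """Continuous representation: fractional Miller index -> |F(hkl)|."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # amplitudes are non-negative
        )

    def forward(self, hkl):
        return self.mlp(hkl).squeeze(-1)

model = StructureFactorINR()
detector_shift = torch.zeros(2, requires_grad=True)   # learnable geometry term
optimizer = torch.optim.Adam(
    [{"params": model.parameters()}, {"params": [detector_shift], "lr": 1e-3}],
    lr=1e-4,
)

hkl = torch.randn(1024, 3)    # reciprocal-space samples hitting the detector
observed = torch.rand(1024)   # measured pixel intensities (placeholder data)

for _ in range(100):
    optimizer.zero_grad()
    # Toy geometry correction: a detector shift perturbs where each reflection
    # lands, modeled here as a small offset of the queried coordinate.
    corrected = hkl + torch.cat([detector_shift, detector_shift.new_zeros(1)])
    intensity = model(corrected) ** 2   # I ~ |F|^2 under kinematic scattering
    loss = ((intensity - observed) ** 2).mean()
    loss.backward()
    optimizer.step()    # structure factors and geometry refined concurrently
```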
- X-ray Photon Fluctuation Spectroscopy (XPFS)
(Team: Chen, Turner)
XPFS is a unique and powerful technique for probing ultrafast dynamics in condensed materials: the dynamical structure factors it measures provide information about the spin Hamiltonian of the material. In previous work [9], a surrogate of the forward model (from Hamiltonian parameters to structure factors) was built from expensive first-principles calculations. This approach enabled rapid comparison against the current state of the experiment, and even suggestions for setting the next value of one experimental parameter, namely the delay time between two probes. Going forward, this Type 2 surrogate will be replaced with a more expressive implicit neural representation and supplemented with a Type 3 surrogate, replacing the current Bayesian optimization process with an auto-encoder approach that may more readily enable exploration of a larger parameter set for experiment steering; a minimal sketch of such a differentiable surrogate follows.
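In the sketch below, the number of Hamiltonian parameters, the q grid, and all names are placeholder assumptions. The point it illustrates is that once the forward map is a differentiable network, Hamiltonian parameters can be fit to a measurement by gradient descent, which is the same property an auto-encoder-based steering approach could build on.

```python
import torch
import torch.nn as nn

class XPFSSurrogate(nn.Module):
    """(Hamiltonian parameters, delay time) -> S(q, t) on a fixed q grid."""
    def __init__(self, n_params: int = 3, n_q: int = 64, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_params + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_q),
        )

    def forward(self, theta, delay):
        return self.mlp(torch.cat([theta, delay], dim=-1))

# The surrogate would be trained offline against first-principles calculations;
# here it is used as-is to invert a (placeholder) measurement.
surrogate = XPFSSurrogate()
measured = torch.rand(1, 64)                   # placeholder structure factor
theta = torch.zeros(1, 3, requires_grad=True)  # unknown Hamiltonian parameters
delay = torch.full((1, 1), 0.5)                # current pump-probe delay time
opt = torch.optim.Adam([theta], lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    loss = ((surrogate(theta, delay) - measured) ** 2).mean()
    loss.backward()
    opt.step()   # theta converges toward parameters explaining the measurement
```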
- Ultrafast X-ray Scattering of Quantum Materials
(Team: Nguyen)
Goal 3 - Building the MAE-LCLS foundational model
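Assuming “MAE” refers to masked autoencoding, consistent with the project's self-supervised auto-encoding strategy, the compact sketch below shows the core pretraining step on detector images: mask most patches, encode only the visible ones, and reconstruct the masked ones. The dimensions and the simplified single-vector decoder are illustrative only.

```python
import torch
import torch.nn as nn

patch, dim, mask_ratio = 16, 192, 0.75
n_patches = (256 // patch) ** 2                      # 16x16 grid of patches

patch_embed = nn.Linear(patch * patch, dim)          # patchify + project
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=4)
decoder = nn.Linear(dim, patch * patch)              # reconstruction head

image = torch.randn(8, 1, 256, 256)
patches = image.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.reshape(8, n_patches, patch * patch)

# Randomly keep 25% of patches; the rest must be reconstructed.
n_keep = int(n_patches * (1 - mask_ratio))
idx = torch.rand(8, n_patches).argsort(dim=1)
keep, masked = idx[:, :n_keep], idx[:, n_keep:]

visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, patch * patch))
latent = encoder(patch_embed(visible))               # encode visible patches only

# Reconstruct the *masked* patches from the pooled latent (a simplification;
# a full MAE decoder attends over mask tokens at their original positions).
target = torch.gather(patches, 1, masked.unsqueeze(-1).expand(-1, -1, patch * patch))
prediction = decoder(latent.mean(dim=1, keepdim=True)).expand_as(target)
loss = ((prediction - target) ** 2).mean()           # loss on masked patches only
```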
Goal 4 - Putting It Together - Demonstration