AISDC Renewal (2023 - 2026)
(LCLS/Argonne)
This proposal will build on previous work by taking the next critical steps needed to render the information generated by ML routines, from sensors to data centers, fully actionable. Importantly, these advances will meet the challenge of high data volumes by closing the automated feedback loop during active experiments. We will pursue five specific avenues of research to achieve this goal. On the infrastructure side, we will expand our ability to generalize and deploy our AI inference capabilities within specialized hardware at the edge, and our rapid ML re-training framework will be augmented to integrate automated steering capabilities. On the experiment side, we will optimize the information extracted from attosecond science experiments so that it can be used directly for control, and we will deploy new real-time algorithms at HEDM experiments to reliably handle new materials. Further, we will leverage the previously developed framework for high-bandwidth, low-latency data processing to chain calibration, assembly, and inference tasks, overcoming the bottleneck to providing near real-time feedback. This will benefit imaging experiments on tiled detectors with high repetition rates, with impact in heterogeneous catalysis and biochemical science. The infrastructure- and experiment-driven aims are highly synergistic: the actionable feedback required to steer diverse experiments will motivate the development of more adaptable ML frameworks, and vice versa. The edge-to-HPC pipeline we build will provide an integrated approach to data analytics, advances that will be instrumental for handling the massive data throughput generated by LCLS-II and APS-U.
Year 1:
MCTS Detectors: Develop ML for spectral analysis of TimeTool (D). Featurization of electron time-of-flight spectra in FPGA (D). Demonstrate 100 kfps x-ray/optical delay corrective action output (M). Demonstrate use of FPGA to ingest streaming camera for TimeTool (M).
SNL: Generalize Neural-Net Library (D). Calibration in FPGA implementation (D).
ML for Tiled Detectors: Benchmark PeakNet speed/accuracy performance from no to full calibration (pedestal, gain, pixel mask) (M). Evaluate PeakNet performance as a function of tiled image size; develop veto and data aggregation strategy (M).
HEDM: Identify datasets for model training (D). Develop methodologies (D). Implement automated hyper-parameter optimization (M). Develop automated model searching and selection (M).
Framework for Rapid Model Retraining: Implement automated hyper-parameter optimization (M). Develop automated model searching and selection (M). Deploy rapid (re)training framework at SLAC (D). Demonstrate the advantages of automated hyper-parameter tuning by reducing the training time of BraggNN (D).
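As a minimal, self-contained illustration of the automated hyper-parameter optimization milestone, the sketch below performs a random search over a toy objective. Here `train_and_score` is a hypothetical stand-in for a real training run (e.g., of BraggNN), not the project's actual code, and the search space is purely illustrative.

```python
import random

def train_and_score(lr, batch_size):
    # Toy objective standing in for one training run; it peaks near
    # lr = 1e-3 and batch_size = 64 (illustrative values only).
    return -((lr - 1e-3) ** 2) - ((batch_size - 64) / 64) ** 2

def random_search(n_trials=50, seed=0):
    """Minimal random-search hyper-parameter optimization."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-5, -1),            # log-uniform learning rate
            "batch_size": rng.choice([16, 32, 64, 128]),  # discrete batch sizes
        }
        score = train_and_score(**params)
        if best is None or score > best[0]:
            best = (score, params)  # keep the best-scoring configuration
    return best

score, params = random_search()
```

In practice a dedicated library with pruning and adaptive sampling would replace the random loop, but the control flow of trial, score, and keep-best is the same.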
Year 2:
MCTS Detectors: Demonstrate edge inference in CookieBox voltage settings (M). Demonstrate carrier-envelope phase tagging for 100 kHz high-power laser with 2D TimeTool (M). Spectrum and polarization reconstruction with sub-ms latency with CookieBox (D).
SNL: Implement spectrum and polarization reconstruction in FPGA (D). Evaluate performance of calibration algorithms in FPGA (M). Begin implementation of PeakNet in FPGA (M). Begin implementation of SpeckleNN in FPGA (M).
ML for Tiled Detectors: Benchmark SpeckleNN speed/accuracy performance from no to full calibration (pedestal, gain, pixel mask) (M). Evaluate SpeckleNN performance as a function of tiled image size; develop veto and data aggregation strategy (M).
HEDM: Demonstrate analysis of overlapping peaks (M). Demonstrate analysis of adaptive patch generation (M). Peak-finding model trained on APS HEDM data (D). Peak-finding model trained with simulated HEDM data using EBSD as ground truth (D).
Framework for Rapid Model Retraining: Optimize training for advanced DCAI systems (M). Add streaming capability to training and inferencing pipelines (M). Demonstrate the automated model search and selection feature using our rapid (re)training framework (D). Demonstrate the integration of advanced DCAI in the optimized (re)training framework by efficiently training a large model with massive data (D).
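The "none to full calibration" benchmarks above can be pictured as timing one downstream task at each calibration level. The sketch below is a hedged illustration under stated assumptions: the "model" is a trivial threshold peak counter standing in for PeakNet/SpeckleNN, and the pedestal, gain, and mask arrays are synthetic.

```python
import time
import numpy as np

def calibrate(raw, level, pedestal, gain, mask):
    """Apply calibration stages cumulatively: 0=none, 1=+pedestal, 2=+gain, 3=+mask."""
    img = raw.astype(np.float64)
    if level >= 1:
        img = img - pedestal          # pedestal (dark offset) subtraction
    if level >= 2:
        img = img * gain              # per-pixel gain correction
    if level >= 3:
        img = np.where(mask, img, 0)  # zero out bad pixels
    return img

def count_peaks(img, thresh=50.0):
    # Toy stand-in for a peak-finding network: count above-threshold pixels.
    return int((img > thresh).sum())

# Synthetic detector frame: dark offset ~100 ADU, noise sigma ~5 ADU.
rng = np.random.default_rng(0)
raw = rng.normal(100.0, 5.0, (512, 512))
pedestal = np.full((512, 512), 100.0)
gain = np.ones((512, 512))
mask = np.ones((512, 512), dtype=bool)

results = {}
for level, name in enumerate(["none", "pedestal", "gain", "full"]):
    t0 = time.perf_counter()
    peaks = count_peaks(calibrate(raw, level, pedestal, gain, mask))
    results[name] = (time.perf_counter() - t0, peaks)
```

With no calibration every pixel sits near 100 ADU and is miscounted as a peak; after pedestal subtraction the background drops to zero mean, which is the accuracy/speed trade-off the benchmark quantifies.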
Year 3:
MCTS Detectors: Demonstrate closed-loop feedback for CookieBox data with HPC training (D). Demonstrate X-ray pulse time-energy reconstruction of the inference pipeline with heterogeneous hardware (D).
SNL: Implementation of CookieBox actionable ML across multiple FPGA and GPU layers (D). Implementation of TimeTool ML in edge hardware with the necessary latency (D). Implement PeakNet and SpeckleNN in FPGA (D).
ML for Tiled Detectors: Implement PeakNet in heterogeneous pipeline (D). Implement SpeckleNN in heterogeneous pipeline (D).
HEDM: Demonstrate analysis of adaptive patch generation (M). Train AI-assisted peak-detection algorithms using EBSD as ground truth (M). ML peak overlap at APS HEDM (D). Anomaly detection on quartz samples (D).
Framework for Rapid Model Retraining: Integrate with SciStream for memory-to-memory data transfer between instrument and training process at DCAI (M). Generalize framework and deploy on more computing platforms (M). Demonstrate streaming online (re)training and online redeployment of the updated model at the edge to maintain inference accuracy (D). Demonstrate generalization of the framework to other workflows and other DCAI systems (D).
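The streaming online (re)training and redeployment milestone can be sketched as a monitor-retrain-redeploy loop: watch inference accuracy on each incoming batch and refit the deployed model when accuracy drifts below a threshold. Everything below is an illustrative assumption, not the project's actual framework API: the toy 1-D classifier, the 0.9 accuracy threshold, and the simulated drift in the data stream.

```python
class ThresholdModel:
    """Toy 1-D classifier: predicts 1 when x exceeds a learned boundary."""
    def __init__(self, boundary=0.0):
        self.boundary = boundary

    def predict(self, x):
        return 1 if x > self.boundary else 0

    def retrain(self, batch, labels):
        # Refit the boundary as the midpoint between the two class means.
        pos = [x for x, y in zip(batch, labels) if y == 1]
        neg = [x for x, y in zip(batch, labels) if y == 0]
        self.boundary = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(model, batch, labels):
    return sum(model.predict(x) == y for x, y in zip(batch, labels)) / len(batch)

def stream_loop(stream, threshold=0.9):
    """Monitor accuracy per batch; retrain and redeploy on drift."""
    deployed = ThresholdModel()
    redeploys = 0
    for batch, labels in stream:
        if accuracy(deployed, batch, labels) < threshold:  # drift detected
            deployed.retrain(batch, labels)                # online retrain
            redeploys += 1                                 # redeploy updated model
    return redeploys

# Simulated drift: the true decision boundary jumps from 0.0 to 5.0 halfway.
xs = [x / 10 for x in range(-100, 100)]  # fixed grid of inputs, -10.0 .. 9.9
stream = []
for i in range(10):
    true_boundary = 0.0 if i < 5 else 5.0
    stream.append((xs, [1 if x > true_boundary else 0 for x in xs]))

redeploys = stream_loop(stream)
```

In the real deployment the retrain step would run on a DCAI system fed over SciStream while inference continues at the edge; the loop structure of detect, retrain, and swap in the updated model is the part this sketch is meant to show.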