Actionable Information from Sensor to Data Center
(LCLS/Argonne)
The goal of the Actionable Information from Sensor to Data Center (AISDC) project is the development of adaptable ML algorithms performing inference at the experiment edge to provide real-time feedback to experiments. This was achieved by developing ML algorithms for several demanding use cases, building a framework to deploy AI/ML to Field Programmable Gate Arrays (FPGAs), and implementing a workflow to stream data to Argonne to retrain models within minutes. During the previous performance period, the project defined four research thrusts: 1) ML at the Edge, 2) Data extraction for SX and SPI, 3) In-situ HEDM, and 4) Rapid Model (Re)Training on Data Center AI Systems (DCAI). Over the course of the project, the team used simulated and experimental datasets to embed science domain expertise into ML algorithms for reconstructing photo-electron spectra from attosecond angular streaking data, extracting features from serial crystallography and single particle imaging data, and rapidly determining peak positions for high-energy diffraction microscopy. The team also developed the infrastructure needed to deploy trained ML models on edge devices and to leverage HPC for rapid (re)training of ML/AI models on DCAI systems.
Machine Learning at the Edge:
The SNL framework for high-bandwidth data processing on EdgeAI accelerators:
The re-programmability of emerging EdgeAI accelerators makes it possible to deploy a custom ML inference model for each detector and experiment, and even for individual shifts. Select model parameters can be tuned during continuous training without altering the model architecture itself. This flexibility allows EdgeAI accelerators to deliver actionable information within minutes of source or sample conditions changing during active experiments. To leverage this infrastructure, we created the SLAC Neural-network Library (SNL) framework. This framework facilitates implementing ML inference engines on FPGAs for high-bandwidth data processing and low-latency feedback across diverse experiments by translating machine learning layers into FPGA byte code. SNL can accommodate networks of up to a few hundred parameters, depending on the geometry and required numerical precision, while robustly adapting to changes as data are streamed in real time. We have successfully used SNL to deploy multiple network layer types (Conv2D, MaxPooling, AveragePooling, Dense, and others) onto moderately sized FPGAs as a proof of principle. To support a broad user and developer community, the library is presented as a collection of C++ templates that define layer types and activation functions by specifying their parameters following Python Keras conventions. To anticipate increasingly heterogeneous EdgeAI hardware, SNL can be extended to other compute architectures spanning conventional CPUs, GPUs, and specialized Intelligence Processing Units (IPUs) and Reconfigurable Dataflow Units (RDUs).
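A network targeted at such an accelerator must fit within a tight parameter budget. The following sketch (in Python rather than SNL's C++ templates; the layer classes, parameter-count formulas, and the 500-parameter budget are illustrative assumptions, not SNL's actual API) shows how a candidate architecture described with Keras-style parameters can be checked against that budget before deployment:

```python
from dataclasses import dataclass

@dataclass
class Conv2D:
    """Keras-style 2D convolution spec; params = (kh*kw*in_channels + 1) * filters."""
    filters: int
    kernel: tuple       # (kernel_height, kernel_width)
    in_channels: int
    def param_count(self):
        kh, kw = self.kernel
        return (kh * kw * self.in_channels + 1) * self.filters  # +1 for bias

@dataclass
class Dense:
    """Keras-style fully connected layer; params = (in_features + 1) * out_features."""
    in_features: int
    out_features: int
    def param_count(self):
        return (self.in_features + 1) * self.out_features

def fits_on_fpga(layers, budget=500):
    """Check a candidate network against a hypothetical parameter budget."""
    return sum(layer.param_count() for layer in layers) <= budget

net = [Conv2D(filters=4, kernel=(3, 3), in_channels=1),
       Dense(in_features=16, out_features=8)]
total = sum(layer.param_count() for layer in net)  # 40 + 136 = 176 parameters
```

A check like this lets the experiment team reject an architecture before attempting synthesis, since the budget depends on the target FPGA's geometry and numerical precision.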
Real-time reconstruction of electron spectra and model retraining using CookieNetAE:
We leveraged the SNL to develop CookieNetAE, a convolutional auto-encoder that reconstructs photo-electron spectra from attosecond angular streaking data collected on the CookieBox detector at LCLS. Multi-channel time-series data are typical of spectroscopy experiments across light sources and thus provide a widely relevant scientific use case to demonstrate the potential of ML at the edge. Due to the absence of the requisite laser field at LCLS during Phase I, we trained CookieNetAE on simulated data. The CookieNetAE model was able to reproduce explicit reconstructions of streaked photo-electron spectra with high fidelity while suppressing sampling noise present in the input signal. Importantly, benchmarking this model confirmed that it is fast enough to support the high data rates anticipated at APS-U and LCLS-II. We demonstrated full model training in under 10 minutes and local inference above 100 kFps when CookieNetAE was run on Graphcore IPUs directly attached to the EdgeML host node. Finally, we identified an alternate representation of the data with a lower bit-depth that reduced model dimensionality and improved performance with only modest changes to the network architecture. This strategy of compressing data prior to streaming it to remote compute facilities could be extended to other analysis pipelines to reduce the compute load of data processing.
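The lower bit-depth representation mentioned above amounts to quantizing the streamed signal before it leaves the edge. A minimal sketch of uniform quantization (an illustrative stand-in, not the representation actually used for CookieNetAE) shows how reducing bit depth shrinks both payload size and model input dimensionality:

```python
def quantize(values, bits=8, lo=0.0, hi=1.0):
    """Map floats in [lo, hi] to n-bit integer codes via uniform quantization."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        v = min(max(v, lo), hi)                      # clamp to the representable range
        codes.append(round((v - lo) / (hi - lo) * levels))
    return codes

def dequantize(codes, bits=8, lo=0.0, hi=1.0):
    """Recover approximate floats from n-bit codes."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]
```

At 4 bits each sample occupies a quarter of a 16-bit word, at the cost of a bounded quantization error of at most half a level spacing.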
Real-time Bragg peak detection with PeakNet:
To advance SX experiments, we focused our efforts on developing a fast and accurate Bragg peak finder able to operate autonomously on detectors of any size. Current CPU-based peak finders often require manual fine-tuning and exhibit processing times in the range of 100-200 ms/event. Furthermore, existing deep-learning-based peak finders are restricted to limited input sizes, such as 32 × 32 images containing a single peak per frame. Our solution, named PeakNet, employs the U-Net deep learning architecture to segment Bragg peaks directly from background, improving processing speed to 33 ms/event while operating on detectors of any size, thereby overcoming the limitations of current approaches. Even when demonstrated on a consumer-grade NVIDIA GTX 1080 Ti GPU, PeakNet delivers Bragg peak detection on the CsPad detector that is 3× faster and more accurate (7% higher indexing rate, a measure of how well crystal orientations are recovered) than traditional image-processing-based algorithms. The entire peak finding process takes place on GPUs, offering a cost-effective option for achieving scalability. Its performance is expected to improve with the next generation of GPUs.
Real-time SPI scattering pattern classification with SpeckleNN:
We developed SpeckleNN, a fast and accurate real-time classifier of SPI scattering patterns that extracts the single-hit events needed for real-time vetoing and three-dimensional single particle reconstruction. One notable feature of SpeckleNN is its few-shot classification capability: after being trained offline on a library of SPI patterns from multiple samples, it can accurately classify samples it has never seen. Because of this few-shot learning framework, SpeckleNN also delivers more robust performance than competing methods when given only tens of labels per classification category. Without the need for extensive manual labeling or even a full detector image, our classification method offers a strong solution for real-time high-throughput SPI experiments. From the runtime perspective, SpeckleNN performs classification at nearly 450 Hz with 99% accuracy on an NVIDIA GTX 1080 Ti GPU.
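The few-shot idea can be sketched as embedding-space nearest-centroid classification: a handful of labelled embeddings per class form support centroids, and a new pattern is assigned to the class whose centroid its embedding is most similar to. The sketch below (cosine similarity over toy 2-D embeddings; the embedding model itself is assumed, and this is not SpeckleNN's actual code) illustrates the mechanism:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(query_embedding, support):
    """support: {label: [embedding, ...]} with only a few examples per class.
    Returns the label of the most similar class centroid."""
    scores = {label: cosine(query_embedding, centroid(vecs))
              for label, vecs in support.items()}
    return max(scores, key=scores.get)

support = {"single_hit": [[1.0, 0.0], [0.9, 0.1]],
           "multi_hit":  [[0.0, 1.0], [0.1, 0.9]]}
```

Adding a new sample class then only requires embedding a few labelled examples, with no retraining of the embedding network.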
In-situ High Energy Diffraction Microscopy (HEDM):
We developed BraggNN, a deep-learning-based method to determine peak positions in HEDM with 15% smaller errors than analytical pseudo-Voigt function fitting, using grain positions obtained from near-field HEDM as ground truth. Using recent advances in deep-learning methods and special-purpose model inference accelerators, BraggNN delivered enormous performance improvements over conventional methods, running >200× faster on a GPU with an out-of-the-box implementation as compared to the legacy CPU-based implementation. A real-time streaming implementation of BraggNN was demonstrated using a 4-megapixel area detector with a frame rate of 6 Hz streaming data via an EPICS control system to a nearby GPU. Code is available here: https://github.com/AISDC/edgeBragg
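For reference, the analytical baseline that BraggNN is compared against fits each peak with a pseudo-Voigt profile, a linear mix of a Gaussian and a Lorentzian sharing one full width at half maximum (FWHM). A minimal sketch of the 1-D profile (parameter names are ours; the actual fitting uses a 2-D form with background terms):

```python
import math

def pseudo_voigt(x, x0=0.0, fwhm=1.0, eta=0.5, amplitude=1.0):
    """Pseudo-Voigt profile: eta * Lorentzian + (1 - eta) * Gaussian,
    both centered at x0 with the same FWHM. eta in [0, 1] sets the mix."""
    hwhm = fwhm / 2.0
    u = ((x - x0) / hwhm) ** 2
    gauss = math.exp(-math.log(2) * u)      # value 0.5 at x0 +/- hwhm
    lorentz = 1.0 / (1.0 + u)               # value 0.5 at x0 +/- hwhm
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)
```

Conventional peak finding minimizes the residual between this profile and the measured patch over (x0, fwhm, eta, amplitude) for every peak, which is exactly the per-peak nonlinear fit that BraggNN replaces with a single network inference.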
We have developed a parametric generative method that uses kernel density estimation to replicate and sample experimental parameter distributions, paired with a simulator to generate training examples with well-defined ground truth values. In particular, we use HEDM as a representative application to demonstrate this method, using generated diffraction peaks to train BraggNN. Compared to BraggNN trained on experimental diffraction peaks, this method improves the determination of peak locations, with a 25% reduction in error. In combination with our model retraining work, this method can significantly widen the application of BraggNN to a variety of materials and use cases. A manuscript describing this method is in preparation. Code is available here: https://github.com/AISDC/PeakGenerator
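The core of such a method is drawing new parameter values from a kernel density estimate of the observed distribution. For a Gaussian KDE this reduces to picking an observed value at random and perturbing it with kernel-bandwidth noise; the sampled parameters are then fed to the simulator. A 1-D sketch (bandwidth choice and function names are illustrative, not the PeakGenerator implementation):

```python
import random

def kde_sample(data, bandwidth, n, seed=0):
    """Draw n samples from a 1-D Gaussian kernel density estimate of `data`:
    choose an observed value uniformly at random, then add N(0, bandwidth)
    noise. Equivalent to sampling the KDE mixture directly."""
    rng = random.Random(seed)
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(n)]

# e.g. replicate an observed peak-width distribution for the simulator
observed_widths = [1.1, 1.3, 1.2, 1.5, 1.4]
synthetic_widths = kde_sample(observed_widths, bandwidth=0.05, n=1000)
```

Because every synthetic peak is rendered by the simulator from known parameters, each training example comes with an exact ground-truth position, which experimental data cannot provide.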
We have developed a new method for rapid anomaly event detection, inspired by our model retraining work. Our approach provides users with an anomaly event score for new HEDM datasets, calculated as an uncertainty quantification (UQ) score that measures the difference between a baseline dataset and new datasets. Our framework leverages an image representation model and clustering algorithms to accomplish this task. The representation model plays a crucial role in transforming the dataset into compact, semantically rich representations of visually salient characteristics. The method pre-processes and clusters peak patches in the baseline dataset, then uses the pre-trained representation model to generate representation vectors for new datasets as the material microstructure evolves. By calculating the UQ score, our approach provides users with a reliable indicator of anomalous events such as changes in diffraction peak characteristics. We have successfully applied this method in an HEDM experiment on a Ti-7Al alloy sample under in situ loading to detect the onset of plastic deformation in real time, where the characteristics of diffraction peaks change. Compared to traditional approaches that determine the onset of plastic deformation via microstructure reconstruction, this method is fully automated, up to 50 times faster computationally, and works on datasets up to 7 times sparser. A manuscript describing this method is in preparation.
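One simple way to turn "distance between a baseline and new datasets" into a score is to measure how far each new representation vector lies from its nearest baseline cluster centroid. The sketch below (Euclidean distance and mean aggregation are our assumptions; the published UQ score may be defined differently) captures that idea:

```python
import math

def uq_score(new_vectors, baseline_centroids):
    """Average distance from each new representation vector to its nearest
    baseline cluster centroid. Scores near zero mean the new data resemble
    the baseline; larger scores flag anomalous events such as changing
    diffraction peak characteristics."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    total = sum(min(dist(v, c) for c in baseline_centroids)
                for v in new_vectors)
    return total / len(new_vectors)

baseline = [[0.0, 0.0], [10.0, 10.0]]   # cluster centroids from the baseline dataset
```

An experiment monitor would then compare the score for each incoming dataset against a threshold calibrated on held-out baseline data.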
AI Model (Re)Training on Data Center AI (DCAI) systems:
A key challenge to using ML methods at high-rate light source experiments is training models rapidly enough that they can be deployed and used within minutes of a change to an experiment configuration. A data center AI (DCAI) system is an AI accelerator that must be deployed in a data center due to its cooling, power supply, ventilation, and fire suppression requirements. We used DCAI systems to train ML models much more rapidly than is possible on computing clusters that can be deployed within the experiment facility. Once a DNN was trained, we used another class of AI accelerators specialized for model inference, called edge-AI devices (e.g., FPGAs, GPUs with Tensor cores, edge TPUs), to process experiment data near the data acquisition system in real time. We used the Globus Flows, funcX, and Globus file transfer services to build a distributed workflow that automatically (re)trains BraggNN and CookieNetAE. We demonstrated that, despite the data movement cost and service overhead of using remote DCAI systems for DNN training, the turnaround time is less than 1/30 that of a locally deployable GPU. Code is available here: https://github.com/AISDC/nnTrainFlow and https://github.com/AISDC/DNNTrainerFlow
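The workflow's structure can be sketched as a four-stage loop: ship new data to the DCAI system, launch training, pull the trained weights back, and deploy them to the edge device. In the sketch below all four callables are hypothetical stand-ins (the real workflow wires these stages together with Globus Flows, funcX, and Globus transfer; none of the names here are that API):

```python
def retrain_remote(dataset_path, model_id,
                   transfer, submit_training, fetch_weights, deploy):
    """One pass of the remote (re)training loop. The four callables are
    injected so the orchestration logic stays independent of any
    particular transfer or compute service."""
    remote_path = transfer(dataset_path)            # ship new data to the DCAI system
    job = submit_training(model_id, remote_path)    # launch training remotely
    weights = fetch_weights(job)                    # pull trained weights back
    return deploy(model_id, weights)                # load them onto the edge device

# Illustrative stub services, standing in for the real Globus/funcX calls.
transfer = lambda path: "dcai:/" + path
submit_training = lambda model, path: {"model": model, "data": path}
fetch_weights = lambda job: job["model"] + ".weights"
deploy = lambda model, weights: (model, weights)

result = retrain_remote("run42/peaks.h5", "braggnn",
                        transfer, submit_training, fetch_weights, deploy)
```

Keeping the services injectable made it straightforward to reuse the same loop for both BraggNN and CookieNetAE targets.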
Rapid analysis and closed-loop control of experiments at light sources rely on the ability to update ML models rapidly in response to changes in an instrument or sample while an experiment is running. We developed a data service and a model service to accelerate deep neural network training. Our approach first detects when model performance degrades; second, identifies data from previous experiments that can be reused to reduce labeling effort; and third, finds a suitable model, from a zoo of models trained for previous experiments, as a foundation to fine-tune with new data. We evaluated the performance of these services using three representative scientific datasets and two different scientific applications. Our data service achieves a 100× speedup in data labeling compared to the current state of the art. Our model service achieves up to a 200× improvement in training speed. Together, the data and model services (fairDMS) achieve up to a 92× speedup in end-to-end model update time.
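The first of the three steps, detecting when model performance degrades, can be as simple as comparing a recent window of quality scores against a baseline. The threshold form below is an illustrative assumption in the spirit of the data service's degradation check, not the fairDMS implementation:

```python
def needs_retraining(recent_scores, baseline_mean, tolerance=0.05):
    """Flag model degradation when the mean quality score (e.g. indexing
    rate or reconstruction fidelity) over a recent window drops more than
    `tolerance` below the baseline established at deployment time."""
    if not recent_scores:
        return False                      # no evidence yet; do not trigger
    recent_mean = sum(recent_scores) / len(recent_scores)
    return recent_mean < baseline_mean - tolerance
```

When the trigger fires, the data and model services take over: reusable data are pulled from previous experiments and the closest model in the zoo is selected for fine-tuning.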
We have developed OpenHLS, an open-source, lightweight compiler framework with no proprietary dependencies, based on high-level synthesis techniques, for translating high-level representations of deep neural networks to low-level representations suitable for deployment to near-sensor devices such as FPGAs. We evaluated OpenHLS on various workloads and presented a case-study implementation of BraggNN. We showed that OpenHLS is able to produce an implementation of the network with a throughput of 4.8 μs/sample, approximately a 4× improvement over the existing implementation. Code is available here: https://github.com/makslevental/openhls.