Figure 7: Workflow for model-based real-time monitoring of a chromatographic step, adapted from Sauer et al. 2018. In addition to the pH, conductivity, UV absorbance and pressure sensors with which a chromatography workstation is typically equipped, four online sensors were installed in the flow path after the column. Online data were obtained for 8 identical runs of a chromatographic purification step. The eluate of each run was collected as 15 fractions, which were analyzed offline to determine the quality attributes of interest, i.e., product quantity and impurity content. Part of this data set was then used to establish a mathematical model for each quality attribute by relating the offline data to the online data. The models were selected by the lowest root mean squared error (RMSE) and then evaluated by their predictive performance on independent test data sets that had not been used for model training. Implemented in the control software of the chromatographic workstation, the established models provide information on all quality attributes they have been trained on in real time (< 1 s) and thus enable real-time decisions, e.g., on pooling of the eluate for the next step.
Sauer et al. equipped a chromatographic workstation with multiple sensors (Sauer et al., 2019). Besides the standard detectors (UV, pH and conductivity), multi-angle light scattering, refractive index, attenuated total reflection Fourier-transform infrared (ATR-FTIR) and fluorescence spectroscopy were included. The real-time monitoring system was applied to a cation exchange capture step of fibroblast growth factor 2 expressed in E. coli. Eight training runs were performed, and in each run 15 eluate fractions were analyzed for product quantity, host cell protein (HCP) and double-stranded DNA impurities, endotoxins, and monomer/aggregate content (Figure 7). A prediction model was generated for each individual response variable using cross-validation. The same system was used for an antibody capture process (Walch et al., 2019). The input data from the various devices were time-aligned to account for their different void volumes and time resolutions. Sensor-specific preprocessing methods were applied to each sensor, together with a sensor-specific variable selection procedure. Finally, the online signals were averaged over the time intervals of the collected fractions that were analyzed offline. A multi-sensor approach is only feasible if the chromatographic workstation is equipped with a central database (Oliveira, 2019; Steinwandter et al., 2019). For this multi-sensor setup, the software solution XAMIris (Evon, Austria) was used to record the various signals, start the chromatographic runs, export and time-align the data, and implement soft sensors for real-time monitoring of several CQAs (Christler et al., 2021; Sauer et al., 2019; Walch et al., 2019).
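The time alignment and fraction averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in XAMIris: it assumes a constant flow rate, so that each sensor's void (delay) volume translates into a fixed time offset, and it uses linear interpolation onto a common time grid to harmonize the different time resolutions. The function names `align_signal` and `average_over_fractions` and all numbers are hypothetical.

```python
import numpy as np


def align_signal(t, y, delay_volume_ml, flow_rate_ml_min, t_grid):
    """Correct a sensor trace for its void (delay) volume and resample it onto t_grid.

    Assumes a constant flow rate: liquid leaving the column at time t reaches a
    sensor with delay volume V at t + V / F, so the sensor's timestamps are
    shifted back by that offset (in minutes).
    """
    offset_min = delay_volume_ml / flow_rate_ml_min
    # Linear interpolation onto the common grid also harmonizes sampling rates.
    return np.interp(t_grid, np.asarray(t, dtype=float) - offset_min, y)


def average_over_fractions(t_grid, y, fraction_bounds):
    """Average an aligned signal over each fraction interval [start, end)."""
    t_grid, y = np.asarray(t_grid), np.asarray(y)
    return np.array([y[(t_grid >= start) & (t_grid < end)].mean()
                     for start, end in fraction_bounds])


# Hypothetical example: sensor sampled every 15 s, 1 mL delay volume, 2 mL/min flow.
t_grid = np.arange(0.0, 10.0, 0.5)   # common time grid (min)
t_raw = np.arange(0.0, 12.0, 0.25)   # raw sensor timestamps (min)
signal_raw = t_raw.copy()            # toy signal that simply ramps with time
aligned = align_signal(t_raw, signal_raw, 1.0, 2.0, t_grid)
fraction_means = average_over_fractions(t_grid, aligned, [(0.0, 5.0), (5.0, 10.0)])
print(fraction_means)  # [2.75 7.75]
```

Each averaged value then forms one entry of the input matrix that is paired with the offline result of the corresponding fraction.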
Impact of different sensors
In the same multi-sensor setup, UV/Vis spectroscopy was included because it mainly reflects primary structure, such as the content of aromatic amino acids (UV 280 nm), the polypeptide backbone (UV 214 nm) or DNA content (UV 260 nm) (Christler et al., 2021; Sauer et al., 2019; Walch et al., 2019). The refractive index (RI) was included because it has previously been used to quantify protein (Zhao et al., 2011). ATR-FTIR can distinguish between HCP and the target protein (Capito et al., 2013). The intrinsic fluorescence of aromatic amino acids can be used to probe the tertiary structure of proteins and to detect structural changes, since the fluorescence responds to changes in the polarity of the fluorophore environment (Ghisaidoobe et al., 2014; Rathore et al., 2009). Light scattering methods (Minton, 2016) are used to characterize quaternary structure, for example, protein aggregation. Both fluorescence spectroscopy and light scattering techniques have been used for at-line determination of quality attributes (Patel et al., 2018; Rathore et al., 2009; Yu et al., 2013).
Multiple CQA monitoring: single sensors vs. multiple sensors
If multiple CQAs are monitored using multiple sensors, careful model selection is required. The impact of the individual sensors on prediction performance was investigated extensively (Sauer et al., 2019; Walch et al., 2019). Ideally, a prediction model is as simple as possible but as complex as necessary. Therefore, models based on one single sensor were compared to models combining two, three and up to all available sensors. The best model was selected based solely on the prediction error, e.g., the root mean squared error (RMSE) of prediction on an independent test set, which is a purely data-driven approach. However, if an extensive model including fluorescence and/or ATR-FTIR data outperformed a basic model only slightly, it was still recommended to use the basic model (Sauer et al., 2019). For all investigated CQAs, the finally selected models combined more than one sensor.
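The comparison of sensor combinations and the preference for simpler models can be illustrated by an exhaustive search over sensor subsets. The sketch below is hypothetical: ordinary least squares stands in for whatever regression method is used in practice, each sensor is represented by a block of already preprocessed features, and the rule "use the basic model if the extensive one is only slightly better" is encoded as an assumed relative tolerance on the test-set RMSE. Sensor names, data and the 5 % tolerance are illustrative only.

```python
import itertools

import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_linear(X, y):
    """Least-squares fit with intercept (a stand-in for the actual regression method)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def select_sensor_subset(train, y_train, test, y_test, tolerance=0.05):
    """Score every non-empty sensor combination on an independent test set.

    train/test map a sensor name to a 2-D feature block (samples x features).
    Returns the smallest combination whose test RMSE lies within `tolerance`
    (relative) of the best RMSE, together with all scores.
    """
    sensors = sorted(train)
    scores = {}
    for k in range(1, len(sensors) + 1):
        for subset in itertools.combinations(sensors, k):
            X_tr = np.hstack([train[s] for s in subset])
            X_te = np.hstack([test[s] for s in subset])
            coef = fit_linear(X_tr, y_train)
            scores[subset] = rmse(y_test, predict_linear(coef, X_te))
    best = min(scores.values())
    # Parsimony: accept a simpler model if it is only marginally worse than the best.
    acceptable = [s for s, err in scores.items() if err <= best * (1 + tolerance)]
    chosen = min(acceptable, key=lambda s: (len(s), scores[s]))
    return chosen, scores


# Hypothetical data: the CQA depends on "UV" and "RI" only; "FLU" carries no information.
rng = np.random.default_rng(0)


def simulate(n):
    blocks = {s: rng.normal(size=(n, 1)) for s in ("UV", "RI", "FLU")}
    y = 2.0 * blocks["UV"][:, 0] + 3.0 * blocks["RI"][:, 0] + rng.normal(scale=0.1, size=n)
    return blocks, y


train, y_train = simulate(200)
test, y_test = simulate(200)
chosen, scores = select_sensor_subset(train, y_train, test, y_test)
print(chosen)  # ('RI', 'UV')
```

Here the three-sensor model may score marginally better on the test set by chance, but the two-sensor model is kept because the difference falls within the tolerance.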
Robustness of the monitoring system: sensor fouling and sensor shift
When setting up a real-time monitoring system, it is recommended to implement multiple prediction models, since sensor failure, shift or fouling can easily distort the input data and render a prediction model useless. The model used also affects the pooling decision (Walch et al., 2019). Even though the more complex models performed better on the independent test set, the simpler models yielded smoother predictions and were more robust. An optimal monitoring system should therefore always comprise several prediction models based on different sensor combinations, so that it can respond to sensor fouling or sensor shifts. If one sensor fails, real-time monitoring can simply switch to an alternative model that does not use this sensor. The technology transfer of the multi-sensor monitoring system (Sauer et al., 2019; Walch et al., 2019) revealed that only a subset of the possible prediction models could be used for real-time prediction at the different sites, because the fluorescence device was not robust enough (Christler et al., 2021). Simpler models without fluorescence could still be used at the different sites. However, because the individual sensors have very specific properties, the performance of the prediction models could be considerably improved by retraining the models.
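A bank of models built on different sensor combinations, with a fallback when a sensor is flagged as faulty, might look like the following minimal sketch. The class `SoftSensorModel`, the function `pick_model`, the sensor names, coefficients and RMSE values are all hypothetical, and the detection of sensor failure, shift or fouling is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SoftSensorModel:
    name: str
    sensors: FrozenSet[str]  # sensors whose signals the model requires
    test_rmse: float         # RMSE on the independent test set
    predict: Callable[[Dict[str, float]], float]


def pick_model(models: Iterable[SoftSensorModel],
               faulty: Iterable[str]) -> Optional[SoftSensorModel]:
    """Return the model with the lowest test RMSE that uses none of the faulty sensors."""
    faulty_set = set(faulty)
    usable = [m for m in models if not (m.sensors & faulty_set)]
    return min(usable, key=lambda m: m.test_rmse, default=None)


# Hypothetical model bank for one CQA; coefficients and RMSE values are illustrative only.
models = [
    SoftSensorModel("UV+RI+FLU", frozenset({"UV", "RI", "FLU"}), 0.8,
                    lambda x: 0.5 * x["UV"] + 0.3 * x["RI"] + 0.2 * x["FLU"]),
    SoftSensorModel("UV+RI", frozenset({"UV", "RI"}), 1.0,
                    lambda x: 0.6 * x["UV"] + 0.4 * x["RI"]),
    SoftSensorModel("UV", frozenset({"UV"}), 1.5,
                    lambda x: 0.9 * x["UV"]),
]

# Fluorescence flagged as unreliable: fall back to the best model without it.
fallback = pick_model(models, faulty=["FLU"])
print(fallback.name)  # UV+RI
```

Ranking the usable models by their test-set RMSE keeps the best available model active; if every model depends on a failed sensor, `pick_model` returns `None`, signalling that no soft-sensor prediction should be used.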