Figure 3: Concept of how a soft sensor is generated by correlating offline measurements of critical quality attributes (CQAs) with online signals from sensors and process parameters.
Correlation analysis is used to measure the strength of the relationship between variables (Varmuza et al., 2009). Typically, only the linear relationship is considered, using the Pearson correlation coefficient. However, methods are also available for relationships that are monotonic but not necessarily linear, e.g., Spearman’s rank correlation, a nonparametric measure of the statistical relationship between the rankings of two variables (Lee et al., 2000). The correlation structure in the data has a major impact on the subsequent analysis. It is important to note that correlation does not imply causation, i.e., a strong correlation does not show that one variable is affected by another.
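The difference between the two coefficients can be illustrated with a minimal sketch. It assumes synthetic, noise-free data in which y grows exponentially with x; the function names `pearson`, `average_ranks` and `spearman` are hypothetical helpers written for this illustration, not part of any cited method.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Synthetic example (assumption): a monotonic but strongly nonlinear relationship.
x = np.linspace(0.1, 5.0, 50)
y = np.exp(x)

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because y increases strictly with x, the rankings of the two variables are identical and Spearman's coefficient equals 1, whereas the Pearson coefficient is noticeably lower because the relationship is not linear. In practice, library routines such as `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would normally be preferred over hand-written helpers.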
Often, we want to model a critical quality attribute (CQA) of a downstream process based on one or more input variables. Typically, measuring the CQA is laborious and cost-intensive and takes hours or even days. In contrast, the input variables are easy to measure online using sensors such as the standard UV detector, the standard pH and conductivity probes, or spectroscopic sensors. We distinguish between three situations: the relationship can be described by a fundamental scientific law (first-principles model); it can be described by a relatively simple mathematical equation based on physical/chemical knowledge; or it is captured by a purely data-driven model, where we only assume that a relationship exists (Varmuza et al., 2009). Deriving a fundamental scientific law in downstream processing is a highly complex task that requires an immense number of experiments; therefore, data-driven approaches appear to be more suitable (Rathore et al., 2022a). A compromise is sometimes found in so-called grey-box or hybrid models (Hong et al., 2018; Simon et al., 2015a). Hybrid modelling approaches have already been used successfully in downstream processing to model the flux evolution and duration of ultrafiltration processes (Krippl et al., 2020; Krippl et al., 2021) and capture chromatography (Narayanan et al., 2021) (Figure 4).
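As an illustration of the purely data-driven case, the following minimal sketch fits a linear soft sensor that predicts an offline-measured CQA from online signals by ordinary least squares. Everything in it is an assumption made for illustration: the data are synthetic, the three signals (UV absorbance, pH, conductivity), the coefficients and the noise level are invented, and the helper names `fit_linear_soft_sensor` and `predict_cqa` are hypothetical. It is not the method of any of the cited studies, which may use considerably more elaborate models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 40

# Hypothetical online signals (assumption): UV absorbance, pH, conductivity.
X = np.column_stack([
    rng.uniform(0.0, 2.0, n_samples),    # UV absorbance (AU)
    rng.uniform(6.5, 7.5, n_samples),    # pH
    rng.uniform(5.0, 20.0, n_samples),   # conductivity (mS/cm)
])

# Invented linear relationship plus measurement noise for the offline CQA.
true_intercept = 4.0
true_coef = np.array([3.0, -0.5, 0.1])
cqa = true_intercept + X @ true_coef + rng.normal(0.0, 0.05, n_samples)

def fit_linear_soft_sensor(X, y):
    """Least-squares fit; returns [intercept, coefficient per signal]."""
    A = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params

def predict_cqa(params, X):
    """Predict the CQA from online signals with fitted parameters."""
    X = np.atleast_2d(X)
    return params[0] + X @ params[1:]

# Calibrate on the first 30 runs, then check the prediction on the last 10.
params = fit_linear_soft_sensor(X[:30], cqa[:30])
predicted = predict_cqa(params, X[30:])
rmse = float(np.sqrt(np.mean((predicted - cqa[30:]) ** 2)))
print("fitted parameters:", np.round(params, 3))
print(f"hold-out RMSE: {rmse:.3f}")
```

Holding back some runs for validation, as done here, gives a first indication of whether the fitted relationship generalises beyond the calibration data. A hybrid model would replace part of this purely empirical mapping with a mechanistic equation.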