Figure 3: Concept of how a soft sensor for critical quality attributes
(CQAs) is generated by correlating offline measurements with online
signals from sensors and process parameters.
Correlation analysis is used to measure the extent of the relationship
between variables (Varmuza et al., 2009). Typically, only the extent of
the linear relationship is considered using Pearson correlation.
However, there are also methods available to measure monotonic but not
necessarily linear relationships, e.g., Spearman’s rank correlation,
which is a nonparametric measure reporting the statistical relationship
between the rankings of two variables (Lee et al., 2000).
The correlation structure in the data has a major impact on the
subsequent analysis of the data. It is important to note that
correlation does not imply a causal relationship between the variables,
i.e., it does not show that one variable is affected by another.
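To illustrate the difference between the two coefficients, the following minimal Python sketch computes both for a synthetic data set that is monotonic but strongly nonlinear. The data, the function names `pearson`, `ranks` and `spearman`, and the pure-Python implementation are illustrative assumptions and are not taken from the cited works.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic example (assumption): y grows exponentially with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)    # clearly below 1: the relationship is not linear
r_spearman = spearman(x, y)  # equal to 1: the rankings agree perfectly

print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

In this sketch the rank-based coefficient reaches its maximum because the ordering of the two variables is identical, whereas the Pearson coefficient is reduced by the curvature of the relationship.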
Often, we want to model a critical quality attribute (CQA) of a
downstream process based on one or many input variables. Typically,
measuring the CQA is laborious, cost-intensive and takes hours or even
days. On the other hand, the input variables are easy to measure online
using sensors such as the standard UV detector, the standard pH and
conductivity probes, or spectroscopic sensors. We distinguish between
situations where the relationship can be described by a fundamental
scientific law (first-principles model), by a relatively simple
mathematical equation (based on physical/chemical knowledge), or by a
purely data-driven model where we only assume that relationships exist
(Varmuza et al., 2009). Deriving a fundamental scientific law in
downstream processing is a highly complex task that requires an immense
number of experiments; data-driven approaches therefore appear to be
more suitable (Rathore et al., 2022a). A compromise is sometimes found
in so-called grey-box or hybrid models (Hong et al., 2018; Simon
et al., 2015a). Hybrid modelling approaches have already been
successfully used in downstream processing to model the flux evolution
and duration of ultrafiltration processes (Krippl et al., 2020; Krippl
et al., 2021) and to model capture chromatography (Narayanan et al.,
2021) (Figure 4).
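The purely data-driven case described above can be sketched in its simplest form as a linear calibration between one online signal and the offline CQA measurement. The following Python example is a minimal sketch under strong assumptions: a single input (a UV signal), an ordinary least-squares fit, and synthetic, noise-free values. The function names `fit_linear_soft_sensor` and `predict_cqa` and all numbers are hypothetical; soft sensors in practice usually combine several signals and use multivariate methods.

```python
def fit_linear_soft_sensor(signal, cqa):
    """Fit cqa = slope * signal + intercept by ordinary least squares."""
    n = len(signal)
    mean_s = sum(signal) / n
    mean_c = sum(cqa) / n
    sxx = sum((s - mean_s) ** 2 for s in signal)
    sxy = sum((s - mean_s) * (c - mean_c) for s, c in zip(signal, cqa))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_s
    return slope, intercept


def predict_cqa(model, signal_value):
    """Estimate the CQA from a new online signal value."""
    slope, intercept = model
    return slope * signal_value + intercept


# Hypothetical calibration data: online UV signal recorded at the time points
# where offline samples were taken, and the corresponding offline CQA values.
uv_online = [0.10, 0.25, 0.40, 0.55, 0.70]    # arbitrary absorbance units
cqa_offline = [0.50, 1.25, 2.00, 2.75, 3.50]  # arbitrary concentration units

model = fit_linear_soft_sensor(uv_online, cqa_offline)

# Once calibrated, the model estimates the CQA from the online signal alone.
estimate = predict_cqa(model, 0.50)
print(f"slope={model[0]:.3f}, intercept={model[1]:.3f}, estimate={estimate:.3f}")
```

The design choice here is deliberate simplicity: the model assumes only that a relationship between the signal and the CQA exists and learns it from paired data, without any mechanistic knowledge of the process.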