Identifying Variable Importance with Stoichiometric Balances
Since amino acid SBs provided information on the cellular metabolic state in terms of amino acid consumption, model-predicted additions of key amino acids at specific time points had the potential to improve cell growth and mAb productivity. However, to identify key stoichiometric balances from the growth model and the mAb production model, three aspects had to be considered: (1) a variable ranking heuristic that could provide a mathematical method to evaluate the importance of each variable at a given time point; (2) a directionality coefficient that could provide information on each variable’s relationship to the response variable; and (3) the sign of the stoichiometric balance, since positive and negative balances represented two distinct metabolic states. First, since each component in an OPLS model is described by a weighted contribution of each variable in the dataset, the cumulative sum of the squared weighted contributions of each variable across all the components was calculated to represent the variable importance to projection (VIP). Since the VIP is derived from a sum of squares and is therefore always positive, it was used to rank all the variables by their importance to the predictability of the model. Moreover, Powers et al. described that the average VIP value across all variables is typically around 1 (Powers et al., 2020). Accordingly, variables with VIP values greater than 1 were considered to contribute significantly to the prediction of the response variable, whereas those with a VIP less than 0.5 were presumed to make nominal contributions to the overall model. Variables with VIP values between 0.5 and 1 added to the accuracy and reliability of the model but did not contribute significantly to its predictive power.
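The VIP ranking described above can be illustrated with a minimal sketch. It assumes the commonly used PLS formulation of VIP (squared, normalized component weights, weighted by the response variance each component explains); the exact OPLS variant used in this study is not specified here and may differ. The function names, the example weights, and the explained-variance values are hypothetical.

```python
import math


def vip_scores(weights, explained_ss):
    """Return one VIP score per variable.

    weights[a][j]   : weight of variable j in component a (hypothetical input)
    explained_ss[a] : response sum of squares explained by component a
    Assumes the standard PLS VIP formula, not necessarily the study's variant.
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)  # normalize each weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores


def vip_band(vip):
    """Map a VIP score onto the three bands used in the text."""
    if vip > 1.0:
        return "important"
    if vip < 0.5:
        return "nominal"
    return "supporting"


# Hypothetical two-component model with three variables.
example_weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
example_ss = [3.0, 1.0]
example_vips = vip_scores(example_weights, example_ss)
example_bands = [vip_band(v) for v in example_vips]
```

Under this formulation the squared VIP values average to exactly 1 across variables, which is consistent with using 1 as the threshold for an above-average contribution.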
Second, correlative directionality for each variable was averaged across all the components and represented by the magnitude and sign of each variable’s coefficient. In such a case, positively correlated variables represented those stoichiometric balances that would increase co-dependently with the response variable regardless of the actual magnitude of the variable. Lastly, since stoichiometric balances could exist either as a positive value, representing greater consumption than the theoretical demand, or as a negative value, representing a lack of consumption compared to the theoretical demand, the sign of the stoichiometric balance was used to distinguish between nutrient-rich and nutrient-limited conditions. However, since the model-selected amino acid SBs were the weighted sums across all the batches, the sign of each amino acid SB was based on the process control from the training dataset. Accordingly, Table 2 shows the signs of all the amino acid SBs from the process control condition and provides the reference values for biomass and mAb composition for each amino acid.
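As a minimal sketch of how the sign of a stoichiometric balance separates the two metabolic states, the example below assumes that the balance is the measured consumption minus a theoretical demand built from biomass and mAb composition coefficients. That functional form, the function names, and all numeric values are illustrative assumptions, not the study's definition or data.

```python
def stoichiometric_balance(consumed, biomass_formed, mab_formed,
                           biomass_coeff, mab_coeff):
    """Consumption minus theoretical demand (assumed form, for illustration)."""
    demand = biomass_coeff * biomass_formed + mab_coeff * mab_formed
    return consumed - demand


def metabolic_state(balance):
    """Interpret the sign of a stoichiometric balance."""
    if balance > 0:
        return "nutrient-rich"      # consumed beyond the theoretical demand
    if balance < 0:
        return "nutrient-limited"   # consumed less than the theoretical demand
    return "balanced"


# Hypothetical values in arbitrary, consistent units.
sb_example = stoichiometric_balance(
    consumed=5.0, biomass_formed=10.0, mab_formed=2.0,
    biomass_coeff=0.25, mab_coeff=0.5,
)
state_example = metabolic_state(sb_example)
```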
Based on the selection factors for variable importance, three distinct experimental criteria emerged for SBs to validate the growth and production models. Criterion 1 consisted of positively correlated stoichiometric balances with a VIP greater than 1 and a positive stoichiometric balance sign, representing amino acids that are favorably consumed in excess of the theoretical demand. Criterion 2 consisted of positively correlated stoichiometric balances with a VIP greater than 1 and a negative stoichiometric balance sign, representing amino acids that are favorably consumed below the theoretical demand. Criterion 3 consisted of positively correlated stoichiometric balances with a VIP less than 0.5 and a negative stoichiometric balance sign, also representing amino acids that are favorably consumed below the theoretical demand but deemed unimportant by the model. For each Criterion, however, the sign of the stoichiometric balance was taken from the process control cultures of the training dataset rather than from the average across all the training batches, as the goal of the model was to improve cell growth and mAb production beyond the current benchmark (Table 1). In all cases, negatively correlated stoichiometric balances were disregarded, since removing nutrient components from an existing chemically defined medium poses a greater operational challenge than supplementing additional nutrients. Therefore, the scope of this study focused only on positively correlated stoichiometric balances.
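The three criteria above reduce to a small decision rule over the VIP, the sign of the model coefficient, and the sign of the process-control stoichiometric balance. The sketch below encodes that rule; the function and argument names are hypothetical.

```python
def assign_criterion(vip, coefficient, control_sb_sign):
    """Return 1, 2, or 3 for the matching Criterion, or None if none applies.

    vip             : variable importance to projection
    coefficient     : scaled model coefficient (only positive values are used)
    control_sb_sign : sign of the SB in the process control cultures (+1 or -1)
    """
    if coefficient <= 0:
        return None  # negatively correlated SBs are out of scope
    if vip > 1.0 and control_sb_sign > 0:
        return 1     # important, consumed beyond theoretical demand
    if vip > 1.0 and control_sb_sign < 0:
        return 2     # important, consumed below theoretical demand
    if vip < 0.5 and control_sb_sign < 0:
        return 3     # negative control: unimportant, below theoretical demand
    return None      # e.g. VIP between 0.5 and 1
```

For example, `assign_criterion(1.3, 0.2, -1)` returns `2`, whereas a variable with a VIP of 0.7 is not assigned to any Criterion.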
For both the growth model and production model, scaled positive coefficients from OPLS were plotted for each time-dependent stoichiometric balance grouped by amino acid (Fig. 2). For each plot, the variables displayed were those that met the VIP factor for each Criterion, and the variables highlighted were those that also met the stoichiometric balance sign factor for each Criterion (Growth Model: Orange Bars; Production Model: Green Bars). For each model and Criterion, amino acid cocktail feeds were developed for the corresponding day based on the highlighted amino acid SBs. Interestingly, most variables for both the growth and the production models for Criterion 1 had positive stoichiometric balance signs (Fig. 2a and Fig. 2b). This was representative of the high nutrient feed conditions, since amino acids can routinely be saturated within the extracellular environment. Nevertheless, Criterion 1 was designed to measure the effectiveness of providing an increased concentration of amino acids already being consumed beyond the theoretical demand. For Criterion 2, only a few variables were highlighted as important based on negative stoichiometric balance signs. For instance, only alanine (days 1, 2, 4, 5, 6, and 7), cystine (days 3, 4, 6, 7, and 8), and glycine (days 1, 2, 3, and 7) were identified as important for the growth model (Fig. 2c), whereas the same three amino acids together with lysine (day 9) and methionine (day 9) were identified as important for the production model (Fig. 2d). The high overlap of amino acids identified between the production model and the cell growth model supports the notion that a greater total number of cells would produce more antibody. Criterion 3, on the other hand, was primarily designed to measure the heuristic property of the VIP value and thus served as a negative control.
Only those stoichiometric balances that contributed minimally to the predictive power of the model were selected. The amino acids selected for Criterion 3 were similar to those for Criterion 2 but were found important on different days, providing additional justification that stoichiometric balances can also help highlight when the cells have a specific nutrient demand (Fig. 3e and Fig. 3f).
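The step from highlighted amino acid SBs to day-specific cocktail feeds can be sketched by regrouping the selected variables by culture day. The example uses the Criterion 2 selections reported above for the growth model; the function name and data layout are hypothetical, and feed concentrations are not addressed.

```python
from collections import defaultdict

# Days on which each amino acid SB was highlighted for Criterion 2
# of the growth model, as listed in the text.
criterion2_growth = {
    "alanine": [1, 2, 4, 5, 6, 7],
    "cystine": [3, 4, 6, 7, 8],
    "glycine": [1, 2, 3, 7],
}


def feed_schedule(selected_days):
    """Regroup {amino_acid: [days]} into {day: [amino_acids]}, ordered by day."""
    schedule = defaultdict(list)
    for amino_acid, days in selected_days.items():
        for day in days:
            schedule[day].append(amino_acid)
    return {day: sorted(acids) for day, acids in sorted(schedule.items())}


schedule = feed_schedule(criterion2_growth)
for day, acids in schedule.items():
    print(f"day {day}: {', '.join(acids)}")
```

Grouping by day makes it explicit that the composition of the cocktail changes over the culture, for instance a three-component feed on day 7 but alanine alone on day 5.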