TERM DEFINITION
Artificial neural network (ANN) Technique that is built up of a network of interconnected nodes (neurons) that process signals over weighted arcs. Central to deep learning, with a wide range of complex variations
Convolutional neural network (CNN) Class of neural networks that is characterized by convolution filters that slide over input data to extract relevant patterns. Frequently used in deep learning-based image analysis
Decision tree Among the most popular ML algorithms; learns to split data on conditions of variables to classify or predict an outcome variable, creating a hierarchical tree shape with predictions in leaf nodes
Generative adversarial network (GAN) Class of deep neural networks for the generation of new data samples. GANs have formed a rapidly advancing field since their introduction in 2014, used in applications such as deepfakes
Gradient boosting Machine learning model type that uses an ensemble of weak prediction models (often decision trees), optimized over a differentiable loss function. XGBoost and LightGBM are popular algorithms in this family
k-means Unsupervised clustering method that aims to partition observations into k clusters, where each observation is mapped to the closest cluster centroid
LightGBM Popular ML algorithm of relatively recent origin (2016), with performance similar to XGBoost but more efficient training due to an improved decision tree splitting strategy
Natural language processing (NLP) The discipline in AI concerned with the understanding of written and spoken human language
Overfitting Situation in which a model captures the training data too closely, thereby hindering generalization and prediction on future data
Principal component analysis (PCA) Dimensionality reduction technique that uses a linear transformation to map data to a lower dimension than the initial data
Random forest Popular ML algorithm that builds an ensemble of decision trees, improving on the performance and generalizability of single decision trees
Support vector machine (SVM) Supervised model that aims to find the hyperplane that best separates different categories of observations
Tabular data Data that is organized in a table with rows and columns
Transfer learning Technique to improve model learning by leveraging knowledge gained on a related problem. Often used for recalibrating large-scale pre-trained deep learning models
Unstructured data Data that has an internal structure but one that is not represented in a row-column table, such as images, text and audio
XGBoost Popular ML algorithm that uses gradient boosting and builds decision trees iteratively, often delivering best-in-class performance and fast model training
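
To make the k-means entry in the table concrete, the following is a minimal sketch of the idea in plain Python: observations are repeatedly mapped to their closest centroid, and each centroid is then moved to the mean of its observations. The function names (assign, update, kmeans), the toy points, and the choice of the first k observations as starting centroids are assumptions made for illustration only; practical implementations typically use more careful initialization.

```python
import math


def assign(points, centroids):
    """Map each observation to the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the observations assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(coord) / len(members) for coord in zip(*members))
        )
    return new_centroids


def kmeans(points, k, max_iter=100):
    """Partition points into k clusters; returns (labels, centroids)."""
    # Assumption: the first k observations serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids


# Toy data: two visually separate groups of 2-D observations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy data the first three observations end up in one cluster and the last three in the other, which is the partitioning behaviour the definition describes.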