Models for Understanding versus Models for Prediction
by Gilbert Saporta
Presented by Antoine Tordeux, Forschungszentrum Jülich, Germany, 15.12.2016
Models for Understanding versus Models for Prediction
COMPSTAT 2008, pp. 315-322
Opposition between two modelling approaches in statistics (and elsewhere):
1. Models to understand: parsimonious representations of the data, built to identify the underlying mechanisms which have produced them.
2. Models to predict: intentionally complex models (many degrees of freedom) that are assessed by their performance in predicting new observations.
Author: Gilbert Saporta, university professor emeritus at the CNAM
Research field: statistics (renowned statistician in France)
Models: Understand or predict? Introduction, Slide 2
Content
Models for understanding
Models for prediction
Applications
Models for understanding
Models for understanding: identification of the underlying mechanisms
Insights into the nature of the phenomenon of interest
Few parameters, which should be interpretable
Parsimony principle
Occam's razor, attributed to William of Ockham (1287-1347): "Among competing hypotheses, the one with the fewest assumptions should be selected"
and also:
Ptolemy (90-168): "We consider it a good principle to explain the phenomena by the simplest hypothesis possible"
Isaac Newton (1642-1727): "We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances"
Albert Einstein (1879-1955): "Everything should be made as simple as possible, but not simpler"
...
Models for understanding: examples
Model f: y = f(x; θ) + ε
y: variable to explain/predict (dependent variable, regressand, output variable, ...)
x: explanatory variables (independent variables, regressors, input variables, ...)
θ: parameters of the model, to calibrate and interpret
ε: unexplained part, noise with amplitude σ
Parameter calibration: least squares, likelihood-based, Bayesian methods + confidence intervals
Model choice: information criteria (Akaike, AIC; Bayesian, BIC) + tests
Regression models: linear regression, principal component regression, partial least squares (PLS regression), ...
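As a minimal sketch of this framework, the following fits a simple linear model y = a + b·x + ε by least squares on synthetic data and computes AIC and BIC from the Gaussian log-likelihood (the data and the two-parameter model are illustrative assumptions, not taken from the paper):

```python
import math
import random

random.seed(0)

# Synthetic data: y = 2 + 3x + Gaussian noise, so theta = (a, b) = (2, 3)
n = 200
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1.0) for x in xs]

# Least-squares estimates for y = a + b*x + eps
xbar = sum(xs) / n
ybar = sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
    / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

# Maximum-likelihood noise variance from the residuals
rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sigma2 = rss / n

# Gaussian log-likelihood, then the information criteria:
#   AIC = -2 ln L_n + 2k,  BIC = -2 ln L_n + ln(n) k,  k = 3 parameters (a, b, sigma)
loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
k = 3
aic = -2 * loglik + 2 * k
bic = -2 * loglik + math.log(n) * k

print(f"a={a:.2f} b={b:.2f} AIC={aic:.1f} BIC={bic:.1f}")
```

Since ln(200) > 2, the BIC penalty is heavier than the AIC one, which is why BIC tends to favour the more parsimonious model.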
Models for understanding: limits
Difficulties with big data (large observation number n)
Concentration of the likelihood: the information criteria
AIC = -2 ln(L_n(θ̂)) + 2k  and  BIC = -2 ln(L_n(θ̂)) + ln(n) k
tend to select models with a minimal number of parameters
Everything becomes significant: CI = [μ̂ ± q σ̂/√n] → {μ̂} as n grows; even a correlation of 0.01 is significant, ...
Difficulties with nonlinear/nonmonotonic relationships (complex phenomena)
Correlation captures only linear relationships / least squares is optimal for linear models
Nonlinear transformations must be determined beforehand (a task that can be difficult)
George Box (1919-2013): "Essentially, all models are wrong, but some are useful"
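The "everything is significant" point can be checked with the standard t-statistic for a sample correlation r (a small numerical sketch; the sample sizes are illustrative):

```python
import math

# t-statistic for testing H0: rho = 0 given sample correlation r and size n:
#   t = r * sqrt(n - 2) / sqrt(1 - r^2)
def corr_t_stat(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# A negligible correlation of 0.01 becomes "significant" at the 5% level
# (|t| > 1.96) once n is large enough:
print(corr_t_stat(0.01, 10**6))  # ~10.0: highly significant
print(corr_t_stat(0.01, 10**3))  # ~0.32: not significant
```

With a million observations the test rejects the null hypothesis decisively, even though a correlation of 0.01 explains essentially nothing.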
Models for understanding: illustration
Models for prediction
Data mining: KDD, big data (Gregory Piatetsky-Shapiro, 1980)
"A model is merely an algorithm coming more from the data than from a theory", thanks to the high computational power of modern computers.
The focus is no longer the accurate estimation of parameters or the adequacy to past observations, but the predictive ability, i.e. the capacity to make good predictions.
Black-box models (Vladimir Vapnik, 2006)
Same formulation y = f(x; θ) + ε, but here f is a very complex (nonlinear) function, in general with no explicit definition, and the dimensions of x and θ are high.
AI: automatic formulation and calibration of the model (automatic Bayesian approaches, unsupervised classification, machine learning, ...)
Models for prediction: neural networks (e.g. multilayer perceptrons), support vector machines, genetic algorithms, decision trees, ...
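To make "an algorithm coming more from the data than from a theory" concrete, here is a minimal decision stump, the depth-1 special case of the decision trees listed above, fitted on a tiny synthetic dataset (a sketch, not any of the cited systems):

```python
# Minimal decision stump: choose the threshold t on x that minimises
# the misclassification count on the training data (depth-1 decision tree).
def fit_stump(xs, ys):
    best = None
    for t in sorted(set(xs)):
        # rule: predict class 1 if x >= t, else class 0
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

# Synthetic labels: class 1 iff x >= 5, so the stump should recover t = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
t = fit_stump(xs, ys)
print(t)  # 5
```

Nothing about the split comes from a theory of the phenomenon: the rule is read off the data, which is exactly the predictive-modelling stance described above.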
Can we open the black box of AI? Davide Castelvecchi, Nature 538, 20-23, 2016 (illustration by Simon Prades)
Models for prediction: theory
Risk minimization
L is a loss function; the risk R = E(L) is the expectation of the loss
Empirical risk: R_emp = (1/n) Σ_i L(y_i, f(x_i; θ))
Vapnik's inequality: with probability 1 - α,
R ≤ R_emp + √( [ h (ln(2n/h) + 1) - ln(α/4) ] / n )
with h the Vapnik-Chervonenkis dimension (i.e. the cardinality of the largest set of points that the algorithm can shatter), a measure of the model's capacity
No distributional assumptions are necessary (only h ≪ n)
Selection of the model with the minimal bound for R: the ratio h/n is of interest
The complexity and prediction ability h can increase as n increases
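A small numerical sketch of Vapnik's bound shows why only the ratio h/n matters; the values of R_emp, h, n and α below are illustrative assumptions:

```python
import math

# Right-hand side of Vapnik's bound, holding with probability 1 - alpha:
#   R <= R_emp + sqrt( (h * (ln(2n/h) + 1) - ln(alpha/4)) / n )
# where h is the Vapnik-Chervonenkis dimension.
def vc_bound(r_emp, h, n, alpha=0.05):
    return r_emp + math.sqrt((h * (math.log(2 * n / h) + 1)
                              - math.log(alpha / 4)) / n)

# The bound is informative only when h << n: compare the confidence term
# for a simple and a complex model at the same n and same empirical risk.
print(vc_bound(0.10, h=10, n=10**5))     # close to R_emp
print(vc_bound(0.10, h=10**4, n=10**5))  # much looser
```

With h = 10 the guaranteed risk stays near the empirical risk; with h = 10^4 at the same n, the bound is dominated by the capacity term, so a lower R_emp alone does not justify choosing the complex model.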
Models for prediction: practice
Empirical model choice
The VC dimension is difficult to evaluate in practice
Empirical approach: trade-off between the fit and the robustness of a model
Repeat in a K-bootstrap loop:
S_k is the k-th bootstrap sample; partition S_k into two sub-samples S_k^1 and S_k^2
S_k^1: training set used to fit the models
S_k^2: validation set used to estimate the prediction error E_k (cross-validation)
Select the model with the minimal mean prediction error E = (1/K) Σ_k E_k
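The loop above can be sketched as follows, comparing a constant model with a line through the origin on synthetic linear data (the data, the two candidate models and the 50/50 split are illustrative assumptions):

```python
import random

random.seed(1)

# Synthetic data with a genuine linear trend: y = 3x + noise
data = [(x, 3.0 * x + random.gauss(0, 1.0)) for x in [i / 10 for i in range(100)]]

def fit_mean(train):                  # candidate 1: constant prediction
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def fit_linear(train):                # candidate 2: least-squares line through origin
    b = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x, b=b: b * x

def bootstrap_error(fit, K=50):
    errs = []
    for _ in range(K):
        S = [random.choice(data) for _ in data]      # bootstrap sample S_k
        S1, S2 = S[: len(S) // 2], S[len(S) // 2:]   # training / validation split
        f = fit(S1)                                  # fit on S_k^1
        errs.append(sum((y - f(x)) ** 2 for x, y in S2) / len(S2))  # E_k on S_k^2
    return sum(errs) / K                             # mean prediction error E

e_mean, e_lin = bootstrap_error(fit_mean), bootstrap_error(fit_linear)
print(e_mean, e_lin)  # the linear model has the smaller mean prediction error
```

The model with minimal E is selected; here that is the linear model, since the constant model cannot track the trend on the held-out sub-samples.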
Models for prediction : Illustration
Models to predict: applications
Big data (KDD, data mining, ...)
Video analysis (perception, facial identification, ...)
Games (the game of Go, ...)
Modelling of the brain
Reliability (decision trees, ...)
Robotics and autonomous vehicles
Web (optimization, social networks, ...)
Banking/insurance (marketing, customer relations, risk assessment, fraud detection, ...)
...
Applications: autonomous vehicles
Driving situations are very varied / the driving process is poorly structured (F. Saad, 1987)
Defining an understandable model giving a satisfying response in every situation is not possible (especially in urban/dense situations or for mixed flows)
Autonomous driving is a typical application field for models for prediction
Motion planning of autonomous vehicles by machine learning has been actively developed since the 1990s (and is currently extensively developed)
Neural networks, genetic algorithms, simulated annealing, ...
Projects: Eureka (1985), Cybercar (1997), DARPA Challenges (2004-07), Google Car (since 2010), Tesla (since 2014), PROUD (2015), DELPHI (2016), ...
Autonomous vehicles: example (1)
Pioneering work (autonomous steering, Stanford University, 1992)
Neural-network learning based on video analysis
Experiment: 2 min of learning (120 observations); autonomous steering on curved roads
Autonomous vehicles: example (2)
Recent work (End-to-End Deep Learning for Self-Driving Cars, Bojarski et al., 2016)
Convolutional neural network (CNN) learning based on video analysis
DAVE-2 project (DARPA Challenge)
Neural network: 27 million connections and 250,000 parameters!
Training phase / CNN architecture