This is part 35 of a series of articles featuring the book Beyond Connecting the Dots: Modeling for Meaningful Results.
The three distinctions just presented (deterministic vs. stochastic, mechanistic vs. statistical, aggregated vs. disaggregated) can be used to classify models. We can even use them to classify the models we have discussed in this interactive learning environment (ILE). Most of our models would be classified as deterministic (random chance is generally not explicitly incorporated in these models), mechanistic (we generally assume mechanisms rather than estimating dependencies from data), and highly aggregated (the agent-based models are an exception).
There are many nuances to these broad distinctions (e.g., the type of statistical techniques used for a statistical model). Many other distinctions can be made between model implementations, such as the programming language or software used to implement the model. These distinctions and technical choices are important when constructing a model. However, what is of key importance is the utility of the model for fulfilling a specific goal.
Technical details matter (they can affect maintainability and other factors), but they are secondary to the adequacy of a model in fulfilling its main purpose. It would make as little sense to say a model was fundamentally bad because it was written in a relatively ancient programming language such as Fortran as it would to say a model was fundamentally bad because it was, for instance, deterministic. Let’s look back at Box’s quote at the beginning of this chapter. We know all models are wrong; what we should really care about is their utility in meeting a specific task.
So rather than using the aforementioned technical classifications to discuss models, we think it is more useful to base our discussions of models on the model’s driving purpose. This allows us to leave behind relatively mundane technical and implementation details and focus on what we really care about. The many different reasons for building models all boil down to the three broad purposes displayed in Figure 1: prediction, inference, and narrative.
Prediction: Models used for prediction are the most straightforward. They attempt to forecast an outcome given information about variables that may influence that outcome. A weather forecast is an example of a model used for prediction. Likewise, when you apply for a credit card, the bank runs a predictive model to determine the risk that you will default. Life insurance companies use a model that predicts how long an applicant is expected to live, and the results determine the premium charged. All these models take in data (the current temperature for the weather forecast, the amount of money in your bank account for your risk of default, your age for the life insurance application) and apply various forms of analysis to generate a prediction of the outcome.
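To make the idea of "data in, prediction out" concrete, here is a minimal sketch of a predictive model in Python. It fits a straight line to a handful of observations and uses it to forecast a new value. The temperature readings are made up purely for illustration, and the function names `fit_line` and `predict` are our own; real forecasting models are, of course, far more elaborate.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Use a fitted (slope, intercept) pair to forecast the outcome for x."""
    slope, intercept = model
    return intercept + slope * x


# Made-up illustrative data (not real measurements):
# today's temperature and the temperature observed the following day.
today_temp = [10.0, 12.0, 14.0, 16.0, 18.0]
next_day_temp = [11.0, 14.0, 14.0, 17.0, 19.0]

model = fit_line(today_temp, next_day_temp)
forecast = predict(model, 20.0)  # forecast for a day that reads 20 degrees
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, forecast={forecast:.1f}")
```

Note that the sketch makes no claim about why tomorrow's temperature follows today's. For a model built for prediction, what matters is only how good the forecast is.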
Inference: Models used for inference are most common in academic research. Often, academic research questions distill down to this simple template: “Does X affect Y?” These are inferential questions [1]. As an example, a researcher may state a hypothesis such as, “The wealthier a high-school student’s family, the higher the student’s test scores will be.” The researcher may then build a model to test the validity of this hypothesis. The model’s results will generally be phrased in terms of a p-value indicating the statistical significance of the evidence in support of the hypothesis.
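As a rough sketch of how such a p-value might be obtained, the Python code below runs a one-sided permutation test on the wealth and test-score example. This is only one of many possible techniques, chosen here because it needs nothing beyond the standard library. The income and score figures are invented for illustration, and the function names are our own.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """One-sided permutation test of 'higher x goes with higher y'.

    Shuffling ys breaks any link with xs, so the shuffled correlations
    show what chance alone would produce.
    """
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point round-off.
        if pearson_r(xs, shuffled) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented illustrative data (not from any real study):
# family income in thousands and the student's test score.
family_income = [20, 35, 50, 65, 80, 95, 110, 125]
test_score = [61, 64, 70, 68, 75, 79, 83, 88]

observed_r = pearson_r(family_income, test_score)
p_value = permutation_p_value(family_income, test_score)
print(f"r={observed_r:.3f}, p={p_value:.4f}")
```

A small p-value says that a correlation this strong would rarely arise by chance if income and scores were unrelated. It does not, on its own, show that wealth causes higher scores.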
Narrative: Models are often used to tell a persuasive story. When the Obama administration wanted to persuade lawmakers and the public to support its economic stimulus, it famously published the graph shown in Figure 2. A great deal of complex modeling and mathematics surely went into constructing this figure. However, its core purpose was to tell the nation a story: Things are going to be bad, but the recovery plan will make them less so. Such stories are at the heart of narrative models. We will return to this figure later and discuss why it is not really a predictive model despite generating predictions.
All models can be classified in terms of these three primary purposes. We will see how useful it is to discuss modeling projects in this manner [2].
Classify each of these modeling tasks as primarily prediction, inference, or narrative tasks:
Next edition: Models and Truth: The Strange Case of Inference.
Header image source: Beyond Connecting the Dots.
1. Predictions are also inferential results, but we prefer to discuss prediction separately from the hypothesis-testing kind of inference. This distinction makes our understanding of modeling clearer.
2. And we strongly recommend doing so. It is important to define the purpose clearly at the start of a project, because the techniques used and the data required depend significantly on the model’s overall purpose. In particular, decide at the outset whether your primary goal is to use the model for prediction or for narrative. Many modeling projects attempt to do both, only to end up with a model that does neither.