Three Types of Precision and Their Usefulness in Forecasting
This article is a section from my upcoming book "An Introduction to Probabilistic Planning and Forecasting", edited for the current context. Readers are invited to comment here. Any comments provided may be used to improve the text in the book; where used, credit will be given with permission.
In a previous LinkedIn article, the difference between accuracy and numerical precision was explained as it applies to time-series forecasting. That article is a prerequisite for a proper understanding of the current one.
In the course of the previous sections [in the above-mentioned article and this one], a few examples of precision were given. Here we explore the different appearances of precision more systematically. Precision measures a level of specificity. When a forecast is expressed as a series of exact numbers, it is numerically extremely precise. However, if that same forecast is provided at some level of aggregation, say by product family or sales territory, it is granularly less precise than a forecast provided by SKU and location. Such a forecast is precise in one way, but imprecise in another. Precision can assert itself in three different ways:
- Arithmetic precision - number of significant digits for a value
- Stochastic precision - probability distribution of possible values
- Granularity - grouping or level of aggregation of values
For all three of these the same golden rule applies:
You want to be as precise as possible, but not more precise than the data allows.
A second rule holds in general:
As you increase precision, it becomes more difficult to be accurate.
This second rule often constrains how precise we can be if we want to avoid unwanted erosion of accuracy. Combined with the first rule, one consequence is that every forecast has a level of precision beyond which it is impossible to be accurate at all. This level is typically dictated by the precision of the data, but may be coarser depending on other characteristics of the data, such as sample size or level of noise. The deterioration described by the second rule starts well before reaching the tipping point expressed by the golden rule. We will now explore how these rules apply to each of the three types of precision.
Arithmetic precision
Arithmetic precision is determined by the number of significant digits of a value. This type of precision is found in all applied sciences and is a key difference with theoretical mathematics. If you were to ask a theoretical mathematician what the difference is between 3.1415926535 and 2.1, the answer would be 1.0415926535. If you were to ask the same question of an applied mathematician, the answer would instead be 1.0. The reason is that one of the two differenced numbers has only two significant digits, and hence, in applied science, the result may not have more than two significant digits either. Any additional digits are considered a pure guess, or in other words likely completely inaccurate. The underlying reason is that the least significant digit is where the limit of measurement precision lies: it could have been rounded up or down from the next digit. For example, the 2.1 could have been rounded from a true value as low as 2.05, or as high as 2.15. Because in all likelihood there will already be an error in the third digit, it is deceptive to provide a value for it, and folly to provide values for the fourth and further digits. A forecast of 1 thousand units will be accurate if the actual value is between 500 and 1500 units, whilst a forecast of 1.0 thousand units would only be accurate between 950 and 1050 units, and a forecast of 1.00 thousand units only between 995 and 1005 units. Each of these forecasts is arithmetically more precise than the previous one, but harder to make accurate.
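To make the significant-digit rule concrete, here is a minimal sketch in Python. The helper `round_sig` is illustrative, not a standard library function, and the printed exact value may show small floating-point representation artifacts:

```python
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

a = 3.1415926535   # eleven significant digits
b = 2.1            # only two significant digits

exact = a - b                  # the theoretical mathematician's answer
applied = round_sig(exact, 2)  # limited by the least precise input
print(exact)    # ~1.0415926535
print(applied)  # 1.0
```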
Most forecasts created in spreadsheets and commercial software alike are oblivious to this kind of precision and to the (theoretically) correct way to measure accuracy for them. Practically, this is fine, since this type of precision has a very limited impact in the time-series forecasting domain; the impact of the other types of precision dwarfs it. Also, historical data generally has high arithmetic precision, which permits the same high precision in any output. There is, however, one common mistake in forecast arithmetic precision. Often demand is recorded in integer units (e.g. whole cases or whole packs), but a forecast is then provided in fractions. This is typically criticized for being unrealistic, since partial units would never be sold as such. The less obvious mistake is that the output has higher precision than the input data, as the sketch below illustrates. Such output is hence neither useful nor reliable.
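A hypothetical example of this mistake, assuming an integer demand history in whole cases; the moving average is only a stand-in for any statistical forecasting method:

```python
# Demand history recorded in whole cases (integer units).
history = [12, 9, 14, 11, 10, 13]

# A simple moving-average forecast produces a fractional value...
raw_forecast = sum(history) / len(history)   # 11.5 cases

# ...which is arithmetically more precise than the input data allows.
# Rounding to the input's precision keeps the golden rule intact.
forecast = round(raw_forecast)               # 12 cases
print(raw_forecast, forecast)
```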
Stochastic precision
The other type of numerical precision is stochastic precision. Unlike arithmetic precision, this type has a large impact on the reliability of forecasts. It expresses the amount of uncertainty in a predicted or measured value. It is often provided as a probability distribution, through some metric of dispersion such as the standard deviation, or via confidence levels. It is this type that is most often confused with accuracy and included in metrics that conflate accuracy with precision. To illustrate why a stochastic precision metric is still required, consider the following scenario. Assume we have seasonal demand with variability identically normally distributed around the seasonally changing mean. Now let's assume we need to compare two forecasts for this demand. One forecast incorporates the seasonality and looks like the graph in figure 6.19 below:
Figure 6.19: a seasonal forecast (blue) with 95% confidence levels (red)
The second forecast ignores seasonality and looks like figure 6.20:
Figure 6.20: a non-seasonal forecast (blue) with 95% confidence levels (red)
When measured across all 52 weekly periods of the year, both may achieve 100% accuracy *), but it should be clear that the first forecast is much more useful. The difference in the quality of these forecasts is in their precision: the second forecast is 68% less precise than the first. Note that we can improve our forecast from the second case to the first without losing any accuracy or introducing bias. So, while a useful forecast is one that maximizes accuracy at the expense of precision, this does not imply that we should forgo precision. It means we need to prioritize accuracy, then maximize precision in a way that does not negatively impact accuracy.
*) Note that traditional metrics that blend accuracy and precision are often called accuracy metrics. Using such metrics, a 100% blended accuracy is impossible, and neither forecast would be able to achieve it. Using such metrics, the forecast of figure 6.19 would erroneously be considered more accurate than the one in figure 6.20, but that is actually precision at work. A downside of isolating pure accuracy is exhibited here: it is oblivious to this difference. Hence precision, an orthogonal quality, must also be measured.
The forecast in figure 6.20 is easier to generate than the forecast in figure 6.19, since it does not require extracting the seasonal signal from the historical data. The increase in precision in figure 6.19 is worth the effort because it does not diminish accuracy. It may be possible to increase precision further, but at some point any extra increase will come at the expense of accuracy. At that point, the confidence in our forecast starts to exceed the quality of that forecast. In the extreme, plans and decisions are made using only the middle forecast value, with complete disregard for the uncertainty around it. This is by far the most common and most detrimental mistake made in planning, and the one leading to the oft-heard lament that forecasts are always wrong. It is generally accepted that the residual error dispersion of statistical forecasts has a strong downward bias. Using it in plans means underestimating uncertainty, but the pervasive habit of ignoring it altogether is much worse still. Both are violations of the golden rule not to be more precise than the data allows. Most chaos and firefighting in supply chains is caused by the lack of accuracy due to this ignorance of stochastic precision, both in the forecast itself and throughout all the downstream planning processes.
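The trade-off between coverage (accuracy) and interval width (stochastic precision) can be simulated. The sketch below mirrors the scenario of figures 6.19 and 6.20 under assumed, purely illustrative parameters (a sinusoidal mean and normal noise); the exact numbers, including the 68% figure quoted above, depend on the chosen amplitudes:

```python
import numpy as np

rng = np.random.default_rng(42)
weeks = np.arange(52)

# Assumed demand: sinusoidal seasonal mean with identical normal noise.
mean = 1000 + 300 * np.sin(2 * np.pi * weeks / 52)
sigma = 50
actuals = rng.normal(mean, sigma)

z = 1.96  # 95% confidence

# Forecast A (cf. fig. 6.19): follows the seasonal mean, narrow intervals.
lower_a, upper_a = mean - z * sigma, mean + z * sigma

# Forecast B (cf. fig. 6.20): flat mean, with intervals widened so that
# the ignored seasonality still falls inside them.
flat = np.full(52, mean.mean())
spread = np.sqrt(sigma**2 + mean.var())   # noise plus ignored seasonal variance
lower_b, upper_b = flat - z * spread, flat + z * spread

for name, lo, hi in [("seasonal", lower_a, upper_a), ("flat", lower_b, upper_b)]:
    coverage = np.mean((actuals >= lo) & (actuals <= hi))  # accuracy
    width = np.mean(hi - lo)                               # imprecision
    print(f"{name:8s} coverage={coverage:.0%}  mean interval width={width:.0f}")
```

Both forecasts achieve roughly the intended 95% coverage, but the flat forecast needs intervals several times wider to do so: equal accuracy, much lower precision.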
Granularity
The third type of precision is granularity. It determines how fine or coarse the detail is at which a forecast or plan is created. Granularity in forecasting is measured along four dimensions:
- Product
- Source location
- Destination location
- Time
Granularity in planning processes can include additional dimensions, each of which occurs either in full detail or at some level of grouping. Examples of these dimensions are:
- Resource
- Process
The most detailed granularity in the product dimension is typically the stock-keeping unit (SKU). In various industries, it is common to forecast and plan on groupings of such SKUs. In fashion, for example, garments are generally forecasted per style. Each style will have multiple sizes, and sometimes multiple colors; each unique combination of size, color and style is an SKU. For longer-term plans, products may be grouped even more coarsely into product types or product families. As more SKUs are grouped together, the granularity becomes coarser and the precision becomes lower.
Similarly, in the source location dimension, the most detailed level may be the ship-from warehouse or distribution center. Depending on the planning problem, these may be grouped by manufacturing plant, supplier, country or business unit. Production scheduling does not care about finer detail than the summed requirements for the plant. Distribution planning, on the other hand, needs to know the exact warehouse where the product is required.
In the destination location dimension, the finest granularity may be the ship-to location, such as the customer's receiving warehouse or, in some scenarios, the store where the product is sold to the consumer. Planning processes may then group these by customer account, key account group, consumer segment, or sales territory.
The granularity in the time dimension includes not just the time period size, but also the number of such periods that are grouped together in plans. For example, a weekly forecast may be used to determine a 4-week requirement for manufacturing running on a roughly monthly production cycle. The time granularity of the forecast is 1 week, but the time granularity of the input data of the production schedule is 4 weeks. This grouping leads to a loss of precision but generally results in a gain of accuracy, as the sketch below illustrates. If the loss of precision is not material, then the increased accuracy is free. [In the discussion whether to use lags or accumulated forecasts, this is another consideration.] Common time granularities in planning are days, weeks, months, quarters and years. In detailed scheduling, time granularity can be much finer: minutes or even seconds.
Resource granularity can be by individual machine or truck, work center or 3PL, an anonymous pool of capacity, or a total across a plant or fleet. Process granularity can be per individual critical process, or a grouping of processes that always occur sequentially or that includes processes that are not constrained.
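The accuracy gain from time aggregation can be demonstrated with a small simulation. Everything here is assumed toy data, and MAPE is used only as a familiar stand-in for a proper accuracy metric:

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed toy demand: 52 weeks, stable mean with normal noise.
weekly_actuals = rng.normal(100, 20, size=52)
weekly_forecast = np.full(52, 100.0)   # flat forecast at 1-week granularity

# Group into 4-week buckets for a roughly monthly production cycle.
bucket_actuals = weekly_actuals.reshape(13, 4).sum(axis=1)
bucket_forecast = weekly_forecast.reshape(13, 4).sum(axis=1)

def mape(forecast, actuals):
    return np.mean(np.abs(forecast - actuals) / actuals)

print(f"1-week granularity MAPE: {mape(weekly_forecast, weekly_actuals):.1%}")
print(f"4-week granularity MAPE: {mape(bucket_forecast, bucket_actuals):.1%}")
# The coarser buckets are less precise but score better: independent
# weekly errors partially cancel within each 4-week bucket.
```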
For forecasts, the granularity is determined by the combination of the granularities of each of the first four dimensions. For plans, it is often a bit more complicated: across multiple bills of material and routing steps, each step can have a slightly different granularity, all within the same plan. For example, in a production schedule, critical resources may be scheduled individually, whilst noncritical resources may be scheduled as a pool of anonymous capacity. Capacity planning may need a forecast granularity of product family/plant/month, where the resource, destination location, and process dimensions are irrelevant. Master production scheduling may need a granularity of SKU/plant/week, which is finer, but also not in need of detail in the other dimensions. Promotion planning may need a granularity of SKU/account/week, caring about the destination but not the source warehouse or plant. If a single forecast needs to provide the input for all of these, it needs a granularity of at least SKU/warehouse/account/week, but possibly SKU/warehouse/account/day if a sufficiently accurate direct conversion from weeks to months is not achievable. The sketch below shows how such a single detailed forecast can serve each process through aggregation.
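A sketch of that idea, with hypothetical column names and toy numbers; each planning process consumes the same detailed forecast at its own, coarser granularity:

```python
import pandas as pd

# A detailed forecast at SKU/warehouse/account/week granularity (toy data).
forecast = pd.DataFrame({
    "sku":       ["A1", "A1", "B2", "B2"],
    "warehouse": ["W1", "W1", "W1", "W2"],
    "account":   ["K01", "K02", "K01", "K02"],
    "week":      [1, 1, 1, 1],
    "family":    ["A", "A", "B", "B"],
    "plant":     ["P1", "P1", "P1", "P2"],
    "qty":       [120, 80, 60, 40],
})

# Capacity planning: product family / plant / month (weeks 1-4 -> month 1).
forecast["month"] = (forecast["week"] - 1) // 4 + 1
capacity_view = forecast.groupby(["family", "plant", "month"])["qty"].sum()

# Master production scheduling: SKU / plant / week.
mps_view = forecast.groupby(["sku", "plant", "week"])["qty"].sum()

# Promotion planning: SKU / account / week.
promo_view = forecast.groupby(["sku", "account", "week"])["qty"].sum()

print(capacity_view, mps_view, promo_view, sep="\n\n")
```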
Figure 6.21: the second rule: as granularity increases, it gets harder to be accurate
Granularity is often dictated by business needs and capabilities. This is one area where the trade-off between accuracy and precision is commonly understood, albeit not necessarily in these exact terms. Before a forecasting solution is implemented, a decision on the granularity has typically already been made. Stochastic precision is then a result of the capabilities of the tools used to generate a forecast or plan at the chosen granularity. Both have a similar impact on the ease of obtaining accuracy. As an example of the second general rule, consider a commonly encountered situation. A statistical forecast is supplemented with great effort by human collaborators to achieve a 60% accuracy at a monthly/factory/SKU level, measured using some traditional combination metric such as MAPE. Upper management then dictates that it be adjusted to match its quarterly revenue forecast by division and continent, based on that forecast's historical accuracy of 99%. Every single time this occurs, it causes mayhem and loss of accuracy at the finer detail level. A few things go wrong here, but an underlying misunderstanding in the decision process is that a 99% combination accuracy at the coarse granularity would be better than a 60% accuracy at the fine granularity. It may be, but usually it is not. The 99% accuracy occurs at a level that lacks all the detail required to operationalize it. The 60% accuracy may seem bad, but forcing a 99% top-level accuracy down to the same operational level tends to lead to much lower accuracy still, often as low as the 10%-20% range. This is the second general precision rule in action: the extreme imprecision at the top level leads to misleadingly high accuracy, and to an extreme loss of that accuracy when pushed down to usable precision. The application of the second precision rule to granularity is illustrated in figure 6.21 with Rubik's cubes of various granularities. Evidently, the more granular they get, the harder they become to solve. But is solving the complete 2x2x2 cube really more impressive than solving just one side of the 7x7x7 cube?
Note: one part of the confusion is that the top-level accuracy is plan accuracy, whilst the low-level accuracy is forecast accuracy. The first is what you intend to do, the second is what you expect to happen under currently known conditions. In other words, the first is practically a self-fulfilling prophecy. Rather than forcing it down onto the lower-level forecast, it should be considered a target, and any gaps should become topics of planning decisions. The combination of an overly ambitious pursuit of a single version of the truth with a misunderstanding of the value of accuracy at different precisions is the cause of the problem in these cases.
Another common mistake is the opposite one: providing too much precision, a violation of the golden rule. A full understanding of granularity as a measure of precision helps identify when precision is finer than the data allows. For example, if historical demand data is provided as monthly totals, then using it to calculate a weekly or daily forecast or inventory level is dangerous. This may be the most common mistake in forecasting and planning, made by businesses, by consultants, and even built into commercial software systems. As in the arithmetic precision example, the resulting forecast is nothing more than a wild guess. Making decisions at the detail level based on groupings or aggregations of data in the other dimensions is similarly fraught with danger. Whilst it is well known that accuracy is often lower at detail than at grouped levels, most practitioners are unaware that providing a high-accuracy grouped forecast leads to much lower accuracy after splitting to a detail level than if the forecast had been made there in the first place. In this case, the grouped forecast is input data to a planning process, which then attempts to translate it into a more detailed requirement. Again, the violation of the golden rule generally leads to horrendous results. The correct way to tackle this is to provide the forecast algorithm with detail data, let it generate a detail-level forecast, and let the planning process consume that to make its own detail decisions, as the simulation below suggests. The forecast accuracy metric may look less flattering, but the results for the business will be much better.
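A simulation sketch of this effect under assumed toy conditions: a near-perfect aggregate forecast split down by a drifted historical mix, versus a mediocre forecast made directly at detail level. All noise levels are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n_skus = 50

# True detail-level demand: stable per-SKU means plus noise.
means = rng.uniform(50, 500, n_skus)
actuals = rng.normal(means, 0.2 * means)

# Detail forecast: imperfect, but made at the level where it is used.
detail_forecast = rng.normal(means, 0.1 * means)

# Top-down forecast: a near-perfect total, split by a historical mix
# that has drifted relative to the current mix.
total_forecast = actuals.sum() * 1.01                # 99%-accurate aggregate
stale_mix = np.clip(rng.normal(means, 0.3 * means), 1, None)
split_forecast = total_forecast * stale_mix / stale_mix.sum()

def mape(forecast, actuals):
    return np.mean(np.abs(forecast - actuals) / actuals)

print(f"direct detail forecast MAPE: {mape(detail_forecast, actuals):.0%}")
print(f"top-down split MAPE:         {mape(split_forecast, actuals):.0%}")
```

Directionally, the split forecast loses badly at detail despite its 99%-accurate total, because the split keys carry their own error.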
Putting it in order
In a properly designed planning and forecasting environment the order in which each of the above is known is as follows:
- Assess the needs of the business. For all planning and execution processes, determine the ideal and the minimum acceptable detail required for proper decision making. The resulting design determines granularity and arithmetic precision.
- Generate forecasts and plans. The stochastic precision is now known since this is determined both by the problem and the capabilities of the tools used.
- Execute and record actuals. Now accuracy can be assessed as well.
Getting precision right is predominantly done in the design phase; any mistakes here will require painful and costly reimplementation later. The golden rule is the most basic and critical piece of that. For each planning and execution process, determine the required precision of the outputs. This determines the required precision of the inputs of the same process. Map all outputs to all other processes consuming them as inputs. If there are conflicts, iterate through precision refinement cycles until no outputs break the golden rule with regard to their respective inputs, as sketched below. This then fixes both the required precision of the data and the required capabilities of any tool that may be used to process the inputs and generate the outputs. If a deterministic plan or forecast is created, precision is fully known at this time: stochastic precision will be 100%, a single point value with no expressed uncertainty. If a probabilistic plan or forecast is created, the stochastic precision will not be known until the testing phase, when data is being fed into the system and output can be measured. Any error in stochastic precision can be corrected during testing iterations, and such correction should continue for the lifetime of the process. If there is excessive inefficiency in the supply chain, increasing precision should be the focus. If there is instability in the supply chain, increasing accuracy should be the focus, which may, in turn, lead to lowering precision.
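A minimal sketch of such a refinement check, here for the time dimension only. All process names, required levels, and the ordering are hypothetical placeholders for a real design exercise:

```python
# Coarse -> fine ordering of time granularities (assumed for illustration).
GRANULARITY = ["year", "quarter", "month", "week", "day"]

def fine_enough(provided: str, required: str) -> bool:
    """True if the provided input is at least as fine as the decisions require."""
    return GRANULARITY.index(provided) >= GRANULARITY.index(required)

# Hypothetical process map: the granularity each process decides at,
# and the granularity of the forecast it currently receives.
processes = {
    "capacity_planning":     {"requires": "month", "receives": "month"},
    "master_scheduling":     {"requires": "week",  "receives": "month"},
    "distribution_planning": {"requires": "day",   "receives": "week"},
}

for name, spec in processes.items():
    if not fine_enough(spec["receives"], spec["requires"]):
        # Splitting the coarser input down would be more precise than the
        # data allows: a golden-rule violation to resolve in the design.
        print(f"{name}: {spec['receives']}-level input cannot support "
              f"{spec['requires']}-level decisions")
```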
This excerpt of the book explained three different types of precision in the time-series forecasting domain, two of which have a significant impact. It should be clear that, between two forecasts of equal accuracy, the one with higher precision is more useful and of higher quality. But if one forecast has lower accuracy and higher precision, the judgment is not so easy. Is 1% MAPE at division/continent/quarter precision better than 40% MAPE at SKU/location/week precision? Sometimes yes, sometimes no. The problem is that MAPE cannot give a conclusive answer, because it is an arbitrary blend of stochastic precision and accuracy and it is oblivious to granularity. Most traditional metrics are in fact arbitrary blends, and oblivious to granularity. Yet to judge accuracy across multiple precisions, a blended metric might be exactly what is required. Upcoming excerpts will introduce a forecast quality framework that covers each quality separately and offers various ways to combine them into one or a few complementary metrics. Other articles present excerpts from the book that cover related content:
- The difference between accuracy and numerical precision
- How precision and accuracy complement each other
- A complementary numerical precision metric to the Total Percentile Error
If you are interested in probabilistic planning and forecasting please consider joining the "Probabilistic Supply Chain Planning" group here on LinkedIn.
Find all my articles by category here, along with a listing of outstanding articles by other authors.