Modelling Greek Beverage Sales as a Hierarchical Time Series

An application of a hierarchical time series model on a dataset of historical soda sales in Greece.

Photo by Martin Lostak on Unsplash

On a hot summer day, you walk into the local store, looking for something to quench your thirst. You turn down the beverage aisle and are greeted by a fully stocked shelf with a wide selection of ice-cold sodas. It’s something we almost take for granted: the shelves never seem to go empty. Behind the scenes, forecasting is a critical part of every store’s operations, ensuring that inventory is kept at just the right level. In this article, we explore the benefits of a more granular method of time series forecasting: hierarchical time series.

The code used to create the graphics and models described below can be found here.


Using the first 6 years of data (2012–2017) as our training set, we can validate our model forecast on the last year’s (2018) data.
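As a rough illustration of that split, here is a minimal sketch in Python with pandas. The series is synthetic, and the monthly frequency and the names `sales`, `train` and `test` are assumptions made for the example rather than details taken from the linked code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one sales series; monthly frequency is an assumption.
dates = pd.date_range("2012-01-01", "2018-12-01", freq="MS")
sales = pd.Series(np.arange(len(dates), dtype=float), index=dates, name="sales")

# First six years for training, final year held out for validation.
train = sales.loc[:"2017-12-31"]
test = sales.loc["2018-01-01":]

print(len(train), len(test))  # 72 12
```

Splitting by date rather than at random keeps the validation year strictly after the training data, which is what a real forecast would face.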

Hierarchical structure

Partial tree diagram illustrating the time series hierarchy. The full tree contains 127 nodes and is too wide to be shown here. The full tree can be constructed using the code linked above.

The hierarchy we’ve chosen starts from the total aggregated sales at the top, which is then divided into individual store sales, then by brand, then by individual product. Of course, the levels of the hierarchy could be ordered differently (and a different ordering might result in a more accurate model), but if you think of what goes through a consumer’s mind when buying a drink, this is quite an intuitive hierarchy. First, you decide which store to go to, then what kind of drink you want to buy, and lastly how much (or what container) you want to buy. In total, there are 6 shops, each selling 5 different brands, each of which comes in 3 different container sizes, for a total of 90 products, or up to 90 separate time series at the bottom level.
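To make the counts concrete, the sketch below enumerates the nodes of such a tree and builds the summing matrix that maps the bottom-level series to every level above them. The labels (`shop_1`, `brand_1`, `size_1`, and so on) are placeholders invented for the example; only the counts of 6, 5 and 3 come from the data described here.

```python
from itertools import product

import numpy as np

# Placeholder labels; only the counts (6 shops, 5 brands, 3 sizes) are from the article.
shops = [f"shop_{i}" for i in range(1, 7)]
brands = [f"brand_{j}" for j in range(1, 6)]
sizes = [f"size_{k}" for k in range(1, 4)]

# Bottom level: one series per (shop, brand, size) combination.
bottom = list(product(shops, brands, sizes))

# Every node in the tree, from the top down, identified by its path.
nodes = (
    [()]                              # total
    + [(s,) for s in shops]           # shop level
    + list(product(shops, brands))    # brand-within-shop level
    + bottom                          # product level
)

# Summing matrix S: entry (i, j) is 1 when bottom series j sits beneath node i.
S = np.array(
    [[int(leaf[: len(node)] == node) for leaf in bottom] for node in nodes]
)

print(len(bottom))  # 90
print(len(nodes))   # 127
print(S.shape)      # (127, 90)
```

The 127 nodes are 1 total, 6 shops, 30 shop-brand pairs and 90 products, which matches the size of the full tree mentioned in the caption above.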

We can also visualize our time series at each level of the hierarchy. First, viewing our total sales, we can observe some clear seasonality. Sales appear to rise in the spring, peaking and dropping in the summer, briefly rising again in the fall before reaching a trough in the winter.

Total sales over time.

Moving downwards in our hierarchy to the shop level, we can still observe strong seasonality, but some irregularities in the pattern now become visible: there are larger differences in peak heights, and some years have more jagged sales patterns than others.

Total sales over time. Each plot represents sales at a different shop.

Digging another level deeper: each plot below represents the sales of a certain brand at a given shop. It’s tiny, but if you squint, you can see even more irregularities.

Total sales over time. Each plot represents sales of a given brand at a given shop.

At the bottom level of the hierarchy, the time series become even more irregular. It would be quite difficult to fit all 90 plots into one picture, so below are the time series for each of the individual products sold at shop 1 (it looks similar for other shops).

Sales over time at shop 1. Each plot represents sales of a given brand, and each line represents a given product.

As we deepen the hierarchy, there appears to be more and more noise at the bottom level. This makes sense, as when you move up the hierarchy, the time series are aggregated (in a sense, averaged), which hides some of the noise.
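A small simulation can illustrate why aggregation hides noise. The sketch below is not based on the sales data: it assumes 90 synthetic monthly series that share one seasonal pattern and carry independent noise, which is a stronger assumption than real sales would satisfy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_months = 90, 84  # 90 products, 7 years of monthly data (assumed frequency)

months = np.arange(n_months)
signal = 100 + 30 * np.sin(2 * np.pi * months / 12)      # shared seasonal pattern
noise = rng.normal(scale=20, size=(n_series, n_months))  # independent per-series noise
bottom = signal + noise

# Top of the hierarchy: the sum of all bottom-level series.
total = bottom.sum(axis=0)


def noise_ratio(series, clean):
    """Standard deviation of the noise relative to the mean level of the series."""
    return np.std(series - clean) / np.mean(clean)


bottom_ratio = np.mean([noise_ratio(b, signal) for b in bottom])
total_ratio = noise_ratio(total, n_series * signal)

print(f"typical product-level noise ratio: {bottom_ratio:.3f}")
print(f"total-level noise ratio:           {total_ratio:.3f}")
```

With independent noise, the relative noise of the total shrinks roughly with the square root of the number of series summed. Real product sales are correlated, so the reduction in practice would be smaller than in this toy setup.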


Forecasting Model

Forecast Reconciliation

Implementation and Results

Shop-level forecasts

Forecasts plotted on top of actual data for shop-level data. Total sales in the top left.

Let’s zoom in on just the forecast portion of the total sales, so it’s easier to examine. In the image below, the forecasts resulting from two approaches are shown. The blue band represents the 95% prediction interval for the hierarchical forecasts reconciled using the MinT approach, and the slightly wider red band represents the prediction interval if we were to forecast the total sales directly (the non-hierarchical method). The slightly bumpier black line is the actual test data.

95% prediction intervals for the hierarchical forecast reconciled (shop-level) using MinT (in blue) are narrower than those under a non-hierarchical forecasting approach (in red).

To further compare the two methods, we can use an error metric such as MASE (a wide variety of metrics could be used; the relative performance of the two methods will be similar). The hierarchical forecast achieved a MASE of 0.817, while the non-hierarchical forecast came in with a slightly lower error of 0.793.
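For readers unfamiliar with MASE, here is a minimal sketch of how it can be computed. The function name `mase`, its parameter `m` and the toy numbers are assumptions for illustration; they are not the article's code or data.

```python
import numpy as np


def mase(y_train, y_test, y_pred, m=1):
    """Mean absolute scaled error.

    Scales the out-of-sample mean absolute error by the in-sample mean
    absolute error of a seasonal naive forecast with period m
    (m=1 gives the ordinary naive forecast).
    """
    y_train, y_test, y_pred = map(np.asarray, (y_train, y_test, y_pred))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_test - y_pred)) / scale


# Toy numbers, not the article's data.
train = [10.0, 12.0, 11.0, 13.0, 12.0]
test = [14.0, 13.0]
pred = [13.0, 13.5]

print(mase(train, test, pred))  # 0.5
```

A value below 1 means the forecast beat the naive benchmark on average, which is why both figures quoted above count as reasonable.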

Brand-level forecasts

In the picture below, we see the forecasts at shop 6. Since we have gone lower in our hierarchy, there is more noise, and some forecast reconciliation methods do not work as well. In particular, the green line in each of the plots below represents the forecasts produced by the bottom-up approach, which deviate quite a bit from the actual data.

Forecasts for each brand (and aggregated) at shop 6. We can see that some reconciliation methods begin to break down in the presence of the additional noise.
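The bottom-up approach mentioned above is simple to write down: forecast only the bottom-level series, then sum them to get every level above. The sketch below uses a made-up two-shop hierarchy and invented forecast values purely for illustration.

```python
import numpy as np

# Toy hierarchy (hypothetical): total -> {shop_a, shop_b}, each with two brands.
# Rows: total, shop_a, shop_b, then the four bottom-level series.
S = np.array([
    [1, 1, 1, 1],   # total
    [1, 1, 0, 0],   # shop_a
    [0, 0, 1, 1],   # shop_b
    [1, 0, 0, 0],   # shop_a / brand_1
    [0, 1, 0, 0],   # shop_a / brand_2
    [0, 0, 1, 0],   # shop_b / brand_1
    [0, 0, 0, 1],   # shop_b / brand_2
])

# Made-up one-step-ahead base forecasts for the four bottom-level series.
bottom_forecasts = np.array([12.0, 8.0, 20.0, 5.0])

# Bottom-up: every higher-level forecast is the sum of its descendants.
bottom_up = S @ bottom_forecasts

print(bottom_up.tolist())  # [45.0, 20.0, 25.0, 12.0, 8.0, 20.0, 5.0]
```

Because the upper levels inherit everything from the noisiest series, errors in the bottom-level forecasts carry straight through to the aggregates, which is consistent with the behaviour of the green line.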

We see a similar result when these forecasts are aggregated at the shop level, and at the top level.

Brand-level forecasts aggregated at the shop level. Total sales in the top left.

Even though the bottom-up approach seems to perform poorly, when we compare the MinT-reconciled hierarchical forecast with the non-hierarchical forecast, the results are still quite good. Again, we see that the prediction intervals are narrower under the hierarchical approach. MASE under the hierarchical approach is 0.821 (compared to the same non-hierarchical MASE of 0.793).

95% prediction intervals for the hierarchical forecast reconciled (brand-level) using MinT (in blue) are narrower than those under a non-hierarchical forecasting approach (in red).
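For intuition about what reconciliation does, the sketch below implements the general projection that MinT builds on: the reconciled forecasts are S (S' W⁻¹ S)⁻¹ S' W⁻¹ ŷ, where ŷ holds the base forecasts for every node and W stands in for the covariance of the base forecast errors. The diagonal `W` and all numbers here are hypothetical simplifications; this is not the implementation used for the results above, which would estimate W from the data.

```python
import numpy as np


def reconcile(S, base_forecasts, W):
    """Map incoherent base forecasts to coherent ones.

    Computes S (S' W^-1 S)^-1 S' W^-1 y_hat, where W stands in for the
    covariance matrix of the base forecast errors.
    """
    W_inv = np.linalg.inv(W)
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)
    return S @ G @ base_forecasts


# Toy hierarchy (hypothetical): total -> two bottom-level series.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Made-up base forecasts: the total (105) does not equal 60 + 40.
base = np.array([105.0, 60.0, 40.0])

# Hypothetical error variances; the total is assumed to be forecast less precisely.
W = np.diag([4.0, 1.0, 1.0])

reconciled = reconcile(S, base, W)

print(reconciled)
print(np.isclose(reconciled[0], reconciled[1] + reconciled[2]))  # True
```

Unlike bottom-up, this uses the forecasts from every level and lets the weights in W decide how much each one is trusted, so a well-forecast total can pull noisy lower-level forecasts back into line.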

Product-level forecast

Bias-Variance Tradeoff

It is not quite visible in the plots above, but the variance of the forecast errors also decreased as we extended the depth of the hierarchy: forecasts created from the brand level had an error variance about 5% lower than forecasts created from the shop level.

Conclusions and Further Considerations

In practice, many time series exist only in an aggregated form: data at a more granular level may be hard to obtain, or the subgroups may be unclear. In these cases, clustering analysis might be used as a preliminary step to identify an effective hierarchical structure.

