A Multivariate Model for Electricity Demand using Facebook Prophet

A detailed case study in building a multivariate time series model to forecast daily electricity demand in Victoria, Australia.

Photo by Alain Duchateau on Unsplash

Electricity demand forecasting is critical to power grid management and operation. As electricity demand ebbs and flows cyclically throughout the days and seasons, power generators aim to sell excess capacity for the highest price, while filling excess demand at the lowest price. As such, the ability to predict electricity demand has real economic value.

In this case study, we will be using this dataset containing 6 years of daily electricity demand data in Victoria, Australia. Our model will be built using Prophet, a forecasting library currently under development by Facebook.

Our main objective is to create a model that can accurately forecast electricity demand on some reasonable forecast horizon.

The code for this case study can be found here.

Data Exploration

Our data has 2,106 rows, one for each day from January 1, 2015 to October 6, 2020. There are 14 variates:

  • date: Date
  • demand: Daily electricity demand (MWh)
  • RRP: Average Daily Recommended Retail Price (RRP)($/MWh), weighted by intraday demand
  • demand_pos_RRP: Total daily electricity demand at positive RRP (MWh)
  • RRP_positive: Average positive RRP, weighted by intraday demand ($)
  • demand_neg_RRP: Total daily electricity demand at negative RRP (MWh)
  • RRP_negative: Average negative RRP, weighted by intraday demand ($)
  • frac_at_neg_RRP: Fraction of the day (time) when RRP was negative
  • min_temperature: Minimum daily temperature (C)
  • max_temperature: Maximum daily temperature (C)
  • solar_exposure: Total daily sunlight (MJ/m²)
  • rainfall: Daily rainfall (mm)
  • school_day: (Y/N)
  • holiday: (Y/N)

While most of these variates are self-explanatory, the RRP system for electricity pricing in Victoria may differ from that where the reader is residing. Electricity consumers can choose from a selection of retailers from which to buy electricity. While electricity prices differ between retailers, the RRP is a reference price on which retail prices are based. The RRP, like any other price, is subject to the forces of supply and demand, and fluctuates throughout the day.

Electricity Demand

Visualizing the data as a time series, we can see that there is definitely a wavy pattern in electricity demand. Electricity demand peaks in the winter and summer months, with troughs during the spring and fall (readers in the Northern Hemisphere should recall that seasonal times are reversed in Australia). If your first guess at explaining this phenomenon was “climate control”, you’d probably be correct, as heating and cooling make up almost half of all residential electricity usage.

Daily demand over time (top) and maximum/minimum temperatures over time (bottom).

Lighting makes up a relatively small amount of residential and commercial electricity usage, but it is worth mentioning that we can observe the same cyclical wavy pattern in daily solar exposure. Good to see that Earth is still reliably orbiting the sun.

Daily solar exposure over time.

COVID-19 effects?

One might find it interesting that there does not appear to be a noticeable effect of the COVID-19 pandemic on electricity demand. According to the Australian Competition & Consumer Commission, “demand across the National Electricity Market (NEM) was down just 2 per cent in Q2 2020 compared to Q2 2019 with increased residential consumption largely offsetting the decrease in business consumption”.

Annual electricity usage appears to be quite flat over the span of the dataset, with a very slight decreasing trend. Below is a graph of total annual electricity demand, up to October 6th of each year (since our 2020 data is incomplete).

Annual electricity demand up to October 6 of each year.
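
A like-for-like annual comparison of this kind could be computed along the following lines. This is a minimal sketch rather than the code behind the plot: it assumes a pandas DataFrame with the `date` and `demand` columns described earlier, and the helper name `demand_to_cutoff` and the synthetic data are hypothetical.

```python
import pandas as pd


def demand_to_cutoff(df: pd.DataFrame, month: int = 10, day: int = 6) -> pd.Series:
    """Sum demand per year, keeping only dates on or before month/day."""
    dates = pd.to_datetime(df["date"])
    # Compare (month, day) pairs so that leap years do not shift the cutoff.
    on_or_before = (dates.dt.month < month) | (
        (dates.dt.month == month) & (dates.dt.day <= day)
    )
    return df.loc[on_or_before, "demand"].groupby(dates[on_or_before].dt.year).sum()


# Synthetic stand-in for the real dataset: a constant 100 MWh every day.
example = pd.DataFrame({"date": pd.date_range("2019-01-01", "2020-12-31", freq="D")})
example["demand"] = 100.0
annual = demand_to_cutoff(example)
```

Truncating every year at the same calendar date keeps the incomplete 2020 figure comparable with the earlier years.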

We also have a little more insight into electricity demand — split into positive RRP and negative RRP demand. Unsurprisingly, most electricity is sold at a positive RRP, and as such the plot for positive RRP demand looks a lot like the one for overall demand. Towards the latter half of 2020, we see that there are a few days where negative RRP demand was unusually high compared to previous years. Perhaps this was an effect of the COVID-19 pandemic on electricity consumption habits which we didn’t see earlier?

Daily positive RRP demand over time (top), daily negative RRP demand over time (middle), and daily fraction of energy at negative RRP over time (bottom).

Since RRP is a function of electricity demand (and not the other way around), RRP-related variates should not be included in the model we’ll be building to predict electricity demand.

Rainfall

2020 was one of the wettest years for Victoria — there was more rain in the first 9 months of 2020 than any of the previous 5 years.

Total annual rainfall in Victoria. Data point for 2020 only includes dates up to October 6.

School days and holidays

We compare the number of school days and holidays before October 6th of each year. The holidays are about the same (depending on the timing of the holiday in each year), though there were significantly fewer school days in 2020 (resulting from school closures due to COVID-19).

Number of school days in the year (left) and Number of holidays in the year (right). Calculated over January 1 to October 6 of each year.

Model Building

Univariate Model

The first iteration of our model is a univariate model (electricity demand vs. time), and as mentioned, we will be using Prophet.

Prophet is an extremely easy way to create additive regression models and create plots to visualize each of the components, straight out of the box. Using Prophet you can also avoid the hassle of tinkering with different parameters (e.g. p, d, q in an ARIMA model).

General form for an additive model in Prophet. y(t) denotes the response variate, g(t) denotes the general trend, s(t) denotes seasonality, h(t) denotes holidays, a(t) denotes additional variates, and epsilon is the error term. Image by the author.
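
Following the definitions in the caption, the additive form can be written as:

```latex
y(t) = g(t) + s(t) + h(t) + a(t) + \epsilon_t
```

Each term contributes to the forecast on the same scale as the response, which is why Prophet can plot the components separately.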

In the image below, we see the components of our first model. The trend line shows the decreasing trend we saw earlier. Weekly seasonality shows us that electricity demand is lower on weekends than on weekdays. Lastly, the annual seasonality confirms the wavy pattern we saw earlier: demand peaks in the winter and summer months, and troughs during the spring and fall.

Univariate model seasonality components: Trend (top), weekly seasonality (middle), and yearly seasonality (bottom).
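
For readers who want to reproduce a fit like this, here is a minimal sketch. It is not the code from the case study: the helper names `to_prophet_frame` and `fit_univariate` are hypothetical, and it assumes the package is installed under the name `prophet` (older releases were published as `fbprophet`).

```python
import pandas as pd


def to_prophet_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Rename the dataset's columns to the `ds`/`y` names Prophet expects."""
    out = df[["date", "demand"]].rename(columns={"date": "ds", "demand": "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    return out


def fit_univariate(train: pd.DataFrame):
    """Fit a default Prophet model with a 95% uncertainty interval."""
    # Imported here so the data-preparation helper above works without Prophet.
    from prophet import Prophet

    model = Prophet(interval_width=0.95)  # Prophet's default width is 0.80
    model.fit(train)
    forecast = model.predict(train)  # in-sample predictions
    return model, forecast


# Usage, assuming `raw` holds the dataset described earlier:
#   model, forecast = fit_univariate(to_prophet_frame(raw))
#   model.plot_components(forecast)  # trend, weekly, and yearly panels
```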

Prophet also comes with diagnostics that can be used to evaluate the model. For example, it’s very easy to perform cross-validation. After training the model on two years of initial data, and cross-validating it with a one-year forecast horizon every 6 months, Prophet can plot the MAPE (mean absolute percentage error) across the forecast horizon.

Results of cross-validation on the univariate model.
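
To make the cross-validation scheme concrete, the sketch below lists the cutoff dates that such a rolling-origin setup implies, and defines MAPE by hand. It is a simplified illustration, not Prophet's internal implementation. The function names are hypothetical, and the day counts (730, 180, 365) are our assumed translation of "two years", "6 months" and "one year".

```python
from datetime import date, timedelta


def rolling_cutoffs(start, end, initial_days=730, period_days=180, horizon_days=365):
    """List cutoff dates for rolling-origin cross-validation, oldest first.

    Each cutoff leaves at least `initial_days` of history before it and
    `horizon_days` of held-out data after it.
    """
    earliest = start + timedelta(days=initial_days)
    cutoff = end - timedelta(days=horizon_days)
    cutoffs = []
    while cutoff >= earliest:
        cutoffs.append(cutoff)
        cutoff -= timedelta(days=period_days)
    return cutoffs[::-1]


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.1 means 10%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


cutoffs = rolling_cutoffs(date(2015, 1, 1), date(2020, 10, 6))

# With Prophet installed, the corresponding diagnostics calls would be roughly:
#   from prophet.diagnostics import cross_validation
#   from prophet.plot import plot_cross_validation_metric
#   df_cv = cross_validation(model, initial="730 days", period="180 days",
#                            horizon="365 days")
#   plot_cross_validation_metric(df_cv, metric="mape")
```

At each cutoff the model is refit on the data up to that date and scored on the year that follows, so every point on the MAPE curve is an out-of-sample error.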

We can also take a look at the residual plot to get a feeling of whether our model is any good. There are several problems here:

  • The residuals are heteroskedastic. The variability appears to get larger during the summer months.
  • There seems to be a slight wave pattern still visible in the residual plot (though you might need to squint).
  • We’d also like to reduce the variability of the residuals overall. The variability of the residuals is similar to the scale of our seasonal components, which limits the usefulness of our model.

Univariate model residuals. Notice that the wave pattern is still present, and the residuals are heteroskedastic.
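
One rough way to put a number on the heteroskedasticity is to compare the spread of the residuals month by month. The sketch below assumes residuals are defined as actual minus fitted values. The function name `residual_spread_by_month` and the toy numbers are hypothetical.

```python
import pandas as pd


def residual_spread_by_month(ds: pd.Series, y: pd.Series, yhat: pd.Series) -> pd.Series:
    """Standard deviation of residuals (y - yhat) for each calendar month."""
    residuals = pd.Series(y.to_numpy() - yhat.to_numpy(), index=pd.DatetimeIndex(ds))
    return residuals.groupby(residuals.index.month).std()


# Toy data: January (summer in Victoria) misses by 3, July misses by 1.
dates = pd.Series(pd.to_datetime(["2019-01-01", "2019-01-02", "2019-07-01", "2019-07-02"]))
y = pd.Series([103.0, 97.0, 101.0, 99.0])
yhat = pd.Series([100.0, 100.0, 100.0, 100.0])
spread = residual_spread_by_month(dates, y, yhat)
```

A noticeably larger value for the summer months than for the rest of the year would confirm what the residual plot suggests.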

Prophet allows us to plot our fitted model predictions on the historical data. We can see that several points lie outside a generous 95% uncertainty interval.

Fitted univariate model predictions on the dataset. Dark blue indicates point estimates, and light blue indicates a 95% uncertainty interval. Notice several outliers above the uncertainty interval during the summer months.

Multivariate Model

Let’s try to improve upon the univariate model by including some of the other data we have. It’s usually not a good idea to just throw in all the variables you have in the model, as you’ll end up with a lot of noise.

Feature Selection

So which variables should we include in the model? The easiest way to start answering this question is to look at a pairs plot. The top row of the below pairs plot is the most important, as it shows the relationship between our target variate (demand) and the other variates.

Temperature (max or min) and demand are clearly related, so we’ll definitely want to include daily temperature in the model as a predictor. It’s a v-shaped (not linear) relationship though, so we’ll need to transform this variate first (see next section).

Demand also appears to somewhat depend on whether it is a holiday or not. We will also include this in the model.

There might be a very weak relationship, if any, between solar exposure and demand. Spoiler: after building the model both with and without this variate, we found that it performs slightly better without it. As such, we won’t include it. We also won’t include rainfall or school day, since neither shows a visible relationship to demand in the pairs plot.

Pairs plot of demand and explanatory variates.

Feature Engineering

We want to create a variable that allows our model to capture the effect of temperature on electricity demand. In Prophet, all additional variates are treated linearly, and as such, we remedy the v-shaped relationship we observe in our temperature variate by splitting it into two.

The idea is that if the temperature is comfortable (about 20 degrees Celsius), you wouldn’t need to turn on either the heat or the air conditioning. If it gets too hot or too cold, that’s when you would start using electricity. Furthermore, it’s not clear whether the increase in electricity demand for heating (where gas heaters exist as an alternative) would be equivalent to that for cooling. That is yet another reason to use two separate variables: each gets its own scaling.

Two engineered variables. If we designate a “comfortable” temperature around 20 degrees Celsius, the top plot shows the days where the mean temperature is above 20 degrees, and the bottom plot shows the days where the mean temperature is below 20 degrees.
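
One plausible way to encode the split is sketched below. It is an illustration under stated assumptions, not the exact transformation used here: we take the daily mean temperature to be the midpoint of `min_temperature` and `max_temperature`, and the column names `temp_above` and `temp_below` are hypothetical.

```python
import pandas as pd

COMFORT_C = 20.0  # the "comfortable" temperature from the text


def add_temperature_features(df: pd.DataFrame, comfort: float = COMFORT_C) -> pd.DataFrame:
    """Split temperature into degrees above and degrees below the comfort point."""
    out = df.copy()
    # Assumption: daily mean temperature is the midpoint of the min and max.
    mean_temp = (out["min_temperature"] + out["max_temperature"]) / 2
    out["temp_above"] = (mean_temp - comfort).clip(lower=0)  # cooling load
    out["temp_below"] = (comfort - mean_temp).clip(lower=0)  # heating load
    return out


toy = pd.DataFrame({"min_temperature": [10.0, 20.0, 15.0],
                    "max_temperature": [20.0, 30.0, 25.0]})
features = add_temperature_features(toy)
```

Each new column is zero on one side of the comfort point and linear on the other, so a linear coefficient on each can reproduce the two arms of the v-shape with different slopes.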

Below are the results of the second model. There are now two additional components: the holidays and the additional regressors. In the additional regressors component, we see a wavy pattern; the peaks are also higher in the summer than in the winter, which matches the demand patterns!

Multivariate model components. From top, effect of: trend, holiday, weekly seasonality, yearly seasonality, and additional regressors.
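
A model with these components could be set up roughly as follows. This is a sketch under assumptions, not the code from the case study. The helper names are hypothetical, and so are the regressor columns `temp_above` and `temp_below`. Treating every holiday as a single shared effect labelled `public_holiday` is a simplification we chose for brevity.

```python
import pandas as pd


def holidays_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Build Prophet's holidays table from the dataset's Y/N `holiday` column."""
    is_holiday = df["holiday"].eq("Y")
    return pd.DataFrame(
        {"holiday": "public_holiday", "ds": pd.to_datetime(df.loc[is_holiday, "date"])}
    ).reset_index(drop=True)


def fit_multivariate(train: pd.DataFrame, holidays: pd.DataFrame):
    """Fit Prophet with holiday effects and two temperature regressors.

    `train` needs the columns ds, y, temp_above and temp_below.
    """
    # Imported here so the helper above works without Prophet installed.
    from prophet import Prophet

    model = Prophet(holidays=holidays, interval_width=0.95)
    model.add_regressor("temp_above")
    model.add_regressor("temp_below")
    model.fit(train)
    return model, model.predict(train)  # in-sample predictions
```

Keep in mind that forecasting beyond the end of the data requires future values of every added regressor, which in this case means temperature forecasts.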

Using the diagnostics again, we’ve lowered the MAPE a little bit (certainly at the extremes).

Results of cross-validation on the multivariate model.

An examination of the residual plot shows that we’ve definitely improved upon the univariate model in a few ways:

  • We’ve reduced the variability of the residuals (verify this by comparing the y-axis scaling to the previous residual plot).
  • We’ve eliminated most of the wavy pattern in the residuals.

Multivariate model residuals. Unfortunately, some heteroskedasticity remains.

Using the fitted multivariate model on the historical data, all the data points now lie within the 95% uncertainty interval. Note how the estimates “reach out” further to capture the points that would have previously been classified as outliers under the univariate model.

Fitted multivariate model predictions on the dataset. Dark blue indicates point estimates, and light blue indicates a 95% uncertainty interval.

Discussion and Possible Improvements

So, is our model now perfect?

No. There is clearly some pattern in the residuals that our model hasn’t captured. Our model is likely underparameterized, and we are missing some variate that drives the increased variability during the summer months. One candidate is humidity, which makes the perceived temperature higher during the summer months.

Is our model good enough? That depends.

The standard deviation of our residuals is about 5,000 MWh (daily electricity demand). In the case of a 3 sigma event, that’s still only 15,000 MWh, or 12.5% of the average of 120,000 MWh produced daily. While power systems are obviously flexible enough to handle this level of variable demand without catastrophic failure, costs are incurred in gathering sufficient supply from other sources.

What else can we try?

One might try the traditional methods of time series forecasting on this dataset (e.g. ARIMA). These models generally require more expertise, time, and computing resources, though they might yield better results than Prophet if done correctly.

Conclusions

In this case study we’ve started from raw data and come up with a reasonably accurate forecasting model using a limited number of variates through a variety of methods. We’ve showcased the power of feature engineering in adding predictive ability to a model, as well as the power of Facebook’s Prophet to simplify time series forecasting.
