Lab 6 - Time Series Methods
SOLUTIONS
Introduction
In today’s lab, you’ll practice building workflows with recipes, parsnip models, rsample cross-validation, and model comparison in the context of time series data.
Packages
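The functions referenced below come from the following packages; a minimal set of library calls (assuming all are installed) is:

```r
library(timetk)     # time series plotting, splitting, and feature engineering
library(modeltime)  # time series model specs, calibration, and forecasting
library(parsnip)    # model specification interface
library(recipes)    # preprocessing recipes
library(rsample)    # training/testing splits
library(workflows)  # bundling recipes with models
library(readr)      # reading the retail CSV in Exercise 10
library(dplyr)      # general data manipulation
```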
The Data
Today we will be using electricity demand data, based on a paper by James W. Taylor:
Taylor, J.W. (2003) Short-term electricity demand forecasting using double seasonal exponential smoothing. Journal of the Operational Research Society, 54, 799-805.
The data can be found in the timetk package as timetk::taylor_30_min, a tibble with dimensions 4,032 x 2:

- date: a date-time variable in 30-minute increments
- value: electricity demand in megawatts
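To get started, load the data into a variable:

```r
# Electricity demand at 30-minute intervals (4,032 x 2)
data <- timetk::taylor_30_min
data
```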
Exercise 1: EDA
Plot the data using the functions timetk::plot_time_series, timetk::plot_acf_diagnostics (using 100 lags), and timetk::plot_seasonal_diagnostics.
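A sketch of the three diagnostic calls (each returns an interactive plotly chart by default; pass .interactive = FALSE for a static ggplot):

```r
# Raw demand over time
data |> timetk::plot_time_series(date, value)

# (Partial) autocorrelation diagnostics with 100 lags
data |> timetk::plot_acf_diagnostics(date, value, .lags = 100)

# Seasonal diagnostics (hour of day, day of week, ...)
data |> timetk::plot_seasonal_diagnostics(date, value)
```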
Exercise 2: Time scaling
The raw data has 30-minute intervals between data points. Downscale the data to 60-minute intervals using timetk::summarise_by_time, computing the new electricity demand (value) by adding the two 30-minute values in each 60-minute interval. Assign the downscaled data to the variable taylor_60_min.
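One way to do this, summing the two half-hour readings within each hour:

```r
taylor_60_min <- data |>
  timetk::summarise_by_time(
    .date_var = date,
    .by       = "60 minutes",  # aggregate to hourly intervals
    value     = sum(value)     # total demand within each hour
  )
```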
Exercise 3: Training and test datasets
- Split the new (60 min) time series into training and test sets using timetk::time_series_split; set the training period (initial) to ‘2 months’ and the assessment period to ‘1 weeks’.
- Prepare the data resample specification with timetk::tk_time_series_cv_plan() and plot it with timetk::plot_time_series_cv_plan.
- Separate the training and test data sets using rsample (a sketch follows the list).
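Here is a sketch of all three steps (the names splits, train_tbl, and test_tbl are my choices):

```r
# Time-ordered split: the final week is held out for assessment
splits <- taylor_60_min |>
  timetk::time_series_split(initial = "2 months", assess = "1 weeks")

# Visualize the resample plan
splits |>
  timetk::tk_time_series_cv_plan() |>
  timetk::plot_time_series_cv_plan(date, value)

# Separate training and testing sets
train_tbl <- rsample::training(splits)
test_tbl  <- rsample::testing(splits)
```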
Exercise 4: Recipes
- Create a base recipe (base_rec) using the formula value ~ date and the training data. This will be used for the non-regression models.
- Create a recipe (lm_rec) using the formula value ~ . and the training data. This will be used for regression models. For this recipe:
  - add time series signature features using timetk::step_timeseries_signature with the appropriate argument,
  - add a step to select the columns value, date_index.num, date_month.lbl, date_wday.lbl, date_hour,
  - add a normalization step targeting date_index.num,
  - add a step to mutate date_hour, changing it to a factor,
  - add a step to one-hot encode nominal predictors (a sketch of both recipes follows the list).
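A sketch of both recipes, assuming the training data is in train_tbl (note that recipes::step_select is superseded in recent recipes releases, but it still works for this selection):

```r
# Base recipe for the ets and arima models: date is the only predictor
base_rec <- recipes::recipe(value ~ date, data = train_tbl)

# Regression recipe: expand the date into calendar features
lm_rec <- recipes::recipe(value ~ ., data = train_tbl) |>
  timetk::step_timeseries_signature(date) |>
  recipes::step_select(
    value, date_index.num, date_month.lbl, date_wday.lbl, date_hour
  ) |>
  recipes::step_normalize(date_index.num) |>
  recipes::step_mutate(date_hour = factor(date_hour)) |>
  recipes::step_dummy(recipes::all_nominal_predictors(), one_hot = TRUE)
```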
Exercise 5: Models
Now we will create several models to estimate electricity demand, as follows (a sketch follows the list):
- Create a model specification for an exponential smoothing model using engine ‘ets’.
- Create a model specification for an ARIMA model using engine ‘auto_arima’.
- Create a model specification for a linear model using engine ‘glmnet’ with penalty = 0.02 and mixture = 0.5.
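The specifications might look like this (the names ets_spec, arima_spec, and lm_spec are my choices):

```r
# Exponential smoothing
ets_spec <- modeltime::exp_smoothing() |>
  parsnip::set_engine("ets")

# Auto-ARIMA
arima_spec <- modeltime::arima_reg() |>
  parsnip::set_engine("auto_arima")

# Elastic-net linear regression
lm_spec <- parsnip::linear_reg(penalty = 0.02, mixture = 0.5) |>
  parsnip::set_engine("glmnet")
```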
Exercise 6: Model fitting
Create a workflow for each model using workflows::workflow.
- Add a recipe to each workflow:
  - the linear model uses the lm_rec recipe created above,
  - the ets and arima models use the base_rec recipe created above.
- Add a model to each workflow.
- Fit with the training data (see the sketch below).
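A sketch using a small helper to avoid repeating the workflow boilerplate:

```r
# Build a workflow from a model spec and a recipe, then fit on training data
fit_wflow <- function(model_spec, rec) {
  workflows::workflow() |>
    workflows::add_recipe(rec) |>
    workflows::add_model(model_spec) |>
    parsnip::fit(data = train_tbl)
}

ets_wflow   <- fit_wflow(ets_spec,   base_rec)
arima_wflow <- fit_wflow(arima_spec, base_rec)
lm_wflow    <- fit_wflow(lm_spec,    lm_rec)
```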
Exercise 7: Calibrate
In this exercise we’ll use the testing data with our fitted models.
- Create a table with the fitted workflows using modeltime::modeltime_table.
- Using the table you just created, run a calibration on the test data with the function modeltime::modeltime_calibrate.
- Compare the accuracy of the models using modeltime::modeltime_accuracy() on the results of the calibration.
Which is the best model by the rmse metric?
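A sketch of the calibration and accuracy comparison; the model with the smallest value in the rmse column is the best by that metric:

```r
# Collect the fitted workflows, then calibrate on the held-out week
calibration_tbl <- modeltime::modeltime_table(
  ets_wflow, arima_wflow, lm_wflow
) |>
  modeltime::modeltime_calibrate(new_data = test_tbl)

# Accuracy metrics (mae, rmse, rsq, ...) for each model
calibration_tbl |>
  modeltime::modeltime_accuracy()
```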
Exercise 8: Forecast - testing data
Use the calibration table with modeltime::modeltime_forecast to graphically compare the fits to the testing data with the observed values.
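A sketch of the test-period comparison:

```r
calibration_tbl |>
  modeltime::modeltime_forecast(
    new_data    = test_tbl,       # forecast over the test period
    actual_data = taylor_60_min   # overlay the observed series
  ) |>
  modeltime::plot_modeltime_forecast()
```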
Exercise 9: Forecast - future
Now refit the models using the full data set (using the calibration table and modeltime::modeltime_refit). Save the result in the variable refit_tbl.
- Use the refit data in the variable refit_tbl, along with modeltime::modeltime_forecast and argument h = ‘2 weeks’ (remember to also set the actual_data argument). This will use the models to forecast electricity demand two weeks into the future.
- Plot the forecast with modeltime::plot_modeltime_forecast (a sketch follows the list).
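A sketch of the refit and two-week forecast:

```r
# Refit every model on the full 60-minute series
refit_tbl <- calibration_tbl |>
  modeltime::modeltime_refit(data = taylor_60_min)

# Forecast two weeks past the end of the data and plot
refit_tbl |>
  modeltime::modeltime_forecast(h = "2 weeks", actual_data = taylor_60_min) |>
  modeltime::plot_modeltime_forecast()
```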
Exercise 10: Seasonal Time Series Modeling for Retail Forecasting
You are working as a Senior Business Analyst for “TechMart,” a consumer electronics retailer with 50 stores across North America. The company is planning its inventory strategy for 2025 and needs accurate forecasts of monthly sales. The CFO has specifically requested that you use advanced decomposition methods that can handle the complex seasonal patterns in electronics retail.
You have been provided with 60 months of historical sales data (Jan 2019 - Dec 2023) showing strong monthly seasonality (holiday shopping peaks) and an underlying growth trend due to market expansion.
```r
# retail data
retail_dat <- readr::read_csv('data/TechMart_sales.csv', show_col_types = FALSE)

# Display data summary
cat("TechMart Monthly Sales Data Summary:\n")
cat("====================================\n")
cat("Period:", format(min(retail_dat$date), "%b %Y"), "to",
    format(max(retail_dat$date), "%b %Y"), "\n")
cat("Observations:", nrow(retail_dat), "\n")
print(summary(retail_dat$monthly_sales))
```

```
TechMart Monthly Sales Data Summary:
====================================
Period: Jan 2019 to Dec 2023
Observations: 60
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.979   4.364   5.716   5.843   6.830  10.585
```
Using the modeltime and parsnip frameworks:
- Build and compare ETS(A,A,A) and ETS(M,Md,M) models. Use log(sales) as the outcome.
- Use workflows and modeltime::exp_smoothing, setting the error, trend, season, and damping arguments explicitly for each model (see the help for modeltime::exp_smoothing). For the ETS(M,Md,M) model, set damping = "damped" (NOTE: the default is ‘damped’).
- Forecast sales one month ahead, and extract the forecast for January 2024.
- Plot the forecasts (a sketch follows the list).
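A sketch assuming retail_dat has columns date and monthly_sales (consistent with the summary above); the log transform is applied up front so both ETS models are fit to log(sales):

```r
# Log-transform the outcome
retail_log <- retail_dat |>
  dplyr::mutate(log_sales = log(monthly_sales))

ets_rec <- recipes::recipe(log_sales ~ date, data = retail_log)

# ETS(A,A,A): additive error, trend, and season; no damping
ets_aaa_spec <- modeltime::exp_smoothing(
  error = "additive", trend = "additive",
  season = "additive", damping = "none"
) |>
  parsnip::set_engine("ets")

# ETS(M,Md,M): multiplicative error, trend, and season; damped trend
ets_mdm_spec <- modeltime::exp_smoothing(
  error = "multiplicative", trend = "multiplicative",
  season = "multiplicative", damping = "damped"
) |>
  parsnip::set_engine("ets")

# Fit each spec in a workflow with the shared recipe
fit_ets <- function(spec) {
  workflows::workflow() |>
    workflows::add_recipe(ets_rec) |>
    workflows::add_model(spec) |>
    parsnip::fit(data = retail_log)
}

# One-month-ahead forecast for both models
forecast_tbl <- modeltime::modeltime_table(
  fit_ets(ets_aaa_spec), fit_ets(ets_mdm_spec)
) |>
  modeltime::modeltime_calibrate(new_data = retail_log) |>
  modeltime::modeltime_forecast(h = "1 month", actual_data = retail_log)

forecast_tbl |> modeltime::plot_modeltime_forecast()

# January 2024 forecasts, back-transformed to the sales scale
forecast_tbl |>
  dplyr::filter(.key == "prediction") |>
  dplyr::mutate(sales_forecast = exp(.value))
```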
Grading
Total points available: 30.
| Component | Points |
|---|---|
| Exercises 1-10 | 30 |