Dealing with uncertainty is difficult. If we knew exactly what would happen, planning for events and outcomes would be a lot easier. In the case of a business, knowing what is going to happen can save a lot of money. Until we have a crystal ball that tells us everything, the best way to deal with future uncertainty is through time series forecasting.
What is a time series analysis?
A time series is a data set that tracks a certain metric over a period of time. An easy example of this would be the weather. If we want to predict what the temperature will be, we have to analyze historical weather data over a period of time and learn its patterns in order to estimate what will happen. Variables that could influence temperature include the time of year, humidity levels, physical location in the world, altitude, and distance from a large body of water, to name a few. A time series analysis looks at the relationship between the dependent variable (in this case, temperature) and the independent variables (e.g., time of year, latitude/longitude, humidity) to determine the impact of each. A time series analysis can also look at how timing impacts the dependent variable in the form of seasonality or overall upward or downward trends.
Seasonality refers to patterns in the data that recur at specific times. In the northern hemisphere, temperatures are expected to drop from November through March. A time series analysis should be able to find that pattern and incorporate it when forecasting temperature.
A time series analysis should also be able to consider macro trends. In the weather example, a macro trend would be a given area seeing an increase or decrease in temperature over time as a result of climate change.
We don’t have a crystal ball, but we do have a [FB] Prophet
One of my favorite forecasting tools is Facebook’s Prophet. FB Prophet is a forecasting package in both R and Python that was developed by Facebook’s data science research team. The goal of the package is to give business users a powerful and easy-to-use tool to help forecast business results without needing to be an expert in time series analysis. There is thorough documentation of the package, how to use it, and examples on the Prophet website as well as other, third-party sites, such as GitHub and Kaggle. Once the data is cleaned and set up in the proper schema, the actual package is extremely easy to use and can run in 4 lines of code.
The underlying algorithm is a generalized additive model that is decomposable into three main components: trend, seasonality, and holidays. As I mentioned above, seasonality and trend are two important, but difficult to quantify, components of a time series analysis and FB Prophet does a great job capturing both.
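For readers who like to see the structure written out, the decomposition is usually stated as follows. This is the standard form from the Prophet documentation and paper, not something specific to this example; the symbols are $g(t)$ for the trend, $s(t)$ for seasonality, $h(t)$ for holiday effects, and $\varepsilon_t$ for the error term the model does not explain.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Because the terms simply add up, each one can be inspected on its own.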
Because it is a decomposable model, it is relatively easy to extract the components and coefficients of the model to understand the impact of seasonality, trend, holidays, and other regressor variables. For example, if a business team is trying to forecast sales, they can extract the price coefficient to see how much impact price has on forecast sales. Decomposition helps teams understand the drivers of the business. It also helps to identify why a forecast is off. For example, if a last-minute price increase was not accounted for during forecasting, it can be identified relatively quickly when comparing the model to actual performance.
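One caveat about FB Prophet is that it works best on stationary data. I use “stationary” loosely here to mean time series data that follows similar behavior and keeps consistent trend and seasonal patterns throughout time. (In the strict statistical sense, a stationary series has a constant mean and variance, which rules out trend and seasonality altogether.)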
Below are visuals of what stationary vs. non-stationary data looks like. The stationary data is in blue. It is easy to see the patterns throughout the years and everything is in a similar range. The orange data is the non-stationary data. The non-stationary data may look like it’s following a trend at the start, but you can see at the end of the timeframe that the data changes patterns pretty quickly.
Prophet does not perform well on non-stationary data because it is difficult to find the actual seasonality and trend of the data if the patterns are inconsistent.
Using Prophet to Forecast Sales
Now to put Prophet into action! I found a Store Item Demand Forecasting dataset on Kaggle to use for this example. I will be forecasting store-level sales for the last year of the training dataset since I will have the actual numbers to compare the forecast to.
First, I need to import all of the libraries and the data into my Jupyter notebook. I’m only importing pandas, numpy, and Prophet.
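A minimal sketch of this step might look like the block below. It assumes the Kaggle file has the columns date, store, item, and sales; the `load_sales` helper and the tiny in-memory sample are hypothetical stand-ins so the snippet runs without the real file.

```python
import io

import numpy as np
import pandas as pd

try:
    # Recent releases ship as `prophet`; older ones were published as `fbprophet`.
    from prophet import Prophet
except ImportError:
    Prophet = None  # the data-loading step below does not need Prophet


def load_sales(source):
    """Read the sales file and parse the date column as datetimes."""
    return pd.read_csv(source, parse_dates=["date"])


# Tiny in-memory stand-in for the Kaggle file (made-up values, assumed columns).
sample_csv = io.StringIO(
    "date,store,item,sales\n"
    "2013-01-01,1,1,13\n"
    "2013-01-01,2,1,12\n"
    "2013-01-02,1,1,11\n"
)
df = load_sales(sample_csv)
print(df.dtypes)
```

With the real data, the call would be something like `load_sales("train.csv")`, where the file name is whatever the download is saved as.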
Next, I did some basic exploratory data analysis on the underlying data to see if I’m dealing with stationary or non-stationary data. The graph below visualizes overall sales by day.
The overall sales by day look pretty stationary and give me an indication that the Prophet model might be a good fit for forecasting sales at the store level. I’m going to take my analysis one step further and check to see if the patterns hold true for individual store trends.
Looks like the stationary nature of the data holds true even at the store level.
I also want to take a look at what the data looks like overall. I see that there are 5 years of data in the dataset and that the daily sales span from 0 to 231 for a given day for a given item.
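A quick way to get this kind of overview is sketched below. The data here is synthetic (random values on the same 2013 to 2017 date range), so the printed numbers will not match the real dataset; the `summarize` helper and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Kaggle data (assumed columns: date, store, item, sales).
rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-01", "2017-12-31", freq="D")
df = pd.DataFrame({
    "date": np.tile(dates, 2),               # every date once per item
    "store": 1,
    "item": np.repeat([1, 2], len(dates)),
    "sales": rng.poisson(20, size=2 * len(dates)),
})


def summarize(frame):
    """Return the date span, number of calendar years, and range of daily sales."""
    return {
        "first_date": frame["date"].min(),
        "last_date": frame["date"].max(),
        "n_years": frame["date"].dt.year.nunique(),
        "min_sales": int(frame["sales"].min()),
        "max_sales": int(frame["sales"].max()),
    }


summary = summarize(df)
print(summary)
```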
The next step is to aggregate the data to the store level. For this example, I only want to look at the total sales, regardless of the item, for a given day.
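The aggregation can be sketched with a pandas groupby, as below. The toy rows are made up, and the column names are assumed to match the Kaggle file. Renaming to `ds` and `y` is done here because those are the column names Prophet expects for the date and the value being forecast.

```python
import pandas as pd

# Toy item-level rows (made-up values, assumed columns: date, store, item, sales).
df = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01"] * 4 + ["2013-01-02"] * 4),
    "store": [1, 1, 2, 2] * 2,
    "item": [1, 2, 1, 2] * 2,
    "sales": [13, 7, 12, 5, 11, 9, 16, 4],
})

# Sum sales across items so there is one row per store per day.
store_daily = (
    df.groupby(["store", "date"], as_index=False)["sales"]
    .sum()
    .rename(columns={"date": "ds", "sales": "y"})
)
print(store_daily)
```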
To get ready for modeling, I’m going to split the data into a training and testing set. Since I have 5 years of data, I am going to train the model on the first 4 years and test it on the last year. My models will be specific to each individual store. There are 10 stores total, meaning there will be 10 individual models.
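A date-based split along these lines is sketched below. It assumes the store-level frame uses columns named store, ds, and y and that the data runs from 2013 through 2017, which makes January 1, 2017 the cutoff; the values are placeholders.

```python
import numpy as np
import pandas as pd

# Placeholder store-level data covering five calendar years (values are arbitrary).
dates = pd.date_range("2013-01-01", "2017-12-31", freq="D")
store_daily = pd.DataFrame({"store": 1, "ds": dates, "y": np.arange(len(dates))})

# Everything before the cutoff is training data; the final year is held out.
cutoff = pd.Timestamp("2017-01-01")
train = store_daily[store_daily["ds"] < cutoff]
test = store_daily[store_daily["ds"] >= cutoff]

print(len(train), "training days,", len(test), "test days")
```

Splitting by date rather than at random matters for time series: the model should never see observations that come after the period it is asked to predict.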
I’m going to start by applying Prophet to Store 1 sales. I’m going to filter both the training and testing data down to just Store 1. Next, I’m going to type 4 short lines of code to build the model. Yep, that’s it! How simple is that?!
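Here is a sketch of what those steps could look like, assuming a frame with store, ds, and y columns. The four numbered lines follow the standard Prophet workflow (create, fit, build future dates, predict) with default settings. The helper names `filter_store` and `forecast_store` are hypothetical, and Prophet is imported inside the function so the filtering part runs even where Prophet is not installed.

```python
import pandas as pd


def filter_store(frame, store_id):
    """Keep one store's rows and only the two columns Prophet needs."""
    return frame.loc[frame["store"] == store_id, ["ds", "y"]].reset_index(drop=True)


def forecast_store(train, periods=365):
    """Fit a default Prophet model and forecast `periods` days past the training data."""
    from prophet import Prophet  # imported lazily so the filtering step runs without it

    m = Prophet()                                      # 1. create the model
    m.fit(train)                                       # 2. fit on the ds/y history
    future = m.make_future_dataframe(periods=periods)  # 3. history dates plus future dates
    forecast = m.predict(future)                       # 4. predict for every date
    return m, forecast


# Toy store-level frame (made-up values) just to show the filtering step.
store_daily = pd.DataFrame({
    "store": [1, 2, 1, 2],
    "ds": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-02"]),
    "y": [20, 17, 20, 20],
})
train_s1 = filter_store(store_daily, 1)
print(train_s1)
```

With the real four years of Store 1 history, the call would be `m, forecast = forecast_store(train_s1)`. Because `make_future_dataframe` includes the historical dates by default, the resulting forecast covers both the training period and the 365 days after it.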
The forecasting output looks something like the image below. It contains information for:
- The forecasted value (yhat)
- Range for the forecasted values (yhat_lower and yhat_upper)
- The overall trend for a given date (trend)
- Additive terms (additive_terms), such as seasonality and holiday effects, that adjust the trend to get the forecasted value
To get the predicted value, you would do the following:
- yhat = trend + additive_terms
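To make the arithmetic concrete, here is a small worked example on hand-made numbers. The rows only imitate a few columns of Prophet output and the values are illustrative, not results from the model. It assumes the default additive seasonality, where the additive terms are the sum of the individual seasonal components (here, hypothetical weekly and yearly columns).

```python
import pandas as pd

# Hand-made rows shaped like a few Prophet output columns (illustrative values only).
toy = pd.DataFrame({
    "ds": pd.to_datetime(["2017-01-01", "2017-01-02"]),
    "trend": [4200.0, 4201.5],
    "weekly": [650.0, -480.0],
    "yearly": [-900.0, -895.0],
})

# The seasonal components add up to the additive terms...
toy["additive_terms"] = toy["weekly"] + toy["yearly"]
# ...and the additive terms shift the trend to give the forecast.
toy["yhat"] = toy["trend"] + toy["additive_terms"]

print(toy[["ds", "trend", "additive_terms", "yhat"]])
```

In the first row, a trend of 4200 with additive terms of -250 gives a forecast of 3950.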
The next step is to visualize the forecast against the training set to see the fit of the model. The blue line shows the forecasted values while the black dots are the actual values. We can see that the blue line fits the overall trend of the data, with some outliers still remaining. It’s a pretty good fit!
Now we need to test how successful the model is on the data it hasn’t seen. The forecast data frame contains the predicted values for 2017. I need to add back the actual values to see the absolute errors and forecast accuracy of the model. Remember, the model has not seen any of the 2017 data when determining the values of the output.
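Adding the actuals back can be done with a merge on the date column, as in the sketch below. Both frames are tiny made-up stand-ins: one imitates the `ds` and `yhat` columns of a forecast, the other the held-out actuals. An inner join keeps only the dates that appear in both, which drops the training-period rows from the forecast.

```python
import pandas as pd

# Made-up forecast rows spanning the end of 2016 and the start of 2017.
forecast = pd.DataFrame({
    "ds": pd.date_range("2016-12-30", periods=5, freq="D"),
    "yhat": [100.0, 110.0, 120.0, 130.0, 140.0],
})

# Made-up actual sales for the first three days of 2017.
test = pd.DataFrame({
    "ds": pd.date_range("2017-01-01", periods=3, freq="D"),
    "y": [125.0, 128.0, 150.0],
})

# Keep only dates present in both frames, then compute the daily absolute error.
results = forecast[["ds", "yhat"]].merge(test[["ds", "y"]], on="ds", how="inner")
results["abs_error"] = (results["y"] - results["yhat"]).abs()
print(results)
```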
When forecasting, I like to use 1 - weighted MAPE as an accuracy metric to determine the fit of the model. I take the absolute error between the actual and predicted values at a daily level, then add up all the errors and divide by the total actual sales. The Store 1 model has a forecast accuracy of 94.79%. Excellent!
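The metric described above can be written as a short function. This is a sketch of the calculation as described (sum of daily absolute errors divided by total actual sales, subtracted from 1); the function name `forecast_accuracy` and the sample numbers are made up for illustration.

```python
import numpy as np


def forecast_accuracy(actual, predicted):
    """Return 1 - weighted MAPE: 1 - sum(|actual - predicted|) / sum(actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    total = actual.sum()
    if total == 0:
        raise ValueError("Total actual sales is zero, so the metric is undefined.")
    return 1.0 - np.abs(actual - predicted).sum() / total


# Three made-up days: each forecast misses by 10 units on 600 units of total sales.
acc = forecast_accuracy([100, 200, 300], [110, 190, 310])
print(f"Forecast accuracy: {acc:.2%}")
```

Weighting by total sales means a 10-unit miss on a slow day counts the same as a 10-unit miss on a busy day, so low-volume days do not blow up the metric the way they can with a plain MAPE.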
Now to apply those steps for the rest of the stores. Instead of typing everything out, I’m going to get smart and put everything into a for loop to save me time. I created a list of unique stores and took the code from above and packed it into a loop. I like to have the loop print out the store number and forecast accuracy so I can see how each store is doing at a high level.
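One way to structure such a loop is sketched below. All names here are hypothetical. The loop takes the forecasting step as a function argument, so the same code can run with Prophet (`prophet_fit_predict`) or with a trivial stand-in (`naive_fit_predict`, which just repeats the training mean and is not Prophet). The toy data uses the stand-in so the snippet runs anywhere; it assumes each store's test rows are already sorted by date.

```python
import numpy as np
import pandas as pd


def prophet_fit_predict(train, test):
    """Fit a default Prophet model on `train` and predict the dates in `test`."""
    from prophet import Prophet  # imported lazily; only needed for this forecaster

    m = Prophet()
    m.fit(train[["ds", "y"]])
    return m.predict(test[["ds"]])["yhat"].to_numpy()


def naive_fit_predict(train, test):
    """Stand-in forecaster for the demo: repeat the training mean (not Prophet)."""
    return np.full(len(test), train["y"].mean())


def run_all_stores(train, test, fit_predict):
    """Fit one model per store, print its accuracy, and collect the results."""
    results = []
    for store in sorted(train["store"].unique()):
        tr = train[train["store"] == store]
        te = test[test["store"] == store].copy()
        te["yhat"] = fit_predict(tr, te)
        acc = 1 - (te["y"] - te["yhat"]).abs().sum() / te["y"].sum()
        print(f"Store {store}: forecast accuracy {acc:.2%}")
        results.append(te)
    return results


# Toy data for two stores (made-up values).
train = pd.DataFrame({
    "store": [1] * 4 + [2] * 4,
    "ds": list(pd.date_range("2016-12-28", periods=4, freq="D")) * 2,
    "y": [10.0] * 4 + [20.0] * 4,
})
test = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "ds": list(pd.date_range("2017-01-01", periods=2, freq="D")) * 2,
    "y": [10.0, 12.0, 20.0, 20.0],
})

store_results = run_all_stores(train, test, naive_fit_predict)
```

Swapping in `prophet_fit_predict` as the third argument would run the real model for each store, given enough history per store.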
Lastly, I’m going to combine all of the model results together into one big file to take a look at how the forecast did overall.
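Combining the per-store results can be sketched with `pd.concat`, as below. The two small frames are made-up stand-ins for per-store results with store, ds, y, and yhat columns. The overall accuracy uses the same 1 - weighted MAPE calculation, applied across all stores at once.

```python
import pandas as pd

dates = pd.date_range("2017-01-01", periods=2, freq="D")

# Made-up per-store results: actual sales (y) and forecasts (yhat).
per_store = [
    pd.DataFrame({"store": 1, "ds": dates, "y": [100.0, 200.0], "yhat": [90.0, 210.0]}),
    pd.DataFrame({"store": 2, "ds": dates, "y": [300.0, 400.0], "yhat": [310.0, 380.0]}),
]

# Stack every store's results into one frame.
combined = pd.concat(per_store, ignore_index=True)

# Overall accuracy: errors are measured per store per day, then pooled.
overall = 1 - (combined["y"] - combined["yhat"]).abs().sum() / combined["y"].sum()

# Daily totals across stores, handy for plotting predicted vs. actual.
daily = combined.groupby("ds", as_index=False)[["y", "yhat"]].sum()

print(f"Overall forecast accuracy: {overall:.2%}")
print(daily)
```

From here, `combined.to_csv(...)` would write everything out to a single file if one is needed.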
After visualizing the results, it’s pretty obvious that the predicted (orange) fits the actual values (gray) extremely well! The overall forecast accuracy is 94.57%, which is extremely high!
Each store model is sitting around 94% forecast accuracy with a similar overall fit.
So what?
Time series are ubiquitous in the business world. Whether you are trying to plan what sales will be or how many people to staff on a given day, time series forecasting plays a key role. FB Prophet takes a historically complex topic and makes it easy to use for hard-core data scientists and business analysts alike.