fitted values on re-fitted model seem to be affected by new data #798

Open
robjhyndman opened this issue May 1, 2019 · 5 comments

@robjhyndman
Owner

robjhyndman commented May 1, 2019

library(fpp2)
# Create training and two sets of test data
training <- subset(auscafe, end=length(auscafe)-61)
test2 <- test <- subset(auscafe, start=length(auscafe)-60)
test2[61] <- test[61] + 2

# Apply same model to all three time series
cafe.train <- Arima(training, order=c(2,1,1), seasonal=c(0,1,2), lambda=0)
cafe.test <- Arima(test, model=cafe.train)
cafe.test2 <- Arima(test2, model=cafe.train)

# Fitted values on each test set
cafe.test.fit <- fitted(cafe.test)
cafe.test2.fit <- fitted(cafe.test2)

window(cafe.test.fit, c(2017,1), c(2017,9)) 
#>           Jan      Feb      Mar      Apr      May      Jun      Jul
#> 2017 3.639529 3.313001 3.614441 3.571203 3.588871 3.483909 3.760809
#>           Aug      Sep
#> 2017 3.780694 3.748437
window(cafe.test2.fit, c(2017,1), c(2017,9))
#>           Jan      Feb      Mar      Apr      May      Jun      Jul
#> 2017 3.639529 3.313001 3.614441 3.571203 3.588871 3.483909 3.760809
#>           Aug      Sep
#> 2017 3.780694 3.812552

Created on 2019-05-01 by the reprex package (v0.2.1)

Why does the last fitted value change?

@Steviey

Steviey commented Jun 3, 2019

+1

@mitchelloharawild
Collaborator

I get the same result in fable. Why is the change in fitted values unexpected?

@robjhyndman
Owner Author

Because a fitted value is a one-step forecast, and should not be affected by the following observation.
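
One way to check this property directly is to compare the fitted value at the last time point with the one-step forecast made from the series with that last observation removed. The following is only a minimal sketch using stats::arima() and the built-in USAccDeaths series; the model orders and the variable names (y_head, fitted_last, one_step) are illustrative assumptions and are not taken from the example above.

```r
# Illustrative check: does the fitted value at time n match the
# one-step forecast made from observations 1..(n-1)?
y <- log(USAccDeaths)   # monthly ts with 72 observations
n <- length(y)

# Estimate the coefficients once (orders chosen for illustration only)
fit <- stats::arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1))

# Apply the same coefficients to the full series without re-estimation
refit_full <- stats::arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1),
                           fixed = coef(fit))
fitted_last <- as.numeric((y - residuals(refit_full))[n])

# Apply the same coefficients to the series without its last observation,
# rebuilding the ts so that the monthly frequency is kept
y_head <- ts(y[-n], start = start(y), frequency = frequency(y))
refit_head <- stats::arima(y_head, order = c(0, 1, 1), seasonal = c(0, 1, 1),
                           fixed = coef(fit))
one_step <- as.numeric(predict(refit_head, n.ahead = 1)$pred)

# If fitted values are pure one-step forecasts, these should agree
c(fitted_last = fitted_last, one_step = one_step)
```

If the two numbers differ by more than rounding error, the fitted value at time n is using information from the observation at time n itself.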

@mitchelloharawild
Collaborator

mitchelloharawild commented Jun 4, 2019

Here is a minimal reproducible example using stats::arima() directly. It seems to occur only when a seasonal MA (SMA) term is included in the model.

It also seems to happen within ARIMA_Like, which updates the model based on the data when computing the fit. I will need to dig deeper into this later.

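# Create training and two sets of test data
# Note: indexing with `[` drops the ts attributes, so these are plain numeric
# vectors with frequency 1, and the seasonal terms below default to period 1
training <- USAccDeaths[1:60]
test2 <- test <- USAccDeaths[61:72]
test2[12] <- test[12] + 2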

fit <- stats::arima(log(training), order = c(0,1,0), seasonal = c(0,1,1))
refit1 <- stats::arima(log(test), order = c(0,1,0), seasonal = c(0,1,1), fixed = coef(fit))
refit2 <- stats::arima(log(test2), order = c(0,1,0), seasonal = c(0,1,1), fixed = coef(fit))

exp((log(test) - refit1$residuals)[10:12])
#> [1] 9270.875 9187.616 8740.719
exp((log(test2) - refit2$residuals)[10:12])
#> [1] 9270.875 9187.616 8740.807

Created on 2019-06-04 by the reprex package (v0.2.1)

@ncooder

ncooder commented Jan 6, 2025

@robjhyndman @mitchelloharawild Looking at the code, this behavior actually makes sense. Fitting the model uses the Kalman filter in stats::arima(), which involves two steps: a forward pass (filtering) and a backward pass (smoothing). During the backward pass, the entire data set, including future observations, helps to refine the estimates. Because of this, the fitted values can change once you add more data, even if the model parameters stay the same. This is normal behavior for the Kalman filter, and it is not a bug.
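
To make the distinction between the two passes concrete, here is a rough sketch of a forward filter followed by a backward (Rauch-Tung-Striebel) smoother. It uses a toy local-level model, and the function name kalman_local_level and its parameters q, r, m0 and p0 are hypothetical; it is not the code inside stats::arima(), and it does not show which of the two passes arima's residuals are taken from.

```r
# Toy local-level model: y[t] = mu[t] + noise (variance r),
#                        mu[t] = mu[t-1] + noise (variance q)
kalman_local_level <- function(y, q = 0.1, r = 1, m0 = 0, p0 = 1e6) {
  n <- length(y)
  pred <- pvar <- filt <- fvar <- numeric(n)
  m <- m0
  p <- p0
  # Forward pass: pred[t] uses only y[1..(t-1)]
  for (t in seq_len(n)) {
    pred[t] <- m                      # one-step prediction of the level
    pvar[t] <- p + q                  # its variance
    k <- pvar[t] / (pvar[t] + r)      # Kalman gain
    m <- pred[t] + k * (y[t] - pred[t])
    p <- (1 - k) * pvar[t]
    filt[t] <- m
    fvar[t] <- p
  }
  # Backward pass: smooth[t] uses the whole series
  smooth <- filt
  for (t in (n - 1):1) {
    g <- fvar[t] / pvar[t + 1]
    smooth[t] <- filt[t] + g * (smooth[t + 1] - pred[t + 1])
  }
  list(pred = pred, smooth = smooth)
}

set.seed(1)
y1 <- cumsum(rnorm(30)) + rnorm(30)
y2 <- y1
y2[30] <- y2[30] + 2                  # change only the last observation

a <- kalman_local_level(y1)
b <- kalman_local_level(y2)

all.equal(a$pred, b$pred)             # TRUE: one-step predictions are unchanged
max(abs(a$smooth - b$smooth)) > 0     # TRUE: smoothed estimates do change
```

In this sketch only the smoothed estimates react to the changed final observation; the one-step predictions from the forward pass depend on earlier data alone.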
