[ENH] using TFT without past target values #1585
Comments
Did you find an answer to this question? I have the same problem/question.
I think it is fixed by this: #1667. Generally, it is hard to understand the bug without a minimal reproducible example; it would be appreciated if you could post code, or check whether the PR fixes the failure in your case.
For my issue, I didn't want the target values being sent to the encoder, which for me causes leakage when there is some future aspect in the target values. I'm not at all sure that this is the best approach, but it seems like it might work for me:

class MyTimeSeriesDataSet(TimeSeriesDataSet):
More appropriately, shouldn't there be some way of specifying which target variables should not be sent to the encoder? As for the documentation, it wasn't at all clear to me that this is what was happening, and it took me a while to understand it. The documentation should be abundantly clear about this.
Does this issue summarize the documentation request well? What would help a lot is if (in #1591) you could point exactly to the classes or methods, with import locations, where you think the documentation is currently unclear, @moogoofoo. Further, if you think the interface should change to a specific target state, an explicit explanation in this issue would be helpful.
I was having the same issue. My target was based on looking up to 20 steps into the future, so for the encoder_targets I somehow needed to ignore the final 20 steps in __getitem__ in order to avoid data leakage. I initially knew something was wrong because I was getting unreasonably high accuracies on the problem I was tackling, which caused me to go digging and see that the targets were being used intermediately at the encoder output (I remembered reading this in the TFT paper, but had then forgotten it). For MSELoss problems (and similar), you can set them to NaN, as the loss function is smart enough to know that those shouldn't factor into the loss. I couldn't find a built-in way of doing this kind of masking easily. My solution was to just add a line at line 1662 of timeseries.py (right before the return), where I set the trailing encoder targets to NaN.
Here I replace "-20" with whatever my lookahead window was when computing my target value. In this manner, I think I am ensuring that I don't have data leakage. This is a very hacky solution, so I tried to see if there was a way to add a mask or lookahead window by subclassing TimeSeriesDataSet. I got it working when constructing the TimeSeriesDataSet directly, but couldn't figure out why it didn't also work when using the from_dataset static method. I really need this to work with from_dataset for my val and test sets, so that I have the same normalization statistics derived from the training set. I need to dig in more to find a permanent solution, and could potentially make a pull request once I get it sorted out.
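A minimal sketch of this NaN-masking idea in plain PyTorch. The helper name and tensor layout here are my own assumptions for illustration, not the commenter's actual patch to timeseries.py:

```python
import torch

def mask_encoder_target(encoder_target: torch.Tensor, lookahead: int) -> torch.Tensor:
    """Hypothetical helper: NaN out the final `lookahead` encoder target steps.

    The idea from the comment above: targets computed from up to `lookahead`
    future steps must not be visible as encoder targets, so they are replaced
    with NaN and (per the commenter's report) excluded from the loss.
    """
    masked = encoder_target.clone().float()
    if lookahead > 0:
        masked[..., -lookahead:] = float("nan")
    return masked

target = torch.arange(10.0)
masked = mask_encoder_target(target, lookahead=3)
# the last 3 steps become NaN; the first 7 are untouched
```

Replacing the hard-coded "-20" with a `lookahead` argument avoids baking the window size into library code.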
Followup: here is the derived TimeSeriesDataSet class that I came up with to do the masking I need. It seems to be working as I wanted. I'm sure there are more compact ways of handling the parameters with *args and **kwargs. This allows my binary training targets to look into the future while ensuring there is no data leakage through the intermediate encoder_targets. The one thing I hadn't realized until I was working on this is that the TFT can have variable encoder series lengths. This means that if I don't set min_encoder_length to be larger than encoder_mask_len, it is possible for all encoder_targets to be masked off. I will probably try adding all three parameters (min_encoder_length, max_encoder_length, and encoder_mask_len) to my Optuna hyperparameter search.
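Since the derived class itself wasn't posted in this extract, here is a hedged sketch of the subclassing pattern described above. `_StandInBase` is a stand-in I invented so the example is self-contained; the real base would be `TimeSeriesDataSet`, whose `__getitem__` returns a richer structure, and `encoder_mask_len` is the commenter's proposed parameter, not an existing library option:

```python
import torch

class _StandInBase:
    """Stand-in for TimeSeriesDataSet (assumption: __getitem__ -> (x, y))."""
    def __getitem__(self, idx):
        x = {"encoder_target": torch.arange(8.0)}  # fake encoder targets
        y = (torch.arange(8.0, 12.0), None)        # fake decoder targets
        return x, y

class MaskedTimeSeriesDataSet(_StandInBase):
    """Mask the trailing encoder targets so future-looking labels cannot leak."""

    def __init__(self, encoder_mask_len: int = 0):
        self.encoder_mask_len = encoder_mask_len

    def __getitem__(self, idx):
        x, y = super().__getitem__(idx)
        if self.encoder_mask_len > 0:
            masked = x["encoder_target"].clone().float()
            masked[-self.encoder_mask_len:] = float("nan")
            x["encoder_target"] = masked
        return x, y

ds = MaskedTimeSeriesDataSet(encoder_mask_len=3)
x, y = ds[0]
```

As the comment warns, `min_encoder_length` should stay larger than `encoder_mask_len`, otherwise a short encoder window could end up with every target masked.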
@jpswensen, this is nice! Would you be able to contribute this in a pull request, possibly with a test case, so we can check that it works and does not break anything? That would be great!
Hi all, this question is quite interesting. Are there any updates on this?
I think the high-level update is two-fold:
Hi,
I have a question regarding the use of the Temporal Fusion Transformer (TFT) model.
Is it possible to effectively use the TFT model without providing past target values in the known or unknown inputs? Specifically, I only pass the target value as the target in the TimeSeriesDataSet class and never include past target values in the known or unknown inputs.
Could you please provide some guidance in such scenarios?
Thank you for your assistance!
Best regards,
Maha