dssg · mihirbhaskar · Mar 19, 2023
diff --git a/docs/sources/experiments/temporal-validation.md b/docs/sources/experiments/temporal-validation.md
@@ -2,14 +2,14 @@
 
 A temporal validation deep dive is currently available in the Dirty Duck tutorial. [Dirty Duck - Temporal Cross-validation](../../dirtyduck/triage_intro/#temporal-crossvalidation)
 
-You can produce the time graphs detailed in the Dirty Duck deep dive using the Triage CLI or through calling Python code directly. The graphs use matplotlib, so you'll need a matplotlib backend to use. Refer to the [matplotlib docs](https://matplotlib.org/faq/usage_faq.html) for more details.
+You can produce time graphs using the Triage CLI or by calling Python code directly. There are two options for graphing: static plots using matplotlib (as shown in the Dirty Duck tutorial) and an interactive plot using Plotly. Refer to the [matplotlib docs](https://matplotlib.org/faq/usage_faq.html) or [Plotly docs](https://plotly.com/python/) for more details.
 
 ## Python Code
 
-Plotting is supported through the `visualize_chops` function, which takes a fully configured Timechop object. You may store the configuration for this object in a YAML file if you wish and load from a file, but in this example we directly set the parameters as arguments to the Timechop object. This would enable faster iteration of time config in a notebook setting.
+Plotting is supported through the `visualize_chops` and `visualize_chops_plotly` functions, each of which takes a fully configured Timechop object. You may store the configuration for this object in a YAML file and load it from there, but in this example we set the parameters directly as arguments to the Timechop object. This enables faster iteration on the time config in a notebook setting.
 
 ```
-from triage.component.timechop.plotting import visualize_chops
+from triage.component.timechop.plotting import visualize_chops, visualize_chops_plotly
 from triage.component.timechop import Timechop
 
 chopper = Timechop(
@@ -26,17 +26,22 @@ chopper = Timechop(
     test_label_timespans=['7day'] # time period across which outcomes are labeled in test matrices
 )
 
+visualize_chops_plotly(chopper)
 visualize_chops(chopper)
 ```
 
 ## Triage CLI
 
-The Triage CLI exposes the `showtimechops` command which just takes a YAML file as input. This YAML file is expected to have a `temporal_config` section with Timechop parameters. You can use a full experiment config, or just create a YAML file with only temporal config parameters; the temporal config just has to be present. Here, we use the [example_experiment_config.yaml](https://github.com/dssg/triage/blob/master/example/config/experiment.yaml) from the Triage repository root as an example.
+The Triage CLI exposes the `showtimechops` command, which just takes a YAML file as input. Note that this only works for the static Matplotlib version. This YAML file is expected to have a `temporal_config` section with Timechop parameters. You can use a full experiment config, or just create a YAML file with only temporal config parameters; the temporal config just has to be present. Here, we use the [example_experiment_config.yaml](https://github.com/dssg/triage/blob/master/example/config/experiment.yaml) from the Triage repository root as an example.
 
 `triage experiment example_experiment_config.yaml --show-timechops`
 
 ## Result
 
-Using either method, you should see output similar to this:
+For the interactive version, you should see output similar to [this notebook](https://colab.research.google.com/drive/1BjWZLEynQK-7DOSEP5zhT_-RefIb8gGS?usp=sharing).
+
+For the static graph, you should see output similar to this:
 
 ![time chop visualization](timechops.png)
+
+
diff --git a/requirement/main.txt b/requirement/main.txt
@@ -27,6 +27,6 @@ matplotlib==3.5.1
 pandas==1.3.5 # pyup: ignore
 seaborn==0.11.2
 ohio==0.5.0
-
+plotly==5.13.1
 
 aequitas==0.42.0
diff --git a/src/triage/component/timechop/plotting.py b/src/triage/component/timechop/plotting.py
@@ -4,6 +4,11 @@
 import numpy as np
 from triage.util.conf import convert_str_to_relativedelta
 import matplotlib.pyplot as plt
+import plotly.express as px
+from plotly.subplots import make_subplots
+import plotly.graph_objects as go
+from datetime import datetime
+
 
 
 FIG_SIZE = (32, 16)
@@ -118,3 +123,243 @@ def visualize_chops(chopper, show_as_of_times=True, show_boundaries=True, save_t
     if save_target:
         plt.savefig(save_target)
     plt.show()
+
+def visualize_chops_plotly(chopper, selected_splits=None, show_label_timespans=True, show_boxes=True, show_annotations=True):
+    """Visualize time chops of a given Timechop object using plotly, to get an interactive output
+
+    Args:
+        chopper (triage.component.timechop.Timechop): A fully-configured Timechop object
+        selected_splits (list): Indices of train-val sets to plot, where index 0 is the most recent split. E.g. [0, 1, 2] plots the 3 most recent splits; [0, -1] plots the most recent and the oldest splits.
+            Defaults to None, which plots all splits.
+        show_label_timespans (bool): Whether or not to draw horizontal lines to show label timespan
+            for as-of-times
+        show_boxes (bool): Whether or not to show a rectangle highlighting train-test matrices
+        show_annotations (bool): Whether or not to add annotations on the latest split, showing what each of the timechop parameters mean
+    """
+    chops = chopper.chop_time()
+    chops.reverse() # reverse to get the most recent set first
+
+    # Subset to relevant splits if arg specified, and generate titles for each split
+    if selected_splits is not None:
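+      titles = tuple(f"Train-Validation Split {i % len(chops) + 1}" for i in selected_splits) # computed before subsetting so negative indices (e.g. -1) get the right split number
+      chops = [chops[i] for i in selected_splits]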
+    else:
+      titles = tuple(f"Train-Validation Split {i+1}" for i in range(len(chops)))
+
+    fig = make_subplots(rows=len(chops), 
+                        cols=1,
+                        shared_xaxes=True,
+                        shared_yaxes=True,
+                        vertical_spacing=0.05,
+                        subplot_titles=titles) # adds titles for each subplot
+
+    # For each train-val split
+    for idx, chop in enumerate(chops):
+        train_as_of_times = chop["train_matrix"]["as_of_times"]
+        test_as_of_times = chop["test_matrices"][0]["as_of_times"]
+
+        test_label_timespan = chop["test_matrices"][0]["test_label_timespan"]
+        training_label_timespan = chop["train_matrix"]["training_label_timespan"]
+
+        # Colors for train/test 
+        train_color = "rgba(3, 37, 126" # dark blue (left open because we add an opacity argument below)
+        test_color = "rgba(139, 0, 0" # dark red (left open because we add an opacity argument below)
+        as_of_date_marker_opacity = ', 1)' # the extra ', 1)' defines opacity. 100% solid for markers
+        label_line_opacity = ', 0.3)' # 30% opacity for the label lines
+        rectangle_fill_opacity = ', 0.15)' # 15% opacity for rectangle fill
+
+        train_as_of_date_color = train_color + as_of_date_marker_opacity
+        train_label_period_color = train_color + label_line_opacity
+        train_rectangle_fill = train_color + rectangle_fill_opacity
+        test_as_of_date_color = test_color + as_of_date_marker_opacity
+        test_label_period_color = test_color + label_line_opacity
+        test_rectangle_fill = test_color + rectangle_fill_opacity
+
+        # Show legend only if idx = 0 (i.e. first train-val set we are displaying)
+        if idx == 0:
+          # Train set as-of-date markers
+          fig.add_trace(
+              go.Scatter(x=[x.date() for x in train_as_of_times], 
+                        y=[x for x in range(len(train_as_of_times))], 
+                        mode='markers',
+                        marker=dict(color=train_as_of_date_color),
+                        name='Training as-of-date',
+                        showlegend=True,
+                        hovertemplate="%{x}<extra></extra>" # the extra extra tag gets rid of a default 'trace' line in the hover output and just shows 'x', the date
+                        ), 
+              row=idx+1, # row and column of the subplots to add this trace object to
+              col=1
+              )
+          # Validation set as-of-date markers
+          fig.add_trace(
+            go.Scatter(x=[x for x in test_as_of_times], 
+                      y=[x for x in range(len(test_as_of_times))], 
+                      mode='markers',
+                      name='Validation as-of-date',
+                      showlegend=True,
+                      marker=dict(color=test_as_of_date_color),
+                      hovertemplate="%{x}<extra></extra>"),
+            row=idx+1,
+            col=1
+            )
+        # Suppress legend if not the first subplot; only difference with above is showlegend=False (note, anytime we add a trace, we have to set showlegend=False to suppress useless info in the legend)
+        else:
+          # Train set as-of-date markers
+          fig.add_trace(
+              go.Scatter(x=[x.date() for x in train_as_of_times], 
+                        y=[x for x in range(len(train_as_of_times))], 
+                        mode='markers',
+                        marker=dict(color=train_as_of_date_color),
+                        name='Training as-of-date',
+                        showlegend=False,
+                        hovertemplate="%{x}<extra></extra>" # the extra extra tag gets rid of a default 'trace' line in the hover output and just shows 'x', the date
+                        ), 
+              row=idx+1, # row and column of the subplots to add this trace object to
+              col=1
+              )
+
+          # Validation set as-of-date markers
+          fig.add_trace(
+            go.Scatter(x=[x for x in test_as_of_times], 
+                      y=[x for x in range(len(test_as_of_times))], 
+                      mode='markers',
+                      name='Validation as-of-date',
+                      showlegend=False,
+                      marker=dict(color=test_as_of_date_color),
+                      hovertemplate="%{x}<extra></extra>"),
+            row=idx+1,
+            col=1
+            )
+
+
+        # Add test_durations annotation if option selected
+        if idx == 0 and show_annotations==True:
+
+          # Add a dashed line to show test_durations span
+          x0 = test_as_of_times[0]
+          x1 = test_as_of_times[-1]
+          x_mid = x0 + (x1-x0)/2
+          y = -1 # place the test durations labeling below the graph
+          fig.add_shape(type='line', x0=x0, x1=x1, y0=y, y1=y, line={'color': 'green'}, row=idx+1, col=1)
+          fig.add_annotation(x=x_mid, y=y-1, text=f"Test duration: {chop['test_matrices'][0]['test_duration']}", showarrow=False)
+
+        # Add label timespan lines if option selected
+        if show_label_timespans is True:
+
+          # For training as_of_dates
+          for i in range(len(train_as_of_times)):
+            fig.add_trace(
+                go.Scatter(
+                    x=[train_as_of_times[i].date(), train_as_of_times[i].date() + convert_str_to_relativedelta(training_label_timespan)],
+                    y=[i,i],
+                    marker=dict(color=train_label_period_color, line=dict(color=train_label_period_color)),
+                    hovertemplate="%{x}<extra></extra>",
+                    showlegend=False
+                ),
+              row=idx+1,
+              col=1 
+            )
+
+            # Add annotation showing train label timespan on the last bar (i == len - 1) in the first train-val set (if option specified)
+            if i == len(train_as_of_times)-1 and idx == 0 and show_annotations==True:
+
+              # Have the x in between the label timespan
+              x0 = train_as_of_times[i].date()
+              x1 = train_as_of_times[i].date() + convert_str_to_relativedelta(training_label_timespan)
+              x_pos = x0 + (x1 - x0)/2
+
+              # Position at a y-value above the bar
+              y_pos = i
+              fig.add_annotation(x=x_pos, y=y_pos, text='Label timespan', showarrow=True, arrowhead=1, row=idx+1, col=1)
+
+          # For test as_of_dates
+          for i in range(len(test_as_of_times)):
+            fig.add_trace(
+                go.Scatter(
+                    x=[test_as_of_times[i].date(), test_as_of_times[i].date() + convert_str_to_relativedelta(test_label_timespan)],
+                    y=[i,i],
+                    marker=dict(color= test_label_period_color, line=dict(color= test_label_period_color)),
+                    showlegend=False,
+                    hovertemplate="%{x}<extra></extra>"),
+              row=idx+1,
+              col=1 
+            )
+
+            # Add annotation showing test label timespan on the last bar (i == len - 1) in the first train-val set (if option specified)
+            if i == len(test_as_of_times)-1 and idx == 0 and show_annotations==True:
+
+                # Have the x in between the label timespan
+                x0 = test_as_of_times[i].date()
+                x1 = test_as_of_times[i].date() + convert_str_to_relativedelta(test_label_timespan)
+                x_pos = x0 + (x1 - x0)/2
+
+                # Position at a y-value above the bar
+                y_pos = i
+                fig.add_annotation(x=x_pos, y=y_pos, text='Label timespan', showarrow=True, arrowhead=1, row=idx+1, col=1)
+
+        # Add rectangles/boxes to mark train-test matrices
+        if show_boxes is True:
+
+          # Training matrix rectangle
+          # Rectangle params
+          x0 = min(train_as_of_times).date()
+          x1 = max(train_as_of_times).date() + convert_str_to_relativedelta(training_label_timespan)
+          y = max(len(test_as_of_times), len(train_as_of_times))
+
+          fig.add_trace(
+              go.Scatter(x =[x0,x0,x1,x1,x0], y=[0,y,y,0,0], 
+                        fill='toself', fillcolor=train_rectangle_fill,
+                        showlegend=False,
+                        marker=dict(color='rgba(0,255,0,0)', line=dict(color='rgba(0,255,0,0)')), # setting 0 opacity so we don't see the lines or markers
+                        hoverinfo='skip'), 
+              row=idx+1,
+              col=1,
+          )
+
+          # #Add annotated text to the middle of the training set rectangle -> this code works, but the positioning is a bit weird, so need to tweak
+          # middle_index = round(len(train_as_of_times)/2)
+          # x_middle = train_as_of_times[middle_index].date() + convert_str_to_relativedelta(training_label_timespan)
+          # fig.add_trace(
+          #     go.Scatter(x =[x_middle], y=[y-1], 
+          #               mode='text',
+          #               text="Training Data",
+          #               marker=dict(color='rgba(0,255,0,0)', line=dict(color='rgba(0,255,0,0)')), # setting 0 opacity so we don't see the lines 
+          #               hoverinfo='skip'),
+          #     row=idx+1,
+          #     col=1,
+          # )
+
+          # Test set rectangle
+
+          # Rectangle params
+          x0 = min(test_as_of_times).date()
+          x1 = max(test_as_of_times).date() + convert_str_to_relativedelta(test_label_timespan)
+          y = max(len(test_as_of_times), len(train_as_of_times))
+
+          fig.add_trace(
+              go.Scatter(x =[x0,x0,x1,x1,x0], y=[0,y,y,0,0], 
+                        fill='toself', fillcolor=test_rectangle_fill,
+                        showlegend=False,
+                        marker=dict(color='rgba(0,255,0,0)', line=dict(color='rgba(0,255,0,0)')), # setting 0 opacity so we don't see the lines 
+                        hoverinfo='skip'),
+              row=idx+1,
+              col=1,
+          )
+
+          # #Add annotated text to the test set rectangle
+          # middle_index = round(len(test_as_of_times)/2)
+          # x_middle = test_as_of_times[middle_index].date() + convert_str_to_relativedelta(test_label_timespan)
+          # fig.add_trace(
+          #     go.Scatter(x =[x_middle], y=[y-1],  
+          #               mode='text',
+          #               text="Test Data",
+          #               marker=dict(color='rgba(0,255,0,0)', line=dict(color='rgba(0,255,0,0)')), # setting 0 opacity so we don't see the lines 
+          #               hoverinfo='skip'),
+          #     row=idx+1,
+          #     col=1,
+          # )
+
+    fig.update_layout(height=500, width=900, showlegend=True)
+    fig.show()
+
+
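
The split selection and subplot-title numbering in `visualize_chops_plotly` can be sketched in isolation, without Plotly or Triage installed. The snippet below is only an illustration under stated assumptions: `select_splits` is a hypothetical helper that does not exist in Triage, and the string list stands in for the dicts returned by `chop_time()` after they have been reversed to most-recent-first order. It shows one way to keep titles consistent when `selected_splits` contains negative indices such as `-1`.

```python
def select_splits(chops, selected_splits=None):
    """Return (subset, titles) for a list of chops ordered most-recent-first.

    Hypothetical helper for illustration; not part of Triage.
    """
    positions = range(len(chops))
    if selected_splits is None:
        selected_splits = positions
    # Indexing a range normalizes negative indices (-1 -> len - 1)
    # and raises IndexError for out-of-range values.
    normalized = [positions[i] for i in selected_splits]
    subset = [chops[i] for i in normalized]
    titles = tuple(f"Train-Validation Split {i + 1}" for i in normalized)
    return subset, titles


# Stand-ins for Timechop split dicts, most recent first
chops = ["newest", "middle", "oldest"]

subset, titles = select_splits(chops, [0, -1])
print(subset)
print(titles)
```

With this approach the title always reflects a split's position in the full most-recent-first list, so a split selected with `-1` is numbered as the last split rather than as split 0.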