Hi, I found a problem with the data preprocessing functions.
The problem shows up when we use the library's `data_pipeline` function to preprocess our data and then want to match the model's results for each sequence to its id. To the point: the `data_pipeline` function in the `wtte.pipelines` module seems to return `seq_ids` in the wrong order, which breaks the seq_index-to-seq_id mapping.

The bug is in the `df_to_array` function, in its second statement: `unique_ids = list(grouped.groups.keys())`. The grouped sequences are not guaranteed to be in the same order as the ids, so the padded feature array built from them can be ordered differently from the `seq_ids` returned by `data_pipeline`. This happens because `data_pipeline` returns sequence ids in the order `id_col` has in the passed pandas dataframe, while `df_to_array` builds the feature sequences in pandas groupby order, which may differ, as it did in my case.

My suggested fix (the simplest one) is to change `unique_ids = list(grouped.groups.keys())` to `unique_ids = df[id_col].unique()` in `df_to_array`.
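To illustrate the mismatch outside the library, here is a minimal sketch using plain pandas. The dataframe, the column names `id` and `t`, and the variable names are made up for illustration and are not taken from `wtte`; the exact key order returned by `grouped.groups` may also depend on the pandas and Python versions in use.

```python
import pandas as pd

# Hypothetical toy data: ids appear in the order b, a, c.
df = pd.DataFrame({
    "id": ["b", "b", "a", "c", "a"],
    "t": [0, 1, 0, 0, 1],
})

grouped = df.groupby("id")  # sort=True is the default

# Order used by the current code: keys of the groups mapping.
# This is not tied to the order in which ids appear in df.
groupby_ids = list(grouped.groups.keys())

# Order used by the suggested fix: first appearance in the dataframe.
appearance_ids = list(df["id"].unique())

print("groupby order:   ", groupby_ids)
print("appearance order:", appearance_ids)
print("orders match:    ", groupby_ids == appearance_ids)
```

If the two lists differ, any array filled by iterating over one of them cannot be indexed with the other, which is the mapping problem described above.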