-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add distance selector #78
Conversation
update distance selector
update active learning for distance selector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, but the documentation at this stage is either not clear or missing completely. It needs to be added to keep the code understandable. Also, some names of the variables are not self-explanatory and should be improved.
mlptrain/training/active.py
Outdated
if traj.final_frame.energy.true is None: | ||
traj.final_frame.single_point(method_name, n_cores=n_cores) | ||
if selector.check: | ||
logger.warning('distance selector, do backtracking') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide please a better logger message on what is happening. Not clear why backtracking is needed here. It looks like it is always performed with distance selector, is it the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it always performed with distance selector, changed to "currently applying distance selector, to avoid un-physical structures, do backtracking in the trajectory to find the first configuration in {selector.n_backtrack} steps recognised as outlier"
mlptrain/training/active.py
Outdated
back_traj.append(i) | ||
|
||
for i, frame in enumerate(back_traj): | ||
logger.info(f'check {i} th config') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please improve logger message. "Check" sounds a bit vague. What is being checked? Also, gerundium would work better, i.e., "checking" instead of check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to 'Starting to check {i} th configuration to determine whether it is the first configurations selected by the distance selector'
mlptrain/training/selection.py
Outdated
|
||
def Novelty (configuration: 'mlptrain.Configuration', | ||
configurations:'mlptrain.ConfigurationSet', | ||
d_reduaction: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly typo: d_reduaction -> d_reduction? Also not clear from the name what d is. Variable dim_reduction could work better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mlptrain/training/selection.py
Outdated
m1 = pca.fit_transform(m1) | ||
v1 = pca.transform(v1) | ||
|
||
clf = LocalOutlierFactor(n_neighbors=15, metric = distance_metric, novelty=True, contamination = 0.2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N_neighbors should be user user-defined variable. It is also not clear where n_neighbors and contamination defaults are coming from. Should be explained in a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
mlptrain/training/selection.py
Outdated
Selection criteria based on LOF | ||
----------------------------------------------------------------------- | ||
Arguments: | ||
pca: whether to do dimenstional reduction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extend the explanation of the class. Explain also why is PCA used, what is it doing and what are the advantages/disadvantages - will it increase the cost of the selector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
mlptrain/training/selection.py
Outdated
def Novelty (configuration: 'mlptrain.Configuration', | ||
configurations:'mlptrain.ConfigurationSet', | ||
d_reduaction: bool = False, | ||
distance_metric: str = "euclidean"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add return type and document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Co-authored-by: Veronika Juraskova <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, suggested minor modifications of the function descriptions. I will commit them and merge the PR to main.
Modification of function descriptions
add distance selector, also relevant modification in AL to make sure no unphysical configurations will be selected