Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add distance selector #78

Merged
merged 30 commits into from
Feb 14, 2024
Merged

add distance selector #78

merged 30 commits into from
Feb 14, 2024

Conversation

Hanwen1018
Copy link
Member

add distance selector, also relevant modification in AL to make sure no unphysical configurations will be selected

Copy link
Member

@juraskov juraskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, but the documentation at this stage is either not clear or missing completely. It needs to be added to keep the code understandable. Also, some names of the variables are not self-explanatory and should be improved.

if traj.final_frame.energy.true is None:
traj.final_frame.single_point(method_name, n_cores=n_cores)
if selector.check:
logger.warning('distance selector, do backtracking')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide please a better logger message on what is happening. Not clear why backtracking is needed here. It looks like it is always performed with distance selector, is it the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it always performed with distance selector, changed to "currently applying distance selector, to avoid un-physical structures, do backtracking in the trajectory to find the first configuration in {selector.n_backtrack} steps recognised as outlier"

back_traj.append(i)

for i, frame in enumerate(back_traj):
logger.info(f'check {i} th config')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please improve logger message. "Check" sounds a bit vague. What is being checked? Also, gerundium would work better, i.e., "checking" instead of check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to 'Starting to check {i} th configuration to determine whether it is the first configurations selected by the distance selector'

mlptrain/training/active.py Outdated Show resolved Hide resolved
mlptrain/training/selection.py Show resolved Hide resolved

def Novelty (configuration: 'mlptrain.Configuration',
configurations:'mlptrain.ConfigurationSet',
d_reduaction: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly typo: d_reduaction -> d_reduction? Also not clear from the name what d is. Variable dim_reduction could work better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

m1 = pca.fit_transform(m1)
v1 = pca.transform(v1)

clf = LocalOutlierFactor(n_neighbors=15, metric = distance_metric, novelty=True, contamination = 0.2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N_neighbors should be user user-defined variable. It is also not clear where n_neighbors and contamination defaults are coming from. Should be explained in a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Selection criteria based on LOF
-----------------------------------------------------------------------
Arguments:
pca: whether to do dimenstional reduction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extend the explanation of the class. Explain also why is PCA used, what is it doing and what are the advantages/disadvantages - will it increase the cost of the selector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

def Novelty (configuration: 'mlptrain.Configuration',
configurations:'mlptrain.ConfigurationSet',
d_reduaction: bool = False,
distance_metric: str = "euclidean"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add return type and document it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

mlptrain/training/selection.py Show resolved Hide resolved
mlptrain/training/selection.py Show resolved Hide resolved
@juraskov juraskov self-requested a review December 16, 2023 00:07
Copy link
Member

@juraskov juraskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, suggested minor modifications of the function descriptions. I will commit them and merge the PR to main.

mlptrain/training/selection.py Outdated Show resolved Hide resolved
mlptrain/training/selection.py Outdated Show resolved Hide resolved
mlptrain/training/selection.py Outdated Show resolved Hide resolved
Modification of function descriptions
@juraskov juraskov merged commit c3f9c7e into duartegroup:main Feb 14, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants