Add documentation for EnsembleMetrics #50

Open
wants to merge 30 commits into
base: master
1eed1f8
Adding ensemble metrics to sidebars.
vmullig Feb 8, 2022
9156829
Updating main RosettaScripts page.
vmullig Feb 8, 2022
35cb717
Adding page for CentralTendency metric.
vmullig Feb 8, 2022
cc023e6
Updating auto-generated docs.
vmullig Feb 8, 2022
a18aca6
Adding auto-generated ensemble metric docs.
vmullig Feb 8, 2022
3e8cc7f
Updating CentralTendency ensemble metric doc.
vmullig Feb 8, 2022
0e5918f
Working on documentation for EnsembleMetrics.
vmullig Feb 10, 2022
b3730e1
Fleshing out EnsembleMetric documentation.
vmullig Feb 10, 2022
97d7892
Updating auto-generated docs.
vmullig Feb 10, 2022
ca9eebd
Adding note about accessing named values.
vmullig Feb 10, 2022
bbfdb09
Adding note about filtering.
vmullig Feb 10, 2022
9dd6e89
Revising text slightly.
vmullig Feb 10, 2022
9629b4b
Adding note about MPI mode.
vmullig Feb 10, 2022
3301900
Adding example of internal generation mode.
vmullig Feb 10, 2022
a5778bb
Adding note about multithreading.
vmullig Feb 10, 2022
a045962
Updating note about multi-threading.
vmullig Feb 11, 2022
eb93e85
Adding example for mode 3.
vmullig Feb 11, 2022
62a9158
Add EnsembleFilter docs to filter list.
vmullig Feb 11, 2022
d38608f
Moving some filters that were in the wrong folder.
vmullig Feb 11, 2022
11215f2
Adding documentation for EnsembleFilter.
vmullig Feb 11, 2022
b7e9ef7
Minor typos.
vmullig Feb 11, 2022
fc6e674
Expanding note about mode.
vmullig Feb 11, 2022
08dc81e
Minor tweak.
vmullig Feb 11, 2022
0b2cc38
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Feb 25, 2022
41dacf1
Updating CentralTendency and FragmentScore auto-generated docs.
vmullig Feb 25, 2022
22503cc
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Mar 11, 2022
99146f7
Updating auto-generated docs.
vmullig Mar 11, 2022
2549d1b
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Apr 27, 2022
f2cbac0
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Jul 2, 2022
802ce09
Merge remote-tracking branch 'origin/master' into vmullig/ensemble_me…
vmullig Oct 11, 2022

Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# CentralTendency Ensemble Metric
*Back to [[SimpleMetrics]] page.*
## CentralTendency Ensemble Metric

[[_TOC_]]

### Description

The Central Tendency metric accepts as input a real-valued [[SimpleMetric|SimpleMetrics]]. It then applies it to each pose in an ensemble, collecting a series of values. At reporting time, the metric computes measures of central tendency (mean, median, and mode), plus other descriptive statistics about the distribution of the measured value over the ensemble (standard deviation, standard error, min, max, range).

### Author and history

Created Tuesday, 8 February 2022 by Vikram K. Mulligan, Center for Computational Biology, Flatiron Institute ([email protected]). This was the first [[EnsembleMetric|EnsembleMetrics]] implemented.

### Interface

[[include:ensemble_metric_CentralTendency_type]]

### Named values produced

Measure | Name (used for the [[EnsembleFilter]]) | Description
--------|----------------------------------------|------------
Mean | mean | The average of the values measured for the poses in the ensemble.
Median | median | When values measured from all of the poses in the ensemble are listed in increasing order, this is the middle value. If the number of poses in the ensemble is even, the middle two values are averaged.
Mode | mode | The most frequently seen value in the values measured from the poses in the ensemble. If more than one value appears with equal frequency and this frequency is highest, the values are averaged.
Standard Deviation | stddev | Estimate of the standard deviation of the measured values, defined as sqrt( sum_i( (S_i - mean)^2 ) / N ), where S_i is the ith sample, mean is the average of all the samples, and N is the number of samples.
Standard Error | stderr | Estimate of the standard error of the mean, defined by stddev / sqrt(N), where N is the number of samples.
Min | min | The minimum value seen.
Max | max | The maximum value seen.
Range | range | The largest value seen minus the smallest.
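
The definitions in the table can be illustrated with a short Python sketch. This is not the Rosetta implementation: the function name `central_tendency` and the returned dictionary are hypothetical, and the sketch only assumes the formulas as written above (division by N for the standard deviation, averaging of tied modes).

```python
import math
from collections import Counter


def central_tendency(values):
    """Compute the named values from the table for a list of numbers.

    Hypothetical helper for illustration only; not part of Rosetta.
    """
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    ordered = sorted(values)
    mean = sum(ordered) / n

    # Median: middle value, or the average of the two middle values if N is even.
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2.0

    # Mode: most frequent value; ties for highest frequency are averaged.
    counts = Counter(ordered)
    top = max(counts.values())
    tied = [v for v, c in counts.items() if c == top]
    mode = sum(tied) / len(tied)

    # Standard deviation divides by N, as in the table; stderr = stddev / sqrt(N).
    stddev = math.sqrt(sum((s - mean) ** 2 for s in ordered) / n)
    stderr = stddev / math.sqrt(n)

    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "stddev": stddev,
        "stderr": stderr,
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
    }


stats = central_tendency([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in stats.items():
    print(name, value)
```

The dictionary keys match the names in the second column of the table, which are the names passed to the [[EnsembleFilter]].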

#### Note about mode

The mode of a set of floating-point numbers can be thrown off by floating-point error. For instance, two poses may have energies of -3.7641 kJ/mol, but the process of computing that energy may result in slightly different values at the 15th decimal place. This could prevent the metric from recognizing this as the most frequent value. Mode is most useful as a metric when the "floating-point" values are actually integers (for instance, given a [[SimpleMetric|SimpleMetrics]] like the [[SelectedResidueCountMetric]], which returns integer counts).
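
The effect can be reproduced outside Rosetta with a few lines of Python. In this sketch, `mode_of` is a hypothetical helper, and the optional rounding step is only an illustration of post-processing one might apply to reported values; it is not an option of the CentralTendency metric.

```python
from collections import Counter


def mode_of(values, decimals=None):
    """Return (mode, count), averaging ties; optionally round values first.

    Hypothetical helper for illustration only.
    """
    if decimals is not None:
        values = [round(v, decimals) for v in values]
    counts = Counter(values)
    top = max(counts.values())
    tied = [v for v, c in counts.items() if c == top]
    return sum(tied) / len(tied), top


a = 0.1 + 0.2  # differs from 0.3 in the last binary digits
b = 0.3
values = [a, b, 0.5]

# Without rounding, a and b count as different values, so every value
# appears once and the "mode" is just the average of all three.
raw_mode, raw_count = mode_of(values)

# After rounding to 9 decimal places, a and b coincide and 0.3 is the mode.
rounded_mode, rounded_count = mode_of(values, decimals=9)

print(a == b, raw_count, rounded_mode, rounded_count)
```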

## See Also

* [[SimpleMetrics]]: Available SimpleMetrics.
* [[EnsembleMetrics]]: Available EnsembleMetrics.
* [[I want to do x]]: Guide to choosing a tool in Rosetta.
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
@@ -14,6 +14,10 @@

* [[Filters|Filters-RosettaScripts]]

* [[Simple Metrics|SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Residue Selectors|ResidueSelectors]]

* [[PackerPalettes|PackerPalette]]
@@ -14,6 +14,10 @@

* [[Residue Selectors|ResidueSelectors]]

* [[Simple Metrics|SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[PackerPalettes|PackerPalette]]

* [[Filters|Filters-RosettaScripts]]
@@ -37,6 +37,7 @@ Filter | Description
**[[CompoundStatement|CompoundStatementFilter]]** | Uses previously defined filters with logical operations to construct a compound filter.
**[[CombinedValue|CombinedValueFilter]]** | Weighted sum of multiple filters.
**[[CalculatorFilter]]** | Combine multiple filters with a mathematical expression.
**[[EnsembleFilter]]** | Filter based, not on a property of a single pose, but on a property of an _ensemble_ of many poses.
**[[ReplicateFilter]]** | Repeat a filter multiple times and average.
**[[Boltzmann|BoltzmannFilter]]** | Boltzmann weighted sum of positive/negative filters.
**[[MoveBeforeFilter]]** | Apply a mover before applying the filter.
2 changes: 2 additions & 0 deletions scripting_documentation/RosettaScripts/Filters/_Sidebar.md
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics | EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
@@ -0,0 +1,132 @@
# EnsembleFilter
*Back to [[SimpleMetrics]] page.*
*Back to [[Filters | Filters-RosettaScripts]] page.*
## EnsembleFilter

Created by Vikram K. Mulligan ([email protected]) on 10 February 2022.

[[_TOC_]]

### Description

This filter takes as input an [[EnsembleMetric|EnsembleMetrics]] that has been used to evaluate some set of properties of an ensemble of poses, retrieves a named floating-point value from the metric, and filters based on whether that value is greater than, equal to, or less than some threshold. (Note that [[EnsembleMetrics]] evaluate a property of a collection or _ensemble_ of poses, not of a single pose. This makes this filter unusual: where most discard a trajectory based on the state of a single pose, this can discard a trajectory based on the state of a large ensemble of poses -- for example, based on many sampled conformations of a single design.)


### Options

[[include:filter_EnsembleFilter_type]]

### Example:

In this example, we load one or more cyclic peptides (provided with the `-in:file:s` or `-in:file:l` command-line options) and, for each peptide, generate an ensemble of slightly perturbed conformations _in memory_, without writing all structures to disk. We then analyse that ensemble with the [[CentralTendency EnsembleMetric|CentralTendency]] and filter on the results with the EnsembleFilter. Only those peptides that have low-energy ensembles of perturbed conformations pass the filter.

```xml
<ROSETTASCRIPTS>
<!-- Example of using the EnsembleFilter to filter based on the properties of an ensemble of poses
generated from the current pose. -->
<SCOREFXNS>
<ScoreFunction name="r15" weights="ref2015.wts" />
</SCOREFXNS>
<MOVERS>
<!-- The movers that set up, perturb, and relax a cyclic peptide are set up here. We
later bundle the perturbation protocol in a ParsedProtocol: -->
<DeclareBond name="connect_termini" res1="8" res2="1" atom1="C" atom2="N" add_termini="true" />
<GeneralizedKIC name="perturb1" selector_scorefunction="r15" closure_attempts="200"
stop_when_n_solutions_found="1" selector="lowest_rmsd_selector"
>
<AddResidue res_index="3"/>
<AddResidue res_index="4"/>
<AddResidue res_index="5"/>
<AddResidue res_index="6"/>
<AddResidue res_index="7"/>
<SetPivots res1="3" atom1="CA" res2="5" atom2="CA" res3="7" atom3="CA" />
<AddPerturber effect="perturb_dihedral" >
<AddAtoms res1="3" atom1="N" res2="3" atom2="CA" />
<AddAtoms res1="3" atom1="CA" res2="3" atom2="C" />
<AddAtoms res1="4" atom1="N" res2="4" atom2="CA" />
<AddAtoms res1="4" atom1="CA" res2="4" atom2="C" />
<AddAtoms res1="5" atom1="N" res2="5" atom2="CA" />
<AddAtoms res1="5" atom1="CA" res2="5" atom2="C" />
<AddAtoms res1="6" atom1="N" res2="6" atom2="CA" />
<AddAtoms res1="6" atom1="CA" res2="6" atom2="C" />
<AddAtoms res1="7" atom1="N" res2="7" atom2="CA" />
<AddAtoms res1="7" atom1="CA" res2="7" atom2="C" />
<AddValue value="5.0"/>
</AddPerturber>
</GeneralizedKIC>
<GeneralizedKIC name="perturb2" selector_scorefunction="r15" closure_attempts="200"
stop_when_n_solutions_found="1" selector="lowest_rmsd_selector"
>
<AddResidue res_index="7"/>
<AddResidue res_index="1"/>
<AddResidue res_index="2"/>
<AddResidue res_index="3"/>
<AddResidue res_index="4"/>
<SetPivots res1="7" atom1="CA" res2="2" atom2="CA" res3="4" atom3="CA"></SetPivots>
<AddPerturber effect="perturb_dihedral" >
<AddAtoms res1="7" atom1="N" res2="7" atom2="CA" />
<AddAtoms res1="7" atom1="CA" res2="7" atom2="C" />
<AddAtoms res1="1" atom1="N" res2="1" atom2="CA" />
<AddAtoms res1="1" atom1="CA" res2="1" atom2="C" />
<AddAtoms res1="2" atom1="N" res2="2" atom2="CA" />
<AddAtoms res1="2" atom1="CA" res2="2" atom2="C" />
<AddAtoms res1="3" atom1="N" res2="3" atom2="CA" />
<AddAtoms res1="3" atom1="CA" res2="3" atom2="C" />
<AddAtoms res1="4" atom1="N" res2="4" atom2="CA" />
<AddAtoms res1="4" atom1="CA" res2="4" atom2="C" />
<AddValue value="5.0"/>
</AddPerturber>
</GeneralizedKIC>
<FastRelax name="frlx" repeats="1" scorefxn="r15" />
<!-- Bundling the perturbation steps together so that they can be passed
to the CentralTendency EnsembleMetric: -->
<ParsedProtocol name="ensemble_generating_protocol" >
<Add mover="perturb1" />
<Add mover="perturb2" />
<Add mover="frlx" />
</ParsedProtocol>
</MOVERS>
<SIMPLE_METRICS>
<!-- The SimpleMetric that will be passed to the CentralTendency EnsembleMetric: -->
<TotalEnergyMetric name="total_energy" scorefxn="r15" />
</SIMPLE_METRICS>
<ENSEMBLE_METRICS>
<!-- Setting up the EnsembleMetric with both a SimpleMetric and a
ParsedProtocol for generating the ensemble from a given pose: -->
<CentralTendency name="avg_energy" n_threads="0" real_valued_metric="total_energy"
output_mode="tracer_and_file" output_filename="report.txt"
ensemble_generating_protocol="ensemble_generating_protocol"
ensemble_generating_protocol_repeats="20"
/>
</ENSEMBLE_METRICS>
<FILTERS>
<!-- Set up the filter that can discard those peptides that yield an
ensemble with energy above a cutoff threshold: -->
<EnsembleFilter name="filter_on_avg_energy" ensemble_metric="avg_energy"
named_value="mean" filter_acceptance_mode="less_than_or_equal"
threshold="4.0"
/>
</FILTERS>
<PROTOCOLS>
<!-- Set up the peptide, but don't perturb it yet: -->
<Add mover="connect_termini" />
<!-- Accumulate data with the EnsembleMetric for every replicate of the
perturbation protocol (which in this case is run by the EnsembleMetric,
generating each member of the ensemble internally, in memory, without
exporting them): -->
<Add ensemble_metrics="avg_energy" />
<!-- Abandon the jobs that produce bad ensemble properties prior to
writing the structure back to disk: -->
<Add filter="filter_on_avg_energy" />
</PROTOCOLS>
<OUTPUT scorefxn="r15" />
</ROSETTASCRIPTS>
```

### See also

* [[EnsembleMetrics]]: Available EnsembleMetrics
* [[SimpleMetrics]]: Available SimpleMetrics
* [[SimpleMetricFilter]]: Filter on an arbitrary SimpleMetric
* [[Movers|Movers-RosettaScripts]]: Available Movers
* [[I want to do x]]: Guide to choosing a Rosetta protocol.
2 changes: 2 additions & 0 deletions scripting_documentation/RosettaScripts/Movers/_Sidebar.md
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
1 change: 1 addition & 0 deletions scripting_documentation/RosettaScripts/RosettaScripts.md
@@ -19,6 +19,7 @@ Fleishman SJ, Leaver-Fay A, Corn JE, Strauch EM, Khare SD, et al. (2011) Rosetta
- [[JumpSelectors |JumpSelectors]]
- [[PackerPalettes|PackerPalette]]
- [[SimpleMetrics]]
- [[EnsembleMetrics]]

---------------------

@@ -14,6 +14,10 @@

* [[Residue Selectors|ResidueSelectors]]

* [[Simple Metrics|SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[PackerPalettes|PackerPalette]]

* [[Task Operations|TaskOperations-RosettaScripts]]
@@ -19,6 +19,8 @@
* [[Task Operations|TaskOperations-RosettaScripts]]

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

2 changes: 2 additions & 0 deletions scripting_documentation/RosettaScripts/_Sidebar.md
@@ -22,6 +22,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[FeaturesReporters|Features-reporter-overview]]
@@ -20,6 +20,8 @@

* [[Simple Metrics | SimpleMetrics]]

* [[Ensemble Metrics|EnsembleMetrics]]

* [[Filters|Filters-RosettaScripts]]

* [[Features Reporters|Features-reporter-overview]]
@@ -0,0 +1,33 @@
<!-- THIS IS AN AUTOGENERATED FILE: Don't edit it directly, instead change the schema definition in the code itself. -->

_Autogenerated Tag Syntax Documentation:_

---
An ensemble metric that takes a real-valued simple metric, applies it to all poses in an ensemble, and calculates measures of central tendency (mean, median, mode) and other statistics about the distribution (standard deviation, standard error of the mean, min, max, range, etc.). Values that this ensemble metric returns are referred to in scripts as: mean, median, mode, stddev, stderr, min, max, and range.

References and author information for the CentralTendency ensemble metric:

CentralTendencyEnsembleMetric SimpleMetric's author(s):
Vikram K. Mulligan, Systems Biology group, Center for Computational Biology, Flatiron Institute [[email protected]] (Created the ensemble metric framework and wrote the CentralTendency ensemble metric.)

```xml
<CentralTendency name="(&string;)" label_prefix="(&string;)"
label_suffix="(&string;)" output_mode="(tracer &string;)"
output_filename="(&string;)" ensemble_generating_protocol="(&string;)"
ensemble_generating_protocol_repeats="(1 &non_negative_integer;)"
n_threads="(1 &non_negative_integer;)"
use_additional_output_from_last_mover="(false &bool;)"
real_valued_metric="(&string;)" />
```

- **label_prefix**: If provided, this prefix is prepended to the label for this ensemble metric (with an underscore after the prefix and before the ensemble metric name).
- **label_suffix**: If provided, this suffix is appended to the label for this ensemble metric (with an underscore after the ensemble metric name and before the suffix).
- **output_mode**: The output mode for reports from this ensemble metric. Default is 'tracer'. Allowed modes are: 'tracer', 'tracer_and_file', or 'file'.
- **output_filename**: The file to which the ensemble metric report will be written if output mode is 'tracer_and_file' or 'file'. Note that this filename will have the job name and number prepended so that each report is unique.
- **ensemble_generating_protocol**: An optional ParsedProtocol or other mover for generating an ensemble from the current pose. This protocol will be applied repeatedly (ensemble_generating_protocol_repeats times) to generate the ensemble of structures. Each generated pose will be measured by this metric, then discarded. The ensemble properties are then reported. If not provided, the current pose is measured and the report will be produced later (e.g. at termination with the JD2 rosetta_scripts application).
- **ensemble_generating_protocol_repeats**: The number of times that the ensemble_generating_protocol is applied. This is the maximum number of structures in the ensemble (though the actual number may be smaller if the protocol contains filters or movers that can fail for some attempts). Only used if an ensemble-generating protocol is provided with the ensemble_generating_protocol option. Defaults to 1.
- **n_threads**: The number of threads to request for generating ensembles in parallel. This is only used in multi-threaded compilations of Rosetta (compiled with extras=cxx11thread), and only when an ensemble-generating protocol is provided with the ensemble_generating_protocol option. A value of 0 means to use all available threads. In single-threaded builds, this must be set to 0 or 1. Defaults to 1. NOTE THAT MULTI-THREADING IS HIGHLY EXPERIMENTAL AND LIKELY TO FAIL FOR MANY ENSEMBLE-GENERATING PROTOCOLS. When in doubt, leave this set to 1.
- **use_additional_output_from_last_mover**: If true, this ensemble metric will use the additional output from the previous mover (assuming the previous mover generates multiple outputs) as the ensemble, analysing it and producing a report immediately. If false, then it will behave normally. False by default.
- **real_valued_metric**: (REQUIRED) The name of a real-valued simple metric defined previously.
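
The behaviour described for `ensemble_generating_protocol` and `ensemble_generating_protocol_repeats` can be pictured with the following Python sketch. It is a conceptual illustration only, not Rosetta code: `collect_ensemble_values`, `toy_protocol`, and the use of a plain number as a stand-in for a pose are all hypothetical, and the sketch assumes each repeat starts from the same input pose and that a failed attempt simply contributes no value.

```python
import itertools


def collect_ensemble_values(pose, protocol, metric, repeats=1):
    """Apply `protocol` to `pose` up to `repeats` times and measure each result.

    Hypothetical stand-in for the ensemble-generating loop: `protocol`
    returns a new pose, or None if the attempt failed.
    """
    values = []
    for _ in range(repeats):
        candidate = protocol(pose)
        if candidate is None:
            # A failed attempt contributes nothing, so the ensemble can
            # end up smaller than `repeats`.
            continue
        values.append(metric(candidate))
        # The candidate is not stored; only its measured value is kept.
    return values


# Toy protocol: every third attempt fails; otherwise the "pose" is shifted.
attempt = itertools.count(1)


def toy_protocol(pose):
    n = next(attempt)
    if n % 3 == 0:
        return None
    return pose + 0.5 * n


values = collect_ensemble_values(10.0, toy_protocol, lambda p: p, repeats=6)
print(values)  # four values from six attempts
```

The collected values are what the statistics (mean, median, and so on) would then be computed over.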

---
@@ -0,0 +1,26 @@
<!-- THIS IS AN AUTOGENERATED FILE: Don't edit it directly, instead change the schema definition in the code itself. -->

_Autogenerated Tag Syntax Documentation:_

---
A filter that filters based on some named float-valued property measured by an EnsembleMetric. Note that the value produced by the EnsembleMetric is based on an ensemble generated earlier in the protocol, presumably from the pose on which we are currently filtering.

References and author information for the EnsembleFilter filter:

EnsembleFilter Filter's author(s):
Vikram K. Mulligan, Systems Biology Group, Center for Computational Biology, Flatiron Institute. [[email protected]] (Wrote the EnsembleFilter.)

```xml
<EnsembleFilter name="(&string;)" ensemble_metric="(&string;)"
named_value="(&string;)" threshold="(0.0 &real;)"
filter_acceptance_mode="(less_than_or_equal &string;)"
confidence="(1.0 &real;)" />
```

- **ensemble_metric**: (REQUIRED) A previously-defined EnsembleMetric that produces at least one floating-point value. This filter will filter a pose based on that value.
- **named_value**: (REQUIRED) A named floating-point value produced by the EnsembleMetric, on which this filter will filter.
- **threshold**: The threshold for rejecting a pose.
- **filter_acceptance_mode**: The criterion for ACCEPTING a pose. For instance, if the value returned by the ensemble metric is greater than the threshold, and the mode is 'less_than_or_equal' (the default mode), then the pose is rejected. Allowed modes are: 'greater_than', 'less_than', 'greater_than_or_equal', 'less_than_or_equal', 'equal', and 'not_equal'.
- **confidence**: Probability that the pose will be filtered out if it does not pass this Filter.
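
How `threshold`, `filter_acceptance_mode`, and `confidence` interact can be sketched in Python as follows. This is an illustration of the option descriptions above rather than the filter's actual code: `passes`, `apply_filter`, and the `rng` argument are hypothetical names, and exact floating-point comparison for `equal` and `not_equal` is an assumption.

```python
import operator
import random

# The six allowed acceptance modes, mapped to the comparison that ACCEPTS a pose.
ACCEPTANCE_MODES = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "greater_than_or_equal": operator.ge,
    "less_than_or_equal": operator.le,
    "equal": operator.eq,
    "not_equal": operator.ne,
}


def passes(value, threshold=0.0, mode="less_than_or_equal"):
    """True if `value` satisfies the acceptance criterion against `threshold`."""
    return ACCEPTANCE_MODES[mode](value, threshold)


def apply_filter(value, threshold=0.0, mode="less_than_or_equal",
                 confidence=1.0, rng=random.random):
    """True if the pose is kept.

    A pose that fails the criterion is filtered out with probability
    `confidence`; with confidence=1.0 it is always filtered out.
    """
    if passes(value, threshold, mode):
        return True
    return rng() >= confidence


# A mean energy of 5.0 against threshold 4.0 in the default mode is rejected.
print(passes(5.0, threshold=4.0))
# A mean energy of 3.5 is accepted.
print(passes(3.5, threshold=4.0))
```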

---
@@ -13,7 +13,7 @@ Filter based on any score that can be calculated in fragment_picker.
outputs_name="(pose &string;)" csblast="(&string;)"
blast_pgp="(&string;)" placeholder_seqs="(&string;)"
sparks-x="(&string;)" sparks-x_query="(&string;)" psipred="(&string;)"
vall_path="(/scratch/benchmark/W.hojo-1/rosetta.Hojo-1/master/main/database//sampling/vall.jul19.2011.gz &string;)"
vall_path="(/home/vikram/rosetta_devcopy/Rosetta/main/database//sampling/vall.jul19.2011.gz &string;)"
frags_scoring_config="(&string;)" n_frags="(200 &non_negative_integer;)"
n_candidates="(1000 &non_negative_integer;)"
print_to_pdb="(false &xs:boolean;)"