This Python Jupyter notebook outputs PDBs with the escape scores as B factors.
Though we will want more elaborate series of commands to codify our visualization of these RBD structures colored by escape, the series of commands below, when executed in a PyMol
session with one of these PDBs open, will color the RBD surface according to escape scores.
For example, to normalize each structure colored by the max mut effect, we might want to have a white to red scale from 0 to 1:
create RBD, chain E
hide all; show cartoon, chain A; color gray20, chain A
show surface, RBD; spectrum b, white red, RBD, minimum=0, maximum=1
For something like total escape, maybe we want each structure normalized to the maximum total escape in that structure, in which case we can just leave the maximum argument empty:
create RBD, chain E
hide all; show cartoon, chain A; color gray20, chain A
show surface, RBD; spectrum b, white red, RBD, minimum=0
We write PDBs with B factors indicating the total site escape and maximum mutation escape at each site, and the same with these values normalized to the maximum for the full structure (the latter are easier to process in Chimera
).
First, import Python modules:
import collections
import copy
import os
import warnings
import Bio.PDB
import dms_variants.pdb_utils
from IPython.display import display, HTML
import pandas as pd
import yaml
Read the configuration file:
with open('config.yaml') as f:
config = yaml.safe_load(f)
Read configuration for outputting PDBs:
print(f"Reading PDB output configuration from {config['output_pdbs_config']}")
with open(config['output_pdbs_config']) as f:
output_pdbs_config = yaml.safe_load(f)
Reading PDB output configuration from data/output_pdbs_config.yaml
Make output directory:
os.makedirs(config['pdb_outputs_dir'], exist_ok=True)
Read escape fractions and compute total and maximum escape at each site, and also the total and maximum escape at each site normalized to be between 0 and 1 for each selection:
print(f"Reading escape fractions from {config['escape_fracs']}")
escape_fracs = (
pd.read_csv(config['escape_fracs'])
.query('library == "average"')
.assign(site=lambda x: x['label_site'])
.groupby(['selection', 'site'])
.aggregate(total_escape=pd.NamedAgg(config['mut_metric'], 'sum'),
max_escape=pd.NamedAgg(config['mut_metric'], 'max')
)
.reset_index()
.assign(max_total_escape=lambda x: x.groupby('selection')['total_escape'].transform('max'),
max_max_escape=lambda x: x.groupby('selection')['max_escape'].transform('max'),
norm_total_escape=lambda x: x['total_escape'] / x['max_total_escape'],
norm_max_escape=lambda x: x['max_escape'] / x['max_max_escape'])
)
display(HTML(escape_fracs.head().to_html(index=False)))
Reading escape fractions from results/escape_scores/escape_fracs.csv
selection | site | total_escape | max_escape | max_total_escape | max_max_escape | norm_total_escape | norm_max_escape |
---|---|---|---|---|---|---|---|
CAB-A17_139 | 331 | 0.016274 | 0.001090 | 0.577178 | 0.5368 | 0.028196 | 0.002031 |
CAB-A17_139 | 332 | 0.020121 | 0.002068 | 0.577178 | 0.5368 | 0.034861 | 0.003852 |
CAB-A17_139 | 333 | 0.017871 | 0.001585 | 0.577178 | 0.5368 | 0.030964 | 0.002953 |
CAB-A17_139 | 334 | 0.018437 | 0.001169 | 0.577178 | 0.5368 | 0.031944 | 0.002178 |
CAB-A17_139 | 335 | 0.018819 | 0.001354 | 0.577178 | 0.5368 | 0.032605 | 0.002522 |
Now map the escape metrics to the B-factors. For sites where no mutations have escape scores:
- In the RBD chain(s) fill the B-factor for non-normalized scores to -1 to enable collapsing to zero or callout as a a separate class, depending how we choose to color sites for different visualizations. For normalized scores, fill to 0.
- In other chains, always fill missing B factors to 0.
for name, specs in output_pdbs_config.items():
print(f"\nMaking PDB mappings for {name} to {specs['pdbfile']}")
assert os.path.isfile(specs['pdbfile'])
# get escape fracs just for conditions of interest
if isinstance(specs['conditions'], str) and specs['conditions'].upper() == 'ALL':
conditions = escape_fracs['selection'].unique().tolist()
else:
assert isinstance(specs['conditions'], list)
conditions = specs['conditions']
print(f"Making mappings for {len(conditions)} conditions.")
df = escape_fracs.query('selection in @conditions')
# get chains
assert isinstance(specs['chains'], list)
print('Mapping to the following chains: ' + ', '.join(specs['chains']))
df = pd.concat([df.assign(chain=chain) for chain in specs['chains']], ignore_index=True)
# make mappings for each condition and metric
for condition, df in df.groupby('selection'):
print(f" Writing B-factor re-assigned PDBs for {condition} to:")
for metric in ['total_escape', 'max_escape', 'norm_total_escape', 'norm_max_escape']:
# what do we assign to missing sites?
missing_metric = collections.defaultdict(lambda: 0) # non-RBD chains always fill to zero
for chain in specs['chains']:
if 'norm' in metric:
missing_metric[chain] = 0 # missing sites in RBD are 0 for normalized metric PDBs
else:
missing_metric[chain] = -1 # missing sites in RBD are -1 for non-normalized metric PDBs
fname = os.path.join(config['pdb_outputs_dir'], f"{condition}_{name}_{metric}.pdb")
print(f" {fname}")
dms_variants.pdb_utils.reassign_b_factor(input_pdbfile=specs['pdbfile'],
output_pdbfile=fname,
df=df,
metric_col=metric,
missing_metric=missing_metric)
Making PDB mappings for 6m0j to data/pdbs/6M0J.pdb
Making mappings for 3 conditions.
Mapping to the following chains: E
Writing B-factor re-assigned PDBs for CAB-A17_139 to:
results/pdb_outputs/CAB-A17_139_6m0j_total_escape.pdb
results/pdb_outputs/CAB-A17_139_6m0j_max_escape.pdb
results/pdb_outputs/CAB-A17_139_6m0j_norm_total_escape.pdb
results/pdb_outputs/CAB-A17_139_6m0j_norm_max_escape.pdb
Writing B-factor re-assigned PDBs for CAB-A49_222 to:
results/pdb_outputs/CAB-A49_222_6m0j_total_escape.pdb
results/pdb_outputs/CAB-A49_222_6m0j_max_escape.pdb
results/pdb_outputs/CAB-A49_222_6m0j_norm_total_escape.pdb
results/pdb_outputs/CAB-A49_222_6m0j_norm_max_escape.pdb
Writing B-factor re-assigned PDBs for CAB-C19_184 to:
results/pdb_outputs/CAB-C19_184_6m0j_total_escape.pdb
results/pdb_outputs/CAB-C19_184_6m0j_max_escape.pdb
results/pdb_outputs/CAB-C19_184_6m0j_norm_total_escape.pdb
results/pdb_outputs/CAB-C19_184_6m0j_norm_max_escape.pdb
Making PDB mappings for BA1_A17 to data/pdbs/BA1_A17.pdb
Making mappings for 1 conditions.
Mapping to the following chains: A
Writing B-factor re-assigned PDBs for CAB-A17_139 to:
results/pdb_outputs/CAB-A17_139_BA1_A17_total_escape.pdb
results/pdb_outputs/CAB-A17_139_BA1_A17_max_escape.pdb
results/pdb_outputs/CAB-A17_139_BA1_A17_norm_total_escape.pdb
results/pdb_outputs/CAB-A17_139_BA1_A17_norm_max_escape.pdb