Ant metrics speedups #402

jsdillon · 2021-11-21T17:47:41Z

This PR has a number of changes that make ant_metrics go faster:

Allows subsets of files (or single files) to be loaded at once, saving memory while avoiding partial I/O along the baseline axis, which is slow.
Replaces dictionary-based storage and calculation of metrics with a matrix-based version that uses np.nan to represent flagged antennas.
Copies the output file rather than reproducing it each time.
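To illustrate why a matrix with NaNs for flagged antennas is convenient, here is a minimal sketch. It is not hera_qm's code: the toy matrix, the antenna list, and the helpers `flag_antenna` and `per_ant_mean` are hypothetical, and it ranks antennas by a plain mean correlation rather than the z-scored metric that ant_metrics actually uses.

```python
import warnings
import numpy as np

# Hypothetical toy data: a symmetric matrix of per-baseline correlation
# statistics for four antennas. Rows/columns follow the order of `ants`.
# The diagonal (autocorrelations) is NaN so it never enters the metric.
ants = [12, 23, 37, 51]
corr = np.array([
    [np.nan, 0.80, 0.75, 0.78],
    [0.80, np.nan, 0.10, 0.82],
    [0.75, 0.10, np.nan, 0.79],
    [0.78, 0.82, 0.79, np.nan],
])


def flag_antenna(corr, ants, ant):
    """Return a copy of `corr` with the row and column of `ant` set to NaN."""
    out = corr.copy()
    i = ants.index(ant)
    out[i, :] = np.nan
    out[:, i] = np.nan
    return out


def per_ant_mean(corr):
    """Average each antenna's row, skipping NaNs (flagged antennas, autos).

    Fully flagged rows are all-NaN; np.nanmean returns NaN for them and
    emits a RuntimeWarning, which is silenced here.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(corr, axis=1)


# One flagging iteration: find the antenna with the lowest mean correlation
# and flag it. np.nanargmin skips antennas whose metric is already NaN.
metric = per_ant_mean(corr)
worst = ants[int(np.nanargmin(metric))]
corr = flag_antenna(corr, ants, worst)
```

In this sketch, flagging is a single vectorized assignment and each later iteration only needs to recompute row means, so there is no per-baseline dictionary bookkeeping of which entries to skip.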

codecov · 2021-11-21T17:56:00Z

Codecov Report

Merging #402 (61f919e) into master (64734f5) will decrease coverage by 0.02%.
The diff coverage is 98.97%.

@@            Coverage Diff             @@
##           master     #402      +/-   ##
==========================================
- Coverage   97.08%   97.05%   -0.03%     
==========================================
  Files          10       10              
  Lines        3288     3299      +11     
==========================================
+ Hits         3192     3202      +10     
- Misses         96       97       +1

Impacted Files	Coverage Δ
hera_qm/ant_metrics.py	`97.60% <98.96%> (-0.32%)`	⬇️
hera_qm/utils.py	`97.27% <100.00%> (+<0.01%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 64734f5...61f919e.

jsdillon · 2021-11-23T03:47:53Z

For the record, while this PR tries to maintain the same algorithm, I believe it is slightly different. Here's the output on four files from 2459122 before this PR (it took 22:30 with Nbls_per_load = 1000):

On iteration 1 we flag (93, 'Jee') with corr metric z of 0.02943618026168786.
On iteration 2 we flag (93, 'Jnn') with corr metric z of 0.0328214014831738.
On iteration 3 we flag (137, 'Jee') with corr metric z of 0.034222480949836216.
On iteration 4 we flag (155, 'Jnn') with corr metric z of 0.03435425748370755.
On iteration 5 we flag (121, 'Jnn') with corr metric z of 0.034578756824210716.
On iteration 6 we flag (123, 'Jnn') with corr metric z of 0.031120731769729754.
On iteration 7 we flag (84, 'Jnn') with corr metric z of 0.02723081545168903.
On iteration 8 we flag (121, 'Jee') with corr metric z of 0.03611046419979788.
On iteration 9 we flag (123, 'Jee') with corr metric z of 0.031510293552260954.
On iteration 10 we flag (84, 'Jee') with corr metric z of 0.027172671249161163.
On iteration 11 we flag (65, 'Jnn') with corr metric z of 0.036428282104434453.
On iteration 12 we flag (155, 'Jee') with corr metric z of 0.03692985521919178.
On iteration 13 we flag (116, 'Jnn') with corr metric z of 0.039510972653374346.
On iteration 14 we flag (37, 'Jnn') with corr metric z of 0.039914356351625845.
On iteration 15 we flag (38, 'Jnn') with corr metric z of 0.039191274619654076.
On iteration 16 we flag (137, 'Jnn') with corr metric z of 0.042726420274426545.
On iteration 17 we flag (142, 'Jnn') with corr metric z of 0.04441611847756106.
On iteration 18 we flag (116, 'Jee') with corr metric z of 0.04529112727166686.
On iteration 19 we flag (101, 'Jnn') with corr metric z of 0.04900494635535518.
On iteration 20 we flag (12, 'Jnn') with corr metric z of 0.05495740135505144.
On iteration 21 we flag (12, 'Jee') with corr metric z of 0.05576750421963197.
On iteration 22 we flag (180, 'Jnn') with corr metric z of 0.06829538068956799.
On iteration 23 we flag (87, 'Jnn') with cross-pol corr metric of -0.1403386730718551.
On iteration 23 we flag (87, 'Jee') with cross-pol corr metric of -0.1403386730718551.
On iteration 24 we flag (51, 'Jnn') with cross-pol corr metric of -0.12715766535816284.
On iteration 24 we flag (51, 'Jee') with cross-pol corr metric of -0.12715766535816284.
On iteration 25 we flag (180, 'Jee') with corr metric z of 0.36644834695844114.
On iteration 26 we flag (23, 'Jee') with corr metric z of 0.3884647108082185.
Now saving results to ./zen.2459122.34011.sum.ant_metrics.hdf5
Now saving results to ./zen.2459122.34034.sum.ant_metrics.hdf5
Now saving results to ./zen.2459122.34056.sum.ant_metrics.hdf5
Now saving results to ./zen.2459122.34079.sum.ant_metrics.hdf5

And here's the output after this PR (it took only 1:45 with Nfiles_per_load=1):

On iteration 0 we flag (65, 'Jee') with corr metric z of 0.026201756308626933.
On iteration 1 we flag (93, 'Jee') with corr metric z of 0.02943618026168786.
On iteration 2 we flag (93, 'Jnn') with corr metric z of 0.032821401483173786.
On iteration 3 we flag (137, 'Jee') with corr metric z of 0.034222480949836216.
On iteration 4 we flag (155, 'Jnn') with corr metric z of 0.03435425748370755.
On iteration 5 we flag (121, 'Jnn') with corr metric z of 0.03457875682421072.
On iteration 6 we flag (123, 'Jnn') with corr metric z of 0.031120731769729768.
On iteration 7 we flag (84, 'Jnn') with corr metric z of 0.02723081545168903.
On iteration 8 we flag (121, 'Jee') with corr metric z of 0.03611046419979788.
On iteration 9 we flag (123, 'Jee') with corr metric z of 0.031510293552260954.
On iteration 10 we flag (84, 'Jee') with corr metric z of 0.02717267124916116.
On iteration 11 we flag (65, 'Jnn') with corr metric z of 0.03642828210443444.
On iteration 12 we flag (155, 'Jee') with corr metric z of 0.03692985521919179.
On iteration 13 we flag (116, 'Jnn') with corr metric z of 0.03951097265337433.
On iteration 14 we flag (37, 'Jnn') with corr metric z of 0.03991435635162586.
On iteration 15 we flag (38, 'Jnn') with corr metric z of 0.03919127461965408.
On iteration 16 we flag (137, 'Jnn') with corr metric z of 0.04272642027442654.
On iteration 17 we flag (142, 'Jnn') with corr metric z of 0.04441611847756106.
On iteration 18 we flag (116, 'Jee') with corr metric z of 0.04529112727166687.
On iteration 19 we flag (101, 'Jnn') with corr metric z of 0.04900494635535517.
On iteration 20 we flag (12, 'Jnn') with corr metric z of 0.054957401355051456.
On iteration 21 we flag (12, 'Jee') with corr metric z of 0.05576750421963195.
On iteration 22 we flag (180, 'Jnn') with corr metric z of 0.06829538068956797.
On iteration 23 we flag (87, 'Jee') with cross-pol corr metric of -0.15980675829479712.
On iteration 23 we flag (87, 'Jnn') with cross-pol corr metric of -0.15980675829479712.
On iteration 24 we flag (51, 'Jee') with cross-pol corr metric of -0.1627431573617316.
On iteration 24 we flag (51, 'Jnn') with cross-pol corr metric of -0.1627431573617316.
On iteration 25 we flag (180, 'Jee') with corr metric z of 0.3664483469584411.
On iteration 26 we flag (23, 'Jee') with corr metric z of 0.3884647108082183.
Now saving results to ./zen.2459122.34011.sum.ant_metrics.hdf5
Now saving results to ./zen.2459122.34034.sum.ant_metrics.hdf5
Now saving results to ./zen.2459122.34056.sum.ant_metrics.hdf5
Now saving results to ./zen.2459122.34079.sum.ant_metrics.hdf5

The differences in the corr_metric z-scores are entirely attributable to the change to loading whole files and then computing the correlation statistics, rather than loading baselines for all files simultaneously. This speeds up I/O significantly. There also appeared to be a small (~few %) difference in the z-scores for the cross-pol metrics, even when performed on a single file. However, since the same antennas are being flagged in the same order, I'm inclined to trust the newer, simpler algorithm more.
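The corr-metric differences between the two logs show up only in the last digits (e.g. 0.0328214014831738 vs. 0.032821401483173786), i.e. at the level of floating-point rounding. One plausible mechanism, offered as an assumption rather than a description of hera_qm's internals, is that reading whole files in groups changes the order in which sums are accumulated. The sketch below uses made-up data and a plain mean instead of the real correlation statistics; `chunks`, `mean_by_file_groups`, and `nfiles_per_load` are hypothetical names (the last one only mirrors the PR's Nfiles_per_load).

```python
import numpy as np

rng = np.random.default_rng(2459122)
# Hypothetical stand-in for four data files: each "file" holds one
# baseline's visibility amplitudes over 60 integrations.
files = [rng.normal(1.0, 0.1, size=60) for _ in range(4)]


def chunks(seq, n):
    """Yield successive groups of n items (the last group may be shorter)."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


def mean_by_file_groups(files, nfiles_per_load):
    """Accumulate a running sum and count, one group of files at a time."""
    total, count = 0.0, 0
    for group in chunks(files, nfiles_per_load):
        data = np.concatenate(group)  # stands in for reading these files
        total += data.sum()
        count += data.size
    return total / count


# Same statistic, two accumulation orders: all data at once vs. file by file.
all_at_once = np.concatenate(files).mean()
one_file_at_a_time = mean_by_file_groups(files, nfiles_per_load=1)
```

Only one group of files needs to be in memory at a time, and the two results agree to within rounding error even if they are not bit-for-bit identical.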

dstorer

I did a more careful read-through and I feel comfortable merging this. I'll spend some time tomorrow updating a few of the tests to get a bit better coverage, but I think we can go ahead and start implementing this version now.

jsdillon · 2021-11-23T15:28:46Z

Thanks @dstorer! I've made an issue for adding more tests for you: #403.

Happy to review when you're ready!

jsdillon added 6 commits November 21, 2021 09:25

Add option for _load_corr_stats to load individual or groups of files

63db0c5

pass through parameters from init

87467c6

pass through new params in main function

416dd16

add Nfiles_per_load to argparser

4f2d13b

add test that calls Nfiles_per_load explicitly

b029505

copy files instead of re-running save_antenna_metrics

138b7eb

jsdillon marked this pull request as draft November 21, 2021 17:47

jsdillon added 17 commits November 21, 2021 10:10

update python versions tested to match pyuvdata

762d8cb

pre-cast as DataContainer

ba93a3a

condense saving of hera_cal libraries to AntennaMetrics object

2f7029a

change _load_corr_stats to _load_corr_matrices

1d98ba8

fix constructor and add sorts for consistency

79df50c

rework find_totally_dead_ants

574960b

add _flag_corr_matrices

caf524e

add _corr_metrics_per_ant

d1ff5c2

add _corr_cross_pol_metrics_per_ant

60f6866

update to use new matrix format

2fc8347

remove old, unused metrics calculation

59f7de6

use separate matrices for xpol calculation so that antennas with one dead antenna are not included

c44a1c5

make sure to save finite final metrics

7861657

comment out old tests for now

b8e5abb

probably should find a way to test this functionality

fix test

5ba8c71

fix test

7fe63ee

comment out test that needs replacing

61f919e

jsdillon requested a review from dstorer November 23, 2021 03:37

jsdillon marked this pull request as ready for review November 23, 2021 03:37

dstorer approved these changes Nov 23, 2021

jsdillon mentioned this pull request Nov 23, 2021

Add new unit tests for ant_metrics #403

Open

jsdillon merged commit 1d2d156 into master Nov 23, 2021

jsdillon deleted the ant-metrics-speedups branch November 23, 2021 15:28

jsdillon added a commit to HERA-Team/hera_pipelines that referenced this pull request Nov 23, 2021

Update H5C RTP in response to ant_metrics speedup

fe13636

see HERA-Team/hera_qm#402
