Fix Max lookback issue (#178)
* Do not cross metric day boundaries.

* Merge day boundary (#146)

* Address issue #126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>
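(For context: a "metric day" here is one of the 100000-second GPS blocks used to bucket trigger files; the `int(strt / 1e5)` in the archive.py diff below computes the same bucket. A minimal sketch of the boundary rule, with hypothetical helper names that are not pyomicron's actual API:)

```
# Sketch only: the 100000-second GPS "metric day" bucketing; helper names
# are illustrative, not pyomicron's actual functions.
METRIC_DAY = 100000  # seconds; trigger dirs are keyed by int(gps / 1e5)


def crosses_metric_day(start, end):
    """True if the half-open GPS span [start, end) crosses a bucket boundary."""
    return start // METRIC_DAY != (end - 1) // METRIC_DAY


def split_at_metric_days(start, end):
    """Split [start, end) into chunks that never cross a metric-day boundary."""
    chunks = []
    while start < end:
        boundary = (start // METRIC_DAY + 1) * METRIC_DAY
        chunks.append((start, min(end, boundary)))
        start = min(end, boundary)
    return chunks
```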

* Check point merge (#147)

* Do not cross metric day boundaries.

* add log file arg; delete empty directories when done

* Tweak empty directory removal

* Tweak empty directory removal again

* Merge day boundary (#146)

* Address issue #126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* rebase against last approved PR

* rebase against last approved PR

* rebase against last approved PR again, fix flake8

* Fix a bug in removing empty directories.

Co-authored-by: Joseph Areeda <[email protected]>

* Merge day boundary (#146)

* Address issue #126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* minor doc changes

* Fix a bug where an xml.gz file could get compressed again in merge-with-gaps
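A plausible shape for that guard (a hypothetical helper, not the actual merge-with-gaps code): skip files whose name already ends in `.gz` instead of compressing them a second time.

```
# Sketch of a double-compression guard; gzip_merged_file is hypothetical.
import gzip
import shutil
from pathlib import Path


def gzip_merged_file(path):
    p = Path(path)
    if p.suffix == '.gz':
        return p                      # already compressed, leave it alone
    out = p.with_name(p.name + '.gz')
    with open(p, 'rb') as src, gzip.open(out, 'wb') as dst:
        shutil.copyfileobj(src, dst)  # compress exactly once
    p.unlink()                        # remove the uncompressed original
    return out
```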

* Fix a double gzip of ligolw files (#151)

* Do not cross metric day boundaries.

* Merge day boundary (#146)

* Address issue #126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* Check point merge (#147)

* Do not cross metric day boundaries.

* add log file arg; delete empty directories when done

* Tweak empty directory removal

* Tweak empty directory removal again

* Merge day boundary (#146)

* Address issue #126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* rebase against last approved PR

* rebase against last approved PR

* rebase against last approved PR again, fix flake8

* Fix a bug in removing empty directories.

Co-authored-by: Joseph Areeda <[email protected]>

* Merge day boundary (#146)

* Address issue #126. Allow pyomicron to run from a frame cache without accessing dqsegdb. Add documentation for this

* Do not merge files if they overlap "metric days"

* Do not cross metric day boundaries.

Co-authored-by: Joseph Areeda <[email protected]>

* minor doc changes

* Fix a bug where an xml.gz file could get compressed again in merge-with-gaps

* Implement a periodic vacate to address permanent D-state (uninterruptible wait) causing jobs to fail to complete

* Always create a log file. If not specified, put one in the output directory

* Fix a problem with periodic vacate.

* Up the periodic vacate time to 3 hrs
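The vacate-and-retry behavior can be expressed with HTCondor periodic policy expressions. A sketch under illustrative thresholds (the actual pyomicron submit settings may differ):

```
# Illustrative HTCondor submit expressions, not pyomicron's exact policy.
# Put a running job on hold after 3 hours of wall time (e.g. wedged in
# D-state and never going to finish) ...
periodic_hold = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > (3 * 3600))
periodic_hold_reason = "runtime exceeded 3h; evicting possibly-hung job"
# ... release periodically-held jobs for another attempt, a limited number
# of times (HoldReasonCode 3 = held by a periodic_hold expression) ...
periodic_release = (HoldReasonCode == 3) && (NumJobStarts < 3)
# ... and remove jobs that keep failing.
periodic_remove = NumJobStarts >= 5
```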

* Found a job-killing typo

* Add time limits to post-processing as well

* Don't save the segments.txt file if no segments are found, because we don't know whether that reflects a failure to find them or a valid, not-analyzable state.

* disable periodic vacate to demo the problem.

* Fix reported version in some utilities. Only update segments.txt if omicron is actually run.

* Clarify relative imports and add details to a few log messages

* Resolve flake8 issues

---------

Co-authored-by: Joseph Areeda <[email protected]>

* Resolve flake8 issues

* Update log format to use human-readable date/time instead of GPS;
tweak logging to better understand guardian channel usage

* remove old setup.cfg

* Work on pytest failures. The remaining errors are the result of omicron segfaults if an environment variable is not set

* add missing blank line flagged by flake8

* Fix a problem with max-online-lookback not working properly in all paths. Add some human-readable date/times to GPS messages

* Fix logging problems from different GPS time objects

* Better logging of why the online effort did not run

* Up default lookback window to 40 min

* Up default maximum lookback window to 60 min. Better logging of why we did not run.
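("Lookback" caps how far behind real time an online run may start. A minimal sketch of the clamp with hypothetical names; the 60-minute default here is later reduced to 30 minutes:)

```
# Hypothetical sketch of the online lookback clamp; names are illustrative.
MAX_LOOKBACK = 60 * 60  # seconds; the 60 min default at this point in history


def online_start(now_gps, last_processed_gps, max_lookback=MAX_LOOKBACK):
    """Resume where the last run left off, but never start more than
    max_lookback seconds behind the current GPS time."""
    return max(last_processed_gps, now_gps - max_lookback)
```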

* Fix flake8 and more logging updates

* Fix flake8 and more logging updates

* More logging updates

* More logging updates; some paths through the code could cause errors

* Trap and print errors from main()

* fix dag submission command

* add smart post script to allow retries before ignoring errors

* test version of scitokens and smart post script

* test version of scitokens and smart post script

* add arg to specify auth type (x509, vault or apissuer)
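The flag might look like the following argparse sketch; the real option name and choice spellings in pyomicron may differ:

```
import argparse

parser = argparse.ArgumentParser()
# Hypothetical sketch of the auth-type option described above.
parser.add_argument('--auth-type', choices=['x509', 'vault', 'apissuer'],
                    default='x509',
                    help='credential mechanism the condor jobs should use')
args = parser.parse_args(['--auth-type', 'vault'])  # example usage
```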

* memory units in the wrong place

* memory units in the wrong place, condor_run

* fix flake8 nitpicks

* Again try to get periodic_release and periodic_remove correct

* Sort console scripts

* Typo in periodic_remove

* Better error message when programs are not available

* implement conda run for all jobs in the DAG

* conda run complications with CVMFS
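Wrapping every DAG node in `conda run` makes each job start inside the right environment; with environments served from CVMFS, activation cost and read-only paths are the usual complications. A hypothetical submit-file sketch (paths and names are illustrative, not pyomicron's actual values):

```
# Illustrative only: run the job's payload through `conda run` so it
# executes inside a CVMFS-hosted conda environment.
executable = /usr/bin/conda
arguments  = "run --prefix /cvmfs/software.igwn.org/conda/envs/example-env omicron $(omicron_args)"
```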

* archive.py deals with renamed trigger files

* archive.py deals with renamed trigger files take 2

* conda run needed in all scripts.

* minor logging changes

* working on archive issues

* working on archive issues, keep "temporary" files to help debugging

* more logging

* Default max lookback changed to 30 min. More logging tweaks

* Add omicron_utils to install requirements

* Work on build and test workflow error.

* Still working on build and test workflow error. Remove Python 3.9 from workflows

* Set log level for OmicronConfig to CRITICAL so the --version command output is clean

* Resolve all conversations

* try to deal with GitHub error
```
Error: This request has been automatically failed because it uses a deprecated version of `actions/upload-artifact: v2`. Learn more: https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/

```

* try to deal with GitHub error
```
Error: This request has been automatically failed because it uses a deprecated version of `actions/upload-artifact: v2`. Learn more: https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/

```

---------

Co-authored-by: Joseph Areeda <[email protected]>
areeda and Joseph Areeda authored Sep 25, 2024
1 parent af50aad commit aeacc89
Showing 8 changed files with 436 additions and 116 deletions.
11 changes: 5 additions & 6 deletions .github/workflows/build.yml
```
@@ -37,7 +37,6 @@ jobs:
           - macOS
           - Ubuntu
         python-version:
-          - "3.9"
           - "3.10"
           - "3.11"
     runs-on: ${{ matrix.os }}-latest
@@ -49,12 +48,12 @@
 
     steps:
       - name: Get source code
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0
 
       - name: Cache conda packages
-        uses: actions/cache@v2
+        uses: actions/cache@v4
         env:
           # increment to reset cache
           CACHE_NUMBER: 0
@@ -64,7 +63,7 @@
         restore-keys: ${{ runner.os }}-conda-${{ matrix.python-version }}-
 
       - name: Configure conda
-        uses: conda-incubator/setup-miniconda@v2
+        uses: conda-incubator/setup-miniconda@v3
         with:
           activate-environment: test
           miniforge-variant: Mambaforge
@@ -111,14 +110,14 @@ jobs:
         run: python -m coverage xml
 
       - name: Publish coverage to Codecov
-        uses: codecov/codecov-action@v3
+        uses: codecov/codecov-action@v4
         with:
           files: coverage.xml
           flags: Conda,${{ runner.os }},python${{ matrix.python-version }}
 
       - name: Upload test results
         if: always()
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: pytest-conda-${{ matrix.os }}-${{ matrix.python-version }}
           path: pytest.xml
```
24 changes: 17 additions & 7 deletions omicron/cli/archive.py
```
@@ -90,7 +90,7 @@ def scandir(otrigdir):
 def process_dir(dir_path, outdir, logger, keep_files):
     """
     Copy all trigger files to appropriate directory
-    @param logger: program'sclogger
+    @param logger: program's logger
     @param Path dir_path: input directory
     @param Path outdir: top level output directory eg ${HOME}/triggers
     @param boolean keep_files: Do not delete files after copying to archive
@@ -116,14 +116,17 @@ def process_dir(dir_path, outdir, logger, keep_files):
             tspan = Segment(strt, strt + dur)
 
             otrigdir = outdir / ifo / chan / str(int(strt / 1e5))
 
+            logger.debug(f'Trigger file:\n'
+                         f'    {tfile_path.name}\n'
+                         f'    ifo: [{ifo}], chan: [{chan}], strt: {strt}, duration: {dur} ext: [{ext}]\n'
+                         f'    outdir: {str(otrigdir.absolute())}')
+
             if str(otrigdir.absolute()) not in dest_segs.keys():
                 dest_segs[str(otrigdir.absolute())] = scandir(otrigdir)
 
-            logger.debug(
-                f'ifo: [{ifo}], chan: [{chan}], strt: {strt}, ext: [{ext}] -> {str(otrigdir.absolute())}')
-
             if dest_segs[str(otrigdir.absolute())].intersects_segment(tspan):
-                logger.warn(f'{tfile_path.name} ignored because it would overlap')
+                logger.warning(f'{tfile_path.name} ignored because it would overlap')
             else:
                 otrigdir.mkdir(mode=0o755, parents=True, exist_ok=True)
                 shutil.copy(tfile, str(otrigdir.absolute()))
@@ -134,11 +137,14 @@ def main():
 
 
 def main():
-    logging.basicConfig()
+    # global logger
+    log_file_format = "%(asctime)s - %(levelname)s - %(funcName)s %(lineno)d: %(message)s"
+    log_file_date_format = '%m-%d %H:%M:%S'
+    logging.basicConfig(format=log_file_format, datefmt=log_file_date_format)
     logger = logging.getLogger(__process_name__)
     logger.setLevel(logging.DEBUG)
 
-    home = os.getenv('HOME')
+    home = Path.home()
     outdir_default = os.getenv('OMICRON_HOME', f'{home}/triggers')
     parser = argparse.ArgumentParser(description=textwrap.dedent(__doc__),
                                      formatter_class=argparse.RawDescriptionHelpFormatter,
@@ -169,6 +175,10 @@ def main():
     else:
         logger.setLevel(logging.DEBUG)
 
+    logger.debug("Command line args:")
+    for arg in vars(args):
+        logger.debug(f'    {arg} = {str(getattr(args, arg))}')
+
     indir = Path(args.indir)
     outdir = Path(args.outdir)
     if not outdir.exists():
```
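(The `ifo`, `chan`, `strt`, `dur`, and `ext` fields above come from the trigger file name. A sketch of that parse for T050017-style names; this is a hypothetical helper, not the exact archive.py code:)

```
# Sketch: split 'H1-GDS_CALIB_STRAIN_OMICRON-1234500000-60.xml.gz' into parts.
import re
from pathlib import Path

NAME_RE = re.compile(r'^(?P<ifo>[A-Z][0-9])-(?P<chan>.+)-(?P<strt>\d+)-(?P<dur>\d+)$')


def parse_trigger_name(tfile):
    p = Path(tfile)
    ext = ''.join(p.suffixes)              # '.xml.gz', '.h5', '.root', ...
    stem = p.name[:-len(ext)] if ext else p.name
    m = NAME_RE.match(stem)
    if m is None:
        raise ValueError(f'unrecognized trigger file name: {p.name}')
    return m['ifo'], m['chan'], int(m['strt']), int(m['dur']), ext.lstrip('.')
```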
137 changes: 137 additions & 0 deletions omicron/cli/omicron_post_script.py
```
@@ -0,0 +1,137 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim: nu:ai:ts=4:sw=4

#
# Copyright (C) 2024 Joseph Areeda <[email protected]>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#

"""
The situation is that we run DAGs with many omicron jobs, some of which fail for data-dependent reasons that
are valid and permanent, but others are transient, like network issues, and could be resolved with a retry.
This program is run as a post script to allow us to retry the job but return a success code even if it fails
repeatedly, so that the DAG completes.
"""
import textwrap
import time

start_time = time.time()

import argparse
import logging
from pathlib import Path
import sys
import traceback

try:
    from ._version import __version__
except ImportError:
    __version__ = '0.0.0'

__author__ = 'joseph areeda'
__email__ = '[email protected]'
__process_name__ = Path(__file__).name

logger = None


def parser_add_args(parser):
    """
    Set up command parser
    :param argparse.ArgumentParser parser:
    :return: None but parser object is updated
    """
    parser.add_argument('-v', '--verbose', action='count', default=1,
                        help='increase verbose output')
    parser.add_argument('-V', '--version', action='version',
                        version=__version__)
    parser.add_argument('-q', '--quiet', default=False, action='store_true',
                        help='show only fatal errors')
    parser.add_argument('--return-code', help='Program return code')
    parser.add_argument('--max-retry', help='condor max retry value')
    parser.add_argument('--retry', help='current try starting at 0')
    parser.add_argument('--log', help='Path for a copy of our logger output')


def main():
    global logger

    log_file_format = "%(asctime)s - %(levelname)s, %(pathname)s:%(lineno)d: %(message)s"
    log_file_date_format = '%m-%d %H:%M:%S'
    logging.basicConfig(format=log_file_format, datefmt=log_file_date_format)
    logger = logging.getLogger(__process_name__)
    logger.setLevel(logging.DEBUG)

    epilog = textwrap.dedent("""
    This program is designed to be run as a post script in a Condor DAG. For available arguments see:
    https://htcondor.readthedocs.io/en/latest/automated-workflows/dagman-scripts.html#special-script-argument-macros
    A typical line in the DAG might look like:
    python omicron_post_script.py -vvv --return $(RETURN) --retry $(RETRY) --max-retry $(MAX_RETRIES) --log
    <path_to_log>
    """)

    parser = argparse.ArgumentParser(description=__doc__, prog=__process_name__, epilog=epilog,
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser_add_args(parser)
    args = parser.parse_args()
    verbosity = 0 if args.quiet else args.verbose

    if verbosity < 1:
        logger.setLevel(logging.CRITICAL)
    elif verbosity < 2:
        logger.setLevel(logging.INFO)
    else:
        logger.setLevel(logging.DEBUG)

    if args.log:
        log = Path(args.log)
        log.parent.mkdir(0o775, exist_ok=True, parents=True)
        file_handler = logging.FileHandler(log, mode='a')
        log_formatter = logging.Formatter(log_file_format, datefmt=log_file_date_format)
        file_handler.setFormatter(log_formatter)
        logger.addHandler(file_handler)

    me = Path(__file__).name
    logger.info(f'--------- Running {str(me)}')
    # debugging?
    logger.debug(f'{__process_name__} version: {__version__} called with arguments:')
    for k, v in args.__dict__.items():
        logger.debug('    {} = {}'.format(k, v))

    ret = int(args.return_code)
    retry = int(args.retry)
    max_retry = int(args.max_retry)
    # Pass the job's failure code through while retries remain; once the last
    # retry has failed, return 0 so DAGMan marks the node done and the DAG completes.
    ret = ret if retry < max_retry or ret == 0 else 0
    logger.info(f'returning {ret}')
    return ret


if __name__ == "__main__":
    try:
        ret = main()
    except (ValueError, TypeError, OSError, NameError, ArithmeticError, RuntimeError) as ex:
        print(ex, file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        ret = 21

    if logger is None:
        logging.basicConfig()
        logger = logging.getLogger(__process_name__)
        logger.setLevel(logging.DEBUG)
    # report our run time
    logger.info(f'Elapsed time: {time.time() - start_time:.1f}s')
    sys.exit(ret)
```
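In practice the script above would be attached to each DAG node. A hypothetical DAGMan fragment (node names and paths illustrative; the $(RETURN), $(RETRY), and $(MAX_RETRIES) macros are per the HTCondor docs linked in the epilog):

```
# Hypothetical DAGMan fragment using the post script above.
JOB    omicron_0001 omicron.sub
SCRIPT POST omicron_0001 omicron_post_script.py -vvv --return-code $(RETURN) --retry $(RETRY) --max-retry $(MAX_RETRIES) --log logs/post_0001.log
RETRY  omicron_0001 2
```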
(Diffs for the remaining 5 changed files are not shown.)