fix: use task sessions in Core API [MD-509] #9860

Merged
azhou-determined merged 5 commits into main from core-context-session-tokens on Aug 23, 2024

@azhou-determined azhou-determined commented Aug 22, 2024

Ticket

Description

Use task session tokens instead of user session tokens in the CoreContext and other internal tasks. User tokens expire after 7 days, so a long-running experiment can outlive the user token it started with.
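
To make the timing concrete, here is a small sketch of the arithmetic, assuming a token that is issued when the experiment starts and is never refreshed. The 7-day lifetime comes from the description above; the helper name and the run lengths are hypothetical and not part of Determined.

```python
from datetime import datetime, timedelta

# 7-day user token lifetime, as stated in the PR description.
USER_TOKEN_LIFETIME = timedelta(days=7)


def expires_mid_run(
    issued_at: datetime,
    run_duration: timedelta,
    lifetime: timedelta = USER_TOKEN_LIFETIME,
) -> bool:
    """Return True if a token issued at run start expires before the run ends.

    Hypothetical helper for illustration only.
    """
    return issued_at + lifetime < issued_at + run_duration


start = datetime(2024, 8, 1)
print(expires_mid_run(start, timedelta(days=10)))  # run is longer than the lifetime
print(expires_mid_run(start, timedelta(days=2)))   # run finishes within the lifetime
```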

Test Plan

  • Make sure a normal experiment (mnist_pytorch for example) runs successfully
  • Submit a script that checks which tokens the sessions use:
import logging
import os

import determined as det
from determined import core


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format=det.LOG_FORMAT)
    with core.init() as core_context:
        core_session_token = core_context.train._session.token
        session_token_env = os.environ["DET_SESSION_TOKEN"]
        user_token_env = os.environ["DET_USER_TOKEN"]
        # Print all three values so the log shows which token the context picked up.
        print(f"Token on `core.Context`: {core_session_token}")
        print(f"Task token environment variable: {session_token_env}")
        print(f"User token environment variable: {user_token_env}")
        assert core_session_token == session_token_env
        assert core_session_token != user_token_env

Make sure that the core context uses the task session token and not the user token.
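
The same two assertions can also be tried off-cluster by factoring them into a function that takes the token and an environment mapping. This is only a sketch with made-up token values: `tokens_ok` is a hypothetical helper, and only the environment variable names come from the script above.

```python
from typing import Mapping


def tokens_ok(core_token: str, environ: Mapping[str, str]) -> bool:
    """True when the context token equals the task token and differs from the user token."""
    return (
        core_token == environ["DET_SESSION_TOKEN"]
        and core_token != environ["DET_USER_TOKEN"]
    )


# Made-up values standing in for what a task container would provide.
fake_env = {"DET_SESSION_TOKEN": "task-abc", "DET_USER_TOKEN": "user-xyz"}
print(tokens_ok("task-abc", fake_env))  # context holds the task token
print(tokens_ok("user-xyz", fake_env))  # context holds the user token
```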

Checklist

  • Changes have been manually QA'd
  • New features have been approved by the corresponding PM
  • User-facing API changes have the "User-facing API Change" label
  • Release notes have been added as a separate file under docs/release-notes/
    See Release Note for details.
  • Licenses have been included for new code which was copied and/or modified from any external code

netlify bot commented Aug 22, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit ab6b264
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/66c8d5e9f6939d0008dbd02e

codecov bot commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 88.46154% with 6 lines in your changes missing coverage. Please review.

Project coverage is 54.71%. Comparing base (3a91552) to head (ab6b264).
Report is 9 commits behind head on main.

Files                                            Patch %  Lines
harness/determined/common/api/authentication.py  80.00%   2 Missing ⚠️
harness/determined/core/_context.py              0.00%    1 Missing ⚠️
harness/determined/exec/gc_checkpoints.py        0.00%    1 Missing ⚠️
harness/determined/exec/launch.py                0.00%    1 Missing ⚠️
harness/determined/exec/prep_container.py        0.00%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9860   +/-   ##
=======================================
  Coverage   54.71%   54.71%           
=======================================
  Files        1261     1261           
  Lines      155984   156013   +29     
  Branches     3589     3588    -1     
=======================================
+ Hits        85348    85367   +19     
- Misses      70504    70514   +10     
  Partials      132      132           
Flag     Coverage Δ
backend  45.15% <ø> (-0.02%) ⬇️
harness  72.62% <88.46%> (+0.02%) ⬆️
web      54.47% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files                                           Coverage Δ
harness/determined/common/api/__init__.py       100.00% <100.00%> (ø)
harness/determined/common/api/_session.py       83.84% <100.00%> (+0.38%) ⬆️
harness/determined/launch/deepspeed.py          91.41% <100.00%> (ø)
harness/determined/launch/horovod.py            94.50% <100.00%> (ø)
harness/tests/cli/test_auth.py                  94.30% <100.00%> (+0.44%) ⬆️
harness/tests/launch/test_deepspeed.py          97.35% <100.00%> (ø)
harness/tests/launch/test_horovod.py            100.00% <100.00%> (ø)
harness/tests/launch/test_launch.py             100.00% <100.00%> (ø)
harness/tests/launch/test_torch_distributed.py  100.00% <100.00%> (ø)
harness/tests/launch/test_util.py               97.77% <100.00%> (+0.10%) ⬆️
... and 5 more

... and 4 files with indirect coverage changes

Comment on lines +245 to +248
session = authentication.login_from_task(
    master_address=info.master_url,
    cert=cert,
).with_retry(util.get_max_retries_config())
Contributor

Will this still work when the Core API is initialized in detached mode? I think we have to fall back to user auth when not in a task.

Contributor Author

I think so? Detached mode checks whether it's running on a cluster. If it is, it uses this login_from_task; if not, it uses the session from the SDK client. That logic lives in _make_v2_context.
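
A minimal sketch of the selection described in this reply, not the actual `_make_v2_context` code. `Session`, `select_session`, and the two factory callables are hypothetical stand-ins; only the on-cluster vs. off-cluster branching is taken from the comment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Session:
    """Hypothetical stand-in for an authenticated API session."""

    token: str
    kind: str  # "task" or "user"


def select_session(
    on_cluster: bool,
    login_from_task: Callable[[], Session],
    client_session: Callable[[], Session],
) -> Session:
    """Pick the task session on-cluster, otherwise fall back to the SDK client session."""
    if on_cluster:
        # Inside a task container: authenticate with the task session token.
        return login_from_task()
    # Off-cluster (detached mode on a laptop, for example): reuse the user's session.
    return client_session()


def task_login() -> Session:
    return Session(token="task-token", kind="task")


def user_login() -> Session:
    return Session(token="user-token", kind="user")


print(select_session(True, task_login, user_login).kind)
print(select_session(False, task_login, user_login).kind)
```

Passing the two logins as callables means only the branch that is taken performs any authentication work.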

Contributor

ah yes, makes sense

@azhou-determined azhou-determined merged commit a55af74 into main Aug 23, 2024
82 of 95 checks passed
@azhou-determined azhou-determined deleted the core-context-session-tokens branch August 23, 2024 19:22