-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: reject reconnecting agents with different resource pool configuration #9815
Changes from 13 commits
d3caacd
b4a24f7
61384cf
2a4d4f2
8076a1b
cbf6884
47bea5b
d93bb75
556220b
b3eb2d0
69920f5
27825d2
178a096
4877be6
5342281
6af1bd5
f329db4
2cbf48f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
stages: | ||
- db: | ||
name: db | ||
port: 5434 | ||
|
||
- master: | ||
pre: | ||
- sh: make -C tools prep-root | ||
config_file: | ||
security: | ||
initial_user_password: $INITIAL_USER_PASSWORD | ||
db: | ||
host: localhost | ||
port: 5434 | ||
password: postgres | ||
user: postgres | ||
name: determined | ||
__internal: | ||
preemption_timeout: 60s | ||
checkpoint_storage: | ||
type: shared_fs | ||
host_path: /tmp | ||
storage_path: determined-cp | ||
log: | ||
level: debug | ||
root: tools/build | ||
cache: | ||
cache_dir: /tmp/determined-cache | ||
launch_error: false | ||
telemetry: | ||
enabled: false | ||
resource_manager: | ||
default_aux_resource_pool: default | ||
default_compute_resource_pool: default | ||
scheduler: | ||
fitting_policy: best | ||
type: fair_share | ||
type: agent | ||
resource_pools: | ||
- agent_reattach_enabled: true | ||
agent_reconnect_wait: 25s | ||
description: '' | ||
max_aux_containers_per_agent: 100 | ||
pool_name: default | ||
provider: null | ||
task_container_defaults: null | ||
- pool_name: pool1 | ||
max_slots: 8 | ||
scheduler: | ||
type: priority | ||
scim: | ||
enabled: true | ||
auth: | ||
type: basic | ||
username: determined | ||
password: password | ||
|
||
- custom: | ||
name: proxy | ||
cmd: ["socat", "-d", "-d", "TCP-LISTEN:8081,reuseaddr,fork", "TCP:localhost:8080"] | ||
post: | ||
- conncheck: | ||
port: 8081 | ||
|
||
- agent: | ||
name: agent1 | ||
config_file: | ||
master_host: 127.0.0.1 | ||
master_port: 8081 | ||
agent_id: agent1 | ||
container_master_host: $DOCKER_LOCALHOST | ||
agent_reconnect_attempts: 24 | ||
agent_reconnect_backoff: 5 | ||
container_auto_remove_disabled: true | ||
hooks: | ||
on_connection_lost: ["touch", "/tmp/agent1-connection-lost"] | ||
|
||
|
||
- agent: | ||
name: agent2 | ||
config_file: | ||
master_host: 127.0.0.1 | ||
master_port: 8081 | ||
agent_id: agent2 | ||
container_master_host: $DOCKER_LOCALHOST | ||
agent_reconnect_attempts: 24 | ||
agent_reconnect_backoff: 5 | ||
container_auto_remove_disabled: true | ||
|
||
- agent: | ||
name: agent10 # Copy of agent1, but with different resource pool. | ||
config_file: | ||
master_host: 127.0.0.1 | ||
master_port: 8081 | ||
agent_id: agent1 | ||
container_master_host: $DOCKER_LOCALHOST | ||
agent_reconnect_attempts: 24 | ||
agent_reconnect_backoff: 5 | ||
container_auto_remove_disabled: true | ||
resource_pool: pool1 | ||
|
||
- agent: | ||
name: agent20 # Copy of agent1, but with empty(default) resource pool. | ||
config_file: | ||
master_host: 127.0.0.1 | ||
master_port: 8081 | ||
agent_id: agent1 | ||
container_master_host: $DOCKER_LOCALHOST | ||
agent_reconnect_attempts: 24 | ||
agent_reconnect_backoff: 5 | ||
container_auto_remove_disabled: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4512,6 +4512,20 @@ workflows: | |
extra-pytest-flags: "--no-compare-stats" | ||
collect-det-job-logs: false | ||
|
||
- test-e2e: | ||
name: test-e2e-managed-devcluster-resource-pools | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Option 1 (DONE): Create a separate E2E test for the .circleci/devcluster/multi-resource-pools.devcluster.yaml config file to allow for broader test coverage in the future. In both options, the overall execution time of the CircleCI test-e2e runs remains unaffected since they run in parallel with other E2E tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think option 1 is good for now, when we need broader coverage we can decide how to split things up! |
||
context: | ||
- dev-ci-cluster-default-user-credentials | ||
requires: | ||
- build-go-ee | ||
parallelism: 4 | ||
resource-class: large | ||
tf2: true | ||
mark: managed_devcluster_resource_pools | ||
managed-devcluster: true | ||
extra-pytest-flags: "--no-compare-stats" | ||
collect-det-job-logs: false | ||
|
||
- test-e2e: | ||
name: test-e2e-multi-k8s | ||
context: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,6 +58,7 @@ log: | |
log: | ||
level: trace | ||
color: true | ||
resource_pool: default | ||
slot_type: auto | ||
security: | ||
tls: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to configure your editor so it does not remove newlines at the end of files. We can afford the extra byte of data. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted! I will configure my editor better. But surprisingly this .yaml file didn't have a trailing new line in the end of it, nor did I add/remove any. Will see what is wrong and correct it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use VS Code, there's this setting you can toggle:
For more context, this is a POSIX standard thing: all lines should be terminated with a newline. Generally, this means that files' last line should have a
\n
at the end as well. Git and Github show us when the file does not comply with the standard.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!