Core key vault firewall should not be set to "Allow public access from all networks" #4260

Open
jonnyry wants to merge 14 commits into main from jr/upstream-main/93-close-keyvault-firewall

Conversation

jonnyry
Collaborator

@jonnyry jonnyry commented Jan 7, 2025

Resolves #4250

What is being addressed

  • Changes the core key vault firewall from Allow public access from all networks to Allow public access from specific virtual networks and IP addresses
  • Adds an IP exception to the key vault firewall for the deployment machine's internet IP (or the PUBLIC_DEPLOYMENT_IP_ADDRESS variable if set) during deployment
  • Removes the IP exception at the end of deployment (whether deployment succeeds or fails)
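
As a rough illustration of the IP selection described above, here is a minimal sketch. The function names and the lookup URL are assumptions made for this example, not necessarily what the PR's scripts use; only PUBLIC_DEPLOYMENT_IP_ADDRESS comes from the description.

```shell
# Hypothetical helper: asks an external service for this machine's internet IP.
# The URL is an assumption; any "what is my IP" endpoint would do.
lookup_public_ip() {
  curl -s --max-time 10 "https://ipecho.net/plain"
}

# Prefer PUBLIC_DEPLOYMENT_IP_ADDRESS when it is set and non-empty,
# otherwise fall back to the detected internet IP.
get_deployment_ip() {
  if [ -n "${PUBLIC_DEPLOYMENT_IP_ADDRESS:-}" ]; then
    printf '%s\n' "$PUBLIC_DEPLOYMENT_IP_ADDRESS"
  else
    lookup_public_ip
  fi
}
```

Setting the variable explicitly is useful when the deployment machine sits behind a proxy or NAT whose egress IP differs from what a lookup service reports.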

How is this addressed

  • Two new scripts add and remove the keyvault deployment IP exception:
    • devops/scripts/kv_add_network_exception.sh
    • devops/scripts/kv_remove_network_exception.sh
  • They are called from the following scripts in order to provide access to the KV:
    • core/terraform/deploy.sh
    • core/terraform/scripts/letsencrypt.sh
    • devops/scripts/destroy_env_no_terraform.sh
    • core/terraform/destroy.sh
    • devops/scripts/key_vault_list.sh
    • devops/scripts/set_contributor_sp_secrets.sh
  • The remove script is run via a bash trap, so it executes whether or not the preceding code fails, ensuring the IP exception is always removed
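
The add/trap/remove pattern can be sketched roughly as follows. This is an illustration under assumptions, not the contents of the two scripts: the function names and the AZ_CLI override (handy for dry runs) are hypothetical, while `az keyvault network-rule add` and `az keyvault network-rule remove` are existing Azure CLI commands.

```shell
# Hypothetical wrappers around the Azure CLI; AZ_CLI defaults to "az".
kv_add_exception() {
  "${AZ_CLI:-az}" keyvault network-rule add --name "$1" --ip-address "$2" --output none
}

kv_remove_exception() {
  "${AZ_CLI:-az}" keyvault network-rule remove --name "$1" --ip-address "$2" --output none
}

# Usage: deploy_with_exception <key-vault-name> <ip> <command> [args...]
deploy_with_exception() {
  (
    kv="$1"; ip="$2"; shift 2
    # The EXIT trap fires when this subshell ends, on success or failure.
    trap 'kv_remove_exception "$kv" "$ip"' EXIT
    kv_add_exception "$kv" "$ip"
    status=0
    "$@" || status=$?
    # Propagate the wrapped command's exit code after the trap has run.
    exit "$status"
  )
}
```

Running the body in a subshell keeps the trap scoped to one deployment step, so it does not interfere with traps set by the calling script.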

A bug in the azurerm provider was encountered, which required the use of a Terraform provisioner:

  1. A create provisioner on azurerm_key_vault was required to work around an azurerm provider bug: if a key vault is being re-created (it was previously soft deleted), the network ACLs are not updated. The provisioner can be removed when the bug is fixed or a different workaround is found.
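
For illustration only, a create provisioner of this kind could invoke a command along these lines to re-apply the firewall default after the vault is recovered. The function name and AZ_CLI override are hypothetical, and the exact command the PR runs is not shown in this conversation; `az keyvault update --default-action` is an existing Azure CLI option.

```shell
# Hypothetical sketch: set the key vault firewall default action to Deny,
# so only explicitly allowed networks and IP addresses can reach the vault.
kv_close_firewall() {
  local kv="$1" rg="$2"
  "${AZ_CLI:-az}" keyvault update --name "$kv" --resource-group "$rg" \
    --default-action Deny --output none
}
```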

Updates since initial commit (as discussed with @marrobi):

  1. Removed the use of tags, and the null provisioner that added the tag.

github-actions bot commented Jan 7, 2025

Unit Test Results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit e9833c4.

♻️ This comment has been updated with latest results.

@jonnyry
Collaborator Author

jonnyry commented Jan 7, 2025

/test 8af920d

github-actions bot commented Jan 7, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12660338621 (with refid 26f9d939)

(in response to this comment from @jonnyry)

@jonnyry
Collaborator Author

jonnyry commented Jan 7, 2025

/test-extended 8af920d

github-actions bot commented Jan 7, 2025

🤖 pr-bot 🤖

🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/12661150197 (with refid 26f9d939)

(in response to this comment from @jonnyry)

#
resource "null_resource" "add_deployment_tag" {
  triggers = {
    always_run = timestamp()
Member

Why does this always need to run? Once it's added once, it shouldn't get removed?

Collaborator Author

The intention was so if the tag is removed in Azure, it will always be readded.

However as discussed, have removed the use of tags altogether, so the provisioner has been removed.

Member

Can the Storage Account rules in this script be handled the same way?

Collaborator Author

Yes certainly - planning to have a look at storage accounts after this.

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test-destroy-env

github-actions bot commented Jan 8, 2025

Destroying PR test environment (RG: rg-tre26f9d939)... (run: https://github.com/microsoft/AzureTRE/actions/runs/12669260987)

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test 2970a5d

github-actions bot commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12669597448 (with refid 26f9d939)

(in response to this comment from @jonnyry)

@jonnyry jonnyry force-pushed the jr/upstream-main/93-close-keyvault-firewall branch from 2970a5d to dcb0b8f Compare January 8, 2025 12:00
@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test 272589f

github-actions bot commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12670289419 (with refid 26f9d939)

(in response to this comment from @jonnyry)

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test bf9fd32

github-actions bot commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12670349633 (with refid 26f9d939)

(in response to this comment from @jonnyry)

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test-destroy-env

github-actions bot commented Jan 8, 2025

Destroying PR test environment (RG: rg-tre26f9d939)... (run: https://github.com/microsoft/AzureTRE/actions/runs/12670413797)

@jonnyry jonnyry force-pushed the jr/upstream-main/93-close-keyvault-firewall branch from bf9fd32 to dcb0b8f Compare January 8, 2025 12:25

github-actions bot commented Jan 8, 2025

PR test environment destroy complete (RG: rg-tre26f9d939)

@jonnyry jonnyry requested a review from tamirkamara January 8, 2025 13:04
@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test dcb0b8f

github-actions bot commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12671159667 (with refid 26f9d939)

(in response to this comment from @jonnyry)

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test dcb0b8f

github-actions bot commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12671848713 (with refid 26f9d939)

(in response to this comment from @jonnyry)

CHANGELOG.md Outdated
Collaborator

I don't have time to go over this the next few days and guess @marrobi is the same. Just wanted to point out we now have 2 vaults being used from the deployer point of view.

Collaborator

When CMK is enabled another vault is created in the mgmt resource group

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12671848713 (with refid 26f9d939)

(in response to this comment from @jonnyry)

Notes on test run starting with an empty environment:

KV exception added here:

https://github.com/microsoft/AzureTRE/actions/runs/12671848713/job/35314921879#step:3:432

Adding deployment network exception to key vault kv-***...
 Core resource group rg-*** not found

KV exception removed here:

https://github.com/microsoft/AzureTRE/actions/runs/12671848713/job/35314921879#step:3:8259

Removing deployment network exception to key vault kv-***...
 Deployment network exception removed
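
The "not found" message in the first log suggests the add step skips quietly when the core resource group does not exist yet. A guard of that shape might look like the sketch below; the function name and AZ_CLI override are hypothetical, and the messages simply mirror the log lines above.

```shell
# Hypothetical sketch: only add the exception if the core resource group exists.
kv_add_exception_if_exists() {
  local rg="$1" kv="$2" ip="$3"
  echo "Adding deployment network exception to key vault $kv..."
  # On a first deployment there is no resource group (and no vault) yet.
  if ! "${AZ_CLI:-az}" group show --name "$rg" --output none >/dev/null 2>&1; then
    echo " Core resource group $rg not found"
    return 0
  fi
  "${AZ_CLI:-az}" keyvault network-rule add --name "$kv" --ip-address "$ip" --output none
}
```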

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

/test 135be76

github-actions bot commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12674163834 (with refid 26f9d939)

(in response to this comment from @jonnyry)

@jonnyry
Collaborator Author

jonnyry commented Jan 8, 2025

🤖 pr-bot 🤖

🏃 Running tests: https://github.com/microsoft/AzureTRE/actions/runs/12674163834 (with refid 26f9d939)

(in response to this comment from @jonnyry)

Notes on test run starting with an existing TRE:

KV exception added here:

https://github.com/microsoft/AzureTRE/actions/runs/12674163834/job/35322577601#step:3:456

 Adding deployment network exception to key vault kv-***...
 Keyvault kv-*** is now accessible

KV exception removed here:

https://github.com/microsoft/AzureTRE/actions/runs/12674163834/job/35322577601#step:3:1181

 Removing deployment network exception to key vault kv-***...
 Deployment network exception removed
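
The "is now accessible" line suggests the add step waits for the firewall change to take effect before continuing. One way to sketch such a wait is a simple retry loop; the function name, retry counts and AZ_CLI override are assumptions for this example, while `az keyvault secret list` is an existing Azure CLI command.

```shell
# Hypothetical sketch: poll until the key vault can be read from this machine.
# Usage: wait_for_kv_access <key-vault-name> [attempts] [delay-seconds]
wait_for_kv_access() {
  local kv="$1" attempts="${2:-10}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # A successful list call means the network rule has propagated.
    if "${AZ_CLI:-az}" keyvault secret list --vault-name "$kv" --output none >/dev/null 2>&1; then
      echo " Keyvault $kv is now accessible"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo " Keyvault $kv still not accessible after $attempts attempts" >&2
  return 1
}
```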

CHANGELOG.md Outdated
Collaborator

There's a pending ask to enable the deployer to access resources such as these keyvaults over private network only.

  1. Will this make this approach obsolete?
  2. If not, this means we will need to do all of this in a conditional way. Right?

Collaborator Author

  1. That would mean all TRE deployers would need to switch to private self hosted runners right? If so, yes this would be obsolete.
  2. If we want to support both deployment patterns - deployment from GitHub hosted runners + deployment from private self hosted runners with KV set to private networking - then yes we'd need to do it conditionally (in order to prevent the KV from being fully public).

I guess it depends on whether making the key vaults private-networking only means switching off the ability to use GitHub hosted runners - is that the plan?
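
To make point 2 concrete, the conditional could be as small as the sketch below. USE_PRIVATE_RUNNERS is a hypothetical setting invented for this example; no such variable is defined in this PR.

```shell
# Hypothetical sketch: the IP exception is only needed when deploying from
# public (e.g. GitHub hosted) runners. USE_PRIVATE_RUNNERS is an assumed name.
needs_ip_exception() {
  [ "${USE_PRIVATE_RUNNERS:-false}" != "true" ]
}

# Example call site:
#   if needs_ip_exception; then <add the key vault IP exception>; fi
```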

Collaborator

The goal I'm referring to is that resources will be accessed via private network only, which means private runners.

The question for you is whether this PR comes from a similar place but doesn't go as far yet and just limits which public IPs can access. Or, in a situation where private agents / network is done, will you still need this method of limiting public IPs?

Collaborator Author

@jonnyry jonnyry Jan 19, 2025

Yep when the codebase switches to deployment from private runners ONLY, then this change won't be needed anymore.

Collaborator

To clarify, in your deployment/use case you would also like to use private runners?

Collaborator

@tamirkamara tamirkamara Jan 19, 2025

I'm just considering that most orgs that are concerned with this might want to go all the way. This solution might not be enough in those cases.
Also considering the number of changes you had to do just for this resource... next up will be the ACR, storage account and yet another keyvault in the mgmt resource group. All in all it might not be worth the investment and complexities if the end goal is to go to private agents anyway...

Collaborator Author

Agree private runners should be the objective, but when are they scheduled to be implemented? Is this a reasonable stopgap until then?

Member

I think we need to ensure the solution can still be deployed without private runners. Most orgs start out testing out the solution etc, and don't have the infrastructure for private runners.

Collaborator

My point is that we might not want to offer this middle way of opening the deployer IP in all currently defined public resources, due to the complexities of supporting all the resources I mentioned above.
It might be that we offer two options: public, like today, and secured via private runners. Testing out could be done the current way, and if you want to be secure then you need private agents.

Collaborator Author

@jonnyry jonnyry Jan 20, 2025

Supporting all resources might be too much (e.g. the ACR) but the KV is one that stands out as worthy of tightening network access on; though I understand that private runners will supersede this once implemented.

@microsoft microsoft deleted a comment from github-actions bot Jan 19, 2025
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Core key vault firewall should not be set to "Allow public access from all networks"
3 participants