From b5836e6b0bd45d42c9654b10fc85293bc8e2b91f Mon Sep 17 00:00:00 2001
From: Andrea Grillo <andrea.grillo@pagopa.it>
Date: Mon, 20 Nov 2023 12:11:05 +0100
Subject: [PATCH] docs: Improve GitHub Federated Identity module documentation
 (#181)

---
 github_federated_identity/README.md    | 68 +++++++++++++++++---------
 kubernetes_cluster_udr/01_main.tf      |  4 +-
 kubernetes_cluster_udr/99_variables.tf |  8 +--
 kubernetes_cluster_udr/README.md       |  3 +-
 4 files changed, 53 insertions(+), 30 deletions(-)
diff --git a/github_federated_identity/README.md b/github_federated_identity/README.md
index 7b3052f8..511ce653 100644
--- a/github_federated_identity/README.md
+++ b/github_federated_identity/README.md
@@ -1,40 +1,62 @@
-# GitHub Federated Identity for Azure
+# GitHub Federated Identity for Azure Module
 
-This module allows the creation of a User Managed Identity federated with GitHub. Module is intended to be used against `infrastructure` repo.
+This module creates User Managed Identities federated with one or more GitHub repositories in order to use a passwordless authentication model between GitHub and Azure.
+This module should only be used in `<product>-infra` repositories.
 
-The module's output contains the identity data.
+> More information about this approach is available on the [Confluence page](https://pagopa.atlassian.net/wiki/spaces/Technology/pages/734527975/GitHub+OIDC+OP).
+
+For debugging purposes, you might find the module's output useful: it contains the data of the newly created identities.
+
+## Glossary
+
+- `<prefix>`: product name, such as `io` or `selfc`
+- `<shortenv>`: environment name in short form, such as `d` or `p`
+- `<domain>`: optional, the product sub area, such as `sign`
+- `<idrole>`: the role of the identity, it can be either `ci` or `cd`
+- `<repo>`: the repository name, such as `io-infra`
+- `<scope>`: the federation type, such as `environment` or `branch` or `tag`
+- `<subject>`: the federation scope value, such as `dev` (for environments) or `v1.0` (for tags)
+
+## Design
+
+To avoid creating many near-identical identities, each subscription should have a single resource group containing two user managed identities: one for Continuous Integration and one for Continuous Delivery/Deployment workflows. Each user managed identity is federated with one or more repositories and with one or more GitHub environments.
+
+This module expects to find an existing resource group named `<prefix>-<shortenv>-<domain>-identity-rg`. Then, it creates a [user managed identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azp) in it using the naming convention `<prefix>-<shortenv>-<domain>-github-<idrole>-identity`; the `idrole` value is obtained from the input variable `identity_role` and can be either `ci` or `cd`. Finally, the variable `github_federations` defines the list of repositories and GitHub environments to create a federation with. The federation output name uses the form `<prefix>-<shortenv>-<domain>-${var.app_name}-github-<repo>-<scope>-<subject>`.
+
+> Consume this module once for each identity. You will therefore usually invoke it twice: once for the CI identity and once for the CD identity.
+
+### The need for two identities
+
+Two scenarios have been identified. The first is Continuous Integration, where the agent usually performs a dry run against the current infrastructure. Since no write operation is involved, the `ci` identity doesn't need privileged roles such as `Owner` or `Contributor`, only fine-grained reader roles that depend on the kind of resources involved in the repository (for example, a reader role on the Key Vaults in a particular subscription). For this reason, the module defaults to a generic subscription-wide `Reader` role. This setting can, however, be overridden.
+
+On the other hand, the `cd` identity actually needs to write resources, so the module defaults to a subscription-wide `Contributor` role, which can be overridden too.
+
+> This approach allows developers to follow the minimum privilege principle.
+
+At this point, a pair of identities for each repository might seem a convenient approach, and in an ideal world it would be; however, a large number of identities makes the cloud harder to govern and the code harder to read and understand.
 
 ## How to use it
 
-Use the Terraform template in `./tests` as template for testing and getting advices.
+The Terraform template in the `./tests` folder can be used as an example or as a starting point. It contains documentation and guidance about variables and their values.
+
+### Requirements
 
-### Before using it
+As stated in the [Design](#design) section, you must define a new resource group before invoking this module. See `./tests` for an example. Remember: the resource group name must match the naming convention `<prefix>-<shortenv>-<domain>-identity-rg`, where `domain` can be empty.
 
-Ensure to create a resource group by using the naming convention `<prefix>-<shortenv>-<domain>` (`domain` can be empty). Module search this resource group and if it is not found, a failure is thrown.
+If the resource group is not found, the module fails with an error.
 
 ### RBAC roles
 
-You should create an identity for CI and another one for CD scenarios. By default, CI identites only have `Reader` access on the subscription, meanwhile CDs have `Contributor` role. This can be customized according to your needs by adding or removing roles with subscription or resource group scopes. However, the minimum privilege principle should be followed.
+As explained in the [Design](#design) section, you should invoke the module twice: once for the `ci` identity and once for the `cd` identity.
 
-### Identity management
+You can customize the identities' IAM roles at both subscription and resource group level using the variables `cX_rbac_roles`. In particular, each variable accepts:
 
-Each domain should use a single resource group.
-Each domain should use a single pair of identity (CI+CD).
-Each identity should have a different federated credential for each repository and environment.
+- a list of roles to assign to the _current_ subscription
+- a dictionary mapping resource group names to lists of roles
 
-Example:
-`prefix`: `azrmtest`
-`env_short`: `9`
-`domain`: ``
-`identity_role`: `ci`
-`github.repository`: `terraform-azurerm-v3`
-`app_name`: `messages`
-`credentials_scope`: `environment`
-`subject`: `dev-ci`
+> The module could probably be improved by using a single variable for RBAC roles instead of two identical ones.
 
-Resource group name: `azrmtest-9-identity-rg`
-Identity name: `azrmtest-9-github-ci-identity` and `azrmtest-9-github-cd-identity`
-Federated credential: `azrmtest-9-messages-github-terraform-azurerm-v3-messages-environment-dev-ci`
+This granularity is useful in scenarios where a write role is needed on the Storage Account that contains the Terraform state files, while read-only permissions are enough on the other Storage Accounts.
 
 <!-- markdownlint-disable -->
 <!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
diff --git a/kubernetes_cluster_udr/01_main.tf b/kubernetes_cluster_udr/01_main.tf
index cefbde4d..e7ef16ad 100644
--- a/kubernetes_cluster_udr/01_main.tf
+++ b/kubernetes_cluster_udr/01_main.tf
@@ -70,8 +70,8 @@ resource "azurerm_kubernetes_cluster" "this" {
     for_each = var.network_profile != null ? [var.network_profile] : []
     iterator = p
     content {
-      network_plugin = p.value.network_plugin
-      outbound_type  = p.value.outbound_type
+      network_plugin      = p.value.network_plugin
+      outbound_type       = p.value.outbound_type
       network_plugin_mode = p.value.network_plugin_mode
     }
   }
diff --git a/kubernetes_cluster_udr/99_variables.tf b/kubernetes_cluster_udr/99_variables.tf
index 813b8ded..be16f408 100644
--- a/kubernetes_cluster_udr/99_variables.tf
+++ b/kubernetes_cluster_udr/99_variables.tf
@@ -264,13 +264,13 @@ variable "api_server_authorized_ip_ranges" {
 
 variable "network_profile" {
   type = object({
-    network_plugin = string # e.g. 'azure'. Network plugin to use for networking. Currently supported values are azure and kubenet
-    outbound_type  = string # e.g. 'loadBalancer'. The outbound (egress) routing method which should be used for this Kubernetes Cluster. Possible values are loadBalancer, userDefinedRouting, managedNATGateway and userAssignedNATGateway. Defaults to loadBalancer
+    network_plugin      = string # e.g. 'azure'. Network plugin to use for networking. Currently supported values are azure and kubenet
+    outbound_type       = string # e.g. 'loadBalancer'. The outbound (egress) routing method which should be used for this Kubernetes Cluster. Possible values are loadBalancer, userDefinedRouting, managedNATGateway and userAssignedNATGateway. Defaults to loadBalancer
     network_plugin_mode = string
   })
   default = {
-    network_plugin = "azure"
-    outbound_type  = "userDefinedRouting"
+    network_plugin      = "azure"
+    outbound_type       = "userDefinedRouting"
     network_plugin_mode = "Overlay"
   }
   description = "See variable description to understand how to use it, and see examples"
diff --git a/kubernetes_cluster_udr/README.md b/kubernetes_cluster_udr/README.md
index 08a72fc9..76233ffd 100644
--- a/kubernetes_cluster_udr/README.md
+++ b/kubernetes_cluster_udr/README.md
@@ -702,7 +702,7 @@ No modules.
 | <a name="input_log_analytics_workspace_id"></a> [log\_analytics\_workspace\_id](#input\_log\_analytics\_workspace\_id) | The ID of the Log Analytics Workspace which the OMS Agent should send data to. | `string` | `null` | no |
 | <a name="input_microsoft_defender_log_analytics_workspace_id"></a> [microsoft\_defender\_log\_analytics\_workspace\_id](#input\_microsoft\_defender\_log\_analytics\_workspace\_id) | Specifies the ID of the Log Analytics Workspace where the audit logs collected by Microsoft Defender should be sent to | `string` | `null` | no |
 | <a name="input_name"></a> [name](#input\_name) | (Required) Cluster name | `string` | n/a | yes |
-| <a name="input_network_profile"></a> [network\_profile](#input\_network\_profile) | See variable description to understand how to use it, and see examples | <pre>object({<br>    network_plugin = string # e.g. 'azure'. Network plugin to use for networking. Currently supported values are azure and kubenet<br>    outbound_type  = string # e.g. 'loadBalancer'. The outbound (egress) routing method which should be used for this Kubernetes Cluster. Possible values are loadBalancer, userDefinedRouting, managedNATGateway and userAssignedNATGateway. Defaults to loadBalancer<br>  })</pre> | <pre>{<br>  "network_plugin": "azure",<br>  "outbound_type": "userDefinedRouting"<br>}</pre> | no |
+| <a name="input_network_profile"></a> [network\_profile](#input\_network\_profile) | See variable description to understand how to use it, and see examples | <pre>object({<br>    network_plugin      = string # e.g. 'azure'. Network plugin to use for networking. Currently supported values are azure and kubenet<br>    outbound_type       = string # e.g. 'loadBalancer'. The outbound (egress) routing method which should be used for this Kubernetes Cluster. Possible values are loadBalancer, userDefinedRouting, managedNATGateway and userAssignedNATGateway. Defaults to loadBalancer<br>    network_plugin_mode = string<br>  })</pre> | <pre>{<br>  "network_plugin": "azure",<br>  "network_plugin_mode": "Overlay",<br>  "outbound_type": "userDefinedRouting"<br>}</pre> | no |
 | <a name="input_outbound_ip_address_ids"></a> [outbound\_ip\_address\_ids](#input\_outbound\_ip\_address\_ids) | The ID of the Public IP Addresses which should be used for outbound communication for the cluster load balancer. | `list(string)` | `[]` | no |
 | <a name="input_private_cluster_enabled"></a> [private\_cluster\_enabled](#input\_private\_cluster\_enabled) | (Optional) Provides a Private IP Address for the Kubernetes API on the Virtual Network where the Kubernetes Cluster is located. | `bool` | `false` | no |
 | <a name="input_rbac_enabled"></a> [rbac\_enabled](#input\_rbac\_enabled) | Is Role Based Access Control Enabled? | `bool` | `true` | no |
@@ -742,6 +742,7 @@ No modules.
 | <a name="input_user_node_pool_vm_size"></a> [user\_node\_pool\_vm\_size](#input\_user\_node\_pool\_vm\_size) | (Required) The size of the Virtual Machine, such as Standard\_B4ms or Standard\_D4s\_vX. See https://pagopa.atlassian.net/wiki/spaces/DEVOPS/pages/134840344/Best+practice+su+prodotti | `string` | n/a | yes |
 | <a name="input_vnet_id"></a> [vnet\_id](#input\_vnet\_id) | (Required) Virtual network id, where the k8s cluster is deployed. | `string` | n/a | yes |
 | <a name="input_vnet_subnet_id"></a> [vnet\_subnet\_id](#input\_vnet\_subnet\_id) | (Optional) The ID of a Subnet where the Kubernetes Node Pool should exist. Changing this forces a new resource to be created. | `string` | `null` | no |
+| <a name="input_vnet_user_subnet_id"></a> [vnet\_user\_subnet\_id](#input\_vnet\_user\_subnet\_id) | (Optional) The ID of a Subnet where the Kubernetes Node Pool should exist. Changing this forces a new resource to be created. | `string` | `null` | no |
 
 ## Outputs