From f631b818c22f1a503382276b53c50d60a0ea0be7 Mon Sep 17 00:00:00 2001 From: Mohamed Chiheb Ben Jemaa Date: Mon, 30 Dec 2024 17:16:01 +0100 Subject: [PATCH] Write proposal for multi cluster deployment --- docs/proposals/multi-proxmox-clusters.md | 286 +++++++++++++---------- 1 file changed, 168 insertions(+), 118 deletions(-) diff --git a/docs/proposals/multi-proxmox-clusters.md b/docs/proposals/multi-proxmox-clusters.md index 1b6e35d2..8c1475d0 100644 --- a/docs/proposals/multi-proxmox-clusters.md +++ b/docs/proposals/multi-proxmox-clusters.md @@ -21,28 +21,26 @@ status: experimental - [Glossary](#glossary) - [Summary](#summary) - [Motivation](#motivation) - - [Goals](#goals) - - [Non-Goals/Future Work](#non-goalsfuture-work) + - [Goals](#goals) + - [Non-Goals/Future Work](#non-goalsfuture-work) - [Proposal](#proposal) - - [User Stories](#user-stories) - - [Story 1](#story-1) - - [Story 2](#story-2) - - [Requirements (Optional)](#requirements-optional) - - [Functional Requirements](#functional-requirements) - - [FR1](#fr1) - - [FR2](#fr2) - - [Non-Functional Requirements](#non-functional-requirements) - - [NFR1](#nfr1) - - [NFR2](#nfr2) + - [User Stories](#user-stories) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Story 3](#story-3) - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) - - [Security Model](#security-model) - - [Risks and Mitigations](#risks-and-mitigations) + - [ProxmoxCluster CRD](#proxmoxcluster-crd) + - [ProxmoxMachine CRD](#proxmoxmachine-crd) + - [Proxmox Client](#proxmox-client) + - [The client interface](#the-client-interface) + - [The client implementation](#the-client-implementation) + - [Security Model](#security-model) + - [Risks and Mitigations](#risks-and-mitigations) - [Alternatives](#alternatives) - [Upgrade Strategy](#upgrade-strategy) - [Additional Details](#additional-details) - - [Test Plan [optional]](#test-plan-optional) - - [Graduation Criteria [optional]](#graduation-criteria-optional) - - [Version Skew Strategy [optional]](#version-skew-strategy-optional) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) - [Implementation History](#implementation-history) @@ -50,6 +48,7 @@ status: experimental ## Glossary - **CAPMOX**: Cluster API Provider for Proxmox +- **Proxmox**: Proxmox Virtual Environment ## Summary @@ -64,166 +63,217 @@ This proposal aims to enable provisioning Kubernetes clusters on multiple proxmo ## Motivation -This section is for explicitly listing the motivation, goals and non-goals of this proposal. +Proxmox is a great virtualization platform for running Kubernetes clusters, as it provides a simple and easy way to manage virtual machines. +However, it is not possible to provision a single Kubernetes cluster on multiple proxmox clusters. -- Describe why the change is important and the benefits to users. -- The motivation section can optionally provide links to [experience reports](https://github.com/golang/go/wiki/ExperienceReports) - to demonstrate the interest in a proposal within the wider Kubernetes community. +For the following reasons, it is important to enable provisioning Kubernetes clusters on multiple proxmox clusters: + +- **Resource Utilization**: By provisioning a single Kubernetes cluster on multiple proxmox clusters, we can better utilize the resources of the proxmox clusters. +- **High Availability**: By provisioning a single Kubernetes cluster on multiple proxmox clusters, we can achieve high availability. ### Goals -- List the specific high-level goals of the proposal. -- How will we know that this has succeeded? +- To design a solution for provisioning Kubernetes Clusters on multiple proxmox clusters. +- To distribute machines to various proxmox clusters. +- To enable high availability of Kubernetes clusters by provisioning them on multiple proxmox clusters. ### Non-Goals/Future Work -- What high-levels are out of scope for this proposal? -- Listing non-goals helps to focus discussion and make progress. +- To manage the lifecycle of the Proxmox clusters instances. +- To manage the security of the Proxmox clusters instances. +- To maintain the state of the Proxmox clusters instances. +- To provide the credentials for the Proxmox clusters instances. ## Proposal -This is where we get down to the nitty gritty of what the proposal actually is. - -- What is the plan for implementing this feature? -- What data model changes, additions, or removals are required? -- Provide a scenario, or example. -- Use diagrams to communicate concepts, flows of execution, and states. +This document introduces the concept of a Multi Proxmox Clusters provisioning, a new way to provision HA clusters. +This feature require the user to set a specific credentials and configuration in the Secret and ProxmoxCluster. -[PlantUML](http://plantuml.com) is the preferred tool to generate diagrams, -place your `.plantuml` files under `images/` and run `make diagrams` from the docs folder. ### User Stories -- Detail the things that people will be able to do if this proposal is implemented. -- Include as much detail as possible so that people can understand the "how" of the system. -- The goal here is to make this feel real for users without getting bogged down. - #### Story 1 +As a developer or Cluster operator, I would like to Provision K8s Cluster on multiple Proxmox Clusters, +so that I can better utilize the resources of the proxmox clusters. + #### Story 2 -### Requirements (Optional) +As a developer or Cluster operator, I would like CAPMOX to expose configurations to enable provisioning Kubernetes clusters on multiple proxmox clusters. + +#### Story 3 + +As a developer or Cluster operator, I would like CAPMOX to use a specific schema for Proxmox credentials that is used to provision the Kubernetes cluster on multiple proxmox clusters. + +#### Implementation Details/Notes/Constraints + +The proposed solution is designed to make the provision of workload clusters as easy as possible while keeping backward compatible with the existing CAPMOX. -Some authors may wish to use requirements in addition to user stories. -Technical requirements should derived from user stories, and provide a trace from -use case to design, implementation and test case. Requirements can be prioritised -using the MoSCoW (MUST, SHOULD, COULD, WON'T) criteria. +##### ProxmoxCluster CRD -The FR and NFR notation is intended to be used as cross-references across a CAEP. +We propose to introduce a new field `spec.settings` in ProxmoxCluster CRD. +The goal of this field is to determine which mode to use for provisioning the Kubernetes cluster. + + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1 +kind: ProxmoxCluster +metadata: + name: test +spec: + settings: + mode: Default + instances: + - name: pmox-txl + nodes: [pmox-txl-1, pmox-txl-2] + template: + sourceNode: pmox-txl-1 + templateID: 1000 + credentialsRef: + name: "pmox-txl-proxmox-credentials" + - name: pmox-fra + nodes: [pmox-fra-1, pmox-fra-2, pmox-fra-3] + template: + templateSelector: + matchTags: + - capi + - v1.30.5 + credentialsRef: + name: "pmox-fra-proxmox-credentials" + controlPlaneEndpoint: + host: 10.0.0.10 + port: 6443 + ipv4Config: + addresses: [10.0.0.11-10.0.0.50] + prefix: 24 + gateway: 10.0.0.1 + dnsServers: [10.0.0.1] + allowedNodes: [] + credentialsRef: + name: "test-proxmox-credentials" +``` -The difference between goals and requirements is that between an executive summary -and the body of a document. Each requirement should be in support of a goal, -but narrowly scoped in a way that is verifiable or ideally - testable. +The `spec.settings.mode` field will have the following values: +* `Default`: This is the default mode. It will provision the Kubernetes cluster on a single proxmox cluster. +* `SingleInstance`: This mode will provision the Kubernetes cluster on a single proxmox cluster. +* `MultiInstance`: This mode will provision the Kubernetes cluster on multiple proxmox clusters. -#### Functional Requirements +The `spec.settings.instances` field will have the list of proxmox clusters to provision the Kubernetes cluster on. -Functional requirements are the properties that this design should include. +* `spec.settings.mode=Default`: if this mode is set and the `spec.settings.instances` field is not set, the Kubernetes cluster will be provisioned on the default proxmox cluster, + and when instances is set it will provision the Kubernetes cluster on the first proxmox instance. -##### FR1 +* `spec.settings.mode=SingleInstance`: if this mode is set, the Kubernetes cluster will be provisioned on a single proxmox cluster chosen randomly from the list of proxmox clusters in the `spec.settings.instances` field. -##### FR2 +* `spec.settings.mode=MultiInstance`: if this mode is set, the Kubernetes cluster will be provisioned on multiple proxmox clusters from the list of proxmox clusters in the `spec.settings.instances` field. -#### Non-Functional Requirements +Notes: -Non-functional requirements are user expectations of the solution. Include -considerations for performance, reliability and security. +* `spec.allowedNodes` can be deprecated and have no effect. +* `spec.credentialsRef` will be used only for the default proxmox cluster (can be deprecated in future). +* If no `mode` is set the Cluster will be provisioned on the default proxmox cluster. -##### NFR1 +This will make sure to distribute the machines to the different proxmox clusters. -##### NFR2 +##### ProxmoxMachine CRD -### Implementation Details/Notes/Constraints +First, in order to enable multi cluster deployment, we need the feature of selecting the template by tags https://github.com/ionos-cloud/cluster-api-provider-proxmox/pull/343 +The tags, should match tags set on VM templates in Proxmox for all instances. -- What are some important details that didn't come across above. -- What are the caveats to the implementation? -- Go in to as much detail as necessary here. -- Talk about core concepts and how they relate. +In order to have a way of manually setting an instance in machine level for example, when using a MachineDeployment. +we can add the field `instance` to the ProxmoxMachine CRD. -### Security Model +```yaml +kind: ProxmoxMachineTemplate +apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1 +metadata: + name: "test-worker-1" +spec: + template: + spec: + templateSelector: + matchTags: + - capi + - v1.30.5 + instance: pmox-txl +``` -Document the intended security model for the proposal, including implications -on the Kubernetes RBAC model. Questions you may want to answer include: +##### Proxmox Client -* Does this proposal implement security controls or require the need to do so? - * If so, consider describing the different roles and permissions with tables. -* Are their adequate security warnings where appropriate (see https://adam.shostack.org/ReederEtAl_NEATatMicrosoft.pdf for guidance). -* Are regex expressions going to be used, and are their appropriate defenses against DOS. -* Is any sensitive data being stored in a secret, and only exists for as long as necessary? +The new model of having multiple proxmox clusters will require changes in the Proxmox client to support multiple proxmox clusters. -### Risks and Mitigations +###### The client interface -- What are the risks of this proposal and how do we mitigate? Think broadly. -- How will UX be reviewed and by whom? -- How will security be reviewed and by whom? -- Consider including folks that also work outside the SIG or subproject. +The client interface will stay the same, any required changes will be adapted. -## Alternatives +###### The client implementation -The `Alternatives` section is used to highlight and record other possible approaches to delivering the value proposed by a proposal. +The api client struct will now have a map of clients instead of a single client. +The map will have the name of the proxmox cluster as the key and the client as the value. -## Upgrade Strategy +```go +// APIClient Proxmox API client object. +type APIClient struct { + clients map[string]*proxmox.Client + logger logr.Logger +} +``` + +* In default mode, the client will use the default client key. + +The function `NewAPIClient()` will be initialized in the controller. -If applicable, how will the component be upgraded? Make sure this is in the test plan. +The controller will determine which instance to use and provide the name to the APIClient. +The APIClient functions should have a param that specify which instance to use, based on instance name. -Consider the following in developing an upgrade strategy for this enhancement: -- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade in order to keep previous behavior? -- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade in order to make use of the enhancement? +```go -## Additional Details +// selectNextInstance selects a client based on the request. +func (c *APIClient) CloneVM(, templateID int, clone capmox.VMCloneRequest, instance string) (capmox.VMCloneResponse, error) { + +} +``` -### Test Plan [optional] -**Note:** *Section not required until targeted at a release.* +**State** +The proxmox instance will be saved in ProxmoxMachine as `spec.instance`. +The state of distributing the machines will be saved in the ProxmoxCluster as `status.instances`. -Consider the following in developing a test plan for this enhancement: -- Will there be e2e and integration tests, in addition to unit tests? -- How will it be tested in isolation vs with other components? -No need to outline all of the test cases, just the general strategy. -Anything that would count as tricky in the implementation and anything particularly challenging to test should be called out. +### Security Model -All code is expected to have adequate tests (eventually with coverage expectations). -Please adhere to the [Kubernetes testing guidelines][testing-guidelines] when drafting this test plan. +This proposal will add a new fields to ProxmoxCluster and ProxmoxMachine CRDs. +The permissions will stay the same. +This feature will also may require the user to manage multiple secrets. -[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md +### Risks and Mitigations -### Graduation Criteria [optional] +* The new feature introduce new secrets for proxmox credentials. -**Note:** *Section not required until targeted at a release.* +## Alternatives -Define graduation milestones. +The alternative is to use the existing CAPMOX and provision the Kubernetes cluster on a single proxmox cluster. -These may be defined in terms of API maturity, or as something else. Initial proposal should keep -this high-level with a focus on what signals will be looked at to determine graduation. +## Upgrade Strategy -Consider the following in developing the graduation criteria for this enhancement: -- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels] -- [Deprecation policy][deprecation-policy] +since there are changes within the API and/or deprecations, we need to create a new version. +`v1alpha2` or `v1alpha3` will be the new version of the ProxmoxCluster and ProxmoxMachine CRDs. -Clearly define what graduation means by either linking to the [API doc definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning), -or by redefining what graduation means. +CAPMOX will support multiple versions at the same time. +only when the end of life of the the new version must be communicated with the users. -In general, we try to use the same stages (alpha, beta, GA), regardless how the functionality is accessed. +## Additional Details -[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions -[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/ +### Test Plan -### Version Skew Strategy [optional] +- Unit test coverage for the Proxmox instance selection and distributions. +- E2e tests if possible, to test the distribution of the machines to the different proxmox clusters. -If applicable, how will the component handle version skew with other components? What are the guarantees? Make sure -this is in the test plan. +### Graduation Criteria -Consider the following in developing a version skew strategy for this enhancement: -- Does this enhancement involve coordinating behavior in the control plane and in the kubelet? How does an n-2 kubelet without this feature available behave when this feature is used? -- Will any other components on the node change? For example, changes to CSI, CRI or CNI may require updating that component before the kubelet. +This feature will be released in a new version CRD `v1alpha3` to CAPMOX. ## Implementation History -- [ ] MM/DD/YYYY: Proposed idea in an issue or [community meeting] -- [ ] MM/DD/YYYY: Compile a Google Doc following the CAEP template (link here) -- [ ] MM/DD/YYYY: First round of feedback from community -- [ ] MM/DD/YYYY: Present proposal at a [community meeting] -- [ ] MM/DD/YYYY: Open proposal PR - - -[community meeting]: https://docs.google.com/document/d/1ushaVqAKYnZ2VN_aa3GyKlS4kEd6bSug13xaXOakAQI/edit#heading=h.pxsq37pzkbdq +- [ ] 2024-12-19: Open proposal PR +- [ ] MM/DD/YYYY: Present proposal at a community meeting