Add GCP support #144

Merged
merged 1 commit into main on Dec 21, 2024

Conversation

anson627
Contributor

This pull request updates the Makefile, Go source files, Helm charts, and documentation to add GCP support and improve the existing tooling. The most important changes are grouped by theme below:

Makefile Enhancements:

  • Added a new image-clean target to remove Docker images.
  • Updated the clean target to also remove the binaries listed in CONTRIB_BINARIES (example invocations below).
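For reference, the new and updated targets can be invoked as shown below; this is only a usage sketch, and CONTRIB_BINARIES is an internal Makefile variable resolved by the target itself.

```bash
# Remove locally built kperf Docker images (new target in this PR)
make image-clean

# Remove build artifacts; now also removes the binaries listed in CONTRIB_BINARIES
make clean
```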

Go Source File Updates:

  • Introduced the ApplyPriorityLevelConfiguration function in cmd/kperf/commands/utils/helper.go to apply a Kubernetes PriorityLevelConfiguration using kubectl (an illustrative command follows below).
  • Updated cmd/kperf/commands/virtualcluster/nodepool.go to call the new ApplyPriorityLevelConfiguration function.
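For illustration, the kind of command such a helper issues might look like the following. This is a sketch only: the manifest contents, resource name, and API version applied by kperf are assumptions and are not shown in this PR (the flowcontrol API group is v1 on Kubernetes 1.29+, older clusters use a beta version).

```bash
# Hypothetical example of applying a PriorityLevelConfiguration with kubectl;
# the real manifest used by ApplyPriorityLevelConfiguration may differ.
kubectl --kubeconfig "$HOME/.kube/config" apply -f - <<'EOF'
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: example-priority-level
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 100
    limitResponse:
      type: Queue
      queuing:
        queues: 50
        handSize: 4
        queueLengthLimit: 50
EOF
```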

Documentation Improvements:

  • Added instructions for obtaining a KubeConfig for Azure, AWS, and GCP in docs/getting-started.md (example commands below).
  • Updated runner group specifications and example commands in docs/getting-started.md to reflect the current configurations and image versions.
  • Revised the benchmark scenario descriptions and options in docs/runkperf.md to reflect the updated configurations and image versions.
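For reference, the typical commands for fetching cluster credentials on each cloud look like the following; treat them as assumptions, since the exact instructions added to docs/getting-started.md may differ in flags and placeholders.

```bash
# Azure (AKS)
az aks get-credentials --resource-group <resource-group> --name <cluster-name>

# AWS (EKS)
aws eks update-kubeconfig --region <region> --name <cluster-name>

# GCP (GKE)
gcloud container clusters get-credentials <cluster-name> --region <region> --project <project-id>
```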

Helm Chart Updates:

  • Added labels and annotations to the Helm templates for FlowSchema in manifests/runnergroup/server/templates/flowcontrol.yaml and manifests/virtualcluster/nodecontrollers/templates/flowcontrol.yaml.

@anson627
Contributor Author

Tested with the following run:

 runkperf -v 3 bench --kubeconfig $HOME/.kube/config --runner-image ghcr.io/azure/kperf:0.1.8 node10_job1_pod100 --total 1000
I1221 21:54:01.124231 2743920 utils.go:287] Fetching apiserver's cores
I1221 21:54:01.124286 2743920 utils.go:388] [CMD] /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config cluster-info
E1221 21:54:01.301342 2743920 utils.go:322] "failed to get cores" err="failed to get metrics for ip 35.196.245.108: failed to create a new mount namespace: operation not permitted" ip="35.196.245.108"
I1221 21:54:01.301763 2743920 utils.go:152] "Load Profile" config=<
        count: 1
        loadProfile:
          version: 1
          description: node10-job1-pod100
          spec:
            rate: 10
            total: 1000
            conns: 10
            client: 10
            contentType: json
            disableHTTP2: false
            maxRetries: 0
            requests:
            - shares: 1000
              staleList:
                group: ""
                version: v1
                resource: pods
                namespace: ""
                limit: 0
                seletor: ""
                fieldSelector: ""
            - shares: 100
              quorumList:
                group: ""
                version: v1
                resource: pods
                namespace: ""
                limit: 1000
                seletor: ""
                fieldSelector: ""
            - shares: 100
              quorumList:
                group: ""
                version: v1
                resource: events
                namespace: ""
                limit: 1000
                seletor: ""
                fieldSelector: ""
        nodeAffinity:
          node.kubernetes.io/instance-type:
          - Standard_D16s_v3
          - m4.4xlarge
          - n1-standard-16
 >
I1221 21:54:01.302081 2743920 utils.go:97] "Deploying virtual nodepool" name="node10job1pod100"
I1221 21:54:01.302098 2743920 utils.go:115] "Trying to delete nodepool if necessary" name="node10job1pod100"
I1221 21:54:01.302151 2743920 utils.go:388] [CMD] /usr/local/bin/kperf vc nodepool --kubeconfig=/home/ansonqian/.kube/config delete node10job1pod100
I1221 21:54:01.478761 2743920 utils.go:388] [CMD] /usr/local/bin/kperf vc nodepool --kubeconfig=/home/ansonqian/.kube/config add node10job1pod100 --nodes=100 --cpu=32 --memory=96 --max-pods=110 --affinity=node.kubernetes.io/instance-type=Standard_D8s_v3,m4.2xlarge,n1-standard-8
I1221 21:54:17.626476 2743920 utils.go:236] "Deploying runner group" config="/tmp/temp1227027314"
I1221 21:54:17.626509 2743920 utils.go:240] Deleting existing runner group
I1221 21:54:17.626567 2743920 utils.go:46] Repeat to create job with 3k pods
I1221 21:54:17.626606 2743920 utils.go:388] [CMD] /usr/local/bin/kperf rg --kubeconfig=/home/ansonqian/.kube/config delete
I1221 21:54:17.626730 2743920 utils.go:62] Creating namespace job1pod100
I1221 21:54:17.626827 2743920 utils.go:388] [CMD] /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config create namespace job1pod100
I1221 21:54:17.793976 2743920 utils.go:388] [CMD] /usr/local/bin/kperf rg --kubeconfig=/home/ansonqian/.kube/config run --runnergroup=file:///tmp/temp1227027314 --runner-image=ghcr.io/azure/kperf:0.1.8 --affinity=node.kubernetes.io/instance-type=Standard_D16s_v3,m4.4xlarge,n1-standard-16 --runner-flowcontrol=workload-low:1000
I1221 21:54:21.173162 2743920 utils.go:251] Waiting runner group
I1221 21:54:21.173270 2743920 utils.go:388] [CMD] /usr/local/bin/kperf rg --kubeconfig=/home/ansonqian/.kube/config result
I1221 21:54:22.784115 2743920 utils.go:388] [CMD] /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config -n job1pod100 apply -f /tmp/temp821955404
I1221 21:54:23.083833 2743920 utils.go:388] [CMD] /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config -n job1pod100 wait --for=condition=complete --timeout=15m job/batchjobs
E1221 21:55:21.228889 2743920 utils.go:267] "failed to fetch runner group's result" err=<
        failed to invoke /usr/local/bin/kperf rg --kubeconfig=/home/ansonqian/.kube/config result:
         (output: ): signal: killed
 >
I1221 21:55:21.229000 2743920 utils.go:388] [CMD] /usr/local/bin/kperf rg --kubeconfig=/home/ansonqian/.kube/config result
I1221 21:56:03.200705 2743920 utils.go:270] "Runner group's result" data=<
        {
          "total": 1000,
          "duration": "1m40.016519689s",
          "errorStats": {
            "unknownErrors": null,
            "netErrors": null,
            "responseCodes": null,
            "http2Errors": {}
          },
          "totalReceivedBytes": 5648656886,
          "percentileLatencies": [
            [
              0,
              0.024603586
            ],
            [
              0.5,
              0.098262593
            ],
            [
              0.9,
              0.139573687
            ],
            [
              0.95,
              0.162021473
            ],
            [
              0.99,
              0.210638021
            ],
            [
              1,
              0.433002973
            ]
          ],
          "percentileLatenciesByURL": {
            "https://34.118.224.1:443/api/v1/events?limit=1000\u0026timeout=1m0s": [
              [
                0,
                0.024603586
              ],
              [
                0.5,
                0.030703546
              ],
              [
                0.9,
                0.040356291
              ],
              [
                0.95,
                0.048135445
              ],
              [
                0.99,
                0.074674283
              ],
              [
                1,
                0.074674283
              ]
            ],
            "https://34.118.224.1:443/api/v1/pods?limit=1000\u0026timeout=1m0s": [
              [
                0,
                0.092264801
              ],
              [
                0.5,
                0.153941227
              ],
              [
                0.9,
                0.196386019
              ],
              [
                0.95,
                0.271004627
              ],
              [
                0.99,
                0.329401482
              ],
              [
                1,
                0.329401482
              ]
            ],
            "https://34.118.224.1:443/api/v1/pods?resourceVersion=0\u0026timeout=1m0s": [
              [
                0,
                0.05193836
              ],
              [
                0.5,
                0.098406446
              ],
              [
                0.9,
                0.124743188
              ],
              [
                0.95,
                0.136332168
              ],
              [
                0.99,
                0.165001933
              ],
              [
                1,
                0.433002973
              ]
            ]
          }
        }
 >
I1221 21:56:03.201104 2743920 utils.go:277] Deleting runner group
I1221 21:56:03.201175 2743920 utils.go:388] [CMD] /usr/local/bin/kperf rg --kubeconfig=/home/ansonqian/.kube/config delete
E1221 21:56:03.814836 2743920 utils.go:95] "failed to wait" err=<
        failed to invoke /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config -n job1pod100 wait --for=condition=complete --timeout=15m job/batchjobs:
         (output: ): signal: killed
 > job="workload/100pod.job.yaml"
I1221 21:56:03.814928 2743920 utils.go:388] [CMD] /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config -n job1pod100 delete -f /tmp/temp821955404
E1221 21:56:03.814946 2743920 utils.go:100] "failed to delete" err=<
        failed to invoke /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config -n job1pod100 delete -f /tmp/temp821955404:
         (output: ): context canceled
 > job="workload/100pod.job.yaml"
I1221 21:56:08.819143 2743920 utils.go:80] Stop creating job
I1221 21:56:08.819200 2743920 utils.go:69] Cleanup namespace job1pod100
I1221 21:56:08.819273 2743920 utils.go:388] [CMD] /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config delete namespace job1pod100
I1221 21:56:14.520149 2743920 utils.go:388] [CMD] /usr/local/bin/kperf vc nodepool --kubeconfig=/home/ansonqian/.kube/config delete node10job1pod100
I1221 21:56:15.985835 2743920 utils.go:287] Fetching apiserver's cores
I1221 21:56:15.985910 2743920 utils.go:388] [CMD] /usr/local/bin/kubectl --kubeconfig /home/ansonqian/.kube/config cluster-info
E1221 21:56:16.147100 2743920 utils.go:322] "failed to get cores" err="failed to get metrics for ip 35.196.245.108: failed to create a new mount namespace: operation not permitted" ip="35.196.245.108"
{
  "description": "\nEnvironment: 100 virtual nodes managed by kwok-controller,\nWorkload: Deploy 1 job with 3,000 pods repeatedly. The parallelism is 100. The interval is 5s",
  "loadSpec": {
    "count": 1,
    "loadProfile": {
      "version": 1,
      "description": "node10-job1-pod100",
      "spec": {
        "rate": 10,
        "total": 1000,
        "conns": 10,
        "client": 10,
        "contentType": "json",
        "disableHTTP2": false,
        "maxRetries": 0,
        "Requests": [
          {
            "shares": 1000,
            "staleList": {
              "group": "",
              "version": "v1",
              "resource": "pods",
              "namespace": "",
              "limit": 0,
              "seletor": "",
              "fieldSelector": ""
            }
          },
          {
            "shares": 100,
            "quorumList": {
              "group": "",
              "version": "v1",
              "resource": "pods",
              "namespace": "",
              "limit": 1000,
              "seletor": "",
              "fieldSelector": ""
            }
          },
          {
            "shares": 100,
            "quorumList": {
              "group": "",
              "version": "v1",
              "resource": "events",
              "namespace": "",
              "limit": 1000,
              "seletor": "",
              "fieldSelector": ""
            }
          }
        ]
      }
    },
    "nodeAffinity": {
      "node.kubernetes.io/instance-type": [
        "Standard_D16s_v3",
        "m4.4xlarge",
        "n1-standard-16"
      ]
    }
  },
  "result": {
    "total": 1000,
    "duration": "1m40.016519689s",
    "errorStats": {
      "unknownErrors": null,
      "netErrors": null,
      "responseCodes": null,
      "http2Errors": {}
    },
    "totalReceivedBytes": 5648656886,
    "percentileLatencies": [
      [
        0,
        0.024603586
      ],
      [
        0.5,
        0.098262593
      ],
      [
        0.9,
        0.139573687
      ],
      [
        0.95,
        0.162021473
      ],
      [
        0.99,
        0.210638021
      ],
      [
        1,
        0.433002973
      ]
    ],
    "percentileLatenciesByURL": {
      "https://34.118.224.1:443/api/v1/events?limit=1000\u0026timeout=1m0s": [
        [
          0,
          0.024603586
        ],
        [
          0.5,
          0.030703546
        ],
        [
          0.9,
          0.040356291
        ],
        [
          0.95,
          0.048135445
        ],
        [
          0.99,
          0.074674283
        ],
        [
          1,
          0.074674283
        ]
      ],
      "https://34.118.224.1:443/api/v1/pods?limit=1000\u0026timeout=1m0s": [
        [
          0,
          0.092264801
        ],
        [
          0.5,
          0.153941227
        ],
        [
          0.9,
          0.196386019
        ],
        [
          0.95,
          0.271004627
        ],
        [
          0.99,
          0.329401482
        ],
        [
          1,
          0.329401482
        ]
      ],
      "https://34.118.224.1:443/api/v1/pods?resourceVersion=0\u0026timeout=1m0s": [
        [
          0,
          0.05193836
        ],
        [
          0.5,
          0.098406446
        ],
        [
          0.9,
          0.124743188
        ],
        [
          0.95,
          0.136332168
        ],
        [
          0.99,
          0.165001933
        ],
        [
          1,
          0.433002973
        ]
      ]
    }
  },
  "info": {
    "apiserver": {
      "cores": {
        "after": {},
        "before": {}
      }
    }
  }
}

@anson627 merged commit b839e02 into main on Dec 21, 2024
4 checks passed