weizhoublue committed Dec 14, 2023
1 parent 5f1c3c6 commit 7a38732
Showing 8 changed files with 23 additions and 48 deletions.
2 changes: 1 addition & 1 deletion README-zh_CN.md
@@ -60,7 +60,7 @@ underlay CNI mainly refers to macvlan, ipvlan, SR-IOV and other CNIs that can directly access the host

* eBPF 增强

The kube-proxy replacement technology greatly accelerates access to Services, and the same-node socket short-circuit technology speeds up communication between local Pods. Compared with the kube-proxy resolution approach, [network latency improves by up to 25% and network throughput increases by 50%]((./docs/concepts/io-performance-zh_CN.md))
The kube-proxy replacement technology greatly accelerates access to Services, and the same-node socket short-circuit technology speeds up communication between local Pods. Compared with the kube-proxy resolution approach, [network latency improves by up to 25% and network throughput increases by 50%](./docs/concepts/io-performance-zh_CN.md)

* RDMA

2 changes: 2 additions & 0 deletions docs/usage/rdma-ib-zh_CN.md
@@ -14,6 +14,8 @@ Spiderpool enables [IB-SRIOV](https://github.com/k8snetworkplumbingwg/ib-sriov

2. Based on [IPoIB CNI](https://github.com/Mellanox/ipoib-cni), provide Pods with an IPoIB NIC. It does not provide RDMA NIC communication capability and is suitable for conventional applications that require TCP/IP communication; since it does not need an SR-IOV NIC, more Pods can run on the host.

Moreover, in RDMA communication scenarios, for applications that communicate via clusterIP, clusterIP resolution can be implemented with cgroup eBPF inside the container network namespace so that RDMA traffic is forwarded through the underlay NIC. For details, see [Resolving clusterIP with cgroup eBPF](./underlay_cni_service-zh_CN.md)

### Providing RDMA NICs based on IB-SRIOV

The following steps demonstrate, on a cluster with 2 nodes, how to use [IB-SRIOV](https://github.com/k8snetworkplumbingwg/ib-sriov-cni) to give Pods an SR-IOV NIC and provide RDMA devices with network namespace isolation:
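The detailed steps are collapsed in this diff view. As a minimal, hedged sketch of the host-side prerequisite this isolation model usually relies on (the "exclusive" netns mode is an assumption inferred from the isolation goal, not taken from this commit):

```bash
# Check the RDMA subsystem's network namespace mode on each node
rdma system show

# Namespace-isolated RDMA devices typically require "exclusive" mode
# (assumption; apply only if the full guide calls for it)
rdma system set netns exclusive
```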
2 changes: 2 additions & 0 deletions docs/usage/rdma-ib.md
@@ -14,6 +14,8 @@ Different from RoCE, Infiniband network cards are proprietary devices based on I

2. [IPoIB CNI](https://github.com/mellanox/ipoib-cni) provides an IPoIB network card for the Pod, without an RDMA device. It is suitable for conventional applications that require TCP/IP communication; as it does not require an SR-IOV network card, it allows more Pods to run on the host.

Moreover, in RDMA communication scenarios, for applications that communicate via clusterIP, clusterIP resolution can be implemented with cgroup eBPF inside the container network namespace so that RDMA traffic is forwarded through the underlay network card. For details, please refer to [Resolving clusterIP with cgroup eBPF](./underlay_cni_service.md)

### RDMA network card based on IB-SRIOV

The following steps demonstrate how to use [IB-SRIOV](https://github.com/k8snetworkplumbingwg/ib-sriov-cni) on a cluster with 2 nodes. It enables Pods to own an SR-IOV network card and RDMA devices with network namespace isolation:
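The concrete steps are collapsed in this diff. As a rough sanity check once a Pod is attached (the Pod name is illustrative, and `ibv_devinfo` assumes the libibverbs utilities are present in the image):

```bash
# Only the VF's RDMA device should be visible inside the Pod's namespace
kubectl exec -it rdma-pod-0 -- rdma link show

# Verbs-level view of the isolated RDMA device
kubectl exec -it rdma-pod-0 -- ibv_devinfo
```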
2 changes: 2 additions & 0 deletions docs/usage/rdma-roce-zh_CN.md
@@ -15,6 +15,8 @@ RDMA NICs, or use NICs in exclusive mode based on the SR-IOV CNI.

In exclusive mode, Spiderpool uses the [SR-IOV CNI](https://github.com/k8snetworkplumbingwg/sriov-network-operator) to expose the host's RDMA NICs to Pods and expose RDMA resources, and uses the [RDMA CNI](https://github.com/k8snetworkplumbingwg/rdma-cni) to isolate the RDMA devices.

Moreover, in RDMA communication scenarios, for applications that communicate via clusterIP, clusterIP resolution can be implemented with cgroup eBPF inside the container network namespace so that RDMA traffic is forwarded through the underlay NIC. For details, see [Resolving clusterIP with cgroup eBPF](./underlay_cni_service-zh_CN.md)

### Shared use of RDMA RoCE NICs based on macvlan or ipvlan

The following steps demonstrate, on a cluster with 2 nodes, how Pods can share RDMA devices based on the macvlan CNI:
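The step-by-step instructions are collapsed here. For orientation, a minimal sketch of the kind of macvlan attachment such a setup uses; the attachment name, the master interface, and the `spiderpool` IPAM type are assumptions, and Spiderpool normally generates this configuration through its own CRs as described in the full guide:

```bash
# The master NIC name and the "spiderpool" IPAM type below are assumptions.
cat <<'EOF' | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-rdma
  namespace: kube-system
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens6f0np0",
      "mode": "bridge",
      "ipam": { "type": "spiderpool" }
    }
EOF
```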
2 changes: 2 additions & 0 deletions docs/usage/rdma-roce.md
@@ -14,6 +14,8 @@ In shared mode, Spiderpool leverages macvlan or ipvlan CNI to expose RoCE networ

In exclusive mode, Spiderpool utilizes [SR-IOV CNI](https://github.com/k8snetworkplumbingwg/sriov-network-operator) to expose RDMA cards on the host machine for Pods, providing access to RDMA resources. [RDMA CNI](https://github.com/k8snetworkplumbingwg/rdma-cni) is used to ensure isolation of RDMA devices.

Moreover, in RDMA communication scenarios, for applications that communicate via clusterIP, clusterIP resolution can be implemented with cgroup eBPF inside the container network namespace so that RDMA traffic is forwarded through the underlay network card. For details, please refer to [Resolving clusterIP with cgroup eBPF](./underlay_cni_service.md)

### Shared usage of RoCE-capable NIC with macvlan or ipvlan

The following steps demonstrate how to enable shared usage of RDMA devices by Pods in a cluster with two nodes via macvlan CNI:
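The full walkthrough is collapsed in this diff. As a hedged end-to-end sanity check between two Pods sharing the RoCE NIC (the Pod names are placeholders, and the perftest tools are assumed to be present in the image):

```bash
# Terminal 1: start the bandwidth test server in the first Pod
kubectl exec -it rdma-pod-0 -- ib_write_bw

# Terminal 2: run the client in the second Pod against the first Pod's IP
kubectl exec -it rdma-pod-1 -- ib_write_bw <POD0_IP>
```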
12 changes: 7 additions & 5 deletions docs/usage/underlay_cni_service-zh_CN.md
@@ -7,12 +7,12 @@
Currently, most Underlay-type CNIs in the community (such as Macvlan, IPVlan, SR-IOV CNI, etc.) connect to the underlying network and usually do not natively support access to cluster Services. This is mostly because traffic from an underlay Pod to a Service must be forwarded through the switch gateway,
but the gateway has no route to the Service, so packets destined for the Service cannot be routed correctly and are dropped. Spiderpool provides the following two solutions to the problem of Underlay CNI access to Services:

- Underlay CNI access to Services via `Spiderpool coordinator` + `kube-proxy`
- Underlay CNI access to Services via `Cilium Without Kube-proxy`
- Underlay CNI access to Services via `kube-proxy`
- Underlay CNI access to Services via `kube-proxy replacement with cgroup eBPF`

Both solutions address the problem that Underlay CNIs cannot access Services, but their implementation principles differ somewhat. The two approaches are introduced below:

## Based on Spiderpool coordinator + kube-proxy
## Service access based on kube-proxy

Spiderpool ships a built-in `coordinator` plugin that seamlessly integrates with `kube-proxy` so that Underlay CNIs can access Services. Depending on the scenario, the `coordinator` can run in `underlay` or `overlay` mode. Although the implementations differ slightly, the
core principle is the same: traffic from Pods to Services is hijacked onto the host network stack and then forwarded by the iptables rules created by kube-proxy.
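For orientation, a small hedged sketch of how to see the kube-proxy rules that take over once the traffic reaches the host (the chain name is the standard kube-proxy one; the Service IP is a placeholder):

```bash
# On the node hosting the Pod: list the Service NAT rules programmed by kube-proxy
iptables -t nat -L KUBE-SERVICES -n | head

# Optionally confirm the DNAT'ed connection for a given ClusterIP
conntrack -L -d <SERVICE_CLUSTER_IP>
```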
@@ -94,7 +94,7 @@ default via 10.6.0.1 dev net1

These policy routes ensure that Underlay Pods can still access Services normally in multi-NIC scenarios.
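A minimal sketch for inspecting that routing state from inside a Pod (the Pod name is illustrative; the exact rules and tables depend on the coordinator configuration):

```bash
kubectl exec -it underlay-pod-0 -- ip rule
kubectl exec -it underlay-pod-0 -- ip route show table main
```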

## Underlay CNI access to Services via Cilium Without Kube-proxy
## Service access based on cgroup eBPF

Above we described how, in Spiderpool, the `coordinator` hijacks traffic from Pods to Services and forwards it to the host, where the iptables rules set up by kube-proxy perform DNAT (rewriting the destination address to the target Pod) before the traffic is forwarded on to the target Pod.
This solves the problem, but it may lengthen the data path and cause some performance loss.
@@ -105,6 +105,8 @@ default via 10.6.0.1 dev net1

![cilium_kube_proxy](../images/withou_kube_proxy.png)

In testing, compared with the kube-proxy approach, the cgroup eBPF approach shows [up to a 25% improvement in network latency and a 50% increase in network throughput](./docs/concepts/io-performance-zh_CN.md)

The following steps demonstrate, on a cluster with 2 nodes, how to accelerate Service access based on Macvlan CNI + Cilium:

> Note: make sure the kernel version of the cluster nodes is at least 4.19
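A quick, hedged way to check this across the cluster (the custom-columns fields are standard node status fields):

```bash
kubectl get nodes -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
```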
@@ -294,4 +296,4 @@

## Conclusion

The two solutions above address Underlay CNI access to Services. The kube-proxy approach is more common and stable and works reliably in most environments. Cilium Without Kube-Proxy provides an alternative for Underlay CNI access to Services and accelerates Service access; although it comes with some usage restrictions and a higher barrier to entry, it can meet users' needs in specific scenarios.
The two solutions above address Underlay CNI access to Services. The kube-proxy approach is more common and stable and works reliably in most environments. cgroup eBPF provides an alternative for Underlay CNI access to Services and accelerates Service access; although it comes with some usage restrictions and a higher barrier to entry, it can meet users' needs in specific scenarios.
12 changes: 7 additions & 5 deletions docs/usage/underlay_cni_service.md
@@ -12,15 +12,15 @@ correctly, resulting in packet loss.

Spiderpool provides the following two solutions to the problem of Underlay CNI accessing Service:

- Underlay CNI accesses Service via `Spiderpool coordinator` + `kube-proxy`
- Underlay CNI accesses Service via `Cilium Without Kube-proxy`
- Underlay CNI accesses Service via `kube-proxy`
- Underlay CNI accesses Service via `kube-proxy replacement with cgroup eBPF`

Both approaches solve the problem that Underlay CNI cannot access Service, but their implementation
principles differ somewhat.

Below we introduce these two approaches:

## Underlay CNI access Service via `Spiderpool coordinator` + `kube-proxy`
## Access Service via kube-proxy

Spiderpool has a built-in plugin called `coordinator`, which helps us seamlessly integrate with `kube-proxy` to achieve Underlay CNI access to Service.
Depending on different scenarios, the `coordinator` can run in either `underlay` or `overlay` mode. Although the implementation methods are slightly different,
@@ -107,7 +107,7 @@ default via 10.6.0.1 dev net1

These policy routes ensure that Underlay Pods can also normally access Service in multi-network card scenarios.
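As a hedged end-to-end check from inside an underlay Pod (the Pod name, Service name, ClusterIP, and port are placeholders; `nslookup` and `curl` are assumed to be available in the image):

```bash
kubectl exec -it underlay-pod-0 -- nslookup my-service.default.svc.cluster.local
kubectl exec -it underlay-pod-0 -- curl -sI http://<SERVICE_CLUSTER_IP>:80
```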

## Accessing Service with Cilium Without Kube-proxy for Underlay CNI
## Access Service via cgroup eBPF

In Spiderpool, the `coordinator` hijacks the traffic of Pods accessing Services and forwards it to the host, where the iptables rules set up by the host's kube-proxy take over.
This solves the problem, but it may lengthen the data access path and cause some performance loss.
@@ -119,6 +119,8 @@ under the Underlay CNI through it.

![cilium_kube_proxy](../images/withou_kube_proxy.png)

In testing, compared with the kube-proxy approach, the cgroup eBPF solution shows [an improvement of up to 25% in network latency and up to 50% in network throughput](./docs/concepts/io-performance.md).

The following steps demonstrate how to accelerate access to a Service on a cluster with 2 nodes based on Macvlan CNI + Cilium:

> NOTE: Please ensure that the kernel version of the cluster nodes is at least 4.19
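For orientation, a minimal sketch of installing Cilium with kube-proxy replacement via Helm. The chart values shown are standard Cilium Helm options, but the API server address, port, and any Macvlan chaining settings required by the full guide are assumptions:

```bash
helm repo add cilium https://helm.cilium.io/

# On some older Cilium releases the value is "strict" rather than "true" (assumption about your version)
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=<API_SERVER_IP> \
  --set k8sServicePort=6443
```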
@@ -299,4 +301,4 @@ According to the results, after Cilium kube-proxy replacement, access to the ser

## Conclusion

There are two solutions for Underlay CNI access to Service. The kube-proxy method is more commonly used and stable, and can be used reliably in most environments. Cilium Without Kube-Proxy provides an alternative option for Underlay CNI to access the Service and accelerates Service access. Although there are certain restrictions and thresholds for use, it can meet the needs of users in specific scenarios.
There are two solutions for Underlay CNI access to Service. The kube-proxy method is more commonly used and stable, and can be used reliably in most environments. cgroup eBPF provides an alternative option for Underlay CNI to access the Service and accelerates Service access. Although there are certain restrictions and thresholds for use, it can meet the needs of users in specific scenarios.
37 changes: 0 additions & 37 deletions images/spiderpool-plugins/entrypoint.sh
@@ -39,43 +39,6 @@ INSTALL_IB_SRIOV_PLUGIN=${INSTALL_IB_SRIOV_PLUGIN:-false}
INSTALL_IPOIB_PLUGIN=${INSTALL_IPOIB_PLUGIN:-false}
INSTALL_CNI_PLUGINS=${INSTALL_CNI_PLUGINS:-false}

# Parse parameters given as arguments to this script.
if [ -n "$1" ]; then
while [ "$1" != "" ]; do
PARAM=`echo $1 | awk -F= '{print $1}'`
VALUE=`echo $1 | awk -F= '{print $2}'`
case $PARAM in
-h | --help)
usage
exit
;;
--install-cni)
INSTALL_CNI_PLUGINS=$VALUE
;;
--install-ovs)
INSTALL_OVS_PLUGIN=$VALUE
;;
--install-rdma)
INSTALL_RDMA_PLUGIN=$VALUE
;;
--install-ib-sriov)
INSTALL_IB_SRIOV_PLUGIN=$VALUE
;;
--install-ipoib)
INSTALL_IPOIB_PLUGIN=$VALUE
;;
--copy-dst-dir)
COPY_DST_DIR=$VALUE
;;
*)
warn "unknown parameter \"$PARAM\""
;;
esac
shift
done
fi


mkdir -p ${COPY_DST_DIR} || true

if [ "$INSTALL_CNI_PLUGINS" = "true" ]; then
