Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect IP conflicts before gateway detection to fix communication fail #4474

Merged
merged 1 commit into from
Dec 31, 2024

Conversation

cyclinder
Copy link
Collaborator

@cyclinder cyclinder commented Dec 31, 2024

Detect IP conflicts before gateway detection to fix gateway detection could potentially update the arp table causing communication fail

Thanks for contributing!

Notice:

What issue(s) does this PR fix:

Fixes #4475

Special notes for your reviewer:

@cyclinder cyclinder added release/bug cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. labels Dec 31, 2024
Copy link

codecov bot commented Dec 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 6.79%. Comparing base (415359e) to head (c7c347a).
Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #4474   +/-   ##
=====================================
  Coverage   6.79%   6.79%           
=====================================
  Files         59      59           
  Lines       7177    7177           
=====================================
  Hits         488     488           
  Misses      6654    6654           
  Partials      35      35           
Flag Coverage Δ
unittests 6.79% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@cyclinder cyclinder changed the title Detect IP conflicts before gateway detection to fix gateway detection… Detect IP conflicts before gateway detection to fix communication fail Dec 31, 2024
@cyclinder cyclinder force-pushed the coordinator/serial_detect branch from 6ec9f62 to eca35ed Compare December 31, 2024 05:43
@@ -189,8 +189,56 @@ func CmdAdd(args *skel.CmdArgs) (err error) {

logger.Sugar().Infof("Get coordinator config: %+v", c)

errg := errgroup.Group{}
// we do detect gateway connection firstly
errgConflict := errgroup.Group{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

补充 code comment,关于 每一步先后循序 原因说明
arp 问题,引用 issue来详细说明

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@cyclinder cyclinder force-pushed the coordinator/serial_detect branch 3 times, most recently from 1c5c3db to 0a14822 Compare December 31, 2024 09:47
if err != nil {
return fmt.Errorf("failed to run NewIPChecker: %w", err)
}
ipc.DoIPConflictChecking(prevResult.IPs, c.currentInterface, &errgConflict)
Copy link
Collaborator

@weizhoublue weizhoublue Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个超时控制 在哪体现的
比如 最多 100ms 内要给出 测试结论

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ipc, err = ipchecking.NewIPChecker(conf.DetectOptions.Retry, conf.DetectOptions.Interval, conf.DetectOptions.TimeOut, c.hostNs, c.netns, logger)

Copy link
Collaborator

@weizhoublue weizhoublue Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) 这默认超时时间多少,在代码上 comment 下,这个变量 不容易 查源头的值
(2) 在 ipchecking.NewIPChecker 前打一行 有关超时 时间 的日志

(3) 缺省值 是 1s ?太长了

func ValidateDelectOptions(config *DetectOptions) (*DetectOptions, error) {
	if config == nil {
		return &DetectOptions{
			Interval: "1s",
			TimeOut:  "3s",
			Retry:    3,
		}, nil
	}

	if config.Retry == 0 {
		config.Retry = 3
	}

	if config.Interval == "" {
		config.Interval = "1s"
	}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些信息会在 final configuration 中打印,类似:

{"level":"DEBUG","ts":"2024-12-31T09:41:43.783Z","logger":"coordinator","caller":"cmd/command_add.go:84","msg":"final configuration","Action":"ADD","ContainerID":"17dc097c71526a6ab4ac53c07e48cc43e30d9b13bcac603ac2891eb23921dd2a","Netns":"/var/run/netns/cni-************************************","IfName":"net1","PodName":"dep-name-4ef71f347c-5496b8784b-r9s2v","PodNamespace":"ns-7dff59fa8a","conf":{"cniVersion":"0.3.1","name":"test-multus-6b86b94379","type":"coordinator","ipam":{},"dns":{},"detectGateway":false,"podDefaultCniNic":"eth0","overlayPodCIDR":["***********/18","fd00:10:243::/112"],"serviceCIDR":["**********/18","fd00:10:233::/116"],"hijackCIDR":["***********/16"],"tunePodRoutes":true,"mode":"overlay","hostRuleTable":500,"podRPFilter":0,"txQueueLen":0,"detectIPConflict":true,"detectOptions":{"retries":3,"interval":"1s","timeout":"3s"},"logOptions":{"logLevel":"debug","logFile":"/var/log/spidernet/coordinator.log","logMaxSize":0,"logMaxAge":0,"logMaxCount":0}}}

缺省值 是 1s ?太长了

多少合适?100ms?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100ms 足够够了,并且最好要 实测下

if err = errg.Wait(); err != nil {
logger.Error("failed to detect gateway and ip checking", zap.Error(err))
if errors.Is(err, constant.ErrIPConflict) || errors.Is(err, constant.ErrGatewayUnreachable) {
if err = errgGateway.Wait(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同理 检测 网关的 超时控制,在哪

@cyclinder cyclinder force-pushed the coordinator/serial_detect branch from 0a14822 to 0c49bf6 Compare December 31, 2024 12:00
… could potentially update the arp table causing communication fail

Signed-off-by: Cyclinder Kuo <[email protected]>
@cyclinder cyclinder force-pushed the coordinator/serial_detect branch from 0c49bf6 to c7c347a Compare December 31, 2024 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. release/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coordinator: Parallel execution of gateway and IP conflicts may cause arp table entry anomalies
2 participants