nil pointer dereference after upgrade from 1.5 to 1.7 #24846

Open
epachirkov opened this issue Jan 13, 2025 · 2 comments · May be fixed by #24893

epachirkov commented Jan 13, 2025

Nomad version

Nomad v1.7.5
BuildDate 2024-02-13T15:10:13Z
Revision 5f5d464

Operating system and Environment details

CPU: i9-13900. OS: Ubuntu 22.04. 3-node Nomad cluster.

Issue

When I try to plan a job, the plan fails with:
failed to process eval: runtime error: invalid memory address or nil pointer dereference
In a test virtual machine (KVM) running the same version of Nomad, all jobs work fine, without errors.

Reproduction steps

nomad job plan resec.hcl

Expected Result

The job plans and runs without errors.

Actual Result

    2025-01-13T06:06:47.267Z [ERROR] nomad.job.service_sched: processing eval panicked scheduler - please report this as a bug!: eval_id=fb6c73ec-e08f-ee71-103c-86391bcfad1f job_id=resec-redis namespace=default eval_id=fb6c73ec-e08f-ee71-103c-86391bcfad1f error="runtime error: invalid memory address or nil pointer dereference"
   stack_trace=
   | goroutine 1085289 [running]:
   | runtime/debug.Stack()
   | \truntime/debug/stack.go:24 +0x5e
   | github.com/hashicorp/nomad/scheduler.(*GenericScheduler).Process.func1()
   | \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:153 +0x58
   | panic({0x2a88140?, 0x4f5ea50?})
   | \truntime/panic.go:914 +0x21f
   | github.com/hashicorp/nomad/nomad/structs.(*Allocation).NextRescheduleTime(0xc0010bfc00)
   | \tgithub.com/hashicorp/nomad/nomad/structs/structs.go:10938 +0x10e
   | github.com/hashicorp/nomad/scheduler.updateByReschedulable(0xc0010bfc00, {0x0?, 0x0?, 0x523b080?}, {0xc0016b8c90, 0x24}, 0x0?, 0x0)
   | \tgithub.com/hashicorp/nomad/scheduler/reconcile_util.go:515 +0x2ae
   | github.com/hashicorp/nomad/scheduler.allocSet.filterByRescheduleable(0xc003443080?, 0xd0?, 0x0, {0xc0027c8e00?, 0x5?, 0x523b080?}, {0xc0016b8c90, 0x24}, 0x5?)
   | \tgithub.com/hashicorp/nomad/scheduler/reconcile_util.go:423 +0x30a
   | github.com/hashicorp/nomad/scheduler.(*allocReconciler).computeGroup(0xc003443080, {0xc0007d4e2b, 0x5}, 0x0?)
   | \tgithub.com/hashicorp/nomad/scheduler/reconcile.go:456 +0x337
   | github.com/hashicorp/nomad/scheduler.(*allocReconciler).computeDeploymentComplete(0xc003443080?, 0xc0055ac080?)
   | \tgithub.com/hashicorp/nomad/scheduler/reconcile.go:253 +0x70
   | github.com/hashicorp/nomad/scheduler.(*allocReconciler).Compute(0xc003443080)
   | \tgithub.com/hashicorp/nomad/scheduler/reconcile.go:244 +0x291
   | github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computeJobAllocs(0xc003442e70)
   | \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:389 +0x365
   | github.com/hashicorp/nomad/scheduler.(*GenericScheduler).process(0xc003442e70)
   | \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:289 +0x49a
   | github.com/hashicorp/nomad/scheduler.retryMax(0x5, 0xc0055ac8d8, 0xc0055ac8c8)
   | \tgithub.com/hashicorp/nomad/scheduler/util.go:96 +0x49
   | github.com/hashicorp/nomad/scheduler.(*GenericScheduler).Process(0xc003442e70, 0xc004262180)
   | \tgithub.com/hashicorp/nomad/scheduler/generic_sched.go:188 +0x55f
   | github.com/hashicorp/nomad/nomad.(*Job).Plan(0xc000bcd680, 0xc003409260, 0xc003d44540)
   | \tgithub.com/hashicorp/nomad/nomad/job_endpoint.go:1926 +0xa7f
   | reflect.Value.call({0xc000dbac00?, 0xc000dbe380?, 0x7ff5887cff28?}, {0x3068196, 0x4}, {0xc003a57538, 0x3, 0x0?})
   | \treflect/value.go:596 +0xce7
   | reflect.Value.Call({0xc000dbac00?, 0xc000dbe380?, 0x4a7675?}, {0xc003a57538?, 0xc003a57588?, 0xa1a187?})
   | \treflect/value.go:380 +0xb9
   | net/rpc.(*service).call(0xc000bbf700, 0x2de3480?, 0x0?, 0x0, 0xc000dafe80, 0x40?, {0x2f328a0?, 0xc003409260?, 0x0?}, {0x2c03020, ...}, ...)
   | \tnet/rpc/server.go:382 +0x211
   | net/rpc.(*Server).ServeRequest(0x1c?, {0x3816038, 0xc0058e7080})
   | \tnet/rpc/server.go:503 +0x165
   | github.com/hashicorp/nomad/nomad.(*Server).RPC(0xc000005500, {0x307bd3a, 0x8}, {0x2f328a0?, 0xc0034091a0}, {0x2c03020?, 0xc003d444d0})
   | \tgithub.com/hashicorp/nomad/nomad/server.go:2019 +0xeb
   | github.com/hashicorp/nomad/command/agent.(*Agent).RPC(0xc0016413b0?, {0x307bd3a?, 0xc0027e4000?}, {0x2f328a0?, 0xc0034091a0?}, {0x2c03020?, 0xc003d444d0?})
   | \tgithub.com/hashicorp/nomad/command/agent/agent.go:1281 +0x11b
   | github.com/hashicorp/nomad/command/agent.(*HTTPServer).jobPlan(0xc0016413b0, {0x3812930, 0xc0034090e0}, 0xc0027e4000, {0xc0016b887d, 0xb})
   | \tgithub.com/hashicorp/nomad/command/agent/job_endpoint.go:189 +0x324
   | github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobSpecificRequest(0x0?, {0x3812930, 0xc0034090e0}, 0xc0027e4000)
   | \tgithub.com/hashicorp/nomad/command/agent/job_endpoint.go:84 +0x405
   | github.com/hashicorp/nomad/command/agent.(*HTTPServer).registerHandlers.(*HTTPServer).wrap.func4({0x3812930, 0xc0034090e0}, 0xc0027e4000)
   | \tgithub.com/hashicorp/nomad/command/agent/http.go:716 +0x168
   | net/http.HandlerFunc.ServeHTTP(0xd2a42c?, {0x3812930?, 0xc0034090e0?}, 0xc006c1bcc0?)
   | \tnet/http/server.go:2136 +0x29
   | net/http.(*ServeMux).ServeHTTP(0x0?, {0x3812930, 0xc0034090e0}, 0xc0027e4000)
   | \tnet/http/server.go:2514 +0x142
   | github.com/hashicorp/nomad/command/agent.NewHTTPServers.CompressHandler.CompressHandlerLevel.func3({0x380b070?, 0xc0018221c0}, 0xc0027e4000)
   | \tgithub.com/gorilla/[email protected]/compress.go:141 +0x547
   | net/http.HandlerFunc.ServeHTTP(0x415345?, {0x380b070?, 0xc0018221c0?}, 0xc001822101?)
   | \tnet/http/server.go:2136 +0x29
   | net/http.serverHandler.ServeHTTP({0x3802518?}, {0x380b070?, 0xc0018221c0?}, 0x6?)
   | \tnet/http/server.go:2938 +0x8e
   | net/http.(*conn).serve(0xc001176630, {0x3815438, 0xc001e0f260})
   | \tnet/http/server.go:2009 +0x5f4
   | created by net/http.(*Server).Serve in goroutine 7697
   | \tnet/http/server.go:3086 +0x5cb
        2025-01-13T06:06:47.267Z [ERROR] http: request failed: method=POST path=/v1/job/resec-redis/plan error="failed to process eval: runtime error: invalid memory address or nil pointer dereference" code=500
tgross added a commit that referenced this issue Jan 17, 2025
When upgrading from older versions of Nomad, the reschedule policy block may be
nil. There is logic to handle this safely in the `NextRescheduleTimeByTime` used
for allocs on disconnected clients, but it's missing from the
`NextRescheduleTime` method used by more typical allocations. Return an empty
time object in this case.

Fixes: #24846
Member

tgross commented Jan 17, 2025

Hi @epachirkov! I took a quick look at this and it seems like we're missing a nil pointer check when checking for the next reschedule time for an allocation that's not on a disconnected client. I've got a draft PR up with a fix here: #24893 (but note this won't be backported to 1.7.x CE, only the 1.9.x series gets backported bug fixes in CE).

tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Jan 17, 2025
tgross self-assigned this Jan 17, 2025
Member

tgross commented Jan 17, 2025

@epachirkov I want to make sure we've root-caused this panic. Can you share the jobspec you were trying to plan? And was it an existing job that you were updating, or a new job? Does the panic happen every time you plan this jobspec on this particular cluster?
