Skip to content
This repository has been archived by the owner on Apr 2, 2019. It is now read-only.

running an infinite loop gave a malloc deadlock panic #26

Open
ilackarms opened this issue Feb 4, 2016 · 10 comments
Open

running an infinite loop gave a malloc deadlock panic #26

ilackarms opened this issue Feb 4, 2016 · 10 comments
Labels

Comments

@ilackarms
Copy link
Contributor

unexpected error occurred when running a test app with gorump (was observed with both xen & qemu):

application code:

package main

import (
    "fmt"
    "net/http"
)

var someText string
var count int

func main() {
    go func(){
    fmt.Println("this is a goroutine")
    for i := 0; i<10; i++ {
      fmt.Printf("%v\n", i)
    }
  }()
    http.HandleFunc("/", handler)
    go http.ListenAndServe(":3000", nil)
  for {
    someText = fmt.Sprintf("count: %v\n", count)
    fmt.Printf("count: %v\n", count)
    count++
  }
}

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, `
    <html><head>
    <title>foobar!</title>
    <body>
    <center>
    <h3>Goroutine is running: %s</h3>
    </center>
    </body>
    </html>
    `, someText)
}

console output:

Xen Minimal OS!
  start_info: 0xd00000(VA)
    nr_pages: 0x6a400
  shared_inf: 0x7de6d000(MA)
     pt_base: 0xd03000(VA)
nr_pt_frames: 0xb
    mfn_list: 0x9ae000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: root=/dev/sda1 ro 4
  stack:      0x96d840-0x98d840
MM: Init
      _text: 0x0(VA)
     _etext: 0x7dc7d(VA)
   _erodata: 0x9a000(VA)
     _edata: 0x9fce0(VA)
  Booting 'rumprun-program_xen.bin.ec2dir'

root (hd0)
 Filesystem type is ext2fs, using whole disk
kernel /boot/program_xen.bin _RUMPRUN_ROOTFSCFG=/json.cfg

============= Init TPM Front ================
Tpmfront:Error Unable to read device/vtpm/0/backend-id during tpmfront initialization! error = ENOENT
Tpmfront:Info Shutting down tpmfront
close blk: backend=/local/domain/0/backend/vbd/24517/2049 node=device/vbd/2049
close blk: backend=/local/domain/0/backend/vbd/24517/2051 node=device/vbd/2051
  start_info: 0xc6c000(VA)
    nr_pages: 0x6a400
  shared_inf: 0x7de6d000(MA)
     pt_base: 0xc6f000(VA)
nr_pt_frames: 0xb
    mfn_list: 0x91a000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: _RUMPRUN_ROOTFSCFG=/json.cfg
  stack:      0x8830c0-0x9030c0
MM: Init
      _text: 0x0(VA)
     _etext: 0x3eecba(VA)
   _erodata: 0x6c3000(VA)
     _edata: 0x6d42ec(VA)
stack start: 0x8830c0(VA)
       _end: 0x9139a8(VA)
  start_pfn: c7d
    max_pfn: 6a400
Mapping memory range 0xc7d000 - 0x6a400000
setting 0x0-0x6c3000 readonly
skipped 0x1000
MM: Initialise page allocator for fc8000(fc8000)-6a400000(6a400000)
MM: done
Demand map pfns at 6a401000-206a401000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x6a401000.
xenbus initialised on irq 1 mfn 0x104c5f6
Failed to read device/pci/0/backend-id.
no backend
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 7.99.21 (RUMP-ROAST)
total memory = 842 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
cpu0 at thinair0: rump virtual cpu
root file system type: rumpfs
kern.module.path=/stand/amd64/7.99.21/modules
mainbus0 (root)
timecounter: Timecounter "bmktc" frequency 1000000000 Hz quality 100
mounted tmpfs on /tmp
rumprun: call to ``_sys___sigprocmask14'' ignored
blkfront: node=device/vbd/2049 backend=/local/domain/0/backend/vbd/24517/2049
Failed to read /local/domain/0/backend/vbd/24517/2049/feature-barrier.
Failed to read /local/domain/0/backend/vbd/24517/2049/feature-flush-cache.
blkfront: 2097152 sectors
blkfront detached: node=device/vbd/2049
blkfront: node=device/vbd/2049 backend=/local/domain/0/backend/vbd/24517/2049
Failed to read /local/domain/0/backend/vbd/24517/2049/feature-barrier.
Failed to read /local/domain/0/backend/vbd/24517/2049/feature-flush-cache.
blkfront: 2097152 sectors
blkfront detached: node=device/vbd/2049
blkfront: node=device/vbd/2049 backend=/local/domain/0/backend/vbd/24517/2049
Failed to read /local/domain/0/backend/vbd/24517/2049/feature-barrier.
Failed to read /local/domain/0/backend/vbd/24517/2049/feature-flush-cache.
blkfront: 2097152 sectors
/dev/sda1: hostpath XENBLK_sda1 (1024 MB)
net TX ring size 256
net RX ring size 256
netfront: node=device/vif/0 backend=/local/domain/0/backend/vif/24517/0
netfront: MAC 12:bc:82:98:e6:51
xenif0: Ethernet address 12:bc:82:98:e6:51
dhcp: xenif0: adding IP address 172.31.52.88/20
dhcp: xenif0: adding route to 172.31.48.0/20
dhcp: xenif0: adding default route via 172.31.48.1
lease time: 3600 seconds

=== calling "program_xen.bin" main() ===

this is a goroutine
0
1
2
3
4
5
6
7
8
9
count: 0
count: 1
count: 2
count: 3
count: 4
count: 5
count: 6
count: 7
count: 8
count: 9
count: 10
count: 11
panic: fatal error: malloc deadlock

runtime stack:
runtime.throw(0x4e2bf0, 0xf)
    /usr/local/go/src/runtime/panic.go:527 +0x90
runtime.mallocgc(0x21, 0x0, 0x3, 0x37d71f)
    /usr/local/go/src/runtime/malloc.go:516 +0x15e
runtime.rawstring(0x21, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:264 +0x70
runtime.rawstringtmp(0x0, 0x21, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:107 +0xb7
runtime.concatstrings(0x0, 0x68040af8, 0x2, 0x2, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:48 +0x1c2
runtime.concatstring2(0x0, 0x4e43d0, 0xf, 0x50e490, 0x12, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:58 +0x67
runtime.(*errorString).Error(0x182c030, 0x0, 0x0)
    <autogenerated>:2 +0xde
runtime.printany(0x4804a0, 0x182c030)
    /usr/local/go/src/runtime/error.go:75 +0x1b1
runtime.gopanic(0x4804a0, 0x182c030)
    /usr/local/go/src/runtime/panic.go:352 +0x7a
runtime.panicindex()
    /usr/local/go/src/runtime/panic.go:12 +0x49
runtime.mHeap_Grow(0x781740, 0x10, 0x0)
    /usr/local/go/src/runtime/mheap.go:647 +0x26a
runtime.mHeap_AllocSpanLocked(0x781740, 0xb, 0x1686b50)
    /usr/local/go/src/runtime/mheap.go:532 +0x5de
runtime.mHeap_Alloc_m(0x781740, 0xb, 0x100000000, 0x1686b50)
    /usr/local/go/src/runtime/mheap.go:425 +0x1ac
runtime.mHeap_Alloc.func1()
    /usr/local/go/src/runtime/mheap.go:484 +0x41
runtime.systemstack(0x68040df8)
    /usr/local/go/src/runtime/asm_amd64.s:278 +0xab
runtime.mHeap_Alloc(0x781740, 0xb, 0x100000000, 0x1)
    /usr/local/go/src/runtime/mheap.go:485 +0x63
runtime.largeAlloc(0x15858, 0x3, 0x1686b50)
    /usr/local/go/src/runtime/malloc.go:748 +0xb3
runtime.mallocgc.func3()
    /usr/local/go/src/runtime/malloc.go:637 +0x33
runtime.systemstack(0x68040ea8)
    /usr/local/go/src/runtime/asm_amd64.s:262 +0x79
runtime.mstart()
    /usr/local/go/src/runtime/proc1.go:674

goroutine 17 [running, locked to thread]:
runtime.systemstack_switch()
    /usr/local/go/src/runtime/asm_amd64.s:216 fp=0x185fb40 sp=0x185fb38
runtime.mallocgc(0x15858, 0x0, 0x3, 0x0)
    /usr/local/go/src/runtime/malloc.go:638 +0x9c4 fp=0x185fc10 sp=0x185fb40
runtime.rawstring(0x15858, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:264 +0x70 fp=0x185fc58 sp=0x185fc10
runtime.rawstringtmp(0x0, 0x15858, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:107 +0xb7 fp=0x185fc90 sp=0x185fc58
runtime.concatstrings(0x0, 0x185fe20, 0x2, 0x2, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:48 +0x1c2 fp=0x185fdc8 sp=0x185fc90
runtime.concatstring2(0x0, 0x1a02000, 0x1584e, 0x182dbd0, 0xa, 0x0, 0x0)
    /usr/local/go/src/runtime/string.go:58 +0x67 fp=0x185fe18 sp=0x185fdc8
main.main()
    /opt/code/exampleapp.go:34 +0x1d4 fp=0x185fec8 sp=0x185fe18
main.gomaincaller()
    /opt/code/gomaincaller.go:7 +0x14 fp=0x185fed0 sp=0x185fec8
runtime.call32(0x0, 0x68040ed8, 0x68040f6f, 0x0)
    /usr/local/go/src/runtime/asm_amd64.s:437 +0x3e fp=0x185fef8 sp=0x185fed0
runtime.cgocallbackg1()
    /usr/local/go/src/runtime/cgocall.go:252 +0x10c fp=0x185ff30 sp=0x185fef8
runtime.cgocallbackg()
    /usr/local/go/src/runtime/cgocall.go:177 +0xd7 fp=0x185ff90 sp=0x185ff30
runtime.cgocallback_gofunc(0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/asm_amd64.s:801 +0x60 fp=0x185ffa0 sp=0x185ff90
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0x185ffa8 sp=0x185ffa0

goroutine 9 [IO wait]:
net.runtime_pollWait(0x1785580, 0x72, 0x182c140)
    /usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0x18661b0, 0x72, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0x18661b0, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0x1866150, 0x182f000, 0x1000, 0x1000, 0x0, 0x1780050, 0x182c140)
    /usr/local/go/src/net/fd_unix.go:232 +0x23a
net.(*conn).Read(0x183e040, 0x182f000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:172 +0xe4
net/http.noteEOFReader.Read(0x17856d8, 0x183e040, 0x18b6108, 0x182f000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:1370 +0x67
net/http.(*noteEOFReader).Read(0x18625a0, 0x182f000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    <autogenerated>:126 +0xd0
bufio.(*Reader).fill(0x1830480)
    /usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).Peek(0x1830480, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:132 +0xcc
net/http.(*persistConn).readLoop(0x18b60b0)
    /usr/local/go/src/net/http/transport.go:876 +0xf7
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:685 +0xc78

goroutine 18 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 6 [IO wait]:
net.runtime_pollWait(0x17854c0, 0x72, 0x182c140)
    /usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0x1866220, 0x72, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0x1866220, 0x0, 0x0)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).accept(0x18661c0, 0x0, 0x17856b0, 0x1862540)
    /usr/local/go/src/net/fd_unix.go:408 +0x27c
net.(*TCPListener).AcceptTCP(0x183e038, 0x184ede0, 0x0, 0x0)
    /usr/local/go/src/net/tcpsock_posix.go:254 +0x4d
net/http.tcpKeepAliveListener.Accept(0x183e038, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:2135 +0x41
net/http.(*Server).Serve(0x1830420, 0x1785678, 0x183e038, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:1887 +0xb3
net/http.(*Server).ListenAndServe(0x1830420, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:1877 +0x136
net/http.ListenAndServe(0x4d6f90, 0x5, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:1967 +0x8f
created by main.main
    /opt/code/exampleapp.go:32 +0xa1

goroutine 10 [select]:
net/http.(*persistConn).writeLoop(0x18b60b0)
    /usr/local/go/src/net/http/transport.go:1009 +0x40c
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:686 +0xc9d

=== ERROR: _exit(2) called ===
panic: releasing non-pertinent lwp
rump kernel halting...
PANIC: rumpuser panic
port 4 still bound!
port 5 still bound!
minios: halting, reason=0
@eyberg eyberg added the bug label Feb 4, 2016
@anttikantee
Copy link
Contributor

So what, it manages only 11 goroutines with gigs of memory? I wonder if something is overallocating memory for each goroutine.

@ilackarms
Copy link
Contributor Author

there's only two goroutines in this code; one that runs the http server, one that does a for 0->9 for loop. the crash happens during the infinite for loop at the end which is not run in a goroutine

@eyberg
Copy link
Member

eyberg commented Feb 5, 2016

I haven't looked at this at all yet - I'll try to find some time this weekend - 2 thoughts though:

  1. go routines are extremely cheap - 8k/per

  2. this code is racy and count is global - I'm sure this exposes an underlying problem that needs to be fixed - that's not a question to me, but does it go away if count is protected w/a mutex ? go's memory model is such that all bets are off if you have racy code

@ilackarms
Copy link
Contributor Author

i will try trimming down the code to the bare minimum that reproduces this error and post results

@eyberg
Copy link
Member

eyberg commented Feb 5, 2016

btw - if either of you think there is a memory allocation issue which there could be - it'd be useful to modify that print statement to show the growth on each iteration

@anttikantee
Copy link
Contributor

Ah. I don't speak go, so i was just guessing what the code does without reading it carefully ;-)

What's the behaviour of the code if run on a regular OS? Ian's theory sounds a bit sci-fi ;-) (but then again, I really don't know go)

@eyberg
Copy link
Member

eyberg commented Feb 7, 2016

took a very brief look at this - a few things:

  1. in your example you weren't doing export/linking ?

  2. I didn't see anys substantial mem growth - was logging that out - w/your count I went up to around 31,000

  3. explicit calls to time.Sleep seemed to totally block everything else which is another issue - there's still quite a bit left un-implemented

@anttikantee
Copy link
Contributor

re 3: yea I don't know about sleep without digging into it. I'm not sure why it blocks everything else, but I have a hard time imagining how the wakeup works if go handles sleep internally.

@eyberg eyberg mentioned this issue Mar 7, 2016
@yuval-k
Copy link
Contributor

yuval-k commented Jun 20, 2016

this was fixed in upstream go.
see:
golang/go#13143

Specifically, these are the relevant commits:

@eyberg
Copy link
Member

eyberg commented Jun 20, 2016

we'd definitely like to track upstream closer - if you want to pull down latest and attempt to merge that'd be great! better yet - would be getting to 1.6 or 1.7!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants