'THCudaCheck FAIL' Using Cuda7.5 Docker Image #1

spadavec · 2016-07-07T03:04:14Z

After installing the NVIDIA Docker image and loading the Torch RNN Docker image via:

nvidia-docker run --rm -ti crisbal/torch-rnn:cuda7.5 bash

and preprocessing via

root@3da15ad69af8:~/torch-rnn# python scripts/preprocess.py --input_txt data/library.txt --output_h5 data/library.h5 --output_json data/library.json

Attempting to train the system results in the following:

root@3da15ad69af8:~/torch-rnn# th train.lua -input_h5 data/library.h5 -input_json data/library.json
Running with CUDA on GPU 0
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9234/cutorch/lib/THC/THCGeneral.c line=608 error=8 : invalid device function
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
./LSTM.lua:128: cuda runtime error (8) : invalid device function at /tmp/luarocks_cutorch-scm-1-9234/cutorch/lib/THC/THCGeneral.c:608
stack traceback:
[C]: in function 'resize'
./LSTM.lua:128: in function <./LSTM.lua:118>
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:130: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
train.lua:187: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:130: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
train.lua:187: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

crisbal · 2016-07-08T06:45:23Z

I think this is an issue that one needs to report to the main torch-rnn repo (https://github.com/jcjohnson/torch-rnn) and not on this one.

First of all, are you sure you are running a CUDA-capable video card?
If so, let's try something: what happens if you run nvidia-smi inside the container?
Does it show any relevant info?
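
As a rough illustration of the check suggested above, here is a minimal Python 3 sketch that asks nvidia-smi which GPUs it can see. The helper names (`parse_gpu_list`, `visible_gpus`) are made up for this example, and the parser assumes the usual `GPU <index>: <name> (UUID: ...)` line format printed by `nvidia-smi -L`; other driver versions may print something different.

```python
import re
import shutil
import subprocess

# Assumed line format of `nvidia-smi -L`:
#   GPU 0: GeForce GTX 1080 (UUID: GPU-...)
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: [^)]*\)\s*$")


def parse_gpu_list(text):
    """Return [(index, name), ...] parsed from `nvidia-smi -L` style output."""
    gpus = []
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def visible_gpus():
    """Return the GPUs nvidia-smi reports, or None if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_gpu_list(result.stdout)
```

If `visible_gpus()` returns `None` or an empty list inside the container, the GPU is probably not being passed through at all; if it returns the card, the problem lies further up the stack.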

spadavec · 2016-07-11T19:18:45Z

@crisbal thanks for the heads up--I will post this to the torch-rnn repo instead. For what it's worth, I do have a GPU installed:

root@9be35619d034:~/torch-rnn# nvidia-smi
Mon Jul 11 19:17:26 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
| 28%   41C    P8     7W / 180W |    725MiB /  8113MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

crisbal · 2016-07-11T19:21:41Z

Let me know if in the end it is my fault or theirs :)

One random thought I had: since you have a 1080, maybe it needs a newer version of CUDA that is not yet well supported by either nvidia-docker or torch.

spadavec · 2016-07-12T04:50:44Z

@crisbal it looks like the issue is that a newer version of CUDA is needed:

jcjohnson/torch-rnn#122

Did you have any plans to make a CUDA8 version of the docker? Thanks for all the work you've done!
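
To make the diagnosis above concrete: "invalid device function" usually means the compiled CUDA code contains no kernel for the GPU's architecture. The GTX 1080 is a Pascal card (compute capability 6.1), and CUDA 8.0 was the first toolkit able to target it. Below is a minimal sketch of that compatibility check. The table is deliberately partial and illustrative, and the names `MIN_TOOLKIT` and `toolkit_supports` are hypothetical, not part of any CUDA or Torch API.

```python
# Partial, illustrative table: the first CUDA toolkit release that can build
# native code for a given compute capability. Only a few entries are listed.
MIN_TOOLKIT = {
    (6, 0): (8, 0),  # Pascal, e.g. Tesla P100
    (6, 1): (8, 0),  # Pascal, e.g. GeForce GTX 1080
    (7, 0): (9, 0),  # Volta
}


def toolkit_supports(compute_capability, toolkit_version):
    """Return True if the toolkit can target the given compute capability.

    Both arguments are (major, minor) tuples. Raises KeyError for
    capabilities that are missing from the partial table above.
    """
    return toolkit_version >= MIN_TOOLKIT[compute_capability]


# The situation in this thread: a GTX 1080 with the CUDA 7.5 image.
print(toolkit_supports((6, 1), (7, 5)))
print(toolkit_supports((6, 1), (8, 0)))
```

Under these assumptions the first call reports that CUDA 7.5 cannot target the card and the second that CUDA 8.0 can, which matches the conclusion in jcjohnson/torch-rnn#122.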

crisbal · 2016-07-12T06:42:12Z

As soon as I get my hands on a Cuda machine and on fast Internet I will.
Sorry I can't do it ASAP.

xoryouyou · 2017-02-17T17:19:57Z

@spadavec I had the same issue and built this today: https://hub.docker.com/r/xoryouyou/torch-rnn/

HandsomeDevilv112 · 2017-04-18T06:48:13Z

I got this error today as I'm using a 1080 and have cuda 8 installed.
@xoryouyou, I tried the command on the page you posted, but I'm getting an error:

docker pull xoryouyou/torch-rnn
Using default tag: latest
Error response from daemon: manifest for xoryouyou/torch-rnn:latest not found

xoryouyou · 2017-04-18T08:06:49Z

@HandsomeDevilv112 yeah, the image was only tagged as 1.0 and not latest. I updated it.
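
For anyone puzzled by the "manifest not found" error: when no tag is given, docker pull falls back to the `latest` tag, so an image published only as `1.0` cannot be found without naming the tag. A minimal sketch of that defaulting rule follows; `with_default_tag` is a hypothetical helper written for this example, not Docker's own code.

```python
def with_default_tag(image):
    """Append ':latest' when an image reference has neither a tag nor a digest."""
    if "@" in image:
        # Pinned by digest (name@sha256:...), so no tag is implied.
        return image
    # Only the last path component can carry a tag; a colon earlier in the
    # reference belongs to a registry port such as localhost:5000.
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component:
        return image
    return image + ":latest"


print(with_default_tag("xoryouyou/torch-rnn"))
print(with_default_tag("xoryouyou/torch-rnn:1.0"))
```

Before the image was retagged, pulling `xoryouyou/torch-rnn:1.0` explicitly would have been the workaround.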

HandsomeDevilv112 · 2017-04-20T05:06:19Z

@xoryouyou: Cool! Much obliged. That seems to have done the trick. My apologies if there was a way for me to fix that myself and I just didn't catch it.

valentinvieriu · 2017-08-23T07:48:20Z

@xoryouyou Do you think you can share the Dockerfile as well? I want to have a look at how you built your image.
I'm trying to use https://github.com/crisbal/docker-torch-rnn/blob/master/CUDA/8.0/Dockerfile but it does not build.
It fails at this section:

RUN git clone https://github.com/jcjohnson/torch-rnn && \
    pip install -r torch-rnn/requirements.txt

xoryouyou · 2017-08-23T07:55:15Z

@valentinvieriu sorry, I currently don't have access to the machine where I built the torch-rnn image, but I'll see if I can recreate your issue.

valentinvieriu · 2017-08-23T08:05:35Z

This is the issue that pops out:
'''
copying h5py/tests/hl/test_file.py -> build/lib.linux-x86_64-2.7/h5py/tests/hl
running build_ext
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-MyYa9y/h5py/setup.py", line 140, in <module>
cmdclass = CMDCLASS,
File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
dist.run_commands()
File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/dist-packages/wheel/bdist_wheel.py", line 179, in run
self.run_command('build')
File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/distutils/command/build.py", line 128, in run
self.run_command(cmd_name)
File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/tmp/pip-build-MyYa9y/h5py/setup_build.py", line 140, in run
from Cython.Build import cythonize
ImportError: No module named Cython.Build
'''
This happens when using https://github.com/crisbal/docker-torch-rnn/blob/master/CUDA/8.0/Dockerfile

As mentioned, it fails at the:

RUN git clone https://github.com/jcjohnson/torch-rnn && \
    pip install -r torch-rnn/requirements.txt

section

Any help is appreciated. I'm not very familiar with the dependencies, I plan only to use this as a tool.

Thank you @xoryouyou

valentinvieriu · 2017-08-23T08:41:04Z

OK, for future reference, this fixed the build issue on Ubuntu 16.04.
Replace

RUN git clone https://github.com/jcjohnson/torch-rnn && \
    pip install -r torch-rnn/requirements.txt

from https://github.com/crisbal/docker-torch-rnn/blob/master/CUDA/8.0/Dockerfile
with

# torch-rnn and python requirements
# we use https://github.com/jcjohnson/torch-rnn/blob/master/requirements.txt as a guideline
WORKDIR /root
RUN apt-get install -y cython
RUN pip install --upgrade pip
RUN pip install Cython==0.23.4
RUN pip install numpy==1.10.4
RUN pip install argparse==1.2.1
RUN HDF5_DIR=/usr/lib/x86_64-linux-gnu/hdf5/serial/ pip install h5py==2.5.0
RUN pip install six==1.10.0
RUN git clone https://github.com/jcjohnson/torch-rnn

I will work on a Docker image and share it with the rest when it's finished
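
A note on why the replacement above likely helps: the traceback shows h5py's build step importing Cython, so Cython has to be installed before h5py is built. Installing everything from one requirements file does not guarantee that order. Below is a minimal sketch of the same idea in Python. The `BUILD_FIRST` ordering is an assumption drawn from this thread, and `ordered_requirements` and `pip_commands` are hypothetical helpers, not part of pip.

```python
import sys

# Packages that other packages import while they are being built, so they
# are installed first. This ordering is an assumption based on the thread.
BUILD_FIRST = ("cython", "numpy")


def ordered_requirements(lines):
    """Return requirement lines with build-time dependencies moved to the front."""
    reqs = [line.strip() for line in lines
            if line.strip() and not line.strip().startswith("#")]

    def rank(req):
        name = req.split("==")[0].strip().lower()
        return BUILD_FIRST.index(name) if name in BUILD_FIRST else len(BUILD_FIRST)

    # sorted() is stable, so all other packages keep their original order.
    return sorted(reqs, key=rank)


def pip_commands(lines):
    """Build one `pip install` command per requirement, in a safe order."""
    return [[sys.executable, "-m", "pip", "install", req]
            for req in ordered_requirements(lines)]


if __name__ == "__main__":
    pinned = ["h5py==2.5.0", "six==1.10.0", "Cython==0.23.4", "numpy==1.10.4"]
    for command in pip_commands(pinned):
        print(" ".join(command))
```

Each command could then be passed to `subprocess.check_call`, which mirrors the one-package-per-RUN layout of the Dockerfile fragment.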

xoryouyou · 2017-08-23T08:42:50Z

@valentinvieriu I am currently building with the crisbal/docker-torch-rnn image on arch and it looks to build fine. Will report when done.

xoryouyou · 2017-08-23T09:21:42Z

Built on Linux 4.12.8-2-ARCH #1 SMP PREEMPT Fri Aug 18 14:08:02 UTC 2017 x86_64 GNU/Linux with Docker version 17.06.0-ce, build 3dfb8343
build_log.txt

This was referenced Apr 21, 2018

Add Dockerfile for CUDA 9.1 #13

Merged

fix building in CUDA 7.5 Dockerfile #15

Merged
