Error: AttributeError: 'MMapIndexedDataset' object has no attribute '_bin_buffer_mmap' #52

Open
Macaron-Lawrence opened this issue Oct 30, 2023 · 0 comments

Macaron-Lawrence commented Oct 30, 2023

I am running on a rented cloud GPU, an NVIDIA GeForce RTX 4090.

Environment:
PyTorch 1.10.0
Python 3.8(ubuntu20.04)
Cuda 11.3

pytorch-lightning==1.9.2
torch==1.13.1
deepspeed==0.7.0

Console output:

python train.py \
>     --load_model "/rwkv/RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096.pth" \
>     --proj_dir "/rwkv/output" \
>     --data_file "/rwkv/binidx/mission_text_document" \
>     --data_type binidx \
>     --vocab_size 50277 \
>     --ctx_len 1024 \
>     --accumulate_grad_batches 8 \
>     --epoch_steps 200 \
>     --epoch_count 20 \
>     --epoch_begin 0 \
>     --epoch_save 2 \
>     --micro_bsz 8 \
>     --n_layer 24 \
>     --n_embd 2048 \
>     --pre_ffn 0 \
>     --head_qk 0 \
>     --lr_init 1e-5 \
>     --lr_final 1e-5 \
>     --warmup_steps 0 \
>     --beta1 0.9 \
>     --beta2 0.999 \
>     --adam_eps 1e-8 \
>     --accelerator gpu \
>     --devices 1 \
>     --precision bf16 \
>     --strategy deepspeed_stage_2 \
>     --grad_cp 1 \
>     --lora \
>     --lora_r 8 \
>     --lora_alpha 32 \
>     --lora_dropout 0.01 \
>     --lora_parts=att,ffn,time,ln
########## work in progress ##########

############################################################################
#
# RWKV-4 BF16 on 1x1 GPU, bsz 1x1x8=8, deepspeed_stage_2 with grad_cp
#
# Data = /rwkv/binidx/mission_text_document (binidx), ProjDir = /rwkv/output
#
# Epoch = 0 to 19 (will continue afterwards), save every 2 epoch
#
# Each "epoch" = 200 steps, 1600 samples, 1638400 tokens
#
# Model = 24 n_layer, 2048 n_embd, 1024 ctx_len
# LoRA = enabled, 8 r, 32.0 alpha, 0.01 dropout, on att,ffn,time,ln
#
# Adam = lr 1e-05 to 1e-05, warmup 0 steps, beta (0.9, 0.999), eps 1e-08
#
# Found torch 1.13.1+cu117, recommend 1.13.1+cu117 or newer
# Found deepspeed 0.7.0, recommend 0.7.0 (faster than newer versions)
# Found pytorch_lightning 1.9.2, recommend 1.9.1 or newer
#
############################################################################

{'load_model': '/rwkv/RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096.pth', 'wandb': '', 'proj_dir': '/rwkv/output', 'random_seed': -1, 'data_file': '/rwkv/binidx/mission_text_document', 'data_type': 'binidx', 'vocab_size': 50277, 'ctx_len': 1024, 'epoch_steps': 200, 'epoch_count': 20, 'epoch_begin': 0, 'epoch_save': 2, 'micro_bsz': 8, 'n_layer': 24, 'n_embd': 2048, 'dim_att': 2048, 'dim_ffn': 8192, 'pre_ffn': 0, 'head_qk': 0, 'tiny_att_dim': 0, 'tiny_att_layer': -999, 'lr_init': 1e-05, 'lr_final': 1e-05, 'warmup_steps': 0, 'beta1': 0.9, 'beta2': 0.999, 'adam_eps': 1e-08, 'grad_cp': 1, 'my_pile_stage': 0, 'my_pile_shift': -1, 'my_pile_edecay': 0, 'layerwise_lr': 1, 'ds_bucket_mb': 200, 'my_img_version': 0, 'my_img_size': 0, 'my_img_bit': 0, 'my_img_clip': 'x', 'my_img_clip_scale': 1, 'my_img_l1_scale': 0, 'my_img_encoder': 'x', 'my_sample_len': 0, 'my_ffn_shift': 1, 'my_att_shift': 1, 'my_pos_emb': 0, 'load_partial': 0, 'magic_prime': 0, 'my_qa_mask': 0, 'my_testing': '', 'lora': True, 'lora_load': '', 'lora_r': 8, 'lora_alpha': 32.0, 'lora_dropout': 0.01, 'lora_parts': 'att,ffn,time,ln', 'logger': False, 'enable_checkpointing': False, 'default_root_dir': None, 'gradient_clip_val': 1.0, 'gradient_clip_algorithm': None, 'num_nodes': 1, 'num_processes': None, 'devices': '1', 'gpus': None, 'auto_select_gpus': None, 'tpu_cores': None, 'ipus': None, 'enable_progress_bar': True, 'overfit_batches': 0.0, 'track_grad_norm': -1, 'check_val_every_n_epoch': 100000000000000000000, 'fast_dev_run': False, 'accumulate_grad_batches': 8, 'max_epochs': -1, 'min_epochs': None, 'max_steps': -1, 'min_steps': None, 'max_time': None, 'limit_train_batches': None, 'limit_val_batches': None, 'limit_test_batches': None, 'limit_predict_batches': None, 'val_check_interval': None, 'log_every_n_steps': 100000000000000000000, 'accelerator': 'gpu', 'strategy': 'deepspeed_stage_2', 'sync_batchnorm': False, 'precision': 'bf16', 'enable_model_summary': True, 'num_sanity_val_steps': 0, 
'resume_from_checkpoint': None, 'profiler': None, 'benchmark': None, 'reload_dataloaders_every_n_epochs': 0, 'auto_lr_find': False, 'replace_sampler_ddp': False, 'detect_anomaly': False, 'auto_scale_batch_size': False, 'plugins': None, 'amp_backend': None, 'amp_level': None, 'move_metrics_to_cpu': False, 'multiple_trainloader_mode': 'max_size_cycle', 'inference_mode': True, 'my_timestamp': '2023-10-30-18-23-23', 'betas': (0.9, 0.999), 'real_bsz': 8, 'run_name': '50277 ctx1024 L24 D2048'}

!!!!! LoRA Warning: Gradient Checkpointing requires JIT off, disabling it
RWKV_MY_TESTING 
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/wkv_1024_bf16/build.ninja...
Building extension module wkv_1024_bf16...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module wkv_1024_bf16...
Current vocab size = 50277 (make sure it's correct)
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    train_data = MyDataset(args)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/dataset.py", line 31, in __init__
    self.data = MMapIndexedDataset(args.data_file)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 179, in __init__
    self._do_init(path, skip_warmup)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 189, in _do_init
    self._index = self.Index(index_file_path(self._path), skip_warmup)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 105, in __init__
    with open(path, "rb") as stream:
FileNotFoundError: [Errno 2] No such file or directory: '/rwkv/binidx/mission_text_document.idx'
Exception ignored in: <function MMapIndexedDataset.Index.__del__ at 0x7fc7c90c4790>
Traceback (most recent call last):
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 150, in __del__
    self._bin_buffer_mmap._mmap.close()
AttributeError: 'Index' object has no attribute '_bin_buffer_mmap'
Exception ignored in: <function MMapIndexedDataset.__del__ at 0x7fc7c90c4d30>
Traceback (most recent call last):
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 202, in __del__
    self._bin_buffer_mmap._mmap.close()
AttributeError: 'MMapIndexedDataset' object has no attribute '_bin_buffer_mmap'
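
Reading the traceback, the AttributeError in the title looks like a side effect rather than the cause. The first exception is the FileNotFoundError for `/rwkv/binidx/mission_text_document.idx`; because `__init__` stopped there, `_bin_buffer_mmap` was never assigned, and the two `__del__` calls then failed on the half-built objects. Below is a minimal sketch of a pre-flight check plus a guarded `__del__`. It is not the repository's code: `missing_binidx_files` and `SafeCloseExample` are hypothetical names, and the `.bin` suffix is an assumption (only `.idx` appears in the log above).

```python
import os


def missing_binidx_files(data_prefix):
    """Return the expected dataset files that do not exist for this prefix.

    Assumption: the prefix passed as --data_file is expanded to
    '<prefix>.bin' and '<prefix>.idx'. Only '.idx' is confirmed by the log.
    """
    candidates = [data_prefix + ".bin", data_prefix + ".idx"]
    return [path for path in candidates if not os.path.isfile(path)]


class SafeCloseExample:
    """Hypothetical stand-in for a class that opens a file in __init__."""

    def __init__(self, path):
        # If open() raises, _handle is never assigned, just as
        # _bin_buffer_mmap is never assigned in the traceback above.
        self._handle = open(path, "rb")

    def __del__(self):
        # getattr with a default avoids a secondary AttributeError when
        # __init__ failed before the attribute was created.
        handle = getattr(self, "_handle", None)
        if handle is not None:
            handle.close()


if __name__ == "__main__":
    prefix = "/rwkv/binidx/mission_text_document"  # --data_file from the command above
    for path in missing_binidx_files(prefix):
        print("missing:", path)
```

If the check reports a missing file, the `--data_file` prefix is probably the thing to fix (wrong directory, or a different base name produced by the preprocessing step). The guarded `__del__` only silences the misleading follow-up messages; it does not make the dataset load.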