Running on a cloud GPU, an NVIDIA GeForce RTX 4090.

Environment:
Python 3.8 (ubuntu20.04), CUDA 11.3 (base image lists PyTorch 1.10.0)
pytorch-lightning==1.9.2
torch==1.13.1
deepspeed==0.7.0

Console output:
```
python train.py \
  --load_model "/rwkv/RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096.pth" \
  --proj_dir "/rwkv/output" \
  --data_file "/rwkv/binidx/mission_text_document" \
  --data_type binidx \
  --vocab_size 50277 \
  --ctx_len 1024 \
  --accumulate_grad_batches 8 \
  --epoch_steps 200 \
  --epoch_count 20 \
  --epoch_begin 0 \
  --epoch_save 2 \
  --micro_bsz 8 \
  --n_layer 24 \
  --n_embd 2048 \
  --pre_ffn 0 \
  --head_qk 0 \
  --lr_init 1e-5 \
  --lr_final 1e-5 \
  --warmup_steps 0 \
  --beta1 0.9 \
  --beta2 0.999 \
  --adam_eps 1e-8 \
  --accelerator gpu \
  --devices 1 \
  --precision bf16 \
  --strategy deepspeed_stage_2 \
  --grad_cp 1 \
  --lora \
  --lora_r 8 \
  --lora_alpha 32 \
  --lora_dropout 0.01 \
  --lora_parts=att,ffn,time,ln

########## work in progress ##########
############################################################################
#
# RWKV-4 BF16 on 1x1 GPU, bsz 1x1x8=8, deepspeed_stage_2 with grad_cp
#
# Data = /rwkv/binidx/mission_text_document (binidx), ProjDir = /rwkv/output
#
# Epoch = 0 to 19 (will continue afterwards), save every 2 epoch
#
# Each "epoch" = 200 steps, 1600 samples, 1638400 tokens
#
# Model = 24 n_layer, 2048 n_embd, 1024 ctx_len
# LoRA = enabled, 8 r, 32.0 alpha, 0.01 dropout, on att,ffn,time,ln
#
# Adam = lr 1e-05 to 1e-05, warmup 0 steps, beta (0.9, 0.999), eps 1e-08
#
# Found torch 1.13.1+cu117, recommend 1.13.1+cu117 or newer
# Found deepspeed 0.7.0, recommend 0.7.0 (faster than newer versions)
# Found pytorch_lightning 1.9.2, recommend 1.9.1 or newer
#
############################################################################

{'load_model': '/rwkv/RWKV-4-World-CHNtuned-1.5B-v1-20230620-ctx4096.pth', 'wandb': '', 'proj_dir': '/rwkv/output', 'random_seed': -1, 'data_file': '/rwkv/binidx/mission_text_document', 'data_type': 'binidx', 'vocab_size': 50277, 'ctx_len': 1024, 'epoch_steps': 200, 'epoch_count': 20, 'epoch_begin': 0, 'epoch_save': 2, 'micro_bsz': 8, 'n_layer': 24, 'n_embd': 2048, 'dim_att': 2048, 'dim_ffn': 8192, 'pre_ffn': 0, 'head_qk': 0, 'tiny_att_dim': 0, 'tiny_att_layer': -999, 'lr_init': 1e-05, 'lr_final': 1e-05, 'warmup_steps': 0, 'beta1': 0.9, 'beta2': 0.999, 'adam_eps': 1e-08, 'grad_cp': 1, 'my_pile_stage': 0, 'my_pile_shift': -1, 'my_pile_edecay': 0, 'layerwise_lr': 1, 'ds_bucket_mb': 200, 'my_img_version': 0, 'my_img_size': 0, 'my_img_bit': 0, 'my_img_clip': 'x', 'my_img_clip_scale': 1, 'my_img_l1_scale': 0, 'my_img_encoder': 'x', 'my_sample_len': 0, 'my_ffn_shift': 1, 'my_att_shift': 1, 'my_pos_emb': 0, 'load_partial': 0, 'magic_prime': 0, 'my_qa_mask': 0, 'my_testing': '', 'lora': True, 'lora_load': '', 'lora_r': 8, 'lora_alpha': 32.0, 'lora_dropout': 0.01, 'lora_parts': 'att,ffn,time,ln', 'logger': False, 'enable_checkpointing': False, 'default_root_dir': None, 'gradient_clip_val': 1.0, 'gradient_clip_algorithm': None, 'num_nodes': 1, 'num_processes': None, 'devices': '1', 'gpus': None, 'auto_select_gpus': None, 'tpu_cores': None, 'ipus': None, 'enable_progress_bar': True, 'overfit_batches': 0.0, 'track_grad_norm': -1, 'check_val_every_n_epoch': 100000000000000000000, 'fast_dev_run': False, 'accumulate_grad_batches': 8, 'max_epochs': -1, 'min_epochs': None, 'max_steps': -1, 'min_steps': None, 'max_time': None, 'limit_train_batches': None, 'limit_val_batches': None, 'limit_test_batches': None, 'limit_predict_batches': None, 'val_check_interval': None, 'log_every_n_steps': 100000000000000000000, 'accelerator': 'gpu', 'strategy': 'deepspeed_stage_2', 'sync_batchnorm': False, 'precision': 'bf16', 'enable_model_summary': True, 'num_sanity_val_steps': 0, 'resume_from_checkpoint': None, 'profiler': None, 'benchmark': None, 'reload_dataloaders_every_n_epochs': 0, 'auto_lr_find': False, 'replace_sampler_ddp': False, 'detect_anomaly': False, 'auto_scale_batch_size': False, 'plugins': None, 'amp_backend': None, 'amp_level': None, 'move_metrics_to_cpu': False, 'multiple_trainloader_mode': 'max_size_cycle', 'inference_mode': True, 'my_timestamp': '2023-10-30-18-23-23', 'betas': (0.9, 0.999), 'real_bsz': 8, 'run_name': '50277 ctx1024 L24 D2048'}

!!!!! LoRA Warning: Gradient Checkpointing requires JIT off, disabling it RWKV_MY_TESTING
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/wkv_1024_bf16/build.ninja...
Building extension module wkv_1024_bf16...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module wkv_1024_bf16...
Current vocab size = 50277 (make sure it's correct)
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    train_data = MyDataset(args)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/dataset.py", line 31, in __init__
    self.data = MMapIndexedDataset(args.data_file)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 179, in __init__
    self._do_init(path, skip_warmup)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 189, in _do_init
    self._index = self.Index(index_file_path(self._path), skip_warmup)
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 105, in __init__
    with open(path, "rb") as stream:
FileNotFoundError: [Errno 2] No such file or directory: '/rwkv/binidx/mission_text_document.idx'
Exception ignored in: <function MMapIndexedDataset.Index.__del__ at 0x7fc7c90c4790>
Traceback (most recent call last):
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 150, in __del__
    self._bin_buffer_mmap._mmap.close()
AttributeError: 'Index' object has no attribute '_bin_buffer_mmap'
Exception ignored in: <function MMapIndexedDataset.__del__ at 0x7fc7c90c4d30>
Traceback (most recent call last):
  File "/root/rwkv/RWKV-LM-LoRA/RWKV-v4neo/src/binidx.py", line 202, in __del__
    self._bin_buffer_mmap._mmap.close()
AttributeError: 'MMapIndexedDataset' object has no attribute '_bin_buffer_mmap'
```
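The root cause is the first `FileNotFoundError`: as the traceback shows, `MMapIndexedDataset` builds its index path by appending `.idx` to the `--data_file` prefix, so `/rwkv/binidx/mission_text_document.idx` (and its `.bin` token-data companion, in the usual binidx layout) must exist before training starts; the two `__del__` exceptions afterwards are just cleanup noise from the failed constructor. A quick sanity check could look like this (a minimal sketch; `check_binidx` is a hypothetical helper, not part of the repo):

```python
import os

def check_binidx(prefix):
    """Return the companion files missing for a binidx dataset prefix.

    MMapIndexedDataset opens <prefix>.idx (the index); the token data
    normally lives alongside it in <prefix>.bin.
    """
    return [prefix + ext
            for ext in (".idx", ".bin")
            if not os.path.exists(prefix + ext)]

# Same value as --data_file in the command above
missing = check_binidx("/rwkv/binidx/mission_text_document")
if missing:
    print("missing files:", missing)
```

If both files are missing, the dataset was likely never converted to binidx format (or `--data_file` points at the wrong directory); note the prefix must be given without the `.idx`/`.bin` extension.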