My implementation of BiSeNetV1 and BiSeNetV2.
mIOUs and fps on cityscapes val set:

| model | ss | ssc | msf | mscf | fps (fp16/fp32) | link |
|---|---|---|---|---|---|---|
| bisenetv1 | 75.44 | 76.94 | 77.45 | 78.86 | 68/23 | download |
| bisenetv2 | 74.95 | 75.58 | 76.53 | 77.08 | 59/21 | download |
mIOUs on cocostuff val2017 set:

| model | ss | ssc | msf | mscf | link |
|---|---|---|---|---|---|
| bisenetv1 | 31.49 | 31.42 | 32.46 | 32.55 | download |
| bisenetv2 | 30.49 | 30.55 | 31.81 | 31.73 | download |
Tips:

- ss means single scale evaluation, ssc means single scale crop evaluation, msf means multi-scale evaluation with flip augment, and mscf means multi-scale crop evaluation with flip augment. The eval scales and crop size used for multi-scale evaluation can be found in the configs; a short sketch of the procedure follows this list.
- The fps is tested in a different way from the paper (for more information, please see here); a rough sketch of the timing loop also follows this list.
- The authors of bisenetv2 used cocostuff-10k, while I used what might be called cocostuff-123k (the same 118k train and 5k val images as used for object detection). Thus the results may differ from the paper.
- The results have a relatively large variance: training the same model several times can give noticeably different numbers. For example, if you train bisenetv2 repeatedly, the ss result varies between 73.1 and 75.1.
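Here is a minimal sketch of the multi-scale flip evaluation referred to above, assuming the model returns a single logits tensor in eval mode (if it also returns auxiliary outputs, take the main one); the scale list shown is only illustrative, the real values are in the configs:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_msf(model, im, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75), n_classes=19):
    # im: normalized float tensor of shape (1, 3, H, W)
    _, _, H, W = im.shape
    prob = torch.zeros(1, n_classes, H, W, device=im.device)
    for s in scales:
        size = (int(H * s), int(W * s))
        im_s = F.interpolate(im, size=size, mode='bilinear', align_corners=False)
        for flip in (False, True):
            inp = torch.flip(im_s, dims=(3,)) if flip else im_s
            logits = model(inp)
            if flip:
                logits = torch.flip(logits, dims=(3,))   # flip the prediction back
            logits = F.interpolate(logits, size=(H, W), mode='bilinear',
                                   align_corners=False)
            prob += torch.softmax(logits, dim=1)         # accumulate probabilities
    return prob.argmax(dim=1)                            # (1, H, W) predicted label map
```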
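The fps numbers above were measured with a loop roughly like the following (a sketch, not the exact script used here; the input resolution and iteration counts are illustrative):

```python
import time
import torch

def measure_fps(model, shape=(1, 3, 1024, 2048), n_warmup=10, n_iters=100):
    model.cuda().eval()
    x = torch.randn(shape).cuda()
    with torch.no_grad():
        for _ in range(n_warmup):          # warm up cudnn and the allocator
            model(x)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(n_iters):
            model(x)
        torch.cuda.synchronize()           # wait for all kernels to finish
    return n_iters / (time.time() - t0)    # images per second
```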
Deploy:

- tensorrt: You can go to tensorrt for details.
- ncnn: You can go to ncnn for details.
- openvino: You can go to openvino for details.
- tis: Triton Inference Server (TIS) provides a deployment service solution. You can go to tis for details.
My platform is like this:
- ubuntu 18.04
- nvidia Tesla T4 gpu, driver 450.51.05
- cuda 10.2
- cudnn 7
- miniconda python 3.8.8
- pytorch 1.8.1
With a pretrained weight, you can run inference on a single image like this:
$ python tools/demo.py --config configs/bisenetv2_city.py --weight-path /path/to/your/weights.pth --img-path ./example.png
This would run inference on the image and save the result to ./res.jpg.
Or you can run inference on a video like this:
$ python tools/demo_video.py --config configs/bisenetv2_coco.py --weight-path res/model_final.pth --input ./video.mp4 --output res.mp4
This would generate the segmentation result as res.mp4. If you want to read from a camera, you can set --input camera_id rather than --input ./video.mp4.
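If you would rather call the model from your own code instead of tools/demo.py, a minimal sketch of single-image inference could look like this. It is not the repo's demo script: the ImageNet normalization constants, the random palette, and the way the model is constructed are assumptions you should adapt:

```python
import cv2
import numpy as np
import torch

def run_single_image(model, img_path, weight_path, n_classes=19):
    # `model` is whatever network you built (e.g. from the config);
    # constructing it is not shown here.
    model.load_state_dict(torch.load(weight_path, map_location='cpu'))
    model.cuda().eval()

    im = cv2.imread(img_path)[:, :, ::-1].astype(np.float32) / 255.0   # BGR -> RGB
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)           # assumed ImageNet stats
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = torch.from_numpy(((im - mean) / std).transpose(2, 0, 1)).unsqueeze(0).cuda()

    with torch.no_grad():
        out = model(x)
        logits = out[0] if isinstance(out, (tuple, list)) else out     # main output
        pred = logits.argmax(dim=1)[0].cpu().numpy().astype(np.uint8)

    palette = np.random.randint(0, 256, (n_classes, 3), dtype=np.uint8)   # random colors
    cv2.imwrite('./res.jpg', np.ascontiguousarray(palette[pred][:, :, ::-1]))  # RGB -> BGR
```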
1. cityscapes
Register and download the dataset from the official website. Then decompress them into the datasets/cityscapes
directory:
$ mv /path/to/leftImg8bit_trainvaltest.zip datasets/cityscapes
$ mv /path/to/gtFine_trainvaltest.zip datasets/cityscapes
$ cd datasets/cityscapes
$ unzip leftImg8bit_trainvaltest.zip
$ unzip gtFine_trainvaltest.zip
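If the unzip step worked, the standard Cityscapes layout should now be in place; a quick sanity check could look like this (the test split is left out since it is not needed for training and evaluation):

```python
import os.path as osp

root = 'datasets/cityscapes'
for sub in ('leftImg8bit/train', 'leftImg8bit/val', 'gtFine/train', 'gtFine/val'):
    assert osp.isdir(osp.join(root, sub)), f'missing {sub}, check the unzip step'
```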
2. cocostuff
Download the train2017.zip, val2017.zip and stuffthingmaps_trainval2017.zip splits from the official website. Then do the following:
$ unzip train2017.zip
$ unzip val2017.zip
$ mv train2017/ /path/to/BiSeNet/datasets/coco/images
$ mv val2017/ /path/to/BiSeNet/datasets/coco/images
$ unzip stuffthingmaps_trainval2017.zip
$ mv train2017/ /path/to/BiSeNet/datasets/coco/labels
$ mv val2017/ /path/to/BiSeNet/datasets/coco/labels
$ cd /path/to/BiSeNet
$ python tools/gen_coco_annos.py
3. custom dataset
If you want to train on your own dataset, you should first generate annotation files in the following format:
munster_000002_000019_leftImg8bit.png,munster_000002_000019_gtFine_labelIds.png
frankfurt_000001_079206_leftImg8bit.png,frankfurt_000001_079206_gtFine_labelIds.png
...
Each line is a pair of a training image path and its ground-truth label path, separated by a single comma. A sketch of a script that generates such a file is shown below.
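Here is a sketch of a script that writes such a file, assuming your images and label maps live in two parallel directories and share a common filename stem (directory names and suffixes are placeholders for your own layout):

```python
import os
import os.path as osp

def write_anno_file(im_dir, lb_dir, out_path, im_suffix='.png', lb_suffix='.png'):
    # one "image_path,label_path" pair per line
    lines = []
    for name in sorted(os.listdir(im_dir)):
        if not name.endswith(im_suffix):
            continue
        lb_name = name[:-len(im_suffix)] + lb_suffix
        assert osp.exists(osp.join(lb_dir, lb_name)), f'missing label for {name}'
        lines.append(f'{osp.join(im_dir, name)},{osp.join(lb_dir, lb_name)}')
    with open(out_path, 'w') as f:
        f.write('\n'.join(lines) + '\n')

write_anno_file('images/train', 'labels/train', 'train_annos.txt')
```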
Then you need to change the im_root and train/val_im_anns fields in the configuration files. If what is shown in cityscapes_cv2.py is not clear, you can also refer to coco.py.
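The exact layout of the config files is not reproduced here, but the dataset-related fields you need to touch would look roughly like this (a sketch only; the paths are placeholders, see the real configs for the full set of options):

```python
# dataset-related fields of a config file (sketch, not a complete config)
cfg = dict(
    im_root='./datasets/my_dataset',                   # root directory of images and labels
    train_im_anns='./datasets/my_dataset/train.txt',   # annotation file for training
    val_im_anns='./datasets/my_dataset/val.txt',       # annotation file for validation
)
```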
I used the following command to train the models:
# bisenetv1 cityscapes
export CUDA_VISIBLE_DEVICES=0,1
cfg_file=configs/bisenetv1_city.py
NGPUS=2
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
# bisenetv2 cityscapes
export CUDA_VISIBLE_DEVICES=0,1
cfg_file=configs/bisenetv2_city.py
NGPUS=2
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
# bisenetv1 cocostuff
export CUDA_VISIBLE_DEVICES=0,1,2,3
cfg_file=configs/bisenetv1_coco.py
NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
# bisenetv2 cocostuff
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
cfg_file=configs/bisenetv2_coco.py
NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
Note:

- Though bisenetv2 has fewer flops, it requires many more training iterations, so the training time of bisenetv1 is shorter.
- I used an overall batch size of 16 to train all models. Since cocostuff has 171 categories, it requires more memory, so I split the 16 images across more gpus than the 2 used for cityscapes (e.g. 2 images per gpu on 8 gpus for bisenetv2 on cocostuff).
You can also load trained model weights and finetune from them, like this:
$ export CUDA_VISIBLE_DEVICES=0,1
$ python -m torch.distributed.launch --nproc_per_node=2 tools/train_amp.py --finetune-from ./res/model_final.pth --config ./configs/bisenetv2_city.py # or bisenetv1
You can also evaluate a trained model like this:
$ python tools/evaluate.py --config configs/bisenetv1_city.py --weight-path /path/to/your/weight.pth
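For reference, mIOU numbers of the kind reported in the tables are typically computed from an accumulated confusion matrix; a minimal sketch (not the repo's evaluator) would be:

```python
import numpy as np

def update_hist(hist, pred, label, n_classes, ignore_label=255):
    # accumulate an n_classes x n_classes confusion matrix (rows: ground truth, cols: prediction)
    keep = label != ignore_label
    hist += np.bincount(
        label[keep].astype(np.int64) * n_classes + pred[keep].astype(np.int64),
        minlength=n_classes ** 2,
    ).reshape(n_classes, n_classes)
    return hist

def mean_iou(hist):
    # per-class IoU = TP / (TP + FP + FN), then averaged over classes
    iou = np.diag(hist) / (hist.sum(axis=0) + hist.sum(axis=1) - np.diag(hist) + 1e-10)
    return iou.mean()
```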