Download R-FCN
master branch
git clone --recursive https://github.com/bharatsingh430/py-R-FCN-multiGPU/
coco branch
git clone --recursive -b coco https://github.com/bharatsingh430/py-R-FCN-multiGPU/
Compile the Cython modules
cd py-R-FCN-MultiGPU-coco/lib
make
Install dependencies
cython, python-opencv, easydict
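One way to install them (a sketch assuming an Ubuntu-like system with pip and apt available; adjust for your environment):
pip install cython easydict
sudo apt-get install python-opencv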
Modify Makefile.config
# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1
USE_NCCL := 1
Install NCCL
# NVIDIA's NCCL library, which is used for multi-GPU training
# Running Caffe on multiple GPUs requires NCCL; without it the build fails with: nccl.hpp:5:18: fatal error: nccl.h: No such file or directory
git clone https://github.com/NVIDIA/nccl.git
cd nccl
sudo make install -j32
# The NCCL library and headers will be installed in /usr/local/lib and /usr/local/include
sudo ldconfig
# Skipping this command leads to the error: error while loading shared libraries: libnccl.so.1: cannot open shared object file: No such file or directory
Build Caffe and pycaffe
cd py-R-FCN-MultiGPU-coco/caffe
make -j8 && make pycaffe
If the build fails with: fatal error: caffe/proto/caffe.pb.h: No such file or directory
(see https://github.com/NVIDIA/DIGITS/issues/105)
you can work around it as follows (run these from the caffe root directory):
protoc src/caffe/proto/caffe.proto --cpp_out=.
sudo mkdir include/caffe/proto
sudo mv src/caffe/proto/caffe.pb.h include/caffe/proto
Try it out
Download the trained models: https://1drv.ms/u/s!AoN7vygOjLIQqUWHpY67oaC7mopf
Extract them and copy to:
py-R-FCN-MultiGPU-coco/data/rfcn_models/resnet50_rfcn_final.caffemodel
py-R-FCN-MultiGPU-coco/data/rfcn_models/resnet101_rfcn_final.caffemodel
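For example (a sketch assuming the archive was extracted to ./rfcn_models in the current directory; $RFCN_ROOT stands for the repository root):
mkdir -p $RFCN_ROOT/data/rfcn_models
cp rfcn_models/resnet50_rfcn_final.caffemodel $RFCN_ROOT/data/rfcn_models/
cp rfcn_models/resnet101_rfcn_final.caffemodel $RFCN_ROOT/data/rfcn_models/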
Run the demo
./tools/demo_rfcn.py
If the demo fails with:
[libprotobuf ERROR google/protobuf/message_lite.cc:123]
Can't parse message of type "caffe.NetParameter"
because it is missing required fields:
layer[494].psroi_pooling_param.output_dim,
layer[494].psroi_pooling_param.group_size
A possible fix, from a comment by zhanghaoinf on the issue:
Hi, @liu09114 , I checked before when I used the detection model published
by msra (models store in onedrive), the problem will occur. If you
initialize and train your own model with resnet-101 (classification model
pre-trained on ImageNet-1000), the problem will disappear. By the way, you
need the coco branch to train your own model.
I think the sentence "If you want to use/train this model,
please use the coco branch of this repository. " in the readme is important. :)
Training your own model
1. Dataset
Copy (or symlink) the dataset into $RFCN_ROOT/data (a symlink sketch follows the listing); only the VOC2007 data is used here:
VOCdevkit2007 #link
VOCdevkit0712 #link
VOCdevkit/VOC2007/
VOCdevkit/VOC0712/ # the author trained on VOC2007 and VOC2012, hence the 0712 in the folder name
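A minimal symlink sketch (assuming the VOC data was unpacked to /path/to/VOCdevkit, a placeholder path):
cd $RFCN_ROOT/data
ln -s /path/to/VOCdevkit VOCdevkit2007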
2. Prepare the pre-trained model
deep-residual-networks: https://github.com/KaimingHe/deep-residual-networks
OneDrive download: https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777
Then place the caffemodel in the ./data/imagenet_models folder.
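For example (assuming ResNet-50-model.caffemodel was downloaded to the current directory):
mkdir -p $RFCN_ROOT/data/imagenet_models
cp ResNet-50-model.caffemodel $RFCN_ROOT/data/imagenet_models/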
3. Modify the network
Open ./models/pascal_voc/ResNet-50/rfcn_end2end
(using end2end as the example)
PS: cls_num below means the number of classes in your dataset + 1 (for background). For example, with 15 classes plus the background class, cls_num = 16.
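A quick sanity check of the numbers used in the edits below (assuming cls_num = 16 and the default 7x7 position-sensitive grid, i.e. group_size = 7):
cls_num=16
echo $(( cls_num * 7 * 7 ))      # 784  -> num_output of rfcn_cls
echo $(( 4 * cls_num * 7 * 7 ))  # 3136 -> num_output of rfcn_bbox
echo $(( cls_num ))              # 16   -> output_dim of psroipooled_cls_rois / cls_prob reshape
echo $(( 4 * cls_num ))          # 64   -> output_dim of psroipooled_loc_rois / bbox_pred reshape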
<1> Modify class-aware/train_ohem.prototxt
layer {
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 16" #cls_num
}
}
+
layer {
name: 'roi-data'
type: 'Python'
bottom: 'rpn_rois'
bottom: 'gt_boxes'
top: 'rois'
top: 'labels'
top: 'bbox_targets'
top: 'bbox_inside_weights'
top: 'bbox_outside_weights'
python_param {
module: 'rpn.proposal_target_layer'
layer: 'ProposalTargetLayer'
param_str: "'num_classes': 16" #cls_num
}
}
+
layer {
bottom: "conv_new_1"
top: "rfcn_cls"
name: "rfcn_cls"
type: "Convolution"
convolution_param {
num_output: 784 #cls_num*(score_maps_size^2)
kernel_size: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
}
+
layer {
bottom: "conv_new_1"
top: "rfcn_bbox"
name: "rfcn_bbox"
type: "Convolution"
convolution_param {
num_output: 3136 #4*cls_num*(score_maps_size^2)
kernel_size: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
}
+
layer {
bottom: "rfcn_cls"
bottom: "rois"
top: "psroipooled_cls_rois"
name: "psroipooled_cls_rois"
type: "PSROIPooling"
psroi_pooling_param {
spatial_scale: 0.0625
output_dim: 16 #cls_num
group_size: 7
}
}
+
layer {
bottom: "rfcn_bbox"
bottom: "rois"
top: "psroipooled_loc_rois"
name: "psroipooled_loc_rois"
type: "PSROIPooling"
psroi_pooling_param {
spatial_scale: 0.0625
output_dim: 64 #4*cls_num
group_size: 7
}
}
<2> Modify class-aware/test.prototxt
layer {
bottom: "conv_new_1"
top: "rfcn_cls"
name: "rfcn_cls"
type: "Convolution"
convolution_param {
num_output: 784 #cls_num*(score_maps_size^2)
kernel_size: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
}
+
layer {
bottom: "conv_new_1"
top: "rfcn_bbox"
name: "rfcn_bbox"
type: "Convolution"
convolution_param {
num_output: 3136 #4*cls_num*(score_maps_size^2)
kernel_size: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
}
+
layer {
bottom: "rfcn_cls"
bottom: "rois"
top: "psroipooled_cls_rois"
name: "psroipooled_cls_rois"
type: "PSROIPooling"
psroi_pooling_param {
spatial_scale: 0.0625
output_dim: 16 #cls_num
group_size: 7
}
}
+
layer {
bottom: "rfcn_bbox"
bottom: "rois"
top: "psroipooled_loc_rois"
name: "psroipooled_loc_rois"
type: "PSROIPooling"
psroi_pooling_param {
spatial_scale: 0.0625
output_dim: 64 #4*cls_num
group_size: 7
}
}
+
layer {
name: "cls_prob_reshape"
type: "Reshape"
bottom: "cls_prob_pre"
top: "cls_prob"
reshape_param {
shape {
dim: -1
dim: 16 #cls_num
}
}
}
+
layer {
name: "bbox_pred_reshape"
type: "Reshape"
bottom: "bbox_pred_pre"
top: "bbox_pred"
reshape_param {
shape {
dim: -1
dim: 64 #4*cls_num
}
}
}
<3> Modify train_agnostic.prototxt
layer {
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 16" #cls_num
}
}
+
layer {
bottom: "conv_new_1"
top: "rfcn_cls"
name: "rfcn_cls"
type: "Convolution"
convolution_param {
num_output: 784 #cls_num*(score_maps_size^2) ###
kernel_size: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
}
+
layer {
bottom: "rfcn_cls"
bottom: "rois"
top: "psroipooled_cls_rois"
name: "psroipooled_cls_rois"
type: "PSROIPooling"
psroi_pooling_param {
spatial_scale: 0.0625
output_dim: 16 #cls_num ###
group_size: 7
}
}
<4> Modify train_agnostic_ohem.prototxt
layer {
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 16" #cls_num ###
}
}
+
layer {
bottom: "conv_new_1"
top: "rfcn_cls"
name: "rfcn_cls"
type: "Convolution"
convolution_param {
num_output: 784 #cls_num*(score_maps_size^2) ###
kernel_size: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
}
+
layer {
bottom: "rfcn_cls"
bottom: "rois"
top: "psroipooled_cls_rois"
name: "psroipooled_cls_rois"
type: "PSROIPooling"
psroi_pooling_param {
spatial_scale: 0.0625
output_dim: 16 #cls_num ###
group_size: 7
}
}
<5> Modify test_agnostic.prototxt
layer {
bottom: "conv_new_1"
top: "rfcn_cls"
name: "rfcn_cls"
type: "Convolution"
convolution_param {
num_output: 784 #cls_num*(score_maps_size^2) ###
kernel_size: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
param {
lr_mult: 1.0
}
param {
lr_mult: 2.0
}
}
+
layer {
bottom: "rfcn_cls"
bottom: "rois"
top: "psroipooled_cls_rois"
name: "psroipooled_cls_rois"
type: "PSROIPooling"
psroi_pooling_param {
spatial_scale: 0.0625
output_dim: 16 #cls_num ###
group_size: 7
}
}
+
layer {
name: "cls_prob_reshape"
type: "Reshape"
bottom: "cls_prob_pre"
top: "cls_prob"
reshape_param {
shape {
dim: -1
dim: 16 #cls_num ###
}
}
}
4. Modify the code
<1> Modify ./lib/datasets/pascal_voc.py
class pascal_voc(imdb):
    def __init__(self, image_set, year, devkit_path=None):
        imdb.__init__(self, 'voc_' + year + '_' + image_set)
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
        self._classes = ('__background__',  # always index 0
                         'your_label_1', 'your_label_2', 'your_label_3', 'your_label_4'
                         )
<2> Modify ./lib/datasets/imdb.py
Before the statement assert (boxes[:, 2] >= boxes[:, 0]).all(), add:
for b in range(len(boxes)):
    if boxes[b][2] < boxes[b][0]:
        boxes[b][0] = 0
to avoid the AssertionError.
<3> Modify ./lib/fast_rcnn/train.py and train_multi_gpu.py
Add import google.protobuf.text_format at the top of each file to avoid the AttributeError: 'module' object has no attribute 'text_format' caused by newer protobuf versions.
5. Start training
Multi-GPU Training R-FCN
cd py-R-FCN-multiGPU-coco
python ./tools/train_net_multi_gpu.py --gpu 0,1 \
    --solver models/pascal_voc/ResNet-50/rfcn_end2end/solver_ohem.prototxt \
    --weights data/imagenet_models/ResNet-50-model.caffemodel \
    --iters 110000 --cfg experiments/cfgs/rfcn_end2end_ohem.yml
6. Test the trained model
cd py-R-FCN-multiGPU-coco
./tools/demo_rfcn.py
If at this point you get:
IndexError: index 4 is out of bounds for axis 1 with size 4
the per-class box indexing in demo_rfcn.py has gone wrong (the class-agnostic model does not output a separate box regression for every class). Find the statement cls_boxes = boxes[:, 4*cls_ind:4*(cls_ind + 1)] and change it to cls_boxes = boxes[:, 4:8].