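1. Install Ubuntu 16.04 LTS x64
Use Rufus to create a bootable USB installer from the official 64-bit image (ubuntu-16.04-desktop-amd64.iso).
Select English as the language and start the installation: (1) do not select the option to install third-party software; (2) choose "Something else" as the installation type; (3) set up the partitions, mounting additional hard disks at e.g. /data, /data2, …; then run the installation until it prompts you to restart.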
2. Update the package sources
cd /etc/apt/
sudo cp sources.list sources.list.bak
sudo gedit sources.list
Add the following sources at the top of sources.list:
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
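The ten entries above follow one pattern: a deb and a deb-src line for each of five pockets of the xenial release. As a minimal sketch, assuming you would rather generate them than type them, the helper below (the name `sources_list_lines` is hypothetical) rebuilds the same lines in the same order:

```python
# Mirror, release and components copied from the entries above.
MIRROR = "http://mirrors.ustc.edu.cn/ubuntu/"
RELEASE = "xenial"
POCKETS = ["", "-security", "-updates", "-proposed", "-backports"]
COMPONENTS = "main restricted universe multiverse"


def sources_list_lines(mirror=MIRROR, release=RELEASE, pockets=POCKETS):
    """Return all deb lines, then all deb-src lines, one per pocket."""
    lines = []
    for kind in ("deb", "deb-src"):
        for pocket in pockets:
            lines.append("{} {} {}{} {}".format(kind, mirror, release, pocket, COMPONENTS))
    return lines


if __name__ == "__main__":
    # Paste the printed lines at the top of /etc/apt/sources.list.
    print("\n".join(sources_list_lines()))
```

Passing a different mirror or release name produces the equivalent block for that mirror or release.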
Then refresh the package index and upgrade the installed packages:
sudo apt-get update
sudo apt-get upgrade
Install commonly used software:
sudo apt-get install vim #text editor
sudo apt-get install htop #view CPU and memory usage
sudo apt-get install python-pip
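3. Configure a static IP
First, find the name of the network interface:
ifconfig
Configure the static IP address:
sudo vim /etc/network/interfaces
#Add the following to the interfaces file (comments must be on their own line; ifupdown does not support end-of-line comments):
#replace eth0 with your interface name as shown by ifconfig
auto eth0
iface eth0 inet static
address 192.168.1.100
netmask 255.255.255.0
gateway 192.168.1.1
dns-nameservers 114.114.114.114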
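Before restarting networking, it can help to sanity-check that the gateway lies in the same subnet as the address for the given netmask. A minimal sketch using Python 3's standard `ipaddress` module with the values from the stanza above; `check_static_ip` is a hypothetical name:

```python
import ipaddress


def check_static_ip(address, netmask, gateway):
    """Return the subnet of `address`; raise ValueError if `gateway` is not usable."""
    # IPv4Interface accepts "address/netmask" as well as "address/prefixlen".
    iface = ipaddress.IPv4Interface("{}/{}".format(address, netmask))
    gw = ipaddress.IPv4Address(gateway)
    if gw not in iface.network:
        raise ValueError("gateway {} is outside {}".format(gw, iface.network))
    if gw == iface.ip:
        raise ValueError("address and gateway must be different")
    return iface.network


if __name__ == "__main__":
    # Values from the interfaces stanza above.
    print(check_static_ip("192.168.1.100", "255.255.255.0", "192.168.1.1"))
```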
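Configure DNS (/etc/resolv.conf is regenerated by resolvconf, so also add the server to resolvconf's base file to make it persist):
sudo vim /etc/resolv.conf
#Add the following:
nameserver 114.114.114.114
sudo vim /etc/resolvconf/resolv.conf.d/base
#Add the following:
nameserver 114.114.114.114
Restart the networking service:
sudo /etc/init.d/networking restart
#Reboot to check that the settings took effect
sudo reboot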
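4. Configure SSH and SFTP
Install the SSH server:
sudo apt-get install openssh-server
The SSH server configuration file is /etc/ssh/sshd_config; this is where you set the port SSH listens on (the default is 22).
#If you change the port, restart the SSH service:
sudo /etc/init.d/ssh restart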
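Since clients need to know which port the server uses, here is a small sketch that reads the active Port setting from sshd_config text and falls back to 22 when none is set. It assumes a single `Port N` line written with whitespace (sshd also accepts `=` and several Port lines) and ignores Match blocks; `sshd_port` is a hypothetical helper:

```python
def sshd_port(config_text, default=22):
    """Return the first active Port value in sshd_config text, or `default`."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments such as '#Port 22' do not count
        parts = line.split()
        # sshd_config keywords are case-insensitive.
        if parts[0].lower() == "port" and len(parts) > 1:
            return int(parts[1])
    return default
```

Usage: `with open("/etc/ssh/sshd_config") as f: print(sshd_port(f.read()))`. If the result is not 22, pass it to the client with `ssh -p <port> username@192.168.1.100`.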
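From an Ubuntu or Mac client, connect from the command line with:
ssh username@192.168.1.100
Install SFTP:
sudo apt-get install openssh-sftp-server
On an Ubuntu client, choose "Connect to Server" in the file manager and enter:
sftp://192.168.1.100
You will then see the contents of username's home folder.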
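5. Install the NVIDIA graphics driver
The NVIDIA driver can conflict with the Ubuntu desktop (for example, you get stuck in a login loop). In this setup the VGA monitor is connected to the motherboard's integrated graphics rather than to the NVIDIA card, so instead of installing the driver from a PPA we use NVIDIA's standalone installer; the key point is not to install the OpenGL files.
First download the official driver from the NVIDIA website: http://www.nvidia.cn/Download/index.aspx?lang=cn (the Titan XP belongs to the GeForce 10 Series). The driver used here is NVIDIA-Linux-x86_64-375.66.run.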
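Preparation before installing:
Remove any existing NVIDIA driver. If it was installed with apt-get:
sudo apt-get purge 'nvidia*'
If it was installed from the .run file, uninstall it with --uninstall and follow the prompts:
sudo sh NVIDIA-Linux-x86_64-375.66.run --uninstall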
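Disable nouveau:
sudo vim /etc/modprobe.d/blacklist.conf
Add nouveau to the blacklist at the end of the file to disable this third-party (open-source) driver:
blacklist nouveau
Then run:
sudo update-initramfs -u
After rebooting, run the following; no output means nouveau has been disabled: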
lsmod | grep nouveau
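lsmod formats the contents of /proc/modules, so the same check can be scripted. A sketch assuming a Linux host; `module_loaded` is a hypothetical name, and unlike `grep` it matches the module name exactly rather than any line that mentions nouveau:

```python
import os


def module_loaded(name, modules_text=None):
    """Return True if a kernel module called `name` is listed in /proc/modules."""
    if modules_text is None:
        with open("/proc/modules") as f:
            modules_text = f.read()
    # The first field of each line is the module name (lsmod shows the same data).
    return any(line.split()[0] == name
               for line in modules_text.splitlines() if line.strip())


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    print("nouveau loaded: {}".format(module_loaded("nouveau")))
```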
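Install the driver
First stop the X server:
sudo service lightdm stop
If you are working at the machine itself, switch to the text console with Ctrl+Alt+F1.
If you are working on a remote host, just run the commands over SSH, as long as the X server has been stopped.
Then run: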
sudo apt-get install build-essential pkg-config xserver-xorg-dev linux-headers-`uname -r`
sudo chmod a+x NVIDIA-Linux-x86_64-375.66.run
sudo sh NVIDIA-Linux-x86_64-375.66.run --no-opengl-files
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev
sudo reboot
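Installer options (only the first is used above; the last two are not):
--no-opengl-files #install only the driver files, not the OpenGL files. This option is the most important
--no-x-check #do not check for a running X server during installation
--no-nouveau-check #do not check for nouveau during installation
If the installer reports an error about the kernel source: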
ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.
make sure to run:
sudo apt-get install linux-headers-`uname -r`
If you get the warning:
/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link
it is probably caused by a conflict between multiple versions of libEGL.so. To fix it:
sudo mv /usr/lib/nvidia-375/libEGL.so.1 /usr/lib/nvidia-375/libEGL.so.1.org
sudo mv /usr/lib32/nvidia-375/libEGL.so.1 /usr/lib32/nvidia-375/libEGL.so.1.org
sudo ln -s /usr/lib/nvidia-375/libEGL.so.375.66 /usr/lib/nvidia-375/libEGL.so.1
sudo ln -s /usr/lib32/nvidia-375/libEGL.so.375.66 /usr/lib32/nvidia-375/libEGL.so.1
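If you are still stuck in a login loop after rebooting, uninstall the driver and reinstall it, and this time make sure to decline the installer's option to update the X configuration (x-config).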
If you cannot get to the desktop, that is because the driver modified the Xorg configuration; run the following commands:
cd /usr/share/X11/xorg.conf.d/
sudo mv nvidia-drm-outputclass.conf nvidia-drm-outputclass.conf.bak
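If the resolution is wrong once you reach the desktop (for example, only 640x480 while the monitor's native resolution is 1920x1080), fix it by editing xorg.conf or by using xrandr: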
sudo gedit /etc/X11/xorg.conf
Change the monitor ranges as follows:
HorizSync 31.0 - 84.0
VertRefresh 56.0-77.0
The relevant part of the final xorg.conf is then:
Section "Device"
    Identifier "Configured Video Device"
EndSection
Section "Monitor"
    Identifier "Configured Monitor"
    Horizsync 30-84
    Vertrefresh 56-77
EndSection
Section "Screen"
    Identifier "Default Screen"
    Monitor "Configured Monitor"
    Device "Configured Video Device"
    SubSection "Display"
        Modes "1920x1080" "1360x768" "1024x768" "1152x864"
    EndSubSection
EndSection
Alternatively, change the resolution with cvt and xrandr:
cvt 1920 1080
# 1920x1080 59.96 Hz (CVT 2.07M9) hsync: 67.16 kHz; 173.00 MHZ
# Modeline "1920x1080_60.00" 173.00 1920 2048 2248 2576 1080 1083 1088 1120 -hsync +vsync
xrandr --newmode "1920x1080_60.00" 173.00 1920 2048 2248 2576 1080 1083 1088 1120 -hsync +vsync
xrandr -q #find the name of the VGA output
# Screen 0: minimum 320 x 200 .....
# VGA-1 connected ....
xrandr --addmode VGA-1 "1920x1080_60.00"
xrandr --output VGA-1 --mode "1920x1080_60.00"
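The numbers cvt prints can be checked by hand. In a modeline the pixel clock (MHz) is followed by four horizontal and then four vertical timings, the last of each group being the total. Horizontal sync frequency is pixel clock / horizontal total, and refresh rate is pixel clock / (horizontal total × vertical total). A sketch (the helper name `modeline_rates` is hypothetical) that also compares the result with the Horizsync/Vertrefresh ranges from the xorg.conf above:

```python
def modeline_rates(pclk_mhz, h_timings, v_timings):
    """Return (hsync_khz, refresh_hz) for a modeline.

    h_timings and v_timings are (display, sync_start, sync_end, total) tuples,
    in the order they appear after the pixel clock in the Modeline.
    """
    htotal, vtotal = h_timings[3], v_timings[3]
    pclk_hz = pclk_mhz * 1e6
    hsync_khz = pclk_hz / htotal / 1e3        # scan lines per second, in kHz
    refresh_hz = pclk_hz / (htotal * vtotal)  # full frames per second
    return hsync_khz, refresh_hz


if __name__ == "__main__":
    # Timings from: Modeline "1920x1080_60.00" 173.00 1920 2048 2248 2576 1080 1083 1088 1120
    hsync, refresh = modeline_rates(173.00, (1920, 2048, 2248, 2576),
                                    (1080, 1083, 1088, 1120))
    print("hsync {:.2f} kHz, refresh {:.2f} Hz".format(hsync, refresh))
    # Monitor ranges from the xorg.conf above: Horizsync 30-84, Vertrefresh 56-77
    print("within monitor ranges: {}".format(30 <= hsync <= 84 and 56 <= refresh <= 77))
```

It prints an hsync of 67.16 kHz and a refresh of 59.96 Hz, the same figures as cvt's comment line, and both fall inside the monitor ranges.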
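6. Install CUDA 8.0
Download cuda_8.0.61_linux.run from the official website and copy it to the root directory.
sudo sh cuda_8.0.61_linux.run --tmpdir=/tmp/
If the installer reports an incomplete installation, install the missing libraries: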
sudo apt-get install libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
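and run the installer again:
sudo sh cuda_8.0.61_linux.run --tmpdir=/tmp/
Note: when the installer asks whether to install the NVIDIA driver, answer no, because the driver was already installed in the previous step; for the other prompts, answer yes or accept the defaults.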
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26? (y)es/(n)o/(q)uit: n
After installation, set the environment variables:
vim ~/.bashrc
Append the following to the end of .bashrc:
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
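The `${PATH:+:${PATH}}` form appends a colon and the old value only when the variable is already set and non-empty, so the result never ends with an empty entry (an empty entry in PATH means the current directory). The same logic mirrored in Python, purely as an illustration; `prepend_path` is a hypothetical name:

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':old' only if the old value is non-empty."""
    # ${VAR:+word} expands to word only when VAR is set and not empty.
    return new_dir + (":" + current if current else "")


if __name__ == "__main__":
    print(prepend_path("/usr/local/cuda-8.0/bin", os.environ.get("PATH", "")))
    print(prepend_path("/usr/local/cuda-8.0/lib64", os.environ.get("LD_LIBRARY_PATH", "")))
```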
Reload the file with source ~/.bashrc (or open a new terminal), then check the installation:
nvidia-smi
Output:
xx xx xx 15:20:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:01:00.0      On |                  N/A |
| 22%   48C    P5    27W / 250W |    169MiB / 12205MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2421    G   /usr/lib/xorg/Xorg                             105MiB |
|    0     10062    G   compiz                                          63MiB |
+-----------------------------------------------------------------------------+
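For scripts, nvidia-smi's query mode prints CSV that is easier to parse than the table above. A sketch assuming nvidia-smi is on PATH; `--query-gpu` and `--format` are nvidia-smi options, while `parse_gpu_csv` and `query_gpus` are hypothetical helpers:

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=name,driver_version,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse 'name, driver, used MiB, total MiB' lines into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, used, total = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver": driver,
                     "mem_used_mib": int(used), "mem_total_mib": int(total)})
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list."""
    out = subprocess.check_output(QUERY)
    return parse_gpu_csv(out.decode("utf-8"))
```

On a machine with the driver installed, `print(query_gpus())` should report driver 375.66 and the memory figures shown in the table.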
7. Install OpenCV 3.2.0
Download the source zip from the official website and extract it to the root directory.
Install the dependencies:
sudo apt-get -y remove ffmpeg x264 libx264-dev
sudo apt-get -y install libopencv-dev build-essential checkinstall cmake pkg-config yasm libjpeg-dev libjasper-dev libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev libv4l-dev python-dev python-numpy libtbb-dev libqt4-dev libgtk2.0-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev x264 v4l-utils ffmpeg libgtk2.0-dev
cd opencv-3.2.0
mkdir build
cd build/
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D WITH_QT=ON -D WITH_OPENGL=ON ..
make -j32
sudo make install
After a successful install, configure the library path:
sudo sh -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig
Test whether OpenCV was installed successfully:
mkdir DisplayImage
cd DisplayImage
vim DisplayImage.cpp
Add the following code:
#include <stdio.h>
#include <opencv2/opencv.hpp>
using namespace cv;

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        printf("usage: DisplayImage <Image_Path>\n");
        return -1;
    }
    // Load the image as a 3-channel color image
    Mat image = imread(argv[1], IMREAD_COLOR);
    if (image.empty())
    {
        printf("No image data\n");
        return -1;
    }
    // WINDOW_AUTOSIZE sizes the window to fit the image
    namedWindow("DisplayImage", WINDOW_AUTOSIZE);
    imshow("DisplayImage", image);
    waitKey(0);  // keep the window open until a key is pressed
    return 0;
}
Create the CMake file:
vim CMakeLists.txt
Add the following:
cmake_minimum_required(VERSION 2.8)
project(DisplayImage)
find_package(OpenCV REQUIRED)
add_executable(DisplayImage DisplayImage.cpp)
target_link_libraries(DisplayImage ${OpenCV_LIBS})
Build:
cmake .
make
Run:
./DisplayImage lena.jpg
If make fails while building OpenCV 3.2 with:
fatal error: LAPACKE_H_PATH-NOTFOUND/lapacke.h: No such file or directory #include "LAPACKE_H_PATH-NOTFOUND/lapacke.h"
even though LAPACK and BLAS are already installed, fix it as follows:
sudo apt-get install liblapacke-dev checkinstall
Then edit the LAPACK header in the build folder (opencv_lapack.h), changing
#include "LAPACKE_H_PATH-NOTFOUND/lapacke.h"
to
#include "lapacke.h"
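The header edit can also be scripted. A sketch that rewrites the unresolved include in place; the path opencv-3.2.0/build/opencv_lapack.h used below is an assumption about your build tree, and `fix_lapacke_include` and `patch_file` are hypothetical names:

```python
BAD_INCLUDE = '#include "LAPACKE_H_PATH-NOTFOUND/lapacke.h"'
GOOD_INCLUDE = '#include "lapacke.h"'


def fix_lapacke_include(text):
    """Replace the unresolved CMake placeholder include with a plain lapacke.h include."""
    return text.replace(BAD_INCLUDE, GOOD_INCLUDE)


def patch_file(path):
    """Rewrite the file in place; return True if anything changed."""
    with open(path) as f:
        original = f.read()
    fixed = fix_lapacke_include(original)
    if fixed != original:
        with open(path, "w") as f:
            f.write(fixed)
    return fixed != original
```

Usage (adjust the path to your build folder): `patch_file("opencv-3.2.0/build/opencv_lapack.h")`, then rerun make.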
8. Install cuDNN 5.0
Download cudnn-8.0-linux-x64-v5.0.tgz (cuDNN for CUDA 8.0) from the official website and extract it in the current directory:
tar -zxvf cudnn-8.0-linux-x64-v5.0.tgz
The extracted files are:
cuda/include/cudnn.h
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.5
cuda/lib64/libcudnn.so.5.0.5
cuda/lib64/libcudnn_static.a
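Then run (cp -P keeps libcudnn.so and libcudnn.so.5 as symbolic links instead of copying them as regular files):
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*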
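To confirm which cuDNN version was actually installed, you can read the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros that cudnn.h defines. A sketch; `cudnn_version` is a hypothetical helper:

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines in cudnn.h."""
    values = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+{}\s+(\d+)".format(key), header_text)
        if match is None:
            raise ValueError("{} not found in header".format(key))
        values.append(int(match.group(1)))
    return tuple(values)
```

Usage: `with open("/usr/local/cuda/include/cudnn.h") as f: print(cudnn_version(f.read()))`; for the files listed above it should match the 5.0.5 suffix of libcudnn.so.5.0.5.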
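9. BLAS installation and configuration
BLAS (Basic Linear Algebra Subprograms) is a standard API. The Caffe website recommends three implementations: ATLAS, MKL, and OpenBLAS. ATLAS can be installed directly from the command line. MKL is a commercial library from Intel that is free for research and students: apply for the student version of Parallel Studio XE Cluster Edition and download parallel_studio_xe_2017.tgz. Keep the key from the email you receive (2HWS-34Z7S69B).
tar zxvf parallel_studio_xe_2017.tgz #extract the download
chmod 777 parallel_studio_xe_2017 -R #make the files accessible
cd parallel_studio_xe_2017/
sudo ./install_GUI.sh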
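After installation, register the library directories with the dynamic linker:
sudo gedit /etc/ld.so.conf.d/intel_mkl.conf
Add the library directories:
/opt/intel/lib/intel64
/opt/intel/mkl/lib/intel64
Refresh the linker cache so the libraries take effect:
sudo ldconfig
If you choose ATLAS instead, just run sudo apt-get install libatlas-base-dev in a terminal.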
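Because BLAS is an interface, ATLAS, MKL and OpenBLAS all provide the same routines and differ mainly in speed. To illustrate what one such routine computes, here is a naive pure-Python version of the Level 3 GEMM operation C = alpha·A·B + beta·C; it is a readability sketch, not how any of these libraries implement it, and `gemm` here is a hypothetical helper, not a binding to a real BLAS:

```python
def gemm(alpha, a, b, beta, c):
    """Naive reference for the BLAS GEMM operation: return alpha*A*B + beta*C.

    a is m x k, b is k x n, c is m x n, all given as lists of rows.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            dot = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * dot + beta * c[i][j])
        result.append(row)
    return result
```

Caffe spends much of its time in exactly this kind of call (through cblas_sgemm or cuBLAS on the GPU), which is why the choice of BLAS implementation affects CPU-side performance.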
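10. Py-Faster-RCNN configuration
Download the source code (--recursive also fetches the bundled caffe-fast-rcnn folder):
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
Install the libraries: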
sudo apt-get install python-opencv
sudo pip install cython easydict
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libboost-all-dev libhdf5-serial-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler
Install the dependencies:
sudo apt-get install -y build-essential cmake git pkg-config libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler libatlas-base-dev libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install --no-install-recommends libboost-all-dev
Install the dependencies for the Python interface:
sudo apt-get install python-tk
sudo apt-get install python-dev
sudo apt-get install -y python-pip
sudo apt-get install -y python-dev
sudo apt-get install -y python-numpy python-scipy
sudo apt-get install -y python3-dev
sudo apt-get install -y python3-numpy python3-scipy
In caffe's python folder, check and install the Python requirements as root:
sudo su
cd caffe-fast-rcnn/python
for req in $(cat requirements.txt); do pip install $req; done
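Edit the Makefile configuration
In a terminal, run:
cd py-faster-rcnn/caffe-fast-rcnn/
cp Makefile.config.example Makefile.config
vim Makefile.config
Enable the Python layer:
change # WITH_PYTHON_LAYER := 1 to WITH_PYTHON_LAYER := 1
Enable cuDNN acceleration:
change # USE_CUDNN := 1 to USE_CUDNN := 1
Leave # CPU_ONLY := 1 commented out so that Caffe runs on the GPU.
Change the following two lines to: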
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial /usr/local/share/OpenCV/3rdparty/lib/
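If you prefer to script the two uncomment edits, here is a sketch that enables a commented-out `KEY := value` line in Makefile.config text. It assumes the line appears as `# KEY := ...`, as in Makefile.config.example; `enable_option` is a hypothetical helper, and CPU_ONLY is left alone simply by not passing it:

```python
import re


def enable_option(config_text, key):
    """Uncomment a line such as '# USE_CUDNN := 1' so that it reads 'USE_CUDNN := 1'."""
    # [ \t]* instead of \s* so the match never spans more than one line.
    pattern = re.compile(r"^#[ \t]*(" + re.escape(key) + r"[ \t]*:=.*)$", re.MULTILINE)
    new_text, count = pattern.subn(r"\1", config_text)
    if count == 0:
        raise ValueError("no commented-out {} line found".format(key))
    return new_text
```

Usage: read Makefile.config, apply `enable_option(text, "WITH_PYTHON_LAYER")` and `enable_option(text, "USE_CUDNN")`, and write the result back.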
In the Makefile, set:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5 opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
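HDF5 configuration: according to the official instructions this is required on Ubuntu 16.04 (adjust the libhdf5 version numbers to match the files on your system):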
sudo find . -type f -exec sed -i -e 's^"hdf5.h"^"hdf5/serial/hdf5.h"^g' -e 's^"hdf5_hl.h"^"hdf5/serial/hdf5_hl.h"^g' '{}' \;
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
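The version numbers in the two ln -s commands depend on the installed libhdf5 package. A sketch that picks the fully versioned file name for a library stem from a directory listing, i.e. the file the link should point at; `full_version_name` is a hypothetical helper:

```python
import re


def full_version_name(filenames, stem):
    """From names like libhdf5_serial.so.10 and libhdf5_serial.so.10.1.0,
    return the one with the longest numeric version suffix."""
    pattern = re.compile(re.escape(stem) + r"\.so\.(\d+(?:\.\d+)*)$")
    candidates = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            version = tuple(int(x) for x in match.group(1).split("."))
            candidates.append((len(version), version, name))
    if not candidates:
        raise ValueError("no {}.so.* files found".format(stem))
    return max(candidates)[2]
```

Usage: `full_version_name(os.listdir("/usr/lib/x86_64-linux-gnu"), "libhdf5_serial")`, and likewise with `"libhdf5_serial_hl"`.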
Build the Cython modules
cd py-faster-rcnn/lib/
make
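Build Caffe
The build fails when Caffe's cuDNN code does not match the installed cuDNN version: the caffe-fast-rcnn bundled with rbgirshick's py-faster-rcnn was written against an older cuDNN API, which causes compilation errors.
cudnn-7.0-linux-x64-v4.0-prod.tgz: no problem
cudnn-7.5-linux-x64-v5.1.tgz: same problem
cudnn-8.0-linux-x64-v5.1.tgz: same problem
Fix:
1. Replace py-faster-rcnn/caffe-fast-rcnn/include/caffe/util/cudnn.hpp with the cudnn.hpp from the same directory of the latest Caffe;
2. Replace all files whose names start with cudnn in py-faster-rcnn/caffe-fast-rcnn/include/caffe/layers/ with the files of the same name from the corresponding directory of the latest Caffe;
3. Replace all files whose names start with cudnn in py-faster-rcnn/caffe-fast-rcnn/src/caffe/layers/ with the files of the same name from the corresponding directory of the latest Caffe.
Note: the official Caffe source (caffe-master): https://github.com/BVLC/caffe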
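Steps 1 to 3 amount to copying cudnn files from a checkout of BVLC/caffe into the same relative directories of caffe-fast-rcnn. A sketch using only the standard library; the two root paths you pass in are assumptions about where your checkouts live, and `sync_cudnn_files` is a hypothetical name:

```python
import glob
import os
import shutil

# (relative directory, file pattern) for steps 1, 2 and 3 above.
CUDNN_SOURCES = [
    ("include/caffe/util", "cudnn.hpp"),
    ("include/caffe/layers", "cudnn*"),
    ("src/caffe/layers", "cudnn*"),
]


def sync_cudnn_files(caffe_master, fast_rcnn_caffe):
    """Copy the cudnn files from a BVLC/caffe checkout into caffe-fast-rcnn.

    Returns the relative paths that were copied.
    """
    copied = []
    for rel_dir, pattern in CUDNN_SOURCES:
        for src in sorted(glob.glob(os.path.join(caffe_master, rel_dir, pattern))):
            name = os.path.basename(src)
            shutil.copy2(src, os.path.join(fast_rcnn_caffe, rel_dir, name))
            copied.append(os.path.join(rel_dir, name))
    return copied
```

Usage: `sync_cudnn_files("caffe-master", "py-faster-rcnn/caffe-fast-rcnn")`, then run make clean before rebuilding.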
Build
cd py-faster-rcnn/caffe-fast-rcnn/
make clean #remove the results of the previous build
make -j32
Build pycaffe
cd py-faster-rcnn/caffe-fast-rcnn/
make pycaffe
Download the pretrained models
In a terminal, run:
cd py-faster-rcnn/
./data/scripts/fetch_faster_rcnn_models.sh
Test Faster R-CNN object detection on PASCAL VOC:
cd py-faster-rcnn/
./tools/demo.py
Debugging common errors:
1.*AttributeError: 'module' object has no attribute 'text_format'*
Add the following to py-faster-rcnn/lib/fast_rcnn/train.py:
import google.protobuf.text_format
2.*KeyError: 'chair' [when training only some classes]*
The following error occurs when training py-faster-rcnn on only some of the VOC2007 classes:
File "/home/sai/py-faster-rcnn/tools/../lib/datasets/pascal_voc.py", line 217, in _load_pascal_annotation
cls = self._class_to_ind[obj.find('name').text.lower().strip()]
KeyError: 'chair'
Solution:
When loading the annotation in _load_pascal_annotation, ignore every object whose class is not one of the classes you are training on. In pascal_voc.py, find the line
objs = diff_objs (or non_diff_objs)
and insert after it (normalizing the name the same way the lookup above does):
cls_objs = [obj for obj in objs if obj.find('name').text.lower().strip() in self._classes]
objs = cls_objs
参考:https://github.com/rbgirshick/py-faster-rcnn/issues/316
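To see why the filter avoids the KeyError, here is a standalone sketch of the same idea applied to a PASCAL VOC-style annotation with xml.etree.ElementTree. The class tuple and sample XML in the test are made up for illustration; in pascal_voc.py the tuple is self._classes, whose first entry is '__background__'. `load_boxes` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def load_boxes(xml_text, classes):
    """Return (class_index, [x1, y1, x2, y2]) pairs, skipping unlisted classes."""
    class_to_ind = dict(zip(classes, range(len(classes))))
    objs = ET.fromstring(xml_text).findall("object")
    # Same filter as above: drop objects of other classes before the lookup.
    objs = [obj for obj in objs
            if obj.find("name").text.lower().strip() in class_to_ind]
    boxes = []
    for obj in objs:
        bb = obj.find("bndbox")
        # VOC pixel coordinates are 1-based; pascal_voc.py also subtracts 1.
        coords = [int(bb.find(tag).text) - 1 for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((class_to_ind[obj.find("name").text.lower().strip()], coords))
    return boxes
```

Without the filter, the 'chair' object in such a file would reach the dictionary lookup and raise KeyError: 'chair'.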
3. Annotation file problems
Note that <difficult>0</difficult> must be 0; otherwise you get the error: ZeroDivisionError: integer division or modulo by zero
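A quick way to find the offending annotation files is to scan them for any difficult value other than 0. A sketch; `nonzero_difficult` and `scan_annotations` are hypothetical names, and the directory you pass in (for example data/VOCdevkit2007/VOC2007/Annotations) is an assumption about your dataset layout:

```python
import glob
import os
import xml.etree.ElementTree as ET


def nonzero_difficult(xml_text):
    """Return the class names of objects whose <difficult> value is not 0."""
    flagged = []
    for obj in ET.fromstring(xml_text).findall("object"):
        diff = obj.find("difficult")
        if diff is not None and diff.text is not None and int(diff.text) != 0:
            flagged.append(obj.find("name").text)
    return flagged


def scan_annotations(annotation_dir):
    """Map each .xml file name to its flagged objects (only files that have any)."""
    report = {}
    for path in sorted(glob.glob(os.path.join(annotation_dir, "*.xml"))):
        with open(path, "rb") as f:
            flagged = nonzero_difficult(f.read())
        if flagged:
            report[os.path.basename(path)] = flagged
    return report
```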
4.AssertionError: Selective search data not found at
This error occurs during training. Change __C.TRAIN.PROPOSAL_METHOD to 'gt' (line 118 of model/config.py) and it should work.
5.AttributeError: 'module' object has no attribute 'text_format'
When training without pretrained weights, you may hit the error pb2.text_format.Merge(f.read(), self.solver_param) AttributeError: 'module' object has no attribute 'text_format'
It is caused by the protobuf version. Either switch to a different protobuf version or add the line
import google.protobuf.text_format
to ./lib/fast_rcnn/train.py.
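6. Switching between Python 2 and Python 3 on Ubuntu
Use update-alternatives:
1) Create an alternative for each interpreter with sudo update-alternatives --install /usr/bin/python python <interpreter path> <priority>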
2) sudo update-alternatives --config python
and choose the default python at the prompt.
3) To remove an alternative: sudo update-alternatives --remove python /usr/bin/python2.7
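7. Modifying the network
Change the number of classes. In train.prototxt:
input-data layer, num_classes: number of classes + 1 (1 background class; same below)
roi-data layer, num_classes: number of classes + 1
cls_score layer, num_output: number of classes + 1
bbox_pred layer, num_output: (number of classes + 1) * 4, where 4 is the number of coordinates of one bbox
Change the corresponding layers in test.prototxt as well.
Change the number of anchors:
rpn_cls_prob_reshape layer, second dim: 2 * number of anchors (2 stands for bg/fg, the binary background/foreground classification; same below)
rpn_cls_score layer, num_output: 2 * number of anchors
The anchor count must also be changed in the Python code.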
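The rules above reduce to simple arithmetic. A sketch that computes the values to write into the prototxt files; `layer_sizes` is hypothetical, and the default of 9 anchors (3 scales × 3 aspect ratios) is py-faster-rcnn's default, so change it if you changed the anchors:

```python
def layer_sizes(num_object_classes, num_anchors=9):
    """Prototxt values for a given number of object classes and anchors."""
    classes_with_bg = num_object_classes + 1  # +1 for the background class
    return {
        "input-data num_classes": classes_with_bg,
        "roi-data num_classes": classes_with_bg,
        "cls_score num_output": classes_with_bg,
        "bbox_pred num_output": classes_with_bg * 4,     # 4 box coordinates per class
        "rpn_cls_score num_output": 2 * num_anchors,    # bg/fg score per anchor
        "rpn_cls_prob_reshape dim[1]": 2 * num_anchors,
    }
```

For the 20 PASCAL VOC classes this gives 21 for cls_score, 84 for bbox_pred and 18 for rpn_cls_score.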
8...\lib\roi_data_layer\layer.py", line 125, in setup
top[idx].reshape(1, self._num_classes * 4)
IndexError: Index out of range
Did you provide a config file (e.g. experiments/cfgs/faster_rcnn_end2end.yml)? It looks like cfg.TRAIN.HAS_RPN is false, but it should be true. See experiments/scripts/faster_rcnn_end2end.sh for details.
Keep
__C.TRAIN.PROPOSAL_METHOD = 'gt'
__C.TEST.PROPOSAL_METHOD = 'gt'
consistent with faster_rcnn_end2end.yml.
9.loss_layer.cpp:25] Check failed: bottom[0]->num() == bottom[1]->num() (2 vs. 1) The data and label should have the same number.
*** Check failure stack trace: ***
When training py-faster-rcnn end to end, setting TRAIN.IMS_PER_BATCH to 2 causes this error, which says that the batch sizes of data and label differ. In lib/rpn/anchor_target_layer.py you can see that the batch size of top[0] of anchor_target_layer is hard-coded to 1.
The data blob had num = 2, so I set cfg.TRAIN.IMS_PER_BATCH to 1, and the problem went away.
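10. After training, testing fails with Check failed: error == cudaSuccess (2 vs. 0) out of memory
This is usually solved by reducing the batch size; here, reducing the values of two TEST parameters solved the problem.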