caffe SSD mobileNet people detection model - training

20190904

  • install the caffe ssd branch
  • generate the training data (Pascal VOC > lmdb)
  • generate the prototxt files
  • run training


This post uses people detection as the example.
The data has only one label class: "person".

Install

github: caffe-SSD

  1. Download and switch to the ssd branch
$ git clone https://github.com/weiliu89/caffe.git
$ cd caffe
$ git checkout ssd
  2. Build
    Follow the steps on GitHub and run make.
    Both the Makefile and Makefile.config need some edits.
    [install issue1] [install issue2]
    [make issue1] [make issue2]
  • Note:
    Remember to build this newer caffe-SSD (ssd branch); otherwise the SSD layers will not be recognized.

Otherwise you will hit "layer not found" issues, e.g.:
issue: caffe.LayerParameter has no field named permute_param

If you just want to perform inference, I think adding about three new layers (prior_bbox_layer, permute_layer, detection_output_layer) to your caffe is enough.

Generate lmdb dataset (from Pascal VOC format)

ref.

Train SSD on the Custom Dataset
Mobilenet-SSD的Caffe系列實現
chuanqi305/MobileNet-SSD

Prepare the VOC-format dataset

Put the prepared Pascal VOC format data under /home/data/VOCdevkit (otherwise you will need to change the path inside create_list.sh).

(caffe)wang@ai4:~/data/VOCdevkit/MYDATASET$ ls
Annotations  ImageSets  JPEGImages  MYDATASET_person_64127.txt

Copy the four related files (sh/txt/prototxt) into a new folder (MYDATASET):

$ cd ~/caffe-SSD/data/
$ cp VOC0712/* MYDATASET/.

The result looks like this:

(caffe)wang@ai4:~/caffe-SSD/data/MYDATASET$ ls
coco_voc_map.txt  create_data.sh  create_list.sh  labelmap_voc.prototxt

modify the create_list.sh

In the second loop, replace the keywords VOC2007 and VOC2012 with "MYDATASET" since we have only one dataset.

  ##for name in VOC2007 VOC2012
  for name in MYDATASET 
  • Generate the file-name lists (test.txt / trainval.txt)
(caffe)wang@ai4:~/caffe-SSD/data/MYDATASET$ ./create_list.sh 

This generates test_name_size.txt, test.txt, and trainval.txt in data/MYDATASET/.

The result looks like this:

(caffe)wang@ai4:~/caffe-SSD/data/MYDATASET$ ls 
coco_voc_map.txt  create_list.sh         test.txt            trainval.txt
create_data.sh    labelmap_voc.prototxt  test_name_size.txt
  • Rename the labelmap_voc.prototxt
$ vim data/MYDATASET/labelmap_MYDATASET.prototxt

In this file, the first block is the background, so don't change it. For the remaining blocks, change the class names accordingly.

### labelmap_MYDATASET.prototxt
item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "person"
  label: 1
  display_name: "person"
}

modify the create_data.sh

Convert the dataset to an LMDB database.
Edit ~/caffe-SSD/data/MYDATASET/create_data.sh:

# change these two names
dataset_name="MYDATASET"
mapfile="$root_dir/data/$dataset_name/labelmap_MYDATASET.prototxt"
  • Run create_data.sh
(caffe)wang@ai4:~/caffe-SSD/data/MYDATASET$ ./create_data.sh 

This will create the LMDB database in ~/data/VOCdevkit
and make a soft link in examples/MYDATASET/.

> ~/data/VOCdevkit/persondataset now contains an "lmdb" folder (success!)
> ~/data/examples/MYDATASET/ now contains the soft links

(If the conversion script cannot find the caffe module, insert the SSD branch's python path before importing caffe_pb2:)

import sys
sys.path.insert(0, "/home/xxx/caffe-ssd/python")  # ++ add this line
from caffe.proto import caffe_pb2

Create symlinks in the current directory:

$ cd ~/MobileNet-SSD
$ ln -s /home/xxx/data/VOCdevkit/MYDATASET/lmdb/MYDATASET_trainval_lmdb trainval_lmdb
$ ln -s /home/xxx/data/VOCdevkit/MYDATASET/lmdb/MYDATASET_test_lmdb test_lmdb
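As an optional sanity check, the generated LMDB can be opened and one record decoded. A minimal Python sketch, assuming the lmdb python package is installed and the same pycaffe path as above:

# check_lmdb.py - a minimal sketch to verify the converted database
# run from ~/MobileNet-SSD, where the symlinks above were created
import sys
sys.path.insert(0, "/home/xxx/caffe-ssd/python")  # SSD branch's pycaffe
import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open("trainval_lmdb", readonly=True)
print("number of records:", env.stat()["entries"])

with env.begin() as txn:
    key, value = next(iter(txn.cursor()))        # first record
    anno = caffe_pb2.AnnotatedDatum()            # SSD stores AnnotatedDatum, not plain Datum
    anno.ParseFromString(value)
    print(key, anno.datum.width, anno.datum.height,
          "annotation groups:", len(anno.annotation_group))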

Generate training prototxt

Reference: chuanqi305/MobileNet-SSD

caffe-SSD/examples/MobileNet-SSD
The original VOC dataset has 21 classes (20 + background), but here we have only 2 classes (1 + background),
so the train/test/deploy network files must be regenerated.
This is done with gen_model.sh, which fills in the templates in the template folder with the parameters we specify. Usage:

$ cd ~/caffe-SSD/examples/MobileNet-SSD
$ ./gen_model.sh 2
## ./gen_model.sh [cls]

After running it, you get an example folder with the prototxt files already generated!
According to the author's setup, the deploy file already has the BN layers merged, so it has to be used together with matching (merged) weights later.

$ ls ~/caffe-SSD/examples/MobileNet-SSD/example
MobileNetSSD_deploy.prototxt  MobileNetSSD_test.prototxt  MobileNetSSD_train.prototxt
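Optionally, you can confirm that the SSD branch's pycaffe parses the generated network (a stock caffe would fail here with the permute_param error mentioned above). A minimal sketch, reusing the path assumption from earlier:

# a minimal sketch; run from ~/caffe-SSD/examples/MobileNet-SSD
import sys
sys.path.insert(0, "/home/xxx/caffe-ssd/python")
import caffe

caffe.set_mode_cpu()
net = caffe.Net("example/MobileNetSSD_deploy.prototxt", caffe.TEST)  # structure only, no weights
print("layers:", len(net.layers))
print("output blobs:", net.outputs)  # the SSD detection output should appear here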

Modify solver.prototxt settings

  • Adjust the training and testing hyperparameters
    ~/caffe-SSD/examples/MobileNet-SSD
    Edit solver_train.prototxt and solver_test.prototxt according to your setup (see the sketch below).
    test_iter = number of test images / batch size
    Don't set the initial learning rate too high, otherwise the pretrained base weights get badly damaged;
    the optimizer is RMSProp, which may help convergence; don't change it to SGD, again to protect the weights.

[Reference] Mobilenet-SSD的Caffe系列實現
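For reference, these are the kinds of solver fields to check (placeholder values for illustration, not the repo defaults):

### solver fields to check (illustrative values only)
test_iter: 250            # = number of test images / test batch size, e.g. 2000 / 8
base_lr: 0.0005           # keep the initial learning rate low to protect the pretrained weights
type: "RMSProp"           # keep RMSProp; do not switch to SGD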

Train

Modify and run train.sh; you can keep tuning the parameters along the way.
After training finishes, run test.sh to measure the network's accuracy.

Download the training weights from the link above and run train.sh; after about 30000 iterations, the loss should be 1.5 - 2.5.

## train and save log
$ ./train.sh 2>&1 | tee -a log/0905_ssd_mob.log
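To watch convergence, the loss values can be pulled back out of the saved log. A minimal Python sketch, assuming the standard Caffe solver log lines ("Iteration N, loss = X"):

# parse_loss.py - a minimal sketch; assumes the default Caffe solver log format
import re

iters, losses = [], []
with open("log/0905_ssd_mob.log") as f:
    for line in f:
        m = re.search(r"Iteration (\d+).*?loss = ([\d.]+)", line)
        if m:
            iters.append(int(m.group(1)))
            losses.append(float(m.group(2)))

print("last 5 points:", list(zip(iters, losses))[-5:])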

Appendix

Training time is very long

10 iterations take about 3 minutes,
so 1000 iterations ≈ 300 min,
and 10000 iterations ≈ 3000 min = 50 hr.

batch size 24, 300x300 input → training GPU memory usage 5651MiB~7163MiB

Possible improvements

  • Tune the input size used for training
    Is 300x300 too big or too small? (But once it changes, the pretrained model can no longer be used.)
  • At deploy time, which input size gives better confidence scores?
  • On small-object detection: would enlarging the feature maps let the model see finer detail?
    Mininum size of the detected objects #297

it does not do well for small objects.
A 50x50 object may only have 5-6 pixels on conv4_3 (i.e. 8x reduction in resolution).
To detect smaller objects better, besides what you mentioned, you could increase input image size or increase feature map size.

Model result

accuracy & loss

caffemodel test result:

- END -