###### tags: `NOL` `iStore`

# Smart Self-Checkout Carts Based on Deep Learning Activity Recognition

### Abstract

---

In this work, a prototype smart shopping cart is developed based on image-based action recognition to enable a "Just Walk Out" shopping scenario. A video camera is installed on the cart to monitor shopping activities such as adding or removing items, so that the items in the cart can be tracked and checked out. The framework consists of five modules. Firstly, deep learning networks such as Faster R-CNN, YOLOv2, and YOLOv2-Tiny are used to analyze the content of each video frame. Frames are classified into three classes: "No Hand", "Empty Hand", and "Holding Items". The classification accuracy of Faster R-CNN, YOLOv2, and YOLOv2-Tiny ranges from $90.3\%$ to $93.0\%$, and the three networks run at up to $5$ fps, $39$ fps, and $50$ fps, respectively. Secondly, based on the sequence of frame classes, the timeline is divided into "No Hand intervals", "Empty Hand intervals", and "Holding Items intervals". The accuracy of action recognition is $96\%$, and the average time error is $0.119$ s. Finally, the events are categorized into four cases: "No Change", "Placing", "Removing", and "Swapping". Even when the correctness of item recognition is taken into account, the accuracy of shopping event detection is $97.9\%$, which is higher than the minimal requirement for deploying such a system in a smart shopping environment.

---

### [Demo Video](P5UmD3mBP00)

Hong-Chuan Chi, "Smart Self-Checkout Carts Based on Deep Learning Activity Recognition," Master's Thesis, advised by Tsì-Uí İk, National Chiao Tung University, Taiwan, 2019.

The web demo of Smart Self-Checkout Carts is available at http://nol.cs.nctu.edu.tw/iCart/index.html

---

### Dataset

The dataset consists of videos of shopping activities.
The resolution of the videos is $756 \times 1344$ at a frame rate of 30 fps.

The dataset is divided into three folders: "data_0", "data_1", and "data_2". Their contents are:

1. data_0: all frames are labeled.
2. data_1: only part of the frames are labeled.
3. data_2: actions and events are labeled. Used only to evaluate actions and events.

Each of these three folders contains sub-folders organized as follows:

1. Each video, along with its label files, is stored in a sub-folder named video0, video1, etc. The video files are named video0.mp4, video1.mp4, etc.
2. Taking video0.mp4 as an example, the corresponding frame label file, shopping action file, and shopping event file are named video0.csv, video0_action.csv, and video0_event.csv, respectively.
3. The image files in each sub-folder (video0, etc.) are named sequentially as 000001.jpg, 000002.jpg, etc.
4. The labeled images are stored in a folder named "jpg".
5. The unlabeled images are stored in a folder named "nolabel".

**The figure shows a sample labeled image from the jpg folder.**

![](https://i.imgur.com/1eVlno9.png)

### Frame Label Format

The attributes of the frame label file are described below:

1. Each entry has the attributes file name, class, xmin, ymin, xmax, ymax.
2. file name and class are the file name of the labeled frame and the class of the frame.
3. If an object is labeled in the frame, xmin and ymin are the pixel coordinates of the upper left corner of the labeled object, and xmax and ymax are the pixel coordinates of its lower right corner.
4. If a frame belongs to the NH class, no object is labeled, and "xmin", "ymin", "xmax", and "ymax" are blank.
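As an illustration of the frame label format described above, here is a minimal Python sketch that parses frame label rows. It rests on several assumptions that the page does not confirm: that the file is comma-separated, that the columns appear in the order file name, class, xmin, ymin, xmax, ymax, and that the class column stores the numeric codes from the Class Activities Table (-1 for NH, 0 for EH, 1 to 10 for HI-1 to HI-10). The names `FrameLabel`, `class_name`, and `parse_frame_labels` are hypothetical helpers, and the sample rows are made up for the example rather than taken from the dataset.

```python
import csv
import io
from typing import List, NamedTuple, Optional, Tuple


class FrameLabel(NamedTuple):
    """One row of a frame label file (hypothetical container)."""
    file_name: str
    class_id: int
    # (xmin, ymin, xmax, ymax) in pixels, or None when the box is blank (NH).
    box: Optional[Tuple[int, ...]]


def class_name(class_id: int) -> str:
    """Map an assumed numeric class code to its short name."""
    if class_id == -1:
        return "NH"
    if class_id == 0:
        return "EH"
    return f"HI-{class_id}"


def parse_frame_labels(text: str) -> List[FrameLabel]:
    """Parse comma-separated frame label rows (assumed column order)."""
    labels = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        try:
            class_id = int(row[1])
        except ValueError:
            continue  # skip a header row, if the file has one
        coords = [c.strip() for c in row[2:6]]
        if len(coords) == 4 and all(coords):
            box = tuple(int(float(c)) for c in coords)
        else:
            box = None  # blank coordinates: no object labeled
        labels.append(FrameLabel(row[0].strip(), class_id, box))
    return labels


# Made-up sample rows, for illustration only.
sample = (
    "000001.jpg,-1,,,,\n"
    "000002.jpg,0,412,230,655,498\n"
    "000003.jpg,3,380,210,700,560\n"
)

for label in parse_frame_labels(sample):
    print(label.file_name, class_name(label.class_id), label.box)
```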
---

***In the shopping action label files, each line annotates one action interval, which consists of the start frame (start), the end frame (end), and the action class (action).***

---

***Each shopping event has the attributes start frame, end frame, event class, placed item, and picked item.***

---

**Example**

This figure shows a frame label file (left), a shopping action label file (middle), and a shopping event label file (right).

![](https://i.imgur.com/x5BwTD5.jpg)

### Class Activities Table

| No | Hand & item activity |
| -------- | -------- |
| -1 | no hand (NH) |
| 0 | empty hand (EH) |
| 1 | holding item 1 (HI-1) |
| 2 | holding item 2 (HI-2) |
| 3 | holding item 3 (HI-3) |
| 4 | holding item 4 (HI-4) |
| 5 | holding item 5 (HI-5) |
| 6 | holding item 6 (HI-6) |
| 7 | holding item 7 (HI-7) |
| 8 | holding item 8 (HI-8) |
| 9 | holding item 9 (HI-9) |
| 10 | holding item 10 (HI-10) |

### Event Activities Table

| No | Event |
| -------- | -------- |
| 0 | nothing |
| 1 | placing item |
| 2 | removing item |
| 3 | exchange |

### Download Dataset

Click the link https://drive.google.com/a/nctu.edu.tw/file/d/1RNKc5F07LynUfodBRZ3ioL5vdDemd168/view?usp=sharing to download the dataset.

<div style="background-color: rgba(0, 0, 0, 0.9); color: #fff; padding: 20px;">

## Contact us

If you have any questions, feel free to contact us at: cwyi@nctu.edu.tw

</div>
