# PERSONAL MEETING
## EfficientPose: Scalable single-person pose estimation
Daniel Groos, Heri Ramampiaro, Espen A. F. Ihlen
### The main contributions of this paper
1. EfficientPose, an improvement of OpenPose that addresses the shortcomings of the popular OpenPose network on single-person human pose estimation (HPE), with improved precision, rapid convergence during optimization, a low number of parameters, and low computational cost.
2. An approach providing scalable models that suit various demands, enabling a trade-off between accuracy and efficiency across diverse application constraints and limited computational budgets.
3. A new way to incorporate mobile ConvNet components, which addresses the need for computationally efficient architectures for HPE and thus facilitates real-time HPE on the edge.
### OpenPose architecture

Fig. 1 OpenPose architecture utilizing 1) VGG-19 feature extractor, and 2) detection blocks performing 4+2 passes to estimate part affinity fields (3a-d) and confidence maps (3e and 3f)
### EfficientPose architecture

Fig. 2 Proposed architecture comprising 1a) high-resolution and 1b) low-resolution inputs, 2a) high-level and 2b) low-level EfficientNet backbones combined into 3) cross-resolution features, 4) Mobile DenseNet detection blocks, 1+2 passes for estimation of part affinity fields (5a) and confidence maps (5b and 5c), and 6) bilinear upscaling

For Conv(K×K, N, S), K×K denotes filter size, N is the number of output feature maps, and S is stride. BN denotes batch normalization. I defines input size, corresponding with image resolution on ImageNet, whereas α^φ refers to the depth factor as determined by (1)
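
As a rough illustration of this notation, the following sketch computes the output shape of a Conv(K×K, N, S) layer on a square I×I input and evaluates the depth factor, read here as α raised to the power φ. The function names, the `padding` argument, and the sample values (a 224×224 input, α = 1.2) are assumptions for illustration and are not taken from these notes.

```python
import math


def conv_output_shape(input_size, kernel, n_maps, stride, padding="same"):
    """Output (height, width, channels) of Conv(KxK, N, S) on a square input.

    `padding` is an assumption of this sketch: "same" pads so that the
    output size is ceil(I / S); "valid" applies no padding.
    """
    if padding == "same":
        out = math.ceil(input_size / stride)
    elif padding == "valid":
        out = (input_size - kernel) // stride + 1
    else:
        raise ValueError(f"unknown padding: {padding!r}")
    return (out, out, n_maps)


def depth_factor(alpha, phi):
    """Depth factor read as alpha ** phi (phi is the scaling coefficient)."""
    return alpha ** phi


# Illustrative values only: a 3x3 convolution with 32 maps and stride 2.
print(conv_output_shape(224, 3, 32, 2))           # (112, 112, 32)
print(conv_output_shape(224, 3, 32, 2, "valid"))  # (111, 111, 32)
# alpha = 1.2 is an assumed example value, not taken from the notes.
print(depth_factor(1.2, 0))                       # 1.0
```

With φ = 0 the depth factor is 1, so the baseline depth is unchanged; larger φ deepens the network geometrically.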

Fig. 3 The composition of MBConvs. From left: a-d) MBConv(K×K, B, S) in EfficientNets performs depthwise convolution with filter size K×K and stride S, and outputs B feature maps. MBConv∗ (b and d) extends regular MBConvs by including a dropout layer and adjusts the MBConv6 skip connection. e) E-MBConv6(K×K, B, S) in Mobile DenseNets adjusts MBConv6 with E-swish activation and sets the number of feature maps in the expansion phase to 6B. All MBConvs take as input M feature maps with spatial height and width of h and w, respectively. R is the reduction ratio of the squeeze-and-excitation (SE) module
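
To make the E-MBConv6 changes concrete, here is a minimal sketch of the E-swish activation, taken here as β·x·sigmoid(x), and of the expansion-phase width for the two block types. The default β = 1.25, the function names, and the channel counts in the example are assumptions for illustration. The 6M expansion for regular MBConv6 follows the usual EfficientNet convention of expanding the input maps sixfold.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a scalar."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def e_swish(x, beta=1.25):
    """E-swish: beta * x * sigmoid(x). beta=1.25 is an assumed default."""
    return beta * x * sigmoid(x)


def expansion_width(m_in, b_out, block="E-MBConv6"):
    """Number of feature maps in the expansion phase.

    MBConv6 expands the M input maps sixfold (EfficientNet convention);
    E-MBConv6 uses 6B, six times the number of output maps.
    """
    if block == "MBConv6":
        return 6 * m_in
    if block == "E-MBConv6":
        return 6 * b_out
    raise ValueError(f"unknown block type: {block!r}")


# Hypothetical channel counts: M = 40 input maps, B = 16 output maps.
print(expansion_width(40, 16, "MBConv6"))    # 240
print(expansion_width(40, 16, "E-MBConv6"))  # 96
print(e_swish(0.0))                          # 0.0
```

Tying the expansion width to B rather than M makes the cost of the block depend on how many maps it outputs, regardless of how wide its input is.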

Mobile DenseNet MD(C) computes 3C feature maps. P and Q denote the number of 2D part affinity fields and confidence maps, respectively. ConvT(K×K, O, S) defines transposed convolutions with kernel size K×K, O output maps, and stride S
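
The bookkeeping in this notation can be sketched as below: the number of feature maps leaving MD(C), and the spatial size after a ConvT(K×K, O, S) layer using the standard transposed-convolution size formula (no dilation). The assumption that the 3C maps come from concatenating three C-map outputs, the `padding` and `output_padding` arguments, and all sample numbers are illustrative and not stated in these notes.

```python
def mobile_densenet_maps(c, n_branches=3):
    """Feature maps produced by MD(C), assuming n_branches outputs of C maps
    each are concatenated (3C for the default of three)."""
    return n_branches * c


def conv_transpose_output_size(size, kernel, stride, padding=0, output_padding=0):
    """Spatial size after ConvT(KxK, O, S) without dilation:
    (size - 1) * stride - 2 * padding + kernel + output_padding."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical values: C = 96, and a 46x46 map upscaled by ConvT(4x4, O, 2).
print(mobile_densenet_maps(96))                         # 288
print(conv_transpose_output_size(46, 4, 2, padding=1))  # 92
```

With K = 4, S = 2, and a padding of 1, the formula reduces to twice the input size, so each such transposed convolution doubles the spatial resolution.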




<iframe src="https://drive.google.com/file/d/1btr_SEqhSdLc4bMXf5yfvY99_YaL8YuR/preview" width="640" height="480" allow="autoplay"></iframe>