# PERSONAL MEETING

## EfficientPose: Scalable single-person pose estimation

Daniel Groos, Heri Ramampiaro, Espen AF Ihlen

### The main contributions of this paper

1. An improvement of OpenPose called EfficientPose, which addresses the shortcomings of the popular OpenPose network on single-person human pose estimation (HPE) with improved precision, rapid convergence during optimization, a low number of parameters, and low computational cost.
2. An approach providing scalable models that can suit various demands, enabling a trade-off between accuracy and efficiency across diverse application constraints and limited computational budgets.
3. A new way to incorporate mobile ConvNet components, which can address the need for computationally efficient architectures for HPE, thus facilitating real-time HPE on the edge.

### OpenPose architecture

![](https://i.imgur.com/T753BhI.jpg)

Fig. 1 OpenPose architecture utilizing 1) VGG-19 feature extractor, and 2) detection blocks performing 4+2 passes of estimating part affinity fields (3a-d) and confidence maps (3e and 3f)

### EfficientPose architecture

![](https://i.imgur.com/ffw6mkU.jpg)

Fig. 2 Proposed architecture comprising 1a) high-resolution and 1b) low-resolution inputs, 2a) high-level and 2b) low-level EfficientNet backbones combined into 3) cross-resolution features, 4) Mobile DenseNet detection blocks, 1+2 passes for estimation of part affinity fields (5a) and confidence maps (5b and 5c), and 6) bilinear upscaling

![](https://i.imgur.com/qlNnrHI.jpg)

For Conv(K×K, N, S), K×K denotes filter size, N is the number of output feature maps, and S is stride. BN denotes batch normalization. I defines input size, corresponding with image resolution on ImageNet, whereas α^φ refers to the depth factor as determined by (1)

![](https://i.imgur.com/dm3K5ES.jpg)

Fig. 3 The composition of MBConvs. From left: a-d) MBConv(K×K, B, S) in EfficientNets performs depthwise convolution with filter size K×K and stride S, and outputs B feature maps.
MBConv* (b and d) extends regular MBConvs by including a dropout layer and adjusting the MBConv6 skip connection. e) E-MBConv6(K×K, B, S) in Mobile DenseNets adjusts MBConv6 with E-swish activation and sets the number of feature maps in the expansion phase to 6B. All MBConvs take as input M feature maps with spatial height and width of h and w, respectively. R is the reduction ratio of SE (squeeze-and-excitation).

![](https://i.imgur.com/Fapgdhu.jpg)

Mobile DenseNets MD(C) computes 3C feature maps. P and Q denote the number of 2D part affinity fields and confidence maps, respectively. ConvT(K×K, O, S) defines transposed convolutions with kernel size K×K, O output maps, and stride S

![](https://i.imgur.com/wNlXGfR.jpg)

![](https://i.imgur.com/NGTGar4.jpg)

![](https://i.imgur.com/fnPKvcm.jpg)

![](https://i.imgur.com/GvIQzZw.jpg)

<iframe src="https://drive.google.com/file/d/1btr_SEqhSdLc4bMXf5yfvY99_YaL8YuR/preview" width="640" height="480" allow="autoplay"></iframe>
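
To make the E-MBConv6 and MD(C) notation from the captions concrete, here is a minimal Python sketch of the channel and shape bookkeeping. It is not the authors' implementation. The function names are hypothetical, "same" padding (output size = ceil(input / stride)) is an assumption, and the E-swish default `beta = 1.25` is an assumed value that these notes do not state. Only the 6B expansion width and the 3C output width come from the captions.

```python
import math


def e_swish(x: float, beta: float = 1.25) -> float:
    """E-swish activation: beta * x * sigmoid(x).

    beta = 1.25 is an assumed default, not a value stated in these notes.
    """
    # Numerically stable sigmoid: avoids overflow in exp() for large |x|.
    if x >= 0:
        sigmoid = 1.0 / (1.0 + math.exp(-x))
    else:
        exp_x = math.exp(x)
        sigmoid = exp_x / (1.0 + exp_x)
    return beta * x * sigmoid


def e_mbconv6_shapes(h: int, w: int, m: int, b: int, stride: int) -> dict:
    """Tensor shapes (height, width, channels) through an E-MBConv6(KxK, B, S).

    The expansion phase widens the M input maps to 6B maps (per the caption);
    the strided depthwise convolution then reduces the spatial size.
    "Same" padding is assumed here, so output size = ceil(input / stride).
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    out_h = math.ceil(h / stride)
    out_w = math.ceil(w / stride)
    return {
        "input": (h, w, m),
        "expanded": (h, w, 6 * b),
        "output": (out_h, out_w, b),
    }


def mobile_densenet_channels(c: int) -> int:
    """Number of feature maps computed by MD(C), i.e. 3C per the caption."""
    return 3 * c


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(e_mbconv6_shapes(h=46, w=46, m=40, b=32, stride=2))
    print(mobile_densenet_channels(40))
    print(e_swish(1.0))
```

Tracking shapes this way is a quick check that a detection block's input and output widths line up before wiring the layers in a deep learning framework.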