###### tags: `Log`

# Paper Notes

## 110 Summer Vacation

### AI x EDU

| Title | Short Name | Year | Tags | Notes |
| :---: | :---: | :---: | :---: | :---: |
| [Introducing a Framework to Assess Newly Created Questions with Natural Language Processing](https://link.springer.com/chapter/10.1007/978-3-030-52237-7_4) | text2props | 2020 | `AIED` `LA` `IRT` | |
| [Strategies for Deploying Unreliable AI Graders in High-Transparency High-Stakes Exams](https://link.springer.com/chapter/10.1007/978-3-030-52237-7_2) | | 2020 | `AIED` `ASAG` `EiPE` | |
| [Making Sense of Student Success and Risk Through Unsupervised Machine Learning and Interactive Storytelling](https://link.springer.com/chapter/10.1007/978-3-030-52237-7_1) | FIRST | 2020 | `AIED` `LA` | |

### VQA

| Title | Short Name | Year | Tags | Notes |
| :---: | :---: | :---: | :---: | :---: |
| [VirTex: Learning Visual Representations from Textual Annotations](https://arxiv.org/abs/2006.06666) | VirTex | 2020 | `VLP` | [ppt](https://docs.google.com/presentation/d/1s8yUAVGhYPCUb4pzRS7E9FUY3P-C45FQ/edit?usp=sharing&ouid=108285430116199915641&rtpof=true&sd=true) |
| [Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning](https://arxiv.org/abs/2104.03135) | SOHO | 2021 | `VLP` `MVM` `E2E` | [here](https://hackmd.io/cchb6OQnSOmoqwu29uyGaQ?view) |
| [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) | BEiT | 2021 | `MIM` | |
| [You Only Learn One Representation: Unified Network for Multiple Tasks](https://arxiv.org/abs/2105.04206) | YOLOR | 2021 | `OD` `Implicit Knowledge` | |
| [E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning](https://arxiv.org/abs/2106.01804) | E2E-VLP | 2021 | `VLP` `E2E` | |

### ROCLING

| Title | Short Name | Year | Tags | Notes |
| :---: | :---: | :---: | :---: | :---: |
| [ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders](https://arxiv.org/abs/2105.01279) | ZEN2.0 | 2021 | `NLP` `n-gram` | [ppt](https://drive.google.com/file/d/1FYZnuZO3mDzhPU2z38i6tn2XRu4WtZdB/view?usp=sharing) |

## Previous

### AI x EDU

| Title | Short Name | Year | Tags | Notes |
| :---: | :---: | :---: | :---: | :---: |
| [TinaFace: Strong but Simple Baseline for Face Detection](https://arxiv.org/abs/2011.13183) | TinaFace | 2020 | `FD` | [here](https://hackmd.io/VdVbr7HESUOBEEM4b6Jg2Q) |
| [Engagement detection in online learning: a review](https://slejournal.springeropen.com/articles/10.1186/s40561-018-0080-z) | | 2019 | `Engagement Detection` | [here](https://hackmd.io/-ldFTCwzRVOAGDSooCaEpQ) |
| [Automatic Recognition of Student Engagement using Deep Learning and Facial Expression](https://arxiv.org/abs/1808.02324) | | 2018 | `Engagement Detection` | [here](https://hackmd.io/FQEf8C8rQA2Tj_NkA_zu7A) |

### VQA

| Title | Short Name | Year | Tags | Notes |
| :---: | :---: | :---: | :---: | :---: |
| [UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning](https://arxiv.org/abs/2012.15409) | UNIMO | 2020 | `VLP` `Scene Graph` | [ppt](https://drive.google.com/file/d/1P3aKgAgLBOxa_kRxPpByd3qgDsCorqYk/view?usp=sharing) |
| [ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph](https://arxiv.org/abs/2006.16934) | ERNIE-ViL | 2021 | `VQA` `Scene Graph` | [here](https://hackmd.io/t60ixInxQv-MYX9FdFDCKA) |
| [Large-Scale Adversarial Training for Vision-and-Language Representation Learning](https://arxiv.org/abs/2006.06195) | VILLA | 2020 | `VLP` | [here](https://hackmd.io/EnJmjzFqT_SqzNaKCGPyBA) |
| [Answer Questions with Right Image Regions: A Visual Attention Regularization Approach](https://arxiv.org/abs/2102.01916) | AttReg | 2021 | `VQA` | [here](https://hackmd.io/dxQc7j6KScSlnZQnlSZ0Gg) |
| [VinVL: Revisiting Visual Representations in Vision-Language Models](https://arxiv.org/abs/2101.00529) | VinVL<br />Oscar+ | 2021 | `VLP` | |
| [Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks](https://arxiv.org/abs/2004.06165) | Oscar | 2020 | `VLP` | [here](https://hackmd.io/PbarrzKVTpOIs41MILEVQw) |
| [In Defense of Grid Features for Visual Question Answering](https://arxiv.org/abs/2001.03615) | GridFeat | 2020 | `VQA` | [here](https://hackmd.io/1KhLKVEhQWienb44auGuXw) |
| [Deep Modular Co-Attention Networks for Visual Question Answering](https://arxiv.org/abs/1906.10770) | MCAN | 2019 | `VQA` | [here](https://hackmd.io/SbixSrnDSIaH64gvaP94zg) |
| [Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering](https://arxiv.org/abs/1707.07998) | BUTD | 2017 | `VQA` | [here](https://hackmd.io/1mRVs0IaQAm15DfQGIhw5Q) |
| [YOLOv4: Optimal Speed and Accuracy of Object Detection](https://arxiv.org/abs/2004.10934) | YOLOv4 | 2020 | `OD` | [here](https://hackmd.io/pN-2VObUT7-d1e0YsO-gbw) |
| [YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767) | YOLOv3 | 2018 | `OD` | |
| [YOLO9000: Better, Faster, Stronger](https://arxiv.org/abs/1612.08242) | YOLOv2 | 2016 | `OD` | |
| [You Only Look Once: Unified, Real-Time Object Detection](https://arxiv.org/abs/1506.02640) | YOLOv1 | 2015 | `OD` | [here](https://hackmd.io/wGBunSLgTB2mtElsmEif6A) |
| [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) | DeBERTa | 2020 | `Transformer` | [here](https://hackmd.io/dz9XFCVfQiiLuEn6QA86rw) |
| [Rethinking Attention with Performers](https://arxiv.org/abs/2009.14794) | Performer | 2020 | `Transformer` | [here](https://hackmd.io/bKPL44O1S2ioYNLUC-kcRw) |
| [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) | ViT | 2020 | `Transformer` | [here](https://hackmd.io/IrAmnIOVQcSyFk2dYIk9aA) |
| [Attention Is All You Need](https://arxiv.org/abs/1706.03762) | Transformer | 2017 | `Transformer` | [here](https://hackmd.io/NTqeJ7i-QuCmNNxG8ad8BA) |
| [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168) | DCNv2 | 2018 | `CNN` | [here](https://hackmd.io/Y6Fi0jZeQmGWTmgKJnpfQQ) |
| [Deformable Convolutional Networks](https://arxiv.org/abs/1703.06211) | DCNv1 | 2017 | `CNN` | [here](https://hackmd.io/JU5DRgZ1SJKUkuobe5tZwA) |
| [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) | Faster R-CNN | 2015 | `OD` | [here](https://hackmd.io/jXaBefEdToSKKAn76viuww) |
| [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) | ResNet | 2015 | `CNN` | [here](https://hackmd.io/tZwZ1TyjQtCwuiyd0l397g) |
| [Rich feature hierarchies for accurate object detection and semantic segmentation](https://arxiv.org/abs/1311.2524) | R-CNN | 2014 | `OD` | [here](https://hackmd.io/-x4quqDdRb6ORgx2ORKxJQ) |

## Abbreviation

- `AIED`: Artificial Intelligence in Education
- `VQA`: Visual Question Answering
- `VLP`: Vision-Language Pre-training
- `OD`: Object Detection
- `FD`: Face Detection
- `LA`: Learning Analytics
- `IRT`: Item Response Theory
- `MIM`: Masked Image Modeling
- `MVM`: Masked Visual Modeling
- `E2E`: End-to-End
- `ASAG`: Automated Short Answer Grading
- `EiPE`: Explain in Plain English