# Categorical Collections

## Video Retrieval

<details> <summary>Convolutional Learning of Spatio-temporal Features</summary> 2010 [paper](https://www.researchgate.net/publication/216792694_Convolutional_Learning_of_Spatio-temporal_Features) </details>
<details> <summary>UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild</summary> 2012 [paper](https://arxiv.org/pdf/1212.0402) </details>
<details> <summary>Two-Stream Convolutional Networks for Action Recognition in Videos</summary> 2014 [paper](https://arxiv.org/pdf/1406.2199) [code](https://github.com/jeffreyyihuang/two-stream-action-recognition/tree/master) </details>
<details> <summary>Learning Spatiotemporal Features with 3D Convolutional Networks</summary> 2015 [paper](https://arxiv.org/pdf/1412.0767) </details>
<details> <summary>Spatiotemporal Residual Networks for Video Action Recognition</summary> 2016 [paper](https://arxiv.org/pdf/1611.02155) </details>
<details> <summary>Convolutional Two-Stream Network Fusion for Video Action Recognition</summary> 2016 [paper](https://arxiv.org/pdf/1604.06573) [code](https://github.com/feichtenhofer/twostreamfusion) </details>
<details> <summary>Temporal Segment Networks for Action Recognition in Videos</summary> 2017 [paper](https://arxiv.org/pdf/1705.02953) [code](https://github.com/yjxiong/temporal-segment-networks?tab=readme-ov-file#news--updates) </details>
<details> <summary>Localizing Moments in Video with Natural Language</summary> 2017 [paper](https://arxiv.org/pdf/1708.01641v1) </details>
<details> <summary>A Closer Look at Spatiotemporal Convolutions for Action Recognition</summary> 2018 [paper](https://arxiv.org/pdf/1711.11248v3) </details>
<details> <summary>SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval</summary> 2019 [paper](https://svdbase.github.io/files/ICCV19_SVD.pdf) [code](https://github.com/4ML-platform/ndvr) </details>
<details> <summary>SlowFast Networks for Video Recognition</summary> 2019
[paper](https://arxiv.org/pdf/1812.03982) [code](https://github.com/facebookresearch/SlowFast) </details>
<details> <summary>VideoMix: Rethinking Data Augmentation for Video Classification</summary> 2020 [paper](https://arxiv.org/pdf/2012.03457) </details>
<details> <summary>Temporal Context Aggregation for Video Retrieval with Contrastive Learning</summary> 2020 [paper](https://arxiv.org/pdf/2008.01334) </details>
<details> <summary>TAP-Vid: A Benchmark for Tracking Any Point in a Video</summary> 2023 [paper](https://arxiv.org/pdf/2211.03726) [code](https://github.com/google-deepmind/tapnet) </details>
<details> <summary>CoTracker: It is Better to Track Together</summary> 2023 [paper](https://arxiv.org/pdf/2307.07635) [code](https://github.com/facebookresearch/co-tracker) </details>
<details> <summary>Revisiting Feature Prediction for Learning Visual Representations from Video</summary> 2024 [paper](https://arxiv.org/pdf/2404.08471v1) [code](https://github.com/facebookresearch/jepa/tree/main) </details>
<details> <summary>InternVideo2: Scaling Foundation Models for Multimodal Video Understanding</summary> 2024 [paper](https://arxiv.org/pdf/2403.15377v3) [code](https://github.com/opengvlab/internvideo) </details>
<details> <summary>LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment</summary> 2024 [paper](https://arxiv.org/pdf/2310.01852) [code](https://github.com/PKU-YuanGroup/LanguageBind?tab=readme-ov-file) </details>

### Other Resources

<details> <summary>PyTorchVideo</summary> [repository](https://github.com/facebookresearch/pytorchvideo) [document](https://pytorchvideo.org) </details>
<details> <summary>Frame Extractor</summary> [code 1](https://github.com/titania7777/FrameExtractor) [code 2](https://github.com/fastcatai/frame-extraction/blob/master/extract.py) </details>
<details> <summary>Katna library</summary> [repository](https://github.com/keplerlab/katna/tree/master)
[document](https://katna.readthedocs.io/en/latest/index.html) </details>
<details> <summary>SOTA on DiDeMo</summary> [papers-with-code](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-didemo?p=languagebind-extending-video-language) </details>

## Computer Vision

### Image Classification

### Object Detection

<details> <summary>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks</summary> [paper](https://arxiv.org/pdf/1506.01497) </details>
<details> <summary>SSD: Single Shot MultiBox Detector</summary> [paper](https://arxiv.org/pdf/1512.02325) </details>
<details> <summary>Other resources</summary> [d2l-object-detection](https://d2l.ai/chapter_computer-vision/bounding-box.html) </details>

### Segmentation

<details> <summary>TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation</summary> [paper](https://arxiv.org/pdf/2102.04306) [code](https://github.com/Beckschen/TransUNet) </details>
<details> <summary>Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation</summary> [paper](https://arxiv.org/pdf/2105.05537) [code](https://github.com/HuCaoFighting/Swin-Unet) </details>

### Image-Text Retrieval

<details> <summary>SAM: Segment Anything</summary> [paper](https://arxiv.org/pdf/2304.02643) [code](https://github.com/facebookresearch/segment-anything) </details>
<details> <summary>SAM 2: Segment Anything in Images and Videos</summary> [paper](https://arxiv.org/pdf/2408.00714) [code](https://github.com/facebookresearch/segment-anything-2) </details>
<details> <summary>Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection</summary> [paper](https://arxiv.org/pdf/2303.05499) [code](https://github.com/IDEA-Research/GroundingDINO) [demo](https://huggingface.co/docs/transformers/en/model_doc/grounding-dino) </details>
<details> <summary>SlimSAM: 0.1% Data Makes Segment Anything Slim</summary> [paper](https://arxiv.org/pdf/2312.05284) [code](https://github.com/czg1225/SlimSAM)
[demo](https://huggingface.co/Zigeng/SlimSAM-uniform-50) </details>
<details> <summary>CLIP: Learning Transferable Visual Models From Natural Language Supervision</summary> [paper](https://arxiv.org/pdf/2103.00020) [code](https://github.com/openai/CLIP) [demo](https://huggingface.co/docs/transformers/model_doc/clip) </details>
<details> <summary>OpenCLIP: Reproducible Scaling Laws for Contrastive Language-Image Learning</summary> [paper](https://arxiv.org/pdf/2212.07143) [code](https://github.com/mlfoundations/open_clip) </details>
<details> <summary>GLIGEN: Open-Set Grounded Text-to-Image Generation</summary> [paper](https://arxiv.org/abs/2301.07093) [code](https://github.com/gligen/GLIGEN) </details>
<details> <summary>DETR: End-to-End Object Detection with Transformers</summary> [paper](https://arxiv.org/pdf/2005.12872) [code](https://github.com/facebookresearch/detr) </details>
<details> <summary>DINOv2: Learning Robust Visual Features without Supervision</summary> [paper](https://arxiv.org/pdf/2304.07193) [code](https://github.com/purnasai/Dino_V2) </details>
<details> <summary>Prismatic VLMs: The Design Space of Visually-Conditioned Language Models</summary> [paper](https://arxiv.org/pdf/2402.07865) [code](https://github.com/TRI-ML/prismatic-vlms) </details>
<details> <summary>LLaVA-1.5: Improved Baselines with Visual Instruction Tuning</summary> [paper](https://arxiv.org/pdf/2310.03744) [project](https://llava-vl.github.io) [dataset](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) </details>
<details> <summary>OpenVLA: An Open-Source Vision-Language-Action Model</summary> [paper](https://arxiv.org/pdf/2406.09246) [code](https://github.com/openvla/openvla) </details>
<details> <summary>Nomic Embed: Training a Reproducible Long Context Text Embedder</summary> [paper](https://arxiv.org/pdf/2402.01613) </details>
<details> <summary>MetaFormer Is Actually What You Need for Vision</summary>
[paper](https://arxiv.org/pdf/2111.11418) [code](https://github.com/sail-sg/poolformer) </details>
<details> <summary>Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs</summary> [paper](https://arxiv.org/pdf/2401.06209v1) [code](https://github.com/facebookresearch/detr) </details>
<details> <summary>Other Resources</summary> [DINOv2 image retrieval (Roboflow)](https://github.com/roboflow/notebooks/blob/main/notebooks/dinov2-image-retrieval.ipynb) [DINOv2: Image Classification, Retrieval, Visualization, and Paper Review](https://purnasaigudikandula.medium.com/dinov2-image-classification-visualization-and-paper-review-745bee52c826) [CLIP image retrieval](https://github.com/purnasai/CLIP_Image_Retrieval) [Hugging Face OpenCLIP models](https://huggingface.co/models?library=open_clip&sort=trending) [Timm Hugging Face pretrained models](https://huggingface.co/timm?search_models=siglip) [Phi-3-vision-128k-instruct-onnx-cuda](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cuda) [Phi-3-22b-128k](https://huggingface.co/ontocord/phi-3-22b-128k) [onnxruntime-genai Python examples](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python) [nanoLLaVA-1.5](https://huggingface.co/qnguyen3/nanoLLaVA-1.5) [bitsandbytes tutorials and guides](https://huggingface.co/docs/bitsandbytes/main/en/index) [Hugging Face quantization documentation](https://huggingface.co/docs/transformers/en/main_classes/quantization#transformers.BitsAndBytesConfig) </details>
<details> <summary>AWS Deployment</summary> [Lambda handler tutorial](https://testdriven.io/blog/ml-model-aws-lambda/) [Serverless Framework documentation](https://www.serverless.com/framework/docs) [Serverless Python examples](https://github.com/serverless/examples/tree/v4/aws-python-http-api) </details>

## Natural Language Processing

## 3D Deep Learning

## Quantization

## Mathematics

### (Mixed) Integer Programming

<details> <summary>Illinois Integer Programming Lecture Notes</summary>
[lecture-notes](https://karthik.ise.illinois.edu/courses/ie511/ie511-sp-21.html) </details>
<details> <summary>MIT Non-linear Programming</summary> [lecture-notes](https://ocw.mit.edu/courses/15-084j-nonlinear-programming-spring-2004/pages/lecture-notes/) </details>

### Combinatorial Optimization with ML

<details> <summary>Embedded Mixed-Integer Quadratic Optimization Using the OSQP Solver</summary> [paper](https://cse.lab.imtlucca.it/~bemporad/publications/papers/ecc18-miosqp.pdf) [miosqp](https://github.com/osqp/miosqp/tree/master) </details>
<details> <summary>Exact Combinatorial Optimization with Graph Convolutional Neural Networks</summary> [paper](https://arxiv.org/pdf/1906.01629v3) [learn2branch](https://github.com/ds4dm/learn2branch) [neurips2019](https://github.com/audreyanneguindon/NeurIPS_2019) </details>
<details> <summary>Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies</summary> [paper](https://arxiv.org/pdf/2002.05120) [branch-search-trees](https://github.com/ds4dm/branch-search-trees) </details>
<details> <summary>Learning to Pivot as a Smart Expert</summary> [paper](https://arxiv.org/pdf/2308.08171) </details>

## Graph Convolutional Neural Networks

<details> <summary>Semi-Supervised Classification with Graph Convolutional Networks</summary> [paper](https://arxiv.org/pdf/1609.02907) [gcn](https://github.com/tkipf/gcn) [pygcn](https://github.com/tkipf/pygcn) [document](https://tkipf.github.io/graph-convolutional-networks/) [github-topic](https://github.com/topics/graph-convolutional-networks) </details>
<details> <summary>Just Jump: Dynamic Neighborhood Aggregation in Graph Neural Networks</summary> [paper](https://arxiv.org/pdf/1904.04849v2) [pytorch-geometric](https://github.com/pyg-team/pytorch_geometric) </details>
<details> <summary>Graph neural networks: A review of methods and applications</summary> [paper](https://arxiv.org/pdf/1812.08434v6) [must-read](https://github.com/thunlp/GNNPapers) </details>
<details>
<summary>On Statistical Learning of Branch and Bound for Vehicle Routing Optimization</summary> [paper](https://arxiv.org/pdf/2310.09986) [ml4vrp](https://github.com/isotlaboratory/ml4vrp?tab=readme-ov-file) </details>
<details> <summary>Modeling Polypharmacy Side Effects with Graph Convolutional Networks</summary> [paper](https://arxiv.org/pdf/1802.00543) [decagon](https://github.com/mims-harvard/decagon) [document](https://snap.stanford.edu/decagon/) </details>
<details> <summary>Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation</summary> [paper](https://arxiv.org/pdf/1806.02473) [rl_graph_generation](https://github.com/bowenliu16/rl_graph_generation/tree/master) </details>
<details> <summary>DGL: A Graph-Centric, Highly-Performant Package for Graph Neural Networks</summary> [paper](https://arxiv.org/pdf/1909.01315) [dgl](https://github.com/dmlc/dgl/tree/master) </details>
<details> <summary>Inductive Representation Learning on Large Graphs (GraphSAGE)</summary> [paper](https://arxiv.org/pdf/1706.02216) [dgl-graphsage](https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphsage) </details>
<details> <summary>Other Resources</summary> [optimization](https://github.com/francisadrianviernes/Optimization/tree/main) [gilp](https://github.com/engri-1101/gilp) [tsp-solver](https://github.com/mostafabahri/tsp-solver) </details>

### Computer Graphics

<details> <summary>Affine Transformation</summary> [blog](https://www.algorithm-archive.org/contents/affine_transformations/affine_transformations.html) [video](https://www.youtube.com/watch?v=AheaTd_l5Is&t=13s) </details>
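The affine-transformation resources above all build on one formula: a 2D affine map is `y = A x + t` (a linear part `A` plus a translation `t`), conventionally packed into a single 3×3 matrix acting on homogeneous coordinates `(x, y, 1)`. A minimal, dependency-free sketch of that idea (function names here are illustrative, not taken from the linked resources):

```python
import math

def affine_matrix(theta, tx, ty):
    """3x3 homogeneous matrix: rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s,  c, ty],
            [0,  0, 1]]

def apply_affine(m, point):
    """Apply a 3x3 homogeneous matrix to a 2D point (x, y)."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Rotate (1, 0) by 90 degrees about the origin, then shift by (2, 3):
# rotation maps (1, 0) -> (0, 1), translation maps that to (2, 4).
m = affine_matrix(math.pi / 2, 2, 3)
x, y = apply_affine(m, (1, 0))
```

Because composition of affine maps is just matrix multiplication of their homogeneous matrices, chains of rotations, scalings, and translations collapse into a single 3×3 matrix, which is why graphics pipelines use this representation.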