# AliCoCo: Alibaba E-commerce Cognitive Concept Net (SIGMOD-2020)
- 4 Parts
- Items (Product)
- Taxonomy: manually defined high-level categories/attributes, e.g., Category->Dress, Time->Holiday, Time->Season, IP->Movie Star, Location, etc.
- Primitive Concept: mined lower-level categories/attributes, e.g., Outdoor, Barbecue, Winter, etc.
- E-commerce Concepts: shopping scenarios, e.g., outdoor barbecue, Christmas gifts
- Primitive Concept Discovery
- data
- get concept candidates from multiple sources: catalog attribute values, Wikipedia, etc.
- distant supervision training data (6M sentences)
- match attribute values against text (query/title/review)
- remove text with multiple (ambiguous) matches
- model: train a BiLSTM-CRF model on the distant supervision examples
- apply: run prediction on sentences, getting 2.7M primitive concepts
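
The distant supervision labeling described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `LEXICON` entries, the class names, the greedy longest-match strategy, and the rule that a sentence is discarded when a matched phrase maps to more than one class are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical lexicon: surface phrase -> set of taxonomy classes it may belong to.
LEXICON: Dict[str, Set[str]] = {
    "outdoor": {"Location"},
    "barbecue": {"Event"},
    "winter": {"Time"},
    "down jacket": {"Category"},
    "apple": {"Brand", "Food"},  # ambiguous on purpose
}


def distant_label(tokens: List[str],
                  lexicon: Dict[str, Set[str]],
                  max_len: int = 3) -> Optional[List[str]]:
    """Tag tokens with IOB labels by greedy longest match against the lexicon.

    Returns None when a matched phrase maps to several classes, so the
    caller can drop the sentence from the training set.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            classes = lexicon.get(phrase)
            if classes is None:
                continue
            if len(classes) > 1:
                return None  # ambiguous match: discard the sentence
            label = next(iter(classes))
            tags[i] = "B-" + label
            for j in range(i + 1, i + n):
                tags[j] = "I-" + label
            i += n
            matched = True
            break
        if not matched:
            i += 1
    return tags


print(distant_label(["winter", "down", "jacket"], LEXICON))
print(distant_label(["apple", "phone"], LEXICON))
```

The resulting (tokens, tags) pairs would then serve as training sequences for a sequence labeler such as the BiLSTM-CRF mentioned above.
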
- Learning Primitive Concept Hierarchy: Hypernym Discovery
- algorithm: bilinear matching between candidate hyponym and hypernym representations
- data: active learning (label both the most confident and the most uncertain predictions)
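
As a rough sketch of the two ideas above, the snippet below scores a (hyponym, hypernym) pair with a generic bilinear form, sigmoid(xᵀWy), and then picks samples for annotation from both ends of the confidence range. The function names, the use of |p − 0.5| as the confidence measure, and the toy numbers are assumptions for illustration; the paper's exact parameterization and sampling rule may differ.

```python
import math
from typing import List, Sequence, Tuple


def bilinear_score(x: Sequence[float],
                   W: Sequence[Sequence[float]],
                   y: Sequence[float]) -> float:
    """Return sigmoid(x^T W y) for embeddings x, y and weight matrix W."""
    Wy = [sum(w * yj for w, yj in zip(row, y)) for row in W]
    z = sum(xi * v for xi, v in zip(x, Wy))
    return 1.0 / (1.0 + math.exp(-z))


def select_for_annotation(probs: Sequence[float],
                          k_uncertain: int,
                          k_confident: int) -> Tuple[List[int], List[int]]:
    """Pick indices of the most uncertain and the most confident predictions.

    Uncertainty is measured as closeness of the probability to 0.5.
    """
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    uncertain = order[:k_uncertain]
    rest = order[k_uncertain:]
    confident = rest[-k_confident:] if k_confident > 0 else []
    return uncertain, confident


identity = [[1.0, 0.0], [0.0, 1.0]]
print(bilinear_score([1.0, 0.0], identity, [0.0, 1.0]))
print(select_for_annotation([0.95, 0.52, 0.10, 0.45, 0.70], 2, 2))
```

Labeling confident predictions lets annotators catch confidently wrong pairs, while uncertain ones are where a new label is most informative.
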
- E-commerce Concepts
- candidate generation
- AutoPhrase detection (TBD) from query/title/review
- Pattern generation based on Primitive Concepts
- classify: is it a good concept?
- model: Wide & Deep
- Wide: meta features
- Deep: LSTM over word embeddings + Wikipedia-augmented features
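
The pattern-based candidate generation step above could look something like the following sketch, which fills each slot of a class pattern with primitive concepts of that class. The pattern format, the `generate_candidates` helper, and the sample concepts are hypothetical; the notes do not specify how patterns are represented.

```python
from itertools import product
from typing import Dict, List, Sequence


def generate_candidates(pattern: Sequence[str],
                        concepts_by_class: Dict[str, List[str]]) -> List[str]:
    """Expand a pattern of class names into candidate concept phrases.

    Each slot is filled with every primitive concept of that class; a
    class with no known concepts yields no candidates.
    """
    slots = [concepts_by_class.get(cls, []) for cls in pattern]
    return [" ".join(combo) for combo in product(*slots)]


# Hypothetical primitive concepts grouped by taxonomy class.
primitives = {
    "Location": ["outdoor", "indoor"],
    "Event": ["barbecue"],
}

print(generate_candidates(("Location", "Event"), primitives))
```

Every generated phrase is only a candidate: combinations such as "indoor barbecue" still have to pass the "is it a good concept?" classifier.
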
- Item Association
- data: pairs of (query, concepts associated with highly clicked products)
- algorithm: text matching between query and concepts