# The 3rd Augmented Intelligence and Interaction (AII) Workshop

###### tags: `shared` `workshop`

[TOC]

## About this workshop

+ This workshop is hosted by Prof. Min Sun from June 30 to July 1
+ Detailed information can be found on this page: http://aliensunmin.github.io/aii_workshop/3rd/

## Keynote speech

+ K1: On Adversarial Learning
    - Traditional Model Learning
        - Maximum likelihood learning (with regularization): $\theta = \arg\max_\theta \mathbb{E}_q[\log p] \Rightarrow \arg\min_\theta \mathrm{KL}(q \Vert p)$
            - Likely that the samples drawn from the model distribution will NOT look realistic
            - Model distribution will not have mass where there’s no support of the true distribution
        - How about interchanging p (model) and q (true)? (q is not available though)
            - Likely to look realistic, but may represent only a (very small) subset of possible data
            - Use the entropy of p as a regularizer to let p (model) spread out
            - $\theta = \arg\max_\theta \mathbb{E}_p[\log (q/p)] = \arg\min_\theta \mathrm{KL}(p \Vert q)$
            - Problem: q is unknown => Is there a method to know q?
    - Implementation details
        - Critic: $g(x) = \log (q(x)/p(x))$
        - Optimal classifier should be able to discriminate between real/fake samples
        - i.e., GAN
    - Q: What is the role of GAN in classification?
        - GAN is a way to synthesize data
        - Small labelled set and large unlabelled set => semi-supervised learning
+ K2: Object-Preserving Cross-Domain Image Translation for Adaptive Object Detection
    - Domain adaptation in object detection
    - Paired/unpaired training images; the unpaired case is more practical
    - Multimodal image translation
    - GAN, conditional GAN (to a different domain), CycleGAN, AugGAN

## Invited speech

+ Meta Learning of Figure-Ground Segmentation
    - Region of interest from user feedback (whether something is in the region (yes/no))
        - Feedback Segmentation using Transductive Learning (whether a point is in the region)
        - SwipeCut (whether a line is in the region of interest)
        - Tap&Shoot (tapping focus)
    - Learning by Editing (unsupervised learning with GAN): Visual-Effect GAN (VEGAN)
+ On Manageable Visual Storytelling
    - Given photos => text story
    - One story is not multiple image captions
        - Cohesion & coherence, creativeness, visual grounding
    - E2E models are hardly manageable
    - Preliminary results showed
        - Model does NOT know how to describe things NOT in the training set
        - Need more data (due to small datasets)
    - Divide and conquer: image/scene understanding -> story generation
        - Semantic layer in between these two steps: FrameNet terms (verbs and nouns)
+ Network Representation Learning and its Applications
    - Network embedding applications: user identification (sharing accounts) (SIGIR)
        - Representing data as a heterogeneous network
        - Nodes: items/meta info
        - Find mappings from nodes to low-dimensional representations
        - User ID as ground truth
    - Hybrid account-user recommender
    - MARINE (WWW)
+ Research at Taiwan AI Labs: Music AI
    - Pipeline: audio in, audio out
    - Source separation -> music transcription -> composition -> synthesis
+ Exploration via Flow-Based Intrinsic Rewards
    - Curiosity-driven exploration challenges
+ AutoML: Who is Designing Your Neural Net
    - AutoML
        - Humans should focus on problem formulation
        - AutoML in industry: Google Cloud AutoML
    - Neural Architecture Search (NAS)
        - Automating architecture design
        - Subfield of AutoML
        - RL-based & EA-based approaches
    - NAS: recent trends
        - Multi-objective NAS
        - Distribution of architectures
        - Accelerating NAS
    - What’s the next wave
        - Federated learning + NAS
+ Towards Unsupervised Speech Recognition
    - Why unsupervised learning?
        - More than 7000 languages
        - Labelling is labour intensive
    - Acoustic token discovery
        - Problem: tokens are not readable
        - For speech recognition: need a table mapping tokens to text
    - Introduction of GAN: find a better and better mapping network
        - Through totally unsupervised learning
        - Jointly learn token discovery and the mapping table
    - Learn from itself
        - Pseudo labels
        - Bootstrapping
    - How about semi-supervised learning?
+ Goal-Driven-Based Speech Enhancement and its Applications to Assistive Hearing Device
    - Speech enhancement
    - Replacing the original norm-based objective with other task-specific goals
+ A Semantic Approach to Abstractive Summarization
    - Extractive/Abstractive
    - LCSTS Chinese Text Summarization Datasets
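
Returning to the K1 notes on adversarial learning: the contrast between minimizing KL(q||p) and KL(p||q), and the role of the critic g(x) = log(q/p), can be illustrated numerically. The sketch below is not from the talk. It assumes a hypothetical bimodal "true" distribution q, a single-Gaussian model p, densities discretized on a grid, and a brute-force grid search in place of gradient-based training. It also checks the standard result that the logit of the optimal real/fake classifier D*(x) = q/(q+p) equals log(q/p).

```python
import numpy as np

# Evaluation grid; all densities are discretized on it (an assumption of this sketch).
x = np.linspace(-8.0, 8.0, 801)
dx = x[1] - x[0]


def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Hypothetical "true" distribution q: two well-separated modes at -2 and +2.
q = 0.5 * normal_pdf(x, -2.0, 0.5) + 0.5 * normal_pdf(x, 2.0, 0.5)


def kl(a, b, eps=1e-300):
    """Riemann-sum approximation of KL(a || b) on the grid."""
    a = np.maximum(a, eps)
    b = np.maximum(b, eps)
    return float(np.sum(a * np.log(a / b)) * dx)


def fit(objective):
    """Grid-search the (mu, sigma) of a single-Gaussian model p minimizing objective(p)."""
    best = None
    for mu in np.linspace(-3.0, 3.0, 61):
        for sigma in np.linspace(0.3, 3.0, 28):
            score = objective(normal_pdf(x, mu, sigma))
            if best is None or score < best[0]:
                best = (score, float(mu), float(sigma))
    return best


# Maximum likelihood direction: argmin KL(q || p). The model must cover both modes.
kl_fwd, mu_fwd, sigma_fwd = fit(lambda p: kl(q, p))

# Interchanged direction: argmin KL(p || q). The model locks onto a single mode.
kl_rev, mu_rev, sigma_rev = fit(lambda p: kl(p, q))

print(f"argmin KL(q||p): mu={mu_fwd:+.1f}, sigma={sigma_fwd:.1f}")
print(f"argmin KL(p||q): mu={mu_rev:+.1f}, sigma={sigma_rev:.1f}")

# Critic g(x) = log(q/p), recovered from the optimal classifier D* = q / (q + p).
# Restrict to points where both densities are non-negligible to keep the logit stable.
p_rev = normal_pdf(x, mu_rev, sigma_rev)
mask = (p_rev > 1e-6) & (q > 1e-6)
d_star = q[mask] / (q[mask] + p_rev[mask])
critic_from_d = np.log(d_star) - np.log1p(-d_star)  # logit of D*
critic_direct = np.log(q[mask] / p_rev[mask])
print("logit(D*) matches log(q/p):", np.allclose(critic_from_d, critic_direct))
```

Under these assumptions, the KL(q||p) fit is a wide Gaussian centred between the two modes, so it places mass where q has almost none and its samples would not look realistic. The KL(p||q) fit is a narrow Gaussian on one mode, so its samples look realistic but cover only part of the data. Because KL(p||q) is the negative expectation of g(x) under p, a trained classifier can stand in for the unknown q, which is the link to GANs made in the notes.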