###### tags: `Progress`

# Model Reprogramming

## Introduction

* Deep learning in **resource-limited** domains still faces several challenges, including (i) **limited data**, (ii) **constrained model development cost**, and (iii) **lack of adequate pre-trained models for effective fine-tuning**.
* Model reprogramming enables resource-efficient cross-domain machine learning by **repurposing** and **reusing a well-developed pre-trained model** from a source domain to solve tasks in a target domain without model fine-tuning.

![](https://i.imgur.com/VduAjDg.png)

<!-- Input transformation....... Trainable Perturbation and Frozen Pretrained Model Some data is sparse Adversarial Example -->

## Related work

<!-- ## Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning -->

## Voice2Series: Reprogramming Acoustic Models for Time Series Classification

[Link](https://arxiv.org/pdf/2106.09296.pdf)

Voice2Series (V2S) is a novel end-to-end approach that **reprograms acoustic models for time series classification** through input transformation learning and output label mapping.

![](https://i.imgur.com/a47oq9e.png)

### V2S Reprogramming on Data Inputs

V2S aims to find a **trainable input transformation function $\mathcal{H}$** that is universal to all target data inputs, which serves the purpose of reprogramming $x_t$ into the source data space $\mathcal{X}_{\mathcal{S}} \subseteq \mathbb{R}^{d_{\mathcal{S}}}$, where $d_{\mathcal{T}}<d_{\mathcal{S}}$. Specifically, the reprogrammed sample $x_t^{\prime}$ is formulated as

$$
x_t^{\prime}=\mathcal{H}\left(x_t ; \theta\right):=\operatorname{Pad}\left(x_t\right)+\underbrace{M \odot \theta}_{\triangleq \delta}
$$

where $\operatorname{Pad}(x_t)$ is a **zero-padding function** that outputs a zero-padded time series of dimension $d_{\mathcal{S}}$.
The term $M \in\{0,1\}^{d_{\mathcal{S}}}$ is a **binary mask** that indicates the location of $x_t$ in its zero-padded input $\operatorname{Pad}(x_t)$: the $i$-th entry of $M$ is 0 if $x_t$ is present at that position (indicating the entry is non-reprogrammable), and 1 otherwise.

### V2S Reprogramming on Acoustic Models (AMs)

One can obtain the class prediction of the source model $f_{\mathcal{S}}$ on a reprogrammed target data sample $x_t$, denoted by

$$
P\left(y_s \mid f_{\mathcal{S}}\left(\mathcal{H}\left(x_t ; \theta\right)\right)\right) \text {, for all } y_s \in \mathcal{Y}_{\mathcal{S}}
$$

A **(many-to-one) label mapping function $h$** assigns source labels to target labels. For a target label $y_t \in \mathcal{Y}_{\mathcal{T}}$, its class prediction is the averaged class prediction over the set of source labels assigned to it. The term $P\left(h\left(\mathcal{Y}_{\mathcal{S}}\right) \mid f_{\mathcal{S}}\left(\mathcal{H}\left(x_t ; \theta\right)\right)\right)$ denotes the prediction probability of the target task on the associated ground-truth target label $y_t=h\left(\mathcal{Y}_{\mathcal{S}}\right)$.
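The input transformation $\operatorname{Pad}(x_t)+M\odot\theta$ and the many-to-one label mapping can be sketched as follows (a minimal illustration, not the official V2S implementation; the dimensions and the label assignment are hypothetical):

```python
import numpy as np

# Hypothetical sizes: target series length d_T, source input length d_S (d_T < d_S).
d_T, d_S = 500, 16000

rng = np.random.default_rng(0)
theta = np.zeros(d_S)        # trainable perturbation (learned via the V2S loss in practice)
mask = np.ones(d_S)
mask[:d_T] = 0.0             # positions holding x_t are non-reprogrammable (M_i = 0)

def reprogram(x_t, theta):
    """H(x_t; theta) = Pad(x_t) + M ⊙ theta."""
    padded = np.concatenate([x_t, np.zeros(d_S - x_t.shape[0])])  # zero-pad to d_S
    return padded + mask * theta

def many_to_one_predict(source_probs, label_map):
    """Average source-class probabilities over the source labels assigned
    to each target label. label_map: {target_label: [source label indices]}."""
    return np.stack([source_probs[idx].mean() for idx in label_map.values()])
```

In practice `theta` is the only trainable tensor: the frozen acoustic model produces `source_probs` for the reprogrammed input, and training minimizes $-\log$ of the mapped probability of the ground-truth target label.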
Finally, we learn the optimal parameters $\theta^*$ for data input reprogramming by optimizing the following objective:

$$
\theta^*=\arg \min _\theta \underbrace{-\log P\left(h\left(\mathcal{Y}_{\mathcal{S}}\right) \mid f_{\mathcal{S}}\left(\mathcal{H}\left(x_t ; \theta\right)\right)\right)}_{\text {V2S loss } \triangleq L}, \quad \text { where } h\left(\mathcal{Y}_{\mathcal{S}}\right)=y_t
$$

![](https://i.imgur.com/9Tz6iRi.png)

## Cross-modal Adversarial Reprogramming

[Link](https://arxiv.org/pdf/2102.07325.pdf)

With the abundance of **large-scale deep learning models**, it has become possible to **repurpose pre-trained networks for new tasks.** Recent works on adversarial reprogramming show that it is possible to **repurpose neural networks for alternate tasks without modifying the network architecture or parameters.**

![](https://i.imgur.com/nitkaBg.png)

## Theoretical Characterization of Model Reprogramming

**Lemma 1:** Given a $K$-way neural network classifier $f(\cdot)=\eta(z(\cdot))$, let $\mu_z$ and $\mu_z^{\prime}$ be the **probability measures of the logit representations** $\{z(x)\}$ and $\left\{z\left(x^{\prime}\right)\right\}$ from two data domains $\mathcal{D}$ and $\mathcal{D}^{\prime}$, where $x \sim \mathcal{D}$ and $x^{\prime} \sim \mathcal{D}^{\prime}$. Assume independent draws for $x$ and $x^{\prime}$, i.e., $\Phi_{\mathcal{D}, \mathcal{D}^{\prime}}\left(x, x^{\prime}\right)=\Phi_{\mathcal{D}}(x) \cdot \Phi_{\mathcal{D}^{\prime}}\left(x^{\prime}\right)$. Then

$$
\mathbb{E}_{x \sim \mathcal{D}, x^{\prime} \sim \mathcal{D}^{\prime}}\left\|f(x)-f\left(x^{\prime}\right)\right\|_2 \leq 2 \sqrt{K} \cdot \mathcal{W}_1\left(\mu_z, \mu_z^{\prime}\right)
$$

where $\mathcal{W}_1\left(\mu_z, \mu_z^{\prime}\right)$ is the Wasserstein-1 distance between $\mu_z$ and $\mu_z^{\prime}$.

With Lemma 1, we now state the main theorem regarding an upper bound on the population risk for reprogramming.
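The Wasserstein-1 term in Lemma 1 can be made concrete with a small numerical sketch. Under the simplifying assumption of one-dimensional empirical measures with equal sample counts (the actual logit distributions are multi-dimensional), $\mathcal{W}_1$ reduces to the mean absolute difference of the sorted samples:

```python
import numpy as np

def wasserstein1_empirical(z_a, z_b):
    """W1 between two 1-D empirical distributions with equal sample counts:
    the mean absolute difference of the order statistics (sorted samples)."""
    return float(np.mean(np.abs(np.sort(z_a) - np.sort(z_b))))

# Shifting a distribution by a constant c moves W1 by exactly |c|, so
# better-aligned logit distributions tighten the bound in Lemma 1.
logits = np.array([0.0, 1.0, 2.0])
print(wasserstein1_empirical(logits, logits + 0.5))  # → 0.5
```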
**Theorem 1:** Let **$\delta^*$ denote the learned additive input transformation** for reprogramming. The population risk for the target task via reprogramming a $K$-way source neural network classifier $f_{\mathcal{S}}(\cdot)=\eta\left(z_{\mathcal{S}}(\cdot)\right)$, denoted by $\mathbb{E}_{\mathcal{D}_{\mathcal{T}}}\left[\ell_{\mathcal{T}}\left(x_t+\delta^*, y_t\right)\right]$, is upper bounded by

$$
\mathbb{E}_{\mathcal{D}_{\mathcal{T}}}\left[\ell_{\mathcal{T}}\left(x_t+\delta^*, y_t\right)\right] \leq \underbrace{\epsilon_{\mathcal{S}}}_{\text {source risk }}+2 \sqrt{K} \cdot \underbrace{\mathcal{W}_1\left(\mu\left(z_{\mathcal{S}}\left(x_t+\delta^*\right)\right), \mu\left(z_{\mathcal{S}}\left(x_s\right)\right)\right)_{x_t \sim \mathcal{D}_{\mathcal{T}}, x_s \sim \mathcal{D}_{\mathcal{S}}}}_{\text {representation alignment loss via reprogramming }}
$$

**The result suggests that reprogramming attains better performance (lower risk) when the source model has a lower source risk and a smaller representation alignment loss.**

### Experiment

* Statistics of the datasets used for the reprogramming tasks, including the test accuracy of both neural-network-based and TF-IDF-based benchmark classifiers trained from scratch on the train set.

![](https://i.imgur.com/u3XAudW.png)

* Example outputs of the adversarial reprogramming function in both unbounded (top) and bounded (bottom) attack settings while reprogramming two different pre-trained image classifiers for a DNA sequence classification task (H3).

![](https://i.imgur.com/IiRCuqi.jpg)

## Summary and Takeaways

* Model reprogramming delivers promising, state-of-the-art results on **low-resource data**.
* Through a trainable input perturbation, reprogramming requires **fewer trainable parameters** than model fine-tuning or training from scratch.
* The neural saliency analysis demonstrates that the frozen pre-trained model also learns to recognize the target signal and could further advance **cross-domain learning** (e.g., speech and biomedical; image and genome).

## My Idea

1. Extend Model Reprogramming to adversarial examples.
2. Utilize **Model Reprogramming** for Adversarial Purification, since training a diffusion model from scratch is time-consuming and requires substantial computational power. The flow of Adversarial Purification is shown below.

![](https://i.imgur.com/q32xcio.png)

<!-- 3. Low resource and different domain adaptation -->