# Knowledge Editing Research
## Contents
- [Files](#Files)
- [6 Mar 2024](#6-Mar-2024)
- [13 Mar 2024](#13-Mar-2024)
- [20 Mar 2024](#20-Mar-2024)
- [3 Apr 2024](#3-Apr-2024)
- [10 Apr 2024](#10-Apr-2024)
- [17 Apr 2024](#17-Apr-2024)
- [22 Jun 2024](#22-Jun-2024)
- [29 Jun 2024](#29-Jun-2024)
- [6 Jul 2024](#6-Jul-2024)
- [13 Jul 2024](#13-Jul-2024)
- [***20 Jul 2024 (Latest)***](#20-Jul-2024)
## Files
- [github](https://github.com/samuellau0802/urop)
- [google drive](https://drive.google.com/drive/folders/1COsDGJEIcrlx7d1kDXyV9tNronEKcoEs?usp=sharing) (for testing a few samples before running in server)
## Notes
<details>
<summary>6 Mar 2024</summary>
## 6 Mar 2024
### Paper reading
- Memory-Based Model Editing at Scale https://arxiv.org/pdf/2206.06520.pdf
- Fast Model Editing at Scale https://arxiv.org/pdf/2110.11309.pdf
- About E-commerce:
  - https://arxiv.org/pdf/2307.09688.pdf
  - https://aclanthology.org/2023.findings-acl.76.pdf
### Meeting
- Using KG to do knowledge editing
### Todo
- Look into data, e.g. FolkScope
</details>
<details>
<summary>13 Mar 2024</summary>
## 13 Mar 2024
### Data
- Questions: what should examples for knowledge editing look like?
  - If the edit targets a product dimension, is it not conceptualizable?
- https://www.aboutamazon.com/news/retail/amazon-rufus
### Model
- Looked into VERA
- https://huggingface.co/liujch1998/vera
### Todo
- https://nijianmo.github.io/amazon/index.html
- LLaVA
</details>
<details>
<summary>20 Mar 2024</summary>
## 20 Mar 2024
### Flow
1. Got the Amazon Reviews 2023 data
   - https://amazon-reviews-2023.github.io/main.html
   - Downloaded the Electronics category for a trial run
2. Formatted the data for input
   - Prompt includes the product name and features
   - Also included the product images
   - Converted to the LLaVA evaluation format (see the sketch after this list)
3. Evaluated the data with LLaVA
   - Reused model_vqa.py, changing only a few lines
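A minimal sketch of the step-2 conversion into the JSONL question format that model_vqa.py consumes. The Amazon field names and the output keys (`question_id`, `image`, `text`) are assumptions based on the public dataset and LLaVA repo layouts, so adjust them to the actual files:

```python
# Convert Amazon 2023 product metadata into LLaVA's eval-question JSONL.
import json

def to_llava_questions(products, out_path):
    with open(out_path, "w") as f:
        for idx, p in enumerate(products):
            feats = "; ".join(p.get("features", []))
            prompt = (
                f"This is the product '{p['title']}'. Listed features: {feats}. "
                "Based on the image and description, list the key features of this product."
            )
            record = {
                "question_id": idx,
                "image": p["image_path"],  # local path to the downloaded product image
                "text": prompt,
            }
            f.write(json.dumps(record) + "\n")

# Hypothetical product record:
products = [{"title": "USB-C Hub", "features": ["4K HDMI", "100W PD"], "image_path": "imgs/0.jpg"}]
to_llava_questions(products, "questions.jsonl")
```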
### Code
- https://drive.google.com/drive/folders/1COsDGJEIcrlx7d1kDXyV9tNronEKcoEs?usp=drive_link
### Questions
1. Using VSCode to access the cloud server?
2. The eval currently uses around 17GB of GPU memory, but the cloud only serves 11GB per GPU; any suggestions on how to split the model? (See the sketch below.)
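Regarding question 2, one common option is Accelerate-style layer sharding via `device_map="auto"`, which splits the model across however many GPUs are visible. A minimal sketch using one of the 7B models from this project as a stand-in (memory caps are illustrative; LLaVA's own loader would need the analogous arguments):

```python
# Shard a ~14GB fp16 model across two 11GB GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,            # halves memory vs. fp32
    device_map="auto",                    # place layers on whichever GPU has room
    max_memory={0: "10GiB", 1: "10GiB"},  # stay under the 11GB per-GPU limit
)
```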
### Todo
- Try another method: let LLaVA generate features, then use another model (ChatGPT, Llama, Mistral) to determine whether the product actually has each feature
</details>
<details>
<summary>3 Apr 2024</summary>
## 3 Apr 2024
### Flow
1. Formatted the Amazon 2023 dataset
2. Evaluated the data with LLaVA, asking it to generate features
   - 1,000 products asked; ~10k features generated (2 hours with an A6000)
3. Used ChatGPT to answer whether the product has each feature, yes or no (see the sketch after this list)
   - ~6k features (600 products) asked: 80% yes, 20% no
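A minimal sketch of step 3, already folding in the two fixes from the Todo below (fixed temperature, one question per call); the model name and prompt wording are assumptions:

```python
# Ask ChatGPT, one feature per call, whether a product has a given feature.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def has_feature(product_title: str, feature: str) -> str:
    prompt = (
        f"Does the product '{product_title}' have the feature '{feature}'? "
        "Answer Yes or No only."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judging
    )
    return resp.choices[0].message.content.strip()

print(has_feature("Sony WH-1000XM5", "active noise cancellation"))  # expected: "Yes"
```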
### Code and Data
- [Google Drive Link](https://drive.google.com/drive/folders/1nXhYO9Pw4DHHexTOT0Bveu0iUWehk4wT?usp=drive_link)
### Todo
- Minor fixes to the ChatGPT step
  - Set the temperature
  - Ask only one question per call
- Apart from yes or no, ask the LLM to give a reason / the correct features
- Use the "features" and "details" fields in the Amazon dataset and ask GPT/Mistral whether they are correct (yes/no)
### Goal
- Have 3 sets of data (see the schema sketch below)
  1. LLaVA-generated features, with an LLM answering whether the product has them
  2. "features" field from the dataset, with an LLM answering yes or no
  3. "details" field from the dataset, with an LLM answering yes or no
</details>
<details>
<summary>10 Apr 2024</summary>
## 10 Apr 2024
### Two roadmaps for our paper:
1. We propose a new benchmark dataset + a new multi-modal knowledge editing framework
+ A benchmark dataset focused on the e-commerce domain: multi-dimensional, multi-modal, multi-domain
+ A new KE method, **better than previous methods** (difficult)
+ Our data + Our method -> E-commerce-expert LVLM
2. We propose a new benchmark dataset + We do comprehensive experiments over current knowledge editing methods
+ A benchmark dataset focused on the e-commerce domain: multi-dimensional, multi-modal, multi-domain
+ *Ideally*, current KE methods cannot perform well on our dataset, implying that more attention and more methods should be studied in future work to benefit LVLMs in e-commerce
### Todo
- Scale up the benchmark dataset: sample 1,000 products per domain
- Run baselines on the dataset: ROME, MEMIT
+ [MEND](https://github.com/eric-mitchell/mend)
+ [MEMIT](https://github.com/kmeng01/memit) (the repo also contains ROME)
+ [ROME](https://github.com/kmeng01/rome)
### Paper
+ [Can We Edit Multimodal Large Language Models?](https://arxiv.org/pdf/2310.08475.pdf)
+ [Editing Conceptual Knowledge for Large Language Models](https://arxiv.org/pdf/2403.06259.pdf)

### Recent Interesting Papers (Optional)
+ [EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries](https://arxiv.org/pdf/2402.11324.pdf)
+ [Event-level Knowledge Editing](https://arxiv.org/pdf/2402.13093v1.pdf)
+ [ACL 2024 Paper Review](https://mighty-weaver.github.io/files/review.pdf)
</details>
<details>
<summary>17 Apr 2024</summary>
## 17 Apr 2024
### Current Limitations
- Editing multimodal models is hard, especially in the vision module (it is just an input to the LLM)
- Editing concepts is hard
  - Conceptual knowledge may sit in the attention layers of early layers (?)
### Paper Overall Pipeline
0. Select 5 domains; for each domain, select 2,000 products:
- Clothing_Shoes_and_Jewelry
- Home_and_Kitchen
- Electronics
- Industrial_and_Scientific
- Sports_and_Outdoors
1. We propose to use a stronger teacher LLM to supervise the "output" of weak LLMs.
- meta-llama/Llama-2-7b-chat-hf
- mistralai/Mistral-7B-Instruct-v0.2
- tiiuae/falcon-7b-instruct
- google/gemma-1.1-7b-it
**Task 1**: Implement a unified framework to (1) prompt the LLM about the features/shopping intentions of the product, and (2) ask ChatGPT to discriminate the plausibility of such generations.
*Prompt the LLM about the features/shopping intentions of the product*: prompt the LLM to generate product details / judge whether they are correct, and to generate intentions ("A customer buys [product]; the intention of purchasing this product is ...").
*Ask ChatGPT to discriminate the plausibility of such generations*: e.g. Mistral: "A customer buys an iPhone 14; the intention of purchasing this product is: the customer **wants to have a cool device**."
Prompt ChatGPT: "If a customer **wants to have a cool device**, would the customer purchase an iPhone 14? Answer Yes or No only. If No, also generate the rationale behind it." (See the sketch below.)
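A minimal sketch of how a weak LLM's generated intention could be rewritten into that ChatGPT plausibility question; the helper name is hypothetical and the wording follows the example above:

```python
def build_plausibility_prompt(product: str, intention: str) -> str:
    # Turn a generated intention into the Yes/No discrimination prompt.
    return (
        f"If a customer {intention}, would the customer purchase {product}? "
        "Answer Yes or No only. If No, also generate the rationale behind it."
    )

# Mistral's generation for the iPhone 14 example:
prompt = build_plausibility_prompt("an iPhone 14", "wants to have a cool device")
```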
2. We propose to integrate product/intention conceptualization and plausibility estimation into knowledge editing for improving LLM's E-commerce understanding.
- Take three products: iPhone 15, iPad 2023, MacBook Pro
- Conceptualize them into: Apple products
- Instantiate the concept into: AirPods
- "iPhone 15 can make calls"
- Substitute iPhone 15 with AirPods: *"AirPods can make calls"*
- [VERA](https://huggingface.co/liujch1998/vera) discriminates the correctness of *"AirPods can make calls"* (see the sketch after this list)
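A sketch of the conceptualize/instantiate/substitute chain for the Apple example; the dict-based concept map is an assumed representation, and the resulting claim would then be scored by VERA:

```python
# Map each concept to the products it covers (assumed representation).
concept_map = {
    "Apple products": ["iPhone 15", "iPad 2023", "MacBook Pro", "AirPods"],
}

def substitute(statement: str, subject: str, new_subject: str) -> str:
    """Swap the subject of a product statement to instantiate a concept."""
    return statement.replace(subject, new_subject)

# Instantiate "Apple products" into AirPods and transfer an iPhone 15 statement:
new_subject = concept_map["Apple products"][-1]  # "AirPods"
claim = substitute("iPhone 15 can make calls", "iPhone 15", new_subject)
# claim == "AirPods can make calls"; VERA then discriminates its correctness.
```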
3. Evaluation:
- Intrinsic evaluation: on the knowledge-editing data, evaluate locality, generality, ..., ...
- Extrinsic evaluation:
  - ProductQA (we edit product details -> improve product understanding) (difficult to improve, because many previous works have covered it)
  - IntentionQA (we edit product intentions/details -> improve purchase-intention understanding) (easier to improve: proposed by ourselves, so no competition with others)
### Todo
1. Conceptualize the products with wrong intentions/descriptions
2. Test baselines: MEMIT, ...
3. Evaluation: locality, generality, ..., IntentionQA
</details>
<details>
<summary>22 Jun 2024</summary>
## 22 Jun 2024
### Update
- LLaVA to generate the correct features
- Separate product-specific and concept-level edits
### Questions
1. How can we ensure that the conceptualized products are also misidentified by the weak LLM?
   - If the LLM is correct in the first place, there should be no need to edit?
   - Or does it not matter (?)
2. What should the correct feature/intention be?
   - We only have the wrong feature now, not the correct one (?)
   - What should the structure of the prompt be when editing?
3. Timeline
</details>
<details>
<summary>29 Jun 2024</summary>
## 29 Jun 2024
### Update
- Got the correct KE answers by asking LLaVA or ChatGPT (see the routing sketch after this list)
  - If it is a specific product, we provide the product image and ask LLaVA what the correct feature/intention should be
  - If it is a conceptualized product / product type, we ask ChatGPT to provide the correct feature/intention
  - ~30 HKD for 23k products with ChatGPT
  - ~10k products with LLaVA
- So we now have the wrong features/intentions and also the correct ones
- [Data (ChatGPT)](https://github.com/samuellau0802/urop/blob/main/correct_ans/gpt_get_correct_ans.csv)
- [Data (LLAVA)](https://github.com/samuellau0802/urop/blob/main/correct_ans/llava_answer.jsonl)
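A sketch of that routing; `ask_llava` and `ask_chatgpt` stand in for the actual inference calls, and the field names are assumptions:

```python
# Route each item to LLaVA (image-grounded products) or ChatGPT (concepts).
def get_correct_answer(item: dict, ask_llava, ask_chatgpt) -> str:
    question = f"What is the correct {item['dimension']} of {item['name']}?"
    if item.get("image_path"):            # specific product with an image
        return ask_llava(question, item["image_path"])
    return ask_chatgpt(question)          # conceptualized product / product type
```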
### Todo
- Working on testing MEMIT and other baselines
</details>
<details>
<summary>6 Jul 2024</summary>
## 6 Jul 2024
### Update
- Tried MEMIT on the default model
  - ~6 edits/min, i.e. ~3.5 days for 30k edits
- Problems:
  - Dataset quality
  - VPN session needs to be extended to run continuously
    - tmux as a workaround
  - OOM: mitigated by trimming inputs and segmenting the run; relevant knobs:
    - layers to edit
    - input length
    - batch size
### Todo
- EasyEdit
- LLaMA-Factory
</details>
<details>
<summary>13 Jul 2024</summary>
## 13 Jul 2024
### Update
- Tidied the dataset; removed some poor-quality data
- Tried out EasyEdit: it's quite straightforward to use, but I am still checking the code, as it sometimes still OOMs or takes a long time to run
- Tried reducing the layers to edit (faster) and increasing the batch size (cuts editing time considerably)
- Multi-GPU works (usually ~30GB on GPU 0 and ~16GB on GPU 1), but if someone else starts a new GPU process during editing, OOM occurs
- Regarding the dataset, I still need to construct the locality and portability inputs for the metrics; I plan to use GPT (the teacher LLM) to do so
### Todo
- Continue reviewing the code to reduce editing memory and time: the MEMIT GitHub repo has a num_edits hyperparameter that determines how many edits are applied at once, but I don't find it in EasyEdit, so MEMIT in EasyEdit currently edits one by one. Batching edits the same way should greatly reduce editing time (see the sketch below)
- Expand the dataset to include locality and portability inputs
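A sketch of batched editing through EasyEdit's `BaseEditor`; `from_hparams` and `batch_edit` follow the EasyEdit README, but the hparams path and the example edit are placeholders, so verify them against the installed version:

```python
# Apply several MEMIT edits in one batch instead of one by one.
from easyeditor import BaseEditor, MEMITHyperParams

hparams = MEMITHyperParams.from_hparams("./hparams/MEMIT/llama-7b.yaml")
editor = BaseEditor.from_hparams(hparams)

metrics, edited_model, _ = editor.batch_edit(
    prompts=["The standout feature of the AcmePhone X is"],  # hypothetical product
    ground_truth=["a plastic casing"],                       # wrong feature to overwrite
    target_new=["a 120Hz OLED display"],                     # correct feature
    subject=["AcmePhone X"],
)
```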
</details>
## 20 Jul 2024
### Update
- Continued working on EasyEdit (MEMIT): reviewing the code
- Locality and portability: reading papers
  - https://arxiv.org/pdf/2305.13172
- Planned evaluation dimensions (see the sketch after this list):
  1. Time used
  2. Locality
     - Other Attribution: "what is the ___ for {product}"
     - Distracting Neighbour
  3. Portability
     - Subject Replace
     - Reversed Relation: "One of the products that have ____ is {product}"
     - One-hop Reasoning
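A sketch of instantiating the locality/portability probes from an edit record using the templates above; GPT (the teacher LLM) would generate the probes that need new content, and all field names are assumptions:

```python
# Build evaluation probes for one edit record.
def build_probes(edit: dict, other_attr: str) -> dict:
    product = edit["product"]
    return {
        # Locality / Other Attribution: query an attribute we did NOT edit.
        "locality_other_attribution": f"what is the {other_attr} for {product}",
        # Portability / Reversed Relation: the edited feature should retrieve the product.
        "portability_reversed_relation": (
            f"One of the products that have {edit['target_new']} is {product}"
        ),
        # Distracting Neighbour, Subject Replace, and One-hop Reasoning probes
        # require paraphrased/related content, so GPT generates those.
    }

probes = build_probes(
    {"product": "AcmePhone X", "target_new": "a 120Hz OLED display"},  # hypothetical
    other_attr="battery capacity",
)
```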
### Todo