# Knowledge Editing Research
## Contents
- [Files](#Files)
- [6 Mar 2024](#6-Mar-2024)
- [13 Mar 2024](#13-Mar-2024)
- [20 Mar 2024](#20-Mar-2024)
- [3 Apr 2024](#3-Apr-2024)
- [10 Apr 2024](#10-Apr-2024)
- [17 Apr 2024](#17-Apr-2024)
- [22 Jun 2024](#22-Jun-2024)
- [29 Jun 2024](#29-Jun-2024)
- [6 Jul 2024](#6-Jul-2024)
- [13 Jul 2024](#13-Jul-2024)
- [***20 Jul 2024 (Latest)***](#20-Jul-2024)
## Files
- [github](https://github.com/samuellau0802/urop)
- [google drive](https://drive.google.com/drive/folders/1COsDGJEIcrlx7d1kDXyV9tNronEKcoEs?usp=sharing) (for testing a few samples before running in server)
## Notes
<details>
<summary>6 Mar 2024</summary>
## 6 Mar 2024
### Paper reading
- Memory-Based Model Editing at Scale https://arxiv.org/pdf/2206.06520.pdf
- Fast Model Editing at Scale https://arxiv.org/pdf/2110.11309.pdf
- About E-commerce:
  - https://arxiv.org/pdf/2307.09688.pdf
  - https://aclanthology.org/2023.findings-acl.76.pdf
### Meeting
- Using KG to do knowledge editing
### Todo
- Look into data, e.g. FolkScope
</details>
<details>
<summary>13 Mar 2024</summary>
## 13 Mar 2024
### Data
- Questions: what should examples for knowledge editing look like?
  - If the edit targets a product dimension, is it not conceptualizable?
- https://www.aboutamazon.com/news/retail/amazon-rufus
### Model
- Looked into VERA
- https://huggingface.co/liujch1998/vera
### Todo
- https://nijianmo.github.io/amazon/index.html
- LLaVA
</details>
<details>
<summary>20 Mar 2024</summary>
## 20 Mar 2024
### Flow
1. Got the Amazon Reviews 2023 data
   - https://amazon-reviews-2023.github.io/main.html
   - Downloaded the Electronics category for a trial run
2. Formatted the data for input
   - Prompt includes the product name and features
   - Also included the product images
   - Converted to the LLaVA evaluation format (see the sketch after this list)
3. Evaluated the data with LLaVA
   - Reused model_vqa.py, changing only a few lines
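A minimal sketch of the step-2 conversion into the JSONL question format that model_vqa.py consumes. The Amazon field names and the output keys (`question_id`, `image`, `text`) are assumptions based on the public dataset and LLaVA repo layouts, so adjust them to the actual files:

```python
# Convert Amazon 2023 product metadata into LLaVA's eval-question JSONL.
import json

def to_llava_questions(products, out_path):
    with open(out_path, "w") as f:
        for idx, p in enumerate(products):
            feats = "; ".join(p.get("features", []))
            prompt = (
                f"This is the product '{p['title']}'. Listed features: {feats}. "
                "Based on the image and description, list the key features of this product."
            )
            record = {
                "question_id": idx,
                "image": p["image_path"],  # local path to the downloaded product image
                "text": prompt,
            }
            f.write(json.dumps(record) + "\n")

# Hypothetical product record:
products = [{"title": "USB-C Hub", "features": ["4K HDMI", "100W PD"], "image_path": "imgs/0.jpg"}]
to_llava_questions(products, "questions.jsonl")
```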
### Code
- https://drive.google.com/drive/folders/1COsDGJEIcrlx7d1kDXyV9tNronEKcoEs?usp=drive_link
### Questions
1. Using VSCode to access the cloud server?
2. The eval currently uses around 17GB of GPU memory, but the cloud only serves 11GB per GPU; any suggestions on how to split the model? (See the sketch below.)
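Regarding question 2, one common option is Accelerate-style layer sharding via `device_map="auto"`, which splits the model across however many GPUs are visible. A minimal sketch using one of the 7B models from this project as a stand-in (memory caps are illustrative; LLaVA's own loader would need the analogous arguments):

```python
# Shard a ~14GB fp16 model across two 11GB GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,            # halves memory vs. fp32
    device_map="auto",                    # place layers on whichever GPU has room
    max_memory={0: "10GiB", 1: "10GiB"},  # stay under the 11GB per-GPU limit
)
```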
### Todo
- Try another method: let LLaVA generate features, then use another model (ChatGPT, Llama, Mistral) to determine whether the product actually has each feature
</details>
<details>
<summary>3 Apr 2024</summary>
## 3 Apr 2024
### Flow
1. Formatted the Amazon 2023 dataset
2. Evaluated the data with LLaVA, asking it to generate features
   - 1,000 products asked; ~10k features generated (2 hours with an A6000)
3. Used ChatGPT to answer whether the product has each feature, yes or no (see the sketch after this list)
   - ~6k features (600 products) asked: 80% yes, 20% no
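A minimal sketch of step 3, already folding in the two fixes from the Todo below (fixed temperature, one question per call); the model name and prompt wording are assumptions:

```python
# Ask ChatGPT, one feature per call, whether a product has a given feature.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def has_feature(product_title: str, feature: str) -> str:
    prompt = (
        f"Does the product '{product_title}' have the feature '{feature}'? "
        "Answer Yes or No only."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judging
    )
    return resp.choices[0].message.content.strip()

print(has_feature("Sony WH-1000XM5", "active noise cancellation"))  # expected: "Yes"
```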
### Code and Data
- [Google Drive Link](https://drive.google.com/drive/folders/1nXhYO9Pw4DHHexTOT0Bveu0iUWehk4wT?usp=drive_link)
### Todo
- Minor fixes to the ChatGPT step
  - Set the temperature
  - Ask only one question per call
- Apart from yes or no, ask the LLM to give a reason / the correct features
- Use the "features" and "details" fields in the Amazon dataset and ask GPT/Mistral whether they are correct (yes/no)
### Goal
- Have 3 sets of data (see the schema sketch below)
  1. LLaVA-generated features, with an LLM answering whether the product has them
  2. "features" field from the dataset, with an LLM answering yes or no
  3. "details" field from the dataset, with an LLM answering yes or no
</details>
<details>
<summary>10 Apr 2024</summary>
## 10 Apr 2024
### Two roadmaps for our paper:
1. We propose a new benchmark dataset + a new multi-modal knowledge editing framework
+ A benchmark dataset focused on the e-commerce domain: multi-dimensional, multi-modal, multi-domain
+ A new KE method, **better than previous methods** (difficult)
+ Our data + Our method -> E-commerce-expert LVLM
2. We propose a new benchmark dataset + We do comprehensive experiments over current knowledge editing methods
+ A benchmark dataset focused on the e-commerce domain: multi-dimensional, multi-modal, multi-domain
+ *Ideally*, current KE methods cannot perform well on our dataset, implying that more attention and more methods should be studied in future work to benefit LVLMs in e-commerce
### Todo
- Scale up the benchmark dataset: sample 1,000 products per domain
- Run baselines on the dataset: ROME, MEMIT
+ [MEND](https://github.com/eric-mitchell/mend)
+ [MEMIT](https://github.com/kmeng01/memit) (the repo also contains ROME)
+ [ROME](https://github.com/kmeng01/rome)
### Paper
+ [Can We Edit Multimodal Large Language Models?](https://arxiv.org/pdf/2310.08475.pdf)
+ [Editing Conceptual Knowledge for Large Language Models](https://arxiv.org/pdf/2403.06259.pdf)

### Recent Interesting Papers (Optional)
+ [EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries](https://arxiv.org/pdf/2402.11324.pdf)
+ [Event-level Knowledge Editing](https://arxiv.org/pdf/2402.13093v1.pdf)
+ [ACL 2024 Paper Review](https://mighty-weaver.github.io/files/review.pdf)
</details>
<details>
<summary>17 Apr 2024</summary>
## 17 Apr 2024
### Current Limitations
- Editing multimodal models is hard, especially in the vision module (it is just an input to the LLM)
- Editing concepts is hard
  - Conceptual knowledge may sit in the attention layers of early layers (?)
### Paper Overall Pipeline
0. Select 5 domains; for each domain, select 2,000 products:
- Clothing_Shoes_and_Jewelry
- Home_and_Kitchen
- Electronics
- Industrial_and_Scientific
- Sports_and_Outdoors
1. We propose to use a stronger teacher LLM to supervise the "output" of weak LLMs.
- meta-llama/Llama-2-7b-chat-hf
- mistralai/Mistral-7B-Instruct-v0.2
- tiiuae/falcon-7b-instruct
- google/gemma-1.1-7b-it
**Task 1**: Implement a unified framework to (1) prompt the LLM about the features/shopping intentions of the product, and (2) ask ChatGPT to discriminate the plausibility of such generations.
*Prompt the LLM about the features/shopping intentions of the product*: prompt the LLM to generate product details / judge whether they are correct, and to generate intentions ("A customer buys [product]; the intention of purchasing this product is ...").
*Ask ChatGPT to discriminate the plausibility of such generations*: e.g. Mistral: "A customer buys an iPhone 14; the intention of purchasing this product is: the customer **wants to have a cool device**."
Prompt ChatGPT: "If a customer **wants to have a cool device**, would the customer purchase an iPhone 14? Answer Yes or No only. If No, also generate the rationale behind it." (See the sketch below.)
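A minimal sketch of how a weak LLM's generated intention could be rewritten into that ChatGPT plausibility question; the helper name is hypothetical and the wording follows the example above:

```python
def build_plausibility_prompt(product: str, intention: str) -> str:
    # Turn a generated intention into the Yes/No discrimination prompt.
    return (
        f"If a customer {intention}, would the customer purchase {product}? "
        "Answer Yes or No only. If No, also generate the rationale behind it."
    )

# Mistral's generation for the iPhone 14 example:
prompt = build_plausibility_prompt("an iPhone 14", "wants to have a cool device")
```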
2. We propose to integrate product/intention conceptualization and plausibility estimation into knowledge editing for improving LLM's E-commerce understanding.
- Take three products: iPhone 15, iPad 2023, MacBook Pro
- Conceptualize them into: Apple products
- Instantiate the concept into: AirPods
- "iPhone 15 can make calls"
- Substitute iPhone 15 with AirPods: *"AirPods can make calls"*
- [VERA](https://huggingface.co/liujch1998/vera) discriminates the correctness of *"AirPods can make calls"* (see the sketch after this list)
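A sketch of the conceptualize/instantiate/substitute chain for the Apple example; the dict-based concept map is an assumed representation, and the resulting claim would then be scored by VERA:

```python
# Map each concept to the products it covers (assumed representation).
concept_map = {
    "Apple products": ["iPhone 15", "iPad 2023", "MacBook Pro", "AirPods"],
}

def substitute(statement: str, subject: str, new_subject: str) -> str:
    """Swap the subject of a product statement to instantiate a concept."""
    return statement.replace(subject, new_subject)

# Instantiate "Apple products" into AirPods and transfer an iPhone 15 statement:
new_subject = concept_map["Apple products"][-1]  # "AirPods"
claim = substitute("iPhone 15 can make calls", "iPhone 15", new_subject)
# claim == "AirPods can make calls"; VERA then discriminates its correctness.
```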
3. Evaluation:
- Intrinsic evaluation: on the knowledge-editing data, evaluate locality, generality, ..., ...
- Extrinsic evaluation:
  - ProductQA (we edit product details -> improve product understanding) (difficult to improve, because many previous works have covered it)
  - IntentionQA (we edit product intentions/details -> improve purchase-intention understanding) (easier to improve: proposed by ourselves, so no competition with others)
### Todo
1. Conceptualize the products with wrong intentions/descriptions
2. Test baselines: MEMIT, ...
3. Evaluation: locality, generality, ..., IntentionQA
</details>
<details>
<summary>22 Jun 2024</summary>
## 22 Jun 2024
### Update
- LLaVA to generate the correct features
- Separate product-specific and concept-level edits
### Questions
1. How can we ensure that the conceptualized products are also misidentified by the weak LLM?
   - If the LLM is correct in the first place, there should be no need to edit?
   - Or does it not matter (?)
2. What should the correct feature/intention be?
   - We only have the wrong feature now, not the correct one (?)
   - What should the structure of the prompt be when editing?
3. Timeline
</details>
<details>
<summary>29 Jun 2024</summary>
## 29 Jun 2024
### Update
- Got the correct KE answers by asking LLaVA or ChatGPT (see the routing sketch after this list)
  - If it is a specific product, we provide the product image and ask LLaVA what the correct feature/intention should be
  - If it is a conceptualized product / product type, we ask ChatGPT to provide the correct feature/intention
  - ~30 HKD for 23k products with ChatGPT
  - ~10k products with LLaVA
- So we now have the wrong features/intentions and also the correct ones
- [Data (ChatGPT)](https://github.com/samuellau0802/urop/blob/main/correct_ans/gpt_get_correct_ans.csv)
- [Data (LLAVA)](https://github.com/samuellau0802/urop/blob/main/correct_ans/llava_answer.jsonl)
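A sketch of that routing; `ask_llava` and `ask_chatgpt` stand in for the actual inference calls, and the field names are assumptions:

```python
# Route each item to LLaVA (image-grounded products) or ChatGPT (concepts).
def get_correct_answer(item: dict, ask_llava, ask_chatgpt) -> str:
    question = f"What is the correct {item['dimension']} of {item['name']}?"
    if item.get("image_path"):            # specific product with an image
        return ask_llava(question, item["image_path"])
    return ask_chatgpt(question)          # conceptualized product / product type
```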
### Todo
- Working on testing MEMIT and other baselines
</details>
<details>
<summary>6 Jul 2024</summary>
## 6 Jul 2024
### Update
- Tried MEMIT on the default model
  - ~6 edits/min, i.e. ~3.5 days for 30k edits
- Problems:
  - Dataset quality
  - VPN session needs to be extended to run continuously
    - tmux as a workaround
  - OOM: mitigated by trimming inputs and segmenting the run; relevant knobs:
    - layers to edit
    - input length
    - batch size
### Todo
- EasyEdit
- LLaMA-Factory
</details>
<details>
<summary>13 Jul 2024</summary>
## 13 Jul 2024
### Update
- Tidied the dataset; removed some poor-quality data
- Tried out EasyEdit: it's quite straightforward to use, but I am still checking the code, as it sometimes still OOMs or takes a long time to run
- Tried reducing the layers to edit (faster) and increasing the batch size (cuts editing time considerably)
- Multi-GPU works (usually ~30GB on GPU 0 and ~16GB on GPU 1), but if someone else starts a new GPU process during editing, OOM occurs
- Regarding the dataset, I still need to construct the locality and portability inputs for the metrics; I plan to use GPT (the teacher LLM) to do so
### Todo
- Continue reviewing the code to reduce editing memory and time: the MEMIT GitHub repo has a num_edits hyperparameter that determines how many edits are applied at once, but I don't find it in EasyEdit, so MEMIT in EasyEdit currently edits one by one. Batching edits the same way should greatly reduce editing time (see the sketch below)
- Expand the dataset to include locality and portability inputs
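A sketch of batched editing through EasyEdit's `BaseEditor`; `from_hparams` and `batch_edit` follow the EasyEdit README, but the hparams path and the example edit are placeholders, so verify them against the installed version:

```python
# Apply several MEMIT edits in one batch instead of one by one.
from easyeditor import BaseEditor, MEMITHyperParams

hparams = MEMITHyperParams.from_hparams("./hparams/MEMIT/llama-7b.yaml")
editor = BaseEditor.from_hparams(hparams)

metrics, edited_model, _ = editor.batch_edit(
    prompts=["The standout feature of the AcmePhone X is"],  # hypothetical product
    ground_truth=["a plastic casing"],                       # wrong feature to overwrite
    target_new=["a 120Hz OLED display"],                     # correct feature
    subject=["AcmePhone X"],
)
```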
</details>
## 20 Jul 2024
### Update
- Continued working on EasyEdit (MEMIT): reviewing the code
- Locality and portability: reading papers
  - https://arxiv.org/pdf/2305.13172
- Planned evaluation dimensions (see the sketch after this list):
  1. Time used
  2. Locality
     - Other Attribution: "what is the ___ for {product}"
     - Distracting Neighbour
  3. Portability
     - Subject Replace
     - Reversed Relation: "One of the products that have ____ is {product}"
     - One-hop Reasoning
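A sketch of instantiating the locality/portability probes from an edit record using the templates above; GPT (the teacher LLM) would generate the probes that need new content, and all field names are assumptions:

```python
# Build evaluation probes for one edit record.
def build_probes(edit: dict, other_attr: str) -> dict:
    product = edit["product"]
    return {
        # Locality / Other Attribution: query an attribute we did NOT edit.
        "locality_other_attribution": f"what is the {other_attr} for {product}",
        # Portability / Reversed Relation: the edited feature should retrieve the product.
        "portability_reversed_relation": (
            f"One of the products that have {edit['target_new']} is {product}"
        ),
        # Distracting Neighbour, Subject Replace, and One-hop Reasoning probes
        # require paraphrased/related content, so GPT generates those.
    }

probes = build_probes(
    {"product": "AcmePhone X", "target_new": "a 120Hz OLED display"},  # hypothetical
    other_attr="battery capacity",
)
```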
### Todo