
# 1/30 Report Day: Building Generative AI Applications with Gradio

Editors: 林士桓, 正哲

[TOC]

---

## Presentation Contents

1. Teds - [Introduction to common front-end AI application frameworks](https://docs.google.com/presentation/d/1DrhlvhAERD_NCKtk_rsTjKhc6QMtnwfdQXLQOmrSPWk/edit#slide=id.p)
2. 正哲 - [LLM and Gradio in practice: language conversion and summarization tools](https://docs.google.com/presentation/d/1wx5WnmGv1HNCXNM6GUj2rYOVWeBzJR7lUQOGYzC_F84/edit#slide=id.p1)
3. 君諦 - [Gradio application examples](https://docs.google.com/presentation/d/1mLlv-zI3V_jibjbpQ54KukVGhxkx_LKFM1T2Ng53htU/edit#slide=id.p)
4. 昱誠 - Sharing from the medical field --> future outlook

---

## Course Overview

**Course link**: https://learn.deeplearning.ai/huggingface-gradio/lesson/1/introduction

* Introduction
* NLP Tasks interface
* Image Captioning app
* Image generation app
* Describe and Generate Game
* Chat with any LLM
* Conclusion

## Introduction

:::info
Gradio is a tool for building front-end interfaces quickly and with little code. It is very useful for demonstrating, testing, and evaluating machine learning models (deep learning, computer vision, speech recognition, and so on).
:::

References for Gradio usage and techniques

1. [Gradio Docs](https://www.gradio.app/docs/interface)
2. [Gradio Tutorial-ML UI Demo & Deployment](https://youtube.com/playlist?list=PLpdmBGJ6ELUJsU8e-B_QwokxOpWjzfYxZ&si=S2awgQi73ahK2Rr_)
3. 
   [Gradio Crash Course](https://youtu.be/eE7CamOE-PA?si=fLTXwiCwh7fk6-Cc)

### Interface Basics

Quickly build an interface for your ML application. Three parameters are required: `fn`, `inputs`, and `outputs`.

`gradio.Interface(fn, inputs, outputs)`

`demo.launch()`: call `launch()` to start the app.

```python=
import gradio as gr

def image_classifier(inp):
    # Dummy classifier: always returns the same label confidences
    return {'cat': 0.3, 'dog': 0.7}

demo = gr.Interface(fn=image_classifier, inputs="image", outputs="label")
demo.launch()
```

Commonly used components:

`gr.Text()` `gr.Image()` `gr.Audio()` `gr.Slider()` `gr.Blocks()` `gr.Chatbot()`

## Key Concepts and Definitions

### L1: NLP Applications with a Simple Interface

Use the transformers [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines) to build a simple summarization tool. The following uses the HuggingFace API to connect the pipeline ([HF Access Tokens](https://huggingface.co/settings/tokens)).

```python=
### Import libraries and set your API_KEY
import os
import io
from IPython.display import Image, display, HTML
from PIL import Image  # note: this shadows IPython.display.Image imported above
import base64
from dotenv import load_dotenv, find_dotenv
import openai

_ = load_dotenv(find_dotenv()) # read local .env file
hf_api_key = os.environ['HF_API_KEY']
```

Gradio interface with transformer pipeline

```python=
import gradio as gr
from transformers import pipeline

get_completion = pipeline("summarization") # defaulted model="sshleifer/distilbart-cnn-12-6"

def summarize(input):
    output = get_completion(input)
    return output[0]['summary_text']

gr.close_all()
demo = gr.Interface(fn=summarize, inputs="textbox", outputs="text")
demo.launch(share=True, server_port=5000)
```

![image](https://hackmd.io/_uploads/Sy8741FFp.png)

This is enough to produce a simple input/output interface, but the labels and hint text are rather plain.

```python=
import gradio as gr
from transformers import pipeline

get_completion = pipeline("summarization") # defaulted model="sshleifer/distilbart-cnn-12-6"

def summarize(input):
    output = get_completion(input)
    return output[0]['summary_text']

gr.close_all()
demo = gr.Interface(fn=summarize,
                    inputs=[gr.Textbox(label="Article to read", lines=5, placeholder="Paste text here...")],
                    outputs=[gr.Textbox(label="AI summary", lines=3, placeholder="Summary")],
                    title="AI Summary",
                    description="Let AI help you summarize your article",
                    )
demo.launch(share=True, server_port=5000)
```

![image](https://hackmd.io/_uploads/rJ9iueYFa.png)

As shown above, `label` and `placeholder` add hint text to the text boxes, which makes the interface more user-friendly.

**Building a Named Entity Recognition app**

This part shows how to find the entities in a piece of text.

![image](https://hackmd.io/_uploads/rJAYsJYtT.png)

```python=
import gradio as gr
from transformers import pipeline

get_completion = pipeline("ner", model="dslim/bert-base-NER")

def ner(input):
    output = get_completion(input)
    return {"text": input, "entities": output}

gr.close_all()
demo = gr.Interface(fn=ner,
                    inputs=[gr.Textbox(label="Text to find entities", lines=2)],
                    outputs=[gr.HighlightedText(label="Text with entities")],
                    title="NER with dslim/bert-base-NER",
                    description="Find entities using the `dslim/bert-base-NER` model under the hood!",
                    allow_flagging="never",
                    #Here we introduce a new tag, examples, easy to use examples for your application
                    examples=["My name is Andrew and I live in California",
                              "My name is Poli and work at HuggingFace"])
demo.launch(share=True, server_port=5000)
```

![image](https://hackmd.io/_uploads/SJGZnxKFp.png)

The entity names in a passage are identified easily. The table below lists the meaning of each label ([link](https://huggingface.co/dslim/bert-base-NER)).

| Abbreviation | Description |
| -------- | -------- |
| O | Outside of a named entity |
| B-MIS | Beginning of a miscellaneous entity right after another miscellaneous entity |
| I-MIS | Miscellaneous entity |
| B-PER | Beginning of a person’s name right after another person’s name |
| I-PER | Person’s name |
| B-ORG | Beginning of an organization right after another organization |
| I-ORG | organization |
| B-LOC | Beginning of a location right after another location |
| I-LOC | Location |

You may notice that some words get split into sub-tokens, for example Spotify --> Spot (B-ORG) ify (I-ORG). This is hard to read, so we merge these tokens back together.

```python=
import gradio as gr
from transformers import pipeline

get_completion = pipeline("ner", model="dslim/bert-base-NER")

def merge_tokens(tokens):
    merged_tokens = []
    for token in tokens:
        if merged_tokens and token['entity'].startswith('I-') and \
           merged_tokens[-1]['entity'].endswith(token['entity'][2:]):
            # If current token continues the entity of the last one, merge them
            last_token = merged_tokens[-1]
            last_token['word'] += token['word'].replace('##', '')
            last_token['end'] = token['end']
            last_token['score'] = (last_token['score'] + token['score']) / 2
        else:
            # Otherwise, add the token to the list
            merged_tokens.append(token)
    return merged_tokens

def ner(input):
    output = get_completion(input)
    merged_tokens = merge_tokens(output)
    return {"text": input, "entities": merged_tokens}

gr.close_all()
demo = gr.Interface(fn=ner,
                    inputs=[gr.Textbox(label="Text to find entities", lines=2)],
                    outputs=[gr.HighlightedText(label="Text with entities")],
                    title="NER with dslim/bert-base-NER",
                    description="Find entities using the `dslim/bert-base-NER` model under the hood!",
                    allow_flagging="never",
                    examples=["My name is Andrew, I'm building DeeplearningAI and I live in California",
                              "My name is Poli, I live in Vienna and work at HuggingFace"])
demo.launch(share=True, server_port=5000)
```

![image](https://hackmd.io/_uploads/SyGpRxYKa.png)

![image](https://hackmd.io/_uploads/r1RmZWKYa.png)

### L2: Image captioning app

:::info
The course example returns a 502 error, probably because the course endpoint is down, so I looked for a LLaVA model on Hugging Face to reproduce it.
:::

### L3: Image generation app

Use the [Stable Diffusion model](https://huggingface.co/blog/stable_diffusion) to build an image generation app. See the [introductory article on Stable Diffusion](https://huggingface.co/blog/stable_diffusion) for background.

Install the packages:

```bash
pip install diffusers==0.11.1
pip install transformers scipy ftfy accelerate
```

Load the pretrained model. The first run takes a while because the weights have to be downloaded.

```python=
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # float16 weights are meant to run on a GPU
```

```python=
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]  # image here is in [PIL format](https://pillow.readthedocs.io/en/stable/)

# Now to display an image you can either save it such as:
image.save("astronaut_rides_horse.png")

# or if you're in a google colab you can directly display it with
image
```

![image](https://hackmd.io/_uploads/SJ1NpZtKp.png)

With this we can build an app:

```python=
import base64
import io
import os

import gradio as gr
from PIL import Image

# A helper function to convert the base64-encoded image
# returned by the API into a PIL image
def base64_to_pil(img_base64):
    base64_decoded = base64.b64decode(img_base64)
    byte_stream = io.BytesIO(base64_decoded)
    pil_image = Image.open(byte_stream)
    return pil_image

def generate(prompt):
    # In the course, get_completion() calls a hosted text-to-image endpoint and
    # returns a base64 string; it is not defined in these notes. With the local
    # `pipe` loaded above, you could return pipe(prompt).images[0] instead.
    output = get_completion(prompt)
    result_image = base64_to_pil(output)
    return result_image

gr.close_all()
demo = gr.Interface(fn=generate,
                    inputs=[gr.Textbox(label="Your prompt")],
                    outputs=[gr.Image(label="Result")],
                    title="Image Generation with Stable Diffusion",
                    description="Generate any image with Stable Diffusion",
                    allow_flagging="never",
                    examples=["the spirit of a tamagotchi wandering in the city of Vienna", "a mecha robot in a favela"])
demo.launch(share=True, server_port=int(os.environ['PORT1']))
```

![image](https://hackmd.io/_uploads/ry2OwMYtp.png)

We can also use `gr.Blocks` to lay out the interface ourselves and add controls for tuning parameters, such as sliders (`gr.Slider()`) and buttons (`gr.Button()`).

```python=
def generate(prompt, negative_prompt, steps, guidance, width, height):
    # Assumption: this version reuses the local `pipe` loaded earlier (not the
    # course endpoint) so that all six inputs wired to btn.click are accepted.
    result = pipe(prompt, negative_prompt=negative_prompt,
                  num_inference_steps=int(steps), guidance_scale=guidance,
                  width=int(width), height=int(height))
    return result.images[0]

with gr.Blocks() as demo:
    gr.Markdown("# Image Generation with Stable Diffusion")
    prompt = gr.Textbox(label="Your prompt")
    with gr.Row():
        with gr.Column():
            negative_prompt = gr.Textbox(label="Negative prompt")
            steps = gr.Slider(label="Inference Steps", minimum=1, maximum=100, value=25,
                              info="In how many steps will the denoiser denoise the image?")
            guidance = gr.Slider(label="Guidance Scale", minimum=1, maximum=20, value=7,
                                 info="Controls how much the text prompt influences the result")
            width = gr.Slider(label="Width", minimum=64, maximum=512, step=64, value=512)
            height = gr.Slider(label="Height", minimum=64, maximum=512, step=64, value=512)
            btn = gr.Button("Submit")
        with gr.Column():
            output = gr.Image(label="Result")

    btn.click(fn=generate, inputs=[prompt, negative_prompt, steps, guidance, width, height], outputs=[output])

gr.close_all()
demo.launch(share=True,
            server_port=int(os.environ['PORT3']))
```

![image](https://hackmd.io/_uploads/r1PdYMYta.png)

### L4: Describe and Generate Game

:::info
There are some parts of Blocks here that I do not fully understand.
:::

### L5: Chat with any LLM

We can use `gr.Chatbot()` to build a chat-window style conversation with a bot. Below is the example from the course. Note: [Erratum] ~~`gr.ClearButton` no longer works in newer versions; the history can be cleared as follows instead. `clear = gr.Button("Clear")` `clear.click(lambda: chatbot.empty())`~~

```python=
import os
import random

import gradio as gr

def respond(message, chat_history):
    #No LLM here, just respond with a random pre-made message
    bot_message = random.choice(["Tell me more about it",
                                 "Cool, but I'm not interested",
                                 "Hmmmm, ok then"])
    chat_history.append((message, bot_message))
    return "", chat_history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(height=240) #just to fit the notebook
    msg = gr.Textbox(label="Prompt")
    btn = gr.Button("Submit")
    clear = gr.ClearButton(components=[msg, chatbot], value="Clear console")

    btn.click(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])
    msg.submit(respond, inputs=[msg, chatbot], outputs=[msg, chatbot]) #Press enter to submit

gr.close_all()
demo.launch(share=True, server_port=int(os.environ['PORT2']))
```

![image](https://hackmd.io/_uploads/SJzdO7KYp.png)

The following connects the OpenAI API:

```python=
import os

import gradio as gr
import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.environ['API_KEY']

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

def respond(message, chat_history):
    # Use the previous bot reply (if any) as a minimal form of memory
    mem = chat_history[-1][1] if chat_history else ""
    bot_message = get_completion(mem + message)
    chat_history.append((message, bot_message))
    return "", chat_history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(height=240) #just to fit the notebook
    msg = gr.Textbox(label="Prompt")
    btn = gr.Button("Submit")
    # gr.ClearButton resets the listed components (see the erratum above)
    clear = gr.ClearButton(components=[msg, chatbot], value="Clear")
    btn.click(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])
    msg.submit(respond, inputs=[msg, chatbot], outputs=[msg, chatbot]) #Press enter to submit

gr.close_all()
demo.launch(share=True)
```

![image](https://hackmd.io/_uploads/SJd9tmYFT.png)

**Streaming**

## Discussion and Reflection

1. The course mentions a distillation process. What is distillation?
    - [Knowledge Distillation: Principles, Algorithms, Applications](https://neptune.ai/blog/knowledge-distillation)
    - MIT course: [Knowledge Distillation | MIT 6.S965](https://youtu.be/tT9Lnt6stwA?si=sqDzXvrpbW2M1nv7)

![image](https://hackmd.io/_uploads/Hyao2ktKa.png)

## Introduction to the Various Uses of Transformers

In practice, using certain models on Hugging Face requires [HF Access Tokens](https://huggingface.co/settings/tokens). Reference links:

1. [Hugging Face + Langchain in 5 mins](https://youtu.be/_j7JEDWuqLE?si=8xiz4iW-mf7yNQP3)
2. [Getting Started With Hugging Face](https://youtu.be/QEaBAZQCtwE?si=2Thg0BYosbOnjp_j) (this one explains things quite clearly)
3. [Pipeline Docs](https://huggingface.co/docs/transformers/main_classes/pipelines)

The different ways to use Pipeline are summarized below:

- Text classification
- Zero-shot classification
- Text generation
- Text completion (mask filling)
- Token classification
- Question answering
- Summarization
- Translation

**Text classification**

```python=
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Sentiment analysis: the pipeline outputs positive or negative in one step
classifier = pipeline("sentiment-analysis")
res = classifier("I've been waiting for a HuggingFace course my whole life.")
print(res[0])

# The same task with an explicitly chosen sentiment model and tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
res = classifier("I've been waiting for a HuggingFace course my whole life.")
print(res[0])
```

>{'label': 'POSITIVE', 'score': 0.9598049521446228}

**Zero-shot classification**

You can define the candidate categories yourself and have the model classify a piece of text into them.

```python=
# Assign the text to one of the given categories
classifier = pipeline("zero-shot-classification")
res = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(res)
```

> {'sequence': 'This is a course about the Transformers library', 'labels': ['education', 'business', 'politics'], 'scores': [0.8445981740951538, 0.11197477579116821, 0.04342705383896828]}

The output shows the score for each category, which tells us which category the text most likely belongs to.

**Text Generation**

Extends the text of your passage.

```python=
# Print the generated text
generator = pipeline("text-generation", model="distilgpt2")
res = generator("In this course, we will teach you how to", max_length=30, num_return_sequences=2)
print(res[0])
```

> {'generated_text': 'In this course, we will teach you how to learn as much as possible about the basics of physics: The basic physics of the universe, and the'}

Text is generated starting from where the passage breaks off, enriching the content.

**Text completion (mask filling)**

Fills in the blanks in a text, similar to a fill-in-the-blank quiz.

```python=
# Fill-in-the-blank
unmasker = pipeline('fill-mask') #, model='bert-base-uncased')
res = unmasker("I am a <mask> and I handle legal-related consultations.", top_k=2)
print(res[0])
```

> {'score': 0.742207944393158, 'token': 2470, 'token_str': ' lawyer', 'sequence': 'I am a lawyer and I handle legal-related consultations.'}

Use case: when processing a prompt, a transformer can make the overall description more complete. Here the user has not stated their occupation, but `<mask>` lets the model fill it in so that the whole sentence reads as complete.

**Token classification**

A simple way to find which words in a passage are person names, place names, organizations, and so on.

```python=
# Identify the entities in the text
ner = pipeline("ner", model='dbmdz/bert-large-cased-finetuned-conll03-english', grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
```

> [{'entity_group': 'PER', 'score': 0.9981694, 'word': 'Sylvain', 'start': 11, 'end': 18}, {'entity_group': 'ORG', 'score': 0.9796019, 'word': 'Hugging Face', 'start': 33, 'end': 45}, {'entity_group': 'LOC', 'score': 0.9932106, 'word': 'Brooklyn', 'start': 49, 'end': 57}]

**Question Answering**

Uses a question-and-answer format to quickly extract a specific answer from a passage.

```python=
question_answerer = pipeline("question-answering")
res = question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn"
)
print(res)
```

> {'score': 0.6949770450592041, 'start': 33, 'end':
45, 'answer': 'Hugging Face'}

**Summarization**

A simple summarization application.

```python=
# Summarization
from transformers import pipeline

get_completion = pipeline("summarization") #, model="sshleifer/distilbart-cnn-12-6")

def summarize(input):
    output = get_completion(input)
    return output[0]['summary_text']

print(summarize("""Earlier this month, Spotify announced a massive reduction in its workforce to the tune of 17%. This is Spotify’s third round of layoffs this year alone. While this shouldn’t come as a surprise given the recent number of layoffs in the tech sector, what is surprising is that this round of layoffs came after Spotify posted its first profitable quarter since 2021. When business is booming, employers should be retaining workers, right? Not necessarily. One look at a company’s business model will tell you everything you need to know about how viable employment is with that company. Unfortunately, most of the business models that have proliferated in tech aren’t conducive to stable, long-term employment. This essay is going to analyze statements made by Spotify’s CEO that justify the company’s posture on its workforce. It reveals how C Suite executives are thinking about the future of work and how this will likely shape other sectors of the economy in the years ahead."""))
```

> Spotify announced a massive reduction in its workforce to the tune of 17% earlier this month . This is Spotify’s third round of layoffs this year alone . Spotify posted its first profitable quarter since 2021 . Most of the business models in tech aren’t conducive to stable, long-term employment .

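
The summary printed above has a space before each period, which appears to be how this model formats its output. If that matters for display, a small post-processing step can tidy it. The following is only a minimal sketch: `tidy_summary` is a hypothetical helper name for illustration, not part of transformers or Gradio.

```python
import re

def tidy_summary(text: str) -> str:
    """Remove whitespace that appears directly before sentence punctuation."""
    # Hypothetical helper: turns "alone . Spotify" into "alone. Spotify"
    return re.sub(r"\s+([.,!?;:])", r"\1", text).strip()

# Sample taken from the summary output shown above
raw = ("Spotify posted its first profitable quarter since 2021 . "
       "Most of the business models in tech aren’t conducive to stable, long-term employment .")
print(tidy_summary(raw))
```

One way to use it would be to call it on the return value inside `summarize` before the text is handed to the Gradio output box.
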
**Translation**

Look for translation models on Hugging Face ([link](https://huggingface.co/models?pipeline_tag=translation&sort=trending)).

```python=
# translation from en to zh
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
res = translator("Hugging Face is a technology company based in New York and Paris", max_length=40)
print(res[0]['translation_text'])
```

> Hugggface是一家技术公司,总部设在纽约和巴黎

## References

- [Huggingface Audio Course](https://huggingface.co/learn/audio-course/chapter0/introduction)