hehhehehhkakakkak.stupid.wednesday.report

<style>
.spinner {
  /* margin: 20px; width: 100px; */
  /* height: 100px; */
  /* background: #f00; */
  -webkit-animation-name: spin; -webkit-animation-duration: 4000ms; -webkit-animation-iteration-count: infinite; -webkit-animation-timing-function: linear;
  -moz-animation-name: spin; -moz-animation-duration: 4000ms; -moz-animation-iteration-count: infinite; -moz-animation-timing-function: linear;
  -ms-animation-name: spin; -ms-animation-duration: 4000ms; -ms-animation-iteration-count: infinite; -ms-animation-timing-function: linear;
  animation-name: spin; animation-duration: 4000ms; animation-iteration-count: infinite; animation-timing-function: linear;
}
.spinner-hover:hover {
  /* margin: 20px; width: 100px; */
  /* height: 100px; */
  /* background: #f00; */
  -webkit-animation-name: spin; -webkit-animation-duration: 4000ms; -webkit-animation-iteration-count: infinite; -webkit-animation-timing-function: linear;
  -moz-animation-name: spin; -moz-animation-duration: 4000ms; -moz-animation-iteration-count: infinite; -moz-animation-timing-function: linear;
  -ms-animation-name: spin; -ms-animation-duration: 4000ms; -ms-animation-iteration-count: infinite; -ms-animation-timing-function: linear;
  animation-name: spin; animation-duration: 4000ms; animation-iteration-count: infinite; animation-timing-function: linear;
}
@-ms-keyframes spin { from { -ms-transform: rotate(0deg); } to { -ms-transform: rotate(360deg); } }
@-moz-keyframes spin { from { -moz-transform: rotate(0deg); } to { -moz-transform: rotate(360deg); } }
@-webkit-keyframes spin { from { -webkit-transform: rotate(0deg); } to { -webkit-transform: rotate(360deg); } }
@keyframes spin { from { transform:rotate(0deg); } to { transform:rotate(360deg); } }
</style>
<p class="spinner" style="color: indigo; background: linear-gradient(45deg, #ff0000, #ff7700, #ffcc00, #33cc33, #3366cc, #9900cc); text-align: center; font-family: Consolas, sans-serif; font-size: 120px; font-weight: bold; margin-bottom: 20px;">Wednesday Project Report</p>
<p 
style="background: linear-gradient(45deg, #ff0000, #ff7700, #ffcc00, #33cc33, #3366cc, #9900cc); -webkit-background-clip: text; color: transparent; text-align: center; font-family: consolas, sans-serif; font-size: 24px; font-weight: bold; margin-bottom: 20px;">final-edit<5>th</p>
<img class="spinner" style="width: 300px;" src="https://hackmd.io/_uploads/ByqAHX4Bp.png">

---

<p>Project:</p>
<div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">o</div>
<div>
<p>MANGA PREDICTION 【Manga Plot Prediction and Generation】</p>
<div class="spinner" style="color: #f00; margin: 50px; background: #f00; width: 100px; height: 100px;"><div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">yya<div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">o</div><div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">o</div><div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;"><div class="spinner" style="color: #fff; margin: 20px; background: green; width: 30px; height: 30px;">o</div></div></div></div>
</div>

---

<p>Research Motivation</p>
<div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">o</div>
<p style="font-size: 30px;">Once upon a time, there was a naturally shy Taiwanese student named 蔡欣翰. He kept to himself, and whenever he was alone he lost himself in the world of manga, where life was full of enchanting fantasy and color. His family was not well off, though, and the financial pressure kept him from pursuing his interest freely. One day he looked deeply troubled, as if a storm were spreading through his heart. His teacher noticed the change in his mood and, to help him, sent an experienced school military instructor. The instructor learned about his situation and found a bleak household in urgent financial need. Manga was his only comfort, but the family had no money to spare for his hobby of reading it. Yet he felt that if he never got to see how a manga ended, his life would have been lived in vain. So we decided to help him through our research, so that he would not head down a path of self-destruction or become an at-risk youth.</p>

---

<p>Research Objectives</p>
<div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">o</div>

1. Use various seq2seq models to predict and draw how a manga continues and ends
2. Fulfill the dream of a poor, unfortunate student

---

<p>Research Tools</p>
<div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">o</div>

1. TensorFlow/PyTorch
2. Colab
3. DiscordFS-SFTP ([https://github.com/TWTom041/DiscordFS-SFT](https://www.youtube.com/watch?v=mwSq9dhpn8Y)[P](https://github.com/TWTom041/DiscordFS-SFTP))

---

<p>Process and Methods</p>
<div class="spinner" style="color: #fff; margin: 20px; background: indigo; width: 50px; height: 50px;">o</div>

---

<h2 class="spinner-hover" style=" font-family: Papyrus, fantasy; font-style: normal; font-variant: normal; font-weight: 400; line-height: 20px;"> dataset </h2>

<span class="spinner-hover" style="font-family: Papyrus, fantasy; font-size: 42px; font-style: normal; font-variant: normal; font-weight: 400; line-height: 20px;"> [Kaggle Japanese Manga Dataset](https://www.kaggle.com/datasets/chandlertimm/unified/data) </span>

<img class="spinner-hover" src="https://hackmd.io/_uploads/SkGCi74r6.jpg" style="width: 600px;">

---

## related research

![image](https://hackmd.io/_uploads/HJMGTXEra.png)

<img src="https://media.discordapp.net/attachments/1181782367694761994/1181782519100739715/image.png?ex=65824fbd&is=656fdabd&hm=854de47ad267b28ea5c6145865d8feeaa980cba0957529ca03d7a69b9c6d5bf7&=&format=webp&quality=lossless&width=1754&height=694">

---

<img src="https://media.discordapp.net/attachments/1181782367694761994/1181782414805192816/electronics-11-00764-g001.png?ex=65824fa4&is=656fdaa4&hm=0f5d9f68d21ef0cdb5792b989b2506c2131779187f40f65b354f5fc03c1c00cc&=&format=webp&quality=lossless&width=2074&height=1276">

---

## model

![image](https://hackmd.io/_uploads/BkYr6QNB6.png =350x400)![image](https://hackmd.io/_uploads/rJiLamVHp.png =500x400)

---

<style>
#title { background-clip: border-box; font-family: Consolas; background: linear-gradient(45deg, #ff0000, #ff7700, #ffcc00, #33cc33, #3366cc, #9900cc); }
#hahaha { background-clip: border-box; font-family: Consolas; background: linear-gradient(135deg, #ff0000, #ff7700, #ffcc00, #33cc33, #3366cc, #9900cc); }
</style>
<h2 id="title"> modifying the model </h2>
<p id="hahaha" style="font-weight: 999; -webkit-text-stroke: 2px #ff7f00;background-clip: 
border-box; background: linear-gradient(120deg, #7a42f4, #e34a33, #56b4e9, #feb24c, #4daf4a);"> 1. Use ViT to replace the encoder <br> 2. Upsample the output to turn it into an image <br> 3. Transfer learning </p>

![image](https://hackmd.io/_uploads/SJm0R7VHp.png =x300)

---

## brief inference workflow

1. image to a sequence of features
2. features to text
3. text + features to image

training:

1. train a model that can classify whether an image is manga-style.
2. apply RLHF to the generated sequence.

---

## Current Progress

----

```scala
Sequential(
  (0): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1))
  (1): BertForMaskedLM(
    (bert): BertModel(
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
    (cls): BertOnlyMLMHead(
      (predictions): BertLMPredictionHead(
        (transform): BertPredictionHeadTransform(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (transform_act_fn): GELUActivation()
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        )
        (decoder): Linear(in_features=768, out_features=30522, bias=True)
      )
    )
  )
)
```

---

<p class="spinner" style="font-size: 200px; color: pink;"> THANKS FOR LISTENING </p>
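
---

Backup slide: a rough sketch of step 1 of the inference workflow (image to a sequence of features), assuming the ViT-style idea of cutting the page into fixed-size patches. The function name `image_to_patch_sequence`, the patch size, and the toy image are made up for illustration and are not the project's code. A real ViT would also pass each patch vector through a learned linear projection and add position embeddings, which this sketch leaves out.

```python
from typing import List

# Hypothetical image layout for this sketch: image[channel][row][col]
Image = List[List[List[float]]]


def image_to_patch_sequence(image: Image, patch: int) -> List[List[float]]:
    """Split a C x H x W image into non-overlapping patch x patch tiles.

    Each tile is flattened (channel, then row, then column) into one feature
    vector. Tiles are emitted in row-major order, so the result is a sequence
    of (H / patch) * (W / patch) vectors of length C * patch * patch.
    """
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of the patch size")

    sequence: List[List[float]] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vector = [
                image[c][y][x]
                for c in range(channels)
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            ]
            sequence.append(vector)
    return sequence


# Toy 3 x 4 x 4 "page"; the pixel value encodes (channel, row, col) for easy inspection.
toy = [[[float(c * 100 + y * 10 + x) for x in range(4)] for y in range(4)] for c in range(3)]
tokens = image_to_patch_sequence(toy, patch=2)
print(len(tokens), len(tokens[0]))  # number of patches, features per patch
```

With a 2 x 2 patch the toy page becomes 4 tokens of 12 numbers each; this sequence is the kind of input a transformer encoder would consume in place of word embeddings.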