
Information Technology Industry Project Design Course — Assignment 3

Resume

Apple

Software Engineer in Natural Language Processing (NLP) and Machine Learning (ML)

website

minimum qualifications

  • Experience writing production software (Swift, C/C++, Obj-C, Python)
  • Strong machine learning (ML) fundamentals
  • B.S., M.S., or Ph.D. in Computer Science or a related field
  • 3+ years experience in relevant roles

preferred qualifications

  • Hands-on experience in using open-source ML toolkits, e.g., PyTorch, TensorFlow
  • Hands-on experience with building NLP/Generative AI applications
  • Excellent communication and organizational skills

Evaluation

Advantage

  1. Communication is marginally covered at best (I was president of the piano club(?)).
  2. Hands-on experience with PyTorch.

Disadvantage

  1. Only hold a B.S. degree.
  2. Not enough time spent learning NLP.

AIML - Machine Learning Engineer, NLP - Siri and Information Intelligence

website

minimum qualifications

  • Strong programming skills in Object-Oriented Programming and system design using Python (or other languages).
  • Experience with cloud services like AWS (S3, ECS, Lambda) and container orchestration using Kubernetes.
  • Knowledge of big data technologies like Spark, Airflow, and Trino, and familiarity with databases like Snowflake, MySQL, PostgreSQL.
  • Experience with data visualization and reporting tools such as Tableau.

preferred qualifications

  • Prior experience in machine learning or related scientific research.
  • Demonstrated ability to manage large datasets, generate synthetic data, fine-tune models, and conduct comprehensive statistical evaluations.
  • Excellent problem-solving, communication, and teamwork skills.
  • Proven experience in designing and building scalable applications and services.
  • Advanced degree in Computer Science, Machine Learning, or a related field is preferred.

Evaluation

Advantage

  1. Python

Disadvantage

  1. I meet none of the preferred qualifications.
  2. Unfamiliar with big data technologies.
  3. Have used Kubernetes before, but have since forgotten it @@

Google

Machine Learning Engineer, Gemini

website

minimum qualifications

  • Bachelor's degree in Computer Science or related technical field, or equivalent practical experience.
  • 2 years of experience with NLP/LLMs.
  • Experience with ML models, working with data, quality metrics, quality iterations.

preferred qualifications

  • Master's degree or PhD in ML.
  • Experience with RL modeling.
  • Experience with large-scale distributed model training, MLOps, ML infrastructure.
  • Strong collaboration and communication skills.
  • Strong track record of pushing state-of-the-art in ML (ideally in LLMs).

Evaluation

Disadvantage

  1. Only hold a B.S. degree.
  2. Have never built a Reinforcement Learning model.
  3. No experience with large-scale ML training.
  4. Only a bit over a year of experience with LLMs.

Conclusion

Buckle down and study.

Mock Interview

Interviewer: ER
Interviewee: EE

ER: Can you describe an NLP task and explain how you would go about solving it?
EE: There is currently a lot of research on distillation, which centers on using LLM outputs as training data.
Relatively little of it applies distillation to summarization. Among this year's conference papers, the one that impressed me most was TriSum: it uses LLMs to generate several rationales and decomposes them into subject-verb-object tuples (called triples), then trains smaller LMs to learn the relations among them.
I find decomposing an article into triples both novel and effective, but it is a pity that TriSum does not exploit the triples any further and simply trains on them directly.
Since we can obtain the relations among people, events, and things from LLMs, I think we could convert an article into a graph, have the LM first learn to construct that graph, and then learn to extract the important information from it to produce the summary, which might yield better results.
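The article-to-graph idea can be sketched in a few lines. This is my own illustrative assumption, not TriSum's actual method: the triples stand in for LLM-extracted relations, and node degree is used as a crude salience signal for picking summary-worthy entities.

```python
from collections import defaultdict

def build_graph(triples):
    """Build a directed graph from (subject, relation, object) triples."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph

def salient_entities(graph, top_k=2):
    """Rank entities by total degree (in + out) as a crude importance signal."""
    degree = defaultdict(int)
    for subj, edges in graph.items():
        degree[subj] += len(edges)
        for _, obj in edges:
            degree[obj] += 1
    return sorted(degree, key=degree.get, reverse=True)[:top_k]

# Hypothetical triples, as an LLM might extract them from an article.
triples = [
    ("Apple", "released", "Vision Pro"),
    ("Apple", "designed", "visionOS"),
    ("Vision Pro", "runs", "visionOS"),
    ("Apple", "acquired", "Shazam"),
]
g = build_graph(triples)
print(salient_entities(g, top_k=1))  # → ['Apple']
```

In a real system the LM would learn to generate this graph and the extraction step would be learned rather than degree-based; the sketch only shows the data flow from triples to salient content.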

ER: You said you mainly work on summarization. What are its open problems or recent challenges?
EE: The most famous and long-standing problem is exposure bias, which stems from a flaw in the training procedure itself. It was a well-known issue a few years ago, and several mitigation methods now exist; contrastive learning is one of the better-known ones, and after its adoption the ROUGE scores of summarization models improved substantially.
The recent challenge for summarization is mainly distillation: results on the task have plateaued, and we also lack good metrics for judging which summary is better, which is why many summarization papers still rely on human annotation.
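The contrastive approach mentioned above can be sketched as a ranking loss over candidate summaries, in the style of BRIO: candidates are sorted best-to-worst by ROUGE against the reference, and each better candidate must score higher than each worse one by a rank-scaled margin. This plain-Python sketch shows only the shape of the loss; real implementations operate on length-normalized model log-probabilities.

```python
def contrastive_ranking_loss(scores, margin=0.01):
    """BRIO-style ranking loss.

    scores: model scores for candidate summaries, pre-sorted best-to-worst
            (e.g., by ROUGE against the reference).
    Penalizes any pair where a worse candidate outscores a better one
    by less than a margin that grows with the rank gap (j - i).
    """
    loss = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            loss += max(0.0, scores[j] - scores[i] + margin * (j - i))
    return loss

# Correctly ordered candidates incur no loss; an inverted order does.
print(contrastive_ranking_loss([0.5, 0.3, 0.1]))  # → 0.0
```

Minimizing this loss teaches the model to assign higher probability to better summaries, which mitigates exposure bias because candidates are the model's own generations rather than gold prefixes.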

ER: What are your thoughts on how a summary should be evaluated?
EE: ROUGE became somewhat inadequate quite early on. Later came several metrics built on BERT and BART, such as BERTScore and BARTScore, and even GPT-based evaluators like GPTScore and G-Eval. However, recent work has found that letting GPT score directly is unstable, and G-Eval tends to give only optimistic scores (3-5). I would like to use LLMs to generate the core points of a summary and compare those against the document, which might yield a better metric.
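For context, the ROUGE-1 metric discussed above is just unigram overlap between candidate and reference. A minimal from-scratch sketch (the official toolkit additionally applies stemming and handles multiple references):

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"))  # ≈ 0.83
```

Its weakness is visible even here: a summary can share many unigrams with the reference while getting the key facts wrong, which is exactly why model-based metrics like BERTScore were proposed.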

ER: When you have multiple projects, how do you allocate your time?
EE: Since training takes time and I can roughly estimate how long one run will take, I keep the machines training as continuously as possible while handling other projects or paperwork in parallel, and I plan ahead which models need to be trained and what results to expect.