# Tiffany Ting Yu Hsu

**Data, Learning Analytics & Applied Research**

>I am currently pursuing an MSc in Applied Social Data Science at the London School of Economics and Political Science (LSE). I have worked on a wide range of projects, including the development of generative AI applications, with a primary focus on data analytics, database operations, and end-to-end analytical pipeline construction. Most of my work has been in educational big data and learning analytics, particularly in building systems to analyse, visualise, and interpret learning processes at scale. I can independently design and deliver complete projects or capstone-level systems, from problem formulation and data engineering to modelling, analysis, and deployment, although I also greatly enjoy the sense of achievement that comes from working in teams.

>My strong technical foundation enables me to rapidly adapt methods across domains. At the same time, my undergraduate double major, which included political science, and my close collaboration with industry, including participation in cross-sector AI needs interviews with Taiwan AI Academy, have strengthened my ability to translate technical insights into meaningful narratives for policy, organisational decision-making, and public communication.

👉 [GitHub](https://github.com/KingLeear)
👉 [LinkedIn](https://www.linkedin.com/in/tiffany-hsu-5ab026268)
👉 [HuggingFace](https://huggingface.co/TiffanyH)

## Education

**The London School of Economics and Political Science (LSE)**, United Kingdom
MSc in Applied Social Data Science (2025–2026)
Relevant modules: Data for Data Scientists; Applied Machine Learning for Social Science; Distributed Computing for Big Data; Computational Text Analysis and Large Language Models.

**National Chengchi University (NCCU)**, Taiwan
BA in Global Studies and Political Science (2021–2025)
Relevant modules: Data Science; Machine Learning and Databases; Discrete Mathematics; Statistics; Quantitative Research Methods; Research Methods in Political Science; AI and Digital Humanities.

## Current Work

#### Epistemic Network Analysis (ENA)

20/12/2026~
https://github.com/KingLeear/ena-philosophy

>Ongoing collaboration with a university instructor on modelling argumentative and epistemic structures in student writing. The dataset contains student-generated text and is therefore not publicly disclosed. The goal of this work is to build an analytical pipeline that allows me to examine both the argumentative structure and the thematic content of student writing, and to explore how these two dimensions interact.

<details>
<summary><strong>Objectives and Current Approach</strong></summary>

### Objectives

Specifically, I aim to:

1. Identify the discourse or argumentative function of each textual segment (e.g. Claim, Evidence, Counterclaim, Rebuttal, Lead, Position).
2. Identify the main semantic topics present in the same corpus.
3. Use Epistemic Network Analysis (ENA) to model how argumentative functions and topics co-occur, and how these co-occurrence patterns form structured networks.

Together, this approach moves beyond surface-level content analysis toward a structural view of knowledge construction and argumentation.

---

### Current Approach

My current approach is to integrate discourse classification, topic modelling, and network analysis into a single analytical pipeline. I first use a fine-tuned language model to label each text segment with an argumentative function (e.g. Claim, Evidence, Counterclaim, Rebuttal). In parallel, I apply BERTopic to identify semantic topics across the corpus. I then combine these two layers and use Epistemic Network Analysis (ENA) to model how discourse roles and topics co-occur and form structured patterns.
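
As a rough illustration of the combination step only, the sketch below counts how often each argumentative function appears in the same document as each topic. It is not the project's code: the segment records, the topic names, and the `cooccurrence_counts` helper are hypothetical, the whole document is treated as the unit of co-occurrence, and the normalisation and dimensional reduction that a full ENA involves are left out.

```python
from collections import Counter
from itertools import product

# Hypothetical labelled segments: a document id, a discourse function
# (as a classifier might assign) and a topic (as a topic model might assign).
segments = [
    {"doc": "essay_1", "function": "Claim", "topic": "free_will"},
    {"doc": "essay_1", "function": "Evidence", "topic": "free_will"},
    {"doc": "essay_1", "function": "Counterclaim", "topic": "determinism"},
    {"doc": "essay_2", "function": "Claim", "topic": "determinism"},
    {"doc": "essay_2", "function": "Rebuttal", "topic": "determinism"},
]


def cooccurrence_counts(segments):
    """Count the documents in which a function and a topic both appear."""
    by_doc = {}
    for seg in segments:
        funcs, topics = by_doc.setdefault(seg["doc"], (set(), set()))
        funcs.add(seg["function"])
        topics.add(seg["topic"])

    counts = Counter()
    for funcs, topics in by_doc.values():
        # Each (function, topic) pair counts once per document.
        counts.update(product(sorted(funcs), sorted(topics)))
    return counts


counts = cooccurrence_counts(segments)
for (function, topic), n in sorted(counts.items()):
    print(f"{function:<13}{topic:<12}{n}")
```

The resulting counts can be read as weighted edges between function nodes and topic nodes, which is the kind of input a network model then summarises.
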
This pipeline allows me to analyse not only what students write about, but also how they structure and connect their ideas.

</details>

![Docs](https://hackmd.io/_uploads/BywrxJ_Nbl.png)
![Docs](https://hackmd.io/_uploads/ByseR2DVZl.png =50%x)

## Key skills

### Python

Python is my main working language for data analysis and modelling. In addition to formal university training, I have used Python extensively in project work for data processing and **visualisation**, **machine learning**, **fine-tuning language models**, and building end-to-end analytical and modelling pipelines.

### R

I use R extensively for statistical analysis and network-based modelling. I have formal training in R through undergraduate **statistics** and **quantitative methods** courses, as well as through postgraduate **data science** courses, and I continue to use it as my main language for **ENA-based analysis**, data cleaning, plotting with **ggplot**, and **API integration**.

### Power BI

https://github.com/KingLeear/LSE_powerBI

I completed formal Power BI training through the **LSE Digital Skills Lab**, covering data modelling, transformation using **Power Query, DAX**, and the development of interactive dashboards for exploratory analysis and reporting.

### SQL

I gained formal training in SQL through data science coursework at the London School of Economics, including structured query design, **data extraction**, **aggregation, and integration for analytical workflows**.

## Published Paper

[Hsu, T. T., & Lu, O. H. (2024). Explore the Explanation and Consistency of Explainable AI in the LBLS Data Set. In *LAK* Workshops (pp. 64-72).](https://ceur-ws.org/Vol-3667/DC-LAK24-paper-8.pdf)

<details>
<summary><strong>Explainable AI for at-risk student prediction</strong></summary>

Learning Analytics (LA) is a field focusing on analyzing educational data, utilizing machine learning. One of the most discussed topics is at-risk student prediction.
However, the application of these methods for predicting students’ academic behaviors has faced criticism due to concerns about context insensitivity, potentially leading to prejudice and discrimination against students. While some methods in explainable AI (xAI) have been proposed to address these issues, there remains uncertainty regarding the consistency of their results. In response, we incorporate two popular xAI methods, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), to interpret the prediction models. These methods attribute the output of these models to individual features, providing a clearer understanding of how each feature contributes to the overall prediction. This approach is exemplified in the LBLS467 dataset, which includes data on 467 students’ academic performance and learning behaviors in computer programming courses, encompassing a range of metrics from programming behavior to self-regulated learning and language learning strategies. Concerning the consistency of interpretations derived from SHAP and LIME, analysis via Kendall’s tau coefficients reveals a moderate alignment in their feature weight rankings. Additionally, this alignment is highly statistically significant, indicating that it is unlikely to be a coincidence.

![Screenshot 2026-01-07 at 12.38.49 PM](https://hackmd.io/_uploads/rkvWVRsEZx.png)

</details>

[Tiffany, T. Y., Flanagan, B., & Owen, H. T. (2024, November). Methods of Balancing Model Explainability and Performance in Identifying At-Risk Students.
In *International Conference on Computers in Education.*](https://library.apsce.net/index.php/ICCE/article/view/4972)

<details>
<summary><strong>Model design and imbalance handling strategy</strong></summary>

This study will explore and experiment with various combinations of methods to handle data imbalance in order to address the common issue of insufficient minority samples in at-risk student prediction. Additionally, we will examine the purpose of applying computer tools to educational issues and emphasize the necessity of adhering to models with high transparency and explainability, ensuring that the decision-making process can be transparent and comprehensive in the context of learning analytics. After comparing model performance, we selected the logistic regression model combined with correlation analysis and threshold adjustment, which showed outstanding performance in UAR, G-means, and other evaluation metrics. We will analyze the reasons behind students' academic performance based on the feature importance ranking from the model, thereby establishing a high-performance and high-transparency benchmark model for the LBLS593 dataset.

![Screenshot 2026-01-07 at 12.41.38 PM](https://hackmd.io/_uploads/B1yq4CjN-l.png)
![Screenshot 2026-01-07 at 12.41.54 PM](https://hackmd.io/_uploads/ByCqECjN-g.png)

</details>

[Chuang, K. Z., Hsu, H. K., Hsu, T. T., & Lu, O. H. (2025, September). Rethinking jigsaw method with partially engaged AI. In Proceedings of the 1st *International Conference on Learning Evidence and Analytics (ICLEA)*.
Fukuoka, Japan.](https://library.apsce.net/index.php/ICLEA/article/view/5500)

## Project Experience

### Assessing the Quality of Generative AI for ESG Content

**Keywords**: Data Science, Evaluation, ESG, Annotation, Quality Assessment, Data Visualisation
Time: 4/2023 ~ 8/2024

<details>
<summary>Details</summary>

Assessing the Quality of Generative Artificial Intelligence Models for Automated ESG Content Creation (with Applied Materials 應用材料)

I led the data annotation used to evaluate the quality of AI-generated ESG content within the company’s internal knowledge management system, and conducted the data analysis. Using a content analysis framework grounded in traditional news values, I operationalised and coded quality along three dimensions: contextualisation, relevance, and redundancy. The analysis showed that the locally developed model outperformed GPT-based models on these dimensions in this organisational context.

</details>

### Unified Search Service for Education Cloud Platform

**Keywords**: Semantic Search, Retrieval, Data Analytics, Public Infrastructure, Education, Data Annotation, Carbon Emissions
Time: 4/2025 ~ 8/2025

<details>
<summary>Details</summary>

Development and Validation of a Unified Search Service for the Education Cloud Platform (with Ministry of Education 教育部)

Because large language models entail high carbon emissions and substantial computational costs, the Ministry of Education (MoE) in Taiwan does not wish to build critical public infrastructure such as the Education Cloud entirely on top of large commercial models. Instead, the Ministry aims to train, validate, and deploy smaller, locally developed language models to support tasks such as question bank retrieval, content understanding, and content generation.

As a research assistant, I:
• led data annotation through direct engagement with elementary school teachers,
• carried out the data analysis, visualisation, and reporting.

The second project explored the application of AI in the context of social, historical, and humanities education, examining how AI technologies could be used to support knowledge organisation, interpretation, and learning in non-commercial and public-interest domains. In this project, I used the story of Richard III and related historical documents as a case study to guide students in applying OCR technologies to digitise historical texts and build a searchable retrieval system for historical materials.

</details>

### AI Democratic Deliberation Interactive Exhibit

**Keywords**: Generative AI, AI Governance and Ethics, UX, Ethics, Public Engagement
Time: 7/2024 ~ 11/2024

<details>
<summary>Details</summary>

AI Democratic Deliberation Workshop: Generative AI Interactive Exhibit (with Ministry of Digital Affairs 台灣數位發展部)

LinkedIn Post

In this project, I co-developed a full-stack generative AI interactive exhibit for a public AI Democratic Deliberation Workshop run for the Ministry of Digital Affairs, designed to engage participants in discussions about AI ethics and governance through hands-on experience. The system captures user photos and transforms them into AI-generated images in real time using Stable Diffusion.

I worked across the full stack of the application, including model integration, backend orchestration, and the user-facing interaction flow. During the Taipei session, we initially deployed an earlier version of the model. For the subsequent Tainan session, we upgraded the system to use SDXL, following guidance from the then Minister of Digital Affairs, Ms. Audrey Tang, which significantly improved the user experience.

</details>

### Artificial Intelligence for All Industries: Capstone Program

**Keywords**: Industry Collaboration, Applied AI, Curriculum Design, AI Humanities
Time: 1/2025 ~ 7/2025

<details>
<summary>Details</summary>

Artificial Intelligence for All Industries: Empowering Education through AI (with Taiwan Ministry of Digital Affairs)

As part of Taiwan’s national strategy to promote AI-enabled industrial transformation through talent development, universities were encouraged to introduce industry-linked capstone projects. These projects aim to equip students with the skills to apply AI meaningfully within existing industries.

Within this context, our capstone course partnered with companies from different sectors to explore applied AI use cases. One of the projects focused on the luxury retail industry, in collaboration with Breeze and Makalot, examining how generative AI could support content creation and internal knowledge management.

As teaching assistant on the project, I:
• Led and supported a student team as teaching assistant and project lead, and engaged directly with industry partners.
• Built an open-source workflow for generative content creation.
• Guided problem scoping, methodological design, and alignment between technical development and business needs.
• Supervised the application of generative AI for content creation in the luxury retail context.
• Supported data preparation, evaluation, and interpretation of project outcomes.

The project was showcased at the university’s AI InnoFest, where it received the Audience Choice Award. The course involved 25 students as part of this talent development programme, and 6 of them secured industry roles or internships as a result of their participation, demonstrating the programme’s impact in bridging education and industry.

Repo: https://github.com/KingLeear/ComfyUi_Video_FaceRestore

</details>

### Smart Pole Data Analytics Applications Workshop

**Keywords**: Smart City, IoT, Data Analytics, Public Infrastructure
Time: 8/2025 ~ 9/2025

<details>
<summary>Details</summary>

Smart Pole Data Analytics Applications Workshop (with Institute for Information Industry 財團法人資訊工業策進會)

</details>

## Other

I go to the gym and play volleyball.