![ENP](https://f.hypotheses.org/wp-content/blogs.dir/4548/files/2018/10/ENP_Logo_All.png)

#### Elites, Networks and Power in Modern China

https://enepchina.hypotheses.org/

---

## Who am I?

- Pierre Magistry (pierre.magistry@univ-amu.fr)
- PhD in Computational Linguistics
- Postdoc in the ENP-China project

---

# Main Goals

----

## Keywords

- Elites
- Networks
- Power
- Modern Urban China

----

## Research Questions

### 1830~1949

- What "elites"?
  - how did they change
  - local / national / global
- Methodological exploration
  - size of sources
  - mixing texts, databases, networks
- Study the language(s) of the sources and how to process them

---

# The Team

----

## Interdisciplinarity

- History
- Computational Linguistics (/NLP)
- Computer Science

----

https://enepchina.hypotheses.org/our-team

----

# The Sources

----

## Digital Collections

- Press in English (Proquest, Brill)
- Press in Chinese (Shun-pao)
- Biographical dictionaries (Boorman, Who's Who, ...)
- Local Gazetteers
- Others (specific to a researcher/case study)

----

## Existing Databases

- Academia Sinica 《近現代人物資訊整合系統》
- Harvard's CBDB
- ...


----

## A Few Facts about the Sources

- already digitized (as much as possible)
- too big to be read
- multilingual (English + Chinese, from 文言 (Classical Chinese) to 「國語」 (the national language))
- some issues with OCR
- with or without punctuation

----

![](https://i.imgur.com/DWATqH5.png)

----

![](https://i.imgur.com/7vVRLsJ.png)

---

# Infrastructure

----

![](https://i.imgur.com/bXbPPIu.png =900x650)

---

# Language Processing

# Tools we are using or working on

----

## Basic Analysis

- Sentence Splitting
- Word Segmentation
- Syntactic Parsing
- Word-level Semantics (embeddings)

----

## Information Extraction

- Named Entity Recognition
- Graph Pattern Mining

---

## Case Study

![](https://f.hypotheses.org/wp-content/blogs.dir/4548/files/2018/10/boorman_couv-1-367x500.png)

---

### Experiments on Boorman's dictionary

![](https://i.imgur.com/34HYvqG.png)

![](https://i.imgur.com/4hav47x.png)

----

## Topic Modeling

Classic LDA

- with different preprocessing strategies
  - Sinograms
  - Word
  - MWE (extraction / segmentation)
- different granularities
  - article
  - sentence

----

## Other Topics

- Language Changes
- Evolution of typography
- Sinogram normalization

---

## Interest in Local Gazetteers

- main source for the early period
- all the issues of Classical Chinese Processing
- with interesting metadata
- language changes in time and space

---

# Thank you very much!

## Any questions?

![](https://i.imgur.com/gFSUf2a.png)
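
---

## Backup: Preprocessing Granularities (sketch)

The topic-modeling slide lists sinogram-level tokens and two document granularities (article, sentence). The snippet below is only a minimal sketch of how such inputs could be prepared before running LDA. The helper names (`sinogram_tokens`, `split_sentences`, `build_documents`) and the sample strings are hypothetical and not taken from the project. Word and MWE segmentation would need a dedicated segmenter and are left out.

```python
import re

# Hypothetical helpers for illustration; not the project's actual pipeline.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")      # basic CJK Unified Ideographs block
SENTENCE = re.compile(r"[^。！？]+[。！？]?")    # text run plus optional final mark


def sinogram_tokens(text):
    """Treat every sinogram as one token, dropping punctuation and spaces."""
    return CJK_CHAR.findall(text)


def split_sentences(text):
    """Split on 。！？; unpunctuated text comes back as a single unit."""
    return [s for s in SENTENCE.findall(text) if s.strip()]


def build_documents(articles, granularity="article"):
    """Return one token list per LDA 'document' at the chosen granularity."""
    if granularity == "article":
        units = articles
    elif granularity == "sentence":
        units = [s for a in articles for s in split_sentences(a)]
    else:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return [sinogram_tokens(u) for u in units]


# Made-up sample strings, not taken from the project's sources.
articles = ["上海商會開會。會長演說。", "北京大學"]
by_article = build_documents(articles, "article")
by_sentence = build_documents(articles, "sentence")
```

The resulting token lists can be handed to any LDA implementation. Note that sources without punctuation fall back to one unit per article under the `"sentence"` setting.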