# Paper Notes

---

*8th May 2025*

## Kaiping, Steiger, Chousou-Polydouri (2022) Lexedata software [Lexedata: A toolbox to edit CLDF lexical datasets](https://joss.theoj.org/papers/10.21105/joss.04140), JOSS

* [github](https://github.com/lexedata/lexedata)
* [readthedocs](https://lexedata.readthedocs.io/en/latest/tour.html)
* We should familiarize ourselves with JSON first before using Python
* We are aiming for a CLDF application that has already been used with data from Africa
* In the CLDF, we should try to identify the cognates in the data first
* As linguists, we should at least be able to identify mistakes in the raw data
* Try to get used to the command line
* Google Colab is one place where a command line is available ready-made; otherwise you can use Python directly
* In command-line examples, a leading `$` marks the prompt and is not typed as part of the command
* On Windows, paths use `\` as the separator rather than the `/` used on Linux
* To get a command line working, you also need to set up the correct environment
* We should try exploring other tools as well
* Jupyter Notebook can help us run Python code line by line
* Find the article from Mariam, and look at the word schismogenesis on how languages evolved

---

*9th May 2025*

## Kaiping & Klamer (2018) [LexiRumah: An online lexical database of the Lesser Sunda Islands](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205250), PLoS One

* None of the Lesser Sunda region languages are well studied
* There are hypotheses about the movement or migration of the languages
* The ASJP database might be useful for finding similarities
* ABVD can also be useful; it provides several hundred wordlists
* One important assumption: Basic vocabs are easier to replace
* Another important aspect: Rate of borrowing in several domains
* Cases of multiple dialects of the same language - several "languages" can actually be the same, just phonetically different
* CLLD conception picture of related concepts is one of the things
that we can create better
* Glottolog is really useful for standardizing language codes in 3-letter format
* Automated cognate detection algorithm (LexStat): it would be great to get this working
* Following the CLDF, we should use GitHub - a tool from software engineering to track every change made to the data
* GitHub should be used to freeze the current state before making any changes
* To put the data into a database format, we should use SQLite

---

*12th May 2025*

## Tadmor, U., Haspelmath, M., & Taylor, B. (2012). [Borrowability and the notion of basic vocabulary](https://www.jbe-platform.com/content/journals/10.1075/dia.27.2.04tad). Diachronica, 27, 226-246.

* Borrowable: Noun > Adjectives, Content > Function
* Borrowing likelihood = diffusional similarities
* What is basic/core vocab? Better to base the criteria on meaning
* Basic vocabulary: Swadesh 100-item list (a means to establish connections)
* Loanword Typology project (LWT): 1460 (lexical) meaning list - later developed into the World Loanword Database
* LWT project on 41 languages: on average, 24% loanwords
* Most probable loanwords: Deictic (expression dependent on context)
* Resistant to borrowing: Made-up strings of words
* The older the word, the more likely it is to be a borrowing
* Leipzig-Jakarta basic vocab list: 100 words based on borrowing probability, representation of the word, simplicity of the word, and stability of the word
* Leipzig-Jakarta was claimed to be better than Swadesh
* This paper may not contribute much, but the use of semantic fields to compare languages might be a good start for analyzing cognates (from a borrowing perspective - orthographically or phonologically) later on
* Take a look at semantic fields in WOLD (https://wold.clld.org/meaning)

## List et al. 2018, [Sequence comparison in computational historical linguistics](https://academic.oup.com/jole/article/3/2/130/5050100), J Lang Evol, 3(2):130–144

* Introduces [LingPy](https://lingpy.org/)
* Cognate detection and alignment
* Detailed tutorial!
* Wahyu walked us through it; good explanation of the alignment process and the mutations included in the models
* Many models to choose from; this paper recommends a particular one

## Hammarström, Rönchen, Elgh, Wiklund (2019) [On computational historical linguistics in the 21st century](https://uu.diva-portal.org/smash/get/diva2:1385121/FULLTEXT01.pdf), Theoretical Linguistics 45(3-4):233–245

* Measured by [WOLD](https://wold.clld.org/)
* Borrowing varies considerably by meaning category
* Useful reference for why we use longer, richer wordlists for borrowing

## Zahrer, Zgank, Schuppler (2020) [Towards Building an Automatic Transcription System for Language Documentation: Experiences from Muyu](https://aclanthology.org/2020.lrec-1.353/)

* Overview of the linguistic documentation process
* Describes a process for making linguistic data generation more efficient
* Focusses on the details of multi-field trip data collection
* "ASR4LD" model
* Google Singapore project underway to automate this

---

*13th May 2025*

## Forkel et al. (2018) [Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics](https://www.nature.com/articles/sdata2018205), Scientific Data 5:180205.

* Monika and Johannes Dellert suggested this as reading material

---

*15th May 2025*

## Geraghty, P. (2025) [Eliciting Fijian Words: A Study of Two Surveys] JSPS Research Project Symposium, Minpaku, Osaka, April 2025

* Predictable phonetic variants
* Gender differences in language use - as communalects in Japanese
* Suffix-possessed nouns in Fijian, just as in languages of Indonesia, but this only applies to body parts and family members
* Fijian was elicited by giving forms to informants and recording the local equivalent
* Equivalence in meaning - elicitation resulted in broadening, near-synonyms, terms of reference for kin, and transitive/intransitive constructions
* 1983: a 100-word list was published from 38 communalects, inspired by the Swadesh 100-word list
* Reduplication also occurs in Fijian; while in Indonesian it is only used for pluralization and emphasis, in Fijian it can also apply to verbs
* The project used 37 semantic fields for the nouns
* Problems in phonetic contrast: Fijian does not distinguish vowel end - children were not told of long vowels at school
* Pronoun choice can identify the region where speakers come from

---

*19th May 2025*

## Show and Tell day

* John's background: bioinformatics
    * R coding
    * PCA, statistics, ML
    * Using high dimensional statistics
* Chris' background: UPNG Medical Officer
    * Used R
    * MSc in Biological Anthropology
    * Human stress in PNG amongst healthy adults
    * Assess wellbeing levels, village stress levels, mental health
    * Anthropometric data
    * Genealogical data
    * Diet diversity
    * Clinical sample collection for collaborations with Melbourne, Toulouse

*20th May 2025*

## Show and Tell day

* Sena:
    * [Google Scholar](https://scholar.google.co.uk/citations?user=56JrKqkAAAAJ&hl=en&oi=ao)
    * PhD work [Is Passive Priming Really Impervious to Verb Semantics? A High-Powered Replication of Messenger Et al. (2012)](https://online.ucpress.edu/collabra/article/8/1/31055/119389/Is-Passive-Priming-Really-Impervious-to-Verb)
    * Several elicitation experiments involving participants using [gorilla experiment builder](https://gorilla.sc/)
* Wahyu:
    * Linguistic Markers in Cross-Nation Suicide Notes and Their Implications for Authenticity Verification (in press)
    * Work with George
    * 'A field report of Mawes: A language Isolate spoken in North-West New Guinea' (in press)
    * [The Role of Burdah and Ngelenggang Religious Rituals in Preserving the Loloan Malay Language in West Bali](https://ojs.unud.ac.id/index.php/kajianbali/article/view/123108)
    * The Dynamics of Vowel Sounds in the Highland and Lowland Nusa Penida Balinese Language (in press)
* Komang:
    * Linguistics/Anthropology
    * Wordlists recorded in 'Elan'
    * Community engagement
    * Collect "Ritual speech"
    * Bulian Village in Northern Bali, Community Engagement (Paper under review)
    * Project with Philippines

*21st May 2025*

* Dendi: [Google Scholar](https://scholar.google.co.id/citations?hl=id&user=8Uvg9igAAAAJ&view_op=list_works&sortby=pubdate)
    * worked on many aspects of [Mentawai](https://ejournal.brin.go.id/jmi/article/view/8502) people and language ([wiki](https://en.wikipedia.org/wiki/Mentawai_Islands_Regency))
    * Enggano Orthography
    * worked in West Papua
        * Kowiai people near Pulau Namatota, Pulau Manggawitu, Pulau Kajumerah (Austronesian language)
* Paul:
    * The Settlement of the Pacific
    * Major linguistic groups are:
        * Taiwan
        * Western Malayo-Polynesian (Philippines, much of Indonesia, Madagascar)
        * Central Malayo-Polynesian (Sundas)
        * South Malayo-Polynesian (?)
        * East Polynesian
    * Most subgroups of Austronesian are in Taiwan
    * Lapita pottery expansion approx 3kya
    * Extinct animals: Megapode - "Megavitiornis"

---

*22nd May 2025*

## Futuna mo ?uvea

* Paul on Polynesia
* Proto-Polynesian split into 3 about 3kya
* West Fijian, East Fijian, "Tokalau" Fijian (islands farther east)
* Shared innovations prove an evolutionary relationship
    * e.g. "mw" sound
* Dravidian kinship system (men marrying father's sister's daughter) is geographically structured, exists only west of a line
* Zero (0) is a valid symbol in proto-languages, not to be used in the phonetic representation of a modern language
* Place names are helpful for reconstruction

## [Complex Patterns of Admixture across the Indonesian Archipelago](https://academic.oup.com/mbe/article/34/10/2439/3952785?login=true), Hudjashov 2017, Mol Biol Evol.

* [indonesian data](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80534)

## [Investigating the origins of eastern Polynesians using genome-wide data from the Leeward Society Isles](https://www.nature.com/articles/s41598-018-20026-8), Hudjashov 2018, Sci Rep.

---

# Discussion on Phonetics

* [Robert Blust - The Austronesian languages](https://openresearch-repository.anu.edu.au/items/7f833770-bb0f-4c16-b651-44888d980d23)
* Blust, R. A. (2009). The Austronesian languages (Pacific Linguistics, 602). Research School of Pacific and Asian Studies, Australian National University.
* We should check whether his words and examples for "correct" orthography are present in our word lists, and how we represent them.
* Importance of distinguishing between **phonemic** (sound distinctions that change the meaning of a word) and **phonetic** (the exact pronunciation, regardless of whether the distinction matters for meaning)
* Check which languages have their own orthography
* Check whether these are consistent
* Page 432: the account of Ca-reduplication in Balinese is incorrect; e.g., ba-bisik-an does not mean 'whispering' but turns the verb into a noun. The Balinese pattern might not be fossilized, but lost; e.g., for ta-telu 'all three', people now just say 'ka-telu', just 'telu', or even 'konyang' ('all'), not mentioning the number at all
* Page 572: 'pacek' is a noun, not a verb as described. The verb is 'macek'
* In most Balinese words, final 'k' is pronounced as a glottal stop ʔ
* Many instances of 'e' in Balinese are pronounced 'ə'
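
The last two observations could be turned into a quick consistency check on our word lists. The following is only a rough sketch under stated assumptions: the function names and the sample forms are hypothetical, the final 'k' rule is applied to every word even though the note says it holds for "most" words, and 'e' is only flagged for manual review because the spelling alone does not tell us when it is pronounced 'ə'.

```python
# Heuristic helpers for reviewing Balinese orthographic forms.
# Assumption: word-final 'k' is realized as a glottal stop; this is a
# tendency noted in our discussion, not an exceptionless rule.

def final_k_to_glottal(form: str) -> str:
    """Return the form with a word-final 'k' replaced by 'ʔ'."""
    if form.endswith("k"):
        return form[:-1] + "ʔ"
    return form


def has_ambiguous_e(form: str) -> bool:
    """Flag forms containing 'e', which may stand for 'e' or 'ə'."""
    return "e" in form


def review_wordlist(forms):
    """Build one review row per orthographic form."""
    rows = []
    for form in forms:
        rows.append(
            {
                "orthographic": form,
                "phonetic_guess": final_k_to_glottal(form),
                "check_e": has_ambiguous_e(form),
            }
        )
    return rows


if __name__ == "__main__":
    # Sample forms taken from the notes above; a real run would read
    # the forms from our own word lists instead.
    sample = ["pacek", "macek", "bisik", "telu"]
    for row in review_wordlist(sample):
        print(row)
```

Rows with `check_e` set to true would still need a speaker or a recording to decide between 'e' and 'ə'; the script only narrows down which entries to look at.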