# Paper Notes
---
*8th May 2025*
## Kaiping, Steiger, Chousou-Polydouri (2022) Lexedata software [Lexedata: A toolbox to edit CLDF lexical datasets](https://joss.theoj.org/papers/10.21105/joss.04140), JOSS
* [github](https://github.com/lexedata/lexedata)
* [readthedocs](https://lexedata.readthedocs.io/en/latest/tour.html)
* We should familiarize ourselves with JSON before using Python
* We are aiming for a CLDF application, which has already worked with data from Africa
* In the CLDF, we should try to identify the cognates first in the data
* We, as linguists, should at least be able to identify any mistakes in the raw data
* Try to get used to command line
* Google Colab is one place where a command line is ready to use; otherwise you can use Python directly
* In command-line examples, a leading `$` conventionally marks the shell prompt; it is not part of the command itself
* On Windows, paths use `\` rather than the `/` used on Linux
* In order to get a command line working, you need to consider setting up the correct environment as well
* We should try exploring other tools as well
* Jupyter Notebook can help us run Python code step by step
* Find the article from Mariam, and look at the word schismogenesis on how languages evolved
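
The notes above on JSON and on path separators can be tried out in a few lines of Python. This is only a sketch: the metadata string and the file names are made-up placeholders, not the contents of a real CLDF dataset.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical, heavily simplified metadata; a real CLDF metadata file has many more fields.
metadata_text = '{"name": "example-wordlist", "tables": ["forms.csv", "languages.csv"]}'

# JSON text becomes ordinary Python dictionaries and lists.
metadata = json.loads(metadata_text)
table_names = metadata["tables"]

# The same path written with Linux-style and Windows-style separators.
parts = ("data", "cldf", "forms.csv")
linux_path = str(PurePosixPath(*parts))
windows_path = str(PureWindowsPath(*parts))

print(table_names)
print(linux_path)
print(windows_path)
```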
---
*9th May 2025*
## Kaiping & Klamer (2018) [LexiRumah: An online lexical database of the Lesser Sunda Islands](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205250), PLoS One
* None of the Lesser Sunda region languages are well studied
* There were hypotheses that mention the movement or migration of the languages
* ASJP database might be useful for finding out similarities
* ABVD (Austronesian Basic Vocabulary Database) can also be useful, as it covers several hundred languages
* One important assumption: basic vocabulary is harder to replace (more stable) than other vocabulary
* Another important aspect: the rate of borrowing differs across semantic domains
* Cases of multiple dialects of the same language: several "languages" can actually be the same, just phonetically different
* The CLLD picture of related concepts is one of the things we could improve on
* Glottolog is really useful for standardizing language codes (it lists the three-letter ISO 639-3 codes alongside its own glottocodes)
* Automated cognate detection algorithm (LexStat): it would be great to get it working
* Following CLDF practice, we should use GitHub, a software engineering tool, to track every change made to the data
* GitHub should be used to freeze progress (commit a snapshot) before making any changes
* To make it into a database format, we should use SQLite
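
As a rough illustration of the SQLite point above, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, language labels, and forms are invented placeholders, not a CLDF schema or real data.

```python
import sqlite3

# In-memory database for the sketch; pass a file name instead to keep the data.
conn = sqlite3.connect(":memory:")

# Hypothetical one-table layout: one row per (language, concept, form).
conn.execute(
    "CREATE TABLE forms ("
    "id INTEGER PRIMARY KEY, language TEXT, concept TEXT, form TEXT)"
)

# Placeholder rows, invented for illustration only.
rows = [
    ("lang_a", "hand", "lima"),
    ("lang_b", "hand", "lima"),
    ("lang_a", "water", "wai"),
]
conn.executemany(
    "INSERT INTO forms (language, concept, form) VALUES (?, ?, ?)", rows
)
conn.commit()

# Look up every form recorded for one concept.
hand_forms = conn.execute(
    "SELECT language, form FROM forms WHERE concept = ? ORDER BY language",
    ("hand",),
).fetchall()
print(hand_forms)
```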
---
*12th May 2025*
## Tadmor, U., Haspelmath, M., & Taylor, B. (2010). [Borrowability and the notion of basic vocabulary](https://www.jbe-platform.com/content/journals/10.1075/dia.27.2.04tad). Diachronica, 27(2), 226-246.
* Borrowability: nouns > adjectives; content words > function words
* Borrowing likelihood = diffusional similarities
* What is basic/core vocabulary? Better to base the criteria on meaning
* Basic vocabulary: Swadesh 100-item list (means to establish connections)
* Loanword Typology project (LWT): 1460 (lexical) meaning list - later developed into World Loanword Database
* LWT project on 41 languages: on average, 24% loanwords
* Most probable loanwords: Deictic (expression dependent on context)
* Resistant to borrowing: Made up strings of word
* The older the word, the more likely it is to be borrowing
* Leipzig-Jakarta basic vocab list: 100 words based on borrowing probability, representation of the word, simplicity of the word, and stability of the word
* The Leipzig-Jakarta list is claimed to be better than the Swadesh list
* This paper may not contribute much, but the use of semantic fields to compare languages might be a good start to analyze cognates (from borrowing perspectives - orthographically or phonologically) later on
* Take a look at semantic fields in WOLD (https://wold.clld.org/meaning)
## List et al. 2018, [Sequence comparison in computational historical linguistics](https://academic.oup.com/jole/article/3/2/130/5050100), J Lang Evol, 3(2):130–144
* Introduces [LingPy](https://lingpy.org/)
* Cognate detection and alignment
* Detailed Tutorial!
* Wahyu walked us through it: a good explanation of the alignment process and of the mutations included in the models
* Many models to choose from, this paper recommends a particular one
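
To make the idea of sequence comparison concrete, here is a toy sketch of normalized edit distance between two forms. It is a generic textbook measure written from scratch; it is not LexStat, not the model the paper recommends, and not LingPy's API. The function names are our own.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete seg_a
                    current[j - 1] + 1,      # insert seg_b
                    previous[j - 1] + cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer form (0.0 = identical)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


print(edit_distance("kitten", "sitting"))
print(normalized_distance("abcd", "abed"))
```

Low distances only flag candidate pairs for a linguist to inspect; similar forms can just as well be loans or chance resemblances.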
## Hammarström, Rönchen, Elgh, Wiklund (2019) [On computational historical linguistics in the 21st century](https://uu.diva-portal.org/smash/get/diva2:1385121/FULLTEXT01.pdf), Theoretical Linguistics 45(3-4):233–245
* Measured by [WOLD](https://wold.clld.org/)
* Borrowing rates vary considerably by meaning category
* Useful reference for why we use longer, richer wordlists for borrowing
## Zahrer, Zgank, Schuppler (2020) [Towards Building an Automatic Transcription System for Language Documentation: Experiences from Muyu](https://aclanthology.org/2020.lrec-1.353/)
* Overview of the linguistic documentation process
* Describes a process for making linguistic data generation more efficient
* Focusses on the details of multi-field trip data collection
* "ASR4LD" model
* Google Singapore project underway to automate this
---
*13th May 2025*
## Forkel et al. (2018) [Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics](https://www.nature.com/articles/sdata2018205), Scientific Data 5:180205.
* Monika and Johannes Dellert suggested this as reading material
---
*15th May 2025*
## Geraghty, P. (2025) [Eliciting Fijian Words: A Study of Two Surveys] JSPS Research Project Symposium, Minpaku, Osaka, April 2025
* Predictable phonetic variants
* Gender differences in language use - as communalects in Japanese
* Suffix-possessed nouns in Fijian, just as in languages of Indonesia, but only for body parts and family members
* Fijian was elicited by giving forms to informants and recording the local equivalent
* Equivalence in meaning: elicitation resulted in broadening, near-synonyms, terms of reference for kin, and transitive/intransitive constructions
* 1983: 100-word list published from 38 communalects, inspired by the Swadesh 100-word list
* Reduplication also occurs in Fijian; while in Indonesian it only marks pluralization and emphasis, in Fijian it can also apply to verbs
* The project used 37 semantic fields for the nouns
* Problems in phonetic contrast: vowel length is not distinguished in writing; children were not taught about long vowels at school
* Language choice in pronoun can identify the region where these speakers come from
---
*19th May 2025*
## Show and Tell day
* John's background: bioinformatics
    * R coding
    * PCA, statistics, ML
    * Using high dimensional statistics
* Chris' background: UPNG Medical Officer
    * Used R
    * MSc in Biological Anthropology
    * Human stress in PNG amongst healthy adults
    * Assess wellbeing levels, village stress levels, mental health
    * Anthropometric data
    * Genealogical data
    * Diet diversity
    * Clinical sample collection for collaborations with Melbourne, Toulouse
---
*20th May 2025*
## Show and Tell day
* Sena:
    * [Google Scholar](https://scholar.google.co.uk/citations?user=56JrKqkAAAAJ&hl=en&oi=ao)
    * PhD work [Is Passive Priming Really Impervious to Verb Semantics? A High-Powered Replication of Messenger Et al. (2012)](https://online.ucpress.edu/collabra/article/8/1/31055/119389/Is-Passive-Priming-Really-Impervious-to-Verb)
    * Several elicitation experiments involving participants using [gorilla experiment builder](https://gorilla.sc/)
* Wahyu:
    * Linguistic Markers in Cross-Nation Suicide Notes and Their Implications for Authenticity Verification (in press)
    * Work with George
    * 'A field report of Mawes: A language Isolate spoken in North-West New Guinea' (in press)
    * [The Role of Burdah and Ngelenggang Religious Rituals in Preserving the Loloan Malay Language in West Bali](https://ojs.unud.ac.id/index.php/kajianbali/article/view/123108)
    * The Dynamics of Vowel Sounds in the Highland and Lowland Nusa Penida Balinese Language (in press)
* Komang:
    * Linguistics/Anthropology
    * Wordlists recorded in 'Elan'
    * Community engagement
    * Collect "Ritual speech"
    * Bulian Village in Northern Bali, Community Engagement (Paper under review)
    * Project with Philippines
---
*21st May 2025*
* Dendi: [Google Scholar](https://scholar.google.co.id/citations?hl=id&user=8Uvg9igAAAAJ&view_op=list_works&sortby=pubdate)
    * worked on many aspects of [Mentawai](https://ejournal.brin.go.id/jmi/article/view/8502) people and language ([wiki](https://en.wikipedia.org/wiki/Mentawai_Islands_Regency))
    * Enggano Orthography
    * worked in West Papua
    * Kowiai people near Pulau Namatota, Pulau Manggawitu, Pulau Kajumerah (Austronesian language)
* Paul:
    * The Settlement of the Pacific
    * Major linguistic groups are:
        * Taiwan
        * Western Malayo-Polynesian (Philippines, much of Indonesia, Madagascar)
        * Central Malayo-Polynesian (Sundas)
        * South Malayo-Polynesian (?)
        * East Polynesian
    * Most subgroups of Austronesian are in Taiwan
    * Lapita pottery expansion approx 3kya
    * Extinct animals: Megapode - "Megavitiornis"
---
*22nd May 2025*
## Futuna mo ʻUvea
* Paul on Polynesia
* Proto-Polynesian split into 3 about 3kya
* West Fijian, East Fijian, "Tokalau" Fijian (islands farther east)
* shared innovations are evidence of an evolutionary relationship
* e.g. "mw" sound
* Dravidian kinship system (men marrying their father's sister's daughter) is geographically structured; it exists only west of a line
* zero (0) is a valid symbol in proto languages, not to be used in phonetic representation of modern language
* place names helpful for reconstruction
## [Complex Patterns of Admixture across the Indonesian Archipelago](https://academic.oup.com/mbe/article/34/10/2439/3952785?login=true), Hudjashov 2017, Mol Biol Evol.
* [indonesian data](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80534)
## [Investigating the origins of eastern Polynesians using genome-wide data from the Leeward Society Isles](https://www.nature.com/articles/s41598-018-20026-8), Hudjashov 2018, Sci Rep.
---
# Discussion on Phonetics
* [Robert Blust - The Austronesian languages](https://openresearch-repository.anu.edu.au/items/7f833770-bb0f-4c16-b651-44888d980d23)
* Blust, R. A. (2009). The Austronesian languages (Pacific Linguistics, 602). Research School of Pacific and Asian Studies, Australian National University.
* We should look to see whether his words and examples for "correct" orthography are present in our word lists, and how we represent them.
* Importance of distinguishing between **phonemic** (the sound distinctions that change the meaning of a word) and **phonetic** (the exact pronunciation, whether or not the distinction matters for meaning)
* Check which languages have their own orthography
* Check whether these are consistent
* Page 432: the account of Ca-reduplication in Balinese is incorrect; i.e., ba-bisik-an does not mean 'whispering'; the pattern turns a verb into a noun. The Balinese pattern might not be fossilized, but lost; i.e., ta-telu means 'all three', but now people just say 'ka-telu', plain 'telu', or even 'konyang' ('all') without mentioning the number at all
* Page 572: 'pacek' is likewise a noun, not a verb as described. The verb is 'macek'
* In most Balinese words, final 'k' is pronounced as a glottal stop ʔ
* Many instances of 'e' in Balinese are pronounced 'ə'
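
One way to start checking how our word lists represent these sounds is a small rule-based helper. The sketch below applies only the final-'k' observation from the notes above, treats it as exceptionless (the notes say "most"), and leaves 'e' untouched because not every 'e' is 'ə'. The function name is hypothetical and the output is a rough aid for comparison, not a full phonetic transcription.

```python
def final_k_to_glottal(word):
    """Replace a word-final orthographic 'k' with the glottal stop symbol."""
    if word.endswith("k"):
        return word[:-1] + "ʔ"
    return word


# Orthographic forms taken from the notes above.
for form in ["pacek", "macek", "telu"]:
    print(form, "->", final_k_to_glottal(form))
```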