---
title: Introduction to Linguistics WS 19/20
description: A recap of the Introduction to Linguistics lecture, held in winter term 2019/2020 at TU Darmstadt.
image: https://i.imgur.com/dhbRTiR.png
---

:::info
Created to the best of my knowledge and belief, but with no guarantee of completeness or correctness :v:
:::

| #Lecture | Name                                              | #Slides | Done         | Lexicon created     |
| -------- | ------------------------------------------------- | ------- | ------------ | ------------------- |
| 01       | Language: A preview                               | 25      | $\checkmark$ | $\checkmark$        |
| 02       | Corpora and corpus tools                          | 45      | $\checkmark$ | $\checkmark$        |
| 03       | Phonetics                                         | 44      | $\checkmark$ | $\checkmark$        |
| 04       | Phonology                                         | 41      | $\checkmark$ | $\checkmark$        |
| 05+06    | Morphology                                        | 81      | $\checkmark$ | $\checkmark$        |
| 07       | Syntax                                            | 82      | $\checkmark$ | $\checkmark$        |
| 08+09    | Data analysis, visualization<br> and presentation | 52      | $\checkmark$ | $\checkmark$        |
| 10       | Semantics                                         | 55      | $\checkmark$ | $\checkmark$        |
| 11       | Language variation and change                     | 46      | $\checkmark$ | $\checkmark$        |
| 12       | Language Classification                           | 39      | $\checkmark$ | $\checkmark$        |
|          | **Total:**                                        | 510     | $\checkmark$ | see end of document |

[toc]

# 01: Language Introduction

## Language

Language is a **_system of conventionalised symbols_** which we use for communication.

Languages are **_rule-governed structures_**. Those rules:
- reflect the _systematic structure_ of the language
- represent _observed regularities_ of language behaviour

### Properties

- **Arbitrariness**
  The relationship between form (sounds) and meaning (concept) is arbitrary. Different sounds can transmit the same message (e.g. in different languages). Exceptions:
    - **Sound symbolism:** The sounds of words are suggestive of their meaning
    - **Onomatopoeia:** The sounds of words imitate the sounds of nature. _(E.g. cuckoo)_
- **Symbolism**
  The relationship between words and objects or things we experience is symbolic. _(E.g.
table symbolises a type of furniture.)_
- **Creativity**
  Novel utterances can be created in response to new thoughts, experiences, and situations. Their number is infinite (however, constraints on creativity exist!).

*[utterances]: things that are said; spoken or written statements

## Grammar

Grammar is about elements (words, sounds, etc.) and the rules for combining those elements.
* All languages have a grammar
* All grammars are equal in terms of communicative expressivity

The search for a **universal grammar** motivates the study of language universals. It tries to identify grammatical properties common to human language as a whole (across different spoken languages) _to gain insight into the workings of human cognition._

### Components

* **Phonetics:** Articulation and perception of speech sounds.
* **Phonology:** The patterning of speech sounds.
* **Morphology:** Word structure and formation.
* **Syntax:** Sentence structure and formation.
* **Semantics:** The interpretation of words and sentences.

## Linguistics

Linguistics is the discipline concerned with the study of language. It can be categorized along several lines:

* **Descriptive vs prescriptive**
  _Descriptive linguistics_ aims at describing the language. Its methods are systemization and generalization. _Prescriptive linguistics_ prescribes "what people should say".
* **Synchronic vs diachronic**
  _Synchronic linguistics_ looks at language as it is at one particular point in time, while _diachronic linguistics_ looks at the development of language over time.
* **Theoretical vs applied**
  _Theoretical linguistics_ aims at discovering general principles and formulating theories about the structure of language. _Applied linguistics_, meanwhile, deals with the application of linguistic concepts, e.g. to language teaching, translation, and artificial intelligence.
## Summary slide

* we all have linguistic intuitions
* language users have knowledge that accounts for complex linguistic patterns
* language is a rule-governed system
* modelling these rules involves characterising and describing patterns as rules

# 02: Corpora and corpus tools

## Corpus design

General characteristics of corpora are:
* representativeness (of the language to study)
* finite size
* machine-readable form
* standard reference (a corpus serves as a reference for the variety it represents)

Corpora can be of any form: **Spoken, written or multimodal**. They can also be differentiated in terms of data collection:
* **Monitor corpora** are datasets with a variety of materials. They grow over time. Any oddities disappear as they grow in size. Example: The web.
    * Massive amount of data
    * Problems:
        * no control over 'sociolinguistic variables' (e.g. age, gender, dialect of speaker)
        * errors in the data
        * results depend on search engine
* **Balanced corpora** are chosen such that they are representative of a particular type of language. They should be
    * _balanced_ in regard to different types of language and speakers
    * _representative_ of as many varieties as possible
    * _comparable_ to other corpora

**Opportunistic corpora** are neither balanced nor representative. They can still be useful. E.g. corpora of extinct languages.

### Some specific differentiation

* **general** vs **specialized**
    * balanced in terms of register, domain, variety, both written & spoken data
    * genre specific (e.g.
corpus of professional spoken American English)
* **synchronic** vs **diachronic**
    * data from one specific time period $\leftrightarrow$ different time periods
* **learner corpora** vs **native corpora**
    * L2 learner data $\leftrightarrow$ native speaker data
* **developmental corpora**
    * Data from L1 learners or child language data
* **regional corpora**
    * sample one variety $\leftrightarrow$ corpora that sample more than one variety of data

## Corpus annotation

Annotation adds linguistic and non-linguistic information to the corpus.

Pros:
* easier extraction of information
* stable base for linguistic analyses (as base for future research) $\rightarrow$ reusability

Cons:
* Annotation is not always consistent and error-free
* imposes a linguistic analysis on the researcher $\rightarrow$ makes corpora expensive and thus less accessible

**Possibilities of annotation**
* Fully automatic annotation
* Manual annotation (time-consuming and only feasible for small corpora)
* Semi-automatic annotation (includes post-editing by the researcher)

**Types of annotation**
* Parts-of-speech (POS) tagging
* Lemmatization: Reduce the forms of a word to its lemma (does, did, doing $\rightarrow$ do)
* Syntactic parsing: Analyze corpus into sentence constituents
* ...

*[sentence constituents]: parts of a sentence

### POS-tagging

As an example:
```
That must have been quite a day for you when you were a child ?
```
becomes
```
That_DD1 must_VM have_VHI been_VBN quite_RG a_AT1 day_NNT1 for_IF you_PPY when_CS you_PPY were_VBDR a_AT1 child_NN1 ?_?
```
using the UCREL CLAWS7 tagset.
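
The `word_TAG` format of the tagged sentence above is easy to process by machine, which is part of what makes annotated corpora reusable. The following is a minimal Python sketch, not part of the lecture material: the function name `parse_tagged` is made up for illustration, and the code only assumes that each token is separated by whitespace and that the tag follows the last underscore.

```python
from collections import Counter

# The CLAWS7-tagged example sentence from the notes.
tagged = (
    "That_DD1 must_VM have_VHI been_VBN quite_RG a_AT1 day_NNT1 "
    "for_IF you_PPY when_CS you_PPY were_VBDR a_AT1 child_NN1 ?_?"
)


def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits at the LAST underscore, so a word that itself
        # contains an underscore keeps it.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(tagged)

# How often does each tag occur in the sentence?
tag_counts = Counter(tag for _, tag in pairs)

# Which words carry the tag PPY (assigned to "you" in the example)?
ppy_words = [word for word, tag in pairs if tag == "PPY"]

print(pairs[:3])
print(tag_counts.most_common(2))
print(ppy_words)
```

Once the pairs are available, typical corpus questions (tag frequencies, all words with a given tag) reduce to simple counting and filtering.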
### Some major English corpora

* ICE: The International Corpus of English
* ICLE: The International Corpus of Learner English
* LOCNESS: Louvain Corpus of Native English Essays
* ARCHER: A Representative Corpus of Historical English Registers
* The Brown Family of Corpora
* FRED: The FReiburg English Dialect Corpus

### Online corpus tools

* FRED Interactive Database
* Corpus of Global Web-Based English
* COCA: Corpus of Contemporary American English
* BNC: The British National Corpus

## Summary: What corpora are good for

* Create a dictionary
* Create a grammar for a language
* Describe how the language works
* Teach:
    * common grammatical patterns, lexical variation
    * use naturalistic language data
* Study language acquisition
* Many computational tasks: machine translation, web search, ...
* Questions one might answer with corpora:
    * What are the most frequent words in Spanish?
    * Do men swear more frequently than women?
    * What is the difference between spoken and written French?

*[naturalistic]: true to life

## Antconc

:::info
Redo the Antconc-Handout at this point when learning for the exam!
:::

# 03: Phonetics

Phonetics is the study of the characteristics of human sound making. It provides methods for their:
* description
* classification
* transcription

The object of study is **phones/sounds**, regardless of their meaning. They are represented in square brackets [ ].

## Introduction

Subfields of phonetics exist:
* **Articulatory phonetics** studies how speech sounds are made (by the vocal organs)
* **Acoustic phonetics** studies the physical properties of speech sounds as transmitted between mouth and ear
* **Auditory phonetics** is the study of the perceptual response to speech sounds. Involved are the ear, auditory nerve and brain.

*[perceptual]: relating to perception; perceived

The **McGurk Effect** demonstrates that both auditory and visual input play a role in the interpretation of sounds.

### Sound Segments

Segments are individual speech sounds.
![](https://i.imgur.com/8TqKTwj.png)

_thought_ has 3, _lexicon_ has 8 segments.

Speech is **continuous**. There are no pauses between the sounds.
$\rightarrow$ still, speakers and listeners have the linguistic knowledge to identify individual segments.

**Invariance of speech sounds** exists in human language. For example, the _[p]_ sound is similar enough across human languages to be given the same symbol. But the _[p]_ and _[t]_ sounds are distinct enough to be represented by different symbols.

## Transcription

The **IPA (International Phonetic Alphabet)** has two objectives:
- to transcribe all speech sounds in all languages
- to represent each sound of speech with a different symbol

English orthography does not conform to the "one sound - one symbol" principle:
- Different sounds, same orthographic symbol (beard, hear, tear)
- Same sound, different orthographic symbols (be, sea, bee, seize)

**Silent letters**
One orthographic symbol, no sound: De**b**t, lis**t**en, **k**now, colum**n**. Sometimes their pronunciation is optional: of**t**en, san**d**wich, a**l**right. Conversely, a sound may have no letter of its own: use, futile, cute [k**j**ute]

**Diacritics**
**Diacritics** can be used to avoid introducing new symbols. A diacritic is a written mark used in conjunction with a symbol. Its aim is to indicate the quality of the represented sound.

In principle, two **types of transcription** exist:
* **Phonetic transcription**: Uses diacritics, but is rarely used outside of phonetic publications.
  ![](https://i.imgur.com/EoKGote.png)
* **Phonemic transcription**: The simplest type of transcription
  ![](https://i.imgur.com/M7i0elG.png)

## The speech organs

We do not have speech organs that were developed for speech only. Their primary function is biological. They consist of:
* **lungs**: supply air for speech
* **larynx**: produces voice for speech sounds ($\rightarrow$ source of sound).
The vocal folds are inside the larynx, as is the **glottis** (the space between the vocal folds)
* pharynx: A first filter
* oral cavity: A second filter
* nasal cavity: A third filter

Different **glottal states** exist:
* **voicelessness**, where the vocal folds are pulled apart: **s**ee, **h**ug, **f**lag
* **voicing**, where the air vibrates the vocal folds that are brought close together: **z**ip, **j**ug, **v**eal

*[larynx]: voice box
*[pharynx]: throat

## Types of sound

Two major types of sounds exist:
* **Consonants**
    * the airflow may be blocked, obstructed or diverted
    * can be _voiced_ or _voiceless_ (vocal folds may or may not vibrate)
    * do not function as _nucleus_ of a syllable and thus occur at its margins
* **Vowels**
    * the airflow is relatively free
    * vowels are always _voiced_ (vocal folds vibrate)
    * function as _nucleus_ of a syllable. They are _syllabic_.

### Consonants

Categories to describe consonants include:
* state of the vocal folds: _voiced_ vs _voiceless_
* place of articulation
* manner of articulation

**Obstruents**
Obstruents are sounds where the airflow is strongly obstructed. They come in voiced/voiceless pairs. Types include:
* **plosives (or stops)**
  A complete closure of airflow in the vocal tract is suddenly released $\rightarrow$ explosive sound. The air flows only through the vocal tract. Places of articulation:
  ![](https://i.imgur.com/NeTHlW6.png)
* **fricatives**
  Articulated by close approximation of two articulators so that the air stream is partially obstructed $\rightarrow$ turbulent airflow. Places of articulation are:
  ![](https://i.imgur.com/0N5uHzL.png)
* **affricates**
  Sequence of a stop immediately followed by a homorganic fricative. Possible place of articulation is:
  ![](https://i.imgur.com/z4u0Wgr.png)

**Sonorants**
Sonorants are sounds where the airflow is relatively unobstructed. They are usually voiced. Types include:
* **nasals**
  Complete obstruction of the airflow in the mouth makes the air escape through the nasal cavity.
Places of articulation:
![](https://i.imgur.com/8m8hcBo.png)
* **liquids**
  Combination of obstruction and simultaneous flow of air through oral cavity. Places of articulation:
  ![](https://i.imgur.com/Wqom8F5.png)
* **semi-vowels (or glides)**
  Similar to vowel articulation but it moves rapidly to another articulation or quickly terminates it. Places of articulation:
  ![](https://i.imgur.com/vWx046M.png)

**A summary**
![](https://i.imgur.com/Z9nOEeB.jpg)

### Vowels

* Monophthongs: Have no change in quality
* Diphthongs: Have a change in quality within a single syllable. The tongue moves toward a glide position.

Another differentiation is possible into **tense and lax vowels**.
**Tense vowels** are produced with a general tension of the speech muscles. Those are usually long vowels and diphthongs.
**Lax vowels** are produced with a more relaxed speech muscle movement. Those are usually short vowels.

The **degree of tongue raising** (high - mid - low) is called the vertical movement of the tongue.
The **part of the tongue which is raised** (front - central - back) is called the horizontal movement of the tongue.
The **lip movement** can be rounded, open or spread.

The combination of those properties results in this schema:
![](https://i.imgur.com/XnwJZcH.png)

**English monophthongs**
![](https://i.imgur.com/y2yItpj.png)

**English diphthongs**
![](https://i.imgur.com/2kAhomt.png)

# 04: Phonology

Phonology is the study of sound systems of languages. It is concerned with the **function and patterning** of sounds.

Segments whose function is to contrast forms/words are called **phonemes**. These words for example differ only in their initial consonant segments:
![](https://i.imgur.com/KC8hWfB.png)
or their vowel segments:
![](https://i.imgur.com/RiTFCz2.png)

Phonemes
: can be defined as the smallest linguistic unit capable of distinguishing meaning.

phonemic
: Phonetic differences that are linguistically significant are called _phonemic_. E.g.
(**m**oon, **n**oon) $\rightarrow$ /m/, /n/ are phonemes.

Minimal pair
: consists of two words with distinct meanings that differ only by one phoneme occurring in the same position in the word. E.g. _(**p**in, **b**in)_ form a minimal pair, _(feel_ and _foot)_ do not.

Minimal set
: consists of two or more words that differ only in one phoneme occurring in the same position in the word. E.g.
  ![](https://i.imgur.com/KC8hWfB.png)

Allophones
: are phonetically conditioned, predictable variants of a phoneme. They occur either in **_complementary distribution_** or **_free variation_** and must share either the same place or manner of articulation. E.g.
  ![](https://i.imgur.com/1dQ1bUe.png)

Environment
: is the phonetic context in which a sound occurs. (e.g. /d/ and /t/ are articulated as flaps when they occur between two vowels).

Complementary distribution
: Allophones in complementary distribution do not appear in the same phonetic environment.^[Naturally, since in the same environment an allophone, like the different variants of /l/ here, would always be pronounced the same way.]
  ![](https://i.imgur.com/9S3C2vn.png)

phonetic similarity
: Sounds that represent different realisations of the same phoneme must be phonetically similar.

Free variation
: when two sounds occur in the same _phonetic environment_ and do not cause a change in meaning:
  ![](https://i.imgur.com/arwBBBH.png)

Distinctive feature
: is what distinguishes one phoneme from another. Distinctive features cause a change in meaning. E.g. the distinctive feature of **_voicing_**:
  ![](https://i.imgur.com/aweto0J.png)
: distinctive features can be present (+) or absent (-). Phonemes must differ by at least one feature.
  ![](https://i.imgur.com/VYjencg.png)

Phonetic class
: is a group of sounds whose members share one or more phonetic features.

A major goal of phonology is to formulate general statements about sound patterns.

## Differences across phonological systems

Sound differences can be **redundant** or **distinctive**.
| Thai | English |
| ---- | ------- |
| ![](https://i.imgur.com/1mhZ65M.png) | ![](https://i.imgur.com/vF2sBLO.png) |
| different meanings <br>= distinctive | same meaning <br>= redundant |

There may exist **segmental** and **sequential constraints** on distinctive features. A segmental constraint (sequential constraint example omitted):
![](https://i.imgur.com/ELae2Z4.png)

Features that extend over more than one segment are part of **suprasegmental phonology**. Examples include stress^ and pitch^.

*[stress^]: emphasis
*[pitch^]: tone height

### Syllables

A syllable is a phonological unit composed of one or more segments. A syllable must contain a **nucleus**. Its structure is as follows:
1. Onset (beginning of a syllable)
2. Rhyme (consisting of _nucleus_ and _coda_)

Nuclei often contain one vowel (V) while onsets and codas usually consist of only one consonant (C). Onsets may contain up to 3 consonants and codas up to 4 consonants. Possible structures include:
* CV
* CVC
* CCCV
* CCCVCCCC

Phonotactics
: The sequential arrangement of phonological segments which occur in a language. It is part of a speaker's knowledge of grammar. E.g.:
    * Possible initial consonant clusters in English:
      ![](https://i.imgur.com/qdqjdT8.png)
    * The first segment in a 3-consonant cluster is always an _s_, followed by a _voiceless stop_, followed by a _liquid_ or _glide_.

Stress
: The degree of force employed in producing a syllable $\rightarrow$ greater or lesser prominence of the syllable. Example:
  ![](https://i.imgur.com/UF4i4YX.png)
: The stress in English **can be distinctive**: Disyllabic words or compounds vs adj. + noun.

Disyllabic words
: represent either nouns/adjectives or verbs, depending on stress. E.g.:
  ![](https://i.imgur.com/ukc6Yty.png)

Compounds vs adj.
\+ noun
: * Whíte House vs white hóuse
  * bláckboard vs black bóard

Intonation
: Pitch may be a phonemic feature in tone languages (like Chinese) and thus differentiate between different meanings:
  ![](https://i.imgur.com/gftwfpD.png)

## Articulatory processes

Various processes take place. Some are presented here:

Vowel reduction
: Unstressed vowels tend to be reduced or weakened. In English this especially affects monosyllabic function words.
  ![](https://i.imgur.com/38QATD3.png)

Assimilation
: One sound influences the articulation of a neighboring sound such that they become more alike
    * progressive
    * regressive: ![](https://i.imgur.com/YcK5HTE.png)
    * reciprocal: ![](https://i.imgur.com/KKm3iT9.png)

Dissimilation
: Two neighboring sounds become less similar: ![](https://i.imgur.com/2IWtY2d.png). Many speakers break up the sequence of three fricatives with a stop.

Elision
: Sounds are omitted/deleted in certain environments, e.g. vowels or consonants in unstressed syllables:
  ![](https://i.imgur.com/jn6zLH6.png)

Insertion
: A sound is inserted within an existing string of segments, as in English with the intrusive /r/ in _law and order_.

# 05: Morphology

Morphology
: The study of word structure and of the rules involved in word formation.

## What is a word?

**Words** are part of a speaker's **linguistic knowledge**. A six-year-old child already knows ~13,000 words. But words are not enough to know a language ($\rightarrow$ e.g. in spoken language there are no pauses between most words).

### What does it mean to 'know a word'

How do we distinguish between words?
- not **pronunciation**:
  _to, too_ $\Rightarrow$ same pronunciation
  _a door, adore_ $\Rightarrow$ different pronunciation
- not **meaning**:
  _to, too_ $\Rightarrow$ different meaning
  _teacher, someone who teaches_ $\Rightarrow$ same meaning
- not **position**:
  _The **dog** chased the cat._
  _The cat was chased by the **dog**._ $\Rightarrow$ different positions

Using a word requires 4 kinds of information:
- _Phonological information_
  Its sounds and their sequencing
- _Semantic information_
  Its meanings
- _Morphological information_
  How related words are formed (e.g. plural forms or past tense)
- _Syntactic information_
  Its category and how to use it in a sentence

### Lexical categories

**Content words** are nouns, verbs, adjectives and adverbs. They typically **denote concepts** such as _objects, actions, attributes_. They are an _open class_, i.e. words can be added.

**Function words** are conjunctions, prepositions, articles, pronouns. They have little or no lexical meaning and **specify grammatical relations**. They are a _closed class_.

Function words are processed differently in our brain. We saw this in the example where we had to count the 'F's in a sentence and **systematically overlooked** the 'F's in function words.
![](https://i.imgur.com/qNNmn5X.png)

## Word parts

Morpheme
: Morphemes are the smallest meaning-bearing units of language. They are an arbitrary combination of sound and meaning, and their meaning must be constant.
: **Simple words** consist of one morpheme _(e.g. orange, uncle, ask)_. They are called **monomorphemic**. Words like un-true, tru(e)-th-ful-ly are **complex words** consisting of multiple morphemes.

![](https://i.imgur.com/UMFqrgT.png)

We categorize morphemes into 3 big categories:

Lexemes
: can occur in isolation and be words by themselves. They're also called _free morphemes_.
: Grammatical lexemes seem to correspond to function words while lexical lexemes seem to correspond to content words.
Affixes
: need to be attached to another morpheme. They can never appear on their own and are also called _bound morphemes_.

Allomorphs
: are alternate realisations of a single morpheme. _(i.e. same morpheme, different realisation)_.
: Allomorphs can be **_phonologically conditioned_**:
  ![](https://i.imgur.com/E3tGIqz.png)
: as well as **_morphologically conditioned_**:
  ![](https://i.imgur.com/4ytxAdC.png)

All of them can again be categorized into **_derivational morphemes_**, which add lexical information and can change the lexical category of the word, and **_inflectional morphemes_**, which can only add grammatical information but not lexical information.

Special types of affixes include:

Prefixes
: They occur before other morphemes. E.g. _pre-_ as in _pre-judge_.

Suffixes
: They follow other morphemes. E.g. plural marker _-s_ as in _dog-s_.

Infixes
: Morphemes which are inserted into other morphemes. E.g. _-fucking-_ as in _fan-fucking-tastic_.

Circumfix
: Morphemes that are attached to a base both initially and finally _(i.e. they enclose the base)_.

---

Special types of morphemes that do not fit into the above categorization are:

Portmanteau morphemes
: One morph has several different meanings (e.g. the _-s_ suffix can indicate plural or 3rd person present tense)

Unique morphemes
: E.g. _cran-_, _huckle-_, _rasp-_. They are bound, i.e. have no independent, constant meaning and occur in _compounds_ to distinguish meaning. They also have no grammatical meaning (thus they are not affixes). They are **_bound roots_**.

Root
: is the minimal lexical unit which cannot be analyzed any further. Roots carry the major component of meaning, e.g. _**hunt**-er_, _re-**read**_. Roots can usually stand alone as a word, except for **_bound roots_**.

> **Warning: Ambiguous terminology** [color=red] [name=Prof]

stem
: a root combined with an affix. E.g. **_hunt-er_**, **_believ(e)-able_**

base
: any form to which an affix is attached. E.g. _**blacken**-ed_.
Sometimes the base of a word is also the root: _**table**-s_

## Derivational morphology

_A sub-branch of morphology which studies how new words are created from existing words. Derivation deals with the process of word formation._

![](https://i.imgur.com/2Su8AKS.png)

### Major processes

Derivation
: features two different categories: Prefixation and Suffixation.

Prefixation
: Prefixes are attached. The lexical category may or may not change because of that.
: _un-_ + Adj $\Rightarrow$ Adj _(unkind, untrue, ...)_
  _en-_ + Adj $\Rightarrow$ Verb _(enlarge, enrich, ...)_

Suffixation
: When attaching suffixes to a base, the lexical category almost always changes.
: Noun + _-ish_ $\Rightarrow$ Adj _(childish, doggish, ...)_

Compounding
: Two or more lexemes can be combined to create compounds.
: Adj + Adj $\Rightarrow$ Adj _(bittersweet)_
  Verb + Adj $\Rightarrow$ Adj _(fail-safe)_
: Semantic types of compounds are:
    - _endocentric_
      the first part modifies the second part. E.g. _bedroom_
    - _exocentric_
      both parts refer to an unspecified head. E.g. _skinhead_
    - _appositional_
      both parts give different descriptions of the same referent. E.g. _maidservant_
    - _copulative_
      the meaning is the sum of the meaning of the two parts. E.g. _bittersweet_

Conversion
: New lexemes are formed by changing the lexical category of an existing lexeme. E.g.
: Noun $\Rightarrow$ Verb: _bookmark_, _e-mail_
  Verb $\Rightarrow$ Noun: _update_, _play_

### Minor processes

![](https://i.imgur.com/2Su8AKS.png)

Clipping
: Shortening a lexeme without a change of meaning or lexical category.
: _advertisement $\Rightarrow$ ad_
  _laboratory $\Rightarrow$ lab_

Blending
: Blending (mixing) two lexemes into a new one.
: _breakfast & lunch $\Rightarrow$ brunch_
  _smoke & fog $\Rightarrow$ smog_

Initialisms
: Taking the initial letters of several words. The pronunciation is different for **_Acronyms_** and **_Alphabetisms_**.
: Acronyms:
    * _NATO $\Rightarrow$ North Atlantic Treaty Organization_
    * _DOS $\Rightarrow$ Disk Operating System_

  Alphabetisms:
    * UK $\Rightarrow$ United Kingdom
    * TV $\Rightarrow$ television

Back-formation
: Removing a morpheme from an existing lexeme.
: _editor $\Rightarrow$ edit_
  _bioengineering $\Rightarrow$ bioengineer_

Coinage
: This is part of the _Other_ category. It means a word being created _ex nihilo_ (from nothing). Often brand names. E.g. _Sandwich, kleenex, Tesa, Tempo_

Reduplicatives
: This is part of the _Other_ category. It describes a compound with parts that are either identical or only slightly different. It is highly informal or familiar. E.g. _child-speech: jim-jams, tick-tock_

## Inflectional morphology

_Inflectional morphemes add grammatical information. They create a different **word form** of the **same lexeme**. They also don't change the lexical category of words. They are productive^[According to the internet, _productivity_ means in this context: those morphemes have a predictable usage/meaning. Like the English plural _'s'_. One can easily form a plural form by attaching the _'s'_ and apply this as a rule.]. Their function is marking relationships between words._

More concretely:
- They can mark **tense and aspect**.
  > _Peter pick-**ed** red cherries._ $\Rightarrow$ past tense
  > _Peter **has** pick-**ed** red cherries._ $\Rightarrow$ present (tense) perfect (aspect)
- They function as agreement markers:
    - **person and number**
      > _Spanish: (yo) escrib**o** (nosotros) escrib**imos**_
    - **gender**
      > _French: un pantalon vert vs un**e** pomme vert**e**_
    - **case**
      > _German: Der Ball **des** klein**en** Kind**es**_

### Types of morphological systems

The main categorization of morphological systems is into _**synthetic** and **analytic** systems_.

Synthetic
: Rich inflectional system with many word forms for each lexeme. The word order is relatively free because inflections mark syntactic relations.
Synthetic language types can be further differentiated into _**inflectional** and **agglutinating** types_. _E.g. Kiswahili_

Analytic
: Poor inflectional system with few word forms for each lexeme. The word order is fixed because it marks syntactic relations. _**Isolating** types_ are a subcategory of analytic languages. _E.g. English_

Inflectional
: Different grammatical information can be encoded by one morph. A clear segmentation into morphemes is not easily possible. _E.g. Latin_
  ![](https://i.imgur.com/QwHwHzT.png)

Agglutinating morphology
: Words can have affixes. Typically each affix encodes a distinct meaning. Thus words can be easily segmented into morphemes. _E.g. Turkish_
  ![](https://i.imgur.com/2ppZ3OV.png)

Isolating morphology
: Each word tends to be an isolated single morpheme. Inflectional and derivational morphemes are both missing. _E.g. Chinese:_
  ![](https://i.imgur.com/ll49DUy.png)

## Inflection vs Derivation Overview

![](https://i.imgur.com/tvm6OR1.png)

## Summary slide

* a _morpheme_ is the smallest meaning-bearing unit of language
* words consist of one morpheme or many
* in the mental lexicon each morpheme contains information about sounds, related words, phrasal co-occurrence patterns, and meaning
* there are _free morphemes_ and _bound morphemes_
* bound morphemes comprise _inflectional_ and _derivational affixes_
* affixes can be divided into _prefixes, suffixes, infixes, circumfixes_

*[comprise]: include

# 07: Syntax

Syntax
: Syntax is the study of sentence formation. It is a system of _categories and rules_ that allow words to combine to form grammatical sentences. Those rules:
    * specify the _correct word order_ in a language
    * specify the relationship: _Word order_ $\rightarrow$ _meaning_
    * specify _grammatical relations_ between words

Grammaticality
: Grammatical sentences are sentences which native speakers judge as possible statements.
It does not depend on:
    * whether that statement has been heard before
    * its semantics

The shared characteristics of words allow us to organize words in **_syntactic categories:_** _lexical categories and non-lexical categories_

**Lexical categories:**
* noun (N)
* verb (V)
* adjective (A)
* preposition (P)
* adverb (Adv)

**Non-lexical categories:**
* determiner (Det): _the, a, this_
* degree word (Deg): _very, so, more_
* qualifier (Qual): _perhaps, almost_
* auxiliary (Aux): _may, have, will_
* conjunction (Con): _and, but, or_

**Three criteria** help to identify the category of a word. They are usually needed **_together_**:
* meaning
* inflection*
* distribution

*[inflection*]: changing the form of words: conjugation, declension, etc.

**Meaning**
The meaning of words can help categorize them.
* _nouns:_ entities, individuals or objects
* _verbs:_ actions, sensations
* _adjectives:_ properties of nouns
* _adverbs:_ properties of verbs

However, problems exist: Abstract nouns can denote abstract concepts, some verbs may be used as nouns, and words of similar meaning can be part of different categories.

**Inflection**
Inflection is associated with a certain lexical category. _E.g. nouns take -s to form the plural_. However, inflection does not always reveal the category (e.g. irregular plurals, adjectives that cannot take the superlative suffix: \*beautifulest).

**Distribution**
![](https://i.imgur.com/w6wsYcA.png)
1, 2 and 5 belong to the same category. $\Rightarrow$ they can be substituted for one another without loss of grammaticality.

## Constituency and phrase structure

Sentences have a hierarchical structure in which words are grouped into successively larger syntactic structures. These structures are called **_constituents_**.
Typical constituents are _phrases, clauses._

The hierarchical structure of sentences is as follows:
**Sentences $\rightarrow$ clauses $\rightarrow$ phrases $\rightarrow$ words $\rightarrow$ morphemes**

**Phrase structure trees** are quite useful because they fulfill 3 tasks at once. They consist of a _root, leaves, nodes, and sisters_ (nodes of the same parent node). Their tasks are:
* They help visualize the _linear order_ of words
* while helping to identify the grouping of words into _syntactic categories_
* and reveal the _hierarchical structure_ of syntactic categories

**Simple phrases**
A phrase is always built around a _head._ The lexical category of the _head_ determines the name of the phrase:

Noun phrase: NP
: ![](https://i.imgur.com/kNT1Kc8.png) $\rightarrow$ Determiner and Noun

Verb phrase: VP
: ![](https://i.imgur.com/zJOm0tR.png) $\rightarrow$ Qualifier and Verb

Adjective phrase: AP
: ![](https://i.imgur.com/m3YsvZU.png) $\rightarrow$ Degree word and Adjective

Prepositional phrase: PP
: ![](https://i.imgur.com/mGAqDBw.png) $\rightarrow$ Degree word and Preposition

### Phrase structure

It is not possible to have a phrase without a **_head_**. But it is possible to have a phrase with only the head.

Phrases may include a _**specifier** (i.e. a determiner, qualifier or degree word)_. Specifiers make the meaning of the head more precise. In English they appear at the left boundary of a phrase:
![](https://i.imgur.com/LxGP9dd.png)

Additionally, phrases may contain a _**complement**_ that provides additional information about entities and locations whose existence is implied by the head. In English, complements are attached to the right of the head.
![](https://i.imgur.com/LqDOtTm.png)

**Phrase structure rules** ensure that every word occupies the correct position in the structure of a phrase.
![](https://i.imgur.com/z0XFYHW.png)

Those rules can be generalized into the **X-phrase rule**.
With X being the _head_ one can generalize:
![](https://i.imgur.com/Imefhza.png)

## Constituency tests

Constituency tests can be used to find the basic structure of sentences.

* **Stand-alone test**
  If a set of words can occur on its own, e.g. as an answer to a question, it is a constituent.
* **Substitution test**
  If a set of words can be replaced by one word (a pro-form), it is a constituent.
  ![](https://i.imgur.com/0BSgVWl.png)
* **Movement test**
  Constituents can be moved as a whole to another location in the sentence.
  ![](https://i.imgur.com/ab9bDgE.png)
* **Coordination test**
  Constituents can be coordinated (connected by a _coordinating conjunction_) with each other if they are of the same syntactic category.
  ![](https://i.imgur.com/8hJVUWf.png)

## Sentence structure

Sentence
: is the largest unit of syntactic analysis. It always consists of an _NP_ and a _VP_: _S $\rightarrow$ NP and VP_

Dominance
: Every node in a tree _dominates_ the nodes beneath it. Dominance may also be _immediate_ (if it refers to the immediate child node).

Grammatical relations
: refer to the relationship between syntactic categories and their _function_ in a sentence.
  ![](https://i.imgur.com/8vpzE0v.png)

### Grammatical relations

- **Subject:** _NP_ immediately dominated by _S_
- **Predicate:** _V_ in VP immediately dominated by _S_
- **Direct Object:** _NP_ immediately dominated by _VP_

Example:
![](https://i.imgur.com/HbjoJhv.png)

**Adverbs** are a lexical category that serves as the _head of AdvPs._

* **Adverbials**
  * consist of either an _**AdvP** (completely, often, quickly)_ or a _**PP** (in an hour, last week)_. They may also occur in the form of an **adverbial clause.** Some examples:
    ![](https://i.imgur.com/xX1bO9m.png)
  * can serve as **_VP modifier_**
    They are then sisters to _VP_. They usually occur on the left, but may also appear on the right as adverbials of time and manner.
    ![](https://i.imgur.com/9enfoaq.png)
    ![](https://i.imgur.com/tDuOsFo.png)
  * can serve as **_sentence modifier_**
    They then occur as sisters to _S_.
    ![](https://i.imgur.com/5dnT3Fg.png)

### Clauses

**Clauses** have a subject-predicate structure and can serve as sentence constituents.

- finite clauses
  The predicate _V_ is inflected and tensed.
  ![](https://i.imgur.com/PRQZlYI.png)
- non-finite clauses
  The predicate _V_ is not tensed.
  ![](https://i.imgur.com/tCcmaxV.png)

**Adverbial clauses** (blue) are clauses that start with an _adverbial subordinator (while, if, because, ...)_. They specify the circumstances of what is described in the **main clause** (red).
![](https://i.imgur.com/UFKdHzY.png)
![](https://i.imgur.com/k8wE6cF.png)

**Complement clauses** are embedded clauses that start with a _complementizer: whether/if, how, that._
![](https://i.imgur.com/aSm8TuF.png)
![](https://i.imgur.com/GbQ4fGR.png)

**Recursiveness**
Sentences are unbounded. They can be expanded infinitely.

**Auxiliary verbs (Aux)** designate possibility, obligation, futurity or permission.
&nbsp; ![](https://i.imgur.com/cGKaQux.png)

### Restrictions

Restrictions exist. Not all verbs take a complement!

* **_transitive verbs_** require a direct object _(NP)_
  ![](https://i.imgur.com/K61r49k.png)
* **_ditransitive verbs_** require both a direct object _(NP)_ and an indirect object _(PP or NP)_
  ![](https://i.imgur.com/LS2qqTu.png)
* **_intransitive verbs_** do not take a direct object
  ![](https://i.imgur.com/NyawZxL.png)
* some verbs may be both transitive and intransitive
  ![](https://i.imgur.com/0YqUjIi.png)
* some verbs take an _AP_ or a sentence _CP_
  ![](https://i.imgur.com/ndKf8bl.png)

**Subcategorization and Selection**

* some words take more than one complement
* heads occur in structures where they have compatible complement phrases (although some heads don't need one)
* a word can belong to more than one subcategory

Selection (according to Wikipedia)
: Predicates may only go with some complements.
This is a _semantic_ restriction.
&nbsp; $\checkmark$ _Sam drank a coffee._
&nbsp; $\times$ _Sam drank a car._

Subcategorization (according to Wikipedia)
: Words (mostly verbs) may require the co-occurrence of a certain _syntactic_ argument (e.g. a direct object or an indirect object).
  ![](https://i.imgur.com/K61r49k.png)

Another issue is the occasional **structural ambiguity**. A sentence may sometimes be analyzed in different ways:
![](https://i.imgur.com/Blix9YC.png)
![](https://i.imgur.com/1sIFZPj.png)

## Syntactic operations

So far we have analysed simple statements. Syntactic operations (transformations) derive other sentence types from those statements.

Yes-no questions
: We can simply move the Aux to the front to create a yes-no question.
  ![](https://i.imgur.com/KwsMTdr.png)
  ![](https://i.imgur.com/TwIe0BO.png)

Wh-questions
: are questions starting with _what, who, which, why, how, when, where_. To perform the transformation we use the _CP_ node. **_All sentences have a CP node_**. It contains either a complementizer or information about whether a sentence is a question or not. For the transformation, simply replace the _NP_ with a _wh-phrase_ and change its position.
  ![](https://i.imgur.com/CnAtGCH.png)
  ![](https://i.imgur.com/MHKrECp.png)

Negation
: _Not discussed._

Imperatives
: _Not discussed._

# 09: Data analysis, visualization and presentation

:::info
* Look at the slides again
* Do the case study with AntConc from slide 31 ff.
:::

## Obtaining data

Raw frequency
: simple frequency counts

Relative frequency
: frequency counts relative to corpus size
  ![](https://i.imgur.com/ZMEZW1o.png)

When doing corpus queries one has to be careful about the bottom ranks of the results: they are usually filled with **_hapax legomena_**. Those are words that occur only once in the corpus.

## Analyzing quantitative data

Data is organized in a spreadsheet. The steps are:

1. Saving: Save the data in a spreadsheet
2.
   Sorting: Find a clear structure
   * Independent variable leftmost
   * Dependent variable (e.g. frequencies) in the right columns
   * Add columns for later annotation
3. Taking stock: Count the hits and evaluate them
4. Manual screening: Delete irrelevant hits or mark them as false
5. Annotation: Categorize and annotate your data

## Do it yourself

## Visualizing quantitative data

You can present data:

* _verbally_ in the **text**
  * Include references to graphs and tables
* illustrated:
  * _numerically_ in a **table** $\rightarrow$ details / important information
  * _visually_ in a **graph** $\rightarrow$ trends in information $\rightarrow$ bar charts, line graphs or scatterplots only! _Word clouds are not scientific!_

Bar charts
: ![](https://i.imgur.com/YniDEt1.png)

Line graphs
: ![](https://i.imgur.com/iwtyEio.png)

Scatterplots
: ![](https://i.imgur.com/Y9Qg0ep.png)

# 10: Semantics

A random introduction to semantics.

True synonyms
: E.g. _quick_ and _fast._ Even though they may have different co-occurrence patterns (_quick talker_ and _fast worker_), they are true synonyms.

Converses
: What are _uncle_ and _nephew_? Certainly not opposites. $\rightarrow$ They are converses. Among other meanings, hot means ‘not cold’ but nephew is not the same as ‘not uncle’. Opposites are usually the extremes of a continuum along which words can be arrayed. Converses need not be.

Semantics
: is the study of _meaning_ in human language. All the other disciplines like phonetics, phonology, morphology and syntax are concerned with form.

### What is meaning?

If two sentences describe the same thing, they are **synonymous**. One sentence may also imply another, but not vice versa (see below).
![](https://i.imgur.com/ojgaYJ2.png)

Meaning can break down in several ways. A sentence is a **contradiction** if parts of its meaning exclude one another. If a sentence is not a contradiction but just does not make any sense semantically, it is **anomalous**, e.g.
![](https://i.imgur.com/b3WHeKa.png)

The meaning of a sentence may also be not entirely clear. If a word in a sentence (like _duck_) can refer to different things (like _bending over_ or the animal), the meaning is **ambiguous**. On the other hand, if a sentence is just imprecise, e.g. referring to things that have not been properly introduced (like _the_ pie), it is called **vague**.
![](https://i.imgur.com/tIO76M0.png)

$\rightarrow$ Meaning is a multi-faceted notion. Linguistics differentiates three types of meaning: _linguistic meaning, social meaning, affective meaning_.

When trying to **define meaning**, one finds it is not that easy. Ferdinand de Saussure first introduced the notions of the **signifier** (the symbol) and the **signified** (the concept). Meaning may be determined by its **usage conditions**. But it may also be derived from the emotional associations suggested by the word. The last explanation implies that words may also have different meanings that are restricted to certain contexts, e.g. meanings specific to a culture, region or social class.

Linguistic meaning
: encompasses _sense_ and _reference._
  * _reference -_ the meaning is the entity the word refers to:
    ![](https://i.imgur.com/zDZ54sM.png)
  * _sense -_ the idea or concept a word expresses:
    ![](https://i.imgur.com/qGu9vLy.png)

Social meaning
: is communicated through sentence meaning, word choice and pronunciation. It contains information about the _identity of the speaker._
  ![](https://i.imgur.com/YCuU7ZN.png)

Affective meaning
: conveys the speaker's stance, attitudes and opinions, expressed through intonation and word choice.
  ![](https://i.imgur.com/QDKCTos.png)

## Overview of semantics

Branches of semantics are:

* _lexical semantics (paradigmatic semantics)_ studies the meaning of words (lexemes) and how they are related to each other $\rightarrow$ **semantic networks**
  * lexical-sense relations: _hot_ vs _cold_, _hound_ vs _dog_
  * semantic fields: structuring vocabulary into fields with related meanings
* _syntagmatic semantics_ studies the meaning of words in combination $\rightarrow$ **sentence meaning**
  * selection[^see_syntax] restrictions
  * meaning of complex sentences $\rightarrow$ **compositionality**
* _pragmatics_ studies the meaning of words or sentences in context. The above parts of semantics all disregard the context!
  * **utterance meaning**

[^see_syntax]: as defined at the end of the syntax chapter.

Collocations
: are frequently co-occurring words or word patterns. The _selection restrictions_ are particularly strong with less commonly used words: _herd of cattle, school of dolphins._

Compositionality
: states that the meaning of a sentence is determined by the meaning of its component parts and the manner in which they are arranged in their syntactic structure.
: It ensures that language users can interpret any number of sentences even if they have never encountered them.

Utterance meaning
: changes depending on the contextual circumstances. It is independent of the semantic meaning of the words/sentence. E.g.
  ![](https://i.imgur.com/4VE3D0v.png)
  &nbsp; $\rightarrow$ which window? who is _you_?

## Analysing meaning

Meaning can be analyzed in two ways:

* The **semasiological approach** starts from a given word form and asks: What does this word mean?
* The **onomasiological approach** works the other way round: Which forms express a given meaning?

Componential analysis
: The meaning of a word can be broken down into smaller units. Word meaning is defined as a bundle of smaller meanings.
  ![](https://i.imgur.com/MXgfmkD.png)
: It can be applied e.g.
in verb subcategorization[^see_syntax] to formulate the rules that apply to a verb.
  ![](https://i.imgur.com/cv7kdqd.png)

## Lexical semantics

By studying the relationships of words to other words, lexical semantics aims to build a model for the structure of a lexicon.
&nbsp; ![](https://i.imgur.com/PYhGW10.png)

**Lexical relations** or sense relations describe the meaning relations between words. The sense of a word reveals itself through the meaning relations it has with its neighbors.

Synonymy
: describes high semantic similarity. It can be descriptive or total. Synonymous words have the same semantic features but are not always interchangeable. They can still differ subtly, e.g. in connotations.

Antonymy
: describes opposites. It implies the exclusion of at least one semantic feature. There are different types of antonymous words:
  * **_complementary antonymy:_** pairs of antonymous words
    ![](https://i.imgur.com/RA6zGrd.png)
  * **_gradable antonymy:_** antonyms of this type can be placed along a continuum
    ![](https://i.imgur.com/nNsyveY.png)
  * **_converseness:_** converses describe _relational_ opposites
    ![](https://i.imgur.com/g3xVmx8.png)
  * **_reversiveness:_** reverses describe _directional_ opposites
    ![](https://i.imgur.com/frG8Dq5.png)

Hyponymy
: A hierarchical lexical relation. A **hyponym** is subordinate to a **hypernym**. This is like subclassing in OOP. The hyponym contains all distinctive features of the hypernym plus some additional features.
: _E.g. red, blue, green_ are hyponyms of the hypernym _color._ In this case _red_ and _blue_ are called **heteronyms** (co-hyponyms).

Meronymy
: Another hierarchical lexical relation. It describes part-whole relationships. A **meronym** is part of a **holonym** (the whole).
: _E.g. mouth, nose, cheek_ are meronyms of the holonym _face._

Polysemy
: Polysemous words have two or more different meanings. The various meanings have at least one semantic feature in common.
Often the different senses developed through metaphorical or metonymical processes. _E.g._
  ![](https://i.imgur.com/hMthOmh.png)

Homonymy
: Homonymous words look and/or sound identical but have different (and usually unrelated) meanings. _Total_ homonyms have identical grammatical properties while _partial_ homonyms do not.
  ![](https://i.imgur.com/S8Hp9KE.png)
  * total: _bank (N)_ (river bank) - _bank (N)_ (monetary institution)
  * partial: _bear (N)_ (animal) - _bear (V)_ (to put up with something)

:::info
Do the exercise on the Semantics slides, p. 45
:::

## Cognitive semantics

Meaning is linked to human cognition and perception. Ideas are categorized and expressed in words and sentences. **Conceptual categories** are expressed as **linguistic categories**.

Understanding **concepts** (such as _restaurant_) requires world knowledge, which is sometimes even culture-specific. Very few concepts have precise boundaries; most have **fuzzy boundaries**.

We use **categorization** to organize our concepts. It is based on similarities. Conceptual categories are **prototype-based** categories^[as mentioned in the Introduction to Cognitive Science course] $\rightarrow$ Some members of a category are better representatives of the category than others.

Metaphor
: is a conceptual principle. Based on perceived similarities, it permits understanding one concept in terms of another. Metaphors often involve a transfer between different cognitive domains.
  ![](https://i.imgur.com/2buw8jV.png)

Metonymy (according to Wikipedia^[The definition on the slides was too hard to understand.])
: A concept is referred to by the name of something closely related
  * _drink a cup of tea_ $\rightarrow$ we do not drink the cup, but what's inside
  * _Berlin will not join the summit._ $\rightarrow$ Berlin stands for the German government

# 11: Language variation and change

:::info
The study section on slide 38 contains a task that may be answered for exam preparation.
:::

Different forms of the same language are called _**varieties** (e.g. Kiezdeutsch)_.
They may be further categorized into **dialects** _(regional varieties)_, **sociolects** _(social varieties)_ and **registers[^registers]**. A **standard variety** is a dialect which has been chosen as the official standard (for a mixture of political, historical and other reasons). It is not any better than the others.

[^registers]: A variety of language used for a specific purpose, for example 'official' English vs 'slang' English. The two would be used in different settings.

dialect vs accent
: A _dialect_ refers to grammar, vocabulary and distinctive pronunciation, whereas an _accent_ refers to distinctive pronunciation only. Speakers of the same dialect may have different accents.

idiolect vs lect
: An _idiolect_ is the language system of a single speaker. Everyone has, to some extent, a _"personal dialect"_. _Lect_ is just another word for _variety._

## Sociolinguistics

Sociolinguistics studies variation between individuals of different social groups (age, gender, ethnicity, education).

Random paper: Example
: Labov found that the _rhotic /r/_ is used more frequently in higher social classes.

## Variation and change

This mainly corpus-based research studies historical and synchronic variation across text and speech. It may focus on:

- _real-time language change:_ usage of _'agile'_ or _'uptight'_
- _long-term language change:_ different passive use over time

Geographic variation
: denotes language as spoken in different areas. _E.g. American vs. British English_

World Englishes
: English is spoken around the world. Braj Kachru differentiates three circles of English-speaking countries:
  1. _Inner circle_
     The UK and the first round of colonisation (USA, Canada, Australia), where English is the first language spoken.
  2. _Outer circle_
     The second round of colonisation (India, Pakistan, ...), where English was used as a **lingua franca[^lingua_franca]**. English is often still used as the or an official language.
  3. _Expanding circle_
     The rest of the world.
[^lingua_franca]: A language that was initially used for trade among speakers of different origins.

### eWAVE categorization

eWAVE categorizes English varieties into different levels:

* **Traditional dialects**: Long-established mother-tongue varieties like East Anglian English or the English of the north of England
* **High-contact L1 varieties**: Transplanted L1 Englishes or colonial standards, formed by settlers within the past 400 years. Examples: Colloquial BrE, AmE; Irish English; Malta English
* **L2 varieties**: Non-native varieties with few native speakers. They emerged in countries where English was introduced in the colonial era. Examples: Indian English, Jamaican English
* **Pidgins[^pidgin]**: Contact varieties that developed in trade colonies for communication among speakers who did not share a common language _(lingua franca)_. They are usually restricted to certain domains, but may over time acquire native speakers and enter further domains of use _**(expanded pidgins)**._
* **Creoles**: Contact varieties that developed with very limited exposure to native speakers. There was usually strong pressure to use English as the socio-economically superior language. Many creoles have become the native language of the majority of the population.

[^pidgin]: According to Wikipedia: A grammatically simplified means of communication that was used by speakers of different origins as a lingua franca[^lingua_franca].

Random paper: Variation in Canadian English
: A current paper looked at the use of general extenders (like _"and stuff", "or whatever"_) in Canadian English. It found that _stuff_ is predominant, especially in youth language. It is used as a lexical replacement.

## Text linguistics

Text linguistics is a type of **register analysis** based on corpora that sample different text types. It studies the linguistic properties of registers in functional terms.
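Comparing registers means comparing counts from sub-corpora that rarely have the same size, so the counts are usually normalised to relative frequencies first (see the chapter on data analysis). The following is only a minimal shell sketch under a few assumptions: each register is available as a plain-text file, a token is simply any run of letters, and frequencies are given per 1,000 tokens. The function name `rel_freq` is made up for this example.

```shell
# rel_freq WORD FILE
# Prints how often WORD (given in lower case) occurs per 1,000 tokens of FILE.
# Tokenisation is deliberately naive: every run of letters counts as one token.
rel_freq() {
  tr -cs '[:alpha:]' '\n' < "$2" |
    tr '[:upper:]' '[:lower:]' |
    awk -v w="$1" '
      NF { total++; if ($0 == w) hits++ }   # skip empty lines, count tokens and matches
      END { if (total) printf "%.1f\n", hits * 1000 / total }
    '
}
```

Calling it once per register file, e.g. `rel_freq the reportage.txt` and `rel_freq the fiction.txt` (hypothetical file names), gives figures that can be compared directly even if the two files differ greatly in length.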
> _**Example registers from the Brown Corpus**_
> * _Press: reportage_
> * _Press: reviews_
> * _General fiction_
> * _Science fiction_

Random paper: Variation across speech and writing
: Biber looked at systematic differences between speech and writing. He examined the frequency of 67 lexico-grammatical features of English across different registers and placed the registers along a continuum.

Dimensions of variation in English: a continuum
: Registers can be sorted along a continuum. The different dimensions include: *involved vs informational production*, *narrative vs non-narrative*, *explicit vs situation-dependent reference* (and some more).

## Variation in English

Let's look at some differences between British English and American English. They are the standard varieties taught to learners. Both varieties are idealisations.

### Pronunciation

Rhotic accent
: In AmE _/r/_ is pronounced in all positions while in BrE it is pronounced only immediately before vowels. See: _arm, better._

/ae/ vs /a/
: In AmE the /a/ is often replaced by the /ae/. See: _after, dance._

intervocalic /t/ or /tt/ as /d/
: AmE typically uses the /d/ while BrE does not. See: _metal = medal, latter = ladder_

No pronunciation of /t/ or /d/ after /n/ and before a vowel
: See: _inter city [BrE] = inner city [AmE], understand^[_[unnerstaent] vs [understaand]_]_

/j/ dropped before stressed /u/
: This only holds if the vowel is not initial. AmE frequently drops the /j/. See: _duke, tune, new._

### Spelling

In spelling, BrE has a strong tendency to turn true homonyms into heterographs _(e.g. cheque vs check)_. Meanwhile AmE tends towards shorter spellings _(catalog, color)_.
Some differences include:

* _-our vs -or_
* _-re vs -er_
* _-oe vs -e, -oe_
* _-gue vs -g_

*[true homonyms]: words that sound the same but mean something different
*[heterographs]: words that sound the same but are spelled differently

### Vocabulary

There are differences in vocabulary and terminology due to cultural differences. There are shared expressions that have an additional variety-specific expression:

* _while_ (BrE _whilst_)
* _autumn_ (AmE _fall_)

and also shared expressions that have different meanings:

* _pants, shorts, vest_
* AmE _line_ vs BrE _queue_

### Grammar

The major differences occur in the verb phrase.

* **past participle of _to get_**: BrE _got_ vs AmE _gotten_
* **irregular verbs**: tend to be regularised in AmE (e.g. _burn, learn, spell_)
* **perfect**: AmE prefers the past tense over the present perfect for indefinite past use. It allows the simple past with _just, recently, already_
* **voice**: _get_-passives are preferred in AmE

But there are also some that do not necessarily concern the verb phrase:

* **collective nouns** usually take singular agreement in AmE, whereas BrE allows singular or plural: _The government/police/family is/are divided._
* **prepositions**: _live on the street_ (AmE) vs _live in the street_ (BrE)
* **article use**: AmE: _"be in **the** hospital"_ (BrE: _"be in hospital"_)

# 12: Language Classification

The first question is: what to classify? What counts as a language (and what as 'just' a variety/dialect) is often determined by political, cultural or religious factors. Therefore the notion of _mutual intelligibility_ is introduced.

Mutual intelligibility
: Speakers can understand each other in most cases (except for a few words or a particular pronunciation). Two varieties are considered different languages if speakers cannot understand each other (as with many Chinese dialects).

There are around 6,500 languages in the world today. 25% of those have fewer than 1,000 speakers and 4% of them are spoken by as much as 96% of the speakers.
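To make these proportions concrete, here is a rough calculation that takes the rounded figure of 6,500 languages at face value, so the results are only approximate:

$$
0.25 \times 6500 = 1625 \qquad 0.04 \times 6500 = 260
$$

So roughly 1,600 languages have fewer than 1,000 speakers each, while only about 260 languages account for 96% of all speakers.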
Languages may die when all their speakers die or when the speakers switch to another language. They may also be revived when they are used more widely again.

## Language classification

Genetic classification
: is based on language families and language relationships. We know, from documents and reconstruction, that most of the languages spoken in Europe are related to each other.

Areal classification
: is based on the geographical area where languages are located. Languages that are not genetically related but that are in close contact sometimes exchange certain patterns. Such areas are also called **Sprachenbund** (sic!).

Typological classification
: is based on the characteristics of a language, e.g. on word order patterns. Linguistic typology searches for universals:
  * _absolute universals_ (that occur in all languages)
  * _universal tendencies_ (that occur in most languages but not in all)
  * _implicational universals_ (the presence of a feature determines other features)

---

Markedness^[The slides are very bad. Even with the help of the internet I could not find out what she meant.]
: Some marked characteristics are rarer and more complex than their unmarked counterparts. They only occur if the unmarked counterpart is also present.

Phonological classification
: is based on the vowel system. Most languages have 3-9 vowels; 5 vowels are the most common.

Consonant inventories
: vary widely. The range is from 8 to 90 consonants. Some general characteristics apply^[The details didn't seem very important to me. If still interested, see slide 18 in classification.pdf.].

Syllable structure
: denotes the combination of vowels (V) and consonants (_C_) in a syllable. E.g. all languages allow V and CV.
  ![](https://i.imgur.com/sFV9y1m.png)

Morphological classification
: Different systems for the combination of morphemes exist.
Systems differ according to their degree of _**synthesis** (how much languages use inflectional/derivational affixes)_ and _**fusion** (how much morpheme boundaries in words are fused together)_.

---

Analytic languages
: tend to encode information through single morphemes, but they also employ affixes. English is rather analytic. A special case are the **_isolating_** languages where each word is an isolated single morpheme. Those languages lack inflectional and derivational morphemes.

Synthetic languages
: use affixation extensively to encode information. They have rich inflectional systems. Examples include Latin and Greek. Languages with extreme synthesis are called _**polysynthetic**_.

Agglutinating languages
: use affixation to encode information. Individual morphemes are easily segmentable and each morpheme encodes a distinct grammatical meaning. Examples include Turkish and Japanese.

Inflectional languages
: (also called fusional languages) also use affixation to encode information. However, a clear segmentation into individual morphemes is not possible. One morpheme may encode several pieces of grammatical information. Examples include Latin, Russian and German.

---

Synthesis and fusion
: Synthetic languages are very often agglutinating, but they can also be inflectional/fusional.
: Fusion does not apply to isolating languages because they simply do not combine morphemes to form words.

Syntax
: Languages are classified according to word order (subject S, verb V, object O). The most common types are SOV, SVO and VSO. They all have the subject before the object, but exceptions to this exist!

## Language families

142 language families exist in total. The most important ones are: Indo-European (~450 languages), Niger-Congo (~1500 languages), Sino-Tibetan (~450 languages), Afro-Asiatic (~400 languages), Austronesian (~1200 languages), Trans-New Guinea (~500 languages).
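The counts above can be ranked mechanically. A small shell sketch, assuming the approximate figures from this list; family names are written without spaces so that each line has exactly two columns, and the function name `top_families` is made up for this example:

```shell
# Approximate number of languages per family, copied from the list above.
families='Indo-European 450
Niger-Congo 1500
Sino-Tibetan 450
Afro-Asiatic 400
Austronesian 1200
Trans-New-Guinea 500'

# top_families [N]
# Sorts numerically on the second column, largest first, and keeps the top N (default 2).
top_families() {
  printf '%s\n' "$families" | sort -k2,2nr | head -n "${1:-2}"
}

top_families 2
```

Changing the argument, e.g. `top_families 3`, lists more families in the same descending order.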
The language families with the most languages are:

* Niger-Congo (~1500 languages)
* Austronesian (~1200 languages)

When it comes to speakers, the biggest families are:

* Indo-European (3.2B)
* Sino-Tibetan (1.4B)

Some proposals exist to group families even further into higher-order (macro) families; however, they also have critics. The grouping is fuelled by the idea that all languages originate from one single language: **Proto-World** or Proto-Sapiens. But this is basically a matter of belief because it is currently impossible to establish language relatedness that far back. Languages change too quickly.

* Pronunciation and words change very fast
* Cross-linguistic similarities may result not from contact but from coincidence or from tendencies shared by all human languages

# Glossary

```shell
# Automatically generate the glossary from the Anki cards export.
# Each tab-separated card becomes a definition-list entry; the later sed calls
# strip all double quotes and then re-quote the style= and src= attributes.
# Relies on GNU sed for the \t and \n escapes.
sort 'Lingu Glossar unformatted.txt' \
  | sed -E "s/([\w ]*)\t(.*)\t*([\w \d]*)\t*/\1\n: \2 \3\n\n/" \
  | sed -E 's/"//g' \
  | sed -E 's/style=(.*);>/style="\1;">/g' \
  | sed -E 's/src=(.*)png>/src="\1png">/g' \
  > ./markdown_glossary.md
```

[Intro to Linguistics Glossary](/WFe_wGcLQTuZkKZ-H0zplw)

###### tags: `2020` `overview` `recap` `TU Darmstadt` `Linguistics` `Linguistik` `Zusammenfassung` `CogSci`

_Proud author of this recap: B.M._