---
tags: wuerzburg2019, diacollo
---

# Würzburg "Steckbriefe" 2019-07-06

[TOC]

## 1. Martina Hoffmann - *Andersartigkeit* - *Homosexualität*

- Topic: "Andersartigkeit" - "Homosexualität"
- Problem: lexical realization / paraphrases & (historical) code
  - neither GermaNet nor OpenThesaurus is helpful here
  - you could potentially edit OpenThesaurus yourself, but that's probably [BTSOTS](https://www.tandfonline.com/doi/abs/10.1080/07350198.2012.684002) (="beyond the scope...")
- Remarks
  - basically an **onomasiological** approach
  - suggest refinement, (semasiological) approximation, or alternate method (e.g. sampling)
- Ideas
  - query multiple terms with `{...}`
    - [dta:{unmännlich,...,widernatürlich}](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D+&slice=50&kbest=30&profile=tdf&format=cloud&global=1&groupby=l)
  - for aggregation by "Wortfeld" (here I'm interpreting this as something like "normative attitude" ~ positive vs. negative), try GermaNet synsets "bewertungsspezifisch" (resp. "positiv" vs. "negativ")
    - [positiv (s91)](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds91%7Cgn-asi%3D2&_s=submit&date=1600%3A1999&slice=100&score=ld&kbest=20&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    - [negativ (s214)](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds214%7Cgn-asi%3D2&_s=submit&date=1600%3A1999&slice=100&score=ld&kbest=20&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    - [bewertungsspezifisch](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Dbewertungsspezifisch%7Cgn-asi%3D2+&_s=submit&date=1600%3A1899&slice=100&score=ld&kbest=20&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    - [diff (s91 vs 214)](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlichUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds91%7Cgn-asi%3D2+&_s=submit&bquery=%7Bunm%C3%A4nnlichUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds214%7Cgn-asi%3D2+&date=1600%3A1899&slice=100&bdate=1600%3A1899&bslice=100&score=ld&kbest=20&diff=adiff&profile=diff-ddc&format=cloud&groupby=l&eps=0)
- Q&A
  - DiaCollo visualization for "external" data (e.g. from multiple queries)
    - not (yet) implemented; requires a bit of JavaScript hacking

---

## 2. Džana Fajić - *Bedrohung* - *Terrorismus*

- Topic: "Bedrohung" - "Terrorismus"
- Remarks:
  - these are actually pretty good candidates for DWDS Wortprofil
    - [Bedrohung](https://www.dwds.de/wp?q=Bedrohung)
    - [Terrorismus](https://www.dwds.de/wp?q=Terrorismus)
- Q&A
  - _"große Wortwolken über mehrere Korpora"_
    - see slide #79 (~54) "Fiendishly Awkward Questions / Can I use DiaCollo to directly compare different corpora?"
  - _"der Kollokationsrahmen muss für Argumentation sehr groß sein. Wie geht das?"_
    + I don't understand this question
      - maybe you should look at:
        * DiaCollo [TDF relation](https://kaskade.dwds.de/dstar/dta/diacollo/help.perl#prf-tdf) (aka "term-document matrix": co-occurrence in a single "document" (=paragraph))
        * DDC/TDF Boolean conjunction operator (`&&`)
        * DDC `#IN p` and/or `#IN file` operator?
        * DDC `NEAR(A,B,n)` and `NEAR(A,B,C,n)` limit parameter `n` (max=32 tokens)
    + Q: _"Wie lässt sich DiaCollo für diskurslinguistische Argumentationsanalysen nutzen?"_
      A: I don't know... what are the characteristic (formal and/or observable) properties of a "diskurslinguistische Argumentationsanalyse"?
    + Q: _"Wie kann man formale (z.B. Konnektoren) und inhaltliche (z.B. Lexeme) für Argumentationen erheben?"_
      A: I don't think you can "collect" them (as in "Daten erheben") as a matter of general principle
      * lexemes (whether lexical or functional) are abstract linguistic objects, and as such exist in our heads ... and maybe "in the (Platonic/rationalist) language-itself"
      * lexical realizations (tokens; word use) are items in the corpus; see competence/performance distinction
      * maybe I don't understand what *"erheben"* is supposed to mean here; some remarks regarding your specific examples:
        - *"diffus"*: maybe try GermaNet synset #4081 ("ungenau"), e.g. [this DDC query](https://kaskade.dwds.de/dstar/zeit/dstar.perl?q=Bedrohung%3D1+%26%26+%24l%3Ds4081%7Cgn-asi+%23sep&fmt=kwic&start=1&limit=10&ctx=8&debug=)
        - *"Kollektivierung"*: your best bet is to code this yourself; maybe try OpenThesaurus synsets [`weltweit`](http://kaskade.dwds.de/openthesaurus/?q=weltweit) and [`überall`](http://kaskade.dwds.de/openthesaurus/?q=%C3%BCberall): e.g. [`Bedrohung=1 && {weltweit,überall}|ot-asi`](https://kaskade.dwds.de/dstar/zeit/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=Bedrohung%3D1+%26%26+%24l%3D%7Bweltweit%2C%C3%BCberall%7D%7Cot-asi+%23sep&_s=submit)
        - *"Superlative"*: DWDS corpora don't offer morphological annotations; your best bet is to use a suffix-list or regex, e.g. [`$p=ADJA &= /ste?[ernm]$/`](https://kaskade.dwds.de/dstar/zeit/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=%24p%3DADJA+%26%3D+%2Fste%3F%5Bernm%5D%24%2F&_s=submit)
        - *"Argumentationen":*
          + DWDS corpora don't offer semantic annotations
          + you can use the `$p` (part-of-speech) field to search for STTS tags (but the tagger makes errors too)
          + try e.g. DDC [`$p={K,PAV}*`](https://kaskade.dwds.de/dstar/kern/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=%24p%3D%7BK%2CPAV%7D*&_s=submit) to find conjunctions & pro-adverbs
          + or use a set-valued term-query, e.g. `{weil,deshalb,deswegen,obwohl,dennoch}`
        - Example:
          + if I've understood correctly, you might try:
            * use a large co-occurrence window around your target ("Bedrohung") (e.g. DDC `#in p` -> paragraphs)
              - DiaCollo TDF won't work for the DWDS corpora here --> no function words in TDF index!
            * assume presence of a conjunction (`$p=KON`) indicates an "argumentative" context
            * assume everything else in a matching document is a potential "argumentative" collocate
          + in the ZEIT corpus, this might get us something like [this DiaCollo query](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=near%28Bedrohung%2C%24p%3DKON%2C%24p%3DADJ*%3D2%2C4%29+%23fmin+2&_s=submit&date=&slice=10&score=ld&kbest=10&cutoff=&profile=ddc&format=html&groupby=l&eps=0)
            * QUERY: `near(Bedrohung=1,$p=KON=1,$p=ADJ*=2,4) #FMIN 2`
              - 3-argument `NEAR()`, at most 4 tokens between
              - collocant condition "Bedrohung" (lemma)
              - collocant condition `$p=KON` (part-of-speech)
              - collocate placeholder `$p=ADJ*=2` (any adjective)
            * PROFILE: DDC
            * GROUPBY: l
          + you could also search for e.g. noun collocates and selected (causal) connectors, e.g.
            + [`near(Bedrohung,{weil,deshalb,deswegen,denn},$p=NN=2,4) #fmin 1`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=near%28Bedrohung%2C%7Bweil%2Cdeshalb%2Cdeswegen%2Cdenn%7D%2C%24p%3DNN%3D2%2C4%29+%23fmin+1&_s=submit&date=&slice=10&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
          + ... alternatively, you could require "Bedrohung" to precede the connector:
            - [`"Bedrohung #4 {weil,deshalb,deswegen,denn} #4 $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=%22Bedrohung+%234+%7Bweil%2Cdeshalb%2Cdeswegen%2Cdenn%7D+%234+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=10&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
          + Limitations
            * the DDC `NEAR()` operator only supports up to 3 arguments
              - ... and we need to save 1 argument position for the collocate (`=2`)
            * this is a *VERY* expensive query: ***please*** use these techniques with caution & consideration!
              - raising the `NEAR()` limit (above -> 4 tokens) raises complexity ***exponentially***
              - expect timeouts and/or `std::bad_alloc` errors from stuff like this

---

## 3. Nadine Löhle - _Angst_ (individuell vs. kollektiv)

- Topic: "Angst" (individuell vs. kollektiv)
- Goal: _"diachrone Erhebung des Schlagwortes 'Angst' ... in seinen wortfeldhaften und argumentationsbezogenen Kontexten"_
  - **?**: what is a "wortfeldhafter" and/or "argumentationsbezogener Kontext"?
  - it sounds to me like this is an **onomasiological** question
  - neither I as a CL programmer nor DiaCollo can help with that as such
- Examples
  + the specific examples given are all fairly straightforward for DiaCollo using the DDC relation:
  + *"Angst vor __"*
    - [`"Angst vor #2 $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+vor+%232+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + *"Angst der __"*
    - [`"Angst @der $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+der+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
      * `@der` -> literal surface form (vs. lemma-search)
  + *"Angst weil __"*
    - [`"Angst #4 weil #8 $p=VVFIN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+%234+weil+%238+%24p%3DVVFIN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + *"Angst und __"*
    - [`"Angst und #2 $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+und+%232+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)

---

## 4. Nadine Kastenhofer - _Länderstereotypen_ - _Osteuropa_

- Topic: "Länderstereotypen" - "osteuropäische Länder"
- Goals:
  + _"übergreifende Suchanfragen, die Kollokationen zu Gruppen von Lexemen anhand von abstrahierten Formulierungen zeigen"_
  + _"weitere Suchanfragen, mit denen man Stereotype finden kann"_
- Ideas:
  + GermaNet
    * s34390 ("örtlich_bestimmter_Mensch")
    * s2245 ("ortsspezifisch")
      - s1299 ("länderspezifisch")
      - *s1439 ("osteuropäisch")* -> {polnisch, tschechisch, böhmisch, ...}
      - s1591 ("regionalspezifisch")
      - s2359 ("herkunftsspezifisch")
    * s44177 ("Land","Staat") -> {Afghanistan,Amerika,Agrarstaat, ..., Polen, ..., Zimbabwe}
    * s34416 ("Europäer") -> {Albaner,Brite,Deutscher,...,Zypriotin}
  + PoS : `$p=NE` will get tokens tagged as proper names
    * results will also include person- and organization-names... and probably a fair number of tagger errors too
  + DIY : use an explicit term-disjunction with "interesting" names, e.g.
    * `{Böhmen,...,Ungarn}`
- Examples:
  + *"alle [Europäer] sind __"*
    * *very* rare -- statistical results (DiaCollo &c) will be *unreliable*
  + *"[Europäer] sind __"*
    * `"Europäer|gn-asi=1 sind=1 $p={ADJ,N}*=2" #sep` : [DDC/DTA](http://kaskade.dwds.de/dstar/dta/dstar.perl?q=%22Europ%C3%A4er%7Cgn-asi%3D1+sind%3D1+%24p%3D%7BADJ%2CN%7D*%3D2%22+%23sep&fmt=kwic&start=1&limit=10&ctx=8&debug=), [DiaCollo/DTA](https://kaskade.dwds.de/dstar/dta_beta/diacollo/?query=%22Europ%C3%A4er%7Cgn-asi+sind+%24p%3D%7BADJ%2CN%7D*%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + _"obwohl [osteuropäisch]"_
    + DDC: `"obwohl osteuropäisch|gn-asi" #sep`
    + too rare for DiaCollo to be of much help (f_dta=0, f_zeit=7)
  + fixed-window DiaCollo collocates of selected eastern European adjectives
    * [`{polnisch,ungarisch,böhmisch,schlesisch}`](https://kaskade.dwds.de/dstar/dta_beta/diacollo/?query=%7Bpolnisch%2Cungarisch%2Cb%C3%B6hmisch%2Cschlesisch%7D&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
    * ... not very helpful for "stereotypes" as such

**Q&A**

- Q: _"Wie kann man Suchabfragen gut zusammenfassen?"_
  + A: in DDC, e.g. set-valued term queries (e.g. `{Haus,Hof,Garten}`) or generic `||` (=Boolean disjunction, e.g. `("Haus und Hof" || Garten)`)
  + A: in DiaCollo "native" syntax with `|` (`Haus|Hof|Garten`)
  + Qback: what does it mean for you to "collect different queries (well)"?
    - I can take (say) 3 different queries and write them down on 1 piece of paper; then they're "collected", but I suspect that's not what you want.
- Q: _"Wie kann man nach übergreifenden Stereotypen suchen?"_
  A: I don't know; stereotypes are (imho) often couched in innocuous-seeming syntax; bald assertions such as "all X are Y" are in fact very rare
- Q: _"Was bringt es, wenn man z.B. GermaNet Ländernamen + Verbform 'sein' sucht?"_
  + A1: "GermaNet Ländername" (s65176) is a single synset with no hyponyms. A DDC search-term (`Ländername|gn-asi`) will find all tokens with literal lemma "Ländername" or "Landesname". Probably not what you want; you're probably more interested in s44177 ("Land","Staat") which actually has a bunch of GermaNet hyponyms.
  + A2: It doesn't get anything for me (=Bryan) personally. What it gets *you* depends on (a) what you want and (b) the specific query you pose. "+ Verbform 'sein'" is not valid DDC syntax; maybe you mean (`ist`), i.e. (`$l=ist`), i.e. (`$l=ist|Lemma`), i.e. (`$l=@{sein}`). If you want to restrict the hits for 'sein' to verb forms, use a `WITH` or `&=` query: (`sein &= $p=V*`). In this case, querying e.g.
    (`Ländername|gn-asi && sein`) would "get" you just what you asked for: all sentences containing at least one instance of one of the lemmata associated with GermaNet synset s65176 (="Ländername", "Landesname") and at least one instance associated with the lemma "sein" tagged as a (finite or infinitive, auxiliary or lexical) verb.
- Q: _"Kann man übergreifend nach Länderbezeichnung + Weiterem suchen lassen und das Ergebnis in einer Wolke darstellen lassen"?_
  + A: probably yes, within limits
  + AQ:
    * what exactly does "Länderbezeichnung" mean here?
      - all instances of any country name? --> try (`s44177|gn-asi`)
        + requires "DDC" profile-type if used from DiaCollo
      - all instances of a specific (set of) country names? --> try `{Polen,Böhmen,Tschechien,...}`
    * what exactly do "übergreifend", "+ Weiterem" and "das Ergebnis" mean here?
      - trawl near neighbors for common collocates? --> try a simple DiaCollo query like [`{Polen,Ungarn,Böhmen,Schlesien}`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BPolen%2CUngarn%2CB%C3%B6hmen%2CSchlesien%7D&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
      - trawl whole paragraphs for common collocates? --> try a DiaCollo TDF query like [`{Polen,Ungarn,Böhmen,Schlesien}`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BPolen%2CUngarn%2CB%C3%B6hmen%2CSchlesien%7D&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - specific observable constructs within a single sentence or paragraph, with "placeholders" for potential collocates?
        - use a DiaCollo DDC query with a placeholder subscript (`=2`) for the collocate positions you're interested in
        - **beware sparse data!**

---

## 5. Désirée Schneider - _Raumkonstruktion_ - _Wald_

- Topic: "sprachlicher Raumkonstruktion" - "Wald"
- Goal:
  + _"Diachrone Erhebung von Kollokationsfeldern zum Wortfeld ('Frame') WALD"_
  + _"Analyse diachron unterschiedlicher Konzepte anhand der erhobenen Wortfelder"_
- Remarks: **rejoice and be glad**
  + target term "Wald" is quite frequent
    * we can expect reliable results with comparatively narrow epochs ("slices")
  + target phenomenon is easily identifiable --> we can use fairly simple queries
- Ideas:
  + GermaNet
    * s43301 ("Wald","Waldgebiet") -> {Auenwald,...,Hain,...,Wäldchen}
    * s46042 ("Baum") -> {Acajubaum,...,Zimtbaum}
- Examples
  + simple DiaCollo collocations for lemma-set `{Wald,Holz,Gehölz}` (DTA) [`(QUERY:{Wald,Holz,Gehölz}, SLICE:50, PROFILE:collocations, GROUPBY:l)`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BWald%2CHolz%2CGeh%C3%B6lz%7D&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
    * adjective collocates only: [`(..., GROUPBY:l,p=/^ADJ/)`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BWald%2CHolz%2CGeh%C3%B6lz%7D&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
    * adjective collocates only, globally "best" 50: [`(..., GLOBAL:yes, KBEST:50)`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BWald%2CHolz%2CGeh%C3%B6lz%7D&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
  + DDC-based adjective collocations using GermaNet s43301 ("Wald") [`QUERY:NEAR(Wald|gn-asi, $p=ADJ*=2, 4) #fmin 2`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=NEAR%28Wald%7Cgn-asi%2C%24p%3DADJ*%3D2%2C4%29+%23fmin+2&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
- Q&A
  + Q: _"Wie kann man unterschiedliche Kollokationsfelder zu WALD vergleichend diachron darstellen"?_
    A: I don't think I understand the question.
    * AQ: What are "unterschiedliche Kollokationsfelder zu WALD"?
      - what is a "Kollokationsfeld"? (-> maybe the set of k-strongest collocates (per epoch) returned by a DiaCollo query?)
      - how do they differ, if they're all "Felder zu WALD"?
        + if you want to compare different WALD-lemmata (e.g. "Forst" vs. "Gehölz"), you can use a DiaCollo `diff:` relation ... but beware that most of these lexemes are themselves *very infrequent* --> unreliable data
        + example [("Baum" vs. "Strauch")](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Baum&_s=submit&bquery=Strauch&date=1600%3A1899&slice=50&bdate=1600%3A1899&bslice=50&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)

---

## 6. Linnéa Behncke - _(fleischlose) Ernährung_

- Topic: "Ernährung / Ernährungsformen" - "fleischlose Ernährung"
- Ideas:
  + GermaNet s39035 ("Nahrungsmittel","Viktualien") -> {Ambrosia,...,Essen,..,Verpflegung,Würze}
    - doesn't look very promising to me, unfortunately
- Goal/Questions:
  + Q: "Welche semantischen Netze ergeben sich für Ernährung"?
    AQ: What is a "semantic net", and how do you hope to identify it (or expect the software to do so)?
  + Q: "Was unterscheidet sich besonders beim Thema 'fleischlose Ernährung'?"
    AQ: differs with respect to what? What is the "control group"?
- Speculations & sandboxes:
  + have a look at DWDS Wortprofil (e.g. [`Ernährung`](https://www.dwds.de/wp?q=Ernährung))
  + DiaCollo/ibk_web_2016c [`"Ernährung" (slice=0, kbest=50)`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=Ern%C3%A4hrung&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=2&format=cloud&groupby=&eps=0)
    + requires www.dwds.de credentials; register at https://www.dwds.de/profile/register
  + DiaCollo/ibk_web_2016c diff [`vegetarisch` vs. `vegan`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=vegetarisch&_s=submit&bquery=vegan&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=&eps=0)
  + DiaCollo/ibk_web_2016c diff [`{Fleisch,Fisch}` vs. `{Tofu,Soja}`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=%7BFleisch%2CFisch%7D&_s=submit&bquery=%7BTofu%2CSoja%7D&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=&eps=0)
    * [adj only](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=%7BFleisch%2CFisch%7D&_s=submit&bquery=%7BTofu%2CSoja%7D&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
    * [adj only, similarities (diff=havg)](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=%7BFleisch%2CFisch%7D&_s=submit&bquery=%7BTofu%2CSoja%7D&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=havg&profile=diff-2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
  + DiaCollo TDF (`Ernährung && {Sünde,Moral}`)
    * ibk_web_2016c
      - [TDF collocations](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - [diff vs. `Ernährung`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&bquery=Ern%C3%A4hrung&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-tdf&format=cloud&groupby=l&eps=0)
    * ZEIT
      - [TDF collocations](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - [diff vs. `Ernährung`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&bquery=Ern%C3%A4hrung&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-tdf&format=cloud&groupby=l&eps=0)
  + other ideas
    * expand target query sets (maybe use GermaNet or OpenThesaurus, if you can find a good synset)
    * consider collecting & manually categorizing a small sample

---

## 7. Julia Prez - _Sprachliche Gewissheiten_ - _BALKAN_

- Topic: "Sprachliche Gewissheiten" - "Balkan"
  + _"Es geht dabei um Grenzen und um Zuschreibungen und Wertungen"_
- Zuschreibungen
  + search for "Balkan" with `groupby:l,p=/^VV/` --> verb collocates
  + manual ("close") inspection --> "Gewissheiten" or not?
    - yup, sounds reasonable to me (Bryan)
- Hyponyme:
  + diff (`diff:min`) *"LÄNDERNAME" vs "Balkan"*
    * **hint**: consider also `diff:havg` - see below
  + _Zweck: Zugehörigkeit zum Diskurs feststellen; Abgrenzung/Zuordnung Land<->Balkan_
    * **hint**: also consider DDC back-end with GermaNet expansion, e.g. [`NEAR(Balkan, s44177|gn-asi=2, 4)`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=near%28Balkan%2C+s44177%7Cgn-asi%3D2%2C+4%29&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=ddc&format=html&groupby=l&eps=0)
    * GermaNet [s44177 = "Land,Staat" = {`Afghanistan`, ..., `Zimbabwe`}](https://kaskade.dwds.de/dstar/zeit/dstar.perl?fmt=expand-html&q=s44177&x=gn-asi)
    * surprising(?) result: top collocate = `Afghanistan`: more "balkan" than anything on the Balkan peninsula!
- Synonyme
  + _"Balkaninsel", "Balkanhalbinsel", "Balkan-Halbinsel"_
  + _"wenn man jedoch nach diesen sucht_ **[where/how? -Bryan]**_, tauchen weit weniger Treffer auf, als die KWIC-Analyse suggeriert"_
    - AQ: where & how are you "searching", and what do you mean by "die KWIC-Analyse"?
    - I suspect you may be noticing the inconsistency between DiaCollo's "native" indices and the DDC search approximations DiaCollo offers as hyperlinks labelled "KWIC" in the DiaCollo web interface
      * try raising the search window in the (generated) DDC search approximations, e.g. `NEAR(Balkan, Serbien, 4)` -> `NEAR(Balkan, Serbien, 8)`
    - see also slide #59 (~84): *"Why don't the corpus KWIC links always return exactly f_{12} hits?"*
    - if you want/need exact results, use the DDC relation with the `#FMIN 1` operator
- Grenzen
  + _Lexeme wie "westlich", Flüsse & Gebirge, Ländernamen, Sprachen, auch Verben wie "annektieren", "umfassen"_
  + _"'Balkan' + besagtes Wort muss zusammen gesucht werden können"_
  + ... and it can be: TDF `&&`, DDC `NEAR(...)` or phrase query `"..."`
    * example (DiaCollo/DDC): [`Balkan && {Gebirge,Fluss}|gn-asi=2 #FMIN 2`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7BGebirge%2CFluss%7D%7Cgn-asi%3D2+%23fmin+2&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
      - only NE-collocates: [`Balkan && {Gebirge,Fluss,westlich,annektieren,umfassen}|gn-asi && $p=NE=2 #FMIN 2`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7BGebirge%2CFluss%2Cwestlich%2Cannektieren%2Cumfassen%7D%7Cgn-asi+%26%26+%24p%3DNE%3D2++%23fmin+2&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    * example (DiaCollo/TDF): [`Balkan && {westlich,umfassen,einschließen,annektieren}`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7Bwestlich%2Cumfassen%2Ceinschlie%C3%9Fen%2Cannektieren%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - only NE-collocates: [`groupby:l,p=NE`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7Bwestlich%2Cumfassen%2Ceinschlie%C3%9Fen%2Cannektieren%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l%2Cp%3DNE&eps=0)
- Q: _"Gemeinsamkeiten in der Kollokationsverhalten von 2 unabhängigen Abfragen ... diff:min oder diff:avg"_
  A: also consider `diff:havg` - less strict than "min" but more sensitive to non-uniformity than "avg"

Goals & Questions:

- Q: _"Wolken/Listen, die diachron zeigen, wie BALKAN (bzw. JUGOSLAWIEN, ALBANIEN usw.) sprachlich konstruiert wird (als Ausgangspunkt für die Analyse unterschiedlicher Konzepte (Gebiet, Gefahr, Krise, ...)"_
  A: looks to me like you're well on your way to that already...
- Q: _"Gleichzeitige Suche nach mehreren Wörtern, zu denen gemeinsam Kollokationen vorkommen"_
  A: use TDF or DDC back-ends (see above)
- Q: _"Möglichst viele Ergebnisse (keine Beschränkungen)"_
  + AQ: what kind of "limitations" do you mean?
    * have you looked at the `kbest`, `cutoff`, and/or `global` parameters?
      - I suspect that you don't really want **all** potential collocates for every search term
      - if you really **do** want all potential collocates, you might consider sidestepping the DiaCollo layer and going straight for a DDC count()-query, e.g. [`count(Balkan && $p=ADJ*=2) #by[$l=2] #desc_count`](https://kaskade.dwds.de/dstar/zeit/dstar.perl?q=count%28Balkan+%26%26+%24p%3DADJ*%3D2%29+%23by%5B%24l%3D2%5D+%23desc_count&fmt=kwic&start=1&limit=10&ctx=8&debug=)
    * have you looked at the `#FMIN` operator for the DDC profiling relation?
    * have you considered indexing your own corpus with the `-use-all-the-data` option?
    * for multiple corpora, have you considered using a `list://` URL for the command-line `dcdb-query.perl` tool?

---

## 8. Bremen - _Autorität_

- Topic: "Autorität" (diachron)
- Example:
  > *"Man glaubts nit/ wie viel ein gelehrter Mann an seiner Authoritet verlust hat/ wenn er nicht von höfflichen zierlichen Sitten ist"* [name=Otto, Melander: 1605]

  - "Autorität" ⇒ Herrschertugenden; sittliches Verhalten ↝ Legitimierung
  - _"...
    untersuchen, bis wann dieses Framing in Diskursen noch wirkmächtig war und wann es an Relevanz verlor."_
- Goals:
  - _"den Autoritätsbegriff in seinem diachronen Wandel zu beschreiben und zentrale Wendepunkte seiner Denotation und Konnotation zu erfassen."_
    - **Remark**: close reading required, esp. for *connotation*
  - _"... besseres Verständnis für zeitgenössische Diskurse über Autorität zu erlangen, insbesondere ... "soften Autoritarismus"_
    - **Remark**: sounds reasonable, but too far removed from observable (corpus) phenomena for me to be able to offer any practical suggestions... sorry.
  - _"Vom Workshop erhoffen wir uns die Möglichkeit, unser methodisches Wissen für sprachwissenschaftliches Arbeiten zu erweitern, um dieses effizient auf unser Projekt anwenden zu können."_
    - **Remark**: thanks... I hope that too!
- Thoughts, Ideas, & Speculations (<Bryan)
  - in general a promising choice for computer-assisted research
    - focus = _Autorität_ is a single-token lemma
    - lexical ambiguity should not be a problem
    - quite frequent (f_dta=4k, f_zeit=12k, f_web2016c=19k)
  - "fiddly bits": historical variation (lat. _auctoritas_)
    - see e.g. [dta/lexdb `l regexp '(?i:auc?torit)'`](https://kaskade.dwds.de/dstar/dta/lexdb/view.perl?select=*&from=l&where=l+regexp+%27%28%3Fi%3Aauc%3Ftorit%29%27&groupby=&orderby=f+desc&offset=0&limit=100)
    - typically [Zipfian](https://en.wikipedia.org/wiki/Zipf%27s_law) distribution
    - lemma-type `Autorität` itself ↦ only ca. 66% of all (potentially relevant) tokens in DTA
    - CAB canonicalization errors, e.g. `Auctorität` (→ exlex entry created; should be fixed in corpus next week!)
    - foreign material e.g. _auctoritate, auctoritas, l'autorite, ..._
    - compounds e.g. _Staatsautorität, Militärautorität, Zivilautorität, ..._
    - adjectives e.g. _autoritättisch, autoritätslos, ..._
    - ... most of these won't even make it past DiaCollo's compile-time frequency filters
- Examples
  - DWDS Wortprofil [`Autorität`](https://www.dwds.de/wp?q=Autorität)
  - DTA SemCloud
    - [`Autorität`](https://kaskade.dwds.de/dstar/dta/semcloud/terms.perl?to=terms&q=Autorit%C3%A4t&k=50&_s=submit)
      - lemmata which tend to occur in similar distributional contexts (by page)
  - DDC/DTA
    - [`Autorität`](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=Autorit%C3%A4t+%23sep&_s=submit)
    - [`Autorität=1 && 'Autorität@100'|sem=2 != Autorität` #ASC_DATE](https://kaskade.dwds.de/dstar/dta_beta/dstar.perl?fmt=html&corpus=&limit=10&ctx=8&q=Autorit%C3%A4t%3D1+%26%26+%27Autorit%C3%A4t%40100%27%7Csem%3D2+%21%3D+Autorit%C3%A4t+%23asc_date&_s=submit)
  - Time Series
    - [`Autorität`,+smoothed](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t+%23&grand=1)
    - [`Autorität`,-smoothed](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t&_s=submit&n=date%2Bclass&smooth=none&sl=1&w=0&wb=0&pr=0&xr=*%3A*&yr=0%3A*&psize=840%2C480&grand=1)
    - [`Autorität`,-outliers](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t&_s=submit&n=date%2Bclass&smooth=none&sl=1&w=25&wb=0&pr=0.05&xr=*%3A*&yr=0%3A*&psize=840%2C480&grand=1)
      - ... pretty sparse <1650 and also 1750-1800
    - [`Autorität@50|sem`](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t%4050%7Csem&_s=submit&n=date%2Bclass&smooth=none&gr=1&sl=1&w=25&wb=0&pr=0.05&xr=*%3A*&yr=0%3A*&psize=840%2C480)
      - including distributionally similar lemmata alleviates the sparsity problem... but at the cost of precision!
  - DiaCollo/DTA: "collocations" relation
    - [`Autorität`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Autorit%C3%A4t&_s=submit&date=&slice=50&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
    - [`Autorität`,p=NN,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Autorit%C3%A4t&_s=submit&date=&slice=100&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3DNN&eps=0)
    - [`Autorität`,p=ADJA,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Autorit%C3%A4t&_s=submit&date=&slice=100&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3DADJA&eps=0)
    - [`/(?i:auc?torit)/` "collocations",p=ADJA,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%2F%28%3Fi%3Aauc%3Ftorit%29%2F&_s=submit&date=&slice=100&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3DADJA&eps=0)
  - DiaCollo/DTA : TDF relation (→ document-wide search window)
    - [`/(?i:auc?torit)/` TDF,p=NN,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%2F%28%3Fi%3Aauc%3Ftorit%29%2F&_s=submit&date=*%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&global=1&groupby=l%2Cp%3DNN&eps=0)
    - [`/(?i:auc?torit)/` TDF,p=ADJA,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%2F%28%3Fi%3Aauc%3Ftorit%29%2F&_s=submit&date=*%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&global=1&groupby=l%2Cp%3DADJA&eps=0)
  - DiaCollo/DTA : DDC relation
    - [`NEAR(Autorität@50|sem, $p=NN=2, 4)`,+global](https://kaskade.dwds.de/dstar/dta_beta/diacollo/?query=NEAR%28Autorit%C3%A4t%4050%7Csem%2C+%24p%3DNN%3D2%2C+4%29&_s=submit&date=1500%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=ddc&format=cloud&global=1&groupby=l&eps=0)
      - noun collocations with either `Autorität` itself **or** distributionally similar lemmata (uses SemCloud)

---

# See also

- [Workshop Notes](https://hackmd.io/lxjF_oOFR5-oxGvtbPyS3Q)
- kaskade.dwds.de/~jurish/diacollo/
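
The DiaCollo links collected in these notes all follow the same pattern: a corpus-specific base URL plus a handful of query parameters (`query`, `date`, `slice`, `score`, `kbest`, `profile`, `format`, `global`, `groupby`, ...). For anyone who wants to generate such links for several term sets at once, here is a minimal Python sketch. The helper names `term_set` and `diacollo_url` are hypothetical (not part of DiaCollo); the parameter names and values are simply copied from the links above, where `profile=2` appears on the links labelled "collocations". Treat it as an illustration of the URL pattern, not as a documented API.

```python
from urllib.parse import urlencode

# Base URL pattern as it appears in the links in these notes;
# the corpus name ("dta", "zeit", ...) is the only variable path component.
BASE = "https://kaskade.dwds.de/dstar/{corpus}/diacollo/"


def term_set(*lemmata):
    """Format lemmata as a set-valued term query, e.g. {Wald,Holz}."""
    return "{" + ",".join(lemmata) + "}"


def diacollo_url(query, corpus="dta", date="", slice_=50, kbest=10,
                 profile="2", fmt="cloud", groupby="l", global_=False):
    """Build a DiaCollo request URL (hypothetical helper for illustration)."""
    params = {
        "query": query,
        "_s": "submit",
        "date": date,
        "slice": slice_,
        "score": "ld",
        "kbest": kbest,
        "cutoff": "",
        "profile": profile,
        "format": fmt,
    }
    if global_:
        # the links above place "global=1" just before "groupby"
        params["global"] = 1
    params["groupby"] = groupby
    params["eps"] = 0
    # urlencode() percent-encodes braces, commas, colons and umlauts (UTF-8)
    return BASE.format(corpus=corpus) + "?" + urlencode(params)


# lemma-set collocations for {Wald,Holz,Gehölz} in the DTA, 50-year slices
url = diacollo_url(term_set("Wald", "Holz", "Gehölz"), date="1600:1899")
print(url)

# same target, adjective collocates only, globally "best" 50
url_adj = diacollo_url(term_set("Wald", "Holz", "Gehölz"), date="1600:1899",
                       kbest=50, groupby="l,p=/^ADJ/", global_=True)
print(url_adj)
```

Looping `diacollo_url` over a list of term sets gives one link per query; merging the results of several queries into a single visualization is still not covered by this (see the Q&A under section 1).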