---
tags: wuerzburg2019, diacollo
---

# Würzburg "Steckbriefe" 2019-07-06

[TOC]

## 1. Martina Hoffmann - *Andersartigkeit* - *Homosexualität*

- Topic: "Andersartigkeit" - "Homosexualität"
- Problem: lexical realization / paraphrases & (historical) code
  - neither GermaNet nor OpenThesaurus is helpful here
  - you could potentially edit OpenThesaurus yourself, but that's probably [BTSOTS](https://www.tandfonline.com/doi/abs/10.1080/07350198.2012.684002) (="beyond the scope...")
- Remarks
  - basically an **onomasiological** approach
  - suggest refinement, (semasiological) approximation, or alternate method (e.g. sampling)
- Ideas
  - query multiple terms with `{...}`
    - [dta:{unmännlich,...,widernatürlich}](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D+&slice=50&kbest=30&profile=tdf&format=cloud&global=1&groupby=l)
  - for aggregation by "Wortfeld" (here I'm interpreting this as something like "normative attitude" ~ positive vs. negative), try GermaNet synsets "bewertungsspezifisch" (resp. "positiv" vs. "negativ")
    - [positiv (s91)](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds91%7Cgn-asi%3D2&_s=submit&date=1600%3A1999&slice=100&score=ld&kbest=20&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    - [negativ (s214)](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds214%7Cgn-asi%3D2&_s=submit&date=1600%3A1999&slice=100&score=ld&kbest=20&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    - [bewertungsspezifisch](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlich%2CUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Dbewertungsspezifisch%7Cgn-asi%3D2+&_s=submit&date=1600%3A1899&slice=100&score=ld&kbest=20&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    - [diff (s91 vs 214)](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7Bunm%C3%A4nnlichUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds91%7Cgn-asi%3D2+&_s=submit&bquery=%7Bunm%C3%A4nnlichUnm%C3%A4nnlichkeit%2CSodomie%2CUnzucht%2Cwidernat%C3%BCrlich%7D%3D1+%26%26+%24l%3Ds214%7Cgn-asi%3D2+&date=1600%3A1899&slice=100&bdate=1600%3A1899&bslice=100&score=ld&kbest=20&diff=adiff&profile=diff-ddc&format=cloud&groupby=l&eps=0)
- Q&A
  - DiaCollo visualization for "external" data (e.g. from multiple queries)
    - not (yet) implemented; requires a bit of JavaScript hacking

---

## 2. Džana Fajić - *Bedrohung* - *Terrorismus*

- Topic: "Bedrohung" - "Terrorismus"
- Remarks:
  - these are actually pretty good candidates for DWDS Wortprofil
    - [Bedrohung](https://www.dwds.de/wp?q=Bedrohung)
    - [Terrorismus](https://www.dwds.de/wp?q=Terrorismus)
- Q&A
  - _"große Wortwolken über mehrere Korpora"_
    - see slide #79 (~54) "Fiendishly Awkward Questions / Can I use DiaCollo to directly compare different corpora?"
  - _"der Kollokationsrahmen muss für Argumentation sehr groß sein. Wie geht das?"_
    + I don't understand this question - maybe you should look at:
      * DiaCollo [TDF relation](https://kaskade.dwds.de/dstar/dta/diacollo/help.perl#prf-tdf) (aka "term-document matrix": co-occurrence in a single "document" (=paragraph))
      * DDC/TDF Boolean conjunction operator (`&&`)
      * DDC `#IN p` and/or `#IN file` operator?
      * DDC `NEAR(A,B,n)` and `NEAR(A,B,C,n)` limit parameter `n` (max=32 tokens)
    + Q: _"Wie lässt sich DiaCollo für diskurslinguistische Argumentationsanalysen nutzen?"_
      A: I don't know... what are the characteristic (formal and/or observable) properties of a "diskurslinguistische Argumentationsanalyse"?
    + Q: _"Wie kann man formale (z.B. Konnektoren) und inhaltliche (z.B. Lexeme) für Argumentationen erheben?"_
      A: I don't think you can "collect" them (as in "Daten erheben") as a matter of general principle
      * lexemes (whether lexical or functional) are abstract linguistic objects, and as such exist in our heads ... and maybe "in the (Platonic/rationalist) language-itself"
      * lexical realizations (tokens; word use) are items in the corpus; see competence/performance distinction
      * maybe I don't understand what *"erheben"* is supposed to mean here; some remarks regarding your specific examples:
        - *"diffus"*: maybe try GermaNet synset #4081 ("ungenau"), e.g. [this DDC query](https://kaskade.dwds.de/dstar/zeit/dstar.perl?q=Bedrohung%3D1+%26%26+%24l%3Ds4081%7Cgn-asi+%23sep&fmt=kwic&start=1&limit=10&ctx=8&debug=)
        - *"Kollektivierung"*: your best bet is to code this yourself; maybe try OpenThesaurus synsets [`weltweit`](http://kaskade.dwds.de/openthesaurus/?q=weltweit) and [`überall`](http://kaskade.dwds.de/openthesaurus/?q=%C3%BCberall): e.g.
          [`Bedrohung=1 && {weltweit,überall}|ot-asi`](https://kaskade.dwds.de/dstar/zeit/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=Bedrohung%3D1+%26%26+%24l%3D%7Bweltweit%2C%C3%BCberall%7D%7Cot-asi+%23sep&_s=submit)
        - *"Superlative"*: DWDS corpora don't offer morphological annotations; your best bet is to use a suffix-list or regex, e.g. [`$p=ADJA &= /ste?[ernm]$/`](https://kaskade.dwds.de/dstar/zeit/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=%24p%3DADJA+%26%3D+%2Fste%3F%5Bernm%5D%24%2F&_s=submit)
        - *"Argumentationen":*
          + DWDS corpora don't offer semantic annotations
          + you can use the `$p` (part-of-speech) field to search for STTS tags (but the tagger makes errors too)
          + try e.g. DDC [`$p={K,PAV}*`](https://kaskade.dwds.de/dstar/kern/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=%24p%3D%7BK%2CPAV%7D*&_s=submit) to find conjunctions & pro-adverbs
          + or use a set-valued term-query, e.g. `{weil,deshalb,deswegen,obwohl,dennoch}`
- Example:
  + if I've understood correctly, you might try:
    * use a large co-occurrence window around your target ("Bedrohung") (e.g. DDC `#in p` -> paragraphs)
      - DiaCollo TDF won't work for the DWDS corpora here --> no function words in TDF index!
    * assume presence of a conjunction (`$p=KON`) indicates an "argumentative" context
    * assume everything else in a matching document is a potential "argumentative" collocate
  + in the ZEIT corpus, this might get us something like [this DiaCollo query](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=near%28Bedrohung%2C%24p%3DKON%2C%24p%3DADJ*%3D2%2C4%29+%23fmin+2&_s=submit&date=&slice=10&score=ld&kbest=10&cutoff=&profile=ddc&format=html&groupby=l&eps=0)
    * QUERY: `near(Bedrohung=1,$p=KON=1,$p=ADJ*=2,4) #FMIN 2`
      - 3-argument `NEAR()`, at most 4 tokens between
      - collocant condition "Bedrohung" (lemma)
      - collocant condition `$p=KON` (part-of-speech)
      - collocate placeholder `$p=ADJ*=2` (any adjective)
    * PROFILE: DDC
    * GROUPBY: l
  + you could also search for e.g.
    noun collocates and selected (causal) connectors, e.g.
    + [`near(Bedrohung,{weil,deshalb,deswegen,denn},$p=NN=2,4) #fmin 1`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=near%28Bedrohung%2C%7Bweil%2Cdeshalb%2Cdeswegen%2Cdenn%7D%2C%24p%3DNN%3D2%2C4%29+%23fmin+1&_s=submit&date=&slice=10&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + ... alternatively, you could require "Bedrohung" to precede the connector:
    - [`"Bedrohung #4 {weil,deshalb,deswegen,denn} #4 $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=%22Bedrohung+%234+%7Bweil%2Cdeshalb%2Cdeswegen%2Cdenn%7D+%234+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=10&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + Limitations
    * the DDC `NEAR()` operator only supports up to 3 arguments
      - ... and we need to save 1 argument position for the collocate (`=2`)
    * this is a *VERY* expensive query: ***please*** use these techniques with caution & consideration!
      - raising the `NEAR()` limit (above -> 4 tokens) raises complexity ***exponentially***
      - expect timeouts and/or `std::bad_alloc` errors from stuff like this

---

## 3. Nadine Löhle - _Angst_ (individuell vs. kollektiv)

- Topic: "Angst" (individuell vs. kollektiv)
- Goal: _"diachrone Erhebung des Schlagwortes 'Angst' ... in seinen wortfeldhaften und argumentationsbezogenen Kontexten"_
  - **?**: what is a "wortfeldhafter" and/or "argumentationsbezogener Kontext"?
    - it sounds to me like this is an **onomasiological** question
    - neither I as a CL programmer nor DiaCollo can help with that as such
- Examples
  + specific examples given are all fairly straightforward for DiaCollo using DDC relation:
  + *"Angst vor __"*
    - [`"Angst vor #2 $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+vor+%232+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + *"Angst der __"*
    - [`"Angst @der $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+der+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
      * `@der` -> literal surface form (vs. lemma-search)
  + *"Angst weil __"*
    - [`"Angst #4 weil #8 $p=VVFIN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+%234+weil+%238+%24p%3DVVFIN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + *"Angst und __"*
    - [`"Angst und #2 $p=NN=2" #fmin 1`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%22Angst+und+%232+%24p%3DNN%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)

---

## 4. Nadine Kastenhofer - _Länderstereotypen_ - _Osteuropa_

- Topic: "Länderstereotypen" - "osteuropäische Länder"
- Goals:
  + _"übergreifende Suchanfragen, die Kollokationen zu Gruppen von Lexemen anhand von abstrahierten Formulierungen zeigen"_
  + _"weitere Suchanfragen, mit denen man Stereotype finden kann"_
- Ideas:
  + GermaNet
    * s34390 ("örtlich_bestimmter_Mensch")
    * s2245 ("ortsspezifisch")
      - s1299 ("länderspezifisch")
      - *s1439 ("osteuropäisch")* -> {polnisch, tschechisch, böhmisch, ...}
      - s1591 ("regionalspezifisch")
      - s2359 ("herkunftsspezifisch")
    * s44177 ("Land","Staat") -> {Afghanistan,Amerika,Agrarstaat, ..., Polen, ..., Zimbabwe}
    * s34416 ("Europäer") -> {Albaner,Brite,Deutscher,...,Zypriotin}
  + PoS : `$p=NE` will get tokens tagged as proper names
    * results will also include person- and organization-names... and probably a fair number of tagger errors too
  + DIY : use an explicit term-disjunction with "interesting" names, e.g.
    * `{Böhmen,...,Ungarn}`
- Examples:
  + *"alle [Europäer] sind __"*
    * *very* rare -- statistical results (DiaCollo &c) will be *unreliable*
  + *"[Europäer] sind __"*
    * `"Europäer|gn-asi=1 sind=1 $p={ADJ,N}*=2" #sep` : [DDC/DTA](http://kaskade.dwds.de/dstar/dta/dstar.perl?q=%22Europ%C3%A4er%7Cgn-asi%3D1+sind%3D1+%24p%3D%7BADJ%2CN%7D*%3D2%22+%23sep&fmt=kwic&start=1&limit=10&ctx=8&debug=), [DiaCollo/DTA](https://kaskade.dwds.de/dstar/dta_beta/diacollo/?query=%22Europ%C3%A4er%7Cgn-asi+sind+%24p%3D%7BADJ%2CN%7D*%3D2%22+%23fmin+1&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
  + _"obwohl [osteuropäisch]"_
    + DDC: `"obwohl osteuropäisch|gn-asi" #sep`
    + too rare for DiaCollo to be of much help (f_dta=0, f_zeit=7)
  + fixed-window DiaCollo collocates of selected eastern European adjectives
    * [`{polnisch,ungarisch,böhmisch,schlesisch}`](https://kaskade.dwds.de/dstar/dta_beta/diacollo/?query=%7Bpolnisch%2Cungarisch%2Cb%C3%B6hmisch%2Cschlesisch%7D&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
    * ... not very helpful for "stereotypes" as such

**Q&A**

- Q: _"Wie kann man Suchabfragen gut zusammenfassen?"_
  + A: in DDC e.g. set-valued term queries (e.g. `{Haus,Hof,Garten}`) or generic `||` (=Boolean disjunction, e.g. `("Haus und Hof" || Garten)`)
  + A: in DiaCollo "native" syntax with `|` (`Haus|Hof|Garten`)
  + Qback: what does it mean for you to "collect different queries (well)"?
    - I can take (say) 3 different queries and write them down on 1 piece of paper; then they're "collected", but I suspect that's not what you want.
- Q: _"Wie kann man nach übergreifenden Stereotypen suchen?"_
  A: I don't know; stereotypes are (imho) often couched in innocuous-seeming syntax; bald assertions such as "all X are Y" are in fact very rare
- Q: _"Was bringt es, wenn man z.B. GermaNet Ländernamen + Verbform 'sein' sucht?"_
  + A1: "GermaNet Ländername" (s65176) is a single synset with no hyponyms. A DDC search-term (`Ländername|gn-asi`) will find all tokens with literal lemma "Ländername" or "Landesname". Probably not what you want; you're probably more interested in s44177 ("Land","Staat"), which actually has a bunch of GermaNet hyponyms.
  + A2: It doesn't get anything for me (=Bryan) personally. What it gets *you* depends on (a) what you want, (b) the specific query you pose ("+ Verbform 'sein'" is not valid DDC syntax; maybe you mean (`ist`) i.e. (`$l=ist`) i.e. (`$l=ist|Lemma`) i.e. (`$l=@{sein}`)). If you want to restrict the hits for 'sein' to verb forms, use a `WITH` or `&=` query: (`sein &= $p=V*`). In this case, querying e.g.
    (`Ländername|gn-asi && sein`) would "get" you just what you asked for: all sentences containing at least one instance of one of the lemmata associated with GermaNet synset s65176 (="Ländername", "Landesname") and at least one instance associated with the lemma "sein" tagged as a (finite or infinitive, auxiliary or lexical) verb.
- Q: _"Kann man übergreifend nach Länderbezeichnung + Weiterem suchen lassen und das Ergebnis in einer Wolke darstellen lassen"?_
  + A: probably yes, within limits
  + AQ:
    * what exactly does "Länderbezeichnung" mean here?
      - all instances of any country name? --> try (`s44177|gn-asi`)
        + requires "DDC" profile-type if used from DiaCollo
      - all instances of a specific (set of) country names? --> try `{Polen,Böhmen,Tschechien,...}`
    * what exactly do "übergreifend", "+ Weiterem" and "das Ergebnis" mean here?
      - trawl near neighbors for common collocates? --> try a simple DiaCollo query like [`{Polen,Ungarn,Böhmen,Schlesien}`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BPolen%2CUngarn%2CB%C3%B6hmen%2CSchlesien%7D&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
      - trawl whole paragraphs for common collocates? --> try a DiaCollo TDF query like [`{Polen,Ungarn,Böhmen,Schlesien}`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BPolen%2CUngarn%2CB%C3%B6hmen%2CSchlesien%7D&_s=submit&date=&slice=100&score=ld&kbest=10&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - specific observable constructs within a single sentence or paragraph with "placeholders" for potential collocates?
        - use a DiaCollo DDC query with a placeholder subscript (`=2`) for the collocate positions you're interested in
        - **beware sparse data!**

---

## 5. Désirée Schneider - _Raumkonstruktion_ - _Wald_

- Topic: "sprachlicher Raumkonstruktion" - "Wald"
- Goal:
  + _"Diachrone Erhebung von Kollokationsfeldern zum Wortfeld ('Frame') WALD"_
  + _"Analyse diachron unterschiedlicher Konzepte anhand der erhobenen Wortfelder"_
- Remarks: **rejoice and be glad**
  + target term "Wald" is quite frequent
    * we can expect reliable results with comparatively narrow epochs ("slices")
  + target phenomenon is easily identifiable --> we can use fairly simple queries
- Ideas:
  + GermaNet
    * s43301 ("Wald","Waldgebiet") -> {Auenwald,...,Hain,...,Wäldchen}
    * s46042 ("Baum") -> {Acajubaum,...,Zimtbaum}
- Examples
  + simple DiaCollo collocations for lemma-set `{Wald,Holz,Gehölz}` (DTA) [`(QUERY:{Wald,Holz,Gehölz}, SLICE:50, PROFILE:collocations, GROUPBY:l)`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BWald%2CHolz%2CGeh%C3%B6lz%7D&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
    * adjective collocates only: [`(..., GROUPBY:l,p=/^ADJ/)`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BWald%2CHolz%2CGeh%C3%B6lz%7D&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
    * adjective collocates only, globally "best" 50: [`(..., GLOBAL:yes, KBEST:50)`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%7BWald%2CHolz%2CGeh%C3%B6lz%7D&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
  + DDC-based adjective collocations using GermaNet s43301 ("Wald") [`QUERY:NEAR(Wald|gn-asi, $p=ADJ*=2, 4) #fmin 2`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=NEAR%28Wald%7Cgn-asi%2C%24p%3DADJ*%3D2%2C4%29+%23fmin+2&_s=submit&date=1600%3A1899&slice=50&score=ld&kbest=10&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
- Q&A
  + Q: _"Wie kann man unterschiedliche Kollokationsfelder zu WALD vergleichend diachron darstellen"?_
    A: I don't think I understand the question.
    * AQ: What are "unterschiedliche Kollokationsfelder zu WALD"?
      - what is a "Kollokationsfeld"? (-> maybe the set of k-strongest collocates (per epoch) returned by a DiaCollo query?)
      - how do they differ, if they're all "Felder zu WALD"?
        + if you want to compare different WALD-lemmata (e.g. "Forst" vs. "Gehölz"), you can use a DiaCollo `diff:` relation ... but beware that most of these lexemes are themselves *very infrequent* --> unreliable data
        + example [("Baum" vs. "Strauch")](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Baum&_s=submit&bquery=Strauch&date=1600%3A1899&slice=50&bdate=1600%3A1899&bslice=50&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)

<!-- * Snarky Observation: I'm not a native German speaker, but I believe that "im diachronen Vergleich darstellen" and "vergleichend diachron darstellen" are semantically distinct, and that you meant the former when you wrote the latter. -->

---

## 6. Linnéa Behncke - _(fleischlose) Ernährung_

- Topic: "Ernährung / Ernährungsformen" - "fleischlose Ernährung"
- Ideas:
  + GermaNet s39035 ("Nahrungsmittel","Viktualien") -> {Ambrosia,...,Essen,..,Verpflegung,Würze}
    - doesn't look very promising to me, unfortunately
- Goal/Questions:
  + Q: "Welche semantischen Netze ergeben sich für Ernährung"?
    AQ: What is a "semantic net", and how do you hope to identify it (or expect the software to do so)?
  + Q: "Was unterscheidet sich besonders beim Thema 'fleischlose Ernährung'?"
    AQ: differs with respect to what? What is the "control group"?
- Speculations & sandboxes:
  + have a look at DWDS Wortprofil (e.g.
    [`Ernährung`](https://www.dwds.de/wp?q=Ernährung))
  + DiaCollo/ibk_web_2016c [`"Ernährung" (slice=0, kbest=50)`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=Ern%C3%A4hrung&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=2&format=cloud&groupby=&eps=0)
    + [color=#0000ff] requires www.dwds.de credentials; register at https://www.dwds.de/profile/register
  + DiaCollo/ibk_web_2016c diff [`vegetarisch` vs. `vegan`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=vegetarisch&_s=submit&bquery=vegan&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=&eps=0)
  + DiaCollo/ibk_web_2016c diff [`{Fleisch,Fisch}` vs. `{Tofu,Soja}`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=%7BFleisch%2CFisch%7D&_s=submit&bquery=%7BTofu%2CSoja%7D&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=&eps=0)
    * [adj only](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=%7BFleisch%2CFisch%7D&_s=submit&bquery=%7BTofu%2CSoja%7D&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
    * [adj only, similarities (diff=havg)](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=%7BFleisch%2CFisch%7D&_s=submit&bquery=%7BTofu%2CSoja%7D&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=havg&profile=diff-2&format=cloud&groupby=l%2Cp%3D%2F%5EADJ%2F&eps=0)
  + DiaCollo TDF (`Ernährung && {Sünde,Moral}`)
    * ibk_web_2016c
      - [TDF collocations](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - [diff vs. `Ernährung`](http://kaskade.dwds.de/dstar/ibk_web_2016c/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&bquery=Ern%C3%A4hrung&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-tdf&format=cloud&groupby=l&eps=0)
    * ZEIT
      - [TDF collocations](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - [diff vs. `Ernährung`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Ern%C3%A4hrung+%26%26+%7BS%C3%BCnde%2CMoral%7D&_s=submit&bquery=Ern%C3%A4hrung&date=&slice=0&bdate=&bslice=0&score=ld&kbest=50&diff=adiff&profile=diff-tdf&format=cloud&groupby=l&eps=0)
  + other ideas
    * expand target query sets (maybe use GermaNet or OpenThesaurus, if you can find a good synset)
    * consider collecting & manually categorizing a small sample

---

## 7. Julia Prez - _Sprachliche Gewissheiten_ - _BALKAN_

- Topic: "Sprachliche Gewissheiten" - "Balkan"
  + _"Es geht dabei um Grenzen und um Zuschreibungen und Wertungen"_
- Zuschreibungen
  + search for "Balkan" with `groupby:l,p=/^VV/` --> verb collocates
  + manual ("close") inspection --> "Gewissheiten" or not?
    - yup, sounds reasonable to me (Bryan)
- Hyponyme:
  + diff (`diff:min`) *"LÄNDERNAME" vs "Balkan"*
    * **hint**: consider also `diff:havg` - see below
  + _Zweck: Zugehörigkeit zum Diskurs feststellen; Abgrenzung/Zuordnung Land<->Balkan_
    * **hint**: also consider DDC back-end with GermaNet expansion, e.g. [`NEAR(Balkan, s44177|gn-asi=2, 4)`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=near%28Balkan%2C+s44177%7Cgn-asi%3D2%2C+4%29&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=ddc&format=html&groupby=l&eps=0)
    * GermaNet [s44177 = "Land,Staat" = {`Afghanistan`, ..., `Zimbabwe`}](https://kaskade.dwds.de/dstar/zeit/dstar.perl?fmt=expand-html&q=s44177&x=gn-asi)
    * surprising(?)
      result: top collocate = `Afghanistan`: more "balkan" than anything on the Balkan peninsula!
- Synonyme
  + _"Balkaninsel", "Balkanhalbinsel", "Balkan-Halbinsel"_
  + _"wenn man jedoch nach diesen sucht_ **[where/how? -Bryan]**_, tauchen weit weniger Treffer auf, als die KWIC-Analyse suggeriert"_
    - AQ: where & how are you "searching", and what do you mean by "die KWIC-Analyse"?
    - I suspect you may be noticing the inconsistency between DiaCollo's "native" indices and the DDC search approximations DiaCollo offers as hyperlinks labelled "KWIC" in the DiaCollo web interface
      * try raising the search window in the (generated) DDC search approximations, e.g. `NEAR(Balkan, Serbien, 4)` -> `NEAR(Balkan, Serbien, 8)`
    - see also slide #59 (~84): *"Why don't the corpus KWIC links always return exactly f_{12} hits?"*
    - if you want/need exact results, use the DDC relation with the `#FMIN 1` operator
- Grenzen
  + _Lexeme wie "westlich", Flüsse & Gebirge, Ländernamen, Sprachen, auch Verben wie "annektieren", "umfassen"_
  + _"'Balkan' + besagtes Wort muss zusammen gesucht werden können"_
  + ...
    it can: TDF `&&`, DDC `NEAR(...)` or phrase query `"..."`
    * example (DiaCollo/DDC): [`Balkan && {Gebirge,Fluss}|gn-asi=2 #FMIN 2`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7BGebirge%2CFluss%7D%7Cgn-asi%3D2+%23fmin+2&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
      - only NE-collocates: [`Balkan && {Gebirge,Fluss,westlich,annektieren,umfassen}|gn-asi && $p=NE=2 #FMIN 2`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7BGebirge%2CFluss%2Cwestlich%2Cannektieren%2Cumfassen%7D%7Cgn-asi+%26%26+%24p%3DNE%3D2++%23fmin+2&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=ddc&format=cloud&groupby=l&eps=0)
    * example (DiaCollo/TDF): [`Balkan && {westlich,umfassen,einschließen,annektieren}`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7Bwestlich%2Cumfassen%2Ceinschlie%C3%9Fen%2Cannektieren%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l&eps=0)
      - only NE-collocates: [`groupby:l,p=NE`](https://kaskade.dwds.de/dstar/zeit/diacollo/?query=Balkan+%26%26+%7Bwestlich%2Cumfassen%2Ceinschlie%C3%9Fen%2Cannektieren%7D&_s=submit&date=&slice=0&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&groupby=l%2Cp%3DNE&eps=0)
- Q: _"Gemeinsamkeiten in der Kollokationsverhalten von 2 unabhängigen Abfragen ... diff:min oder diff:avg"_
  A: also consider `diff:havg` - less strict than "min" but more sensitive to non-uniformity than "avg"

Goals & Questions:

- Q: _"Wolken/Listen, die diachron zeigen, wie BALKAN (bzw. JUGOSLAWIEN, ALBANIEN usw.) sprachlich konstruiert wird (als Ausgangspunkt für die Analyse unterschiedlicher Konzepte (Gebiet, Gefahr, Krise, ...))"_
  A: looks to me like you're well on your way to that already...
- Q: _"Gleichzeitige Suche nach mehreren Wörtern, zu denen gemeinsam Kollokationen vorkommen"_
  A: use TDF or DDC back-ends (see above)
- Q: _"Möglichst viele Ergebnisse (keine Beschränkungen)"_
  + AQ: what kind of "limitations" do you mean?
    * have you looked at the `kbest`, `cutoff`, and/or `global` parameters?
      - I suspect that you don't really want **all** potential collocates for every search term
      - if you really **do** want all potential collocates, you might consider sidestepping the DiaCollo layer and going straight for a DDC count()-query, e.g. [`count(Balkan && $p=ADJ*=2) #by[$l=2] #desc_count`](https://kaskade.dwds.de/dstar/zeit/dstar.perl?q=count%28Balkan+%26%26+%24p%3DADJ*%3D2%29+%23by%5B%24l%3D2%5D+%23desc_count&fmt=kwic&start=1&limit=10&ctx=8&debug=)
    * have you looked at the `#FMIN` operator for the DDC profiling relation?
    * have you considered indexing your own corpus with the `-use-all-the-data` option?
    * for multiple corpora, have you considered using a `list://` URL for the command-line `dcdb-query.perl` tool?

<!-- + A(snarky): if you wanted only "viele Ergebnisse", you could search for a wildcard matching every token in the corpus ... that would produce "many results", but would probably not be very useful + A(snarkier): DWDS server resources are limited; I *cannot* in good conscience offer anyone "unlimited" access to those resources ... even many of the example queries I have sketched here are shamefully expensive. If you wish to search your own corpora with your own computational resources, you are free to install the DiaCollo source distribution from CPAN, index your corpus, provide your own search engine and query approximations, or configure your own DDC instance, and/or write specialized scripts to find the data relevant to your particular research question of the moment. -->

---

## 8. Bremen - _Autorität_

- Topic: "Autorität" (diachron)
- Example:
  > *"Man glaubts nit/ wie viel ein gelehrter Mann an seiner Authoritet verlust hat/ wenn er nicht von höfflichen zierlichen Sitten ist"* [name=Otto, Melander: 1605]
  - "Autorität" ⇒ Herrschertugenden; sittliches Verhalten ↝ Legitimierung
  - _"... untersuchen, bis wann dieses Framing in Diskursen noch wirkmächtig war und wann es an Relevanz verlor."_
- Goals:
  - _"den Autoritätsbegriff in seinem diachronen Wandel zu beschreiben und zentrale Wendepunkte seiner Denotation und Konnotation zu erfassen."_
    - **Remark**: close reading required, esp. for *connotation*
  - _"... besseres Verständnis für zeitgenössische Diskurse über Autorität zu erlangen, insbesondere ... "soften Autoritarismus"_
    - **Remark**: sounds reasonable, but too far removed from observable (corpus) phenomena for me to be able to offer any practical suggestions... sorry.
  - _"Vom Workshop erhoffen wir uns die Möglichkeit, unser methodisches Wissen für sprachwissenschaftliches Arbeiten zu erweitern, um dieses effizient auf unser Projekt anwenden zu können."_
    - **Remark**: thanks... I hope that too!
- Thoughts, Ideas, & Speculations (<Bryan)
  - in general a promising choice for computer-assisted research
    - focus = _Autorität_ is a single-token lemma
    - lexical ambiguity should not be a problem
    - quite frequent (f_dta=4k, f_zeit=12k, f_web2016c=19k)
  - "fiddly bits": historical variation (lat. _auctoritas_)
    - see e.g. [dta/lexdb `l regexp '(?i:auc?torit)'`](https://kaskade.dwds.de/dstar/dta/lexdb/view.perl?select=*&from=l&where=l+regexp+%27%28%3Fi%3Aauc%3Ftorit%29%27&groupby=&orderby=f+desc&offset=0&limit=100)
    - typically [Zipfian](https://en.wikipedia.org/wiki/Zipf%27s_law) distribution
    - lemma-type `Autorität` itself ↦ only ca. 66% of all (potentially relevant) tokens in DTA
      - CAB canonicalization errors, e.g. `Auctorität` (→ exlex entry created; should be fixed in corpus next week!)
      - foreign material e.g.
        _auctoritate, auctoritas, l'autorite, ..._
      - compounds e.g. _Staatsautorität, Militärautorität, Zivilautorität, ..._
      - adjectives e.g. _autoritättisch, autoritätslos, ..._
      - ... most of these won't even make it past DiaCollo's compile-time frequency filters
- Examples
  - DWDS Wortprofil [`Autorität`](https://www.dwds.de/wp?q=Autorität)
  - DTA SemCloud
    - [`Autorität`](https://kaskade.dwds.de/dstar/dta/semcloud/terms.perl?to=terms&q=Autorit%C3%A4t&k=50&_s=submit)
      - lemmata which tend to occur in similar distributional contexts (by page)
  - DDC/DTA
    - [`Autorität`](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=kwic&corpus=&limit=10&ctx=8&q=Autorit%C3%A4t+%23sep&_s=submit)
    - [`Autorität=1 && 'Autorität@100'|sem=2 != Autorität #ASC_DATE`](https://kaskade.dwds.de/dstar/dta_beta/dstar.perl?fmt=html&corpus=&limit=10&ctx=8&q=Autorit%C3%A4t%3D1+%26%26+%27Autorit%C3%A4t%40100%27%7Csem%3D2+%21%3D+Autorit%C3%A4t+%23asc_date&_s=submit)
  - Time Series
    - [`Autorität`,+smoothed](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t+%23&grand=1)
    - [`Autorität`,-smoothed](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t&_s=submit&n=date%2Bclass&smooth=none&sl=1&w=0&wb=0&pr=0&xr=*%3A*&yr=0%3A*&psize=840%2C480&grand=1)
    - [`Autorität`,-outliers](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t&_s=submit&n=date%2Bclass&smooth=none&sl=1&w=25&wb=0&pr=0.05&xr=*%3A*&yr=0%3A*&psize=840%2C480&grand=1)
      - ... pretty sparse <1650 and also 1750-1800
    - [`Autorität@50|sem`](https://kaskade.dwds.de/dstar/dta/dstar.perl?fmt=hist&pformat=svg&q=Autorit%C3%A4t%4050%7Csem&_s=submit&n=date%2Bclass&smooth=none&gr=1&sl=1&w=25&wb=0&pr=0.05&xr=*%3A*&yr=0%3A*&psize=840%2C480)
      - including distributionally similar lemmata alleviates sparsity problem... but at the cost of precision!
  - DiaCollo/DTA: "collocations" relation
    - [`Autorität`](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Autorit%C3%A4t&_s=submit&date=&slice=50&score=ld&kbest=10&cutoff=&profile=2&format=cloud&groupby=l&eps=0)
    - [`Autorität`,p=NN,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Autorit%C3%A4t&_s=submit&date=&slice=100&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3DNN&eps=0)
    - [`Autorität`,p=ADJA,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=Autorit%C3%A4t&_s=submit&date=&slice=100&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3DADJA&eps=0)
    - [`/(?i:auc?torit)/` "collocations",p=ADJA,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%2F%28%3Fi%3Aauc%3Ftorit%29%2F&_s=submit&date=&slice=100&score=ld&kbest=50&cutoff=&profile=2&format=cloud&global=1&groupby=l%2Cp%3DADJA&eps=0)
  - DiaCollo/DTA : TDF relation (→ document-wide search window)
    - [`/(?i:auc?torit)/` TDF,p=NN,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%2F%28%3Fi%3Aauc%3Ftorit%29%2F&_s=submit&date=*%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&global=1&groupby=l%2Cp%3DNN&eps=0)
    - [`/(?i:auc?torit)/` TDF,p=ADJA,+global](https://kaskade.dwds.de/dstar/dta/diacollo/?query=%2F%28%3Fi%3Aauc%3Ftorit%29%2F&_s=submit&date=*%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=tdf&format=cloud&global=1&groupby=l%2Cp%3DADJA&eps=0)
  - DiaCollo/DTA : DDC relation
    - [`NEAR(Autorität@50|sem, $p=NN=2, 4)`,+global](https://kaskade.dwds.de/dstar/dta_beta/diacollo/?query=NEAR%28Autorit%C3%A4t%4050%7Csem%2C+%24p%3DNN%3D2%2C+4%29&_s=submit&date=1500%3A1899&slice=50&score=ld&kbest=50&cutoff=&profile=ddc&format=cloud&global=1&groupby=l&eps=0)
      - noun collocations with either `Autorität` itself **or** distributionally similar lemmata (uses SemCloud)

---

# See also

- [Workshop Notes](https://hackmd.io/lxjF_oOFR5-oxGvtbPyS3Q)
- kaskade.dwds.de/~jurish/diacollo/

<!--
Local Variables:
mode: Markdown
coding: utf-8
End:
-->
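
Most of the DiaCollo links in these notes differ only in a handful of query-string parameters (`query`, `date`, `slice`, `kbest`, `profile`, `groupby`, plus `bquery`/`diff` for comparisons). If you want to generate such links yourself rather than clicking them together in the web form, something like the following Python sketch might do. Note that `diacollo_url` is a made-up helper name, not part of DiaCollo, and that the parameter names and defaults are simply copied from the links above; nothing here contacts the server or checks that it accepts the values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address as it appears in the links in these notes.
DSTAR_BASE = "https://kaskade.dwds.de/dstar"


def diacollo_url(corpus, query, *, bquery=None, date="", slice_=50,
                 score="ld", kbest=10, profile="2", diff="adiff",
                 fmt="cloud", groupby="l", global_=False):
    """Hypothetical helper: assemble a DiaCollo web-form link.

    Parameter names mirror the query strings used in these notes
    (assumption: the server treats generated links like hand-made ones).
    """
    params = {
        "query": query,
        "_s": "submit",
        "date": date,
        "slice": slice_,
        "score": score,
        "kbest": kbest,
        "cutoff": "",
        "profile": profile,
        "format": fmt,
        "groupby": groupby,
        "eps": 0,
    }
    if bquery is not None:
        # Comparison links carry a second query and use a "diff-*" profile;
        # this sketch reuses the same date range and slice for both sides.
        params.update({"bquery": bquery, "bdate": date,
                       "bslice": slice_, "diff": diff})
        if not profile.startswith("diff-"):
            params["profile"] = "diff-" + profile
    if global_:
        params["global"] = 1
    # urlencode() percent-encodes braces, commas, umlauts etc. as UTF-8.
    return f"{DSTAR_BASE}/{corpus}/diacollo/?{urlencode(params)}"


# Adjective collocates of the "Wald" lemma-set in the DTA, 50-year slices.
url = diacollo_url("dta", "{Wald,Holz,Gehölz}",
                   date="1600:1899", groupby="l,p=/^ADJ/")
print(url)
# Decode the link again to double-check what was actually encoded.
print(parse_qs(urlsplit(url).query)["groupby"])
```

Passing `bquery="Strauch"` together with `query="Baum"` yields a `diff-2` comparison link of the kind used for "Baum" vs. "Strauch" above. The same caveats about expensive queries and sparse data apply no matter how the link was produced.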