Würzburg "Steckbriefe" 2019-07-06

1. Martina Hoffmann - Andersartigkeit - Homosexualität

  • Topic: "Andersartigkeit" - "Homosexualität"
  • Problem: lexical realization / paraphrases & (historical) code
    • neither GermaNet nor OpenThesaurus are helpful here
      • you could potentially edit OpenThesaurus yourself, but that's probably BTSOTS (="beyond the scope")
  • Remarks
    • basically an onomasiological approach
    • suggest refinement, (semasiological) approximation, or alternate method (e.g. sampling)
  • Ideas
  • Q&A
    • DiaCollo visualization for "external" data (e.g. from multiple queries)
      • not (yet) implemented; requires a bit of JavaScript hacking

2. Džana Fajić - Bedrohung - Terrorismus

  • Topic: "Bedrohung" - "Terrorismus"

  • Remarks:

  • Q&A

    • "große Wortwolken über mehrere Korpora"
      • see slide #79 (~54) "Fiendishly Awkward Questions / Can I use DiaCollo to directly compare different corpora?"
    • "der Kollokationsrahmen muss für Argumentation sehr groß sein. Wie geht das?"
      • I don't understand this question - maybe you should look at:
        • DiaCollo TDF relation (aka "term-document matrix": co-occurrence in a single "document" (=paragraph))
        • DDC/TDF Boolean conjunction operator (&&)
        • DDC #IN p and/or #IN file operator?
        • DDC NEAR(A,B,n) and NEAR(A,B,C,n) limit parameter n (max=32 tokens)
    • Q: "Wie lässt sich DiaCollo für diskurslinguistische Argumentationsanalysen nutzen?"
      A: I don't know what the characteristic (formal and/or observable) properties of a "diskurslinguistische Argumentationsanalyse" are.
    • Q: "Wie kann man formale (z.B. Konnektoren) und inhaltliche (z.B. Lexeme) für Argumentationen erheben?"
      A: I don't think you can "collect" them (as in "Daten erheben") as a matter of general principle
      • lexemes (whether lexical or functional) are abstract linguistic objects, and as such exist in our heads and maybe "in the (Platonic/rationalist) language-itself"
      • lexical realizations (tokens; word use) are items in the corpus; see competence/performance distinction
      • maybe I don't understand what "erheben" is supposed to mean here; some remarks regarding your specific examples:
        • "diffus": maybe try GermaNet synset #4081 ("ungenau"), e.g. this DDC query
        • "Kollektivierung": your best bet is to code this yourself; maybe try OpenThesaurus synsets weltweit and überall: e.g. Bedrohung=1 && {weltweit,überall}|ot-asi
        • "Superlative": DWDS corpora don't offer morphological annotations; your best bet is to use a suffix-list or regex, e.g. $p=ADJA &= /ste?[ernm]$/
        • "Argumentationen":
          • DWDS corpora don't offer semantic annotations
          • you can use the $p (part-of-speech) field to search for STTS tags (but the tagger makes errors too)
          • try e.g. DDC $p={K,PAV}* to find conjunctions & pro-adverbs:
          • or use a set-valued term-query, e.g. {weil,deshalb,deswegen,obwohl,dennoch}
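To get a feel for what the superlative suffix regex suggested above catches and misses, here is a minimal Python sketch. The pattern is copied from the DDC example; the test tokens are made up for illustration and not taken from any corpus. Python's `re` and DDC's regex engine may differ in detail, so treat this as an approximation only.

```python
import re

# Suffix pattern copied from the DDC example above:
# "st", an optional "e", then one of e/r/n/m at the end of the token.
SUPERLATIVE_SUFFIX = re.compile(r"ste?[ernm]$")


def looks_superlative(token: str) -> bool:
    """Return True if the token ends in a superlative-like suffix."""
    return SUPERLATIVE_SUFFIX.search(token) is not None


# Hypothetical test tokens (not corpus data).
tokens = ["kleinste", "schlimmsten", "höchster", "feste", "diffus", "kleinstes", "größte"]
matches = [t for t in tokens if looks_superlative(t)]
print(matches)
# ['kleinste', 'schlimmsten', 'höchster', 'feste']
```

Note the false positive ("feste" is just an inflected form of "fest") and the false negatives ("kleinstes" ends in "-s", "größte" has no "st"): a suffix heuristic like this needs manual checking of the hits.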
  • Example:

    • if I've understood correctly, you might try:
      • use a large co-occurrence window around your target ("Bedrohung") (e.g. DDC #in p -> paragraphs)
        • DiaCollo TDF won't work for the DWDS corpora here > no function words in TDF index!
      • assume presence of a conjunction ($p=KON) indicates an "argumentative" context
      • assume everything else in a matching document is a potential "argumentative" collocate
    • in the ZEIT corpus, this might get us something like this DiaCollo query
      • QUERY: near(Bedrohung=1,$p=KON=1,$p=ADJ*=2,4) #FMIN 2
        • 3-argument "NEAR()", at most 4 tokens between
        • collocant condition "Bedrohung" (lemma)
        • collocant condition $p=KON (part-of-speech)
        • collocate placeholder $p=ADJ*=2 (any adjective)
      • PROFILE: DDC
      • GROUPBY: l
    • you could also search for e.g. noun collocates and selected (causal) connectors, e.g
    • alternatively, you could require "Bedrohung" to precede the connector:
    • Limitations
      • ddc NEAR() operator only supports up to 3 arguments
        • and we need to save 1 argument position for the collocate (=2)
      • this is a VERY expensive query: please use these techniques with caution & consideration!
        • raising the NEAR() limit (above -> 4 tokens) raises complexity exponentially
        • expect timeouts and/or std::bad_alloc errors from stuff like this
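If you end up generating many such NEAR() queries, a small helper can keep you inside the limits mentioned above. This is only a sketch in Python: the function name `near_query` is hypothetical, it just assembles a query string and does not talk to any DDC server. The two limits (at most 3 arguments, distance at most 32 tokens) are the ones stated in these notes.

```python
MAX_NEAR_ARGS = 3       # per the notes: NEAR() supports at most 3 arguments
MAX_NEAR_DISTANCE = 32  # per the notes: limit parameter n has max=32 tokens


def near_query(terms, distance, fmin=None):
    """Assemble a DDC near() query string from 2 or 3 term conditions.

    `terms` are raw DDC term conditions such as "Bedrohung=1" or "$p=ADJ*=2";
    this helper does not validate them.
    """
    if not 2 <= len(terms) <= MAX_NEAR_ARGS:
        raise ValueError("near() needs 2 or 3 term arguments")
    if not 0 <= distance <= MAX_NEAR_DISTANCE:
        raise ValueError("distance must be between 0 and 32 tokens")
    query = "near({},{})".format(",".join(terms), distance)
    if fmin is not None:
        query += " #FMIN {}".format(fmin)
    return query


print(near_query(["Bedrohung=1", "$p=KON=1", "$p=ADJ*=2"], 4, fmin=2))
# near(Bedrohung=1,$p=KON=1,$p=ADJ*=2,4) #FMIN 2
```

Keeping the distance small by default is deliberate, given how expensive these queries get.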

3. Nadine Löhle - Angst (individuell vs. kollektiv)

  • Topic: "Angst" (individuell vs. kollektiv)

  • Ziel: "diachrone Erhebung des Schlagwortes 'Angst' in seinen wortfeldhaften und argumentationsbezogenen Kontexten"

    • ?: what is a "wortfeldhafter" and/or "argumentationsbezogener Kontext"?
    • it sounds to me like this is an onomasiological question - neither I as a CL programmer nor DiaCollo can help with that as such
  • Examples


4. Nadine Kastenhofer - Länderstereotypen - Osteuropa

  • Topic: "Länderstereotypen" - "osteuropäische Länder"

  • Goals:

    • "übergreifende Suchanfragen, die Kollokationen zu Gruppen von Lexemen anhand von abstrahierten Formulierungen zeigen"
    • "weitere Suchanfragen, mit denen man Stereotype finden kann"
  • Ideas:

    • GermaNet
      • s34390 ("örtlich_bestimmter_Mensch")
      • s2245 ("ortsspezifisch")
        • s1299 ("länderspezifisch")
        • s1439 ("osteuropäisch") -> {polnisch, tschechisch, böhmisch, ...}
        • s1591 ("regionalspezifisch")
        • s2359 ("herkunftsspezifisch")
      • s44177 ("Land","Staat") -> {Afghanistan,Amerika,Agrarstaat,...,Polen,...,Zimbabwe}
      • s34416 ("Europäer") -> {Albaner,Brite,Deutscher,...,Zypriotin}
    • PoS : $p=NE will get tokens tagged as proper names
      • results will also include person- and organization-names and probably a fair number of tagger errors too
    • DIY : use an explicit term-disjunction with "interesting" names, e.g.
      • {Böhmen,...,Ungarn}
  • Examples:

    • "alle [Europäer] sind __"
      • very rare > statistical results (DiaCollo &c) will be unreliable
    • "[Europäer] sind __"
    • "obwohl [osteuropäisch]"
      • DDC: "obwohl osteuropäisch|gn-asi" #sep
      • too rare for DiaCollo to be of much help (f_dta=0, f_zeit=7)
    • fixed-window DiaCollo collocates of selected eastern European adjectives

Q&A

  • Q: "Wie kann man Suchabfragen gut zusammenfassen?"

    • A: in DDC e.g. set-valued term queries (e.g. {Haus,Hof,Garten}) or generic "||" (=Boolean disjunction, e.g. ("Haus und Hof" || Garten))
    • A: in DiaCollo "native" syntax with "|" (Haus|Hof|Garten)
    • Qback: what does it mean for you to "collect different queries (well)"?
      • I can take (say) 3 different queries and write them down on 1 piece of paper; then they're "collected", but I suspect that's not what you want.
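If "zusammenfassen" just means bundling a word list into one query, the two notations above are easy to generate. A minimal Python sketch follows; the helper names `ddc_term_set` and `diacollo_alternation` are hypothetical, and the functions only build strings in the two syntaxes mentioned above.

```python
def ddc_term_set(words):
    """Build a DDC set-valued term query, e.g. {Haus,Hof,Garten}."""
    for w in words:
        # Characters that would break the set syntax; multi-word phrases
        # need the generic disjunction instead.
        if any(c in w for c in "{},| "):
            raise ValueError("unsupported character in term: " + repr(w))
    return "{" + ",".join(words) + "}"


def diacollo_alternation(words):
    """Build a DiaCollo 'native' alternation, e.g. Haus|Hof|Garten."""
    for w in words:
        if any(c in w for c in "{},| "):
            raise ValueError("unsupported character in term: " + repr(w))
    return "|".join(words)


words = ["Haus", "Hof", "Garten"]
print(ddc_term_set(words))          # {Haus,Hof,Garten}
print(diacollo_alternation(words))  # Haus|Hof|Garten
```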
  • Q: "Wie kann man nach übergreifenden Stereotypen suchen?"
    A: I don't know; stereotypes are (imho) often couched in innocuous-seeming syntax; bald assertions such as "all X are Y" are in fact very rare

  • Q: "Was bringt es, wenn man z.B. GermaNet Ländernamen + Verbform 'sein' sucht?"

    • A1: "GermaNet Ländername" (s65176) is a single synset with no hyponyms. A DDC search-term (Ländername|gn-asi) will find all tokens with literal lemma "Ländername" or "Landesname". Probably not what you want; you're probably more interested in s44177 ("Land","Staat") which actually has a bunch of GermaNet hyponyms.
    • A2: It doesn't get anything for me (=Bryan) personally. What it gets you depends on (a) what you want, (b) the specific query you pose ("+ Verbform 'sein'" is not valid DDC syntax; maybe you mean (ist) i.e. ($l=ist) i.e. ($l=ist|Lemma) i.e. ($l=@{sein})). If you want to restrict the hits for 'sein' to verb forms, use a WITH or &= query: (sein &= $p=V*). In this case, querying e.g. (Ländername|gn-asi && sein) would "get" you just what you asked for: all sentences containing at least one instance of one of the lemmata associated with GermaNet synset s65176 (="Ländername", "Landesname") and at least one instance associated with the lemma "sein" tagged as a (finite or infinitive, auxiliary or lexical) verb.
  • Q: "Kann man übergreifend nach Länderbezeichnung + Weiterem suchen lassen und das Ergebnis in einer Wolke darstellen lassen"?

    • A: probably yes, within limits
    • AQ:
      • what exactly does "Länderbezeichnung" mean here?
        • all instances of any country name? > try (s44177|gn-asi)
          • requires "DDC" profile-type if used from DiaCollo
        • all instances of a specific (set of) country names? > try {Polen,Böhmen,Tschechien,...}
      • what exactly do "übergreifend", "+ Weiterem" and "das Ergebnis" mean here?
        • trawl near neighbors for common collocates? > try a simple DiaCollo query like {Polen,Ungarn,Böhmen,Schlesien}
        • trawl whole paragraphs for common collocates? > try a DiaCollo TDF query like {Polen,Ungarn,Böhmen,Schlesien}
        • specific observable constructs within a single sentence or paragraph, with "placeholders" for potential collocates?
          • use a DiaCollo DDC query with a placeholder subscript (=2) for the collocate positions you're interested in
          • beware sparse data!

5. Désirée Schneider - Raumkonstruktion - Wald

  • Topic: "sprachlicher Raumkonstruktion" - "Wald"

  • Goal:

    • "Diachrone Erhebung von Kollokationsfeldern zum Wortfeld ('Frame') WALD"
    • "Analyse diachron unterschiedlicher Konzepte anhand der erhobenen Wortfelder"
  • Remarks: rejoice and be glad

    • target term "Wald" is quite frequent
      • we can expect reliable results with comparatively narrow epochs ("slices")
    • target phenomenon is easily identifiable > we can use fairly simple queries
  • Ideas:

    • GermaNet
      • s43301 ("Wald","Waldgebiet") -> {Auenwald,...,Hain,...,Wäldchen}
      • s46042 ("Baum") -> {Acajubaum,...,Zimtbaum}
  • Examples

  • Q&A

    • Q: "Wie kann man unterschiedliche Kollokationsfelder zu WALD vergleichend diachron darstellen"?
      A: I don't think I understand the question.
      • AQ: What are "unterschiedliche Kollokationsfelder zu WALD"?
        • what is a "Kollokationsfeld"? (-> maybe the set of k-strongest collocates (per epoch) returned by a DiaCollo query?)
        • how do they differ, if they're all "Felder zu WALD"?
          • if you want to compare different WALD-lemmata (e.g. "Forst" vs. "Gehölz"), you can use a DiaCollo diff: relation, but beware that most of these lexemes are themselves very infrequent > unreliable data
          • example ("Baum" vs. "Strauch")

6. Linnéa Behncke - (fleischlose) Ernährung


7. Julia Prez - Sprachliche Gewissheiten - BALKAN

  • Topic: "Sprachliche Gewissheiten" - "Balkan"

    • "Es geht dabei um Grenzen und um Zuschreibungen und Wertungen"
  • Zuschreibungen

    • search for "Balkan" with groupby:l,p=/^VV/ > verb collocates
    • manual ("close") inspection > "Gewissheiten" or not?
      • yup, sounds reasonable to me (Bryan)
  • Hyponyme:

    • diff (diff:min) "LÄNDERNAME" vs "Balkan"
      • hint: consider also diff:havg - see below
    • Purpose: establish membership in the discourse; delimitation/assignment country<->Balkan
  • Synonyme

    • "Balkaninsel", "Balkanhalbinsel", "Balkan-Halbinsel"
    • "wenn man jedoch nach diesen sucht [wo/wie? -Bryan], tauchen weit weniger Treffer auf, als die KWIC-Analyse suggeriert"
      • AQ: where & how are you "searching", and what do you mean by "die KWIC-Analyse"?
      • I suspect you may be noticing the inconsistency between DiaCollo's "native" indices and the DDC search approximations DiaCollo offers as hyperlinks labelled "KWIC" in the DiaCollo web interface
        • try raising the search window in the (generated) DDC search approximations,
          e.g. NEAR(Balkan, Serbien, 4) -> NEAR(Balkan, Serbien, 8)
      • see also slide #59 (~84): "Why don't the corpus KWIC links always return exactly f_{12} hits?"
      • if you want/need exact results, use the DDC relation with the #FMIN 1 operator
  • Grenzen

  • Q: "Gemeinsamkeiten in der Kollokationsverhalten von 2 unabhängigen Abfragen diff:min oder diff:avg"
    A: also consider "diff:havg" - less strict than "min" but more sensitive to non-uniformity than "avg"
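To see why a harmonic-style average sits between "min" and "avg", here is a small Python sketch using the textbook harmonic mean on positive scores. This is an assumption for illustration: it is not taken from the DiaCollo source, and DiaCollo's own "havg" may be defined differently in detail (e.g. for zero or negative scores). The score pairs are invented.

```python
def arithmetic_mean(a: float, b: float) -> float:
    return (a + b) / 2


def harmonic_mean(a: float, b: float) -> float:
    """Textbook harmonic mean; only meaningful here for positive a and b."""
    return 2 * a * b / (a + b)


# Hypothetical collocate scores from two independent queries.
# All three pairs share the same arithmetic mean (4.0) but differ in uniformity.
pairs = [(4.0, 4.0), (6.0, 2.0), (7.5, 0.5)]

for a, b in pairs:
    print(f"a={a} b={b} min={min(a, b)} havg={harmonic_mean(a, b)} avg={arithmetic_mean(a, b)}")
```

The arithmetic mean cannot tell the three pairs apart, "min" only looks at the weaker score, and the harmonic mean drops as the two scores diverge while staying at or above the minimum.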

Goals & Questions:

  • Q: "Wolken/Listen, die diachron zeigen, wie BALKAN (bzw. JUGOSLAWIEN, ALBANIEN usw.) sprachlich konstruiert wird (als Ausgangspunkt für die Analyse unterschiedlicher Konzepte (Gebiet, Gefahr, Krise, ...))"
    A: looks to me like you're well on your way to that already
  • Q: "Gleichzeitige Suche nach mehreren Wörtern, zu denen gemeinsam Kollokationen vorkommen"
    A: use TDF or DDC back-ends (see above)
  • Q: "Möglichst viele Ergebnisse (keine Beschränkungen)"
    • AQ: what kind of "limitations" do you mean?
      • have you looked at the kbest, cutoff, and/or global parameters?
        • I suspect that you don't really want all potential collocates for every search term
        • if you really do want all potential collocates, you might consider sidestepping the DiaCollo layer and going straight for a DDC count()-query, e.g. count(Balkan && $p=ADJ*=2) #by[$l=2] #desc_count
      • have you looked at the #FMIN operator for the DDC profiling relation?
      • have you considered indexing your own corpus with the -use-all-the-data option?
      • for multiple corpora, have you considered using a list:// URL for the command-line dcdb-query.perl tool?
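For intuition, the count()-query mentioned above can be thought of as "group the adjective lemmata in matching sentences and sort by frequency". The following Python sketch simulates that on a tiny hand-made sample. The data, the function name `count_adjective_lemmas`, and the (lemma, tag) representation are all hypothetical, and DDC's actual hit counting may differ in detail.

```python
from collections import Counter

# Hypothetical toy "corpus": each sentence is a list of (lemma, STTS tag) pairs.
sentences = [
    [("der", "ART"), ("unruhig", "ADJA"), ("Balkan", "NE")],
    [("Balkan", "NE"), ("bleiben", "VVFIN"), ("unruhig", "ADJD")],
    [("der", "ART"), ("westlich", "ADJA"), ("Balkan", "NE")],
    [("der", "ART"), ("alt", "ADJA"), ("Wald", "NN")],
]


def count_adjective_lemmas(sentences, target="Balkan"):
    """Count adjective lemmata in sentences containing the target lemma,
    most frequent first (roughly: group by lemma, sort by descending count)."""
    counts = Counter()
    for sentence in sentences:
        if any(lemma == target for lemma, _tag in sentence):
            counts.update(lemma for lemma, tag in sentence if tag.startswith("ADJ"))
    return counts.most_common()


print(count_adjective_lemmas(sentences))
# [('unruhig', 2), ('westlich', 1)]
```

Unlike a DiaCollo profile, nothing here is scored or pruned: you get raw frequencies for every candidate, which is exactly why such lists get long quickly.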

8. Bremen - Autorität

  • Topic: "Autorität" (diachron)

  • Example:

    "Man glaubts nit/ wie viel ein gelehrter Mann an seiner Authoritet verlust hat/ wenn er nicht von höfflichen zierlichen Sitten ist" Otto, Melander: 1605

    • "Autorität" ⇒ Herrschertugenden; sittliches Verhalten ↝ Legitimierung
    • "... untersuchen, bis wann dieses Framing in Diskursen noch wirkmächtig war und wann es an Relevanz verlor."
  • Goals:

    • "den Autoritätsbegriff in seinem diachronen Wandel zu beschreiben und zentrale Wendepunkte seiner Denotation und Konnotation zu erfassen."
      • Remark: close reading required, esp. for connotation
    • "... besseres Verständnis für zeitgenössische Diskurse über Autorität zu erlangen, insbesondere 'soften Autoritarismus'"
      • Remark: sounds reasonable, but too far removed from observable (corpus) phenomena for me to be able to offer any practical suggestions, sorry.
    • "Vom Workshop erhoffen wir uns die Möglichkeit, unser methodisches Wissen für sprachwissenschaftliches Arbeiten zu erweitern, um dieses effizient auf unser Projekt anwenden zu können."
      • Remark: thanks, I hope so too!
  • Thoughts, Ideas, & Speculations (<Bryan)


See also