Würzburg Workshop 2019-07-06
General Questions
General Remarks
- ¬∃x( size(x) & ∀y fit(x,y) )
- `#RANDOM #LIMIT[100] #IN p` for a sample of 100 paragraphs
- `#RANDOM #LIMIT[100] #CNTXT 1` for a sample of 100 sentences with 1 sentence of context (right+left)
- accidents will happen
- sparse data, sample size, & statistical reliability
- `slice` parameter, or setting it to `0` (zero) for a corpus-global synchronic profile
- DDC or LexDB if you're concerned
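
The sampling modifiers listed above are plain suffixes on a DDC query string, so they are easy to generate. Below is a minimal sketch; the helper name `sample_query` and its parameters are hypothetical, and only the `#RANDOM`, `#LIMIT[...]`, `#IN` and `#CNTXT` strings come from these notes.

```python
def sample_query(query, limit=100, within=None, context=None):
    """Append random-sampling modifiers to a DDC query string.

    `within` is a break name such as "p" (paragraph); `context` is the
    number of context sentences. Both are optional.
    """
    parts = [query, "#RANDOM", "#LIMIT[%d]" % limit]
    if within is not None:
        parts.append("#IN %s" % within)
    if context is not None:
        parts.append("#CNTXT %d" % context)
    return " ".join(parts)


# A sample of 100 paragraphs, and a sample of 100 sentences with context.
paragraph_sample = sample_query("Haus", within="p")
sentence_sample = sample_query("Haus", context=1)
print(paragraph_sample)
print(sentence_sample)
```

The resulting strings can be pasted into any DDC query field; the example query `Haus` is only a placeholder.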
DWDS corpus annotations
what we've got
- https://kaskade.dwds.de/dstar/CORPUS/details.perl (`dta`, `zeit`), collection description
- `#IN file`
- `#IN p`
- `#IN s` (default)
- `$Token`, `$w`
- `$Utf8`, `$u` (and `$CanonicalToken`, `$v` as well)
- `$Lemma`, `$l`: default for bareword queries
- `$Pos`, `$p` - typically STTS
- `Hause -> $Lemma=Hause|Lemma -> $Lemma=@{Haus}`
- `Hause|eqlemma -> $Lemma=Hause|eqlemma -> $Lemma=@{Haus,Hause,Häuser,Häusern,Hauses}`
- `Haus|gn-syn -> $Lemma=Haus|gn-syn -> $Lemma=@{Sternzeichen,...,Dynastie,...,Haus}`
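
The rewriting chains above (bareword, then attribute plus expander, then a literal value set) can be mimicked with a lookup table. This is a toy sketch only: the `EXPANSIONS` table and the `expand` function are hypothetical stand-ins for the server-side expanders, filled with the two fully spelled-out examples from these notes.

```python
# Toy expansion table: (term, expander) -> list of attribute values.
# The real expanders consult morphological and thesaurus resources.
EXPANSIONS = {
    ("Hause", "Lemma"): ["Haus"],
    ("Hause", "eqlemma"): ["Haus", "Hause", "Häuser", "Häusern", "Hauses"],
}


def expand(term, expander="Lemma", attribute="$Lemma"):
    """Rewrite TERM|EXPANDER into an explicit value-set query."""
    # Unknown terms fall back to themselves (an assumption of this sketch).
    values = EXPANSIONS.get((term, expander), [term])
    return attribute + "=@{" + ",".join(values) + "}"


print(expand("Hause"))
print(expand("Hause", "eqlemma"))
```

Bareword queries default to `$Lemma` with the `Lemma` expander, which is why `expand("Hause")` needs no further arguments here.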
what we do not have
- (`collocations`, `term-document matrix`) do not include:
  - `$w`, `$u`, `$v`
  - `ART`, `APPR`, `KON`, …
semasiology vs. onomasiology
- DiaCollo & other corpus-linguistic software tools are basically semasiological (= word-primary) methods
- onomasiological tools (= concept-primary) are worth pursuing… but don't hold your breath (im(ns)ho, it ain't going to happen any time soon)
Other (potentially useful) tools
- DWDS Wortprofil
  - `has-adjective-attribute`, `has-dative-object`, etc.
- DDC
  - `(A B)` = `("A B")`: phrase query
  - `(A B)` = `(A && B)`: Boolean conjunction (within a single sentence) - computationally expensive and probably not what you want
  - `#WITHIN`, `=1`, `=2`
- dstar/hist
- dstar/LexDB
- dta SemCloud
Thesauri
Tips & Caveats
- `Haus|gn-syn -> {Sternzeichen,...,Dynastie,...,Haus}`
- `Öffentliches_Recht` will find no hits
- `"Öffentliches Recht"`
- but: `{...}` (only disjunctions of atomic values for the given token attribute)
- `haben|gn-sub1 -> ({aufbewahren,...,zusammenhaben} || "übrig haben")`
GermaNet
OpenThesaurus
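
The multi-word caveat under Tips & Caveats suggests a simple rewriting rule: atomic thesaurus terms can share one `{...}` set, while multi-word terms have to become separate phrase queries joined with `||`. Below is a minimal sketch of that rule; the function name `thesaurus_query` is hypothetical, and treating both underscores and spaces as multi-word markers is an assumption based on the `Öffentliches_Recht` example.

```python
def thesaurus_query(terms):
    """Build a DDC-style disjunction from a list of thesaurus terms."""
    if not terms:
        raise ValueError("need at least one term")

    def is_multiword(term):
        return " " in term or "_" in term

    atomic = [t for t in terms if not is_multiword(t)]
    phrases = [t.replace("_", " ") for t in terms if is_multiword(t)]

    clauses = []
    if atomic:
        # Value sets may only contain atomic token-attribute values.
        clauses.append("{" + ",".join(atomic) + "}")
    # Each multi-word term becomes its own quoted phrase query.
    clauses.extend('"' + p + '"' for p in phrases)

    if len(clauses) == 1:
        return clauses[0]
    return "(" + " || ".join(clauses) + ")"


print(thesaurus_query(["aufbewahren", "zusammenhaben", "übrig haben"]))
print(thesaurus_query(["Öffentliches_Recht"]))
```

The first call mirrors the shape of the `haben|gn-sub1` example, with the elided set members left out.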
DiaCollo
DiaCollo Tips
- `GROUPBY:l` to aggregate candidate collocates by lemma only (disregarding PoS)
- `{Polen,Ungarn,Böhmen,Schlesien}`
- `$p` (PoS) and phrase-queries to approximate syntactic constraints
  - `"Angst vor #2 $p=NN=2" #fmin 1`
  - `"Bedrohung #4 {weil,deshalb,deswegen,denn} #4 $p=NN=2" #fmin 1`
- `$p` (PoS) and `NEAR()` queries for finer-grained proximity queries
- `Balkan && {westlich,umfassen,einschließen,annektieren}`
- `NEAR(Wald|gn-asi, $p=ADJ*=2, 4) #fmin 2`
- `{...}`, e.g. `{unmännlich,Unzucht,widernatürlich}`
- `{Fleisch,Fisch}` vs. `{Tofu,Soja}`
- `|gn-asi`, `|ot-asi` with an appropriate synset
  - `NEAR(Balkan, s44177|gn-asi=2, 4)`
- `|sem`
  - `NEAR(Autorität@50|sem, $p=NN=2, 4)`
- `+global`
See also