# Document processing keywords
# Chapter 1: Document models
- XML
- DOM (Document object model)
- Schema languages
- DTD (Document Type Definition)
- XML Schema Definition (XSD)
- Relax NG
- Formal document models
- Regular tree grammar
# Chapter 2: Transformation and Search
- Document Search vs Information Retrieval
- Text search algorithms
- Regexp (note: extra notation for lookaround)
- XPath
- //: shorthand for "/descendant-or-self::node()/", i.e. the context node itself or any node below it
- Transformations
- HTML transforms (tag balance, css), LaTeX compilation
- XSLT (eXtensible Stylesheet Language Transformations)
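
The `//` shorthand from the XPath keyword above can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset (it accepts `.//tag` but not the full `descendant-or-self::` axis syntax). The sample document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<library>"
    "<title>Catalogue</title>"
    "<shelf><book><title>TeX</title></book></shelf>"
    "</library>"
)

# "./title" selects only direct children of the context node.
direct = [t.text for t in doc.findall("./title")]

# ".//title" selects title elements at any depth below the context node.
anywhere = [t.text for t in doc.findall(".//title")]

print(direct)
print(anywhere)
```

Here `direct` holds only the top-level title, while `anywhere` also picks up the title nested inside `shelf/book`, in document order.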
# Chapter 3: Digital typography
- Typeface, Font, Glyph, Ligatures
- Font taxonomy
- Legibility
- Kerning, spacing
- Line breaking
- TeX algorithm: box, glue, penalty, stretchability, shrinkability
# Chapter 4: Page Description languages
- Vector image vs raster image
- PostScript, postfix notation
- PDF (Portable Document Format)
- Transparency, Blending
- SVG (Scalable Vector Graphics)
- aspect ratio, grouping, coordinate transformations, paths, clipping
# Chapter 5: Information retrieval
- Boolean retrieval
- Term-Document Incidence Matrix (not scalable)
- Inverted index
- Boolean Queries
- Inverted index construction
- Tokenization, stop words, token normalization, index compression
- Advanced index techniques
- Phrase queries
- Wildcard queries
- Vector space model
- Ranked retrieval
- Score computation
- Term Frequency, Inverse Document Frequency, TF-IDF weighting
- Web search
- Query characteristics
- User expectations
- Context
- Dynamic content
- Crawlers
- Link Analysis
- The web as a directed graph
- Hyperlinks as a quality signal
- Random walks vs PageRank
- Recall, (pseudo) relevance feedback
- Rocchio, Thesaurus
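
To connect the inverted index, Boolean query, and TF-IDF keywords above, here is a minimal sketch. The three-document collection, the function names, and the whitespace tokenizer are all assumptions made for illustration. The weighting `(1 + log10(tf)) * log10(N / df)` is one common TF-IDF variant, not necessarily the one used in the course.

```python
import math
from collections import defaultdict

# Hypothetical toy collection, invented for this illustration.
docs = {
    1: "xml schema validation",
    2: "xml transformation with xslt",
    3: "font kerning and ligatures",
}

def build_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in text.lower().split():  # naive whitespace tokenization
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index

def boolean_and(index, terms):
    """Return the sorted ids of documents that contain every query term."""
    postings = [set(index.get(t, {})) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

def tf_idf(index, n_docs, term, doc_id):
    """Log-scaled term frequency times inverse document frequency."""
    postings = index.get(term, {})
    tf = postings.get(doc_id, 0)
    if tf == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / len(postings))

index = build_index(docs)
print(boolean_and(index, ["xml", "xslt"]))
print(tf_idf(index, len(docs), "xslt", 2))
```

A term that occurs in fewer documents gets a higher weight: in this collection "xslt" (one document) scores higher than "xml" (two documents) for document 2.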