# Document Processing keywords

# Chapter 1: Document models
- XML
  - DOM (Document Object Model)
- Schema languages
  - DTD (Document Type Definition)
  - XML Schema Definition (XSD)
  - RELAX NG
- Formal document models
  - Regular tree grammar

# Chapter 2: Transformation and Search
- Document Search vs Information Retrieval
- Text search algorithms
  - Regular expressions (note: extra notation for lookaround)
- XPath
  - `//`: short notation for `/descendant-or-self::node()/`, i.e. the node itself and all nodes below it
- Transformations
  - HTML transforms (tag balance, CSS), LaTeX compilation
  - XSLT (eXtensible Stylesheet Language Transformations)

# Chapter 3: Digital typography
- Typeface, Font, Glyph, Ligatures
- Font taxonomy
- Legibility
- Kerning, spacing
- Line breaking
  - TeX algorithm: box, glue, penalty, stretchability, shrinkability

# Chapter 4: Page Description languages
- Vector image vs raster image
- PostScript, postfix notation
- PDF (Portable Document Format)
  - Transparency, Blending
- SVG (Scalable Vector Graphics)
  - aspect ratio, grouping, coordinate transformations, paths, clipping

# Chapter 5: Information retrieval
- Boolean retrieval
  - Term-Document Incidence Matrix (not scalable)
  - Inverted index
  - Boolean Queries
- Inverted index construction
  - Tokenization, stop words, token normalization, index compression
- Advanced index techniques
  - Phrase queries
  - Wildcard queries
- Vector space model
  - Ranked retrieval
  - Score computation
  - Term Frequency, Inverse Document Frequency, TF-IDF weighting
- Web search
  - Query characteristics
  - User expectations
  - Context
  - Dynamic content
  - Crawlers
- Link Analysis
  - Web is Directed Graph
  - Hyperlink is quality signal
  - Random walks vs PageRank
- Recall, (pseudo) relevance feedback
  - Rocchio, Thesaurus
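
As a quick illustration of the TF-IDF weighting keyword from Chapter 5, here is a minimal sketch. It assumes one common variant (raw term count as tf, `log10(N / df)` as idf) and naive whitespace tokenization; the function name `tf_idf` and the sample documents are made up for this example, and the course material may use a different weighting scheme.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: tf is the raw count and idf is log10(N / df).
    Other variants (log-scaled tf, smoothed idf) are also common.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical mini-corpus, for illustration only.
docs = ["xml schema xml", "xml search", "page rank search"]
weights = tf_idf(docs)
```

With this variant, a term that occurs in every document gets weight 0, while a rare term such as `schema` outweighs the more frequent but more widespread `xml`.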