# Information Science in Databases

###### tags: `Information Science` `Database` `System Design` `Software`

The study of how information is created, organized, managed, stored, retrieved, and used.

## Elasticsearch
*Based on [Apache Lucene](https://lucene.apache.org/) (Open-Source Search Software)*

An application for full-text search that can be deployed in distributed databases.

![](https://hackmd.io/_uploads/BJGPFHLK1e.png =400x)

### Inverted Index
> ![image](https://hackmd.io/_uploads/Sy1cVVpFJl.png =400x)
> ![](https://hackmd.io/_uploads/ryqHOBIYJx.png =400x)
>
> (Source: [Everything You NEED to KNOW About Web Applications - ByteByteGo](https://www.youtube.com/watch?v=_higfXfhjdo))

### tf-idf (term frequency–inverse document frequency)
*The Core Concept of Elasticsearch*

:::info
**Application in the vector space model**
The inner product of tf-idf weight vectors is used to judge how similar two documents are in the vector space.

Assume:
(1) In a file, words matched `"AAA"`: $3/100\text{ words}$
(2) In a DB, files found with `"AAA"`: $1{,}000/100{,}000\text{ files}$

=> ==$\text{tf-idf} = (\frac{3}{100}) \times \log_{10}(\frac{100{,}000}{1{,}000}) = 0.03 \times 2 = 0.06$==
:::

## Neo4j
*A Native Property Graph NoSQL DBMS for Storing Connection Relationships among Nodes*

Use cases: search engines, recommendation systems, social media platforms, data analysis, knowledge graphs for machine learning, ...

![image](https://hackmd.io/_uploads/SyKJVE6YJl.png =400x)

[Neo4j in 100 Seconds - Fireship](https://www.youtube.com/watch?v=T6L9EoBy8Zk)

### Property Graph Model
> ![image](https://hackmd.io/_uploads/B1CWj46t1l.png =400x)
>
> (Source: [Introduction to Neo4j - Aakash Sorathiya](https://medium.com/techpanel/introduction-to-neo4j-84bd1ccfb37e))

- **Nodes:**
    - **Labels:** can represent roles, classes, metadata, ...
    - **Properties:** store key-value pair data.
- **Relationships (Edges):** semantically relevant connections that have direction (from-to), type, name, properties, ...
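
To make the pieces of the property graph model concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's API: the class names (`Node`, `Relationship`, `PropertyGraph`), the method names, and the sample data (`Alice`, `Bob`, `FOLLOWS`, `LIKES`) are all hypothetical and chosen only for illustration. It shows nodes carrying labels and key-value properties, and relationships carrying a direction, a type, and their own properties.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A node: labels classify it, properties hold key-value data."""
    node_id: int
    labels: set[str] = field(default_factory=set)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge from `start` to `end` with its own properties."""
    start: int
    end: int
    rel_type: str
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory container, for illustration only."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, node_id: int, labels: list[str], **properties: Any) -> Node:
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def add_relationship(
        self, start: int, end: int, rel_type: str, **properties: Any
    ) -> Relationship:
        # A relationship always connects two existing nodes.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before connecting them")
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbors(self, node_id: int, rel_type: str) -> list[Node]:
        """Follow outgoing relationships of one type from `node_id`."""
        return [
            self.nodes[rel.end]
            for rel in self.relationships
            if rel.start == node_id and rel.rel_type == rel_type
        ]


# Sample data (assumed, not from the source notes).
graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Alice")
graph.add_node(2, ["Person"], name="Bob")
graph.add_node(3, ["Movie"], title="Example Movie")
graph.add_relationship(1, 2, "FOLLOWS", since=2020)
graph.add_relationship(1, 3, "LIKES", rating=5)

followed = [node.properties["name"] for node in graph.neighbors(1, "FOLLOWS")]
print(followed)  # ['Bob']
```

Because relationships are directed, `graph.neighbors(2, "FOLLOWS")` returns an empty list here: the `FOLLOWS` edge runs from node 1 to node 2, not the other way around.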