# Information Science in Databases
###### tags: `Information Science` `Database` `System Design` `Software`
Information science studies how information is created, organized, managed, stored, retrieved, and used.
## Elasticsearch
*Based on [Apache Lucene](https://lucene.apache.org/) (Open-Source Search Software)*
A search engine for full-text search that can be deployed across distributed systems.

### Inverted Index
> (Source: [Everything You NEED to KNOW About Web Applications - ByteByteGo](https://www.youtube.com/watch?v=_higfXfhjdo))
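
An inverted index maps each term to the list of documents that contain it, so a query can look up matching documents by term instead of scanning every document. The following is a minimal Python sketch of the idea, not how Lucene or Elasticsearch implement it; the function names `build_inverted_index` and `search`, the simple lowercase tokenizer, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (a simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every term in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "A quick red fox jumps",
}
index = build_inverted_index(docs)
```

With this sketch, `search(index, "quick fox")` intersects the posting lists for `quick` and `fox`, which touches only the documents listed under those two terms.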
### tf-idf (term frequency–inverse document frequency)
*The Core Concept of Elasticsearch*
:::info
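**Application in the vector space model**
The inner product of tf-idf weight vectors is used to judge the similarity between two files in the vector space.
Suppose:
(1) In a file, words matching `"AAA"`: $3/100\text{ words}$
(2) In a DB, files found with `"AAA"`: $1{,}000/100{,}000\text{ files}$
=> ==$\text{tf-idf} = \left(\frac{3}{100}\right) \times \log_{10}\left(\frac{100{,}000}{1{,}000}\right) = 0.03 \times 2 = 0.06$==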
:::
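
The arithmetic above can be reproduced with a short Python sketch. It assumes the same plain definitions used in the example (term frequency as a ratio, inverse document frequency as a base-10 logarithm); the function names `tf`, `idf`, `tf_idf`, and `dot_similarity` are hypothetical, and production engines typically apply smoothing and normalization that this sketch leaves out.

```python
import math


def tf(term_count, total_terms):
    """Term frequency: share of the words in one file that match the term."""
    return term_count / total_terms


def idf(total_docs, docs_with_term):
    """Inverse document frequency: base-10 log of (all files / files containing the term)."""
    return math.log10(total_docs / docs_with_term)


def tf_idf(term_count, total_terms, total_docs, docs_with_term):
    """tf-idf weight of one term in one file."""
    return tf(term_count, total_terms) * idf(total_docs, docs_with_term)


def dot_similarity(weights_a, weights_b):
    """Inner product of two tf-idf weight vectors stored as {term: weight} dicts."""
    shared_terms = weights_a.keys() & weights_b.keys()
    return sum(weights_a[t] * weights_b[t] for t in shared_terms)


# The numbers from the example: 3 matches in 100 words, 1,000 of 100,000 files.
weight = tf_idf(3, 100, 100_000, 1_000)
```

Here `weight` comes out at approximately `0.06`, matching the hand calculation. `dot_similarity` only sums over terms present in both files, because a term missing from either vector contributes zero to the inner product.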
## Neo4j
*A Native Property Graph NoSQL DBMS for Storing Connection Relationships among Nodes*
Use cases: search engines, recommendation systems, social media platforms, data analysis, knowledge graphs for machine learning, ...

[Neo4j in 100 Seconds - Fireship](https://www.youtube.com/watch?v=T6L9EoBy8Zk)
### Property Graph Model
> (Source: [Introduction to Neo4j - Aakash Sorathiya](https://medium.com/techpanel/introduction-to-neo4j-84bd1ccfb37e))
- **Nodes:** the entities in the graph; a node can carry labels and properties.
- **Labels:** can represent roles, classes, metadata, ...
- **Properties:** store key-value pair data.
- **Relationships (Edges):** semantically relevant connections that have a direction (from-to), type, name, properties, ...
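
The elements listed above can be sketched as plain Python data classes. This is only an illustrative in-memory model, not Neo4j's storage format or driver API; the class names `Node` and `Relationship`, the helper `outgoing`, and the sample data (`Alice`, `Bob`, `KNOWS`, `WATCHED`) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph entity with labels (roles/classes) and key-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection from `start` to `end` with its own properties."""
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


def outgoing(node, relationships, rel_type=None):
    """Return the nodes reached by following relationships that start at `node`."""
    return [
        rel.end
        for rel in relationships
        if rel.start is node and (rel_type is None or rel.type == rel_type)
    ]


# Hypothetical sample graph: two people and one movie.
alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
movie = Node({"Movie"}, {"title": "Example Film"})

relationships = [
    Relationship(alice, bob, "KNOWS", {"since": 2020}),
    Relationship(alice, movie, "WATCHED", {"rating": 5}),
]
```

Because relationships are directed, `outgoing(alice, relationships, "KNOWS")` reaches `bob`, while `outgoing(bob, relationships)` is empty in this sample.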