# Elasticsearch and Kibana
## About Elasticsearch
**Elasticsearch** is an open source **search and analytics engine** for all types of data (textual, numerical, geospatial, structured and unstructured).
It is part of the ELK stack, which consists of:
* Elasticsearch
* Logstash
* Kibana
Elasticsearch can be used for a number of cases, including:
* Application search
* Website search
* Enterprise search
* Logging and log analytics
* Infrastructure metrics and container monitoring
* Application performance monitoring
* Geospatial data analysis and visualization
* Security analytics
* Business analytics
* Machine Learning
It works through the process of ingesting raw data, which is then parsed, normalized and enriched before being indexed in Elasticsearch. Once indexed, queries can be run against the data to retrieve complex summaries of the data.
### Topics
#### Nodes
Each instance of Elasticsearch is a Node, and a collection of nodes is called a Cluster. Every node can handle both HTTP and transport layer (layer 4 in the OSI model) traffic. HTTP traffic is used by REST clients, while the transport layer is used exclusively for node-to-node communication.
There are 5 node roles:
* **master-eligible:** Can vote in the election of the Master Node, which controls the cluster. See [Discovery and cluster formation](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html).
* **data:** Holds data and performs data-related operations (e.g. CRUD).
* **ingest:** Can apply ingest pipelines to a document to transform and enrich the document before indexing.
* **ml (machine learning):** To be able to use machine learning features, at least one node needs to be of this type
* **remote-eligible node:** Is eligible to act as a cross-cluster client
By default, a node has all of the roles above, but it is advised to separate nodes by functionality, especially as the cluster grows. Using these roles, we can configure different kinds of nodes:
* **Voting-only master-eligible node:** Master-eligible actually means that the node participates in the voting, not that it can become master. Adding "voting_only" to a master-eligible node makes it this type.
* **Coordinating-only node:** If we strip all roles from a node, we are left with a coordinating-only node that can only route requests, handle the search reduce phase, and distribute bulk indexing. These nodes act like smart load balancers and can benefit large clusters by offloading coordinating tasks from other nodes.
* **Transform node:** Transform nodes run transforms and handle transform API requests.
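As a sketch, node roles can be restricted in elasticsearch.yml (the `node.roles` list syntax applies to recent Elasticsearch versions; older versions use boolean settings such as `node.master`):

```yaml
# elasticsearch.yml — a dedicated master-eligible node
node.roles: [ master ]

# a voting-only master-eligible node
# node.roles: [ master, voting_only ]

# a coordinating-only node has an empty role list
# node.roles: [ ]
```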
##### How to add a Node to an existing Cluster
1. Set up a new Elasticsearch instance.
2. Specify the name of the cluster with the *cluster.name* setting in elasticsearch.yml. For example, to add a node to the logging-prod cluster, add the line `cluster.name: "logging-prod"` to elasticsearch.yml.
3. Start Elasticsearch. The node automatically discovers and joins the specified cluster.
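For example, step 2 amounts to a one-line change in elasticsearch.yml (the seed host addresses below are placeholders for existing cluster members):

```yaml
# elasticsearch.yml on the new node
cluster.name: "logging-prod"
# addresses of existing nodes used for discovery (placeholders)
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2"]
```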
#### Cluster
A cluster is a collection of interconnected nodes. If you run only one Elasticsearch node, you have a cluster with a single node.
One feature that clusters offer is **cross-cluster search** against one or more [remote clusters](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-remote-clusters.html).
Cross-cluster examples/use cases:
* Remote cluster setup
* Single remote cluster
* Multiple remote cluster
* Cross-cluster search in proxy mode
* Skip unavailable clusters
When you have multiple remote clusters, for example in Portugal, Japan and Mexico, and a user runs a search, **Elasticsearch** handles network delays with one of two methods:
* Minimize network roundtrips
* Don't minimize network roundtrips
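As a sketch (the cluster alias, host and index names below are placeholders), a remote cluster is registered with the cluster settings API and then queried with the `<cluster-alias>:<index>` syntax; the `ccs_minimize_roundtrips` query parameter toggles between the two methods:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_japan": {
          "seeds": ["jp-node-1:9300"]
        }
      }
    }
  }
}

GET /cluster_japan:products/_search?ccs_minimize_roundtrips=false
{
  "query": { "match": { "name": "sensor" } }
}
```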
The next figures show the search flow with and without minimized roundtrips.
Minimized roundtrips:

Without minimized roundtrips:

#### Index
An **index** in Elasticsearch **is a collection of documents** that are related to each other.
**Data is stored as JSON documents**, and each document **correlates a set of keys** (names of fields or properties) **with their corresponding values** (strings, numbers, Booleans, dates, arrays of values, geolocations, etc.)
The data structure Elasticsearch uses is called an **inverted index**, which is **designed to allow very fast full-text searches**. It lists every unique word that appears in any document and identifies all of the documents in which the word occurs.
During the indexing process, documents are stored and an inverted index is built to make document data searchable in near real-time.
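The idea can be sketched with a toy inverted index in Python (an illustration of the concept only, not Elasticsearch's actual implementation, which is based on Apache Lucene):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map every unique word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {
    1: "Elasticsearch is a search engine",
    2: "Kibana visualizes Elasticsearch data",
}
index = build_inverted_index(docs)
print(sorted(index["elasticsearch"]))  # → [1, 2]
print(sorted(index["kibana"]))         # → [2]
```

A full-text query then becomes a lookup of the query terms in this word-to-documents map, rather than a scan of every document.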
Indexing is initiated with the index API, through which you can add or update a JSON document in a specific index.
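A minimal sketch of an index API call (the index name `products` and the fields are made up for illustration):

`PUT /products/_doc/1`
```json
{
  "name": "temperature sensor",
  "price": 19.9
}
```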

#### Document
A document is a basic unit of information that can be indexed. Documents are JSON objects that are stored within an Elasticsearch index and are considered the base unit of storage. Every document is associated with a unique identifier. Documents can be compared to rows in a table in a relational database.
As an example, let's say we have an online store. There could be one document per product or one document per order. There is no limit to the number of documents that we can store in a particular index.
Data in documents is defined with fields composed of keys and values. A key is the name of the field, and a value can be an item of many different types, such as a string, a number, a boolean, another object, or an array of values.
An example of a document:
```json
{
  "_id": "5",
  "_type": "your index type",
  "_index": "your index name",
  "_source": {
    "age": 20,
    "name": ["alexandre"],
    "year": 2000
  }
}
```
#### Index Modules and settings
Index Modules are modules created per index and control all aspects related to an index.
The main index settings are either static or dynamic, although there are many other settings that can be assigned to your index. A full list of the settings you can apply to your index can be found here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html
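For example, static settings such as `number_of_shards` can only be set at index creation, while dynamic settings can be changed on a live index via the update settings API. A sketch, using an index named `sensor` and the dynamic `refresh_interval` setting:

`PUT /sensor/_settings`
```json
{
  "index": {
    "refresh_interval": "30s"
  }
}
```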
#### Shards
Data in Elasticsearch is organized into [indices](#Index) that can grow to massive proportions. To keep them manageable, each index is split into one or more shards. Each Elasticsearch shard is an Apache Lucene index containing a subset of the documents in the Elasticsearch index; you can think of a shard as a self-contained search engine that indexes and handles queries for a subset of the data in an Elasticsearch cluster. Splitting indices this way keeps resource usage under control.
When an index is created, the number of shards is set and this cannot be changed later without reindexing the data. When creating an index, you can set the number of shards and replicas as properties of the index:
`PUT /sensor`
```json
{
  "settings": {
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 2
    }
  }
}
```
The ideal number of shards should be determined based on the amount of data in an index. In order to view all shards, their states, and other metadata, use the following request:
`GET _cat/shards`
To view shards for a specific index, append the name of the index to the URL, for example:
`GET _cat/shards/sensor`
This command produces output, such as in the following example. By default, the columns shown include the name of the index, the name (i.e. number) of the shard, whether it is a primary shard or a replica, its state, the number of documents, the size on disk, the IP address, and the node ID.
```
sensor 5 p STARTED 0 283b 127.0.0.1 ziap
sensor 5 r UNASSIGNED
sensor 2 p STARTED 1 3.7kb 127.0.0.1 ziap
sensor 2 r UNASSIGNED
sensor 3 p STARTED 3 7.2kb 127.0.0.1 ziap
sensor 3 r UNASSIGNED
sensor 1 p STARTED 1 3.7kb 127.0.0.1 ziap
sensor 1 r UNASSIGNED
sensor 4 p STARTED 2 3.8kb 127.0.0.1 ziap
sensor 4 r UNASSIGNED
sensor 0 p STARTED 0 283b 127.0.0.1 ziap
sensor 0 r UNASSIGNED
```
#### Replicas
The data within an index is divided and distributed among the data nodes in your cluster. These partitions are called shards, and within the shards we have the documents and metadata for the index.
But what happens if we lose a shard? Are we unable to access the data then?
That's why we have replicas! A replica has the same role as the original shard and carries the same data; it serves not only as a contingency but also improves performance when searching the data.

Let's assume that NODE 1, which holds shard P1, is overloaded with aggregations and queries. Instead of fetching data from it, the cluster will look at NODE 2, which is under less load and has the same data in its R1 replica.
By default, when an index is created, Elasticsearch gives it a configuration similar to the one below:
```
GET meuteste/_settings
{
  "meuteste" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "7020099"
        },
        "provided_name" : "meuteste"
      }
    }
  }
}
```
That is, we will have 5 shards that we call “primary”, and for each primary shard we will have one replica, which means we will have 10 shards in total in our cluster.
The following figure shows the primary shards (P) and their replicas (R):
* P0 → R0
* P1 → R1
* P2 → R2
* P3 → R3
* P4 → R4
In conclusion:
* Replicas are copies of the shards.
* A replica is never allocated on the same node as its primary shard.
* Replicas work as a "backup" in case any shard or node fails.
* Unlike the number of primary shards, the number of replicas is dynamic: it can be changed after the index is created.
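For example, the number of replicas of an existing index can be changed at any time with the update settings API (using the `meuteste` index from above):

`PUT /meuteste/_settings`
```json
{
  "index": {
    "number_of_replicas": 2
  }
}
```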
## About Kibana
Kibana is an open source frontend application that sits on top of the Elastic Stack, providing search and data visualization capabilities for data indexed in Elasticsearch. Kibana can also act as the user interface for monitoring, managing, and securing an Elastic Stack cluster.
Kibana's main uses are:
1. Searching, viewing and visualizing data indexed in Elasticsearch, and analyzing it by creating all kinds of charts.
2. Monitoring, managing and securing an Elastic Stack instance via a web interface.
3. Centralizing access to the built-in solutions developed on the Elastic Stack for observability, security and enterprise search applications.
### Visualization of data
A visualization is worth a thousand log lines, and Kibana provides many options for showcasing your data. Kibana offers users the ability to build charts, tables, metrics and more.

Kibana also offers these visualization features:
* **Visualize** allows you to display your data in charts, graphs and tables. Visualize supports adding interactive controls and your own images to your dashboard.
* **Canvas** is a data visualization and presentation tool that sits within Kibana. It allows you to customize your workspace's background, borders, colors, fonts and more.
* **Maps** enables you to see and analyse location-based data. Maps supports multiple layers and data sources, and the mapping of individual geo points and shapes.
* **TSVB** is a time series data visualizer that allows you to use the full power of Elasticsearch's aggregation framework. With TSVB, you can combine an infinite number of aggregations to display complex data.
### Security
Kibana provides field-level and document-level security, encryption, role-based access controls (RBAC), single sign-on (SSO), security APIs, and more. Custom security controls can be configured in Kibana.
## References
* https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
* https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cross-cluster-search.html
* https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html
* https://opster.com/elasticsearch-glossary/elasticsearch-shards/
* https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
* https://medium.com/@fqueirooz80/elasticsearch-tudo-que-voc%C3%AA-precisa-saber-sobre-a-ferramenta-de-buscas-da-elastic-parte-5-73895e0e7e65
* https://www.elastic.co/what-is/kibana
* https://www.elastic.co/what-is/elasticsearch
<!--
About ElasticSearch José
Nodes Bernardo
Cluster Diogo Martins
Index José
Document Diogo Marques
Shards Canoso
Replicas Fábio
About Kibana José
Visualization of data Diogo Marques
Security José
--Demo--
Our apprach Canoso
Dockerfile Diogo Martins
Docker-compose Diogo Martins
script(preload) Diogo Martins
Kibana Bernardo
-->