# Elasticsearch
## Elasticsearch installation and querying
- Elastic stack:
- Kibana: visualization and management UI (visualize & manage)
- Elasticsearch: search engine (store, search, analyze)
- Beats: lightweight data shippers (ingest)
- Logstash: lets you modify and enrich documents before sending them on (ingest)
- Benefits of Elasticsearch:
- Scalable
- Near real-time
- Highly available and fault tolerant
- ...
- Installing Elasticsearch (running on Elastic's hosted service)
- Go to cloud.elastic.co and sign up
- The deployments page shows all your deployments
- Create a new deployment
- Name your deployment
- Select a cloud platform (AWS)
- Optimize your deployment
- Customizing the deployment is necessary because the default deployment charges around $350
- Kibana is included
- You can also install it locally: download it from the website, unzip it, and run Elasticsearch (`./bin/elasticsearch`); it will run on port 9200
- Download Kibana the same way; it will run on port 5601
- Devtools
- Inside Kibana you can interact with the Elasticsearch endpoints through Dev Tools
- REST endpoint: each index is exposed through a REST endpoint with its name, and you can perform a number of actions against it:
- POST or PUT to index a new JSON document: `POST /index/_doc` (only one type per index is allowed, `_doc`); see the example after this list
- PUT requires you to specify the id of the document: `PUT /index/_doc/1234`
- If you reuse the same id, the version of the document is incremented instead of a new document being created
- The index in our case would be `actions`, or any name which describes the documents that will be stored behind that endpoint.
- `DELETE /index` deletes the whole index
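- A minimal indexing sketch in Dev Tools (the index name `actions` and the field are illustrative):
```
# index a document with an auto-generated id
POST /actions/_doc
{
  "description": "some action"
}

# index (or overwrite) a document with an explicit id; repeating this increments _version
PUT /actions/_doc/1234
{
  "description": "some action with a known id"
}

# delete the whole index
DELETE /actions
```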
- If you want to create the index ahead of time:
```
PUT /nameOfNewIndex
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
}
}
```
- Bulk operations: `POST /indexName/_bulk` adds multiple documents in a single request
```
{"index":{"_id":1}}
{"key":"all the doc keys in json format"}
{"index":{"_id":2}}
{"key":"Second doc keys in json format"}
```
- Searches
- `GET /indexName/_search` returns all documents in the index
- Example:
```
GET /indexName/_search
{
"query":{
"match": {
"business_name": "soup"
}
}
}
```
- Inside the query body you can use clauses such as `match`, `range`, `must_not`, `match_phrase`, and `bool` (`bool: {must: [match: {...}, match_phrase: {...}]}` combines queries); see the combined example after this list
- The `boost` parameter inside a query clause gives more relevance to results matching that clause
- `highlight` highlights the matching terms of a field in the results
- `sort` sorts the results
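- A combined sketch of these options (index and field names follow the `inspections` example used below; the exact values are illustrative):
```
GET /inspections/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "business_name": { "query": "soup", "boost": 2 } } },
        { "range": { "inspection_score": { "gte": 80 } } }
      ],
      "must_not": [
        { "match_phrase": { "business_name": "soup kitchen" } }
      ]
    }
  },
  "highlight": {
    "fields": { "business_name": {} }
  },
  "sort": [
    { "inspection_score": { "order": "desc" } }
  ]
}
```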
- You can also query using SQL:
```
POST _sql?format=txt
{
"query":"SELECT business_name, inspection_score FROM inspections ORDER BY inspection_score DESC LIMIT 5"
}
```
- Another way to use SQL is from the CLI in the `bin` directory (`./bin/elasticsearch-sql-cli`)
- Mapping: shows the types of the fields of an index (like a schema). You can also set the types manually (see the sketch below).
- `GET /indexName/_mapping/`
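- A sketch of setting field types manually when creating an index (field names are illustrative, loosely following the `inspections` example):
```
PUT /inspections
{
  "mappings": {
    "properties": {
      "business_name":    { "type": "text" },
      "inspection_score": { "type": "integer" },
      "inspection_date":  { "type": "date" }
    }
  }
}
```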
- Update a doc
```
POST /indexName/_update/idOfDoc
{
"doc": {
"flagged": true,
"views": 0
}
}
```
- Delete a doc
`DELETE /indexName/_doc/idOfDocToDelete`
- Tokenization breaks text into discrete tokens so queries can run efficiently. You can test it with the `_analyze` API:
```
GET /indexName/_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "unique"],
  "text": "my email address test123@company.com"
}
```
## Getting started with Kibana
- Kibana is the "window" to the elastic stack
- Visualize & explore
- Inside Kibana, go to Discover in the left-hand menu
- You can change the time range to the right of the search bar
- KQL: Kibana Query Language (the default in the search bar); an example filter is sketched below
- You can also use free-text search to filter your documents
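- A sketched KQL filter (field names are illustrative, following the `inspections` example):
```
business_name : "soup" and inspection_score >= 80
```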
- From a document, you can pick an attribute and add it to the search as a filter
- On the left-hand side you can select the fields that are most relevant to your query
- You can save queries for later use, add them to a dashboard, or export the results as a CSV
- Dashboard
- Customizable input controls let you curate the experience of your dashboard; applying these changes updates the whole dashboard
- Interacting with the dashboard changes the query and lets you "zoom in" on the data
- Canvas: allows you to present the results obtained from analyzing the dashboards
- Elastic Maps: allows you to visualize data on maps
- Kibana allows you to organize your dashboards and other saved objects into meaningful categories called spaces (by team, application, environment, and more)
- To switch spaces, click on the default space and then "Manage spaces", which also lets you create a new space from scratch
- You can define roles to control access to different spaces
- For loading data into your Elasticsearch cluster there are a number of step-by-step guides (modules) to start streaming data; they come with dashboards, visualizations, and more. Once you create a new space you can load the data
- You can also upload data as CSV, logs, and other formats
- Index pattern: identifies one or more Elasticsearch indices that you want to explore in Kibana
- Kibana uses index patterns to retrieve data from Elasticsearch indices for things like visualizations
- Use field formatters to format dates and attributes, and even to add new attributes. Once you have the index pattern you can go and explore your data
- Go to dashboard and click new dashboard
- There are two ways to create visualizations: 1) go directly to the Visualize application in the left-hand navigation; these visualizations will be available to any dashboard inside the space; 2) add a new visualization from the dashboard itself (click Add in the dashboard and then create a new visualization)
## Elasticsearch basic concepts
- Cluster: Collection of nodes
- Node: machine where shards are hosted. If more nodes become available, the shards automatically distribute across them; this enables scalability (see the inspection commands after this list)
- Shard: how an index is divided logically to make it searchable; the shards of an index can be hosted on one or more machines
- Replica: a copy of a shard's data, so if a shard is down the replica answers instead
- Index: collection of JSON documents
- Document: a JSON object
- Mapping: defines the types of the fields of an index
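- A few Dev Tools requests you can use to inspect the cluster, node, and shard layout described above:
```
# overall cluster status (green / yellow / red)
GET /_cluster/health

# list the nodes in the cluster
GET /_cat/nodes?v

# list shards and which node hosts each one
GET /_cat/shards?v
```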
## Logstash
- Logstash is the streaming ETL (extract, transform, load) engine of the stack: it provides centralized data collection, processing, and enrichment on the fly.
- Data source agnostic
- Plugin ecosystem of 200+ integrations & processors
- Data processing in Logstash:
- Inputs: Beats, TCP, UDP, HTTP, ...
- Filters: structure, transform, normalize, GeoIP enrichment, external lookup enrichment, ...
- Outputs: you can send data to Elasticsearch or to other destinations like TCP, UDP, HTTP, ... via output plugins
- Resilient data transport: at-least-once delivery guarantees and adaptive buffering with persistent queues; a dead letter queue for offline introspection and replay
- Pipeline dynamics:
- Direct dataflow traffic with conditionals & multiple pipelines (see the sketch below)
- Secure transport with auth & wire encryption
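- A sketch of routing events with a conditional (the `loglevel` field and the index names are illustrative):
```
output {
  # route error events to a dedicated index, everything else to the default one
  if [loglevel] == "error" {
    elasticsearch { index => "errors-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { index => "logs-%{+YYYY.MM.dd}" }
  }
}
```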
- Anatomy of logstash
- Minimal config
```
input {
  beats { port => 5043 }
}
filter {
  mutate { lowercase => ["message"] }
}
output {
  elasticsearch {}
}
```
- Events: the primary unit of data in Logstash. They are documents, very similar to JSON documents, with arbitrary hierarchies and types supported
- Execution model
- Pipeline: the logical flow of data. Data comes in through inputs, passes through a queue, and is handed off to workers to process (mutate and send); there can be many workers
- A Logstash instance corresponds to a single Logstash process, usually on one box
- Queuing and delivery guarantees
- Default: an in-memory queue, but if the process is killed any data not yet sent is lost
- Persistent queue: stores events on disk, so data is not lost
- Guarantees: at least once
- The dead letter queue: events that are unprocessable or undeliverable are saved in the dead letter queue (DLQ) so you can inspect and replay them later (configuration sketch below)
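- A sketch of the related settings in `logstash.yml` (the values are illustrative):
```
# use the on-disk persistent queue instead of the in-memory default
queue.type: persisted
# cap the disk space the queue may use
queue.max_bytes: 1gb
# keep unprocessable events for later inspection and replay
dead_letter_queue.enable: true
# number of worker threads processing the filter/output stages
pipeline.workers: 2
```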