# Elasticsearch

## Elasticsearch installation and querying

- Elastic Stack:
  - Kibana: visualize & manage
  - Elasticsearch: search engine (store, search, analyze)
  - Beats: lightweight data shippers (ingest)
  - Logstash: allows modifying documents before sending them (ingest)
- Benefits of Elasticsearch:
  - Scalable
  - Realtime
  - Highly available and fault tolerant
  - ...
- Installing Elasticsearch (running on their service):
  - Go to cloud.elastic.co and sign up
  - The deployments page shows all your deployments
  - Create a new deployment:
    - Name your deployment
    - Select a cloud platform (e.g. AWS)
    - Optimize your deployment
    - Customizing the deployment is necessary because the default deployment costs around $350
  - Kibana is included
- You can also install it locally: download it from the webpage, unzip it, and run Elasticsearch (`./bin/elasticsearch`); it will run on port 9200
  - Download Kibana the same way; it will run on port 5601
- Dev Tools
  - Inside Kibana you can manage the cluster endpoints through Dev Tools
- REST endpoint: each index is exposed through a single REST endpoint under its name, and you can perform a number of actions on it:
  - POST or PUT to index a new JSON doc: `POST /index/_doc` (only one type is allowed per index, `_doc`)
  - PUT requires you to specify the id of the document: `PUT /index/_doc/1234`
    - If you reuse the same id, it will increase the version of the doc instead of creating a new one
  - The index in our case would be `actions`, or any name which describes the docs that will be saved on that endpoint
  - `DELETE /index` will delete the index
  - If you want to create the index ahead of time:
    ```
    PUT /nameOfNewIndex
    {
      "settings": {
        "index.number_of_shards": 1,
        "index.number_of_replicas": 0
      }
    }
    ```
- Bulk operations: `POST /indexName/_bulk` to add multiple documents in a single POST
  ```
  POST /indexName/_bulk
  {"index":{"_id":1}}
  {"key":"all the doc keys in json format"}
  {"index":{"_id":2}}
  {"key":"second doc keys in json format"}
  ```
- Searches
  - `GET /indexName/_search` will bring back all documents in the index
  - Example:
    ```
    GET /indexName/_search
    {
      "query": {
        "match": {
          "business_name": "soup"
        }
      }
    }
    ```
  - Inside the JSON query you can use operators: `match`, `range`, `must_not`, `match_phrase`, and `bool: {must: [match: {...}, match_phrase: {...}]}` to combine queries
  - The `boost` parameter inside a query gives more relevance to results which match that query
  - `highlight` allows you to highlight a field of the query
  - `sort` allows you to sort results
- You can also query using SQL:
  ```
  POST _sql?format=txt
  {
    "query": "SELECT business_name, inspection_score FROM inspections ORDER BY inspection_score DESC LIMIT 5"
  }
  ```
  - Another way to use SQL is from the CLI in the `bin` directory
- Mapping: shows the types of the attributes (like a schema). You can also set the types manually (see the sketch below)
  - `GET /indexName/_mapping`
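- Field types can also be declared explicitly when the index is created. A minimal Dev Tools sketch, assuming an `inspections` index like the one in the SQL example above; the field names and types here are only illustrative:
  ```
  PUT /inspections
  {
    "mappings": {
      "properties": {
        "business_name":    { "type": "text" },
        "inspection_score": { "type": "integer" },
        "inspection_date":  { "type": "date" }
      }
    }
  }
  ```
  Fields you do not declare still get a type through dynamic mapping when the first document containing them is indexed.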
- Update a doc:
  ```
  POST /indexName/_update/idOfDoc
  {
    "doc": {
      "flagged": true,
      "views": 0
    }
  }
  ```
- Delete a doc: `DELETE /indexName/_doc/idOfDocToDelete`
- Tokenization breaks sentences into discrete tokens to optimize queries:
  ```
  GET /indexName/_analyze
  {
    "tokenizer": "standard",
    "filter": ["lowercase", "unique"],
    "text": "my email address test123@company.com"
  }
  ```

## Getting started with Kibana

- Kibana is the "window" into the Elastic Stack
- Visualize & explore
  - Inside Kibana, go to the explore view in the menu on the left
  - You can change the time scope to the right of the search bar
  - KQL: Kibana Query Language (the default in the search bar)
  - You can also run free-text searches to filter your documents
  - From the documents, you can choose an attribute to include it in the search
  - On the left-hand side of the search you can select the fields that are most relevant to your query
  - You can save queries for later use, add them to a dashboard, or export them as a CSV
- Dashboard
  - Customizable input controls allow you to curate the experience of your dashboard. Applying these changes will update the whole dashboard
  - Interacting with the dashboards changes the query and allows you to "zoom in" on the data
- Canvas: allows you to present the results obtained from analyzing the dashboard
- Elastic Maps: allows you to visualize data on maps
- Kibana allows you to organize your dashboards and other saved objects into meaningful categories called spaces (teams, applications, environments and more)
  - To change them, click on the default space and then "Manage spaces", which also allows you to create a new space from scratch
  - You can define roles to access different spaces
- For loading data into your Elasticsearch cluster there are a number of step-by-step guides (modules) to start streaming data; they come with dashboards, visualizations and more. Once you create a new space you can load the data
- You can also upload data as CSV, logs, and other formats
- Index pattern: identifies one or more Elasticsearch indices that you want to explore in Kibana
  - Kibana uses index patterns to retrieve data from Elasticsearch indices for things like visualizations
  - Use the file formatter to format dates and attributes; it even allows you to add new attributes. Once you have the index patterns you can go and explore your data
- Go to Dashboard and click "New dashboard"
- There are two ways to create visualizations: 1) go directly to the Visualize application in the left-hand navigation; these visualizations will be available for any dashboard inside the space; 2) add a new visualization from the dashboard itself (click "Add" in the dashboard and then add a new visualization)

## Elasticsearch basic concepts

- Cluster: collection of nodes
- Node: machine where shards are hosted. If more nodes become available, the shards will automatically distribute across them. This allows scalability
- Shard: how indices are divided logically to make them searchable. Shards can be hosted on one or more machines
- Replica: shards have replicas which contain the same data, so if a shard is down, the replica will answer (see the sketch after this list)
- Index: collection of JSON documents
- Document: JSON object
- Mapping: defines the types of the attributes of an index
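- These pieces can be inspected from Dev Tools. A small sketch, assuming a running cluster; the index name is only an example:
  ```
  GET _cluster/health

  GET _cat/nodes?v

  GET _cat/shards/indexName?v
  ```
  `_cluster/health` reports green/yellow/red depending on whether every primary shard and replica is assigned, which makes the shard/replica behaviour described above visible.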
## Logstash

- Logstash is the streaming ETL (extract, transform, load) engine which provides centralized data collection, processing, and enrichment of data on the fly
- Data source agnostic
- Plugin ecosystem of 200+ integrations & processors
- Data processing in Logstash (see the config sketch at the end of this section):
  - Inputs: Beats, TCP, UDP, HTTP, ...
  - Filters: structure, transform, normalize, GeoIP enrichment, external lookup enrichment, ...
  - Outputs: you can send the data to Elasticsearch or to other destinations like TCP, UDP, HTTP, ... with output plugins
- Resilient data transport: at-least-once delivery guarantees and adaptive buffering with persistent queues. Dead letter queue for offline introspection and replay
- Pipeline dynamics:
  - Direct dataflow traffic with conditionals & multiple pipelines
  - Secure transport with auth & wire encryption
- Anatomy of Logstash
  - Minimal config:
    ```
    input { beats { port => 5043 } }
    filter { mutate { lowercase => ["message"] } }
    output { elasticsearch {} }
    ```
  - Events: the primary unit of data in Logstash. They are documents, very similar to JSON documents, with arbitrary hierarchies and types supported
- Execution model
  - Pipeline: logical flow of data. It takes data in through inputs, passes it to a queue, and then hands it off to workers to process (mutate and send). We can have many workers
  - A Logstash instance corresponds to a single Logstash process, usually one box
- Queuing and delivery guarantees
  - Default: in-memory queue, but if you kill the process you lose any data that was not yet sent
  - Persistent queue: stores events on disk, so no data is lost
  - Guarantees: at least once
- The dead letter queue: events that are unprocessable or undeliverable are saved in the dead letter queue (DLQ) so you can replay them later
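- A slightly fuller pipeline sketch combining the ideas above (a conditional, structuring and GeoIP enrichment filters, and an Elasticsearch output). The port, field names, and index pattern are illustrative, not taken from these notes:
  ```
  input {
    beats { port => 5043 }                      # receive events from Beats shippers
  }

  filter {
    if [message] =~ /GET|POST/ {                # conditional: only parse web-looking lines
      grok {                                    # structure: extract fields from the raw line
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      geoip {                                   # enrichment: add geo fields from the client IP
        source => "clientip"
      }
    }
    mutate { lowercase => ["message"] }         # normalize
  }

  output {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "weblogs-%{+YYYY.MM.dd}"         # daily indices
    }
  }
  ```
  Persistent queues and the dead letter queue are enabled separately in `logstash.yml` (`queue.type: persisted`, `dead_letter_queue.enable: true`).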