# Introduction

Elasticsearch is a powerful distributed search and analytics engine, widely used for real-time indexing, search, and analysis of data. In this guide, we'll delve into various Elasticsearch operations, covering everything from creating and deleting indices to advanced topics like batch processing and optimistic concurrency control. Along the way, we'll provide detailed coding examples to help you grasp each concept effectively.

---

## Creating & Deleting Indices

Creating an index in Elasticsearch is the first step towards storing your data. Indices are logical namespaces that map to physical data storage. Here's how you can create and delete indices using Elasticsearch's RESTful API:

```python
# Creating an index
PUT /shopping
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

# Output
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "shopping"
}

# Deleting an index
DELETE /shopping

# Output
{
  "acknowledged": true
}
```

## Indexing Documents

Indexing documents involves adding structured JSON data to an index.
Let's create an example document representing shopping details and index it:

```python
# Indexing a document
PUT /shopping/_doc/1
{
  "product": "Laptop",
  "price": 1000,
  "quantity": 5,
  "category": "Electronics"
}

# Output
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 1
}
```

The sample responses in this guide include a `_type` field, as Elasticsearch 7.x returns it; 8.x omits the field.

## Retrieving Documents by ID

Retrieving documents by their unique identifier (ID) is a common operation in Elasticsearch:

```python
# Retrieving a document by ID
GET /shopping/_doc/1

# Output
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "product": "Laptop",
    "price": 1000,
    "quantity": 5,
    "category": "Electronics"
  }
}
```

## Updating Documents

You can update an existing document in place. Fields listed under `doc` are merged into the stored document, and all other fields are left unchanged:

```python
# Updating a document
POST /shopping/_update/1
{
  "doc": {
    "price": 1200
  }
}

# Output
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
```

## Scripted Updates

Elasticsearch allows you to perform scripted updates using the Painless scripting language. The script modifies the document through `ctx._source`, and values passed in `params` keep the script itself reusable:

```python
# Scripted update
POST /shopping/_update/1
{
  "script": {
    "source": "ctx._source.price += params.increment",
    "params": {
      "increment": 200
    }
  }
}

# Output
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "1",
  "_version": 3,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
```

## Upserts

Upserts are a combination of "update" and "insert" operations.
If a document exists, it will be updated; otherwise, a new document will be created:

```python
# Upsert operation
POST /shopping/_update/2
{
  "doc": {
    "product": "Smartphone",
    "price": 800,
    "quantity": 10,
    "category": "Electronics"
  },
  "doc_as_upsert": true
}

# Output:
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
```

## Replacing Documents

Replacing a document means indexing a new document under an existing ID. The stored document is overwritten in full, so any field missing from the new body is lost:

```python
# Replace operation
PUT /shopping/_doc/1
{
  "product": "Tablet",
  "price": 500,
  "quantity": 3,
  "category": "Electronics"
}

# Output:
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "1",
  "_version": 4,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
```

## Deleting Documents

Deleting documents removes them from the index:

```python
# Deleting a document
DELETE /shopping/_doc/1

# Output:
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "1",
  "_version": 5,
  "result": "deleted",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
```

## Understanding Routing

Routing determines which shard a document is stored in. By default the document ID is used as the routing value; passing a custom `routing` value keeps related documents on the same shard. The same value must then be supplied when getting, updating, or deleting the document:

```python
# Indexing with routing
PUT /shopping/_doc/1?routing=user123
{
  "product": "Headphones",
  "price": 50,
  "quantity": 20,
  "category": "Electronics"
}

# Output:
{
  "_index": "shopping",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 1
}
```

### How Elasticsearch Reads Data

* A get-by-ID request is routed to the shard that owns the document and can be served by the primary or any replica copy. A search fans out to one copy of every relevant shard, where inverted indexes make term lookups fast, and the coordinating node merges the results.

### How Elasticsearch Writes Data

* A write is routed to the document's primary shard, which applies the operation and forwards it to its replica shards. Replicas provide fault tolerance, and sharding spreads the data across nodes for scalability.

### Understanding Document Versioning

* Every document carries a `_version` number that Elasticsearch increments on each change (index, update, or delete), as the `_version` values in the responses above show. Only the latest revision is stored; `_version` is a change counter, not a history of earlier revisions.
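
To make the version counter concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Elasticsearch code: the class `ToyVersionedStore` and its methods are hypothetical names invented for illustration, and the handling of a re-index after a delete is an assumption of the toy. It replays the same sequence of operations this guide runs against document `1`.

```python
class ToyVersionedStore:
    """Toy model of a per-document change counter (hypothetical, not Elasticsearch)."""

    def __init__(self):
        # doc_id -> {"version": int, "source": dict, or None once deleted}
        self._docs = {}

    def index(self, doc_id, source):
        """Create a document, or replace it in full if the ID already exists."""
        entry = self._docs.get(doc_id)
        is_new = entry is None or entry["source"] is None
        # Toy assumption: the counter keeps counting after a delete.
        version = 1 if entry is None else entry["version"] + 1
        self._docs[doc_id] = {"version": version, "source": dict(source)}
        return {"_id": doc_id, "_version": version,
                "result": "created" if is_new else "updated"}

    def update(self, doc_id, partial):
        """Merge fields into an existing document and bump its version."""
        entry = self._docs.get(doc_id)
        if entry is None or entry["source"] is None:
            raise KeyError(f"document {doc_id!r} not found")
        entry["source"].update(partial)
        entry["version"] += 1
        return {"_id": doc_id, "_version": entry["version"], "result": "updated"}

    def delete(self, doc_id):
        """Drop the document body; the delete itself counts as a change."""
        entry = self._docs.get(doc_id)
        if entry is None or entry["source"] is None:
            raise KeyError(f"document {doc_id!r} not found")
        entry["source"] = None
        entry["version"] += 1
        return {"_id": doc_id, "_version": entry["version"], "result": "deleted"}


store = ToyVersionedStore()
history = [
    store.index("1", {"product": "Laptop", "price": 1000}),   # create
    store.update("1", {"price": 1200}),                       # partial update
    store.update("1", {"price": 1400}),                       # stands in for the scripted update
    store.index("1", {"product": "Tablet", "price": 500}),    # full replace
    store.delete("1"),                                        # delete
]
for step in history:
    print(step["_version"], step["result"])
```

The printed versions run from 1 to 5, matching the `_version` values in the sample responses earlier in the guide. The toy ignores details such as no-op updates, which a real cluster handles differently.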
### Optimistic Concurrency Control

* Optimistic concurrency control prevents a stale write from overwriting a newer one. A write can carry the `if_seq_no` and `if_primary_term` parameters, set to the `_seq_no` and `_primary_term` values returned by an earlier read. If the document has changed since that read, Elasticsearch rejects the write with a version conflict (HTTP 409) instead of applying it.

## Update by Query

Update by query applies a script to every document that matches a query:

```python
# Update by query
POST /shopping/_update_by_query
{
  "script": {
    "source": "ctx._source.price += params.increment",
    "params": {
      "increment": 50
    }
  },
  "query": {
    "match": {
      "category": "Electronics"
    }
  }
}

# Output:
{
  "took": 16,
  "timed_out": false,
  "total": 1,
  "updated": 1,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": { "bulk": 0, "search": 0 },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
```

## Delete by Query

Delete by query removes every document that matches a query:

```python
# Delete by query
POST /shopping/_delete_by_query
{
  "query": {
    "range": {
      "price": {
        "lte": 100
      }
    }
  }
}

# Output:
{
  "took": 19,
  "timed_out": false,
  "total": 1,
  "deleted": 1,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": { "bulk": 0, "search": 0 },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
```

## Batch Processing

Batch processing involves performing operations on multiple documents in a single request. The bulk body is newline-delimited JSON: each `index` action line is followed by its document on the next line, and the body must end with a newline:

```python
# Bulk indexing
POST /shopping/_bulk
{"index": {"_id": "1"}}
{"product": "Keyboard", "price": 30, "quantity": 15, "category": "Electronics"}
{"index": {"_id": "2"}}
{"product": "Mouse", "price": 20, "quantity": 25, "category": "Electronics"}

# Output:
{
  "took": 16,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "shopping",
        "_type": "_doc",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "_seq_no": 9,
        "_primary_term": 1
      }
    },
    {
      "index": {
        "_index": "shopping",
        "_type": "_doc",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "_seq_no": 10,
        "_primary_term": 1
      }
    }
  ]
}
```

## Importing Data with cURL

You can import data into Elasticsearch using cURL commands. The file passed with `--data-binary` (here `data.json`) must use the same newline-delimited format as a bulk request body and end with a newline:

```python
# Importing data with cURL
curl -XPOST 'localhost:9200/shopping/_bulk' -H 'Content-Type: application/json' --data-binary @data.json

# Output:
{
  "took": 16,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "shopping",
        "_type": "_doc",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "_seq_no": 9,
        "_primary_term": 1
      }
    },
    {
      "index": {
        "_index": "shopping",
        "_type": "_doc",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": { "total": 2, "successful": 1, "failed": 0 },
        "_seq_no": 10,
        "_primary_term": 1
      }
    }
  ]
}
```
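
The cURL command above expects `data.json` to already exist. As a rough sketch of one way to produce it, the Python below builds a bulk body from a list of documents. The helper names `to_bulk_ndjson` and `write_bulk_file` and the `PRODUCTS` list are assumptions made for illustration; only the standard library is used, and the two documents are the ones from the Batch Processing section.

```python
import json


def to_bulk_ndjson(docs):
    """Build a bulk body: one action line plus one source line per document.

    `docs` is an iterable of (doc_id, source_dict) pairs (hypothetical shape).
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                       # document line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def write_bulk_file(path, docs):
    """Write the bulk body to `path` so cURL can send it with --data-binary."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_bulk_ndjson(docs))


# Same documents as in the Batch Processing example.
PRODUCTS = [
    ("1", {"product": "Keyboard", "price": 30, "quantity": 15, "category": "Electronics"}),
    ("2", {"product": "Mouse", "price": 20, "quantity": 25, "category": "Electronics"}),
]

payload = to_bulk_ndjson(PRODUCTS)
print(payload, end="")
```

Calling `write_bulk_file("data.json", PRODUCTS)` would then create the file that the cURL command uploads. `json.dumps` emits each object on a single line by default, which is what the bulk format requires.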