# ArangoDB hands-on session 2 The previous ArangoDB hands-on session 1: https://hackmd.io/@valterUo/BycCtYtBO. ## Setting up Indexes and transactions are not fully supported in the browser's UI, which we used in the last hands-on session. Anyway, you can see the created indexes and create new ones in Collection &rarr; Packages &rarr; Indexes. Even if you never use ArangoDB, these principles in this hands-on tutorial apply to most databases with minor modifications. So it is useful to learn them anyway! ## Connect to Arangosh If `arangosh` is not in your PATH, navigate to the folder `\usr\bin` and start `arangosh` there. If you use Docker, open the client and run `arangosh` there. Because we are not using the `_system` database, we need to specify that we are going to connect to the `DebianGraph` database: ```javascript > arangosh --server.database debianGraph ``` Or you can connect to `_system` and then switch to DebianGraph: ```javascript > db._useDatabase("debianGraph") ``` ## Run queries You can run queries with `_query` function with `.toArray()` in the end: ```javascript > db._query('FOR package IN packages LIMIT 1 RETURN package').toArray() ``` ## Creating indexes You find more details about [ArangoDB's indexes](https://www.arangodb.com/docs/stable/indexing.html). We use the `_explain` function to execute queries in the following. Then we see how the indexes are used. ### Primary index ```javascript > db._explain('FOR package IN packages FILTER package._key == "emacs" RETURN package') ``` In this case, we see that both attributes, `_key` and `_id`, rely on the primary index. ### Edge index ```javascriptjavascript > db._explain('FOR v, e IN 1..2 INBOUND "packages/libc6" GRAPH "debian_dependency_graph" RETURN { v, e }') ``` Edge indexes are used when the graph is traversed. On the other hand, indexes are not necessarily the best solution for graph traversing: https://dmccreary.medium.com/how-to-explain-index-free-adjacency-to-your-manager-1a8e68ec664a. ### Persistent index The following command ensures that a unique persistent index exists. ```javascript collection.ensureIndex({ type: "persistent", fields: [ "field1", ..., "fieldn" ], unique: true }) ``` Let us first examine a query that does not use a persistent index: ```javascript > db._query('FOR package IN packages FILTER package.SHA1 == "dbb1343a3d24f60e5038994e3528dd7486e40943" RETURN package').getExtra() ``` The query returns, for example, ```javascript { "stats" : { "writesExecuted" : 0, "writesIgnored" : 0, "scannedFull" : 48811, "scannedIndex" : 0, "filtered" : 48810, "httpRequests" : 0, "executionTime" : 0.02892499999870779, "peakMemoryUsage" : 23660 }, "warnings" : [ ] } ``` Especially pay attention to the execution time `0.02892499999870779`. We then create an index (which is not unique, but you can try to change unique to true before running): ```javascript > db.packages.ensureIndex({ type: "persistent", fields: [ "SHA1" ], unique: false }) ``` When we rerun the previous query, the result is ```javascript { "stats" : { "writesExecuted" : 0, "writesIgnored" : 0, "scannedFull" : 0, "scannedIndex" : 1, "filtered" : 0, "httpRequests" : 0, "executionTime" : 0.0003885000005539041, "peakMemoryUsage" : 24012 }, "warnings" : [ ] } ``` So the query is over 74 times faster when it uses the index. ### TTL index For more about TTL indexes, you can read https://www.arangodb.com/docs/stable/indexing-ttl.html. ### Fulltext index Last time we did not study full-text queries, but they are simple to understand. We can create a full-text index by calling: ```javascript > db.packages.ensureIndex({ type: "fulltext", fields: [ "Tag" ], minLength: 3 }) ``` We cannot execute full-text queries on ArangoDB before creating the index, so there is no way to test how fast the query would be without the index. After creating the index, we can examine, for example, the following full-text query which uses the FULLTEXT-function: ```javascript > db._explain('FOR package IN FULLTEXT(packages, "Tag", "security") RETURN package') ``` More information on how to write full-text queries: https://www.arangodb.com/docs/stable/aql/functions-fulltext.html. ### Geo index For more about the Geo index: https://www.arangodb.com/docs/stable/indexing-geo.html. ### Hash and skiplist indices Since the hash index is supported as a persistent index, it may be good to know it existed. If you want to know more about it, you can find it in the old docs: https://www.arangodb.com/docs/3.6/indexing-hash.html. Similarly, the skiplist index is supported as a persistent index. You can still find it from the old docs: https://www.arangodb.com/docs/3.6/indexing-skiplist.html. ## Transactions More about transactions: https://www.arangodb.com/docs/stable/transactions.html. ### Begin Generally, a transaction block has the following structure: ```javascript > db._executeTransaction({ collections: { write: "collection_name1", read: "collection_name2" }, action: function () { // All operations go here } }) ``` Attributes `read` and `write` are optional. The `action` attribute defines the actions performed on the database. The JavaScript function contains JavaScript code in string format. ### Commit If the action function executes without exception, the changes are automatically committed. Create a demo database: ```javascript db._create("c1") ``` Execute the transaction on the database: ```javascript db._executeTransaction({ collections: { write: [ "c1" ] }, action: function () { var db = require("@arangodb").db; db.c1.save({ _key: "key1" }); db.c1.save({ _key: "key2" }); db.c1.save({ _key: "key3" }); } }) ``` All three documents were created: ```javascript db.c1.count() ``` Besides, you can write the action function so that it returns any valid JavaScript value. ### Rollback A transaction aborts and rolls back all the changes if an exception is thrown and not caught/handled inside the transaction function. Let's drop the database from the previous example: ```javascript db._drop("c1") ``` We try to execute the transaction, but because it ends with an exception, the changes are rolled back: ```javascript db._executeTransaction({ collections: { write: [ "c1" ] }, action: function () { var db = require("@arangodb").db; db.c1.save({ _key: "key1" }); db.c1.count(); // 1 db.c1.save({ _key: "key2" }); db.c1.count(); // 2 throw "doh!"; } }) ``` No documents were created: ```javascript db.c1.count() ``` Besides, you can write [your exceptions](https://www.arangodb.com/docs/stable/transactions-transaction-invocation.html#custom-exceptions) or use [ArangoDB expections](https://www.arangodb.com/docs/stable/transactions-transaction-invocation.html#throwing-arango-exceptions). Again, let's initialize the database: ```javascript db._drop("c1") db._create("c1") ``` Also, any other exception from the function aborts the transaction: ```javascript db._executeTransaction({ collections: { write: "c1" }, action: function () { var db = require("@arangodb").db; db.c1.save({ _key: "key1" }); // will throw duplicate a key error which the user didn't explicitly code db.c1.save({ _key: "key1" }); // We'll never get here } }) ``` The database stayed empty: ```javascript db.c1.count() ``` ### Cross-collection transactions You can add any collections to the transaction and refer them to the action function. These are called [cross-collection transactions](https://www.arangodb.com/docs/stable/transactions-transaction-invocation.html#cross-collection-transactions). ### Other properties ArangoDB offers a good introduction to transactions: * [Passing parameters](https://www.arangodb.com/docs/stable/transactions-passing.html) * [Locking and isolation](https://) * [Durability](https://) * [Limitations](https://) ## Exercises 2 Write your queries (and short explanations if needed) to the following questions. Then return them in Moodle in **one** PDF by **15.5.2023**. Total number of points is 7. The documentation is useful: https://www.arangodb.com/docs/stable/index.html. 1. Write a query that filters based on SHA1 and SHA256 hashes. Then create a persistent index that is based on both of these attributes. Compare how much faster the query is with or without the index. How does it affect that the index is combined of two attributes? Is there a difference if your FILTER clause contains `AND` or `OR`? (1 point) 2. Create a full-text index on the attribute `Description`. Query those packages whose descriptions contain 'Python'. Group the result by architecture and return the count of packages for each architecture type. The result can look something like `[{"architecture": x, "number_of_packages": y}, ...]` (2 points) 3. Create an index on the attribute `Size`. Query packages whose sizes are between 50 and 1000 and sort the result into ascending order (the smaller first). (1 point) 4. Write a transaction that inserts the following data into the `packages` collection if it does not already exist. In the case it exists, throw a proper error. (1 point) ``` { "_id": packages/elinks-data "_rev": _cDljlvG--- "_key": elinks-data "Architecture": "all", "Description": "advanced text-mode WWW browser - data files", "Filename": "pool/main/e/elinks/elinks-data_0.12~pre6-5_all.deb", "Maintainer": "Moritz Muehlenhoff <jmm@debian.org>", "Package": "elinks-data", "Priority": "optional", "SHA1": "4ec9dbb32065b80ae5bc9d3f6d53edd499caf74a", "SHA256": "9dbeaaadb9d69bb1bceb183bfda4f8f87e4d4a2f10dd2e2da95827c1f7c4a55c", "Section": "web", "Size": "389458", "Source": "elinks", "Tag": "role::app-data", "Version": "0.12~pre6-5", "InstalledSize": 1733 } ``` 5. Query some packages returning their `Installed-Size` attributes using the recommended notation `package.Installed-Size`. What happens? In this case, you can use the notation `package["Installed-Size"]`, but the attribute `Installed-Size` is not good. Write a query that updates all `Installed-Size` attributes to `InstalledSize` attributes and removes `Installed-Size` key-value pairs from the documents. Because you want to be sure that no data are lost during the update, and the output is correct, perform this update using transactions. Besides the update query, write a query or queries that check the correctness of the update. You can abort the transaction and roll it back if your update fails. Hint: You benefit from https://www.arangodb.com/docs/stable/aql/operations-update.html. (2 points) --- ## Introduction presentation 2 [Presentation](https://helsinkifi-my.sharepoint.com/:b:/g/personal/vauotila_ad_helsinki_fi/Ea75yTzvxbVGicLPRTxiWhkBst2cNxYCNu8olXqdNoT9gg?e=7ttXMQ) ---