ArangoDB hands-on session 1 with example solutions to exercises

# ArangoDB hands-on session 1 with example solutions to exercises 1. How many packages are there in total? (0.5 points) ``` RETURN count(packages) ``` or ``` RETURN length(packages) ``` 2. Find the packages that are maintained by Rob Browning. Return only their names. Hint: you benefit from the operators https://www.arangodb.com/docs/stable/aql/operators.html. (0.5 points) You can use regular expressions ``` FOR package in packages FILTER package.Maintainer =~ "Rob Browning*" RETURN package.Package ``` Or ArangoDB's own notation which is much slower ``` FOR package in packages FILTER package.Maintainer LIKE "%Rob Browning%" RETURN package.Package ``` Or use CONTAINS-function which happens to be the fastest in this case ``` FOR package in packages FILTER CONTAINS(package.Maintainer, "Rob Browning") RETURN package.Package ``` 3. Write at least two different queries that find the maximum size of the packages. Return only the maximum value. Which methods perform better and what is the difference in time? Use explain feature (next to the run button) to figure out the reasons for the difference. Hint: MAX, [*] array operator, COLLECT, AGGREGATE, SORT, etc. ``` // This used to work but I cannot figure out why not anymore RETURN MAX(packages[*].Size) ``` ``` FOR package IN packages COLLECT AGGREGATE max = MAX(package.Size) RETURN max ``` ``` FOR package IN packages SORT package.Size DESC LIMIT 1 RETURN package.Size ``` Some answers also contained comparision between warm and cold caches. That is an interesting comparison too! 4. Write a graph query that starts from the package `libc6` and returns vertices and edges within one or two hops. Pay attention that we do not want edges that are coming to the package `libc6` (i.e. do not include incoming edges). ``` FOR v, e IN 1..2 OUTBOUND 'packages/libc6' GRAPH 'debian_dependency_graph' RETURN { v, e } ``` 5. Find the shortest path between dependencies `debconf` and `gfxboot-dev`. Use EXPLAIN to find information on how the query is processed in the system. ``` FOR vertex, edge IN ANY SHORTEST_PATH "packages/debconf" TO "packages/gfxboot-dev" GRAPH debian_dependency_graph RETURN [vertex._key, edge._key] ``` 6. Find the shortest path from `libxrandr2` to `freecraft` which contains vertex `chromium`. The direction of edges does not matter. AQL has a limitation that you cannot use filtering in shortest path queries but they claim that there is a workaround. Hints: https://www.arangodb.com/docs/stable/aql/graphs-shortest-path.html#conditional-shortest-path and https://www.arangodb.com/docs/stable/aql/graphs-traversals.html. ``` FOR v, e, p IN 1..5 ANY 'packages/libxrandr2' GRAPH 'debian_dependency_graph' OPTIONS { bfs : true } FILTER v._id == "packages/freecraft" FILTER p.vertices[*]._key ANY == "chromium" LIMIT 1 RETURN { start: p.vertices[0]._key, vertices_on_path: p.vertices[*]._key, edges_on_path: p.edges[*]._key, target: p.vertices[-1]._key } ``` On the other hand, the above query is quite slow (for me it took ~50s to finish it). Much faster method is to divide the query into two parts ``` LET shortest_path_to_chromium = ( FOR vertex, edge IN ANY SHORTEST_PATH "packages/libxrandr2" TO "packages/chromium" GRAPH debian_dependency_graph RETURN [vertex._key, edge._key] ) LET shortest_path_from_chromium = ( FOR vertex, edge IN ANY SHORTEST_PATH "packages/chromium" TO "packages/freecraft" GRAPH debian_dependency_graph RETURN [vertex._key, edge._key] ) RETURN APPEND(shortest_path_to_chromium, SHIFT(shortest_path_from_chromium), true) ``` But this division might not work in cases we want to filter with respect to some other attributes. I tried to write a query using `PRUNE` but I just couldn't make it work ``` FOR v, e, p IN 1..5 ANY 'packages/libxrandr2' GRAPH 'debian_dependency_graph' PRUNE v._key == "freecraft" OPTIONS { bfs : true } FILTER p.vertices[*]._key ANY == "chromium" LIMIT 1 RETURN { start: p.vertices[0]._key, vertices_on_path: p.vertices[*]._key, target: target_vertex._key } ``` `PRUNE` defines a condition, like in a `FILTER` statement, which will be evaluated in every step of the traversal, as early as possible. In this exercise it happens to replace `TO` keyword. If you see the resulting path, the target vertex is not included there. Now if you run the query without `FILTER` statement, your query should be equivalent to ordinary shortest path query as in the exercise 5. The line ``` OPTIONS { bfs : true } ``` defines that we are using breadth-first traversal algorithm. The condition ``` FILTER p.vertices[*]._key ANY == "chromium" ``` checks that the path contains a vertex whose key is chromium. In this query we are allowed to use `FILTER`. In my opinion ``` LIMIT 1 ``` contains a hidden idea. It seems (I run some examples) that ArangoDB automatically sorts the results so that the result with the shortest path is always the first in the result list. That is why `LIMIT 1` works here. If you do not include the limit in your query, you will see multiple results but the shortest path is still the first in the list. ``` LET target_vertex = (FOR package IN packages FILTER package._key == "freecraft" RETURN package)[0] FOR v, e, p IN 1..5 ANY 'packages/libxrandr2' GRAPH 'debian_dependency_graph' PRUNE v._key == target_vertex._key OPTIONS { bfs : true } FILTER p.vertices[*]._key ANY == "chromium" LIMIT 1 RETURN { start: p.vertices[0]._key, vertices_on_path: p.vertices[*]._key, target: target_vertex._key } ``` 7. You saw that ArangoDB supports certain kinds of graph queries. Describe a graph query that is difficult to write using ArangoDB's AQL query language. You can just use natural language to describe the query. For example, you can compare AQL to [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) and [Cypher](https://neo4j.com/developer/cypher/) which are the most popular graph query languages. (1 point) *In my opinion, pattern matching is difficult with AQL. For example, if you want to query with a pattern ```(Boss) -[supervisor]-> (Person) <-[knows]- (Friend)``` and you want to set some filtering conditions on vertices and edges, this is difficult to write in AQL. Both Cypher and SPARQL are developed around graph pattern matching concept so the above query is relatively easy to write using those query languages. When writing AQL you fix a starting vertex or starting vertices and start traversing until you face the stop condition. You can write sort of pattern matching queries with AQL (https://www.arangodb.com/learn/graphs/pattern-matching/). Generally, graph queries can be divided into two categories: pattern matching query languages and graph traversing query language. Besides AQL, Gremlin is a famous graph traversing query language. Anyway, most of the graph query languages are developed for both purposes. Other points: if you want to traverse multiple named graphs at once. This is also impossible in Neo4j but some RDF databases might support it.* ## Hands-on session 2 The next hands-on session → https://hackmd.io/@valterUo/SJ7P1DyVd/edit.