# AGE Multilabel
## Storing graph nodes with multilabel
Assigning multiple labels to one node means that the node can belong to two or more label classes at once. For example, given the labels `Person` and `Student`, a node `n` can carry both, which is perfectly logical. The challenge is how to store this information in the database.
There are two aspects to consider:
- Storage optimization: store each node's data efficiently, ideally without duplicating it, so that available resources are used well.
- Query optimization: make sure queries that filter on one or more labels run efficiently and return results in a timely manner.
Both aspects need to work together, so we need to find a tradeoff between them. Additionally, there may be alternative approaches that can achieve both objectives to some extent.
I don't object to the current modeling approach, but I will point out that it does not scale when the set of multi-labels grows large.
The proposed approach can also cause problems when nodes are created and updated. For example, when a node is part of multiple tables and one property needs to be updated, we have to update it in every table where the node is present.
Here are the problems:
- Identifying all tables where the node is present requires maintaining a repository that keeps track of which node is part of which table.
- If the update succeeds in one table but fails in another, the copies of the node diverge and the data is no longer correct.
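
To make the second problem concrete, here is a minimal sketch. It uses SQLite (via Python's standard `sqlite3` module) as a stand-in for PostgreSQL. The tables `person` and `student`, the `NODE_TABLES` catalog, and the `CHECK` constraint that forces the second update to fail are all hypothetical, chosen only to reproduce a partial update.

```python
import sqlite3

# Hypothetical layout: the same node is duplicated in every label table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, age INTEGER);
    -- The CHECK constraint exists only to make the second update fail.
    CREATE TABLE student (id INTEGER PRIMARY KEY, age INTEGER CHECK (age < 100));
    INSERT INTO person  VALUES (1, 20);
    INSERT INTO student VALUES (1, 20);
""")

# The "repository" from the first problem: which node lives in which tables.
NODE_TABLES = {1: ["person", "student"]}

def update_age(node_id, age):
    """Update the property in every table, committing each table separately."""
    for table in NODE_TABLES[node_id]:
        conn.execute(f"UPDATE {table} SET age = ? WHERE id = ?", (age, node_id))
        conn.commit()

try:
    update_age(1, 150)
    failed = False
except sqlite3.IntegrityError:
    conn.rollback()
    failed = True

person_age = conn.execute("SELECT age FROM person WHERE id = 1").fetchone()[0]
student_age = conn.execute("SELECT age FROM student WHERE id = 1").fetchone()[0]
print(failed, person_age, student_age)
```

After the failure the two copies of node 1 disagree about `age`. Wrapping all updates in a single transaction would avoid this, but the cost of that transaction still grows with the number of labels on the node.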
Therefore, I advocate that nodes should be stored in a centralized nodes table, and all of their properties should be stored there. From there, they can be inherited into label tables, with each label stored in its own table.
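
A minimal sketch of this layout follows, again with SQLite standing in for PostgreSQL. The table names (`nodes`, `label_person`, `label_student`) and the helper functions are illustrative assumptions, not AGE's actual schema or API. SQLite has no table inheritance, so the label tables here simply reference the central table by id.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE nodes (
        id         INTEGER PRIMARY KEY,
        properties TEXT NOT NULL  -- JSON document, stored exactly once
    );
    CREATE TABLE label_person  (node_id INTEGER PRIMARY KEY REFERENCES nodes(id));
    CREATE TABLE label_student (node_id INTEGER PRIMARY KEY REFERENCES nodes(id));
""")

# Fixed mapping from label name to table name (hypothetical names).
LABEL_TABLES = {"Person": "label_person", "Student": "label_student"}

def create_node(node_id, properties, labels):
    """Insert the node once, then register it in each of its label tables."""
    with conn:  # one transaction: every insert commits, or none does
        conn.execute("INSERT INTO nodes VALUES (?, ?)",
                     (node_id, json.dumps(properties)))
        for label in labels:
            conn.execute(f"INSERT INTO {LABEL_TABLES[label]} VALUES (?)",
                         (node_id,))

def update_property(node_id, key, value):
    """A property update touches only the central table, whatever the labels."""
    with conn:
        row = conn.execute("SELECT properties FROM nodes WHERE id = ?",
                           (node_id,)).fetchone()
        props = json.loads(row[0])
        props[key] = value
        conn.execute("UPDATE nodes SET properties = ? WHERE id = ?",
                     (json.dumps(props), node_id))

create_node(1, {"name": "Alice", "age": 20}, ["Person", "Student"])
update_property(1, "age", 21)
```

With this layout there is no need to look up which tables hold a copy of the node before updating it, because the properties exist in one place only.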
When performing queries, the tables need to be joined together based on the nature of the query. This may slow down the query to some extent, but it should not significantly impact general queries with two or three labels.
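
As a rough illustration of such a join, the sketch below finds nodes that carry every requested label by joining the central table with one label table per label. SQLite is used in place of PostgreSQL, and the schema, sample rows, and the `nodes_with_all_labels` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE label_person  (node_id INTEGER PRIMARY KEY);
    CREATE TABLE label_student (node_id INTEGER PRIMARY KEY);
    INSERT INTO nodes VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO label_person  VALUES (1), (2), (3);
    INSERT INTO label_student VALUES (1), (3);
""")

ALLOWED_TABLES = {"label_person", "label_student"}

def nodes_with_all_labels(label_tables):
    """Return (id, name) for nodes present in every given label table."""
    if not label_tables or not set(label_tables) <= ALLOWED_TABLES:
        raise ValueError("unknown or empty label table list")
    # One inner join per label: a node survives only if it is in all of them.
    joins = " ".join(f"JOIN {t} ON {t}.node_id = n.id" for t in label_tables)
    sql = f"SELECT n.id, n.name FROM nodes AS n {joins} ORDER BY n.id"
    return conn.execute(sql).fetchall()

both = nodes_with_all_labels(["label_person", "label_student"])
print(both)
```

Each additional label adds one join on an indexed key, which is why a query with two or three labels should stay cheap.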
As implemented in Neo4j, the label part of a pattern can be transformed into a WHERE clause that treats the labels as filters. The image below illustrates this:

Here, the query has been transformed into a WHERE clause and joined using boolean logic. I advocate for the same solution, following these steps:
- Transform the label condition into a boolean expression built from the logical operators AND, OR, and NOT.
- When joining label tables, start with the smallest table and join toward the larger ones, so that intermediate results stay small.
- Join any other relevant tables to gather the remaining data.
- Apply the remaining filters to narrow the results down to the requested criteria.
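
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than AGE's implementation: SQLite stands in for PostgreSQL, label expressions are written as nested tuples, and `LABEL_TABLES`, `to_where`, `match`, and `smallest_first` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE label_person   (node_id INTEGER PRIMARY KEY);
    CREATE TABLE label_student  (node_id INTEGER PRIMARY KEY);
    CREATE TABLE label_employee (node_id INTEGER PRIMARY KEY);
    INSERT INTO nodes VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO label_person   VALUES (1), (2), (3);
    INSERT INTO label_student  VALUES (1), (3);
    INSERT INTO label_employee VALUES (3);
""")

LABEL_TABLES = {
    "Person": "label_person",
    "Student": "label_student",
    "Employee": "label_employee",
}

def to_where(expr):
    """Translate a label expression into a SQL boolean expression.

    A label name becomes a membership test on its label table;
    ("and", ...), ("or", ...) and ("not", x) combine sub-expressions.
    """
    if isinstance(expr, str):
        table = LABEL_TABLES[expr]
        return f"EXISTS (SELECT 1 FROM {table} AS l WHERE l.node_id = n.id)"
    op, *args = expr
    if op == "not":
        return f"NOT ({to_where(args[0])})"
    if op in ("and", "or"):
        return "(" + f" {op.upper()} ".join(to_where(a) for a in args) + ")"
    raise ValueError(f"unknown operator: {op}")

def match(expr):
    """Return the names of nodes whose labels satisfy the expression."""
    sql = f"SELECT n.name FROM nodes AS n WHERE {to_where(expr)} ORDER BY n.id"
    return [row[0] for row in conn.execute(sql)]

def smallest_first(labels):
    """Order labels by label-table row count, smallest table first."""
    def size(label):
        table = LABEL_TABLES[label]
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return sorted(labels, key=size)

unemployed_students = match(("and", "Person", "Student", ("not", "Employee")))
student_or_employee = match(("or", "Student", "Employee"))
join_order = smallest_first(["Person", "Student", "Employee"])
print(unemployed_students, student_or_employee, join_order)
```

Here the labels act purely as filters on the central `nodes` table, and `smallest_first` shows one simple way to pick the table to start from when the condition is a conjunction of labels.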