# 2.6.3 Database Project Analysis
Translator: Fei Teng; Reviewer: Ted Liu
### 6.3 Database Project Analysis
This section analyzes growth in the database field over the past five years using OpenRank, Activity, and other indicators, along with the concentration trend of the top 10 projects. It draws on the open source database information published by [Database of Databases](https://dbdb.io/) and [DB-Engines Ranking](https://db-engines.com/en/ranking). The field is divided into 18 categories by database structure and purpose, including Relational, Key-Value, Document, Wide Column, Search Engine, Time Series, Vector, Graph, Object Oriented, Hierarchical, RDF, Array, Event, Spatial, Columnar, Native XML, and Content. The analysis is based on collaboration log data collected from the corresponding open source projects on GitHub.
#### 6.3.1 Growth Trends in the Database Domain Over the Past Five Years and the Changing Trends in the Concentration of Top 10 Leading Projects

<center>
Figure 6.6 Trends in OpenRank Changes in the Database Domain from 2020 to 2024
</center>
<br>

<center>
Figure 6.7 Trends in Activity Changes in the Database Domain from 2020 to 2024
</center>
<br>

<center>
Figure 6.8 Trends in the Concentration of OpenRank for the Top 10 Projects in the Database Domain from 2020 to 2024
</center>
<br>

<center>Figure 6.9 Trends in the Concentration of Activity for the Top 10 Projects in the Database Domain from 2020 to 2024
</center>
<br>
**1. Analysis of Concentration Changes in Leading Projects in the Database Domain**
Over the past five years, the concentration of OpenRank and concentration of Activity for the Top 10 leading projects in the database domain have remained within the range of [29%, 35%]. However, in the most recent three years (2022-2024), there has been a decline of approximately 3 percentage points compared to 2020 and 2021, with a slight rebound observed in 2024. Specifically:
- The concentration of OpenRank reached its peak in 2021 at 33.9455%, and dropped to its lowest point in 2023 at 29.42372%.
- The concentration of Activity peaked at 34.29604% in 2020 and fell to its lowest point of 29.96794% in 2022.

The concentration of top database projects therefore moves consistently across the OpenRank and Activity metrics. Comparing the peak and trough years and the trends of the two metrics also shows that **OpenRank changes lag slightly behind Activity**, with a lag of roughly one month to one quarter. This lag reflects the temporal relationship between activity and influence in top database projects: changes in activity tend to occur first, and changes in influence follow gradually.
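
The concentration figures can be illustrated with a minimal sketch. It assumes that "concentration" means the share of the domain-wide total held by the top N projects, which the report does not define explicitly. The function name `top_n_concentration` and the sample values are hypothetical, not report data.

```python
def top_n_concentration(values, n=10):
    """Percent of the domain total held by the n largest values.

    Assumption: concentration = sum(top n) / sum(all) * 100.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("total must be positive")
    top = sorted(values, reverse=True)[:n]
    return 100.0 * sum(top) / total


# Hypothetical OpenRank values for 14 projects in one year (not report data).
openrank_by_project = [120.0, 95.0, 80.0, 60.0, 55.0, 40.0, 35.0,
                       30.0, 25.0, 20.0, 15.0, 10.0, 10.0, 5.0]

print(round(top_n_concentration(openrank_by_project, n=3), 2))  # 49.17
```

Applying the same function to each year's Activity values would give the second series, so the two concentration curves can be compared year by year.
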
**2. The Recovery of Concentration in 2024 and Future Trend Predictions**
In 2024, all concentration metrics for leading projects trended upward, and the month-on-month increase in Activity concentration was greater than that of OpenRank concentration. This suggests that the resurgence in activity among top database projects is likely to drive further accumulation of influence. Based on past trends, the OpenRank concentration may recover faster in 2025, and the influence of leading projects over the entire domain may strengthen as a result.
As the influence of top projects increases, an important challenge they face is **how to convert this influence into higher activity levels to further consolidate their position in the field**. This dynamic relationship is particularly crucial for top projects to maintain an advantage in the increasingly competitive database sector.
**3. Intensified Industry Competition and Resource Allocation Challenges**
Looking at the OpenRank and Activity trends over the past five years, although the indicators for top projects have rebounded in 2024, overall growth has slowed. This suggests that **competition for resources in the database sector is intensifying, and the pressure among leading projects is increasing**. In this context, how to leverage existing advantages and maintain a leading position will be a critical issue for the future development of top projects.
Overall, the changes in concentration among leading projects in the database domain reveal the temporal relationship between activity and the dissemination of influence, while also reflecting the intensification of competition within the field. In the future, leading projects will need to place greater emphasis on resource integration and the conversion of influence to address domain competition and further solidify their central position in the database technology ecosystem.
#### 6.3.2 Growth Trends in Various Subdomains of Databases Over the Past Five Years

<center>
Figure 6.10 Trends in OpenRank Changes Across Various Database Subdomains from 2020 to 2024
</center>
<br>

<center>
Figure 6.11 Trends in Activity Changes Across Various Database Subdomains from 2020 to 2024
</center>
<br>
* The development of database categories has remained relatively stable over the past five years, with relational databases dominating the field. Although their growth slowed in 2024, they remain clearly dominant.
* Key-value databases saw a decline in influence and activity in 2024, with document databases catching up and even surpassing them to some extent.
* Document databases have maintained steady growth over the years.

The top three database categories together account for roughly 70% of the total OpenRank and Activity in the database sector (about 69% on each metric in the 2024 figures of Tables 6.1 and 6.2).
As a sector that has existed since the birth of computing, databases have shown a stable development trend over the past five years. It is foreseeable that relational databases will continue to lead the industry, while various types of non-relational databases will serve as important branches in the long-term future.
#### 6.3.3 OpenRank Rankings and Activity Rankings with Proportions in Database Subdomains
<center>
Table 6.1 OpenRank Rankings in Database Subdomains
</center>
<br>

| Rank | Category | OpenRank | OpenRank Ratio (%) |
| :--: | :-------------: |-----------:|-----------------:|
| 1 | Relational | 55440.5 | 41.334 |
| 2 | Document | 18780.1 | 14.0016 |
| 3 | Key-value | 18262 | 13.6154 |
| 4 | Wide Column | 11285.4 | 8.41389 |
| 5 | Search Engine | 7575.18 | 5.64772 |
| 6 | Time Series | 7111.37 | 5.30192 |
| 7 | Vector | 5187.47 | 3.86755 |
| 8 | Graph | 4262.87 | 3.17821 |
| 9 | Object Oriented | 3532.3 | 2.63353 |
| 10 | Hierarchical | 1036.42 | 0.772709 |
| 11 | RDF | 430.36 | 0.320857 |
| 12 | Array | 319.34 | 0.238086 |
| 13 | Event | 281.65 | 0.209986 |
| 14 | Spatial | 239.08 | 0.178248 |
| 15 | Columnar | 228.52 | 0.170374 |
| 16 | Native XML | 132.76 | 0.09898 |
| 17 | Content | 22.77 | 0.0169763 |

<center>
Figure 6.12 Aggregate Proportions of OpenRank Across Subdomains in the Database Field
</center>
<br>
<center>
Table 6.2 Activity Rankings in Database Subdomains
</center>
<br>

| Rank | Category | Activity | Activity Ratio (%) |
| :--: | :-------------: |-----------:|-----------------:|
| 1 | Relational | 166707 | 40.4575 |
| 2 | Document | 58567.1 | 14.2134 |
| 3 | Key-value | 57491.4 | 13.9524 |
| 4 | Wide Column | 32835.4 | 7.96871 |
| 5 | Search Engine | 24881.8 | 6.03848 |
| 6 | Time Series | 22610.5 | 5.48727 |
| 7 | Vector | 17463.4 | 4.23814 |
| 8 | Graph | 13128 | 3.18599 |
| 9 | Object Oriented | 10190.1 | 2.47299 |
| 10 | Hierarchical | 3021.28 | 0.733224 |
| 11 | RDF | 1405.37 | 0.341064 |
| 12 | Array | 1009.34 | 0.244953 |
| 13 | Spatial | 812.11 | 0.197088 |
| 14 | Event | 735.62 | 0.178525 |
| 15 | Columnar | 568.63 | 0.137999 |
| 16 | Native XML | 549.4 | 0.133332 |
| 17 | Content | 77.83 | 0.0188883 |

The 2024 OpenRank and Activity rankings across database categories support the following observations:
- Relational, Document, and Key-value databases rank in the top three on both metrics. Together they account for roughly 69% of each metric's total in the database sector.
- Relational databases dominate, with metrics that exceed the combined totals of the second to fourth places and nearly match the combined totals of the second to fifth. They represent over 40% of each metric's total, making them a super-large category.
- Columnar, a newly listed database category, is showing rapid development momentum.
- Vector databases have also seen notable growth in 2024.
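
The shares and rank comparisons can be recomputed directly from the OpenRank column of Table 6.1. The sketch below assumes each ratio is the category's OpenRank divided by the sum over the 17 listed categories; the report does not state the denominator explicitly. The names `openrank`, `shares`, and the derived variables are illustrative.

```python
# OpenRank values copied from Table 6.1.
openrank = {
    "Relational": 55440.5, "Document": 18780.1, "Key-value": 18262.0,
    "Wide Column": 11285.4, "Search Engine": 7575.18, "Time Series": 7111.37,
    "Vector": 5187.47, "Graph": 4262.87, "Object Oriented": 3532.3,
    "Hierarchical": 1036.42, "RDF": 430.36, "Array": 319.34, "Event": 281.65,
    "Spatial": 239.08, "Columnar": 228.52, "Native XML": 132.76, "Content": 22.77,
}


def shares(metric):
    """Percent share of each category, assuming the total is the sum of listed rows."""
    total = sum(metric.values())
    return {name: 100.0 * value / total for name, value in metric.items()}


pct = shares(openrank)
ranked = sorted(openrank.items(), key=lambda kv: kv[1], reverse=True)

top3_share = sum(pct[name] for name, _ in ranked[:3])
second_to_fourth = sum(value for _, value in ranked[1:4])
second_to_fifth = sum(value for _, value in ranked[1:5])

print(f"Relational share: {pct['Relational']:.2f}%")
print(f"Top-three share:  {top3_share:.2f}%")
print(f"Relational vs. ranks 2-4: {openrank['Relational']:.1f} vs. {second_to_fourth:.1f}")
print(f"Relational vs. ranks 2-5: {openrank['Relational']:.1f} vs. {second_to_fifth:.1f}")
```

Swapping in the Activity column of Table 6.2 repeats the same check for the second metric.
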
#### 6.3.4 Open Source Quadrant Charts for Projects in Various Subdomains of the Database Field
The Open Source Quadrant Chart evaluates database categories based on three key metrics: Activity, OpenRank, and CommunityVolume. CommunityVolume uses the same formula as the Attention metric in the open-digger project: the weighted sum of stars and forks over a given time period, `sum(1*star+2*fork)`.
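
As a minimal sketch of the `sum(1*star+2*fork)` formula, the function below sums weights over a list of event kinds for one period. The event representation (plain strings such as `"star"` and `"fork"`) and the name `community_volume` are assumptions made for illustration, not the open-digger implementation.

```python
def community_volume(event_kinds):
    """Weighted sum of star and fork events: 1 per star, 2 per fork.

    Other event kinds contribute 0. The string labels are hypothetical.
    """
    weights = {"star": 1, "fork": 2}
    return sum(weights.get(kind, 0) for kind in event_kinds)


# Hypothetical events for one project in one period: 3 stars, 2 forks, 1 issue.
events = ["star", "star", "fork", "issue", "star", "fork"]
print(community_volume(events))  # 7
```
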
Methodology for Quadrant Chart Construction:
1. Select the top 10 projects from each database subfield based on Activity.
2. Plot `log(openrank)` on the x-axis against `log(communityvolume)` on the y-axis as a log-log scatter plot, using base-2 logarithms. Each coordinate then represents the number of half-lives required for the spatial influence (openrank) or the temporal influence (communityvolume) to decay to 1.
3. Divide the plot into four quadrants with a vertical line at the mean x-coordinate of all points and a horizontal line at the mean y-coordinate of all points.
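
The construction steps above can be sketched as follows. This is an illustration under stated assumptions: quadrants are numbered in the usual counterclockwise order starting from the top right, points lying exactly on a dividing line are assigned to the higher side, and the project names and values are hypothetical.

```python
import math


def assign_quadrants(projects):
    """Map each project to a quadrant (1-4) on a base-2 log-log plot.

    projects: dict of name -> (openrank, community_volume), both > 0.
    The dividing lines are the means of the log2 coordinates.
    """
    points = {
        name: (math.log2(openrank), math.log2(volume))
        for name, (openrank, volume) in projects.items()
    }
    x_mean = sum(x for x, _ in points.values()) / len(points)
    y_mean = sum(y for _, y in points.values()) / len(points)

    quadrants = {}
    for name, (x, y) in points.items():
        if x >= x_mean and y >= y_mean:
            quadrants[name] = 1  # high OpenRank, high CommunityVolume
        elif x < x_mean and y >= y_mean:
            quadrants[name] = 2  # low OpenRank, high CommunityVolume
        elif x < x_mean and y < y_mean:
            quadrants[name] = 3  # low on both
        else:
            quadrants[name] = 4  # high OpenRank, low CommunityVolume
    return quadrants


# Hypothetical (openrank, community_volume) pairs.
sample = {"a": (1024, 4096), "b": (16, 4096), "c": (16, 64), "d": (1024, 64)}
print(assign_quadrants(sample))  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Because the dividing lines are means of the log values rather than of the raw values, a few very large projects shift the quadrant boundaries less than they would on a linear scale.
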

There are 18 database categories in total. For this analysis, we selected the 9 categories whose activity proportion exceeded 1% in 2023: Relational, Key-value, Document, Wide Column, Search Engine, Time Series, Vector, Graph, and Object Oriented. The Open Source Quadrant Chart for these categories is shown below:

<!-- <iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/OpenRank-CommunityVolume%20log-log%20quadrant%20diagram.html" width="100%" height="702px" frameborder="0"></iframe> -->
<center>Figure 6.12 Quadrant Chart of Activity Top 10 Projects in Database Categories </center>
<br>
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/relational.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.13 Quadrant Chart of Activity Top 10 Relational Databases</center>
<br>
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/key_value.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.14 Quadrant Chart of Activity Top 10 Key-value Databases</center><br />
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/document.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.15 Quadrant Chart of Activity Top 10 Document Databases</center><br />
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/wide_column.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.16 Quadrant Chart of Activity Top 10 Wide Column Databases</center>
<br />
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/search_engine.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.17 Quadrant Chart of Activity Top 10 Search Engine Databases</center><br />
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/time_series.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.18 Quadrant Chart of Activity Top 10 Time Series Databases</center><br />
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/vector.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.19 Quadrant Chart of Activity Top 10 Vector Databases</center><br />
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/graph.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.20 Quadrant Chart of Activity Top 10 Graph Databases</center><br />
<!--  -->
<iframe src="https://birdflyi.github.io/open-digger/notebook/database_analysis/object_oriented.html" width="100%" height="702px" frameborder="0"></iframe>
<center>Figure 6.21 Quadrant Chart of Activity Top 10 Object Oriented Databases</center>
The Search Engine category exhibits significant polarization, with projects like ElasticSearch having both high OpenRank and CommunityVolume, while others like Lucene-Solr and Xapian have relatively low values in both metrics.
Insights from the First Quadrant: Relational, Document, Search Engine, Vector, and Wide Column databases exhibit strong OpenRank influence as well as high CommunityVolume engagement. In contrast, Object-Oriented and Graph databases show weaker performance in both aspects.
The vertical distribution in the quadrant chart of the top 9 subcategories by activity shows that subcategories such as key_value and search_engine, represented by projects like valkey and meilisearch, have higher CommunityVolume relative to their OpenRank. This indicates a stronger community presence and faster expected growth than in other subcategories. In the vector subcategory, the log-log values of CommunityVolume and OpenRank for the top 10 projects are strongly linearly correlated, suggesting a balanced relationship between community presence and collaborative influence.
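
One way to quantify a "strong linear correlation" on the log-log plot is the Pearson coefficient of the base-2 logarithms. The report does not say which measure was used, so the sketch below is an assumption, and the sample values are hypothetical rather than the actual vector database data.

```python
import math


def log_log_pearson(xs, ys):
    """Pearson correlation between log2(xs) and log2(ys); all inputs must be > 0."""
    lx = [math.log2(x) for x in xs]
    ly = [math.log2(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var_x = sum((a - mean_x) ** 2 for a in lx)
    var_y = sum((b - mean_y) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values following an exact power law: volume = 4 * openrank**2.
openrank_values = [2, 4, 8, 16, 32]
volume_values = [16, 64, 256, 1024, 4096]
print(log_log_pearson(openrank_values, volume_values))  # 1.0
```

A coefficient near 1 means the points fall close to a straight line on the log-log plot, which corresponds to a power-law relationship between the raw values.
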
#### 6.3.5 Analysis of Working Hours of Open Source Database Projects

<center>
Figure 6.22 Working Hours Distribution of Open Source Database Projects
</center>
<br />
The chart shows that peak working hours for open-source database projects are concentrated between 2:00 and 10:00 UTC, Monday to Friday, while active hours span 1:00 to 18:00 UTC, Monday to Friday. This weekday pattern may reflect the fact that most database-related projects have corporate backing. Within a day, the busy period begins at 2:00 UTC, peaks at 6:00 UTC, and continues until 10:00 UTC. Activity drops significantly at 11:00 UTC, and by 18:00 UTC the projects are no longer active. The two peak periods, 2:00 to 6:00 UTC and 6:00 to 10:00 UTC, correspond to working hours in Asia and Europe, respectively (assuming a typical work start of 9:00 local time, which maps to UTC+7 through UTC+3 and UTC+3 through UTC-1). As the overlap in working hours shrinks afterward, the peak quickly diminishes. This analysis highlights the critical role of collaboration between Asia and Europe in the open-source database domain.
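
The working-hours reasoning can be sketched in two small helpers: one buckets event timestamps by UTC weekday and hour, and the other derives the UTC offset at which a given UTC hour corresponds to a 9:00 local start. The function names and sample timestamps are hypothetical, and the report's own aggregation pipeline may differ.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_weekday_histogram(timestamps):
    """Count events per (weekday, hour) in UTC; Monday is weekday 0.

    Timestamps are expected to be timezone-aware datetimes.
    """
    counts = Counter()
    for ts in timestamps:
        utc = ts.astimezone(timezone.utc)
        counts[(utc.weekday(), utc.hour)] += 1
    return counts


def offset_for_local_start(utc_hour, local_start_hour=9):
    """UTC offset (in hours) at which utc_hour equals local_start_hour."""
    return local_start_hour - utc_hour


# Hypothetical events; 2024-03-04 is a Monday.
sample_events = [
    datetime(2024, 3, 4, 2, 15, tzinfo=timezone.utc),
    datetime(2024, 3, 4, 2, 40, tzinfo=timezone.utc),
    datetime(2024, 3, 4, 6, 5, tzinfo=timezone.utc),
]
histogram = hour_weekday_histogram(sample_events)
print(histogram[(0, 2)], histogram[(0, 6)])               # 2 1
print([offset_for_local_start(h) for h in (2, 6, 10)])    # [7, 3, -1]
```

The offsets reproduce the mapping stated above: a 9:00 local start falls at 2:00 UTC in UTC+7, at 6:00 UTC in UTC+3, and at 10:00 UTC in UTC-1.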