5.6.3 Database Project Analysis

Translator: Fei Teng; Reviewer: Ted Liu

6.3 Database Project Analysis

This section analyzes growth trends in the database field over the past five years using OpenRank, Activity, and other indicators, together with the concentration trend of the top 10 projects. It also draws on the open source database information published by Database of Databases and the DB-Engines Ranking. The field is divided into 18 categories by database structure and purpose, including Relational, Key-Value, Document, Wide Column, Search Engine, Time Series, Vector, Graph, Object Oriented, Hierarchical, RDF, Array, Event, Spatial, Columnar, Native XML, and Content. The analysis is based on collaboration log data collected from the corresponding open source projects on GitHub.

Figure 6.6 Trends in OpenRank Changes in the Database Domain from 2020 to 2024

Figure 6.7 Trends in Activity Changes in the Database Domain from 2020 to 2024

Figure 6.8 Trends in the Concentration of OpenRank for the Top 10 Projects in the Database Domain from 2020 to 2024

Figure 6.9 Trends in the Concentration of Activity for the Top 10 Projects in the Database Domain from 2020 to 2024

1. Analysis of Concentration Changes in Leading Projects in the Database Domain

Over the past five years, the OpenRank concentration and Activity concentration of the top 10 projects in the database domain have both stayed within the range [29%, 35%]. In the most recent three years (2022-2024), however, both were approximately 3 percentage points lower than in 2020 and 2021, with a slight rebound in 2024. Specifically:

  • The concentration of OpenRank reached its peak in 2021 at 33.9455%, and dropped to its lowest point in 2023 at 29.42372%.
  • The concentration of Activity peaked at 34.29604% in 2020 and fell to its lowest point of 29.96794% in 2022.

This indicates that the concentration of top database projects moves consistently across the OpenRank and Activity metrics. Comparing the peak and trough years and the trends of the two metrics also shows that changes in OpenRank lag slightly behind changes in Activity, with the lag roughly on a monthly to quarterly scale. This lag reflects the temporal logic between activity and influence among top database projects: changes in activity may occur earlier, while changes in influence gradually follow.
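The concentration figures above can be read as the share of a metric held by the ten largest projects. The sketch below illustrates that reading; the exact definition used in the report is not stated in this section, so the formula (top 10 sum divided by the total) is an assumption, and the function name and sample values are hypothetical.

```python
def top_k_concentration(values, k=10):
    """Percentage of the total held by the k largest values (assumed definition)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("total must be positive")
    top = sorted(values, reverse=True)[:k]
    return 100.0 * sum(top) / total


# Hypothetical per-project OpenRank values, not taken from the report:
# ten large projects followed by a long tail of small ones.
example_openrank = [120.0, 95.0, 80.0, 60.0, 55.0,
                    40.0, 30.0, 25.0, 20.0, 15.0] + [10.0] * 100

print(f"Top 10 concentration: {top_k_concentration(example_openrank):.2f}%")
```

Applying the same function to yearly OpenRank and Activity values would give two series whose peak and trough years can then be compared.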

2. The Recovery of Concentration in 2024 and Future Trend Predictions

In 2024, all concentration metrics for leading projects showed an upward trend, and the month-on-month increase in Activity concentration was greater than that of OpenRank concentration. This phenomenon indicates that the resurgence in activity among top database projects will further drive the accumulation of influence. Based on past trends, it can be predicted that the OpenRank concentration in 2025 may accelerate its recovery, and the influence of leading projects over the entire domain will also significantly strengthen as a result.

As the influence of top projects increases, an important challenge they face is how to convert this influence into higher activity levels to further consolidate their position in the field. This dynamic relationship is particularly crucial for top projects to maintain an advantage in the increasingly competitive database sector.

3. Intensified Industry Competition and Resource Allocation Challenges

Looking at the OpenRank and Activity trends over the past five years, although the indicators for top projects have rebounded in 2024, overall growth has slowed. This suggests that competition for resources in the database sector is intensifying, and the pressure among leading projects is increasing. In this context, how to leverage existing advantages and maintain a leading position will be a critical issue for the future development of top projects.

Overall, the changes in concentration among leading projects in the database domain reveal the temporal relationship between activity and the dissemination of influence, while also reflecting the intensification of competition within the field. In the future, leading projects will need to place greater emphasis on resource integration and the conversion of influence to address domain competition and further solidify their central position in the database technology ecosystem.

Figure 6.10 Trends in OpenRank Changes Across Various Database Subdomains from 2020 to 2024

Figure 6.11 Trends in Activity Changes Across Various Database Subdomains from 2020 to 2024

  • The development of database categories has remained relatively stable over the past five years, with relational databases dominating the field. Although their growth slowed in 2024, they remain clearly dominant.
  • Key-value databases saw a decline in influence and activity in 2024, with document databases catching up and even surpassing them to some extent.
  • Document databases have maintained steady growth over the years.

The top three database categories together account for over 70% of the total OpenRank and activity indicators in the database sector.

As a sector that has existed since the birth of computing, databases have shown a stable development trend over the past five years. It is foreseeable that relational databases will continue to lead the industry, while various types of non-relational databases will serve as important branches in the long-term future.

6.3.3 OpenRank Rankings and Activity Rankings with Proportions in Database Subdomains

Table 6.1 OpenRank Rankings in Database Subdomains

| Rank | Category | OpenRank | OpenRank ratio (%) |
|---|---|---|---|
| 1 | Relational | 55440.5 | 41.334 |
| 2 | Document | 18780.1 | 14.0016 |
| 3 | Key-value | 18262 | 13.6154 |
| 4 | Wide Column | 11285.4 | 8.41389 |
| 5 | Search Engine | 7575.18 | 5.64772 |
| 6 | Time Series | 7111.37 | 5.30192 |
| 7 | Vector | 5187.47 | 3.86755 |
| 8 | Graph | 4262.87 | 3.17821 |
| 9 | Object Oriented | 3532.3 | 2.63353 |
| 10 | Hierarchical | 1036.42 | 0.772709 |
| 11 | RDF | 430.36 | 0.320857 |
| 12 | Array | 319.34 | 0.238086 |
| 13 | Event | 281.65 | 0.209986 |
| 14 | Spatial | 239.08 | 0.178248 |
| 15 | Columnar | 228.52 | 0.170374 |
| 16 | Native XML | 132.76 | 0.09898 |
| 17 | Content | 22.77 | 0.0169763 |

Figure 6.12 Aggregate Proportions of OpenRank Across Subdomains in the Database Field

Table 6.2 Activity Rankings in Database Subdomains

| Rank | Category | Activity | Activity ratio (%) |
|---|---|---|---|
| 1 | Relational | 166707 | 40.4575 |
| 2 | Document | 58567.1 | 14.2134 |
| 3 | Key-value | 57491.4 | 13.9524 |
| 4 | Wide Column | 32835.4 | 7.96871 |
| 5 | Search Engine | 24881.8 | 6.03848 |
| 6 | Time Series | 22610.5 | 5.48727 |
| 7 | Vector | 17463.4 | 4.23814 |
| 8 | Graph | 13128 | 3.18599 |
| 9 | Object Oriented | 10190.1 | 2.47299 |
| 10 | Hierarchical | 3021.28 | 0.733224 |
| 11 | RDF | 1405.37 | 0.341064 |
| 12 | Array | 1009.34 | 0.244953 |
| 13 | Spatial | 812.11 | 0.197088 |
| 14 | Event | 735.62 | 0.178525 |
| 15 | Columnar | 568.63 | 0.137999 |
| 16 | Native XML | 549.4 | 0.133332 |
| 17 | Content | 77.83 | 0.0188883 |

From the 2024 OpenRank and activity rankings across various categories in the database sector, the following observations can be made:

  • Relational, Document, and Key-value databases occupy the top three places in both metrics. Together they account for roughly 69% of each metric in the database sector (68.95% of OpenRank and 68.62% of Activity, per Tables 6.1 and 6.2).
  • Relational databases dominate, with values exceeding the combined totals of the second to fourth places in both metrics. They represent over 40% of each metric in the database sector, making them by far the largest category.
  • Columnar, as a newly listed database category, is experiencing rapid development momentum.
  • Vector databases have also seen notable growth in 2024.
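The shares quoted for the category rankings can be rechecked directly from the tabulated values. The sketch below recomputes them for OpenRank; the figures are copied from Table 6.1, while the variable names are illustrative only.

```python
# OpenRank values copied from Table 6.1, in rank order.
openrank = {
    "Relational": 55440.5, "Document": 18780.1, "Key-value": 18262.0,
    "Wide Column": 11285.4, "Search Engine": 7575.18, "Time Series": 7111.37,
    "Vector": 5187.47, "Graph": 4262.87, "Object Oriented": 3532.3,
    "Hierarchical": 1036.42, "RDF": 430.36, "Array": 319.34,
    "Event": 281.65, "Spatial": 239.08, "Columnar": 228.52,
    "Native XML": 132.76, "Content": 22.77,
}

total = sum(openrank.values())

# Percentage share of each category in the sector total.
shares = {name: 100.0 * value / total for name, value in openrank.items()}

# Values sorted from largest to smallest, for rank-based comparisons.
ranked = sorted(openrank.values(), reverse=True)
top3_share = 100.0 * sum(ranked[:3]) / total

print(f"Relational share: {shares['Relational']:.2f}%")
print(f"Top three share:  {top3_share:.2f}%")
print(f"Rank 1 vs ranks 2-4 combined: {ranked[0]:.1f} vs {sum(ranked[1:4]):.1f}")
print(f"Rank 1 vs ranks 2-5 combined: {ranked[0]:.1f} vs {sum(ranked[1:5]):.1f}")
```

Replacing the dictionary with the Activity column of Table 6.2 gives the corresponding Activity shares.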

6.3.4 Open Source Quadrant Charts for Projects in Various Subdomains of the Database Field

The Open Source Quadrant Chart evaluates database categories on three key metrics: Activity, OpenRank, and CommunityVolume. CommunityVolume follows the same formula as the Attention metric in the open-digger project: the weighted sum of stars and forks over a given time period, sum(1*star+2*fork).
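As a minimal illustration of the weighted sum sum(1*star+2*fork), the sketch below totals CommunityVolume over a period. The function name, the monthly granularity, and the sample counts are assumptions for illustration, not part of open-digger.

```python
def community_volume(stars: int, forks: int) -> int:
    """Weighted sum from the text: 1 point per star, 2 points per fork."""
    return 1 * stars + 2 * forks


# Hypothetical (new stars, new forks) counts per month for one project.
monthly_counts = [(120, 15), (90, 10), (200, 30)]

# CommunityVolume for the whole period is the sum over its months.
period_volume = sum(community_volume(s, f) for s, f in monthly_counts)
print(period_volume)
```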

Methodology for Quadrant Chart Construction:

  1. Select the top 10 projects from each database subfield based on Activity.

  2. Draw a log-log scatter plot with log(openrank) on the x-axis and log(communityvolume) on the y-axis, using base-2 logarithms. Each coordinate then represents the number of half-lives required for the spatial influence (openrank) or temporal influence (communityvolume) to decay to 1.

  3. Divide the plot into four quadrants with a vertical line at the mean x-coordinate of all points and a horizontal line at the mean y-coordinate of all points.
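The three steps above can be sketched as follows. This is a simplified illustration, not the report's actual tooling: the record fields, the sample projects, the standard counterclockwise quadrant numbering (1 = high in both metrics), and the handling of points that fall exactly on a dividing line are all assumptions.

```python
import math
from statistics import mean


def quadrant_chart(projects, top_n=10):
    """Assign each of the top_n projects (by activity) to a quadrant."""
    # Step 1: keep the top_n projects ranked by Activity.
    selected = sorted(projects, key=lambda p: p["activity"], reverse=True)[:top_n]

    # Step 2: base-2 log transform of OpenRank (x) and CommunityVolume (y).
    points = {
        p["name"]: (math.log2(p["openrank"]), math.log2(p["community_volume"]))
        for p in selected
    }

    # Step 3: split the plane at the mean x and mean y of all points.
    x_mean = mean(x for x, _ in points.values())
    y_mean = mean(y for _, y in points.values())

    quadrants = {}
    for name, (x, y) in points.items():
        if x >= x_mean and y >= y_mean:
            quadrants[name] = 1  # high OpenRank, high CommunityVolume
        elif x < x_mean and y >= y_mean:
            quadrants[name] = 2  # low OpenRank, high CommunityVolume
        elif x < x_mean:
            quadrants[name] = 3  # low in both
        else:
            quadrants[name] = 4  # high OpenRank, low CommunityVolume
    return quadrants, (x_mean, y_mean)


# Hypothetical projects; values must be positive for the logarithm.
sample = [
    {"name": "a", "activity": 50, "openrank": 1024, "community_volume": 4096},
    {"name": "b", "activity": 40, "openrank": 16, "community_volume": 64},
    {"name": "c", "activity": 30, "openrank": 256, "community_volume": 16},
    {"name": "d", "activity": 20, "openrank": 4, "community_volume": 1024},
    {"name": "e", "activity": 1, "openrank": 2, "community_volume": 2},
]

sample_quadrants, sample_means = quadrant_chart(sample, top_n=4)
print(sample_quadrants, sample_means)
```

Because the dividing lines are the means of the selected points, a project's quadrant is relative to its peers in the same chart rather than an absolute rating.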

There are 18 database categories in total. For the analysis, we selected 9 categories with an activity proportion greater than 1% in 2023: Relational, Key-value, Document, Wide Column, Search Engine, Time Series, Vector, Graph, and Object Oriented. The Open Source Quadrant Chart based on these categories is shown below:

Figure 6.12 Quadrant Chart of Activity Top 10 Projects in Database Categories

Figure 6.13 Quadrant Chart of Activity Top 10 Relational Databases

Figure 6.14 Quadrant Chart of Activity Top 10 Key-value Databases

Figure 6.15 Quadrant Chart of Activity Top 10 Document Databases

Figure 6.16 Quadrant Chart of Activity Top 10 Wide Column Databases

Figure 6.17 Quadrant Chart of Activity Top 10 Search Engine Databases

Figure 6.18 Quadrant Chart of Activity Top 10 Time Series Databases

Figure 6.19 Quadrant Chart of Activity Top 10 Vector Databases

Figure 6.20 Quadrant Chart of Activity Top 10 Graph Databases

Figure 6.21 Quadrant Chart of Activity Top 10 Object Oriented Databases

The Search Engine category exhibits significant polarization, with projects like ElasticSearch having both high OpenRank and CommunityVolume, while others like Lucene-Solr and Xapian have relatively low values in both metrics.

Insights from the First Quadrant: Relational, Document, Search Engine, Vector, and Wide Column databases exhibit strong OpenRank influence as well as high CommunityVolume engagement. In contrast, Object-Oriented and Graph databases show weaker performance in both aspects.

From the vertical distribution in the open-source quadrant chart of the top 9 subcategories by activity, it can be observed that subcategories such as key_value and search_engine, represented by projects like valkey and meilisearch, exhibit higher CommunityVolume relative to their OpenRank, indicating a stronger community presence and faster growth expectations compared to other subcategories. The vector subcategory shows a strong linear correlation between the log-log values of CommunityVolume and OpenRank for its top 10 projects, suggesting a balanced relationship between community presence and collaborative influence.

6.3.5 Analysis of Working Hours of Open Source Database Projects

Figure 6.22 Working Hours Distribution of Open Source Database Projects

The chart shows that peak working hours for open-source database projects are concentrated between 2:00 and 10:00 UTC from Monday to Friday, while active hours span 1:00 to 18:00 UTC on the same days. This weekday pattern may be related to the fact that most database-related projects have corporate backing. Within a day, activity rises from about 2:00 UTC, peaks at 6:00 UTC, and stays high until 10:00 UTC. It decreases significantly at 11:00 UTC, and by 18:00 UTC the projects are no longer active.

The two peak periods, 2:00 to 6:00 UTC and 6:00 to 10:00 UTC, correspond to working hours in Asia and Europe, respectively: assuming a typical work start time of 9:00 local time, they align with UTC+7 to UTC+3 and UTC+3 to UTC-1. As the overlap in working hours shrinks afterward, the peak quickly diminishes. This highlights the critical role of collaboration between Asia and Europe in the open-source database domain.
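The time zone ranges follow from simple arithmetic on the assumed 9:00 local start time: the UTC offset is the local start hour minus the UTC hour. The short sketch below reproduces the three boundary values; the function name is illustrative, and it ignores half-hour offsets and daylight saving time.

```python
def utc_offset_for_start(utc_hour: int, local_start_hour: int = 9) -> int:
    """UTC offset (hours) at which utc_hour UTC equals local_start_hour local time."""
    return local_start_hour - utc_hour


# Boundaries of the two peak periods described in the text.
for utc_hour in (2, 6, 10):
    offset = utc_offset_for_start(utc_hour)
    print(f"{utc_hour:02d}:00 UTC is 09:00 local time at UTC{offset:+d}")
```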
