## 5. Technological insights

The technology field is evolving rapidly across its subfields. **Operating systems** are being developed for new architectures, **cloud native** technologies are driving digital transformation, **databases** are becoming the infrastructure for data innovation, **big data** is facilitating intelligent decision-making, **artificial intelligence** is accelerating automation in various industries, and **front-end** technologies are focusing on interaction and aesthetics. These areas are at the forefront of technology, attracting innovators and investors and creating a booming trend. In this section, we will provide insights into these six areas in terms of two metrics: influence and activity.

### 5.1 Overall development trend of six major technology areas in the past five years

![5-1](/image/data/chapter_5/5-1.png)

<center> Figure 5.1 Trends in OpenRank by subfield over the last 5 years </center>

![5-2](/image/data/chapter_5/5-2.png)

<center> Figure 5.2 Trends in activity by subfield over the past five years </center>

<br>

Cloud-native computing and artificial intelligence (AI) have gained popularity in the past five years, reflected in their increased number of repositories. Databases remain critical, while the influence of front-end development is shrinking. Operating systems have a smaller number of repositories but hold great value.

### 5.2 5-Year Trends in OpenRank and Activity for the Top 10 Projects in Each Technology Area

#### 5.2.1 Cloud Native

![5-3](/image/data/chapter_5/5-3.png)

<center> Figure 5.3 Trends in the Cloud-Native Top 10 OpenRank Projects over the Last Five Years </center>

![5-4](/image/data/chapter_5/5-4.png)

<center> Figure 5.4 Cloud-Native Top 10 Active Project Trends in the Last Five Years </center>

<br>

Both indicators of Kubernetes have significantly decreased, while Grafana has emerged as the top influencer.
The llvm-project has shown remarkable growth and has become the most active project in the past three years. LLVM is a compiler framework that comprises a collection of modular and reusable compiler and toolchain technologies. Its rapid growth in popularity among developers is a testament to its effectiveness.

#### 5.2.2 Artificial intelligence

![5-5](/image/data/chapter_5/5-5.png)

<center> Figure 5.5 Trends in the AI Top 10 OpenRank Projects over the Last Five Years </center>

![5-6](/image/data/chapter_5/5-6.png)

<center> Figure 5.6 Artificial Intelligence Top 10 Active Project Trends in the Last Five Years </center>

<br>

TensorFlow has been declining and is out of the top 5, while PyTorch is growing and widening the gap. LangChain, an open-source project started by Harrison Chase, has held second place in both indicators since its launch in October 2022 and is now one of the most popular frameworks for LLM development.

#### 5.2.3 Big Data

![5-7](/image/data/chapter_5/5-7.png)

<center> Figure 5.7 Trends in the Big Data Top 10 OpenRank Projects in the Last Five Years </center>

![5-8](/image/data/chapter_5/5-8.png)

<center> Figure 5.8 Big Data Top 10 Active Projects Trends in the Last 5 Years </center>

<br>

Kibana and Grafana are the top two big data solutions, with a consistent upward trend. Grafana is predicted to surpass Kibana and become the top-ranked solution in the future.

Kibana is an open-source tool for data visualization and exploration, tightly integrated with Elasticsearch.

Grafana is an open-source tool for monitoring and reporting. It can visualize data from various sources, including Prometheus, InfluxDB, and Graphite, among others. Grafana's data processing and visualization features enable the creation of different charts and dashboards.
#### 5.2.4 Database

![5-9](/image/data/chapter_5/5-9.png)

<center> Figure 5.9 Trends in the Database Top 10 OpenRank Projects over the Last Five Years </center>

![5-10](/image/data/chapter_5/5-10.png)

<center> Figure 5.10 Database Top 10 Active Project Trends in the Last Five Years </center>

<br>

Doris is the fastest-growing database, with activity metrics nearing the top spot, while Elasticsearch is dropping back in popularity. It is predicted that Doris will surpass ClickHouse in the future.

ClickHouse is an open-source analytical database with an MPP architecture, originally developed by Yandex. It analyzes large amounts of data and is claimed to be 100-1000x faster than traditional databases. Its key feature is a high-performance vectorized execution engine, and it is also known for rich functionality and reliability.

Apache Doris is an open-source MPP analytical database contributed by Baidu. Its distributed architecture is simple and easy to operate and maintain.

#### 5.2.5 Frontend

![5-11](/image/data/chapter_5/5-11.png)

<center> Figure 5.11 Trends in the Frontend Top 10 OpenRank Projects over the Last Five Years </center>

![5-12](/image/data/chapter_5/5-12.png)

<center> Figure 5.12 Frontend Top 10 Active Project Trends in the Last Five Years </center>

<br>

While declining in both indicators year over year, Flutter still has a clear advantage over Next.js, which started to gain momentum in 2023 and is rising significantly. The projects ranked 3rd to 10th are closely matched, with little gap between them.

Flutter is a framework developed and supported by Google. Front-end and full-stack developers use Flutter to build the user interface of applications for multiple platforms with a single code base.

Next.js is an open-source framework created by Vercel, built with Node.js and the Babel transpiler and designed for use with the React single-page application framework. In addition, Next.js provides many useful features, such as preview mode, fast compilation during development, and static export.
#### 5.2.6 Operating system

![5-13](/image/data/chapter_5/5-13.png)

<center> Figure 5.13 Trends in the Operating System Top 10 OpenRank Projects over the Last Five Years </center>

<br>

![5-14](/image/data/chapter_5/5-14.png)

<center> Figure 5.14 Operating System Top 10 Active Project Trends in the Last Five Years </center>

<br>

Several repositories under the OpenHarmony project are in the top 10 list. This insight combines data from the Gitee platform, which makes the strengths of domestic operating systems easier to see. Because OpenHarmony consists of several repositories, the analysis is carried out at the repository level. SerenityOS has fallen back a bit since 2021 and now trails OpenHarmony, while openEuler also performs well.

### 5.3 OpenRank Top 10 list for each field in 2023

Below are the OpenRank rankings for projects in each field for 2023.

#### 5.3.1 Cloud Native

Table 5.1 Top Projects in Cloud Native

| Number | Project Name | OpenRank |
| :----: | :--------------------: | :------: |
| 1 | grafana/grafana | 7134.37 |
| 2 | llvm/llvm-project | 7049.62 |
| 3 | kubernetes/kubernetes | 5374.14 |
| 4 | ClickHouse/ClickHouse | 4941.99 |
| 5 | cilium/cilium | 3215.42 |
| 6 | ceph/ceph | 3172.49 |
| 7 | keycloak/keycloak | 3095.56 |
| 8 | gravitational/teleport | 3082.18 |
| 9 | envoyproxy/envoy | 2929.08 |
| 10 | backstage/backstage | 2903.39 |

#### 5.3.2 Artificial Intelligence

Table 5.2 Top Projects in Artificial Intelligence

| Number | Project Name | OpenRank |
| :----: | :----------------------------------: | :------: |
| 1 | pytorch/pytorch | 10182.45 |
| 2 | langchain-ai/langchain | 6080.25 |
| 3 | PaddlePaddle/Paddle | 5408.62 |
| 4 | huggingface/transformers | 4422.84 |
| 5 | AUTOMATIC1111/stable-diffusion-webui | 3881.6 |
| 6 | openvinotoolkit/openvino | 3857.31 |
| 7 | microsoft/onnxruntime | 3006.75 |
| 8 | tensorflow/tensorflow | 2723.26 |
| 9 | Significant-Gravitas/AutoGPT | 2664.85 |
| 10 | ggerganov/llama.cpp | 2339.8 |

#### 5.3.3 Big Data

Table 5.3 Top Projects in Big Data

| Number | Project Name | OpenRank |
| :----: | :-------------------: | :------: |
| 1 | elastic/kibana | 7601.04 |
| 2 | grafana/grafana | 7134.37 |
| 3 | ClickHouse/ClickHouse | 4941.99 |
| 4 | airbytehq/airbyte | 4658.86 |
| 5 | apache/doris | 4307.26 |
| 6 | elastic/elasticsearch | 3729.39 |
| 7 | apache/airflow | 3642.9 |
| 8 | StarRocks/starrocks | 3194.56 |
| 9 | trinodb/trino | 2703.4 |
| 10 | apache/spark | 2654.02 |

#### 5.3.4 Database

Table 5.4 Top Projects in Database

| Number | Project Name | OpenRank |
| :----: | :-------------------: | :------: |
| 1 | ClickHouse/ClickHouse | 4941.99 |
| 2 | apache/doris | 4307.26 |
| 3 | elastic/elasticsearch | 3729.39 |
| 4 | cockroachdb/cockroach | 3443.7 |
| 5 | StarRocks/starrocks | 3194.56 |
| 6 | trinodb/trino | 2703.4 |
| 7 | apache/spark | 2654.02 |
| 8 | pingcap/tidb | 2200.38 |
| 9 | milvus-io/milvus | 2001.11 |
| 10 | yugabyte/yugabyte-db | 1940.75 |

#### 5.3.5 Frontend

Table 5.5 Top Projects in Frontend

| Number | Project Name | OpenRank |
| :----: | :-------------------: | :------: |
| 1 | flutter/flutter | 9361.81 |
| 2 | vercel/next.js | 6638.65 |
| 3 | appsmithorg/appsmith | 3474.07 |
| 4 | nuxt/nuxt | 3387.23 |
| 5 | facebook/react-native | 3260.55 |
| 6 | ant-design/ant-design | 3053.25 |
| 7 | nodejs/node | 2736.37 |
| 8 | angular/angular | 2273.82 |
| 9 | electron/electron | 1773.31 |
| 10 | denoland/deno | 1654.01 |

#### 5.3.6 Operating system

Table 5.6 Top Projects in Operating System

| Number | Project Name | OpenRank |
| :----: | :---------------------------------: | :------: |
| 1 | openharmony/docs | 3277.69 |
| 2 | openharmony/arkui_ace_engine | 2818.09 |
| 3 | SerenityOS/serenity | 2257.68 |
| 4 | openharmony/graphic_graphic_2d | 1239.6 |
| 5 | openeuler/docs | 1206.9 |
| 6 | openharmony/xts_acts | 1186.06 |
| 7 | openharmony/arkcompiler_ets_runtime | 961.99 |
| 8 | openharmony/interface_sdk-js | 910.91 |
| 9 | reactos/reactos | 745.23 |
| 10 | armbian/build | 679.1 |

## 6. Insights on open source projects

In 2023, large AI models such as GPT-4 and CLIP rose to prominence, leading to competition among global enterprises to invest in research and development for cutting-edge technologies like language understanding and image generation. The industry saw rapid evolution, marking the beginning of a new era in the broad application of AI. The database field experienced a trend of innovation, with technologies such as distributed databases, time-series databases, and graph databases emerging to cater to different application scenarios. Cloud-native databases became popular, offering flexible scaling and high availability.

This section provides data insights on project types by statistically analyzing project topics. In-depth insights are also provided into the two core areas of database and AI.

### 6.1 Type of project

This subsection selects the top 10,000 active GitHub repositories for statistical analysis.

#### 6.1.1 Ratios for different project types

<img width="600" alt="6-1" src="/image/data/chapter_6/6-1.png">

<center> Figure 6.1 Ratios for different project types </center>

<br>

- Components and frameworks (Libraries and Frameworks) make up the largest share of projects, at 31.36%. Developers readily collaborate on this kind of open-source work, which is the most popular type to contribute to;
- The Application Software category ranks second, at 24.34%, behind the component and framework category. Its utility lets all users (not just developers) apply open source software in a variety of industries and domains;
- Non-Software content holds a significant share of 23.17%.
It shows that open source, as a collaborative development model, is extending to the entire content domain, including documentation, education, art, hardware, and other non-programming-related areas;
- The Software Tools category accounts for 18.9%. Developers value these tools because they free them to focus on building software applications and products;
- The System Software category comprises fundamental software, accounting for only 2.3% of the total despite its immense value and complexity.

#### 6.1.2 Percentage of OpenRank by Project Type

<img width="600" alt="6-2" src="/image/data/chapter_6/6-2.png">

<br>

<center> Figure 6.2 Percentage of OpenRank by Project Type </center>

<br>

Let's take this a step further and look at these categories through the lens of OpenRank influence:

- The most significant change is that content resource (Non-Software) projects have relatively low influence, although they have high activity;
- System Software, on the other hand, has a small percentage of activity but a relatively large percentage of influence, and a similar phenomenon can be observed with Software Tools projects;
- The component and framework type and the application software type have not changed much, and both remain among the more prevalent types.

#### 6.1.3 OpenRank Trends by Project Type in the Last 5 Years

<img width="728" alt="6-3" src="/image/data/chapter_6/6-3.png">

<br>

<center> Figure 6.3 OpenRank Trends by Project Type in the Last 5 Years </center>

<br>

As the five-year OpenRank evolution chart above shows, the influence of the System Software category is increasing year by year, while the influence of the Non-Software category is decreasing.

### 6.2 Project Topic Analysis

This section also analyzes the top 10,000 active GitHub repositories and derives insights from the Topic tags attached to the repositories.
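As a rough illustration of the topic analysis described above, the sketch below counts how often each Topic tag appears across a set of repositories and returns the most frequent ones. The repository records, the `topics` field, and the `top_topics` helper are hypothetical and stand in for the report's actual dataset and tooling, which are not described here.

```python
from collections import Counter

# Hypothetical sample records; the real analysis covers the top 10,000 active repositories.
repos = [
    {"name": "example/a", "topics": ["javascript", "hacktoberfest"]},
    {"name": "example/b", "topics": ["python", "hacktoberfest"]},
    {"name": "example/c", "topics": ["python", "javascript", "react"]},
    {"name": "example/d", "topics": ["hacktoberfest"]},
]


def top_topics(repos, n=10):
    """Return the n most frequent topic tags as (topic, count) pairs."""
    counts = Counter()
    for repo in repos:
        # Count each topic at most once per repository, ignoring case.
        counts.update({topic.lower() for topic in repo["topics"]})
    return counts.most_common(n)


print(top_topics(repos, n=3))
```

Counting each tag once per repository keeps a single repository with duplicated tags from inflating a topic's total; whether the report applies the same rule is an assumption.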
#### 6.2.1 Top Topic

<img width="643" alt="6-4" src="/image/data/chapter_6/6-4.png">

Figure 6.4 Top 10 most frequent Topics

<br>

The top 10 topics cover a diverse range of areas, demonstrating the broad interest of the open-source community. JavaScript, Hacktoberfest, and Python are some of the most popular topics, representing hotspots for cutting-edge technologies, active community activities, and versatile programming languages. These topics highlight the interest in front-end development, open-source contributions, and interdisciplinary programming.

#### 6.2.2 Overall OpenRank Trends for Repositories of Popular Topics

<center><img width="707" alt="6-5" src="/image/data/chapter_6/6-5.png"></center>

Figure 6.5 OpenRank trends for repositories with top 10 Topic occurrences (2019 - 2023)

<br>

- Hacktoberfest is an annual event that takes place in October. It aims to promote the open-source community and is organized by DigitalOcean in collaboration with GitHub. The goal of the event is to encourage more people to participate in open-source projects and contribute to the community. Here, OpenRank serves as a measure of people's enthusiasm for, involvement in, and contributions to open-source projects. Developers play an active role in the campaign by submitting Pull Requests to open-source projects, thus helping to increase the reputation and influence of the repository.
- JavaScript and Python: these technologies have maintained relatively stable trends over the past few years, with no significant growth or decline.

### 6.3 Project analysis in databases

This section draws on information about open-source databases listed in the [Database of Databases](https://dbdb.io/) and the [DB-Engines Ranking](https://db-engines.com/en/ranking). The field is divided into 18 subcategories based on the storage structure and usage of databases.
These subcategories include Relational, Key-value, Document, Search Engine, Wide Column, Time Series, Graph, Vector, Object Oriented, Hierarchical, RDF, Array, Event, Spatial, Native XML, Multivalue, Content, and Network. For each database, we identify the corresponding open-source project and collect and analyze its collaboration log data on GitHub. This helps us gain detailed insights into the field.

#### 6.3.1 2023 OpenRank and Activity Lists by Subdomain in the Database Domain

**1. OpenRank Rankings for Database Subdomains**

Table 6.1 OpenRank Rankings for Database Subdomains

| Ranking | Subfield Name | OpenRank |
| :-----: | :-------------: | :------: |
| 1 | Relational | 58092.36 |
| 2 | Key-value | 21834.08 |
| 3 | Document | 17264.93 |
| 4 | Search Engine | 8093.77 |
| 5 | Wide Column | 7896.43 |
| 6 | Time Series | 7813.54 |
| 7 | Graph | 5196.52 |
| 8 | Vector | 4965.41 |
| 9 | Object Oriented | 3104.07 |
| 10 | Hierarchical | 1355.4 |
| 11 | RDF | 592.68 |
| 12 | Array | 383.95 |
| 13 | Event | 256.59 |
| 14 | Spatial | 224.05 |
| 15 | Native XML | 209.51 |
| 16 | Multivalue | 15.89 |
| 17 | Content | 3.43 |

**2. Activity Rankings for Database Subdomains**

Table 6.2 Activity Rankings for Database Subdomains

| Ranking | Subfield Name | Activity |
| :-----: | :-------------: | :-------: |
| 1 | Relational | 161025.44 |
| 2 | Key-value | 62501.64 |
| 3 | Document | 49400.11 |
| 4 | Search Engine | 23799.87 |
| 5 | Time Series | 22077.57 |
| 6 | Wide Column | 21292.17 |
| 7 | Vector | 16395.88 |
| 8 | Graph | 14947.43 |
| 9 | Object Oriented | 8418.14 |
| 10 | Hierarchical | 3406.55 |
| 11 | RDF | 1701.67 |
| 12 | Array | 1280.14 |
| 13 | Native XML | 737.94 |
| 14 | Spatial | 680.79 |
| 15 | Event | 654.42 |
| 16 | Content | 33.94 |
| 17 | Multivalue | 12.68 |

The 2023 OpenRank and activity rankings for each subdomain of the database domain show that:

- Relational, key-value, and
document databases are the top three subdomains, accounting for over 70% of the database domain;
- Relational's two indicators each exceed those of the second- through fifth-place subdomains combined and account for more than 40 percent of the database field, making it a mega-subcategory.

#### 6.3.2 Trends over the last five years in projects under the various subfields of the database area

![6-6](/image/data/chapter_6/6-6.png)

Figure 6.6 Trends in OpenRank by Subdomain in Database Domain (2019 - 2023)

![6-7](/image/data/chapter_6/6-7.png)

Figure 6.7 Trends in Activity by Subdomain in Database Domain (2019 - 2023)

The OpenRank and activity trends of projects in each subdomain of the database domain over the past five years show that:

- Over the past five years, Relational, Key-value, and Document have consistently ranked in the top three in both indicators;
- Search Engine, Wide Column, Time Series, Graph, Vector, and Object Oriented ranked fourth through ninth, with both indicators trending upward;
- The Search Engine and Vector subcategories have grown quickly. Search Engine has jumped two positions to become the fourth largest subcategory. Vector is still competing with the Graph subcategory and has the potential to improve its OpenRank. The momentum created by large models has not yet subsided, and Vector is predicted to overtake Graph in 2024.

#### 6.3.3 Open source quadrant map of projects under each sub-domain of the database domain

The Open Source Quadrant diagram involves three metrics: Activity, OpenRank, and CommunityVolume. CommunityVolume uses the same formula as the Attention metric in open-digger, i.e. a weighted sum of the number of stars and the number of forks of the target project in a given period of time: `sum(1*star+2*fork)`.

Quadrant plotting method:

1. Select the Top 10 projects by activity for each database subcategory;
2. Draw a log-log scatter plot of `log(openrank)` against `log(communityvolume)`. The logarithm is base 2, so the two coordinates denote the number of half-lives required for the spatial influence (OpenRank) and the temporal influence (CommunityVolume) to decay to 1, respectively;
3. Divide the plot into four quadrants: the vertical line at the mean of the horizontal coordinates of all points serves as the vertical axis, and the horizontal line at the mean of the vertical coordinates of all points serves as the horizontal axis.

There are a total of 18 subcategory labels in the database domain. The top 9 categories, each accounting for more than 1% of activity in 2023, were selected for statistical analysis, and their open source quadrant charts are shown below:

<center> <img src="/image/data/chapter_6/6-12.png" width="600px"> </center>
<center>Figure 6.8 Relational Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center><br />

<center> <img src="/image/data/chapter_6/6-13.png" width="600px"> </center>
<center>Figure 6.9 Key-Value Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center><br />

<center> <img src="/image/data/chapter_6/6-14.png" width="600px"> </center>
<center>Figure 6.10 Document Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center><br />

<center> <img src="/image/data/chapter_6/6-15.png" width="600px"> </center>
<center>Figure 6.11 Search Engine OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center><br />

<center> <img src="/image/data/chapter_6/6-16.png" width="600px"> </center>
<center>Figure 6.12 Time Series Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center><br />

<center> <img src="/image/data/chapter_6/6-17.png" width="600px"> </center>
<center>Figure 6.13 Wide Column Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center><br />

<center> <img src="/image/data/chapter_6/6-18.png" width="600px"> </center>
<center>Figure 6.14 Vector Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center><br />

<center> <img src="/image/data/chapter_6/6-19.png" width="600px"> </center>
<center>Figure 6.15 Graph Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center> <br />

<center> <img src="/image/data/chapter_6/6-20.png" width="600px"> </center>
<center>Figure 6.16 Object-Oriented Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center> <br />

<center> <img src="/image/data/chapter_6/6-21.png" width="600px"> </center>
<center>Figure 6.17 Top 9 Subcategory Databases by Activity OpenRank-CommunityVolume log-log Open Source Quadrant Chart</center> <br />

The search engine category is highly polarized: projects like Elasticsearch have high OpenRank and CommunityVolume, while projects like Sphinx and Xapian have very low OpenRank and CommunityVolume.

Looking at the first quadrant, relational, document, search engine, and vector are all database types with strong OpenRank influence and high CommunityVolume, while object-oriented is relatively weak in both.

The combined quadrant chart shows the vertical distribution of the Top 9 database subclasses by activity. Two of them stand out: search engine and vector. Both have a higher CommunityVolume relative to their OpenRank, which means they attract more community attention. They also have a stronger community voice and are expected to develop faster than the other subclasses.

### 6.4 Project Analysis of Generative AI Area

This section will examine the open-source projects related to generative AI, using the [Generative AI Open Source (GenOS) Index](https://www.decibel.vc/articles/launching-the-generative-ai-open-source-genos-index) as a reference point.
We will classify these projects into four subcategories: tools, models, applications, and infrastructure. The detailed insights are outlined below:

#### 6.4.1 Growth trends in subfields of generative AI over the past five years

<img width="712" alt="6-8" src="/image/data/chapter_6/6-8.png">

<center> Figure 6.18 OpenRank Trends in Generative AI by Subdomain, 2019 - 2023 </center>

<br>

<img width="722" alt="6-9" src="/image/data/chapter_6/6-9.png">

<center> Figure 6.19 Activity Trends in Generative AI by Subdomain, 2019 - 2023 </center>

<br>

- Activity and influence show consistent trends across the model, tool, application, and infrastructure categories;
- AIGC open source projects in the model category are more influential and active than those in the tool and application categories;
- The model category has grown rapidly since 2022 and surpassed Infrastructure in 2023. Innovative AIGC application development saw a significant breakthrough in 2023, leading to concurrent growth in the application category.

#### 6.4.2 Trends in OpenRank and Activity Top 10 for Projects in the Generative AI Domain

<img width="716" alt="6-10" src="/image/data/chapter_6/6-10.png">

<center> Figure 6.20 5-Year Trend of OpenRank Top 10 Projects in Generative AI </center>

<br>

<img width="699" alt="6-11" src="/image/data/chapter_6/6-11.png">

<center> Figure 6.21 5-Year Trend of the Top 10 Active Projects in Generative AI </center>

<br>

- langchain is ranked #1 in terms of influence and activity and is highly regarded by developers;
- transformers was the reigning champion in the AIGC field for the past few years, and its position remained unchallenged until 2023. This project has significantly impacted both the academic and open-source communities, showcasing its groundbreaking capabilities;
- stable-diffusion-webui is an AIGC tool that has gained a lot of attention from developers.
It has surpassed transformers in terms of activity and is likely to surpass it in terms of influence in 2024;
- Several AIGC projects that were open-sourced in 2023 have already gained enough influence and activity to place them on the Top 10 list. This highlights the rapid pace of change in the field of AIGC.

#### 6.4.3 Top 10 List of OpenRank and Activity of Projects in Generative AI in 2023

**1. List of OpenRank Top 10 Projects in Generative AI**

<center> Table 6.3 OpenRank Rankings in Generative AI </center>

<br>

| Ranking | Project Name | OpenRank |
| :-----: | :----------------------------------: | :------: |
| 1 | langchain-ai/langchain | 6080.25 |
| 2 | huggingface/transformers | 4422.84 |
| 3 | AUTOMATIC1111/stable-diffusion-webui | 3881.6 |
| 4 | Significant-Gravitas/AutoGPT | 2664.85 |
| 5 | ggerganov/llama.cpp | 2339.8 |
| 6 | oobabooga/text-generation-webui | 2242.5 |
| 7 | milvus-io/milvus | 2001.11 |
| 8 | run-llama/llama_index | 1913.01 |
| 9 | facebookincubator/velox | 1589.53 |
| 10 | invoke-ai/InvokeAI | 1571.45 |

**2. List of Top 10 Active Projects in Generative AI**

<center> Table 6.4 Activity Rankings in Generative AI </center>

<br>

| Ranking | Project Name | Activity |
| :-----: | :----------------------------------: | :------: |
| 1 | langchain-ai/langchain | 22563.04 |
| 2 | AUTOMATIC1111/stable-diffusion-webui | 13933.03 |
| 3 | huggingface/transformers | 13618.11 |
| 4 | Significant-Gravitas/AutoGPT | 10961.81 |
| 5 | oobabooga/text-generation-webui | 8597.33 |
| 6 | ggerganov/llama.cpp | 8108.62 |
| 7 | run-llama/llama_index | 7532.47 |
| 8 | milvus-io/milvus | 6488.35 |
| 9 | facebookincubator/velox | 4923.05 |
| 10 | Chatchat-space/Langchain-Chatchat | 4477.63 |

## 7. Developer Insights

**Developers** are vital to open-source innovation. They create and supply open-source projects and contribute significantly to them. The total number of developers and the way they collaborate affect the amount of contribution.
In this section, we will analyze data on individual developers at national and regional levels.

### 7.1 Geographical distribution of developers

This analysis, like the one in Section 1.3, is based on 10 million active GitHub developers. Out of the 100 million registered users on GitHub, only 2 million developers have provided accurate geolocation information, which makes up a 2% sample.

**1. GitHub Active Developers Distribution Map**

The number of active developers on GitHub is first visualized on a map, as shown below.

![7-1.png](/image/data/chapter_7/7-1.png)

Figure 7.1 2023 GitHub Active Developers Distribution Map

<br>

GitHub developers are concentrated in areas with large populations and fast internet development, such as coastal regions of China, Europe, the United States, India, and the southeast coast of Brazil. They are sparsely distributed in other areas with small populations or less developed internet.

**2. GitHub Active Developers by Country / Region**

![7-2.png](/image/data/chapter_7/7-2.png)

<center>Figure 7.2 GitHub Active Developers by Country / Region </center>

<br>

Table 7.1 2023 Ranking of Countries/Regions by Number of Active Developers

<br>

| Ranking | Country / Region | Number of active developers |
| :-----: | :--------------: | :-------------------------: |
| 1 | United States | 236899 |
| 2 | China | 113893 |
| 3 | India | 107066 |
| 4 | Brazil | 83932 |
| 5 | Germany | 64836 |
| 6 | United Kingdom | 55175 |
| 7 | Canada | 42238 |
| 8 | France | 40341 |
| 9 | Russia | 31534 |
| 10 | Japan | 21942 |

The United States has the largest number of developers, followed by China, India, and Brazil. Other countries with sizable populations and developed economies, such as Canada and some European countries, also have a large number of developers on GitHub.

**3. Distribution of Active Developers on GitHub in China**

The map below shows the distribution of active GitHub developers in China.
![7-4.png](/image/data/chapter_7/7-4.png)

<center>Figure 7.3 2023 Distribution of Active Developers in China </center>

<br>

Table 7.2 2023 Regional Ranking of Active Developers in China

<br>

| Ranking | Regions | Quantity |
| :-----: | :-------: | :------: |
| 1 | Beijing | 24151 |
| 2 | Shanghai | 18215 |
| 3 | Guangdong | 16153 |
| 4 | Zhejiang | 10927 |
| 5 | Taiwan | 8823 |
| 6 | Jiangsu | 5437 |
| 7 | Sichuan | 5311 |
| 8 | Hong Kong | 3344 |
| 9 | Hubei | 3273 |
| 10 | Shaanxi | 1993 |

Beijing has the most GitHub users in China, followed by Shanghai, Guangdong, and Zhejiang. Most of China's active GitHub users are in the eastern coastal regions, while some central provinces such as Shaanxi, Hunan, and Hubei also have a lot of active users. Notably, Sichuan has the most active GitHub users outside of the coastal regions.

**4. GitHub China Developer Influence Distribution after OpenRank Weighting**

Aggregating the OpenRank values of the developers in each region gives the influence distribution map and regional ranking of Chinese developers shown below.

![7-3.png](/image/data/chapter_7/7-3.png)

<center> Figure 7.4 OpenRank influence distribution of Chinese developers </center>

<br>

Table 7.3 OpenRank Influence Ranking in China

<br>

| Ranking | Regions | OpenRank |
| :-----: | :-------: | :-------: |
| 1 | Beijing | 506624.08 |
| 2 | Shanghai | 435804.42 |
| 3 | Guangdong | 306014.24 |
| 4 | Zhejiang | 274284.92 |
| 5 | Taiwan | 216991.49 |
| 6 | Sichuan | 96881.79 |
| 7 | Jiangsu | 83321.13 |
| 8 | Hong Kong | 83238.46 |
| 9 | Hubei | 51370.74 |
| 10 | Fujian | 33482.25 |

As the rankings show, the OpenRank regional rankings are highly consistent with the regional rankings for the number of active developers:

- There are significant regional differences in terms of the influence of Chinese developers.
Developers from Beijing and Shanghai dominate the first tier, while developers from Guangdong, Zhejiang, and Taiwan fall into the second tier. These regions have a different level of influence compared to those ranked lower;
- The overall number of active developers in Sichuan is smaller than in Jiangsu, but the overall influence is greater, and the same phenomenon occurs in Fujian and Shaanxi.

### 7.2 Developer Working Hours Analysis

This section analyzes the working hours of GitHub and Gitee developers. Times are in UTC by default, which is 8 hours behind the East Eighth Time Zone (UTC+8), i.e., Beijing Standard Time. The data is scaled to the [1, 10] range by default using the min-max method, with larger dots representing higher values in the time zone graph.

#### 7.2.1 Distribution of working hours of global developers

**Distribution of working hours of GitHub-wide developers**

According to statistics on developers' working hours across GitHub, the majority of developers work between 6:00 and 21:00. There is a higher concentration of activity at 12:00, likely due to scheduled tasks. Weekends (Saturdays and Sundays) are relatively inactive.

![7-5.png](/image/data/chapter_7/7-5.png)

<center>Figure 7.5 GitHub-wide developer working hours in 2023</center>

<br>

**Distribution of working hours of Gitee-wide developers**

![7-6.png](/image/data/chapter_7/7-6.png)

<center>Figure 7.6 Gitee-wide developer working hours in 2023</center>

<br>

The Gitee data clearly aligns more closely with the working routine of the East Eighth Time Zone.

**Global developer working hours distribution, excluding bots**

![7-7.png](/image/data/chapter_7/7-7.png)

<center>Figure 7.7 2023 Global Developers' Working Hours, Excluding Bots</center>

<br>

After removing the bot data, developer activity is still concentrated in the 6:00 - 21:00 interval, but it is more evenly distributed within that interval.
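The two conventions stated in Section 7.2, min-max scaling to the [1, 10] range and the 8-hour offset between UTC and Beijing Standard Time, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the sample counts are hypothetical, and the report's actual plotting code is not shown in the source.

```python
def scale_min_max(values, low=1.0, high=10.0):
    """Linearly rescale values so the minimum maps to low and the maximum to high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values are equal: map everything to the lower bound
        # to avoid dividing by zero (an assumption, not the report's rule).
        return [low for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def utc_to_beijing_hour(utc_hour):
    """Convert an hour of day in UTC to Beijing Standard Time (UTC+8)."""
    return (utc_hour + 8) % 24


# Hypothetical event counts for four hourly buckets.
counts = [0, 50, 100, 25]
print(scale_min_max(counts))     # [1.0, 5.5, 10.0, 3.25]
print(utc_to_beijing_hour(18))   # 2, i.e. 02:00 the next day in Beijing
```

Read this way, an activity peak at 12:00 UTC in the figures corresponds to 20:00 Beijing time.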
#### 7.2.2 Distribution of working hours by project

Below is a comparison of the working hours distribution of the top four Chinese OpenRank repositories and the top four global OpenRank GitHub repositories in 2023.

**Distribution of working hours of the top 4 OpenRank repositories on GitHub globally**

1. NixOS/nixpkgs

![7-8.png](/image/data/chapter_7/7-8.png)

<center>Figure 7.8 NixOS/nixpkgs Working Hours in 2023</center>

<br>

2. home-assistant/core

![7-9.png](/image/data/chapter_7/7-9.png)

<center>Figure 7.9 home-assistant/core Working Hours in 2023</center>

<br>

3. microsoft/vscode

![7-10.png](/image/data/chapter_7/7-10.png)

<center>Figure 7.10 microsoft/vscode Working Hours in 2023</center>

<br>

4. MicrosoftDocs/azure-docs

![7-11.png](/image/data/chapter_7/7-11.png)

<center>Figure 7.11 MicrosoftDocs/azure-docs Working Hours in 2023</center>

<br>

**Distribution of working hours of the top 4 OpenRank repositories in China**

1. OpenHarmony

![7-12.png](/image/data/chapter_7/7-12.png)

<center>Figure 7.12 OpenHarmony Working Hours in 2023</center>

<br>

2. openEuler

![7-13.png](/image/data/chapter_7/7-13.png)

<center>Figure 7.13 openEuler Working Hours in 2023</center>

<br>

3. PaddlePaddle

![7-14.png](/image/data/chapter_7/7-14.png)

<center>Figure 7.14 PaddlePaddle Working Hours in 2023</center>

<br>

4. MindSpore

![7-15.png](/image/data/chapter_7/7-15.png)

<center>Figure 7.15 MindSpore Working Hours in 2023</center>

<br>

### 7.3 Developer Role Analysis

This section categorizes GitHub users into four roles: **Explorer**, **Participant**, **Contributor**, and **Committer**, based on the events they trigger in open-source repositories. The four roles are defined in the table below.
<center> Table 7.5 Four Roles of Developer </center>

<br>

| Role | Definition | Meaning |
| ----------- | ------------------------------------------------------------- | ------------------------------------------------------------------ |
| Explorer | Users who star a project | Indicates the user has some interest in the project |
| Participant | Users who have opened an Issue or posted a Comment on a project | Indicates user participation in the project |
| Contributor | Users who have opened Pull Requests (PRs) for a project | Indicates that the user has contributed to the project's code base |
| Committer | Users who participate in PR review or merging | Indicates that the user has contributed deeply to the project |

The figure below shows the four cascaded and structured roles. Using the defined role structure, we evaluate the top 10 projects in the GitHub-wide OpenRank ranking from three perspectives: number of roles, change over time, and developer role evolution. The projects are taken from the project ranking list in Part II.
![7-16.png](/image/data/chapter_7/7-16.png)

<center>Figure 7.16 Developer Roles and Relationships</center>

<br>

#### 7.3.1 Distribution of roles

<center> Table 7.6 Distribution of the number of developer roles for the top 10 projects in the OpenRank rankings </center>

<br>

| Repository name | Explorer | Participant | Contributor | Committer |
| ---------------------------------------- | -------- | ----------- | ----------- | --------- |
| NixOS/nixpkgs | 6244 | 3381 | 3074 | 2638 |
| home-assistant/core | 17777 | 9116 | 1230 | 905 |
| microsoft/vscode | 20113 | 16027 | 525 | 339 |
| MicrosoftDocs/azure-docs | 8939 | 2282 | 1591 | 610 |
| pytorch/pytorch | 13237 | 6391 | 1230 | 685 |
| godotengine/godot | 23426 | 7203 | 1020 | 569 |
| flutter/flutter | 14056 | 11101 | 637 | 334 |
| odoo/odoo | 5078 | 1841 | 930 | 570 |
| digitalinnovationone/dio-lab-open-source | 3619 | 907 | 504 | 40 |
| microsoft/winget-pkgs | 1852 | 1395 | 1384 | 286 |

<br>

![7-17.png](/image/data/chapter_7/7-17.png)

<center> Figure 7.17 Developer Role Distribution Map </center>

<br>

The results showed:

- Based on the number of explorers, the three most popular projects are godotengine/godot, microsoft/vscode, and home-assistant/core, suggesting they have received widespread attention and support;
- microsoft/vscode is the project with the largest gap between the number of participants and contributors, while microsoft/winget-pkgs has the smallest gap between the two;
- NixOS/nixpkgs has the highest number of committers of all the projects, at 2,638. In contrast, the digitalinnovationone/dio-lab-open-source project has the lowest number of committers.

#### 7.3.2 New additions to roles in 2023

A user counts as a valid addition to role X (e.g., the contributor or committer role) if they were not in role X before 2023 and entered that role in 2023.
For example, if A submitted a PR to Project B in 2021 (but never participated in the code review process) and A reviews a PR in Project B in 2023, A is a new committer. The details of the roles added are shown in the figure and table below.

![7-18.png](/image/data/chapter_7/7-18.png)

<center>Figure 7.18 Map of new roles in the open source community in 2023 </center>

<br>

<center> Table 7.7 Distribution of the number of new developer roles for the top 10 projects in the OpenRank rankings </center>

<br>

| Repository name | New Committer | New Contributor | New Participant | New Explorer |
| ---------------------------------------- | ------------- | --------------- | --------------- | ------------ |
| NixOS/nixpkgs | 1226 | 1622 | 1591 | 3027 |
| home-assistant/core | 538 | 808 | 4640 | 8998 |
| microsoft/vscode | 263 | 394 | 10216 | 15746 |
| MicrosoftDocs/azure-docs | 352 | 1420 | 3913 | 1579 |
| pytorch/pytorch | 391 | 802 | 2083 | 13016 |
| godotengine/godot | 386 | 708 | 2834 | 22996 |
| flutter/flutter | 184 | 455 | 3954 | 13579 |
| odoo/odoo | 244 | 453 | 472 | 4991 |
| digitalinnovationone/dio-lab-open-source | 40 | 3611 | 732 | 504 |
| microsoft/winget-pkgs | 231 | 957 | 485 | 1373 |

The results showed:

- The repository godotengine/godot received the highest number of new stars, 22,996, with half added in September 2023 as game developers sought open-source alternatives in response to Unity's new charging strategy. Meanwhile, digitalinnovationone/dio-lab-open-source and microsoft/winget-pkgs received the fewest new stars, 504 and 1,373, respectively;
- The repository with the highest number of new participants was microsoft/vscode with 10,216; digitalinnovationone/dio-lab-open-source had the fewest new Issues with 732;
- The repository with the highest number of new contributors was NixOS/nixpkgs with 1,622;
- The repository with the highest number of new committers was also NixOS/nixpkgs with 1,226.
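
The "new addition to role X" rule from Section 7.3.2 can be expressed as a set difference: users seen in a role during 2023 minus users already seen in that role before 2023. The sketch below illustrates this under stated assumptions. The `EVENT_ROLE` mapping from GitHub event types to roles is a guess at how the role definitions in Table 7.5 might be derived from events, not the report's confirmed method, and the helper names and the `(user, event_type, year)` tuple layout are hypothetical. It also simplifies by treating every `PullRequestEvent` as a contribution and ignoring merge actions.

```python
from collections import defaultdict

ROLES = ("Explorer", "Participant", "Contributor", "Committer")

# Assumed, illustrative mapping from GitHub event types to the report's roles.
EVENT_ROLE = {
    "WatchEvent": "Explorer",             # starring a repository emits WatchEvent
    "IssuesEvent": "Participant",
    "IssueCommentEvent": "Participant",
    "PullRequestEvent": "Contributor",
    "PullRequestReviewEvent": "Committer",
}


def roles_by_period(events, year):
    """Split users per role into those active before `year` and those active in `year`.

    events is an iterable of (user, event_type, event_year) tuples.
    """
    before, during = defaultdict(set), defaultdict(set)
    for user, event_type, event_year in events:
        role = EVENT_ROLE.get(event_type)
        if role is None or event_year > year:
            continue  # unmapped event types and later years are ignored
        (during if event_year == year else before)[role].add(user)
    return before, during


def new_role_counts(events, year=2023):
    """Count users who hold a role in `year` but never held it before `year`."""
    before, during = roles_by_period(events, year)
    return {role: len(during[role] - before[role]) for role in ROLES}


# Hypothetical events mirroring the example in the text: A opened a PR in 2021
# and first reviewed a PR in 2023, so A is a new committer in 2023.
events = [
    ("A", "PullRequestEvent", 2021),
    ("A", "PullRequestReviewEvent", 2023),
    ("C", "WatchEvent", 2022),
    ("C", "WatchEvent", 2023),   # C already starred before 2023: not new
    ("D", "IssuesEvent", 2023),  # D is a new participant
]
counts = new_role_counts(events)
```

Because each role is tracked independently, a user such as A can be an existing contributor and a new committer at the same time, which matches the example given in the text.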
#### 7.3.3 Perspectives on Developer Evolution

The developer evolution process is defined as the number of developers in an open-source community who move from one role to another. This report only measures the number of developers who have moved from one role to a deeper one. For example, a user who was only a participant before 2023 changes from participant to contributor in 2023 when they open their first PR.

![7-19.png](/image/data/chapter_7/7-25.png)

<center> Figure 7.19 Developer Role Evolution Diagram </center>

<br>

<center> Table 7.8 Distribution of the number of role conversions for the top 10 OpenRank projects </center>

<br>

| Repository name | Contributor -> Committer | Participant -> Contributor | Explorer -> Participant |
| :--------------------------------------: | :----------------------: | :-----------------------: | :---------------------: |
| NixOS/nixpkgs | 254 | 122 | 168 |
| home-assistant/core | 70 | 113 | 134 |
| microsoft/vscode | 16 | 70 | 287 |
| MicrosoftDocs/azure-docs | 129 | 169 | 21 |
| pytorch/pytorch | 60 | 53 | 187 |
| godotengine/godot | 63 | 131 | 330 |
| flutter/flutter | 31 | 91 | 419 |
| odoo/odoo | 55 | 19 | 32 |
| digitalinnovationone/dio-lab-open-source | 0 | 0 | 0 |
| microsoft/winget-pkgs | 49 | 11 | 18 |

The results showed:

- Across communities, we can observe the typical funnel model of an evolutionary path from explorers to participants to contributors and committers. In godotengine/godot, for example, 330 explorers evolved into participants, 131 participants became contributors, and 63 contributors evolved into committers. This trend was also observed in other communities and is consistent with the general evolution of community members from initial exploration to deeper involvement.
- In some communities, such as NixOS/nixpkgs, we observed many contributors evolving into committers.
In this community, 254 contributors evolved into committers, which may reflect a relatively high demand for code review. This may encourage more contributors to become deeply involved in maintenance, which may help improve the quality and stability of the community's code.
- In some communities, such as flutter/flutter and godotengine/godot, we observed a relatively high number of explorers converting into participants. In flutter/flutter, 419 explorers evolved into participants, while in godotengine/godot, 330 explorers became participants.
- The digitalinnovationone/dio-lab-open-source project shows no conversions because it was created in 2023.

### 7.4 Robot account analysis

Bot automation is a significant contributor to activity on open-source collaboration platforms. This section analyzes nearly 600 million repository events across 7.7 million open-source repositories and over 1,200 bot accounts for 2023.

#### 7.4.1 Analysis of active data of robots

<div align="center"> <img src="/image/data/chapter_7/7-21.png" alt="7-21" width="400px"/> <img src="/image/data/chapter_7/7-20.png" alt="7-20" width="300px"/> </div>

<center>Figure 7.20 Trend in number of robot events (left) & percentage of robot events in 2023 (right)</center>

<br>

Analyzing the bot activity data from 2015 to 2023 yields the following observations:

Since 2019, the number of bot events has increased significantly, rising from 4,217,635 to 304,257,084. This surge in bot account activity on GitHub can be attributed to the widespread adoption and advancement of GitHub's automation, continuous integration, and continuous deployment (CI/CD) tools between 2019 and 2021. Despite the small number of bot accounts, each bot serves multiple repositories, demonstrating efficiency and broad reach.
#### 7.4.2 Analysis of event types for robots

![7-22.png](/image/data/chapter_7/7-22.png)

<center> Figure 7.21 Difference in number and annual growth rate (%) of GitHub event counts (2022 vs 2023)</center>

<br>

This graph shows the change in the number of GitHub events by type and their growth rate between 2022 and 2023. By comparing the data from these two years, we can gain insight into the trend of bot account usage in the development process:

- Dominance of code pushes: PushEvent dominates bot account activity, with a significant rise in volume in 2023, suggesting that bot accounts play an important role in code maintenance and updates;
- Changes in project creation activity: CreateEvent was very active in 2022 but declined in 2023, which may indicate a decline in bot account activity in creating new projects;
- Importance of code review and collaboration: PullRequestEvent and IssueCommentEvent counts were high in both years, showing the active participation of bot accounts in code reviews and issue discussions;
- Changes in activity types: DeleteEvent decreased in 2023 compared to 2022, while ReleaseEvent increased, reflecting a shift in the focus of bot accounts in project lifecycle management;
- Increase in comment-related events: CommitCommentEvent and PullRequestReviewCommentEvent increased in 2023, indicating that bot accounts are becoming more active in the code review process through discussions and feedback;
- Specific uses of bot accounts: less common event types such as GollumEvent, MemberEvent, PublicEvent, and WatchEvent are relatively low in number, suggesting that bot accounts are primarily used for specific automation tasks and are less involved in social interactions.

#### 7.4.3 Distribution of working hours for robot accounts

Similar to the developer working hours distribution, we also analyzed the data on the working hours of bot accounts.
![7-23.png](/image/data/chapter_7/7-23.png)

<center> Figure 7.22 Distribution of robot account working hours </center>

<br>

- Bot account activity is mainly concentrated between 00:00 and 01:00 and between 12:00 and 13:00;
- Given the global distribution of developer time zones, it can be surmised that most automated processes are more active in the early morning and midday hours;
- Bot activity is largely independent of whether a day is a workday; most automated collaborative tasks are scheduled, and fewer are triggered in response to a contributor's event.

#### 7.4.4 Top GitHub collaborative bots by number of events

![7-24.png](/image/data/chapter_7/7-24.png)

<center>Figure 7.23 Top GitHub collaborative bots by number of events in 2023</center>

<br>

## 8. Case Studies

### 8.1 openEuler Community Case Study

In 2023, the OpenDigger community integrated Gitee data for the first time, allowing Gitee projects to participate in OpenRank calculations. The openEuler community surpassed PaddlePaddle in the same year, achieving an OpenRank value of 16,728. This made it the second largest open source community in China, after OpenHarmony. In 2023, the openEuler community attracted 3,941 developers to collaborate on Issues or PRs, with 1,934 contributors successfully merging at least one PR into the openEuler community's repositories.

It's worth noting that the openEuler community launched a documentation bug hunt in early 2023. It also integrated an interactive page contribution mechanism with Gitee on the community's official documentation website. This feature enables developers to correct any errors they find while reading the documents directly on the official website. With a single click, they can open a Gitee lightweight pull request (PR) without having to jump to the Gitee platform or perform Git operations. The effect of this innovative mechanism on the data is impressive.
In 2023, the openeuler/docs repository merged 7,764 PRs, 74% of which were submitted directly through the official web page. The launch of this mechanism also significantly increased the average number of active contributors per month (from 30 to 80) and the average number of PRs merged per month (from 116 to 722).

One noteworthy project is openeuler/mugen, a highly active testing framework project within the openEuler community. In 2023, 138 developers participated in discussions and contributed to the project, with 95 successfully merging PRs. The project has the third-highest OpenRank within the openEuler community, after the openeuler/docs documentation repository and the openeuler/kernel kernel repository. This excellent testing framework enables developers to quickly write test cases that verify the correctness and validity of their contributions, significantly reducing the cost of subsequent contributions.

To summarize, the openEuler community has achieved a high OpenRank value thanks to its effective contribution mechanism and testing framework. The community has designed an interactive system that makes documentation contributions easy and low-cost. Moreover, contributors can quickly verify the accuracy of their code through a reliable testing framework. These developer experience optimizations are excellent examples for other open-source communities to follow and implement.
### 8.2 List of top repositories contributed by Chinese developers

We analyzed how Chinese developers contributed to the top 30 repositories in the OpenRank ranking list for 2023 using data from almost 10 million GitHub developer accounts, including nearly 200,000 from China:

![8-1.png](/image/data/chapter_8/8-1.png)

<center>Figure 8.1 Top 30 Contributed Repositories by Chinese Developers on GitHub</center>

<br>

Most of these projects also appear in the overall OpenRank list; the more interesting ones include:

- [NixOS/nixpkgs](https://github.com/NixOS/nixpkgs): Also a top international project, this is a package management tool for a new operating system. Most of its updates are package information updates, which also indicates that the ecosystem of the operating system itself is thriving.
- [intel-analytics/BigDL](https://github.com/intel-analytics/BigDL): A runtime library for running LLMs on Intel XPUs. The repository was created in 2017 and had become nearly dormant by the end of 2021. Surprisingly, it made a comeback with the rise of LLMs in 2022 and now maintains an active community of around 50 people per month.

<center> <img src="/image/data/chapter_8/8-2.png" alt="8-2" width="400px"/> </center>

<center>Figure 8.2 BigDL OpenRank Trend Chart </center>

<br>

> Screenshot above from [HyperCRX](https://github.com/hypertrons/hypertrons-crx)

- [siyuan-note/siyuan](https://github.com/siyuan-note/siyuan): SiYuan, a privacy-first open source knowledge management tool developed in China, supports bidirectional block-level references and maintains an active community of about one hundred people per month. It offers a subscription-based commercial plan at a very affordable price.
- [baidu/amis](https://github.com/baidu/amis): An open-source low-code page generation framework developed by Baidu. In recent years, low-code projects have gained immense popularity, such as Alibaba's open-source LowCodeEngine and DevEco Studio from the Harmony ecosystem.
These projects make it much more convenient for developers to build applications rapidly with low-code tools.
- [cocos/cocos-engine](https://github.com/cocos/cocos-engine): A leading Chinese game engine. With the rise of the metaverse concept, Godot and other game engines have become important top open source projects worldwide, and the domestic game engine cocos/cocos-engine has also performed strongly in China.
- [MaaAssistantArknights/MaaAssistantArknights](https://github.com/MaaAssistantArknights/MaaAssistantArknights): A fascinating project that automates daily quests for the game Arknights using a script assistant. The automation runs through a mobile phone emulator. The project is community-maintained, open source, free, and supports all desktop platforms. It has received over 10,000 stars and has more than 300 active contributors every month.

![8-3.png](/image/data/chapter_8/8-3.png)

<center>Figure 8.3 MaaAssistantArknights Project Screenshot </center>

<br>