# Big Open Source Projects
## Book Dash Excerpt
Open source software such as Spark and Kubernetes serves as a foundational building block for most technology companies operating today. Looking at these open source utilities as examples, we can identify the support structures that pave the path towards lasting, global-scale sustainability.
Like many other open source projects, Spark, Kafka, and Kubernetes began through the work of interested engineers at organisations like UC Berkeley, LinkedIn, and Google. After an initial incubation phase at the organisation or with the support of a programme like the Apache Incubator, each project was passed on for maintenance to a larger non-profit organisation such as the Apache Software Foundation or the Cloud Native Computing Foundation, which was founded by the Linux Foundation and Google to support projects like Kubernetes. This step ensures an open source, vendor-neutral home for the OSS and allows the founders and founding organisation to focus on other activities. In the case of Apache Spark, the founders created the company Databricks to serve as a user-facing web platform for working with Spark. They have also developed other activities around the original OSS, such as free online Spark classes and a Spark conference. Kubernetes provides another example of this diversification of initiatives built on top of an open source foundation. Currently, three core forms exist: open source, commercial, and managed. This allows different business models to be designed for different developer use cases for this powerful technology.
Organisations like Red Hat, the Cloud Native Computing Foundation, and the Apache Software Foundation serve as catalysts, funders, and stewards of OSS projects. In addition to supporting and maintaining individual projects, they establish shared resources and culture around OSS. For example, the Apache Software Foundation runs in a decentralized, meritocratic manner: projects are maintained by "self-selected technical experts" who are active contributors to the community, and software is distributed under the Apache License. This is a permissive license rather than a copyleft one: users may copy the software freely, examine and modify the source code, and redistribute it to others (free or priced) without having to release their modifications under the same license.
## Research
* Apache Software Foundation
* 1999: Apache Software Foundation founded as an American nonprofit supporting open source software projects, formed by a group of developers of the Apache HTTP Server
* Decentralized open source community of developers distributing software under the Apache License; a permissive (not copyleft) form of FOSS
* Apache projects characterized by collaborative, consensus-based development and an open, pragmatic software license
* Projects maintained by self-selected technical experts who are active contributors; membership to ASF based on active contributions
* ASF is a "second generation open-source org" - commercial support provided without the risk of platform lock-in
* Apache Spark
* Analytics engine for large-scale data processing
* Interface for programming clusters with data parallelism
* 2010 - open sourced under a BSD license; started at UC Berkeley's AMPLab by Romanian-Canadian computer scientist Matei Zaharia
* 2013 - donated to Apache Software Foundation for maintenance; Apache 2.0
* 2013 - Databricks founded by Spark's creators as a web platform for working with Spark, providing services like automated cluster management and IPython-style notebooks; became a first-party service on Azure in 2017
* Sell cloud data platform with marketing term "lakehouse" combining "data warehouse" and "data lake"
* Allows analytical queries against semi-structured data without a database schema
* Company has created other open source projects and organised online classes / conference about Spark
* 2021 - Integration with Google Kubernetes
* Apache Kafka
* Distributed event streaming platform for high-performance data pipelines, streaming analytics, data integration
* 2011 - originally developed at LinkedIn and open sourced
* 2012 - graduation from Apache Incubator
* Kubernetes
* Open-source container orchestration system for automating software deployment, scaling, and management
* 2014: Created by Google engineers, inspired by Google's Borg cluster manager; written in Go
* 2015: Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation, with Kubernetes as the "seed technology"
* 2018: Google handed over operational control of Kubernetes to Cloud Native Computing Foundation community to maintain
* open-source, vendor-neutral hub of cloud native computing/hosting projects
* part of the Linux Foundation
* Commonly used to host microservice-based implementations
* 3 forms: open source, commercial, and managed
* Production-grade container scheduling and management of apps across multiple hosts
* Red Hat