---
tags: gluster, gcs, meeting
---
# GCS architecture meeting minutes
## Meeting info
- Time: Thursdays at 15:00 UTC (local conversion: `date -d "15:00 UTC"`)
- Location: https://bluejeans.com/600091070
## Proposed topics (future meetings)
- Migrating from Heketi/gd1 to GCS
- How critical is this? Should we focus on this at all?
- Creating and tracking progress for scale targets
- CI & automated testing: Where we are; where we're going
- CI vs. long-running
- Kubernetes vs. OpenShift
- Node OS (CentOS, CoreOS, Atomic, etc.)
- Component deep-dives: How does that piece work?
- Demo of latest release
------
# 2019-03-14
## Attendees
John Strunk, Satish
## Agenda
- Need to clean up documentation prior to release
- Choose Ansible or kubectl-gluster
- Need to put together a checklist for 1.0 release
------
# 2019-03-07
## Attendees
John Strunk, Amar, Poornima, Kotresh
## Agenda
- Scale testing update
- Problem w/ udev not being mounted in gd2 container
- Tests were working on Fedora 28, but having issues on CentOS/RHEL.
- Didn't use the Vagrant-based model; used the `kubectl-gluster` tool instead.
- Hostname was causing issues; patches were sent to the PRs to fix it. May need more insight, possibly changes to use IP instead of hostname.
- https://github.com/gluster/gcs/pull/149
- https://github.com/aravindavk/kubectl-gluster/pull/7
- LVM udev issue: is there more data? Not able to find much.
- Potentially related: disabling dmeventd? https://github.com/gluster/gluster-containers/pull/137
- [Amar] Can GD2 take a directory as input instead of a device? That way we could at least make loopback the default in GCS?
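For context, a minimal sketch of what a directory-backed (loopback) brick could look like at the node level, assuming plain Linux tooling; the paths and sizes are illustrative only, not what GD2 actually does:

```sh
# Illustrative only: back a brick with a file under a directory via loopback.
# Paths and sizes are made up; GD2's actual handling may differ.
mkdir -p /var/lib/gcs/bricks
truncate -s 10G /var/lib/gcs/bricks/brick1.img       # sparse backing file
LOOPDEV=$(losetup --find --show /var/lib/gcs/bricks/brick1.img)
mkfs.xfs "$LOOPDEV"                                  # format like a real device
echo "brick device: $LOOPDEV"
```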
------
# 2019-02-28
## Attendees
- Aravinda
- Poornima
## Agenda
* Are we good for the build finally?
- Issue for deployment support for loopback bricks https://github.com/gluster/gcs/issues/146
- Glusterd2 Dockerfile changes to install rpms from glusterfs-6
* Can we pick glusterfs-6 as base?
* Any updates on testing?
------
# 2019-02-21
## Attendees
* Rohan Joseph, Rohan Gupta, Poornima G, Amar Tumballi, Aravinda VK, Kotresh
## Agenda
- [GCS 1.0 remaining items](https://waffle.io/gluster/gcs?label=GCS%2F1.0)
- Gluster metrics on Grafana dashboard: GCS#134
- Not yet conclusive, going by the discussion on the bug
- Failed provisioning RWO: GCS#140
- Needs more than 2 GB of RAM
- Lock contention issue. Not completely resolved, but shouldn't be a blocker.
- Takes 8 mins to create 600+ PVs instead of 6 mins, due to the txn issues.
- GD2 pods should cleanly shutdown: GCS#144
- Needs more testing.
- Doesn't look like a blocker.
- Need more clarity on what the actual issues are; currently it is a theoretical limit.
- Need to work on RWO a little more; we need to handle thin-arbiter, etc.
- [Kotresh] Tested kubectl-gluster project.
- Overall good. Raised an issue for the same.
- Especially about cleanup.
- [Aravinda] Some of these are taken care of in one of the latest patches.
- Need to handle retry of disk setup failures.
- Most probably should treat 'disk add' errors properly and allow continuity if it's a retry.
- Other project integration to keep an eye on:
- [Restic from heptio](https://github.com/heptio/velero/blob/master/docs/restic.md)
- GCS on minikube? What does it take?
- Need to check if CSI is supported in minikube at all. (Some time back it was not on 1.13.)
- Needed some changes to get the device-add part of glusterd2.
- GCS on non-1.13 versions of k8s: can we keep one container for that version?
- Considering we don't use any CSI features specific to v1.0, it should be possible.
- Can't call it 'supported', as CSI itself is not supported there; that should be OK since we want this container to be for testing.
- Loopback brick support is out there; should we do anything more?
- One more change is required in the GCS YAML: need to take 'directory' the way we used to take 'devices'.
- Need GCS to install this like the other 2 CSI drivers.
- This would be handled by the GlusterFS CSI driver, but would need different options in `StorageClass` (a hedged sketch appears at the end of these notes)
- Updates from other projects.
- glusterfs: All OK
- glusterd2:
- nothing in the blocker area
- gluster-csi-driver:
- thin-arbiter support for RWO is pending.
- anthill:
- One critical PR, currently blocked on a CI failure, needs to get in (PR#56).
- After this, we can work in parallel.
- Need a lint ignore to be added in CI.
- Expected to be done by admins.
- gcs:
- loopback brick related PRs would be sent.
* Round Table
- Can we have a demo/session on operators and how it is done?
- Possible, and we can have it recorded?
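For the loopback-brick discussion above: a hypothetical sketch of how a `StorageClass` might select loopback-backed bricks through the GlusterFS CSI driver. The provisioner name and the `brickType` parameter are assumptions for illustration, not the driver's confirmed API.

```sh
# Hypothetical sketch only: parameter names and provisioner are assumed,
# not the GlusterFS CSI driver's confirmed interface.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-csi-loopback
provisioner: org.gluster.glusterfs   # assumed driver name
parameters:
  brickType: loopback                # hypothetical option
EOF
```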
------
# 2019-02-14
## Attendees
* Kotresh Hiremat, Poornima G, Amar Tumballi, Aravinda VK
## Updates from individual projects
* GlusterD2
- Experimental loopback brick support added
- 3 bugs related to RWO.
- txn framework timeout issues, no RCA yet
- cluster timeout is 3 seconds; a test patch is ready.
- Issue with Intelligent Volume Provisioning: a stale-brick issue. The approach to fix it is known; a patch would be posted by tomorrow.
- Need to increase the default size of the BHV: currently 5 GB; make the default >50 GB (or half of the available size).
- Also see if this can be provided as a StorageClass option.
* Gluster-CSI
- thin-arbiter BHV for RWO volumes is still in progress; need to refresh the PR.
* gluster-prometheus/gluster-mixins
* anthill
* GCS
- Python2 support should be made available for kubectl-gluster
- A must if we want to provide it on top of CentOS, which does not yet ship Python 3 by default.
* GlusterFS
- Branching done for glusterfs-6, csi/gcs can start depending on glusterfs-6 branch for images.
- Not sure if nightlies are built for the branch.
## Agenda
* AI from previous meeting:
- Release update email sent : [Link](https://lists.gluster.org/pipermail/gluster-devel/2019-February/055854.html)
- Do comment on it if something is pending.
* List of current blockers:
- [List in Waffle](https://waffle.io/gluster/gcs?label=GCS%2F1.0)
* Asks from github issues/emails:
- Can we use GCS on k8s 1.12.x? What changes do we need for that?
- Where is the quick-start guide link? Can it be sent as a response to the release update email?
- `gcs/deploy` README would be a good link.
* Can we plan for a demo next week?
-----
# 2019-02-07
Recording: https://bluejeans.com/s/jlDMF/
## Attendees:
* Amar, John Strunk, Madhu, Shubhendu, Kotresh, Aravinda, Poornima, Shyam, Humble, Sahina, Vishal
## Agenda:
- Should we call the next release v1.0?
- Are we promising any API compatibility?
- Expectation would be that it is completely stable.
- Currently using glusterfs master instead of a release branch
- ~~General thought is to move ahead w/ 0.6 as the release~~
- Or, do we view this like glusterfs w/ sequential numbering?
- **We will make this v1.0-pre**
- Actual changes (see below) are expected to be small
- What does this mean for payload of 1.0?
- When can it be ready and what do we need?
- Proposal: must be supportable moving forward:
- Upgrade
- Maintainable in production
- Hold 1.0 for ~1 mo
- Based on glusterfs 6
- Further testing
- Currently only testing on CentOS, but users will use other OSes
- Too much investigation required for removing LVM in this release
- Version 2.0: cattle-mode as big deliverable?
## Round Table
-----
# 2019-01-31
## Attendees:
Amar, Rohan Joseph, Anmol, Poornima
## Agenda:
* gcs v1.0: When is this?
- Pending: getting images and a round of end-to-end testing, after which we are good to continue with tagging.
* gluster-prometheus/mixins: good for v1.0 goal
- not much pending there
* anthill operator:
- some progress made toward agreeing on a framework.
* The document is very elaborate. Can we reduce it to 1-3 steps?
------
# 2019-01-24
Recording: https://bluejeans.com/s/aV1Px/
## Attendees
John Strunk, Amar, Kotresh, Poornima, Rohan Gupta, Satish
## Agenda/notes
- GCS 1.0 release status
- https://waffle.io/gluster/gcs?label=GCS%2F1.0
- glusterfs
- Excessive-logging fixes should now be in the nightly build, so GCS should be fine to consume it.
- Ready for 1.0!
- GD2
- thin arbiter friendly now (#1494)
- Disable shd - any updates?
- AI: check with team and get back (Amar)
- CSI
- Thin arbiter PR: #154
- #102 is not required now.
- Follow up on volume permissions: #141
- gluster-prometheus
- Follow up on status
- mixins
- All set?
- GCS repo
- Content for website?
- http://aravindavk.in/glustercs ?? Feedback welcome
- We would like blog content: experiences, getting started, etc.
- Talk about use cases in addition to just the technology
- Stretch clusters, hyperconvergence
- AI: Contact individuals responsible for blockers (jstrunk)
- Documentation
- CSI: Thin arbiter
- CSI: Volume permissions
- CSI: RWO loopback
- Other business
# 2019-01-17
Recording: https://bluejeans.com/s/EGVIB/
## Attendees
* John Strunk, Amar Tumballi, RTalur, Rohan Gupta, Siddarth Anupkrishna, Rohan Joseph, Shyam, Vijay Bellur
## Agenda/notes
- Review of GCS 1.0 blockers - [Waffle.io board](https://waffle.io/gluster/gcs?label=GCS%2F1.0)
- glusterfs
- 1.0 will depend on master
- prefer the log reduction patch in - https://review.gluster.org/22053
- Experimental code is now already taken out.
- op-version
- need to recheck if we are using any 'future' op-versions.
- future gfapi
- not critical for the v1.0 goals.
- Fencing? (Not critical)
- GD2
- Features:
- RWO w/ loopback devices (issue #1476 can be closed)
- Patch is merged in gd2
- RWX with loopback bricks (not currently tagged for 1.0)
- Still under review
- Disable shd (will be handled by client-side)
- Bugs:
- Modified options not showing up (review)
- Volume status reporting
- Volume stop failed
- CSI
- Features:
- loopback-based RWO volumes (needs to be tagged for 1.0)
- (Thin)Arbiter support
- Bugs:
- PV directory permissions (review)
- "UNKNOWN" capabilities (review)
- Don't log secrets (review)
- Missing version info in container
- Disallow ROX volume type
- Client `-o ro` works, but need to see what we need for ROX.
- Can consider `volume set read-only true` (see the sketch at the end of these notes)
- Anthill
- No operator deliverables for 1.0
- gluster-prometheus
- Features:
- Management interface
- Split brain count
- [@amarts] Prefer not to have it in v1.0, as it may cause some perf penalty on bricks.
- And with shd disabled, we have to see how this all works.
- Cluster id for metrics
- Bugs:
- Deleted volume metrics still showing up (is 137 the fix for 89?)
- gluster-mixins
- Dashboard update
- GCS repo
- Deployment guide (#117)
- *Any other business?*
- Blogs?
- Website?
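Regarding the ROX discussion above: a hedged sketch of making a volume read-only, assuming the classic gluster CLI syntax; GD2's `glustercli` may expose this differently, and the volume name is illustrative.

```sh
# Hedged sketch using the classic gluster CLI; GD2's glustercli may differ.
gluster volume set pvc-demo-volume read-only on   # volume name is illustrative
gluster volume get pvc-demo-volume read-only      # confirm the option value
```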
# 2019-01-10
Recording: https://bluejeans.com/s/6wgxZ/
## Attendees
Amar, Ankush, Aravinda, Jose, Madhu, Nithya, Rohan G, Amye, JohnStrunk, Anmol, Deepshikha, Atin, Shubhendu, Sidharth A, Rohan CJ, Umanga, Ju, Vijay B, Humble Chirammal
## Agenda/notes
- Discussion of outstanding blockers for GCS 1.0 release - GCS leads
- Goal: RWO & RWX data path usable
- Tag: GCS/1.0-Blocker (across all repos)
- GD2
- Can trim down current list of blockers
- Brick-mux
- Loop-based bricks
- (non-blocker) request rate throttling
- What scale do we need to achieve for 1.0?
- etcd issues
- "completed" state problem is resolved
- Lack of persistent storage from etcd-operator still an issue
- Will still work on lowering request size
- CSI-driver
- Driver updated to kube 1.13 and the 1.0.0-pre.0 release made available last week
- https://github.com/gluster/gluster-csi-driver/releases/tag/1.0.0-pre.0
- RWX
- In good state currently, no significant blockers
- RWO
- Current approach is to use client loopback instead of iSCSI stack
- gluster-block approach delayed for now
- Still WIP. Discussion on where block creation will happen.
- GD2 patch in progress for block creation: https://github.com/gluster/glusterd2/pull/1439
- Private branch has been in testing. Test data to be shared soon.
- 60+ PVs/min create
- \>3k PVs created
- Next week should be available publicly
- E2E tests:
- Not a priority for the CSI 1.0.0 or GCS 1.0.0 release.
- anthill
- Not blocking 1.0 release; will continue to use Ansible
- gluster-prometheus
- CPU/memory utilization exporters disabled due to resource usage
- Want to include cluster-level metrics
- gluster-mixins
- Still outstanding PR in GCS repo
- Basic dashboards pending, but in good shape
- Pending alerts: Will review candidate list and decide on ones to prioritize for 1.0
- glusterfs
- Nothing major pending
- FOSDEM
- Feb 2-3
- What can we show?
- Provide a demo of GCS 1.0!
- Atin will coordinate w/ Kaushal
- Deploying: Need a deployment guide... Blocker for 1.0.
- Publicizing on gluster.org
# 2019-01-03
Recording: https://bluejeans.com/s/6_A7@
## Attendees
Ankush, Ju, Jeff, John, Anmol, Deepshikha, Humble, Madhu, Nishanth, Sahina, Shubhendu, Sidharth, Umanga, Rohan G, Rohan CJ
## Agenda/notes
- Deploying gluster-mixins - Ankush
- Repo: https://github.com/gluster/gluster-mixins/
- Provides alerts and custom Grafana dashboards
- The operator uses libsonnet to call the mixin files
- The jsonnet bundler holds files that pull in the underlying mixins (e.g. the etcd mixin, the gluster mixin); it then pulls all resources from GitHub and compiles them (a rough sketch appears at the end of these notes). See https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus for more info.
- Cluster monitoring operator example: https://github.com/openshift/cluster-monitoring-operator/blob/master/pkg/manifests/manifests.go
- Proposal: put the libsonnet files into the anthill project (consistent with how native monitoring mixins are done)
- Would get pulled in from the mixin repo when the operator container is built
- Anthill would then apply the embedded yaml files when it starts
- Decoupling gluster-prometheus from gd2?
- We depend on more than just the REST endpoint to retrieve metrics (including /proc, etc.), which makes breaking the dependency difficult.
- The service name "gd2-client" is being exposed in metrics. Need to look into how to expose a better name to Prometheus.
- Other business; topics for next week?
- Ju suggested a demo of the latest GCS release
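For the gluster-mixins discussion above: a rough sketch of the jsonnet-bundler workflow, assuming a top-level `mixin.jsonnet` that imports the gluster mixin (file names and import paths are illustrative, not the exact gluster-mixins layout).

```sh
# Rough sketch of the jsonnet-bundler (jb) workflow; file names and import
# paths are illustrative.
jb init                                        # creates jsonnetfile.json
jb install github.com/gluster/gluster-mixins   # vendor the mixin and its deps
mkdir -p manifests
jsonnet -J vendor -m manifests mixin.jsonnet   # compile to per-resource JSON
ls manifests/                                  # rendered dashboards/alerts
```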