---
tags: gluster, gcs, meeting
---
# GCS architecture meeting minutes
## Meeting info
- Time: Thursdays at 15:00 UTC (local conversion: `date -d "15:00 UTC"`)
- Location: https://bluejeans.com/600091070
## Proposed topics (future meetings)
- Migrating from Heketi/gd1 to GCS
- How critical is this? Should we focus on this at all?
- Creating and tracking progress for scale targets
- CI & automated testing: Where we are; where we're going
- CI vs. long-running
- Kubernetes vs. OpenShift
- Node OS (CentOS, CoreOS, Atomic, etc.)
- Component deep-dives: How does that piece work?
- Demo of latest release
------
# 2019-03-14
## Attendees
John Strunk, Satish
## Agenda
- Need to clean up documentation prior to release
- Choose Ansible or kubectl-gluster
- Need to put together a checklist for 1.0 release
------
# 2019-03-07
## Attendees
John Strunk, Amar, Poornima, Kotresh
## Agenda
- Scale testing update
- Problem w/ udev not being mounted in gd2 container
- Tests were working on Fedora 28, but having issues on CentOS/RHEL.
- Didn't use the Vagrant-based model; used the `kubectl-gluster` tool instead.
- Hostname was causing issues; patches were sent to the PRs to fix it. May need more insight, possibly changes to use IP instead of hostname.
- https://github.com/gluster/gcs/pull/149
- https://github.com/aravindavk/kubectl-gluster/pull/7
- LVM udev issue: is there more data? Not able to find much.
- Potentially related: disabling dmeventd? https://github.com/gluster/gluster-containers/pull/137
- [Amar] Can GD2 take a directory as input instead of a device? That way we could at least make loopback the default in GCS?
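For context, a minimal sketch of what a directory-backed (loopback) brick could look like at the node level, assuming plain Linux tooling; the paths and sizes are illustrative only, not what GD2 actually does:

```sh
# Illustrative only: back a brick with a file under a directory via loopback.
# Paths and sizes are made up; GD2's actual handling may differ.
mkdir -p /var/lib/gcs/bricks
truncate -s 10G /var/lib/gcs/bricks/brick1.img       # sparse backing file
LOOPDEV=$(losetup --find --show /var/lib/gcs/bricks/brick1.img)
mkfs.xfs "$LOOPDEV"                                  # format like a real device
echo "brick device: $LOOPDEV"
```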
------
# 2019-02-28
## Attendees
- Aravinda
- Poornima
## Agenda
* Are we good for the build finally?
- Issue for deployment support for loopback bricks https://github.com/gluster/gcs/issues/146
- Glusterd2 Dockerfile changes to install rpms from glusterfs-6
* Can we pick glusterfs-6 as base?
* Any updates on testing?
------
# 2019-02-21
## Attendees
* Rohan Joseph, Rohan Gupta, Poornima G, Amar Tumballi, Aravinda VK, Kotresh
## Agenda
- [GCS 1.0 remaining items](https://waffle.io/gluster/gcs?label=GCS%2F1.0)
- Gluster metrics on Grafana dashboard: GCS#134
- Not yet conclusive, going by the discussion on the bug
- Failed provisioning RWO: GCS#140
- Needs more than 2 GB of RAM
- Lock contention issue. Not completely resolved, but shouldn't be a blocker.
- Takes 8 mins to create 600+ PVs instead of 6 mins, due to the txn issues.
- GD2 pods should cleanly shutdown: GCS#144
- Needs more testing.
- Doesn't look like a blocker.
- Need more clarity on what the actual issues are; currently it is a theoretical limit.
- Need to work on RWO a little more; we need to handle thin-arbiter, etc.
- [Kotresh] Tested kubectl-gluster project.
- Overall good. Raised an issue for the same.
- Especially about cleanup.
- [Aravinda] Some of these are taken care of in one of the latest patches.
- Need to handle retry of disk setup failures.
- Most probably should treat 'disk add' errors properly and allow continuity if it's a retry.
- Other project integration to keep an eye on:
- [Restic from heptio](https://github.com/heptio/velero/blob/master/docs/restic.md)
- GCS on minikube? What does it take?
- Need to check if CSI is supported in minikube at all. (Some time back it was not on 1.13.)
- Needed some changes to get the device-add part of glusterd2.
- GCS on non-1.13 versions of k8s: can we keep one container for that version?
- Considering we don't use any CSI features specific to v1.0, it should be possible.
- Can't call it 'supported', as CSI itself is not supported there; that should be OK since we want this container to be for testing.
- Loopback brick support is out there; should we do anything more?
- One more change is required in the GCS YAML: need to take 'directory' the way we used to take 'devices'.
- Need GCS to install this like the other 2 CSI drivers.
- This would be handled by the GlusterFS CSI driver, but would need different options in `StorageClass` (a hedged sketch appears at the end of these notes)
- Updates from other projects.
- glusterfs: All OK
- glusterd2:
- nothing in the blocker area
- gluster-csi-driver:
- thin-arbiter support for RWO is pending.
- anthill:
- One critical PR, currently blocked on a CI failure, needs to get in (PR#56).
- After this, we can work in parallel.
- Need a lint ignore to be added in CI.
- Expected to be done by admins.
- gcs:
- loopback brick related PRs would be sent.
* Round Table
- Can we have a demo/session on operators and how it is done?
- Possible, and we can have it recorded?
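For the loopback-brick discussion above: a hypothetical sketch of how a `StorageClass` might select loopback-backed bricks through the GlusterFS CSI driver. The provisioner name and the `brickType` parameter are assumptions for illustration, not the driver's confirmed API.

```sh
# Hypothetical sketch only: parameter names and provisioner are assumed,
# not the GlusterFS CSI driver's confirmed interface.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-csi-loopback
provisioner: org.gluster.glusterfs   # assumed driver name
parameters:
  brickType: loopback                # hypothetical option
EOF
```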
------
# 2019-02-14
## Attendees
* Kotresh Hiremat, Poornima G, Amar Tumballi, Aravinda VK
## Updates from individual projects
* GlusterD2
- Experimental loopback brick support added
- 3 bugs related to RWO.
- txn framework timeout issues, no RCA yet
- cluster timeout is 3 seconds; a test patch is ready.
- Issue with Intelligent Volume Provisioning: a stale-brick issue. The approach to fix it is known; a patch would be posted by tomorrow.
- Need to increase the default size of the BHV: currently 5 GB; make the default >50 GB (or half of the available size).
- Also see if this can be provided as a StorageClass option.
* Gluster-CSI
- thin-arbiter BHV for RWO volumes is still in progress; need to refresh the PR.
* gluster-prometheus/gluster-mixins
* anthill
* GCS
- Python2 support should be made available for kubectl-gluster
- A must if we want to provide it on top of CentOS, which does not yet ship Python 3 by default.
* GlusterFS
- Branching done for glusterfs-6, csi/gcs can start depending on glusterfs-6 branch for images.
- Not sure if nightlies are built for the branch.
## Agenda
* AI from previous meeting:
- Release update email sent : [Link](https://lists.gluster.org/pipermail/gluster-devel/2019-February/055854.html)
- Do comment on it if something is pending.
* List of current blockers:
- [List in Waffle](https://waffle.io/gluster/gcs?label=GCS%2F1.0)
* Asks from github issues/emails:
- Can we use GCS on k8s 1.12.x? What changes do we need for that?
- Where is the quick-start guide link? Can it be sent as a response to the release update email?
- `gcs/deploy` README would be a good link.
* Can we plan for a demo next week?
-----
# 2019-02-07
Recording: https://bluejeans.com/s/jlDMF/
## Attendees:
* Amar, John Strunk, Madhu, Shubhendu, Kotresh, Aravinda, Poornima, Shyam, Humble, Sahina, Vishal
## Agenda:
- Should we call the next release v1.0?
- Are we promising any API compatibility?
- Expectation would be that it is completely stable.
- Currently using glusterfs master instead of a release branch
- ~~General thought is to move ahead w/ 0.6 as the release~~
- Or, do we view this like glusterfs w/ sequential numbering?
- **We will make this v1.0-pre**
- Actual changes (see below) are expected to be small
- What does this mean for payload of 1.0?
- When can it be ready and what do we need?
- Proposal: must be supportable moving forward:
- Upgrade
- Maintainable in production
- Hold 1.0 for ~1 mo
- Based on glusterfs 6
- Further testing
- Currently only testing on CentOS, but users will use other OSes
- Too much investigation required for removing LVM in this release
- Version 2.0: cattle-mode as big deliverable?
## Round Table
-----
# 2019-01-31
## Attendees:
Amar, Rohan Joseph, Anmol, Poornima
## Agenda:
* gcs v1.0: When is this?
- Pending: getting images and a round of end-to-end testing, after which we are good to continue with tagging.
* gluster-prometheus/mixins: good for v1.0 goal
- not much pending there
* anthill operator:
- some progress made toward agreeing on a framework.
* The document is very elaborate. Can we reduce it to 1-3 steps?
------
# 2019-01-24
Recording: https://bluejeans.com/s/aV1Px/
## Attendees
John Strunk, Amar, Kotresh, Poornima, Rohan Gupta, Satish
## Agenda/notes
- GCS 1.0 release status
- https://waffle.io/gluster/gcs?label=GCS%2F1.0
- glusterfs
- Excessive-logging fixes should now be in the nightly build, so GCS should be fine to consume it.
- Ready for 1.0!
- GD2
- thin arbiter friendly now (#1494)
- Disable shd - any updates?
- AI: check with team and get back (Amar)
- CSI
- Thin arbiter PR: #154
- #102 is not required now.
- Follow up on volume permissions: #141
- gluster-prometheus
- Follow up on status
- mixins
- All set?
- GCS repo
- Content for website?
- http://aravindavk.in/glustercs ?? Feedback welcome
- We would like blog content: experiences, getting started, etc.
- Talk about use cases in addition to just the technology
- Stretch clusters, hyperconvergence
- AI: Contact individuals responsible for blockers (jstrunk)
- Documentation
- CSI: Thin arbiter
- CSI: Volume permissions
- CSI: RWO loopback
- Other business
# 2019-01-17
Recording: https://bluejeans.com/s/EGVIB/
## Attendees
* John Strunk, Amar Tumballi, RTalur, Rohan Gupta, Siddarth Anupkrishna, Rohan Joseph, Shyam, Vijay Bellur
## Agenda/notes
- Review of GCS 1.0 blockers - [Waffle.io board](https://waffle.io/gluster/gcs?label=GCS%2F1.0)
- glusterfs
- 1.0 will depend on master
- prefer the log reduction patch in - https://review.gluster.org/22053
- Experimental code is now already taken out.
- op-version
- need to recheck if we are using any 'future' op-versions.
- future gfapi
- not critical for the v1.0 goals.
- Fencing? (Not critical)
- GD2
- Features:
- RWO w/ loopback devices (issue #1476 can be closed)
- Patch is merged in gd2
- RWX with loopback bricks (not currently tagged for 1.0)
- Still under review
- Disable shd (will be handled by client-side)
- Bugs:
- Modified options not showing up (review)
- Volume status reporting
- Volume stop failed
- CSI
- Features:
- loopback-based RWO volumes (needs to be tagged for 1.0)
- (Thin)Arbiter support
- Bugs:
- PV directory permissions (review)
- "UNKNOWN" capabilities (review)
- Don't log secrets (review)
- Missing version info in container
- Disallow ROX volume type
- Client `-o ro` works, but need to see what we need for ROX.
- Can consider `volume set read-only true` (see the sketch at the end of these notes)
- Anthill
- No operator deliverables for 1.0
- gluster-prometheus
- Features:
- Management interface
- Split brain count
- [@amarts] Prefer not to have it in v1.0, as it may cause some perf penalty on bricks.
- And with shd disabled, we have to see how this all works.
- Cluster id for metrics
- Bugs:
- Deleted volume metrics still showing up (is 137 the fix for 89?)
- gluster-mixins
- Dashboard update
- GCS repo
- Deployment guide (#117)
- *Any other business?*
- Blogs?
- Website?
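Regarding the ROX discussion above: a hedged sketch of making a volume read-only, assuming the classic gluster CLI syntax; GD2's `glustercli` may expose this differently, and the volume name is illustrative.

```sh
# Hedged sketch using the classic gluster CLI; GD2's glustercli may differ.
gluster volume set pvc-demo-volume read-only on   # volume name is illustrative
gluster volume get pvc-demo-volume read-only      # confirm the option value
```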
# 2019-01-10
Recording: https://bluejeans.com/s/6wgxZ/
## Attendees
Amar, Ankush, Aravinda, Jose, Madhu, Nithya, Rohan G, Amye, JohnStrunk, Anmol, Deepshikha, Atin, Shubhendu, Sidharth A, Rohan CJ, Umanga, Ju, Vijay B, Humble Chirammal
## Agenda/notes
- Discussion of outstanding blockers for GCS 1.0 release - GCS leads
- Goal: RWO & RWX data path usable
- Tag: GCS/1.0-Blocker (across all repos)
- GD2
- Can trim down current list of blockers
- Brick-mux
- Loop-based bricks
- (non-blocker) request rate throttling
- What scale do we need to achieve for 1.0?
- etcd issues
- "completed" state problem is resolved
- Lack of persistent storage from etcd-operator still an issue
- Will still work on lowering request size
- CSI-driver
- Driver updated to kube 1.13 and the 1.0.0-pre.0 release made available last week
- https://github.com/gluster/gluster-csi-driver/releases/tag/1.0.0-pre.0
- RWX
- In good state currently, no significant blockers
- RWO
- Current approach is to use client loopback instead of iSCSI stack
- gluster-block approach delayed for now
- Still WIP. Discussion on where block creation will happen.
- GD2 patch in progress for block creation: https://github.com/gluster/glusterd2/pull/1439
- Private branch has been in testing. Test data to be shared soon.
- 60+ PVs/min create
- \>3k PVs created
- Next week should be available publicly
- E2E tests:
- Not a priority for the CSI 1.0.0 or GCS 1.0.0 release.
- anthill
- Not blocking 1.0 release; will continue to use Ansible
- gluster-prometheus
- CPU/memory utilization exporters disabled due to resource usage
- Want to include cluster-level metrics
- gluster-mixins
- Still outstanding PR in GCS repo
- Basic dashboards pending, but in good shape
- Pending alerts: Will review candidate list and decide on ones to prioritize for 1.0
- glusterfs
- Nothing major pending
- FOSDEM
- Feb 2-3
- What can we show?
- Provide a demo of GCS 1.0!
- Atin will coordinate w/ Kaushal
- Deploying: Need a deployment guide... Blocker for 1.0.
- Publicizing on gluster.org
# 2019-01-03
Recording: https://bluejeans.com/s/6_A7@
## Attendees
Ankush, Ju, Jeff, John, Anmol, Deepshikha, Humble, Madhu, Nishanth, Sahina, Shubhendu, Sidharth, Umanga, Rohan G, Rohan CJ
## Agenda/notes
- Deploying gluster-mixins - Ankush
- Repo: https://github.com/gluster/gluster-mixins/
- Provides alerts and custom Grafana dashboards
- The operator uses libsonnet to call the mixin files
- The jsonnet bundler holds files that pull in the underlying mixins (e.g. the etcd mixin, the gluster mixin); it then pulls all resources from GitHub and compiles them (a rough sketch appears at the end of these notes). See https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus for more info.
- Cluster monitoring operator example: https://github.com/openshift/cluster-monitoring-operator/blob/master/pkg/manifests/manifests.go
- Proposal: put the libsonnet files into the anthill project (consistent with how native monitoring mixins are done)
- Would get pulled in from the mixin repo when the operator container is built
- Anthill would then apply the embedded yaml files when it starts
- Decoupling gluster-prometheus from gd2?
- We depend on more than just the REST endpoint to retrieve metrics (including /proc, etc.), which makes breaking the dependency difficult.
- The service name "gd2-client" is being exposed in metrics. Need to look into how to expose a better name to Prometheus.
- Other business; topics for next week?
- Ju suggested a demo of the latest GCS release
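For the gluster-mixins discussion above: a rough sketch of the jsonnet-bundler workflow, assuming a top-level `mixin.jsonnet` that imports the gluster mixin (file names and import paths are illustrative, not the exact gluster-mixins layout).

```sh
# Rough sketch of the jsonnet-bundler (jb) workflow; file names and import
# paths are illustrative.
jb init                                        # creates jsonnetfile.json
jb install github.com/gluster/gluster-mixins   # vendor the mixin and its deps
mkdir -p manifests
jsonnet -J vendor -m manifests mixin.jsonnet   # compile to per-resource JSON
ls manifests/                                  # rendered dashboards/alerts
```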