# ScopChat Loki
## Summary
- [ ] **Provide a high level summary of this new product feature. Explain how this change will benefit ScopServ customers. Enumerate the customer use-cases.**
- [ ] **What metrics, including business metrics, should be monitored to ensure this feature launch will be a success?**
## Architecture
- [ ] **Add architecture diagrams to this issue of feature components and how they interact with existing ScopServ components. Make sure to include the following: Internal dependencies, ports, encryption, protocols, security policies, etc.**
- [ ] **Describe each component of the new feature and enumerate what it does to support customer use cases.**
- [ ] **For each component and dependency, what is the blast radius of failures? Is there anything in the feature design that will reduce this risk?**
- [ ] **If applicable, explain how this new feature will scale and any potential single points of failure in the design.**
### Deployment
## Operational Risk Assessment
- [ ] **What are the potential scalability or performance issues that may result with this change?**
- [ ] **List the external and internal dependencies of the application (e.g. Redis, Postgres) for this feature and how the service will be impacted by a failure of each dependency.**
- [ ] **Were there any features cut or compromises made to make the feature launch?**
- [ ] **List the top three operational risks when this feature goes live.**
- [ ] **What are a few operational concerns that will not be present at launch, but may be a concern later?**
- [ ] **Can the new product feature be safely rolled back once it is live? Can it be disabled using a feature flag?**
- [ ] **Document every way the customer will interact with this new feature and how customers will be impacted by a failure of each interaction.**
- [ ] **As a thought experiment, think of worst-case failure scenarios for this product feature. How can the blast radius of each failure be isolated?**
## Database
- [ ] **If we use a database, is the data structure verified and vetted by the database team?**
- [ ] **Do we have an approximate growth rate of the stored data (for capacity planning)?**
- [ ] **Can we age data and delete data of a certain age?**
## Security and Compliance
- **Are we adding any new resources of the following type? (If yes, please list them here or link to a place where they are listed)**
- [ ] **AWS Accounts/GCP Projects**
- [ ] **New Subnets**
- [ ] **VPC/Network Peering**
- [ ] **DNS names**
- [ ] **Entry-points exposed to the internet (Public IPs, Load-Balancers, Buckets, etc...)**
- [ ] **Other (anything relevant that might be worth mentioning)**
- **Secure Software Development Life Cycle (SSDLC)**
- [ ] **Is the configuration following a security standard? (CIS is a good baseline for example)**
- [ ] **All cloud infrastructure resources are labeled according to the [Infrastructure Labels and Tags](https://about.ScopServ.com/handbook/infrastructure-standards/labels-tags/) guidelines**
- [ ] **Were the [ScopServ security development guidelines](https://docs.ScopServ.com/ee/development/secure_coding_guidelines.html) followed for this feature?**
- [ ] **Do we have an automatic procedure to update the infrastructure (OS, container images, packages, etc...)**
- [ ] **Do we use IaC (Terraform) for all the infrastructure related to this feature? If not, what kind of resources are not covered?**
- [ ] **Do we have secure static code analysis tools ([kics](https://github.com/Checkmarx/kics) or [checkov](https://github.com/bridgecrewio/checkov)) covering this feature's terraform?**
- **If there's a new terraform state:**
- [ ] **Where is the terraform state stored, and who has access to it?**
- [ ] **Does this feature add secrets to the terraform state? If yes, can they be stored in a secrets manager?**
- **If we're creating new containers:**
- [ ] **Are we using a distroless base image?**
- **Do we have security scanners covering these containers?**
- [ ] **`kics` or `checkov` for Dockerfiles for example**
- [ ] **[ScopServ's container scanner](https://docs.ScopServ.com/ee/user/application_security/container_scanning/#configuration) for vulnerabilities**
- **Identity and Access Management**
- [ ] Are we adding any new forms of **Authentication** (New service-accounts, users/password for storage, OIDC, etc...)?
- [ ] **Does it follow the least privilege principle?**
- **If we are adding any new Data Storage (Databases, buckets, etc...)**
- [ ] **What kind of data is stored on each system? (secrets, customer data, audit, etc...)**
- [ ] **How is data rated according to our [data classification standard](https://about.ScopServ.com/handbook/engineering/security/data-classification-standard.html) (customer data is RED)**
- [ ] **Is the data encrypted at rest? (If the storage is provided by a GCP service, the answer is most likely yes)**
- [ ] **Do we have audit logs on data access?**
- **Network security (encryption and ports should be clear in the architecture diagram above)**
- [ ] **Firewalls follow the least privilege principle (w/ network policies in Kubernetes or firewalls on cloud provider)**
- [ ] **Is the service covered by any DDoS protection solution? (GCP/AWS load-balancers or Cloudflare usually cover this)**
- [ ] **Is the service covered by a WAF (Web Application Firewall)?**
- **Logging & Audit**
- [ ] **Has effort been made to obscure or elide sensitive customer data in logging?**
- [ ] **Ensure we are keeping required access and audit logs for compliance, and only what is necessary.**
- **Compliance**
- [ ] **Ensure appropriate logs are being kept for compliance and that retention requirements are met.**
- [ ] **Is the service subject to any regulatory/compliance standards? If so, detail which and provide details on applicable controls, management processes, additional monitoring, and mitigating factors.**
- [ ] If the data classification = Red for the new environment, please create a [Security Compliance Intake issue](https://ScopServ.com/ScopServ-com/gl-security/security-assurance/security-compliance-commercial-and-dedicated/security-compliance-intake/-/issues/new?issue[title]=System%20Intake:%20%5BSystem%20Name%20FY2%23%20Q%23%5D&issuable_template=intakeform).
## Performance
- [ ] **Explain what validation was done following ScopServ's [performance guidelines](https://docs.ScopServ.com/ee/development/performance.html). Please explain which tools were used and link to the results below.**
- [ ] **Are there any potential performance impacts on the database when this feature is enabled at ScopServ.com scale?**
- [ ] **Are there any throttling limits imposed by this feature? If so how are they managed?**
- [ ] **If there are throttling limits, what is the customer experience of hitting a limit?**
- [ ] **For all dependencies external and internal to the application, are there retry and back-off strategies for them?**
- [ ] **Does the feature account for brief spikes in traffic, at least 2x above the expected TPS?**
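
As a reference point for the retry and back-off question above, here is a minimal sketch of exponential back-off with full jitter. The function name `retry_with_backoff`, its parameters, and the choice of retryable exceptions are illustrative assumptions, not part of any ScopServ library; a real client should retry only errors known to be transient.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `call()` until it succeeds or `max_attempts` is reached.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential growth, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: pick a random delay in [0, cap] to avoid
            # synchronized retries from many clients.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable so the behaviour can be exercised in tests without waiting on real delays.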
## Backup and Restore
- [ ] **Outside of existing backups, is there any other customer data that needs to be backed up for this product feature?**
- [ ] **Are backups monitored?**
- [ ] **Was a restore from backup tested?**
## Monitoring and Alerts
- [ ] **Is the service logging in JSON format and are logs forwarded to logstash?**
- [ ] **Is the service reporting metrics to Prometheus?**
- [ ] **How is the end-to-end customer experience measured?**
- [ ] **Do we have a target SLA in place for this service?**
- [ ] **Do we know what the indicators (SLI) are that map to the target SLA?**
- [ ] **Do we have alerts that are triggered when the SLIs (and thus the SLA) are not met?**
- [ ] **Do we have troubleshooting runbooks linked to these alerts?**
- [ ] **What are the thresholds for tweeting or issuing an official customer notification for an outage related to this feature?**
- [ ] **Do the on-call rotations responsible for this service have access to this service?**
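
For the JSON logging question above, one way a service could emit one JSON object per log line is sketched below using only the Python standard library. The field names, the `JsonFormatter` class, the `build_logger` helper, and the logger name are assumptions for illustration; the actual schema expected by the log pipeline is not specified in this template.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (illustrative schema)."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to `stream` (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]   # replace any handlers from earlier setup
    logger.propagate = False      # avoid duplicate plain-text output via root
    return logger


buf = io.StringIO()
log = build_logger("scopchat.loki.example", buf)
log.info("user %s connected", "alice")
```

Each line written to the stream can then be parsed with `json.loads`, which is what a log forwarder would rely on.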
## Responsibility
- [ ] **Which individuals are the subject matter experts and know the most about this feature?**
- [ ] **Which team or set of individuals will take responsibility for the reliability of the feature once it is in production?**
- [ ] **Is someone from the team who built the feature on call for the launch? If not, why not?**
## Testing
- [ ] **Describe the load test plan used for this feature. What breaking points were validated?**
- [ ] **For the component failures that were theorized for this feature, were they tested? If so include the results of these failure tests.**
- [ ] **Give a brief overview of what tests are run automatically in ScopServ's CI/CD pipeline for this feature.**