# SafeSkein

     
 
**Addressing Security Vulnerabilities in Yarn Package Dependencies**
## Problem Statement
> *`yarn` manages a large number of packages for a project. As the number of packages and their transitive dependencies keeps growing, ensuring that every one of them is secure has become a complex task.*

Security vulnerabilities in packages pose significant risks to projects, and detecting and mitigating them manually is laborious and error-prone. Developers often need to **traverse complex dependency graphs to find the source of a vulnerability and identify the version changes needed across all dependent packages**. This process is time-consuming, especially in large projects with many dependencies, and **carries a high risk of oversight**.

**The goals of this project:**

- [ ] **SafeSkein DevTool**: Develop an automated tool that accurately identifies vulnerable packages, suggests safe versions or patches *(when no safe version exists)*, and offers actions to apply the necessary changes quickly and efficiently. *This will help mitigate the existing vulnerabilities in our project.*
- [ ] **SafeSkein CI/CD**: Add a pipeline job that fails the pipeline whenever a new vulnerability is introduced by adding a vulnerable package (directly or transitively). *This will keep our project stable in the future.*

***This project will not only mitigate security risks but also save developers significant time and effort, and keep the project future-proof.***

---
<details>
<summary>
<b> Click here for implementation details </b><br/><br/>
</summary>

- [SafeSkein DevTool](#SafeSkein-DevTool)
  - [Problem modelling - Identification of safe versions](#Problem-modelling---Identification-of-safe-versions)
  - [Third-party APIs for vulnerability database queries](#Third-party-APIs-for-vulnerability-database-queries)
  - [Fetching all versions and dependencies](#Fetching-all-versions-and-dependencies)
- [SafeSkein CI/CD](#SafeSkein-CI/CD)

## SafeSkein DevTool
### Problem modelling - Identification of safe versions
> [color=#2c0093] We model this problem as a directed graph (tree) $T_g(V,E)$.
\
> [color=#2c0093] Each vertex $v \in V(T_g)$ represents an npm package $P_v$ with a fixed version and a fixed dependency chain. A package reached through two different dependency chains is therefore represented by two vertices, even when its version is the same. Simply put, shared dependencies are not merged.
\
> [color=#2c0093] $safeVersions[v]$ holds the versions of the package at $v$ that keep both the package itself and all of its dependencies safe.
\
> [color=#2c0093] Each $e \in E(T_g)$, written $e(x,y)$, is a directed edge $v_x \to v_y$. It means that package $P_x$ depends on $P_y$, that is, importing $P_x$ brings $P_y$ along with it.

```pseudocode!=
## Pseudocode - Safe version identification

getDependentSafeVersions(v_c, v_p):
    // Requires safeVersions[v_c] to be computed beforehand.
    // Follow the semantic versioning rules when comparing versions.
    // For each version of v_p, check which version of v_c it brings in.
    return: the versions of v_p that bring in v_c at a version lying in safeVersions[v_c]

computeSafeVersions(v, E):
    // Post-order traversal: recursively compute for the children first
    for v_c in children(v, E):
        computeSafeVersions(v_c, E)

    // Initialize with the safe versions of P_v itself.
    // This info will be fetched from the National Vulnerability Database (NVD), the Snyk API or the GitHub Security Advisory API.
    // API responses must be cached to avoid redundant calls.
    // For `@sprinklr` prefixed packages, assume all versions are safe for initialization.
    safeVersions[v] := vuln_db_api(P(v))

    // Traverse all children
    for v_c in children(v, E):
        // Intersection of the version sets
        safeVersions[v] := intersection(
            safeVersions[v],
            getDependentSafeVersions(v_c, v)
        )
```
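
As an illustration only, the pseudocode above can be sketched in Python on a toy tree. Everything here is an assumption made for the example: the `Vertex` class, the package names, and the hard-coded `own_safe` and `resolves` data, which stand in for the vulnerability database lookup and for SemVer range resolution. A parent version that does not list a child is excluded, which mirrors the pseudocode literally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Vertex:
    """One vertex of T_g: a package reached through one dependency chain."""
    name: str
    # Stand-in for the vulnerability database lookup: versions of this
    # package with no known vulnerability of their own (toy data).
    own_safe: Set[str]
    # resolves[version][child_name] -> child version that this version of
    # the package brings in (assumed to be already resolved; toy data).
    resolves: Dict[str, Dict[str, str]] = field(default_factory=dict)
    children: List["Vertex"] = field(default_factory=list)
    safe_versions: Set[str] = field(default_factory=set)


def get_dependent_safe_versions(child: Vertex, parent: Vertex) -> Set[str]:
    """Versions of `parent` that bring in `child` at a safe version."""
    return {
        parent_version
        for parent_version, deps in parent.resolves.items()
        if deps.get(child.name) in child.safe_versions
    }


def compute_safe_versions(vertex: Vertex) -> Set[str]:
    """Post-order traversal: children first, then intersect at the parent."""
    for child in vertex.children:
        compute_safe_versions(child)
    safe = set(vertex.own_safe)
    for child in vertex.children:
        safe &= get_dependent_safe_versions(child, vertex)
    vertex.safe_versions = safe
    return safe


# Hypothetical chain: app -> mid -> leaf, where leaf 1.0.0 is vulnerable.
leaf = Vertex("leaf", own_safe={"1.0.1", "1.1.0"})
mid = Vertex(
    "mid",
    own_safe={"2.0.0", "2.1.0", "2.2.0"},
    resolves={
        "2.0.0": {"leaf": "1.0.0"},
        "2.1.0": {"leaf": "1.0.1"},
        "2.2.0": {"leaf": "1.1.0"},
    },
    children=[leaf],
)
app = Vertex(
    "app",
    own_safe={"0.1.0", "0.2.0"},
    resolves={"0.1.0": {"mid": "2.0.0"}, "0.2.0": {"mid": "2.1.0"}},
    children=[mid],
)
compute_safe_versions(app)
```

In this toy data, `mid` 2.0.0 is clean by itself but is dropped because it pulls in the vulnerable `leaf` 1.0.0, and that in turn rules out `app` 0.1.0.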
### Third-party APIs for vulnerability database queries
- [Snyk API Docs](https://snyk.docs.apiary.io/)
- [GitHub Security Advisory API](https://docs.github.com/en/graphql/reference/objects#securityadvisory)
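
Whichever API is chosen, the pseudocode asks for responses to be cached and for `@sprinklr` packages to be treated as safe. A minimal sketch of that wrapper follows. The class name `SafeVersionLookup` and the `fetch_vulnerable` callable are hypothetical: the callable stands in for a real client of one of the APIs above and is assumed to return the vulnerable versions of a package.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set


class SafeVersionLookup:
    """Caches vulnerability lookups and applies the trusted-scope rule."""

    def __init__(
        self,
        fetch_vulnerable: Callable[[str], Iterable[str]],
        trusted_scope: str = "@sprinklr/",
    ) -> None:
        # fetch_vulnerable is a hypothetical adapter around a real API client.
        self._fetch = fetch_vulnerable
        self._trusted_scope = trusted_scope
        self._cache: Dict[str, FrozenSet[str]] = {}

    def vulnerable_versions(self, name: str) -> FrozenSet[str]:
        # Hit the remote API at most once per package name.
        if name not in self._cache:
            self._cache[name] = frozenset(self._fetch(name))
        return self._cache[name]

    def safe_versions(self, name: str, all_versions: Iterable[str]) -> Set[str]:
        # Packages in the trusted scope are assumed safe without any lookup.
        if name.startswith(self._trusted_scope):
            return set(all_versions)
        return set(all_versions) - self.vulnerable_versions(name)


# Usage with a fake fetcher that records how often it is called.
calls = []

def fake_fetch(name: str) -> Iterable[str]:
    calls.append(name)
    return ["1.0.0"]

lookup = SafeVersionLookup(fake_fetch)
first = lookup.safe_versions("some-package", ["1.0.0", "1.0.1"])
second = lookup.safe_versions("some-package", ["1.0.0", "1.0.1"])
internal = lookup.safe_versions("@sprinklr/some-internal", ["3.0.0"])
```

Keeping the fetcher pluggable means the cache and the scope rule do not depend on which vulnerability database ends up being used.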
### Fetching all versions and dependencies
- The npm registry URL should be configurable for private registries, with `https://registry.npmjs.org` as the fallback. Requests carry a header with either `'Authorization': 'Basic <base64 of username:password>'` or an authentication token. The JSON returned can be parsed for the following information:
- all versions of a package
```json
/* A modified example of the list of all react versions,
   parsed from https://registry.npmjs.org/react */
[
  ...
  "15.4.0-rc.1",
  "15.4.0-rc.2",
  "15.4.0-rc.3",
  "15.4.0-rc.4",
  "15.4.0",
  "15.4.1",
  "15.4.2",
  "16.0.0-alpha",
  "16.0.0-alpha.0",
  ...
]
```
- all dependencies (with version ranges) of each version of the package
```json
/* A modified example of the react version-to-dependencies map,
   parsed from https://registry.npmjs.org/react */
{
  ...
  "0.2.2": {
    "eventemitter2": "~0.4.1",
    "sprintf": "~0.1.1",
    "ensure-array": "~0.0.2"
  },
  "0.2.3": {
    "eventemitter2": "~0.5.0",
    "sprintf": "^0.2.3"
  },
  ...
}
```
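
The fetch-and-parse step above could look roughly like the following sketch. The helper names `build_request` and `parse_packument` are hypothetical, as is the `@scope/pkg` package name. The sketch assumes the registry document has a top-level `versions` object keyed by version string, each entry carrying an optional `dependencies` map, and that a token is sent as a `Bearer` header (a private registry may expect something else).

```python
import base64
import urllib.request
from typing import Dict, List, Optional, Tuple
from urllib.parse import quote

DEFAULT_REGISTRY = "https://registry.npmjs.org"


def build_request(
    package: str,
    registry: str = DEFAULT_REGISTRY,
    username: Optional[str] = None,
    password: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the request for a package's registry document."""
    # Scoped names keep the "@" while the "/" is percent-encoded.
    url = f"{registry.rstrip('/')}/{quote(package, safe='@')}"
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif username and password:
        # HTTP Basic auth expects base64("username:password").
        credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {credentials}"
    return urllib.request.Request(url, headers=headers)


def parse_packument(doc: dict) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Return (all versions, version -> dependency ranges) from the document."""
    manifests = doc.get("versions", {})
    versions = list(manifests)
    dependencies = {
        version: dict(manifest.get("dependencies", {}))
        for version, manifest in manifests.items()
    }
    return versions, dependencies


# Usage on an in-memory document shaped like the example above.
sample = {
    "versions": {
        "0.2.2": {"dependencies": {"eventemitter2": "~0.4.1", "sprintf": "~0.1.1"}},
        "0.2.3": {"dependencies": {"eventemitter2": "~0.5.0", "sprintf": "^0.2.3"}},
    }
}
versions, dependencies = parse_packument(sample)
request = build_request("@scope/pkg", username="user", password="pass")
```

Sending the request (for example with `urllib.request.urlopen`) and decoding the body with `json.load` is left out so that the sketch runs without network access.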
## SafeSkein CI/CD
</details>

---

## Reading Material

- [About npm audit reports](https://docs.npmjs.com/about-audit-reports)
- [Auditing package dependencies for security vulnerabilities](https://docs.npmjs.com/auditing-package-dependencies-for-security-vulnerabilities)
- [`yarn why`](https://yarnpkg.com/cli/why)
- [Semantic Version (SemVer) ranges](https://classic.yarnpkg.com/lang/en/docs/dependency-versions/)