# Analysis of Vulnerability Data - BoF @GSVS
## Moderator
Brandon Lum
## Attendees
Shripad
Emily
Brandon
Josh Buker
Allan Friedman
Art Manion
Paul Scarrone
Jonathan Leitschuh
CRob
Jamie Magee
GH folks
Christopher Turner
## Discussion Areas
* Completeness of the data
* Usefulness (missing content)
* Any data science needs
## Summary
_a tl;dr of the discussion_
## Notes
What are we defining as vulnerability data? The CVSS score, packages, versions, how you fix it, the CWE; what all do you see as data around vulns?
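As a rough sketch of those data points bundled into one record (the field names here are illustrative for discussion, not any standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class VulnRecord:
    """Illustrative vulnerability record; field names are hypothetical."""
    id: str                                            # e.g. a CVE or OSV identifier
    cvss_score: float                                  # severity score
    cwe: str                                           # weakness class, e.g. "CWE-79"
    packages: list[str] = field(default_factory=list)  # affected packages
    versions: list[str] = field(default_factory=list)  # affected versions
    fix: str = ""                                      # how you fix it (patched version, workaround)
```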
We have a lot of derived data points that can be used for lots of analysis. How did those derived data points come to be, and how accurate are they? Many sources are black boxes that don't tell you.
What are some of the sources of bad data? Is it because CERTs are not doing their job, or are there biases coming from how the data is generated?
Are there systemic biases? Why don't we have all the fields that we want?
Are there biases, or are the tools insufficient to accept the data?
Everyone is driven to do this for different reasons, and everyone wants to jump to results.
Emily's question: What fields would allow us to better analyse the information presented? What kinds of things do we want out of the data?
At minimum: what is affected and which versions are affected, and both must be expressed in a machine-readable way. If we can tie vulnerability management to open source development workflows, and every vulnerability had automation to get the range from when the vuln was introduced to when it was fixed, there are a lot of ways to build automation on top of that.
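The OSV schema (https://ossf.github.io/osv-schema/) is one existing way to express "what is affected and which versions" machine-readably, as introduced/fixed events; a minimal sketch with invented values:

```python
# Minimal OSV-style "affected" entry; package name and versions are made up.
osv_affected = {
    "package": {"ecosystem": "PyPI", "name": "example-lib"},
    "ranges": [
        {
            "type": "SEMVER",
            "events": [
                {"introduced": "1.2.0"},  # version that introduced the vuln
                {"fixed": "1.4.1"},       # first version containing the fix
            ],
        }
    ],
}
```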
Idea: match the commit(s) in open source to the vuln to tie it to a clean fix.
If we run it on an SBOM, we will get a lot of information. VEX helps in this way; having that data would make it possible to automatically create these VEX-type documents.
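A hedged sketch of what such an automatically generated statement might look like, roughly following the OpenVEX shape (identifiers are invented; see https://openvex.dev for the actual spec):

```python
vex_doc = {
    "@context": "https://openvex.dev/ns/v0.2.0",
    "author": "Example Maintainers",  # hypothetical
    "statements": [
        {
            "vulnerability": {"name": "CVE-2023-12345"},          # invented ID
            "products": [{"@id": "pkg:pypi/example-lib@1.3.0"}],  # invented purl
            # one of: affected, not_affected, fixed, under_investigation
            "status": "affected",
        }
    ],
}
```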
In most cases, a CVE fix is not a single commit but multiple commits.
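As a sketch, listing every commit between the introducing and fixing points is one way to capture a multi-commit fix (the tags are invented; assumes a local clone of the repo):

```python
import subprocess

# Commits between the version that introduced the vuln and the fixed version.
log = subprocess.run(
    ["git", "log", "--oneline", "v1.2.0..v1.4.1"],  # invented tags
    capture_output=True, text=True, check=True,
)
print(log.stdout)  # a CVE fix often spans several of these commits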
Many reporters have no incentive to give us good data. This needs to change; how do we change it?
Thoughts on what matters in the data:
- Am I affected?
- How bad is it?
- How can I fix it?
- What kind of vuln is it, and is it easy to understand?
Adding financial incentives around good data: paying bounties for writing up good reports that include x, y, z information, to provide a monetary incentive.
People don't like the answer that a CVE exists but there is no patched version.
A lot of companies create new versions that still have the vuln, i.e. they deprecate the method but don't fix the vuln.
Part of it is incentives; another part is making the process easier. Making it easy is itself an incentive to submit data.
A possibly fundamental issue is creating a format, and it boils down to similar things.
CVE description fields: what is the machine-readable version of "local privilege escalation"? That language problem may not be totally solved.
Human experts can sort of understand it, but it is hard to encode. Thus, slow progress.
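One partly machine-readable encoding that already exists is the CVSS vector string, where "local privilege escalation" surfaces as AV:L plus high impact metrics, though it captures severity rather than the description itself. A small parsing sketch (the vector is illustrative, not taken from a real CVE):

```python
def parse_cvss_vector(vector: str) -> dict[str, str]:
    """Split a CVSS v3.x vector string into metric/value pairs."""
    parts = vector.split("/")
    return dict(p.split(":", 1) for p in parts[1:])  # parts[0] is "CVSS:3.1"

metrics = parse_cvss_vector("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")
assert metrics["AV"] == "L"  # local attack vector
```

Even so, the vector encodes impact, not language; the description problem remains.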
With all the information: if we fill the gaps, what can we do with it?
Machines need to understand it.
There is a general lack of knowledge around this kind of information among those who want to give it. Maintainers in my community are well-intentioned and supportive, but don't know who to contact and what information to give. (GitHub)
Can we add call graph info into the IDs? Is this the right thing to do?
The data is encodable, but I'm not sure about the utility.
Data has to be very certain in order to be useful, and getting that certainty is very hard.
Can we express certainty to a given degree of confidence?
Getting the call graph accurate is difficult, but having it even 50% accurate would be helpful as well.
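As a sketch of what even a rough, name-based reachability check could look like, using Python's standard ast module (the vulnerable function name is hypothetical, and this deliberately ignores aliasing, methods, and dynamic calls, which is exactly where the uncertainty lives):

```python
import ast

def called_names(source: str) -> set[str]:
    """Collect names of functions called directly in a module (very rough)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            names.add(node.func.id)
    return names

app_code = """
def handler(req):
    return vulnerable_parse(req.body)
"""
print("vulnerable_parse" in called_names(app_code))  # True -> possibly affected
```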
Sounds like templates of expected content are needed that can be easily converted to machine-readable formats. Perhaps doing this on the command line with a utility? (Just a note, thinking out loud.)
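Continuing to think out loud, a toy command-line utility along those lines might take the template fields as flags and emit machine-readable JSON (entirely hypothetical scaffolding, not an existing tool):

```python
import argparse
import json

parser = argparse.ArgumentParser(description="Emit a minimal machine-readable vuln report")
parser.add_argument("--id", required=True, help="vulnerability identifier")
parser.add_argument("--package", required=True, help="affected package")
parser.add_argument("--introduced", required=True, help="version that introduced the vuln")
parser.add_argument("--fixed", help="first fixed version, if known")
args = parser.parse_args()

print(json.dumps(vars(args), indent=2))
```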
Vuln: is it a vuln, what's fixed, what's affected?
Then the next level is which function is affected. But we can't even get the basic information right; better descriptions are good.