Santiago Torres
# Guac-alytics meetings

:warning: slides and more [here](https://drive.google.com/drive/folders/1Ea21f5vJiTSPFlk1ZRlN9q9ve8jGAyap?usp=sharing&ts=63811bbb) :warning:

##### [Analysis Plan](https://docs.google.com/spreadsheets/d/18oAV8NWxfRHRec7k8CYpXToR_Ejgvj-RfG8cZiJklmA/edit?usp=sharing)

## Meeting 2025-10-01 (10:00 AM EST)

* Submit the abstract even if Sahiti does not have time.
* Plan a resubmission with a comparison using code dependencies (Satyam?).
* Focus on the interpretation of the metrics from an OSS point of view.

Todo:
- [ ] Do a review of papers for MSR: 1) work on vulnerability prediction; 2) work on network metrics in MSR. Add the 10 to 20 top papers to the .bib (Sahiti)
- [ ] Review code [guac-alytics](https://github.com/TSELab/guac-alytics) (Sabine, Satyam, Sahiti)
- [ ] Sahiti to push the last version of the code (Sahiti)
- [ ] Santiago to help fine-tune the interpretation of the metrics
- [ ] Submit abstract on Monday

@Santiago Torres Arias - I plan to fine-tune the abstract tomorrow evening and submit it. Could you also check the interpretation of the metrics from an OSS supply-chain point of view so that they convince the community?

@Satyam Mukherjee - we should connect briefly on Monday evening to discuss to what extent you can be involved in this paper too (without a GNN). Santiago and I discussed a range of opportunities. I personally would like to run a GNN on the data, but we also talked about comparing it with metrics that use functional/code dependencies, which we do not have right now. Adding this also to HM.

## Meeting 2025-10-08 (10:00 AM EST)

* Decision was made to drop the risk term!
* Framing more in the MSR realm
* Emphasize the data/mining and the networks and what they mean

Todo:
- [x] Review the paper: read the paper and the reviews [Link](https://docs.google.com/document/d/1_6iLgWb8ou8i5GtsXZPrRg0LuZv19P6EbuuZJ16zVfg/edit?tab=t.0) (All, Sahiti, Sabine, Santiago)
- [ ] Do a review of papers for MSR: 1) work on vulnerability prediction; 2) work on network metrics in MSR. Add the 10 to 20 top papers to the .bib (Sahiti)
- [ ] Review code [guac-alytics](https://github.com/TSELab/guac-alytics)
- [ ] Sahiti to push the last version of the code
- [x] Sabine to provide NVIDIA access and also access to the server
- [x] Both to review the methodology:
    - [x] Santiago to give comments by Monday, and we all meet again the week after, hoping that additional revisions are made!

## Meeting 2024-06-04 (10:00 AM EST)

### Attendees

Sahithi, Santiago, Sabine, Satyam

### Action Items

- [ ] Review the choice of the BFS algorithm vs. minimum spanning tree approaches. Check the literature for justification.
- [ ] Check the code to identify the issue causing a single boxplot and fix the visualizations. Separate by k-core values.
- [ ] Calculate correlations between network metrics and improve the outlier analysis. Consider scatter plots.
- [ ] Present the same analysis on source-phase data for comparison.
- [ ] Revisit the definition of the "propagation cost" term and justify the metric choices from a cybersecurity perspective.

### Notes

**Risk assessment metrics for cybersecurity packages.**
- Speakers discuss a document and a shared Overleaf document, with confusion over who added the other to the shared list.
- Speaker 3 explains the metrics used to analyze package popularity and interconnectedness.

**Cybersecurity metrics and visualizations.**
- Speakers discuss the use of BFS for attack radius and minimum spanning trees in cybersecurity.
- Speakers discuss data visualization and statistical analysis in a thesis defense.
- Speakers discuss the skewed distribution of attack-radius values in a box plot, with most packages having zero values.

**Network analysis and core decomposition with null values replaced.**
- Speaker 2 explains k-core decomposition, starting with a minimum degree of 1, as isolates can occur in the network.
- Speaker 3 imputes null values in a common data frame by replacing them with 0, for combined metric results.
- Analyze outliers in the data to identify interesting patterns.

**Analyzing package data for vulnerabilities.**
- Speakers discuss how to improve the outlier analysis for package delivery vulnerability.
- Speaker 3 identifies vulnerable packages in the outlier analysis.

**Risk metrics and outlier analysis in software engineering.**
- Speaker 1 suggests using "risk metrics" instead of "propagation" to measure cybersecurity risk, due to confusion and a lack of clarity in the definition.
- Speaker 1 seeks clarification on how to compare vulnerability data and the outlier analysis.
- Speaker 3 found 3004 out of 3539 packages with vulnerabilities in the top 15% of metrics.

**Analyzing vulnerability data in software development.**
- Speaker 1 suggests using the derivative to find the inflection point in skewed data (0:32:19).
- Speakers agree on doing a sensitivity analysis with 5%, 10%, and 50% quantiles (0:33:36).
- Speaker 3 finds that in the source phase, attack radius and path length values are similar, with the longest path length potentially being the shortest length at which a node can be reached.
- Speaker 1 finds that the vulnerability data set has higher risk metrics in the build stage.

**Analyzing vulnerability data using machine learning models.**
- Speakers discuss potential statistical analysis of vulnerability data, including Poisson and zero-inflated models.
- Speaker 2 mentions zero-inflated models and time series analysis, while Speaker 1 emphasizes the importance of showing temporal changes in the data.
- Speaker 3 tries different models, including logistic regression, random forest, and gradient boosting, to improve accuracy and precision for the target variable.

**Predicting vulnerability in software using risk metrics and metadata.**
- Speaker 1 explains how to predict vulnerability based on risk metrics, achieving 95% accuracy and a 97% F1 score.
- Speakers discuss using metadata metrics and stepwise modeling for cybersecurity risk analysis.
- Speakers 1 and 3 discuss related work and the popularity of packages in software development.

**Analyzing audio transcript to predict vulnerability in software development.**
- Speaker 1 finds that high popularity in downloads correlates with higher risk metrics.
- Speaker 1 proposes building a model to predict vulnerabilities 5 months in advance using time series analysis.
- Speaker 1 seeks feedback on analysis for music recommendation project, focusing on source and build separation.

**Interdisciplinary journal publication for cybersecurity research.**
- Speaker 3 plans to propose two research questions based on the paper's metrics and Sabine's propositions.
- Speaker 1 discusses publishing their work in an interdisciplinary journal, with potential collaboration with Satyam.
- Speaker 2 suggests considering journals like Journal of Network Science for publication, and offers to share their Purdue background for access.

## Meeting 2024-01-19 (11:00 AM EST)

### Attendees

Sahithi, Santiago, Jorge, Abhi

### Notes

-

## Meeting 2023-12-04 (3:00 PM EST)

### Attendees

Sabine, Sahithi, Vinh, Abhi

### Notes

- Change figure 1: increase the font size, reduce the figure size, and make it completely related to Debian.
- Make a log plot of build to publish packages.
- Propagation/diffusion time: go through some papers, define them properly, and use the same term.
- Rename "Publish Diffusion time" to "Ecosystem inclusion time".
- Sankey diagram: https://plotly.com/python/sankey-diagram/ and https://d3-graph-gallery.com/graph/sankey_basic.html
- For section 3, look into papers on Debian and add some background and insights about it.
- Do a lit review on Debian and categorize the papers according to supply-chain elements.

## Meeting 2023-10-16 (3:00 PM EST)

### Attendees

Sabine, Santiago, Sahithi, Vinh, Abhi

### Notes

- Run `ninka` on the repos for licenses.
- The research questions:
    - RQ0: Has the Debsources dataset fared well in light of new data and new data types? What information was missing from it?
    - RQ1: In what ways can we trace artifacts as they move through the chain?
    - RQ2: How are open source software artifacts built, fixed, and propagated through the chain?
- We need temporal data analysis for RQ0.

## Meeting 2023-10-16 (3:00 PM EST)

### Attendees

Sabine, Santiago, Sahithi, Jorge, Vinh, Abhi

### Notes

- https://2024.msrconf.org/track/msr-2024-technical-papers
- 14th November 2024 Abstract Deadline
- Work on the paper.

## Meeting 2023-10-11 (4:30 PM EST)

### Attendees

Sabine, Santiago, Sahithi, Jorge, Vinh, Abhi

### Notes

- Find papers on mean-time-to-repair, i.e., how long it takes to fix a bug (write down what they are doing: what the data sources are, and whether they use a ticket system or something else).

## Meeting 2023-10-02 (3:00 PM EST)

### Attendees

Sabine, Santiago, Sahithi, Sumukhi, Vinh, Abhi

### Notes

- We do both sloccount and cloc. We create the tables and plots for sloccount. Tables and plots with cloc are to be added as supplementary information at the end.
- Find at least 5 reasons why the tags in the source code do not match the published data. Compare the mean, median, and standard deviation.
- This will be added as findings in the propagation time and diffusion time section.

## Meeting 2023-09-25 (3:00 PM EST)

### Attendees

Sabine, Santiago, Sahithi, Jorge, Sumukhi, Vinh, Abhi

### Notes

- Add the releases; do not clone every tag.
- Clone only the repo structure so that, when we want to analyze it, we can trace back from it.
- Clone only repos from after 2017, using `--bare`.
- Match the tag names with versions. Give an overview of how many could be matched.
- Analyze license changes over time (trends toward more open or more closed licenses).

## Meeting 2023-09-18 (3:00 PM EST)

### Attendees

Sabine, Santiago, Sahithi, Jorge, Sumukhi

### Notes

- Update Ubuntu and everything else on tower.
- Rerun the code in the parsers and vulnerability_data branches and update the issue: https://github.com/TSELab/guac-alytics/issues/34
- Add Vinh to GitHub.
- Work on the issues https://github.com/TSELab/guac-alytics/issues/37 and https://github.com/TSELab/guac-alytics/issues/36.

## Meeting 2023-09-11 (3:00 PM EST)

### Attendees

Sabine, Santiago, Sahithi, Abhi, Vinh, Jorge, Sumukhi

### Notes

- Run the code on the git repo and update the comments in the pull request.
- Update the upstream_data table and add the changes to the pull request (store repo URL and tags/releases).
- Add Vinh to GitHub and Tower.
- Recheck the analysis on file size.
- Put the entries in the plots in sorted order.
- Compare the analysis with the Debsources paper.

## Meeting 2023-08-28 (3:00 PM EST)

### Attendees

Sabine, Sumukhi, Sahithi, Abhi

### Notes

- Clone TSELab/guac into opendigital.
- Clone the GitLab repos of the upstream code.
- We need to update the upstream table with releases and timestamps.
- Give the link of the package in the upstream table.
    - Discuss with Santiago linking it from salsa.
- Create EDA, Network Analysis, and Visualizations sub-folders in Analysis.

## Meeting 2023-08-22 (10:00 AM EST)

### Attendees

Sabine, Santiago, Sumukhi, Sahithi, Abhi

### Notes

- **The meeting was scheduled for 3:00 PM every Monday at YOUNG 360.**
- Based on the new data we acquired (buildinfos and upstream code), our objective is to validate the accuracy of the insights in the Debsources paper.
- The data we have is peculiar evidence.
- The Debsources paper lacks important information regarding the supply chain. With our data sources we can provide evidence to support our assertions.
    - To prove that, compare the metadata and insights from the Debsources paper (they have automated data which states that all the packages have the same dependencies).
    - We have to provide evidence that our data sources are more relevant.
- Comparative analysis of the overall timeline:
    - languages used (per update in repo)
    - LOC (per supply-chain effect: builds, and per update in repo)
    - file size (per supply-chain effect: builds, and per update in repo)
    - licenses (per supply-chain effect: builds, and per update in repo)
- Propagation time: how long it takes to go from source to build to publish (per repo)
    - scatter plots: repo vs. package, and time taken (source-build or build-publish) vs. LOC
    - distribution
    - correlation analysis
- Diffusion time: after a package is published, when it first appears as a dependency

#### AI

- Sahithi
    - [ ] Work on the comparative analysis.
    - [ ] Work on the propagation and diffusion times.
- Abhi
    - [ ] Work on the comparative analysis.
    - [ ] Work on the propagation and diffusion times.
- Sumukhi
    - [ ] Help Sahithi and Abhi with the analysis.
    - [ ] Replicate the analysis done in the Debsources paper (start with table creation).

## Meeting 2023-08-17 (10:00 AM EST)

### Attendees

Sabine, Santiago, Sumukhi, Sahithi, Abhi

### Notes

- Reviewed the tower issues and discussed how to resolve them, as well as disk space.
- Discussed the hashes of the upstream code.

#### AI

- Sabine
    - [ ] Review the pull requests and issues, and update them.
    - [ ] Add space to the tower.
- Santiago
    - [ ] Review the pull requests and issues, and update them.
- Sahithi
    - [ ] Replicate the Debsources dataset analysis and compare it with the buildinfos.
    - [ ] Analyze the behaviour of each version and get its timestamps.
    - [ ] Resolve the tower issue.
    - [ ] Organize the analysis outcomes into a designated folder within the Git repository and keep them up to date.
    - [ ] Get the average values for the packages for better analysis.
    - [ ] Create distribution plots once we get the final results.
- Abhi
    - [ ] Replicate the Debsources dataset analysis and compare it with the publish data.
    - [ ] Push the code to "guac-alytics" and to the tower.
    - [ ] Once the drive space has been increased, copy the Debsources data to the tower.
- Sumukhi
    - [ ] Replicate the analysis done in the Debsources paper (start with table creation).
    - [ ] Try to connect the results with Abhi's results.

## Meeting 2023-08-09 (10:00 AM EST)

### Attendees

Sabine, Sumukhi, Sahithi, Abhi

### Notes

- Reviewed preliminary results on code complexity and languages used.
- Assessed the open issues in the Git repository.
- Reviewed the timestamp and hash comparison between publish and buildinfos.
- Reviewed the distribution plots between publish and buildinfos.

#### AI

- Sabine
    - [ ] Review the pull requests and issues, and update them.
- Santiago
    - [ ] Review the pull requests and issues, and update them.
- Sahithi
    - [ ] Replicate the Debsources dataset analysis and compare it with the buildinfos.
    - [x] Create a table in the database on upstream code.
    - [ ] Analyze the behaviour of each version and get its timestamps.
    - [ ] Organize the analysis outcomes into a designated folder within the Git repository and keep them up to date.
    - [x] Connect upstream code analysis to buildinfos using hashes.
    - [ ] Get the average values for the packages for better analysis.
    - [ ] Create distribution plots once we get the final results.
- Abhi
    - [ ] Replicate the Debsources dataset analysis and compare it with the publish data.
    - [ ] Push the code to "guac-alytics" and to the tower.
    - [ ] Once the drive space has been increased, copy the Debsources data to the tower.
- Sumukhi
    - [ ] Replicate the analysis done in the Debsources paper (start with table creation).
    - [ ] Try to connect the results with Abhi's results.

## Meeting 2023-08-01 (2:00 PM EST)

### Attendees

Sabine, Santiago, Sahithi

### Notes

- Do an upstream code analysis, Debian and built.
- Basic source code metrics:
    - Lines of code (added & deleted)
    - Programming languages
    - CC metric (cyclomatic complexity)
    - Licenses
- Work on the analysis plan.

## Meeting 2023-07-31 (10:00 AM EST)

### Attendees

Jia, Sabine, Santiago, Sahithi

### Notes

- Discussed the analysis in the paper "The Debsources Dataset: two decades of free and open source software".
- Questions to ask ourselves while analysing the paper "The Debsources Dataset":
    - What is the duration from building the source code to publishing the package?
    - Can we verify the accuracy of their statements?
    - How well do their laboratory experiments align with real-time data?
- Issue with Jia cloning upstream repos such as https://tracker.debian.org/pkg/debpear

#### AI

- Sahithi
    - [ ] Read the paper "The Debsources Dataset: two decades of free and open source software"
    - [ ] Understand the data and find the gap with the supply chain
    - [ ] Come up with an analysis plan to work on for the paper.
    - [ ] Replicate the Debsources dataset analysis.
- Jia
    - [ ] Read the paper "The Debsources Dataset: two decades of free and open source software"
    - [ ] Create an additional column in the log file to capture the tracker link
    - [ ] We will run several passes for the repos that cannot be cloned
    - [ ] Faulty repos that we looked into include https://packages.debian.org/stable/django-cors-headers, https://packages.debian.org/stable/donfig, https://tracker.debian.org/pkg/debpear
- Abhi
    - [ ] Replicate the Debsources dataset analysis

## Meeting 2023-07-18 (1:00 PM EST)

### Attendees

Abhi, Jia, Sabine, Santiago, Sahithi

### Notes

- Discussed the analysis of Table 1 in the paper.
- Discussed cloning salsa.debian packages onto the tower.
- Provided an overview of different Linux commands and their usage.

#### AI

- Sahithi
    - [ ] Read the paper "The Debsources Dataset: two decades of free and open source software"
    - [ ] Reflect on their methodology and consider how we can replicate their process with our own data.
    - [ ] Work on the analysis in the paper and reflect on it.
    - [ ] Update the ER diagram after working on the upstream code table.
    - [ ] Add the research question or goal, the coverage of supply-chain elements, and the sample or dataset used to the table in the paper.
    - [ ] Work on the analysis and related work in the paper.
- Jia
    - [ ] Clone the salsa.debian repository and retrieve the required data to create the upstream code table.
    - [ ] Should use the git credentials
    - [ ] Work on the paper.
    - [ ] Read the paper "The Debsources Dataset: two decades of free and open source software"
- Abhi
    - [x] Write a paragraph about the data mining process for my tables.
    - [x] Move code to the main repo
    - [ ] Create a local copy of Sahithi's data and work on the analysis.
    - [x] Read the paper "The Debsources Dataset: two decades of free and open source software"

## Meeting 2023-07-10 (10:00 AM EST)

### Attendees

Abhi, Jia, Santiago, Sahithi

### Notes

- Discussed the analysis of the published and buildinfo tables.
- Discussed cloning salsa.debian packages onto the tower.

#### AI

- Sahithi
    - [ ] Clone the salsa.debian repository and retrieve the required data to create the upstream code table.
    - [ ] Update the ER diagram after working on the upstream code table.
    - [ ] Work on the analysis and related work in the paper.
- Jia
    - [ ] Clone the salsa.debian repository and retrieve the required data to create the upstream code table.
    - [ ] Work on the paper.
- Abhi
    - [ ] Rework some tables if needed and find reasons for the many duplicated entries.
    - [ ] Work on the analysis.
    - [ ] Work on a query to explore the finding that some buildinfo versions do not exist in the publish tables.
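
A minimal sketch of the two checks in Abhi's items above, using Python's built-in `sqlite3`. It assumes hypothetical `buildinfo(package, version)` and `publish(package, version)` tables; the real guac-alytics schema and column names may differ. An anti-join (`LEFT JOIN ... WHERE ... IS NULL`) lists buildinfo versions with no matching publish row, and `GROUP BY ... HAVING COUNT(*) > 1` surfaces duplicated entries. The rows are toy data, not project results.

```python
import sqlite3

# Hypothetical schema for illustration; adapt names to the actual database.
SCHEMA = """
CREATE TABLE buildinfo (package TEXT, version TEXT);
CREATE TABLE publish   (package TEXT, version TEXT);
"""

# Anti-join: (package, version) pairs built but never published.
MISSING_FROM_PUBLISH = """
SELECT DISTINCT b.package, b.version
FROM buildinfo AS b
LEFT JOIN publish AS p
  ON p.package = b.package AND p.version = b.version
WHERE p.package IS NULL
ORDER BY b.package, b.version;
"""

# Duplicate detection: (package, version) pairs stored more than once.
DUPLICATES = """
SELECT package, version, COUNT(*) AS n
FROM publish
GROUP BY package, version
HAVING COUNT(*) > 1
ORDER BY package, version;
"""


def build_demo_db():
    """Create an in-memory database filled with toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO buildinfo VALUES (?, ?)", [
        ("bash", "5.2-1"), ("bash", "5.2-2"), ("curl", "8.0-1"),
    ])
    conn.executemany("INSERT INTO publish VALUES (?, ?)", [
        ("bash", "5.2-1"), ("bash", "5.2-1"), ("curl", "8.0-1"),
    ])
    return conn


def missing_from_publish(conn):
    return conn.execute(MISSING_FROM_PUBLISH).fetchall()


def duplicated_publish_rows(conn):
    return conn.execute(DUPLICATES).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(missing_from_publish(conn))     # [('bash', '5.2-2')]
    print(duplicated_publish_rows(conn))  # [('bash', '5.2-1', 2)]
```

The same two queries should carry over to another SQL backend; only the connection setup would change.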
## Meeting 2023-07-03 (10:00 AM EST)

### Attendees

Abhi, Jia, Sabine, Santiago, Sahithi

### Notes

- Reviewed the ER diagram.
- Provided an overview of the tables and their analysis in the paper, highlighting the key points to be included.
- Discussed the publish_packages table.

#### AI

- Sahithi
    - [ ] Create the upstream code table.
    - [ ] Update the ER diagram after working on the upstream code table.
    - [ ] Work on the analysis and related work in the paper.
- Jia
    - [ ] Create the upstream code table.
    - [ ] Clone the salsa.debian repository and retrieve the required data.
    - [ ] Work on the paper.
- Abhi
    - [ ] Work on the analysis and related work in the paper.

## Meeting 2023-06-19 (10:00 AM EST)

### Attendees

Abhi, Jia, Sabine, Santiago, Sahithi

### Notes

- Reviewed the ER diagram.
- Explored salsa.debian to build the upstream code table and clarified doubts.

#### AI

- Sahithi
    - [ ] Create the upstream code table.
    - [ ] Update the ER diagram after working on the upstream code table.
    - [ ] Save and update the vulnerability data table whenever a new update is available.
    - [ ] Work on the paper.
- Jia
    - [ ] Create the upstream code table.
    - [ ] Clone the salsa.debian repository and retrieve the required data.
    - [ ] Work on the paper.
- Abhi
    - [ ] Work on the paper.

## Meeting 2023-06-12 (10:00 AM EST)

### Attendees

Abhi, Jia, Sabine, Santiago, Sahithi

### Notes

- Reviewed the vulnerability table in the database.
- Reviewed snapshot parsing and the publish table.

#### AI

- Sahithi
    - [x] Add a date column to the vulnerability table in the database.
    - [x] Update the ER diagram with the current database.
    - [ ] Work on the literature review: definitions of supply chain and activities.
- Jia
    - [ ] Work on the literature review: definitions of supply chain and activities.
    - [ ] Delve deeper into the database structure and work on the upstream code table.
- Abhi
    - [ ] Add a timestamp for when a binary was added to a snapshot
    - [ ] Add the DB to tower
    - [ ] Compare and analyze with the current DB in tower (buildinfo related)

## Meeting 2023-06-05 (10:00 AM EST)

### Updates

- Discussed the addition of the vulnerability table and upstream code table to the database.
- Provided feedback on the tables that were previously worked on.
- Gave input on adding pictures and tables to the paper.

## Meeting 2023-05-27 (8:00 PM EST)

### Updates

- Introduced new team members to the project.
- Prof. Santiago explained the software supply chain to Jia.
- Added new members to Zotero, Overleaf, and GitHub.
- Assigned tasks to Sahithi, Jia, and Abhi.
- Read the following research papers:
    - [Constant-factor approximation of near-linear edit distance in near-linear time](https://dl-acm-org.ezproxy.lib.purdue.edu/doi/abs/10.1145/3357713.3384282?casa_token=YNdIoEZYUhkAAAAA:9X_L0RuPQhQI2DmJk9X21uWPuNRVsT7M5pDbgwioP4f3L_O4WEpG5mFD-Bnv1RHLnI2oICUlWu2CfjU)
    - [An overview of distance and similarity functions for structured data](https://link-springer-com.ezproxy.lib.purdue.edu/article/10.1007/s10462-020-09821-w)

#### AI

- Sahithi
    - [ ] Explain the metadata that is collected and the scripts over at /guacalytics.
    - [ ] Give an analysis of path lengths.
    - [ ] Create tables focusing on the following topics:
        - [ ] Definitions of supply chains and included activities.
        - [ ] A literature review of empirical papers studying OSS supply chains, explaining what data is used, i.e., which elements of the supply chain are used.

## Meeting 2023-05-12 (2:30 PM EST)

### Updates

- Updates on the poster for the IC2S2 Conference.
- Discussion on the website.

#### AI

- Sahithi
    - [ ] Define the supply chain on the website based on the proposal, and change the citation format.
    - [ ] Define risk and add proper citations to the poster.
    - [ ] Draw a Debian supply chain diagram.
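
For the path-length analysis assigned above, one possible approach is a longest-dependency-chain computation that reuses results already computed for neighbouring nodes, the kind of reuse raised in the 2023-04-17 meeting. This is a minimal sketch under stated assumptions, not the project's implementation: the graph is a plain dict mapping a package to the packages it depends on (a hypothetical representation), and it must be acyclic. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+). Real Debian dependency graphs can contain cycles; in that case `graphlib` raises `CycleError`, and strongly connected components would have to be condensed first.

```python
from graphlib import CycleError, TopologicalSorter


def longest_path_lengths(deps):
    """Return {node: edges on the longest dependency chain starting at node}.

    Assumption (hypothetical representation): `deps` maps each package to the
    packages it depends on. The graph must be acyclic; graphlib raises
    CycleError otherwise.
    """
    # Copy the adjacency and add nodes that only ever appear as dependencies.
    graph = {node: set(children) for node, children in deps.items()}
    for children in deps.values():
        for child in children:
            graph.setdefault(child, set())

    lengths = {}
    # static_order() yields a node only after all of its dependencies, so each
    # node reuses the lengths already computed for the nodes it points to.
    for node in TopologicalSorter(graph).static_order():
        children = graph[node]
        lengths[node] = 1 + max(lengths[c] for c in children) if children else 0
    return lengths


if __name__ == "__main__":
    # Toy graph, not project data: app -> libfoo -> libc, app -> libbar.
    toy = {"app": ["libfoo", "libbar"], "libfoo": ["libc"], "libbar": []}
    lengths = longest_path_lengths(toy)
    print(lengths["app"], lengths["libfoo"], lengths["libc"])  # 2 1 0

    try:
        longest_path_lengths({"a": ["b"], "b": ["a"]})
    except CycleError:
        print("cycle: condense strongly connected components first")
```

Processing nodes in topological order computes each node's value exactly once, which avoids repeated traversals from every node on large graphs.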
## Meeting 2023-04-17 (5:30 PM EST)

### Updates

- Discussed path lengths and the need to dig deeper into the longest path lengths.
- Discussed the concept of swap memory.
- Discussed the possibility of optimizing the code by using the already calculated path lengths of the nodes connected to each node.

#### AI

- Sahithi
    - [x] Learn more about longest path lengths and try to connect them.
    - [x] Prepare slides on the current progress for future reference.
    - [ ] Try to connect the things learned in research papers to the current findings.

## Meeting 2023-04-10 (6:30 PM EST)

### Updates

- Reviewed the pull request and added comments.
- Prof. Sabine explained network metrics (k-core).

#### AI

- Sahithi
    - [x] Create a folder in the tower for the project at /data/yellow/guacalytics/.
    - [x] Address the issues on GitHub.
    - [x] Read the papers on core-periphery and summarize them.

## Meeting 2023-04-03 (6:30 PM EST)

### Updates

- Reviewed the website and discussed the graph.
- Reviewed the literature and provided a framework for arranging the papers.
- Discussed the research question.
- Discussed including case studies, answering the questions from the Debsources paper, and adding additional questions.
- Discussed the next steps for the second paper.
- Prof. Sabine explained network metrics, motifs, and propagation costs.
- Proposed integrating deep learning techniques into propagation graphs on core-periphery.

### Research Paper

- What data model is needed to reliably model OSS supply chains and their evolution?
- We are going to propose a graph-based approach, based on the .buildinfo data, to understand hidden influence in the supply chain.

#### AI

- Sahithi
    - [x] Calculate the path length of each node in the graph.
    - [x] Review more papers on network science.
    - [x] Define the different network metrics (in-degree, out-degree, coreness, path length, propagation cost) and add them to the literature review.
    - [x] Re-arrange the papers.
    - [x] Read papers shared by Prof. Sabine.
    - [x] Create issues on GitHub.
- Santiago
    - [ ] Do a daily pass over the paper
    - [ ] Read papers shared by Prof. Sabine.
- Sabine
    - [ ] Do a daily pass over the paper

## Meeting 2023-03-20 (6:30 PM ET)

AI

- Santiago:
    - [ ] Do a pass over paper 1
- Sahithi:
    - [ ] Schema for popcon
    - [ ] relwork

## Meeting 2023-02-13 (6:30 PM ET)

### Updates

- Reviewed progress on data validation and ER diagrams.
- Discussed the updated results for the plots (see AI below).
- Discussed data structures and the database structure.

AI

- Sahithi:
    - [ ] Upload the ER diagram with the update
    - [ ] Test the scripts with an artificial dataset

## Meeting 2023-02-06 (6:30 PM ET)

AI:

- Sahithi:
    - [x] Data validation (check for duplicates)
    - [x] Make a markdown file in the repo with a description of the database:
        - [x] Make an ER diagram of the database
    - [x] Debug why the plot has the version on the label (is it pulling something from the dependency instead of the source?)
    - [x] Once you have found the bug:
        - [x] look at the bug, and
        - [x] also see whether there are inconsistencies.
        - [ ] Ask yourself: does this make sense?
        - [ ] Then interpret it: what does, e.g., in-degree mean?
    - [ ] Coreness metrics (but do the above first)
- Santiago:
    - [x] Write up the introduction for the paper (S&P mag?)
- Sabine
    - [ ] Run the code together and figure out the next steps in the analysis.
    - [ ] We can always go back to the discussions from Vineet to see what we can glean/learn from them.

## Meeting 2022-12-14 (2:00 PM ET)

### Attendees

Santiago, Sahithi, Sabine

### Updates

- Sahithi:
    - Updated the maintainers table
- Sabine:
    - Read the Debsources dataset

### Brainstorming of research questions

- How can we better represent the Debian supply chain in such a way that critical gaps in the supply chain (such as building, localizing, debianizing) are not overlooked?
    - What is the role of graphs in this dataset? (see below)
    - Are there any fundamental misrepresentations due to faulty models in previous work?
    - The faithfulness of the model will help us understand predictability.
- Supply chain, key issue: the diffusion or propagation of information, and the best way to represent that in data structures. Through relational data (i.e., through graphs) we can do this.
    - It's not about correlation, but about relation (i.e., direct causality).
    - With this hook, we can follow up.
- We need to decide how to present the data "framework" (descriptive model).

Paper outline:

1. Introduction:
    * Motivation/hook
    * Gap
    * Why does it matter: cybersecurity research is about predicting/detecting and explaining vulnerabilities
    * Research question: What data model is needed to successfully predict vulnerabilities in the supply chain?
2. A framework/data model of an OSS supply chain:
    * Introduction of the OSS supply-chain model (conceptual) that is relational, using Debian as an example
    * Deriving a data model and its properties to successfully represent it
3. Analysis of existing work in representing the OSS supply chain (implications for predictive and explanatory modeling questions of cybersecurity)
5. Descriptive statistical analysis using the framework/data model
    * Analysis based on built graphs
6. Discussion
7. Future work

### AI:

- Santiago:
    - Pass over introduction
    - Add related work to Zotero
    - (fix bug in matplotlib, blocked on tower)
    - Are there any graph-based database papers that relate to this?
- Sahithi:
    - Help review related work
    - Keyword-related search specifically on MSR using the following keywords:
        - Open source software
        - Supply chain
        - Software vulnerabilities
    - Dig up other keywords for prior work
- Sabine:
    - Help with ensuing sections

## Meeting 2022-12-06 (2:00 PM ET)

### Attendees

Santiago, Sahithi, Sabine

### Updates

- Sahithi:
    - Updates on her PR and structure, function definitions, etc.
    - Description of the module and repository structure.
The main functionn is the main antrypoint ### Notes - Maintainer Discussion: - what is the relationship betweennn a maintainer a team. What is the maintainer of a repo? - The schema for the maintainer table is including package information, and it is really not describing information about the maintainer. Tracked on new issue https://github.com/TSELab/guac-alytics/issues/8 - Doubts: - numpy + matplotlib is failing somewhere here - we wonder whether this is happening at the interaction between two modules - Paper discussion: - the debsources dataset paper: they don't consider the buildinfo files (they didn't exist back when their dataset was being collected) - Zotero: - we have a new zotero library for this work: https://www.zotero.org/groups/4883577/gual-ytics ### AI: - [ ] Github project for the database creation: - [ ] Task 1: complete rebuild of the database - [ ] Task 2: rebuild the graph - [ ] Task 3: data validation and - [ ] Task 4: read papers - [ ] Task 5: Research questions for the ESE/MSR paper - [ ] Sahithi + Sabine: create a new overleaf for the MSFR paper (share with Santiago) - [ ] Sahithi: add a separate document to talk about the papers you read on the overleaf for the paper: - [ ] Add summary of debsources paper - [ ] Add summary of the network evolution paper - [ ] Santiago: check bug with the renderer on matplotlib ## Meeting 2022-11-30 (2:00 PM ET) ### Attendees Sahithi Santiago ### Notes 1. Go over open PR: 1. Small nits to correct re: change constant to a var/arg 2. perhaps create a constants.py file 3. TODO: Sahithi will update PR 2. Discuss Setting up Testing harness 1. Santiago will setup basic packaging infrastructure (on a new PR) 2. Follow-up PR will set up base testing harness 3. TODO: create those PR's (Santiago) 3. Activity: 1. learn about PRs 2. 
Learn about linking issues to PRs and such Discuss reading the Debian build dataset since the work is related (and it was published on MSR) Look at the last 4 years of MSR to find relevant papers on the conference: MSR 2023: https://conf.researchr.org/home/msr-2023 Discussed errors when plotting/pipelining the data. We may need to checkpoint the constructed graph in memory. Possible candidates are pickle/json/gefx ## Meeting 2022-11-25 (2:10 PM EST) ### Attendees Sabine Santiago ### Notes TODO: post the link to the gdrive (at the top) - First goal: - define an ontology/properties of the graph - what is a node? - what is an edge? - TODO: update this in the terminology repo - Regarding node types - we had discussed three types of nodes - Identities (Developer, automated system, an institution) - Software Artifacts (Repository,Package,Docker Container) - Metadata (e.g. Build info file, ) - we also discussed types of edges represent *information flow* or *associations* - Actions (from to...) - Association/similarities - If we connect it to the supply-chain: https://docs.google.com/presentation/d/1FKthyyVpaDAtYtiiHWIv-lM3RIYWilLE-Bn8NZQ6vEY/edit and some earlier modeling: https://docs.google.com/presentation/d/1AOLv85UUPoU6i2fUPgkWNy3rQLtf7N1PuBgKCMlhoME/edit#slide=id.g101f069a808_4_0 - work on: build-publish-install-use - MSR paper: what do we want to do? - clarify: from a mining point of view what's missing in the existing R - future research questions related to this if they capture that - is it only a lit review? - outline/start writing. We have all the literature, we just need to write this. ```graphviz digraph dg{ identitya -> sourcecode[label="association:producer"]; identityb -> package[label="association:producer"]; identityc -> image[label="association:producer"]; identityc -> identitya[label="association:organizational"]; sourcecode-> package[label="action:build"]; package-> image[label="action:package"]; } ``` - Project tracker: 1. 
MSR paper discussing this ontology and the dataset built using this 2. Academy of MGMT paper 3. ??? ### TODO - Santiago: work on outline folr the MSR paper - Sabine: start w/ slides - Sahiti: update with notes and sync w/ Santiago
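
The node and edge types above can be sketched in Python as a small typed edge list. This is a minimal sketch under stated assumptions, not the project's implementation: `NodeKind`, `EdgeKind`, `SupplyChainGraph` and `example_graph` are hypothetical names that do not come from the guac-alytics repository, the example edges simply reproduce the graphviz digraph, `in_degree` relates to the open "what does, e.g., in-degree mean?" question, and JSON stands in for one of the checkpoint formats (pickle/JSON/GEXF) raised in the 2022-11-30 meeting.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class NodeKind(str, Enum):
    IDENTITY = "identity"  # developer, automated system, institution
    ARTIFACT = "artifact"  # repository, package, Docker container
    METADATA = "metadata"  # e.g., a buildinfo file


class EdgeKind(str, Enum):
    ACTION = "action"            # information flow, e.g., build, package
    ASSOCIATION = "association"  # e.g., producer, organizational


@dataclass
class SupplyChainGraph:
    # Hypothetical container: node name -> kind, and (src, dst, kind, label) edges.
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, name: str, kind: NodeKind) -> None:
        self.nodes[name] = kind

    def add_edge(self, src: str, dst: str, kind: EdgeKind, label: str) -> None:
        # Both endpoints must already exist as typed nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in {src} -> {dst}")
        self.edges.append((src, dst, kind, label))

    def in_degree(self, kind: Optional[EdgeKind] = None) -> Counter:
        # Incoming edges per node, optionally restricted to one edge kind.
        counts = Counter({name: 0 for name in self.nodes})
        for _src, dst, edge_kind, _label in self.edges:
            if kind is None or edge_kind == kind:
                counts[dst] += 1
        return counts

    def to_json(self) -> str:
        # Checkpoint as plain JSON (pickle or GEXF would be alternatives).
        return json.dumps({
            "nodes": {name: kind.value for name, kind in self.nodes.items()},
            "edges": [[s, d, k.value, label] for s, d, k, label in self.edges],
        })

    @classmethod
    def from_json(cls, text: str) -> "SupplyChainGraph":
        # Rebuild a graph from a checkpoint produced by to_json().
        data = json.loads(text)
        graph = cls()
        for name, kind in data["nodes"].items():
            graph.add_node(name, NodeKind(kind))
        for src, dst, kind, label in data["edges"]:
            graph.add_edge(src, dst, EdgeKind(kind), label)
        return graph


def example_graph() -> SupplyChainGraph:
    # Same nodes and edges as the graphviz digraph in the notes.
    g = SupplyChainGraph()
    for name in ("identitya", "identityb", "identityc"):
        g.add_node(name, NodeKind.IDENTITY)
    for name in ("sourcecode", "package", "image"):
        g.add_node(name, NodeKind.ARTIFACT)
    act, assoc = EdgeKind.ACTION, EdgeKind.ASSOCIATION
    g.add_edge("identitya", "sourcecode", assoc, "producer")
    g.add_edge("identityb", "package", assoc, "producer")
    g.add_edge("identityc", "image", assoc, "producer")
    g.add_edge("identityc", "identitya", assoc, "organizational")
    g.add_edge("sourcecode", "package", act, "build")
    g.add_edge("package", "image", act, "package")
    return g


if __name__ == "__main__":
    restored = SupplyChainGraph.from_json(example_graph().to_json())
    print(restored.in_degree()["image"])                 # 2
    print(restored.in_degree(EdgeKind.ACTION)["image"])  # 1
```

Keeping the edge kind on every edge lets in-degree be split by kind: in the example, `image` has an in-degree of 2 overall but only 1 along `action` edges, which separates who produced an artifact from the build/package flow that feeds it.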
