相關論文 - HackMD

# 相關論文 ###### 更新進度：https://hackmd.io/e4SB1AAoRUSfWoSMFOc36A ### Graph Neural Network 基礎介紹 - [A Comprehensive Survey on Graph Neural Networks](https://arxiv.org/pdf/1901.00596.pdf) ### embedding the data into grap(把資料轉換成圖) - [(論文)Sequence as Network: An Attempt to Apply Network Analysis to Sequence Analysis](https://drive.google.com/file/d/10JRk7K9ntNTNHWj8Uwvsk5_68BJVXnNL/view?usp=share_link) - 他這篇的主題跟我一樣，他分析的對象是義大利的男生跟女生在十年間的工作轉換狀況 ### Graph Embedding(把圖的資料取出的方法) - [(論文)KMer-Node2Vec: Learning Vector Representations of K-mers from the K-mer Graph](https://www.biorxiv.org/content/10.1101/2022.08.30.505832v2.full) - 這篇是用kmer+node2vec去將基因序列進行轉換的 - [(論文) Fast and Accurate Network Embeddings via Very Sparse Random Projection](https://arxiv.org/pdf/1908.11512v1.pdf) - [他們的github](https://github.com/GTmac/FastRP) - 這篇是說他比目前的deepwalk, node2vec都更快(27 times)能表達graph的node features，而且他們可以在維度降低的同時保存比較高order的相似性(We first describe the usage of very sparse random projection for dimension reduction and its merit in preserving high-order proximity.) - 他們認為在sparse graph中 nolink between two nodes 不代表他們沒有關係(Real-world graphs are usually extremely sparse, which means most of the entries in S are zero. However, the absence of edge between two nodes u and v does not imply that there is no association between them) ### Graph Clustering / Community detection - [(論文)Clustering Algorithms on Protein Interaction Networks](https://etd.lib.nctu.edu.tw/cgi-bin/gs32/hugsweb.cgi?o=dnthucdr&s=id=%22GH029962826%22.&searchmode=basic) - 這篇寫是將每個蛋白質的結構轉成圖（無向圖），再利用一個演算法去推算這個蛋白質跟另一個蛋白質之間會不會有交互作用，所以這邊會得到一個很大的圖之中又有子圖，子圖跟子圖之間有連線，再利用圖分群去把蛋白質分群 - 我的狀況跟這篇不同的點在於 1. 他的交互作用是推算的，我可能要找一個理由去建立這個關係 2. 他的蛋白質跟蛋白質之間是子圖跟子圖去連線的，所以我需要建立一個關係是建立在子圖跟子圖上的 3. 他的蛋白質結構是無向圖，我的玩家行為是有向圖 - [(論文)Detecting User Community in Sparse Domain via Cross-Graph Pairwise Learning](https://arxiv.org/pdf/2009.02637.pdf) - 這篇有利用Community Recurrent Unit(與GRU/ LSTM相似的結構，把內部改成community版本)去篩選出有用的community repersentation，另外他用的是Dynamic Directed Weighted Sparse Large Attribute Graph所以也許可以找到的素材很多 - [(論文)跟上一篇相同的人寫的，不過針對細節有很多描述](https://scholarworks.iu.edu/dspace/bitstream/handle/2022/25623/thesis.pdf?sequence=1&isAllowed=y) - 可能有用的細節： - Sparse Graph：這也就是我目前的狀況，因為如果把kmers的k提高的話就會造成每個點的degree很低，也不是strongly connected，在拓墣結構鬆散的狀況下不好進行後續分析，我的狀況就可能可以挑出dense graph做分析或是怎樣下面第一個是他的定義，第二個是他說Sparse Graph用在spectral clutering效果不好，他這邊利用的方式是non-backtracking walks（也就是random walk但不能回到上一個節點），然後轉換成這個non-backtraking walks的矩陣也有寫在[30]。 - A common notion isthat dense graph is with the number of edges as |E| = O(|V|^2) as the maximum number of edges in a directed graph is |V|(|V| − 1)/2. And a sparse graph usually contains the number of edges as |E| = O(|V|). - Due to the fact that spectral clustering methods are not able to perform well in sparse graphs,[30] ### Spectral analysis of large sparse graphs - [(論文)Spectral analysis of large sparse graphs](https://www.irif.fr/~steiner/jifp/salez.pdf) - 講比較散的圖怎麼先用分析 ### Similarity-based link prediction in social networks using latent relationships between the users - [(論文)Similarity-based link prediction in social networks using latent relationships between the users](https://www.nature.com/articles/s41598-020-76799-4) - 圖論的最大兩個分支是network reduction跟link prediction，也可以嘗試把兩個玩家的遊玩行為（或加上其他的標籤，讓單一玩家的序列資料可以變成一個更複雜的圖，再利用Sparse Distance挑出幾個degree比較高的點進行連線）最後把玩家的圖轉換成一個node的attribute，最後預測這個玩家會不會跟另外一個玩家有連線（**但需要定義一下這個連線的意義是什麼，以及為什麼這個link prediction between nodes的方法可以判斷出這個東西**） ### Distance between two directed graph - [(論文)Graph Kernels. S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt; 11(Apr):1201−1242, 2010](https://arxiv.org/pdf/0807.0093.pdf) - [(論文)Sato et al. Directed acyclic graph kernels for structural RNA analysis. BMC Bioinformatics 2008, 9:318 doi:10.1186/1471-2105-9-318](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2515856/) - [(論文)D-cores: Measuring Collaboration of Directed Graphs Based on Degeneracy](https://www.graphdegeneracy.org/dcores_ICDM_2011.pdf) - 這篇很難用到的數學很多 - [(論文)Directed clustering in weighted networks: A new perspective，這篇提出一個新的clustering coefficient去衡量DAG，但他是把node區分成in-node, middleman-node, out-node, cyclic-node 然後分別計算不同屬性的node分數再乘上某一個權重變成整個graph的分數](https://www.sciencedirect.com/science/article/pii/S096007791730509X) - [(論文)Analysis of large sparse graphs using regular decomposition of graph distance matrices](https://arxiv.org/pdf/1811.10470.pdf) - 他這邊有講到一個重點，就是如果做clustering的話就沒有ground truth，他的解法如下，另外他這邊使用的是undirected grahp，所以其他可以用的地方不多 - As a benchmark we consider the famous planted bipartition model [8]. It is a random graph and a special case of SBM. As ground truth, there are two communities of nodes with equal number of nodes in each and with two parameters. First parameter is the link probability between nodes inside each community and the second one, the link probability between nodes in different communities. The links are drawn randomly and independently from each other. For such a model, it is known that there is a sharp transition, or ’critical point’, of detectability of such a structure depending on the difference between the two parameters [8], [9]. The critical point is also located in the area of very sparse graphs, when expected degree is bounded. This example is suitable for testing our method because: having a ground truth, graph sparsity, bounded average degree and the proven sharp threshold. The preliminary results we report here, are promising. It seems that our algorithm is effective right up to the threshold in the limit of large scale. Moreover simulations indicate that such a structure can be found from very sparse and bounded size samples of the distance matrix. ### Gated Graph Neural Network - [Gated Graph Sequence Neural Networks](https://arxiv.org/abs/1511.05493) - [Gated Graph Sequence Neural Networks (GGNN) Github](https://github.com/chingyaoc/ggnn.pytorch) - [University of Toronto & Microsoft Research Cambridge Slide](https://www.cs.toronto.edu/~yujiali/files/talks/iclr16_ggnn_talk.pdf) ### Gated Graph Sequence Neural Network, - [Gated Graph Sequence Neural Network](https://arxiv.org/abs/1511.05493) - [Gated Graph Sequence Neural Networks 知乎](https://zhuanlan.zhihu.com/p/28170197) ### Sequential Recommendation - [Memory Augmented Graph Neural Networks for Sequential Recommendation](https://arxiv.org/abs/1912.11730) ## Other ### from bio-information - [Using graph theory to analyze biological networks](https://biodatamining.biomedcentral.com/articles/10.1186/1756-0381-4-10) - 做了一個很大的統整，但是是給bio-information的 - 挑出重要的節點：node sampling - Random walk是以node為開始，Total induced edge sampling是基於edge的random walk ### intro to Graph theory - [(Github)Graph簡介，CS領域](https://alrightchiu.github.io/SecondRound/graph-introjian-jie.html) - [(CSDN)基礎圖論跟一些indicator介紹](https://blog.csdn.net/qq_39297053/article/details/116889202) ### intro to graph neural network - [(知乎)Graph Neural Network Review](https://zhuanlan.zhihu.com/p/43972372) - [(Medium)閱讀筆記 : A Comprehensive Survey on Graph Neural Networks ( 簡介 + RecGNNs 篇)](https://z8663z.medium.com/%E9%96%B1%E8%AE%80%E7%AD%86%E8%A8%98-a-comprehensive-survey-on-graph-neural-networks-%E7%B7%A8%E8%BC%AF%E4%B8%AD-78118deae743) - [Graph-neural-network website](https://graph-neural-networks.github.io/) ### graph embedding - [(知乎)Graph Embedding 統整](https://zhuanlan.zhihu.com/p/64200072) - 最基礎演算法是"Deep Walk" that use both Breath First Search and Depth First Search BFS应该反映了结构性，DFS反而反应了同质性. - <font style color = 'red'>這邊問題點在於如果我用kmer把玩家的序列組在一起，沒有辦法有意義的解釋Link Prediction跟Clustering</font> - [(知乎)graph embedding必讀論文](https://zhuanlan.zhihu.com/p/58805184) - [(Video)妹子介紹word embedding + graph embedding](https://www.youtube.com/watch?v=oQPCxwmBiWo&ab_channel=Neo4j) ### find distance between graphs(from couldn't cite website) - [(Github)找Sparse Distance](https://github.com/jpata/SparseDistance) - [(Github)Friend_recommendation_model](https://github.com/akashkumar916/Friend_recommendation_model) - [(Github)Social network Graph Link Prediction - Facebook Challenge](https://github.com/UdiBhaskar/Social-Network-Graph-Link-Prediction/blob/master/Social%20network%20Graph%20Link%20Prediction%20-%20Facebook%20Challenge.ipynb) - 上兩篇都用的是directed graph，然後針對社群網站上的關係增加一些features(jaccard_followers, jaccard_followees, cosine_followers, cosine_followees)之類的再去用random forest或Deep Learning model 預測這個人跟另外一個人有沒有連結，在indicators的加入可能可以參考他的code ### kmer - [(知乎)kmer統整](https://zhuanlan.zhihu.com/p/528369807) - 裡面有提到kmer不適用在太複雜的基因序列之中 - [(Scipy doc.)Compressed sparse graph routines](https://docs.scipy.org/doc/scipy/reference/sparse.csgraph.html) - [(bilibili)Knowledge graph embedding](https://www.bilibili.com/video/BV1sV411b7Yk/) ### 不知道誰寫的networkx weighted centrality measures - [Link](https://splunktool.com/python-networkx-weighted-centrality-measures) ## Datasets ### Graph Dataset - [MovieLens 20M Dataset](https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset) - 資料的樣貌是給三個欄位，movieid1,movieid2與相關性 - bAbI dataset - [**standford dataset**](https://snap.stanford.edu/data/) - This website contains the graph dataset only - [kateto dataset（算是統整）](https://kateto.net/2016/05/network-datasets/) - [Pajek datasets](http://vlado.fmf.uni-lj.si/pub/networks/data/) - 都是圖的資料集 - [ucinetsoftware](https://sites.google.com/site/ucinetsoftware/datasets) - 都是圖的資料集 - [nets.indiana.edu](https://cnets.indiana.edu/data-repository-for-nan-group/#icswm2011_2) - 都是圖的資料集 ### Time series data - [(github)InstaFake Dataset](https://github.com/fcakyon/instafake-dataset) ### Sequential Dataset - 想一下要不要用鈊象的資料集 ## Toolkits ### Deep Learning - [DEEP GRAPH LIBRARY doc.](https://www.dgl.ai/) ## other hackmd ### [Result For Version 1](https://hackmd.io/_qGjGRTxQCyvxnTBrnvh0A) ### [kmer的k與strongly_connedted, 平均degree比較](https://hackmd.io/3eNMHNtIRQGKuFBfGt8L6g) ### [graph neural network 1103課程](https://hackmd.io/R9kvjcRiQ9OSz9vNWnE5Jw)