
# VNA + Gephi

## Visual Network Analysis (Venturini et al 2015)

### Clusters

<img src="https://gitlab.com/xpablov/data-studies/-/raw/master/DS19/S07/venturini-clusters.png" width="800">

### Stars

<img src="https://gitlab.com/xpablov/data-studies/-/raw/master/DS19/S07/venturini-stars.png" width="800">

### Cliques

<img src="https://gitlab.com/xpablov/data-studies/-/raw/master/DS19/S07/venturini-cliques.png" width="800">

### Authorities and Hubs

<img src="https://gitlab.com/xpablov/data-studies/-/raw/master/DS19/S07/venturini-authorities-hubs.png" width="800">

### Qualitative work behind the network

<img src="https://gitlab.com/xpablov/data-studies/-/raw/master/DS19/S07/venturini-table.png" width="800">

### Different categories in the same network

<img src="https://gitlab.com/xpablov/data-studies/-/raw/master/DS19/S07/venturini-languages-categories.png" width="800">

### Glossary

* **node**: point in a graph
* **edge**: connection between two points in a graph
* **directed edge**: it has a direction (e.g. *x* replies to *y*)
* **undirected edge**: it does not have a direction (e.g.
two co-appearing hashtags in a sentence)
* **cluster**: a group of nodes with relatively small distances among them (clusters can be intermediary [in the middle of other clusters] or peripheral [far from the center])
* **bridges**: nodes or clusters that have connections with several other clusters
* **subcluster**: a cluster within a bigger cluster
* **in-degree**: number of incoming connections to a node
* **out-degree**: number of outgoing connections from a node
* **size**: number of nodes (in a network or cluster)
* **centrality**: a measure of how important a node is in the network (here, according to its degree); it can be global (the whole network) or local (a cluster)
* **authorities**: nodes with a high in-degree (many incoming edges)
* **hubs**: nodes with a high out-degree (many outgoing edges)
* **density**: number of connections divided by the potential number of connections (a highly dense network has a large share of its possible edges, whether it has few or many nodes)
* **main component**: the main part of the network, separated from the disconnected nodes that can form a ring (cf. Venturini & Jacomy, 2015, p. 7, fig. 4)
* **structural holes**: empty zones between clusters; they denote an absence of connections
* **star**: a centralized structure; it denotes an authority or a hub
* **cliques**: groups of nodes with many connections among each other
* **typology**: category of a node, edge, or cluster
* **topology**: spatial localization of a node, edge, or cluster

## Gephi

https://gephi.org

* Open Source visualisation and exploration of graphs (networks)
* Based on Java
* Works with .gexf and .gdf graph files, or with 2 tabular files (a node spreadsheet and an edge spreadsheet)

## Main tabs

### DATA LABORATORY TAB

* Here you can see 2 tables:
    * A node table (which includes a node ID, along with other columns)
    * An edge table (which includes an edge ID, source node ID, and target node ID, along with other columns)
* You can modify the content directly on the tables, or add more tables manually.
However, this is not recommended at this moment.

### OVERVIEW TAB

* You have 5 windows:
    * **Appearance**: allows you to change the size, color, and labels of the nodes and edges of your graph, according to certain rules
    * **Layout**: allows you to change and tweak the layout of the graph (the "gravity" rules)
    * **Graph**: here you have a visual of the graph, and some direct access to changes in the appearance. Useful buttons here are "Center on Graph" (lower left), which allows you to recenter the graph if you get lost; and "Edit, edit node attributes" (middle left), which allows you to see and modify the information of a specific node without going to the Data Laboratory tab.
    * **Context**: shows basic information about the graph (number of nodes, number of edges, and whether it is directed)
    * **Filters and Statistics**: will show you statistics and allow you to filter parts of the information within the graph (we won't use this window for the moment)

### PREVIEW TAB

* The Preview tab will show a stylised version of the graph (you'll have to click refresh to see new changes), mostly to export a final visual. You can modify what is shown, and export it as a vector image, a bitmap image, or a PDF document (svg, png, pdf).

**Gephi quickstart guide**: https://www.slideshare.net/gephi/gephi-quick-start

### Activity (max. 2 people)

DATA-QUERY INFO

* Bin: GEO
* Query: (whole dataset)
* Dates: 2020-09-16 / 2020-10-28
* No. of tweets: 11753
* No. of distinct users: 625

1. Download any of the [example files](https://drive.google.com/drive/folders/1OkUea102RwsTsrmeES9NkaxuNoISM_tg?usp=sharing):
    * **Social graph by mentions (all dataset)** (*directed*): "DS19_cc_GEO_Aarhus-20200916-20201028------------mention-_Top0-9654fe3ff4.gdf"
    * **Co-hashtag (top 500) graph** (*undirected*): "DS19_cc_GEO_Aarhus-20200916-20201028------------hashtagCooc-_Top500-9654fe3ff4.gdf"
    * **Hashtag-URL graph** (*undirected*): "DS19_cc_GEO_Aarhus-20200916-20201028------------sourceHashtag--9654fe3ff4.gexf"
    * **Hashtag-user graph** (*directed*): "DS19_cc_GEO_Aarhus-20200916-20201028------------hashtagUser--9654fe3ff4.gexf"
2. Open Gephi (it should be an executable).
3. Select one of the sample files.
4. Open the graph and check the import information (directed or undirected; Gephi will usually detect this). A new graph should appear (the nodes are placed randomly).
5. **Make sure to note down all your steps; remember, there is no undo button** 🤷

**LAYOUT WINDOW**

6. Change the layout: select “choose a layout” in the left menu and press “run”
    * “Force Atlas 2” will *exaggerate* the distance between the nodes (you’ll see more clearly which nodes are closer to each other)
    * “Fruchterman Reingold” will try to position the nodes in a spherical model
    * “Label Adjust” will create some space between nodes for label readability
    * “Noverlap” will create distance between overlapping nodes (so you can see all the nodes at once)
    * Each time you run a new layout, it starts from the positions left by the previous layout (it is an additive process). This means that you need to experiment with the layouts first, to know which one will be useful.
7. Identify structures from the glossary.

**STATISTICS WINDOW**

8. Run the modularity algorithm (leave the default options)

**APPEARANCE WINDOW**

9. Change the size of the nodes according to their degree, in-degree, or out-degree (remember to press "Apply" to see your changes)
10. Change the color of the nodes according to their modularity class

**GRAPH WINDOW**

11. Open the bottom menu (bottom right corner of the graph window)
12. Make the labels visible (tab labels -> click on node square), and change their size to "node size"
13. (If your labels are not very informative, press "configure" and select a more appropriate node attribute to show as label)
14. Go to the Preview tab and export your graph

## Activity (groups)

1. You now have access to the TCAT server (see email/discord)
2. Continue working on your research question
3. Try some queries to familiarize yourself with the kind of data you now have access to (remember that high activity on the server, as well as long queries, will take time to process. It is better to start with small queries ~~or small bins~~). You can get files for: a) statistics, b) data, c) network files.
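
The glossary definitions of in-degree, out-degree, and density can be checked by hand on a very small network before looking at them in Gephi. The following is a minimal Python sketch, not Gephi's own implementation: the edge list and the function names `degree_tables` and `density` are made up for illustration, and self-loops are assumed to be absent.

```python
from collections import Counter

# Hypothetical directed edge list (source, target), e.g. "ana mentions bo".
edges = [("ana", "bo"), ("cy", "bo"), ("dee", "bo"), ("bo", "ana"), ("ana", "cy")]


def degree_tables(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree = Counter(target for _, target in edges)
    out_degree = Counter(source for source, _ in edges)
    return in_degree, out_degree


def density(edges, directed=True):
    """Existing connections divided by potential connections (no self-loops)."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    if directed:
        unique = set(edges)                          # (a, b) and (b, a) are different edges
        potential = n * (n - 1)
    else:
        unique = {frozenset(edge) for edge in edges}  # (a, b) and (b, a) are the same edge
        potential = n * (n - 1) // 2
    return len(unique) / potential


in_degree, out_degree = degree_tables(edges)
print("highest in-degree (authority candidate):", in_degree.most_common(1))
print("highest out-degree (hub candidate):", out_degree.most_common(1))
print("directed density:", round(density(edges), 3))
```

In this toy network, `bo` receives the most incoming edges, so it is the authority candidate, while `ana` sends the most outgoing edges and is the hub candidate.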
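
The Gephi section mentions that a graph can also be loaded from 2 tabular files (a node spreadsheet and an edge spreadsheet). The sketch below shows one way such files could be produced with Python's standard `csv` module. The function name `write_gephi_tables` and the file names are hypothetical, and the column headers (`Id`, `Label`, `Source`, `Target`, `Type`) are the ones Gephi's spreadsheet importer commonly recognizes; check them against your Gephi version.

```python
import csv
from pathlib import Path


def write_gephi_tables(edges, out_dir, directed=True):
    """Write nodes.csv and edges.csv from a list of (source, target) pairs."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Every name that appears in an edge becomes one row in the node table.
    nodes = sorted({node for edge in edges for node in edge})
    edge_type = "Directed" if directed else "Undirected"

    nodes_path = out_dir / "nodes.csv"
    with nodes_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Id", "Label"])
        for node in nodes:
            writer.writerow([node, node])  # the label simply repeats the id here

    edges_path = out_dir / "edges.csv"
    with edges_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Type"])
        for source, target in edges:
            writer.writerow([source, target, edge_type])

    return nodes_path, edges_path


if __name__ == "__main__":
    # Hypothetical mention edges: "ana mentions bo", and so on.
    sample = [("ana", "bo"), ("cy", "bo"), ("bo", "ana")]
    print(write_gephi_tables(sample, "gephi_tables"))
```

The node table is written first in this sketch so that every `Source` and `Target` value in the edge table has a matching `Id`.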