# Efficient Hyperbolic Embeddings for Tree Structures

- Hyperbolic geometry better represents hierarchical data than Euclidean
- Poincaré ball model: $\mathcal{B}^d = \{x \in \mathbb{R}^d : \|x\| < 1\}$
- Distance: $d(x,y) = \text{arccosh}\left(1 + 2\frac{\|x-y\|^2}{(1-\|x\|^2)(1-\|y\|^2)}\right)$
- Lorentz model isomorphic to Poincaré, uses $(d+1)$-dimensional vectors
- AR embeddings generalize hyperbolic, can represent with 1 repel dimension

## References

- https://arxiv.org/abs/2106.09671
- https://github.com/pyt-team/TopoBenchmarkX?tab=readme-ov-file#anchor-tutorials

### Meeting Summary 07/15/24: Applying Topology to BERT-like Architectures for Clinical Tasks

#### Key Concepts and Discussion Points

1. **Activation Functions**
   - **Sparsemax**: Utilized for producing sparse probability distributions, beneficial for applications needing sparsity.

2. **Graph Theory and Topology Integration**
   - **Sampling Trees from Distributions**: Essential for modeling hierarchical data.
   - **Encoding Topology**:
     - **Expander Graphs**: Sparse yet highly connected graphs.
     - **Yuancai Graphs and DAG Transformers**: Leveraging specific graph structures and architectures.
     - **Graph Laplacian and Spectral Graphs**: Using eigenvalues to understand graph properties and connectivity.
   - **Positional Embedding**: Selecting embeddings for graph structures, referencing H4, H6, Carlson, and topological data analysis experts.
   - **Simplex Generation**: Creating topological structures using hyperbolic geometry.
   - **Spatial Correlation Errors**: Addressing difficulties in hypothesis testing for spatially correlated data.
   - **Combinatorial Topological Complex**: Transforming graph representations to topological domains and applying topological neural networks.

3. **Transformer Models and Positional Encoding**
   - **Adapting Transformers**:
     - Integrating graph structures and positional encoding.
     - Enhancing positional awareness of topology and sparsity patterns.
   - **Synthetic Data for Training**: Using JAX to create and train models with positional encoding and logistic regression.
   - **Cost Considerations**: Managing high costs of models like ClinicalBERT and GPT-4.

4. **Synthetic Data and Causality**
   - **Dataset Evaluation and High-Accuracy Generation**: Assessing and generating datasets to explore embedding spaces.
   - **Medical Data**: Creating synthetic medical data to test model robustness and control causal relationships.
   - **Synthetic Tasks**: Binary classification and detecting topologies or generating cell complexes.

5. **Application in Healthcare**
   - **Graph Representations**: Representing healthcare relationships (e.g., patient interactions) with combinatorial complexes.
   - **Transforming Data**: Converting healthcare data into topological representations to learn underlying structures.

6. **Error Analysis and Challenges**
   - **Spectral Graph Control**: Managing complexity and processing times for large spectral graphs.
   - **Formalizing Topological Errors**: Analyzing errors to improve model performance.
   - **Future Directions**: Identifying applications and improving models through error analysis and complexity management.

#### Next Steps and Action Items

1. **Further Study and Integration**
   - Study the works of H4, H6, Carlson, and others in positional embeddings and topological data analysis.
   - Explore the use of Sparsemax in relevant applications requiring sparse probability distributions.

2. **Experimental Setup**
   - Develop synthetic datasets to simulate clinical tasks and test the integration of topology with transformers.
   - Use JAX to create training programs with UKV standards and positional encoding.
   - Implement logistic regression with scikit-learn on synthetic datasets.
   - Convert healthcare data into combinatorial topological complexes.
   - Explore how graph representations can improve patient-to-hospital and employer-sponsored healthcare modeling.
   - Formalize a methodology for analyzing topological errors in model training.
   - Manage the complexity of spectral graphs, focusing on large graphs and their properties, and test whether such graphs can break the Transformer.
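
As a reference for the hyperbolic-embedding notes at the top, here is a minimal sketch of the Poincaré distance formula, together with a numerical check that the Lorentz model gives the same distances using $(d+1)$-dimensional vectors. The function names (`poincare_distance`, `to_lorentz`, `lorentz_distance`) and the `eps` guard are illustrative assumptions, not part of any library discussed in the meeting. The Poincaré-to-Lorentz map used is the standard one, $x \mapsto \left(\frac{1+\|x\|^2}{1-\|x\|^2}, \frac{2x}{1-\|x\|^2}\right)$.

```python
import math


def poincare_distance(x, y, eps=1e-12):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    # eps only guards against division by a denominator that underflows to 0.
    denom = max((1.0 - sq_norm_x) * (1.0 - sq_norm_y), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def to_lorentz(x):
    """Map a Poincaré-ball point in R^d to the hyperboloid in R^(d+1)."""
    sq_norm = sum(a * a for a in x)
    scale = 1.0 - sq_norm
    return [(1.0 + sq_norm) / scale] + [2.0 * a / scale for a in x]


def lorentz_distance(u, v):
    """Distance on the hyperboloid: arccosh of the negated Lorentzian inner product."""
    inner = -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))
    # Clamp to 1.0 so rounding error cannot push the argument out of acosh's domain.
    return math.acosh(max(1.0, -inner))


if __name__ == "__main__":
    x, y = [0.1, 0.2], [-0.3, 0.4]
    print(poincare_distance(x, y))
    print(lorentz_distance(to_lorentz(x), to_lorentz(y)))
```

The two printed values should agree up to floating-point error. Distances grow quickly as points approach the boundary of the ball, which is what lets trees with exponentially many leaves embed with low distortion.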
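
For the Sparsemax action item, a minimal sketch of the sparsemax projection in plain Python may help when comparing it against softmax. It follows the usual sort-and-threshold formulation (Euclidean projection of the scores onto the probability simplex). The function name `sparsemax` and the list-based interface are assumptions made for illustration, and a real experiment would more likely use a vectorized JAX version.

```python
def sparsemax(scores):
    """Project a list of real scores onto the probability simplex.

    Unlike softmax, the result can contain exact zeros.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    sorted_scores = sorted(scores, reverse=True)
    cumulative = 0.0
    support_size = 0
    support_sum = 0.0
    # Find the largest k such that 1 + k * z_(k) > sum of the top-k scores.
    for k, z_k in enumerate(sorted_scores, start=1):
        cumulative += z_k
        if 1.0 + k * z_k > cumulative:
            support_size = k
            support_sum = cumulative
    # Threshold that makes the surviving entries sum to one.
    tau = (support_sum - 1.0) / support_size
    return [max(z - tau, 0.0) for z in scores]


if __name__ == "__main__":
    print(sparsemax([1.5, 1.0, 0.0]))
    print(sparsemax([2.0, 1.0, 0.1]))
```

When one score dominates the others by a margin of at least 1, all the probability mass lands on that entry, which is the sparsity behavior referred to in the discussion points.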