# Efficient Hyperbolic Embeddings for Tree Structures
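- Hyperbolic geometry represents hierarchical (tree-like) data with lower distortion than Euclidean space: the volume of a hyperbolic ball grows exponentially with its radius, matching the exponential growth in the number of tree nodes with depth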
- Poincaré ball model: $\mathcal{B}^d = \{x \in \mathbb{R}^d : \|x\| < 1\}$
- Distance: $d(x,y) = \text{arccosh}(1 + 2\frac{\|x-y\|^2}{(1-\|x\|^2)(1-\|y\|^2)})$
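- The Lorentz (hyperboloid) model is isometric to the Poincaré ball; it represents points as $(d+1)$-dimensional vectors on the hyperboloid $\{x \in \mathbb{R}^{d+1} : -x_0^2 + \sum_{i=1}^{d} x_i^2 = -1,\ x_0 > 0\}$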
- AR embeddings generalize hyperbolic embeddings and can represent them using a single repel dimension
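
The distance formula and the Lorentz model above can be checked numerically. The following is a minimal plain-Python sketch (the function names are illustrative, not from any library) that computes the Poincaré distance as written, maps ball points onto the hyperboloid with the standard map $x \mapsto \left(\frac{1+\|x\|^2}{1-\|x\|^2}, \frac{2x}{1-\|x\|^2}\right)$, and checks that the Lorentz distance $\operatorname{arccosh}(-\langle u, v\rangle_L)$ gives the same value.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance in the Poincaré ball, following the arccosh formula above."""
    sq_x = sum(t * t for t in x)
    sq_y = sum(t * t for t in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y)))


def poincare_to_lorentz(x):
    """Map a ball point in R^d to a point in R^(d+1) on the hyperboloid -x0^2 + |x_rest|^2 = -1."""
    s = sum(t * t for t in x)
    return [(1.0 + s) / (1.0 - s)] + [2.0 * t / (1.0 - s) for t in x]


def lorentz_distance(u, v):
    """Hyperboloid distance arccosh(-<u, v>_L) with the Minkowski inner product."""
    inner = -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))
    # Clamp: rounding can push -inner slightly below 1, where math.acosh raises ValueError.
    return math.acosh(max(1.0, -inner))


x, y = [0.1, 0.2], [-0.3, 0.5]
d_ball = poincare_distance(x, y)
d_hyperboloid = lorentz_distance(poincare_to_lorentz(x), poincare_to_lorentz(y))
print(d_ball, d_hyperboloid)  # the two values agree up to floating-point rounding
```

The Lorentz point has one more coordinate than the ball point, which is the $(d+1)$-dimensional representation mentioned above.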
## References
- https://arxiv.org/abs/2106.09671
- https://github.com/pyt-team/TopoBenchmarkX?tab=readme-ov-file#anchor-tutorials
## Meeting Summary 07/15/24: Applying Topology to BERT-like Architectures for Clinical Tasks
#### Key Concepts and Discussion Points
1. **Activation Functions**
   - **Sparsemax**: An alternative to softmax that maps logits to a probability distribution in which some entries are exactly zero; useful where sparse attention weights or sparse outputs are wanted.
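2. **Graph Theory and Topology Integration**
   - **Sampling Trees from Distributions**: Drawing random trees from a chosen distribution, useful for generating and modeling hierarchical data.
   - **Encoding Topology**:
     - **Expander Graphs**: Graphs that are sparse yet highly connected, as measured by a large spectral gap.
     - **Yuancai Graphs and DAG Transformers**: Leveraging specific graph structures and transformer architectures designed for directed acyclic graphs (DAGs).
     - **Graph Laplacian and Spectral Graphs**: Using the eigenvalues of the graph Laplacian $L = D - A$ to characterize graph properties and connectivity; for example, the multiplicity of the eigenvalue $0$ equals the number of connected components.
   - **Positional Embedding**: Choosing positional embeddings for graph structures, drawing on work by H4, H6, Carlson, and other topological data analysis researchers.
   - **Simplex Generation**: Using hyperbolic geometry to generate simplices and other topological structures.
   - **Spatial Correlation Errors**: Addressing difficulties in hypothesis testing for spatially correlated data, where the independence assumption behind standard tests does not hold.
   - **Combinatorial Topological Complex**: Lifting graph representations into topological domains (combinatorial complexes) and applying topological neural networks to them.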
3. **Transformer Models and Positional Encoding**
   - **Adapting Transformers**:
     - Integrating graph structures and positional encoding.
     - Making the model's positional information aware of the graph's topology and sparsity patterns.
   - **Synthetic Data for Training**: Using JAX to create and train models with positional encoding and logistic regression.
   - **Cost Considerations**: Managing the high cost of running models such as ClinicalBERT and GPT-4.
4. **Synthetic Data and Causality**
   - **Dataset Evaluation and High-Accuracy Generation**: Assessing existing datasets and generating new ones to explore embedding spaces.
   - **Medical Data**: Creating synthetic medical data whose causal relationships are controlled by construction, to test model robustness.
   - **Synthetic Tasks**: Binary classification, plus detecting topologies or generating cell complexes.
5. **Application in Healthcare**
   - **Graph Representations**: Representing healthcare relationships (e.g., patient interactions) with combinatorial complexes.
   - **Transforming Data**: Converting healthcare data into topological representations to learn underlying structures.
6. **Error Analysis and Challenges**
   - **Spectral Graph Control**: Managing the computational cost and processing time of spectral methods on large graphs (a dense eigendecomposition scales as $O(n^3)$ in the number of nodes).
   - **Formalizing Topological Errors**: Analyzing topological errors systematically to improve model performance.
   - **Future Directions**: Identifying applications and improving models through error analysis and complexity management.
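
Sparsemax is the Euclidean projection of the logit vector onto the probability simplex, which is why it can return exact zeros. A minimal pure-Python sketch of the standard sort-based algorithm (the function name is illustrative, not a library API):

```python
def sparsemax(z):
    """Project logits z onto the probability simplex (sort-based sparsemax)."""
    z_sorted = sorted(z, reverse=True)
    running_sum = 0.0
    support_size, support_sum = 0, 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        running_sum += z_k
        # z_k stays in the support while 1 + k * z_k exceeds the sum of the top-k logits.
        if 1.0 + k * z_k > running_sum:
            support_size, support_sum = k, running_sum
    # Threshold tau: entries above it keep (z_i - tau); the rest become exactly 0.
    tau = (support_sum - 1.0) / support_size
    return [max(z_i - tau, 0.0) for z_i in z]


print(sparsemax([2.0, 1.0, 0.1]))  # [1.0, 0.0, 0.0]; softmax would keep every entry positive
```

Sorting makes this $O(n \log n)$ per vector; the first index always satisfies the support condition, so the division by `support_size` is safe.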
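
The notes do not say which distribution over trees is intended. As one simple, assumed choice, the sketch below samples a labeled tree uniformly at random: by Cayley's formula there are $n^{n-2}$ labeled trees on $n$ nodes, in bijection with Prüfer sequences of length $n-2$, so decoding a uniformly random sequence yields a uniform tree. The function names are hypothetical.

```python
import heapq
import random


def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence (length n - 2, entries in 0..n-1) into a tree's edge list."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # attach the smallest current leaf to v
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:  # v has become a leaf
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def sample_uniform_tree(n, rng=random):
    """Uniformly random labeled tree on n >= 2 nodes via a uniformly random Prüfer sequence."""
    seq = [rng.randrange(n) for _ in range(n - 2)]
    return prufer_to_edges(seq, n)


print(prufer_to_edges([3, 3, 3], 5))  # [(0, 3), (1, 3), (2, 3), (3, 4)]: a star centered on node 3
print(sample_uniform_tree(6, random.Random(0)))
```

Uniform labeled trees tend to be shallow (their typical height grows like $\sqrt{n}$), so if deep hierarchies are needed, a different distribution, such as a branching process with a fixed depth, may fit better.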
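
A minimal NumPy sketch of the spectral quantities listed under Encoding Topology (function names are hypothetical): it builds $L = D - A$, counts connected components from the zero eigenvalues, compares the second-smallest eigenvalue (algebraic connectivity) of a sparse cycle against a complete graph, and takes low-frequency eigenvectors as Laplacian positional encodings. That encoding is one common choice for graph transformers, assumed here for illustration rather than taken from the meeting.

```python
import numpy as np


def laplacian(edges, n):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph on nodes 0..n-1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def laplacian_pe(L, k):
    """Return all eigenvalues and the k eigenvectors after the first (assumes a connected graph)."""
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh: symmetric input, eigenvalues in ascending order
    return eigvals, eigvecs[:, 1:k + 1]   # column 0 is the constant vector for eigenvalue 0


# Components: the multiplicity of eigenvalue 0 counts connected components.
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
vals, _ = np.linalg.eigh(laplacian(two_triangles, 6))
print(int(np.sum(np.abs(vals) < 1e-9)))  # 2

# Connectivity: the second-smallest eigenvalue of a sparse cycle vs. a complete graph.
cycle = [(i, (i + 1) % 8) for i in range(8)]
complete = [(i, j) for i in range(8) for j in range(i + 1, 8)]
vals_cycle, pe = laplacian_pe(laplacian(cycle, 8), 2)
vals_complete, _ = laplacian_pe(laplacian(complete, 8), 2)
print(vals_cycle[1], vals_complete[1])  # about 0.586 (= 2 - 2cos(2*pi/8)) and about 8
print(pe.shape)  # (8, 2): a 2-dimensional positional encoding per node
```

Eigenvectors are defined only up to sign, and up to rotation within repeated eigenvalues (as in the cycle here), which is a known complication when using them as positional encodings.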
#### Next Steps and Action Items
1. **Further Study and Integration**
   - Study work by H4, H6, Carlson, and others on positional embeddings and topological data analysis.
   - Explore Sparsemax in applications that require sparse probability distributions.
2. **Experimental Setup**
   - Develop synthetic datasets that simulate clinical tasks, and test the integration of topology with transformers.
   - Use JAX to build training programs with UKV standards and positional encoding.
   - Implement logistic regression with scikit-learn on the synthetic datasets.
   - Convert healthcare data into combinatorial topological complexes.
   - Explore how graph representations can improve modeling of patient-to-hospital and employer-sponsored healthcare relationships.
   - Formalize a methodology for analyzing topological errors in model training.
   - Manage the complexity of spectral methods on large graphs, study the properties of such graphs, and test whether they can break the Transformer (i.e., expose its failure modes).
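
For the scikit-learn logistic regression item, a minimal sketch on synthetic data whose causal structure is fixed by construction: only the hypothetical `severity` feature affects the label, so a sound fit should give it a weight close to the true value of 2 and give `noise_feature` a weight close to 0. The feature names, sample size, and coefficients are illustrative assumptions, not clinical values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical synthetic features: by construction only `severity` drives the label.
severity = rng.normal(size=n)
noise_feature = rng.normal(size=n)
true_logit = 2.0 * severity - 0.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
X = np.column_stack([severity, noise_feature])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
coef = clf.coef_[0]
print(f"test accuracy={accuracy:.3f}, coefficients={coef.round(2)}")
# Expect coef[0] near 2 (slightly shrunk by the default L2 penalty) and coef[1] near 0.
```

Because the generating process is known, recovering its coefficients is a direct check of the pipeline before moving on to topological features.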
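
One way to convert a graph into a topological domain is clique lifting: every set of $k+1$ mutually connected nodes becomes a $k$-simplex, producing a simplicial complex (a special case of a combinatorial complex). The sketch below is standalone Python, not the API of TopoBenchmarkX or any other library; node labels must be sortable, and the toy edges are hypothetical (for example, patients and providers as nodes).

```python
from collections import defaultdict


def clique_lift(edges, max_dim=2):
    """Clique complex of a graph: each set of k+1 mutually adjacent nodes becomes a k-simplex.

    Returns {dim: sorted list of simplices}, each simplex a sorted tuple of node labels.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    complex_ = {0: [(v,) for v in sorted(adj)]}
    for k in range(1, max_dim + 1):
        higher = []
        for simplex in complex_[k - 1]:
            # Nodes adjacent to every vertex of the simplex extend it by one dimension;
            # requiring w > simplex[-1] generates each clique exactly once, in sorted order.
            common = set.intersection(*(adj[v] for v in simplex))
            higher.extend(simplex + (w,) for w in sorted(common) if w > simplex[-1])
        complex_[k] = higher
    return complex_


# Hypothetical toy graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 2.
cx = clique_lift([(0, 1), (1, 2), (0, 2), (2, 3)])
print(cx[1])  # [(0, 1), (0, 2), (1, 2), (2, 3)]
print(cx[2])  # [(0, 1, 2)]
```

The number of cliques can grow quickly on dense graphs, so `max_dim` caps the highest simplex dimension computed.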