# Proof of concept: text chain of trust
###### tags: `research`
The term "chain of trust" comes from the domain of computer security, where it is one of the core principles behind verifying digital signatures. To know whether a signature is valid, a list of endorsing signatures is added to the original signature. These endorsing signatures can in turn have endorsing signatures themselves to help verify them, and so forth. When one or more signatures along this chain comes from an entity you explicitly trust (a family member, friend or trustworthy institution), the original signature can be trust implicitly.
This document outlines an experiment to see what it would take to create a chain of trust for sentences in a text article. The idea is to track down articles that support the claim made in the sentence, then track down articles that support the supporting articles, and so forth, until articles from explicitly trusted sources are reached (mainly peer reviewed journal articles). Moreover, we want to visualize this chain of trust so we get an overview of the origins of a claim, including which articles support and refute it, and how trustworthy those articles are in turn.
While digital signatures deal in absolutes (a signature either endorses another signature or it does not), with text we will probably have to settle for a score indicating the degree of trust.
## A simple scenario
In academic articles, the idea of a "chain of trust" is explicitly encoded in the usage of citations.
Hence, with these sorts of articles it should be considerably easier to get a basic system going that visualizes this chain of trust than with, for example, a newspaper article.
The main difficulty here is that a citation usually does not mention the specific sentence in the article that directly supports the claim.
If we want to follow the chain of trust deeper than a single link, we must have a specific sentence to check.
Sentence embeddings, such as those produced by [BERT](https://arxiv.org/abs/1810.04805) and [many other models](https://www.sbert.net/docs/pretrained_models.html), may prove useful here.
A sentence embedding is a numerical representation of a sentence.
(In the case of BERT, each sentence is converted into a vector of 768 numbers.)
These numbers act as coordinates in a vast semantic space.
Sentences that are close in meaning ("The weather is good", "We are having fine weather") will be assigned embeddings which lie close together.
In general, the distance between embeddings (measured as cosine distance) indicates the degree of semantic similarity.
So, if we want to track down a sentence in an article that supports a specific claim, one approach is to compute embeddings for all sentences and compare distances to the embedding of the claim.
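To make this concrete, here is a minimal sketch using the [sentence-transformers](https://www.sbert.net/) package. The model name is the one used in the Methods section below; the example sentences are just illustrations.

```python
from sentence_transformers import SentenceTransformer, util

# Load a pre-trained sentence embedding model (the same one used in the
# Methods section below).
model = SentenceTransformer("all-MiniLM-L12-v2")

claim = "The weather is good"
candidates = [
    "We are having fine weather",
    "The linear model was fitted to the data",
]

# Encode the claim and the candidate sentences into embedding vectors.
claim_emb = model.encode(claim, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine distance = 1 - cosine similarity; smaller means closer in meaning.
distances = 1 - util.cos_sim(claim_emb, cand_embs)[0]
for sentence, dist in zip(candidates, distances):
    print(f"{float(dist):.3f}  {sentence}")
```

The paraphrase of the claim should end up with a noticeably smaller distance than the unrelated sentence.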
## Methods
Let's begin with a simple case: [one of my journal articles](https://doi.org/10.1016/j.neuroimage.2019.116221) citing [another journal article by Haufe et al.](https://doi.org/10.1016/j.neuroimage.2013.10.067).
Elsevier offers [an API for text mining](https://dev.elsevier.com/tecdoc_text_mining.html) that serves the full text of articles in XML format.
This XML can be parsed to extract each sentence, along with any citations supporting the claim it makes.
All sentences were processed by the [`all-MiniLM-L12-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) sentence-transformers model, which is based on Microsoft's MiniLM and distributed through [HuggingFace](https://huggingface.co/).
The code and the articles in XML format can be found at https://github.com/wmvanvliet/tcot.
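To sketch the matching step itself: assuming the sentences of both articles have already been extracted from the XML into plain Python lists (the variable names and placeholder sentences below are illustrative, not the actual code from the repository), the top-5 matches can be computed like this:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L12-v2")

def closest_matches(citing_sentence, cited_sentences, n=5):
    """Return the n sentences from the cited article closest to the claim."""
    claim_emb = model.encode(citing_sentence, convert_to_tensor=True)
    cited_embs = model.encode(cited_sentences, convert_to_tensor=True)
    # Cosine distance = 1 - cosine similarity; lower means more similar.
    distances = 1 - util.cos_sim(claim_emb, cited_embs)[0]
    ranking = distances.argsort()[:n]
    return [(float(distances[i]), cited_sentences[i]) for i in ranking]

# Hypothetical inputs: sentence lists extracted from the Elsevier XML.
citing_sentences = ["...sentences citing Haufe et al. ..."]
cited_sentences = ["...sentences from Haufe et al., 2014..."]

for sentence in citing_sentences:
    print(sentence)
    for dist, match in closest_matches(sentence, cited_sentences):
        print(f"  {dist:.3f}  {match}")
```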
## Results
For this test, only the four sentences that cite the Haufe et al. article are considered.
Here are, for each of those sentences, the five closest matches in the cited article, along with the sentence number and (cosine) distance:
```
019: To facilitate the interpretation of linear models, introduced a way to
transform the weight matrix into a pattern matrix, which is easier to
interpret (see section
From Haufe et al., 2014:
0 0.430 Interpretability of weight vectors always requires a forward model of
the data
1 0.447 For further analysis and visualization, all weight vectors were
normalized
2 0.455 also the linear model presented in
3 0.465 depicts an instance of the fourteen weight vectors calculated in
one particular simulation
4 0.498 Hence, just as in forward modeling, every factor is associated with an
-dimensional weight vector. ,
020: While focused on the computation, visualization and interpretation of the
pattern matrix, they suggest that their work may have applications
stretching beyond model interpretability and form the basis for a method
for incorporating domain information into linear models
From Haufe et al., 2014:
0 0.433 Treating the calculation of activation patterns as a regression
problem provides an interesting perspective, since it suggests a
straightforward way to integrate prior knowledge into the activation
pattern estimation, which could genuinely improve interpretability in
the presence of too few or excessively noisy data
1 0.441 The parameters (called activation patterns) of forward models allow
the exact desired interpretation
2 0.471 We point out that the interpretability of a model depends on the
direction of the functional relationship between observations and
underlying variables: the parameters of forward models are
interpretable, while those of backward models typically are not
3 0.490 Determining the origin of neural processes in time or space from the
parameters of a data-driven model requires what we call a of the data;
such a model explains how the measured data was generated from the
neural sources
4 0.508 Traditionally, the factors of the linear forward model are thought
of as variables aggregating and isolating the problem-specific
information, while the corresponding activation patterns model the
expression of each factor in each channel
175: As pointed out, there is a strong parallel between the pattern matrix
and the concept of a leadfield or “forward solution”, as used in source
estimation
From Haufe et al., 2014:
0 0.496 In both cases, the resulting will be the filter which optimally
extracts the sources with pattern given the covariance structure of
the (new) data
1 0.499 Nonetheless, our goal here is again to find a pattern matrix
indicating those measurement channels in which the extracted factors
are reflected
2 0.504 As with all forward models, the estimated patterns can be interpreted
in the desired way as outlined above
3 0.505 Traditionally, the factors of the linear forward model are thought
of as variables aggregating and isolating the problem-specific
information, while the corresponding activation patterns model the
expression of each factor in each channel
4 0.517 The parameters (called activation patterns) of forward models allow
the exact desired interpretation
194: For example, decoding models are often employed to explore the signal of
interest that was learned, in which case interpretability of the model is
more important
From Haufe et al., 2014:
0 0.396 Besides their simplicity, linear models are often preferred to
nonlinear approaches in decoding studies, because they combine
information from different channels in a weighted sum, which resembles
the working principle of neurons
1 0.424 We point out that the interpretability of a model depends on the
direction of the functional relationship between observations and
underlying variables: the parameters of forward models are
interpretable, while those of backward models typically are not
2 0.461 Backward modeling amounts to transforming the data into a supposedly
more informative representation, in which the signals of interest are
isolated as low-dimensional components or factors
3 0.465 Often it is desired to interpret the outcome of these methods with
respect to the cognitive processes under study
4 0.492 Moreover – in contrast to what may be a widespread intuition – the
interpretability of the parameters cannot be improved by means of
regularization (e.g., sparsification) for such models
```
The model performs poorly for the first sentence (#019), which might be because the sentence text becomes a bit garbled once the inline references are stripped out. For the other three sentences, the model performs well!