# Credit Queries
Indra,
You'll find a set of useful queries below, but I don't want you to limit your consideration of our data to just these queries. The queries below are the result of a multi-month project to explore our ability to predict credit ratings using our NLP data in conjunction with traditional financial and structural assets. I'll walk you through a bit of the generative process that we go through when trying to craft queries to target a certain type of feature that we suspect may be interesting.
Because we're generally after features relating to specific entities, and the various metrics and observations that relate to them, we need to limit the ever-growing universe of companies that Forge knows about to the ones that you're interested in. It's important to note that there are three different definitions of "entity" that we will use here -- the entity label that is observed in the text, the entity's URI after that entity has been grounded in the KG, and the entity's ticker that comes from Citi having matched their dataset against the Forge KG. We do this with a simple join:
```SQL=
select
uri.ticker as ticker
from
EntitySubsidiaryOf company
join citi.credit.ticker_uri uri on (uri.uri = company.uripath)
;
```
This join isn't enough to tell us much, so we'll need more data to start to interpret it. One thing that's frequently helpful is a notion of time. We can grab this from the documents table, which includes both an extracted `docdatetime` as well as the time when the document entered Anvil, `collectdatetime`. Unfortunately, not all documents have a `docdatetime` to extract, but we can approximate that time with the `collectdatetime` as below. To help solve the issue of correctly resolved, but irrelevant mentions, we've built and deployed a model called `EntitySalience` that provides a measure of how central a given entity is to the document.
```SQL=
select
uri.ticker as ticker,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date
from
EntitySubsidiaryOf company
join citi.credit.ticker_uri uri on (uri.uri = company.uripath)
join Documents d on (company.docid = d.docid)
join EntitySaliencies es on (company.docid = es.docid
and company.entityid = es.entityid
and es.saliency > 0.5)
;
```
From here, we now have enough data to construct a simple feature such as whether an entity is "surging" in its mentions in the popular press. In fact, this turns out to be a fairly useful feature for human analysts and credit rating predictions alike, and I've presented a version of this query in the section titled "Company Counts".
Because most of our NLP models are capable of providing information at the sentence level, rather than the more traditional document scope, it's frequently valuable to join on `EntityLocations` so that we can ensure that the score/theme/event/etc and the entity are actually related. Intuitively, this is quite sensible. We can start by getting the locations of our entity in question:
```SQL=
select
uri.ticker as ticker,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date
from
EntitySubsidiaryOf company
join citi.credit.ticker_uri uri on (uri.uri = company.uripath)
join Documents d on (company.docid = d.docid)
join EntitySaliencies es on (company.docid = es.docid
and company.entityid = es.entityid
and es.saliency > 0.5)
join EntityLocations el on (company.docid = el.docid and company.entityid = el.entityid)
;
```
With this query prototype in hand, we can construct a variety of different features like the ones that are presented below. The section titled "Covenant Triggers", for example, takes the above prototype and looks for mentions of a company in the same sentence as discussion of the covenants.
## Intrinsic Events
This query picks up mentions of intrinsic events when mentioned near a relevant entity and scores the intrinsic events based on the user-defined scores for the intrinsic event classes.
```SQL=
select distinct
u.ticker as ticker,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date,
ie.label,
ie.class,
ie.category,
iff(
lower(ie.category) in ('contract', 'financing', 'communication', 'transaction', 'product'),
1,
iff(
lower(ie.category) in ('legal', 'investigation', 'terrorism', 'criminal', 'disruption', 'naturalDisaster', 'basel', 'cyber'),
-1,
0
)
) as event_polarity
from
EntitySubsidiaryOf e
join citi.credit.ticker_uri u on (e.uripath = u.uri)
join EntityLocations el on (
el.docid = e.docid and
el.entityid = e.entityid
)
join IntrinsicEvents ie on (
ie.docid = e.docid and
ie.sentenceid = el.sentenceid
)
join Documents d on (
e.docid = d.docid and
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) between '2014-01-01' and '2021-01-01'
)
;
```
## Sector Sentiment
This query provides the sentiment of a given sector. This is useful for two intuitive reasons:
1. For companies with a small digital wake, this provides a strong prior for the company itself.
2. We frequently observe correlated movement at the sector level for a variety of macroeconomic and structural reasons
With the credit-rating work we've found that polarity was as useful of a feature to include as the sentiment magnitude as well. It's unclear if you'll find the same, but it'll be interesting to hear about your results.
```SQL=
select
k.sector,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date,
avg(ess.sentiment) as sentiment,
stddev(ess.sentiment) as variability,
iff(avg(ess.sentiment) > 0, 1, -1) as polarity
from
EntitySubsidiaryOf company
join citi.credit.ticker_uri uri on (uri.uri = company.uripath)
join Documents d on (company.docid = d.docid)
join EntitySaliencies es on (company.docid = es.docid
and company.entityid = es.entityid
and es.saliency > 0.5)
join forgeai_kg_v0640.kg_v0640.Companies k on (company.uripath = k.uripath)
join EntitySentiments ess on (company.docid = ess.docid and company.entityid = ess.entityid)
group by 1, 2
;
```
## Ticker Sentiment
```SQL=
select
uri.ticker as ticker,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date,
avg(ess.sentiment) as sentiment,
stddev(ess.sentiment) as variability,
iff(avg(ess.sentiment) > 0, 1, -1) as polarity
from
EntitySubsidiaryOf company
join citi.credit.ticker_uri uri on (uri.uri = company.uripath)
join Documents d on (company.docid = d.docid)
join EntitySaliencies es on (company.docid = es.docid
and company.entityid = es.entityid
and es.saliency > 0.5)
join EntitySentiments ess on (company.docid = ess.docid and company.entityid = ess.entityid)
group by 1, 2
;
```
## Global Sentiment
It's worth reweighting the per-company sentiment trends based on the _global_ sentiment for all entities for that day. It's possible to imagine a situation where the entire market has a negative sentiment, while an individual stock is extremely positive, which is an extremely good signal to capture. We can look to Moderna's (MRNA) recent stock swings as a great example of this.
```SQL=
select
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date,
count(d.docid, dt.name, ea.entityid) as counts,
(1-exp(-counts/20000))/(1+exp(-counts/20000)) * avg(iff(es.sentiment > 0, 1, -1)) as sentiment,
(1-exp(-counts/20000))/(1+exp(-counts/20000)) * stddev(iff(es.sentiment > 0, 1, -1)) as variability
from EntityAssertions ea
join DocumentSentiments ds on ea.docid = ds.docid
join Documents d on ea.docid = d.docid
join DocumentTopics dt on ea.docid = dt.docid
join EntitySentiments es on ea.docid = es.docid and ea.entityid = es.entityid
join EntityLocations el on ea.docid = el.docid and el.entityid = ea.entityid
where
d.docdatetime is not null
and
el.sentenceid between dt.startsentenceid and dt.endsentenceid
group by 1
```
## Ratings Triggers
This query pulls out analyst mentions of both stock and bond ratings events.
```SQL=
select
u.ticker as ticker,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date,
min(sent.sentiment) as min_sentiment,
avg(sent.sentiment) as average_sentiment,
max(sent.sentiment) as max_sentiment
from EntitySubsidiaryOf company
join citi.credit.ticker_uri u on (company.uripath = u.uri)
join EntityLocations lc on (
lc.docid = company.docid and
lc.entityid = company.entityid
)
join Entities rte on (
rte.docid = company.docid and
rte.label ilike any(
'upgrades%',
'downgrade%',
'sold',
'buy%',
'bought',
'hold',
'negative watch',
'negative outlook',
'over%perform%',
'under%perform%',
'market%perform%',
'peer%perform%',
'neutral rating',
'neutral',
'overweight%',
'underweight%'
)
)
join EntityLocations rtel on (
rte.docid = rtel.docid and
rte.entityid = rtel.entityid
and rtel.sentenceid = lc.sentenceid
)
join EntitySentiments sent on (
sent.docid = company.docid and
sent.entityid = company.entityid
)
join Documents d on (
d.docid = company.docid and
nvl(d.docdatetime, d.collectdatetime) between '2014-01-01' and '2021-01-01'
)
group by ticker, date
```
## Raw Theme Distribution
This query gives you a per-day break down of a company's mentions for all observed themes.
```SQL=
select
u.ticker as ticker,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date,
t.name,
avg(t.score) as average_score,
count(*) as num_theme_mentions
from EntitySubsidiaryOf company
join citi.credit.ticker_uri u on (company.uripath = u.uri)
join EntityLocations el on (
el.docid = company.docid and
el.entityid = company.entityid
)
join Documenttopics t on (
t.docid = company.docid and
el.sentenceid between t.startsentenceid and t.endsentenceid
)
join Documents d on (
company.docid = d.docid and
nvl(d.docdatetime, d.collectdatetime) between '2014-01-01' and '2021-01-01'
)
group by 1, 2, 3
```
## Company Counts
We've found that company-counts are frequently very useful in understanding whether something anomalous is going on with a company. This query does so by reporting the number of standard deviations away from the mean an entity's mentions are.
```SQL=
with cnts as
(
select
nvl(to_date(d.docdatetime), to_date(d.collectdatetime) ) as date,
u.uri as uri,
u.ticker as ticker,
count(d.docid) as counts
from EntitySubsidiaryOf company
join citi.credit.ticker_uri u on (company.uripath = u.uri)
join Documents d on (company.docid = d.docid)
join EntitySaliencies es on (company.docid = es.docid
and company.entityid = es.entityid
and es.saliency > 0.5)
where date between '2014-01-01' and '2020-07-01'
group by
1,
2,
3
order by
1 desc
)
select
t1.ticker,
t1.date,
max(t1.counts) as max_counts,
(
max_counts - avg(t2.counts)
)
/ (iff(stddev(t2.counts) = 0, 1, stddev(t2.counts))) as dev
from
cnts t1
left outer join
cnts t2 on (
t1.uri = t2.uri and
t2.date between t1.date - 90 and t1.date
)
group by
t1.date,
t1.ticker
order by
t1.date desc
```
## Covenant Triggers
This query picks out mentions of covenant violations, breaches, waivers, or related events when mentioned in association with an entity itself. For the credit-rating model, we found that the existence of covenant-discussion around an entity was an incredibly important feature.
```SQL=
select
uri.ticker as ticker,
date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date,
sum(iff(te.label ilike any('covenant%waiver%', 'violation%covenant%', 'breach%violation%', 'covenant%breach%', 'covenant%violation%'), 1, 0)) as covenant_violation,
sum(iff(te.label ilike any('%covenant%'), 1, 0)) as covenant_cnt,
sum(iff(te.label ilike any('suspending%dividend%', 'dividend%suspension%', 'dividend cut%'), 1, 0)) as dividend_suspension
from EntitySubsidiaryOf company
join citi.credit.ticker_uri uri on (uri.uri = company.uripath)
join EntityLocations el on (company.docid = el.docid and company.entityid = el.entityid)
join Documents d on (
company.docid = d.docid and
nvl(d.docdatetime, d.collectdatetime) between '2014-01-01' and '2021-01-01'
)
join Entities te on (
te.docid = company.docid and
te.label ilike any(
'covenant%waiver%',
'covenant%breach%',
'breach%covenant%',
'covenant%violation%',
'violation%covenant%',
'%covenant%',
'suspending dividend%',
'dividend suspension%',
'dividend cut%'
)
)
join EntityLocations tel on (
te.docid = tel.docid and
te.entityid = tel.entityid and
tel.sentenceid = el.sentenceid
)
group by ticker, date
;
```
## Conclusion
As we've seen, there are many textual features that can be easily incorporated in downstream models. We've presented features that we found were particularly useful in the prediction of credit ratings one year out, but it's easy to imagine that you can use these queries as a starting point for features of your own. For example, imagine that you're interested in index-wide sentiment rather than sector-wide. This would be possible to do by massaging the "Sector Sentiment" query to aggregate by index rather than sector. We hope that this document is the starting point of ideation. Please don't hesitate to reach out to us with questions -- we're always happy to help.