# Credit Queries Indra, You'll find a set of useful queries below, but I don't want you to limit your consideration of our data to just these queries. The queries below are the result of a multi-month project to explore our ability to predict credit ratings using our NLP data in conjunction with traditional financial and structural assets. I'll walk you through a bit of the generative process that we go through when trying to craft queries to target a certain type of feature that we suspect may be interesting. Because we're generally after features relating to specific entities, and the various metrics and observations that relate to them, we need to limit the ever-growing universe of companies that Forge knows about to the ones that you're interested in. It's important to note that there are three different definitions of "entity" that we will use here -- the entity label that is observed in the text, the entity's URI after that entity has been grounded in the KG, and the entity's ticker that comes from Citi having matched their dataset against the Forge KG. We do this with a simple join: ```SQL= select uri.ticker as ticker from EntitySubsidiaryOf company join citi.credit.ticker_uri uri on (uri.uri = company.uripath) ; ``` This join isn't enough to tell us much, so we'll need more data to start to interpret it. One thing that's frequently helpful is a notion of time. We can grab this from the documents table, which includes both an extracted `docdatetime` as well as the time when the document entered Anvil, `collectdatetime`. Unfortunately, not all documents have a `docdatetime` to extract, but we can approximate that time with the `collectdatetime` as below. To help solve the issue of correctly resolved, but irrelevant mentions, we've built and deployed a model called `EntitySalience` that provides a measure of how central a given entity is to the document. ```SQL= select uri.ticker as ticker, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date from EntitySubsidiaryOf company join citi.credit.ticker_uri uri on (uri.uri = company.uripath) join Documents d on (company.docid = d.docid) join EntitySaliencies es on (company.docid = es.docid and company.entityid = es.entityid and es.saliency > 0.5) ; ``` From here, we now have enough data to construct a simple feature such as whether an entity is "surging" in its mentions in the popular press. In fact, this turns out to be a fairly useful feature for human analysts and credit rating predictions alike, and I've presented a version of this query in the section titled "Company Counts". Because most of our NLP models are capable of providing information at the sentence level, rather than the more traditional document scope, it's frequently valuable to join on `EntityLocations` so that we can ensure that the score/theme/event/etc and the entity are actually related. Intuitively, this is quite sensible. We can start by getting the locations of our entity in question: ```SQL= select uri.ticker as ticker, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date from EntitySubsidiaryOf company join citi.credit.ticker_uri uri on (uri.uri = company.uripath) join Documents d on (company.docid = d.docid) join EntitySaliencies es on (company.docid = es.docid and company.entityid = es.entityid and es.saliency > 0.5) join EntityLocations el on (company.docid = el.docid and company.entityid = el.entityid) ; ``` With this query prototype in hand, we can construct a variety of different features like the ones that are presented below. The section titled "Covenant Triggers", for example, takes the above prototype and looks for mentions of a company in the same sentence as discussion of the covenants. ## Intrinsic Events This query picks up mentions of intrinsic events when mentioned near a relevant entity and scores the intrinsic events based on the user-defined scores for the intrinsic event classes. ```SQL= select distinct u.ticker as ticker, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date, ie.label, ie.class, ie.category, iff( lower(ie.category) in ('contract', 'financing', 'communication', 'transaction', 'product'), 1, iff( lower(ie.category) in ('legal', 'investigation', 'terrorism', 'criminal', 'disruption', 'naturalDisaster', 'basel', 'cyber'), -1, 0 ) ) as event_polarity from EntitySubsidiaryOf e join citi.credit.ticker_uri u on (e.uripath = u.uri) join EntityLocations el on ( el.docid = e.docid and el.entityid = e.entityid ) join IntrinsicEvents ie on ( ie.docid = e.docid and ie.sentenceid = el.sentenceid ) join Documents d on ( e.docid = d.docid and date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) between '2014-01-01' and '2021-01-01' ) ; ``` ## Sector Sentiment This query provides the sentiment of a given sector. This is useful for two intuitive reasons: 1. For companies with a small digital wake, this provides a strong prior for the company itself. 2. We frequently observe correlated movement at the sector level for a variety of macroeconomic and structural reasons With the credit-rating work we've found that polarity was as useful of a feature to include as the sentiment magnitude as well. It's unclear if you'll find the same, but it'll be interesting to hear about your results. ```SQL= select k.sector, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date, avg(ess.sentiment) as sentiment, stddev(ess.sentiment) as variability, iff(avg(ess.sentiment) > 0, 1, -1) as polarity from EntitySubsidiaryOf company join citi.credit.ticker_uri uri on (uri.uri = company.uripath) join Documents d on (company.docid = d.docid) join EntitySaliencies es on (company.docid = es.docid and company.entityid = es.entityid and es.saliency > 0.5) join forgeai_kg_v0640.kg_v0640.Companies k on (company.uripath = k.uripath) join EntitySentiments ess on (company.docid = ess.docid and company.entityid = ess.entityid) group by 1, 2 ; ``` ## Ticker Sentiment ```SQL= select uri.ticker as ticker, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date, avg(ess.sentiment) as sentiment, stddev(ess.sentiment) as variability, iff(avg(ess.sentiment) > 0, 1, -1) as polarity from EntitySubsidiaryOf company join citi.credit.ticker_uri uri on (uri.uri = company.uripath) join Documents d on (company.docid = d.docid) join EntitySaliencies es on (company.docid = es.docid and company.entityid = es.entityid and es.saliency > 0.5) join EntitySentiments ess on (company.docid = ess.docid and company.entityid = ess.entityid) group by 1, 2 ; ``` ## Global Sentiment It's worth reweighting the per-company sentiment trends based on the _global_ sentiment for all entities for that day. It's possible to imagine a situation where the entire market has a negative sentiment, while an individual stock is extremely positive, which is an extremely good signal to capture. We can look to Moderna's (MRNA) recent stock swings as a great example of this. ```SQL= select date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date, count(d.docid, dt.name, ea.entityid) as counts, (1-exp(-counts/20000))/(1+exp(-counts/20000)) * avg(iff(es.sentiment > 0, 1, -1)) as sentiment, (1-exp(-counts/20000))/(1+exp(-counts/20000)) * stddev(iff(es.sentiment > 0, 1, -1)) as variability from EntityAssertions ea join DocumentSentiments ds on ea.docid = ds.docid join Documents d on ea.docid = d.docid join DocumentTopics dt on ea.docid = dt.docid join EntitySentiments es on ea.docid = es.docid and ea.entityid = es.entityid join EntityLocations el on ea.docid = el.docid and el.entityid = ea.entityid where d.docdatetime is not null and el.sentenceid between dt.startsentenceid and dt.endsentenceid group by 1 ``` ## Ratings Triggers This query pulls out analyst mentions of both stock and bond ratings events. ```SQL= select u.ticker as ticker, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date, min(sent.sentiment) as min_sentiment, avg(sent.sentiment) as average_sentiment, max(sent.sentiment) as max_sentiment from EntitySubsidiaryOf company join citi.credit.ticker_uri u on (company.uripath = u.uri) join EntityLocations lc on ( lc.docid = company.docid and lc.entityid = company.entityid ) join Entities rte on ( rte.docid = company.docid and rte.label ilike any( 'upgrades%', 'downgrade%', 'sold', 'buy%', 'bought', 'hold', 'negative watch', 'negative outlook', 'over%perform%', 'under%perform%', 'market%perform%', 'peer%perform%', 'neutral rating', 'neutral', 'overweight%', 'underweight%' ) ) join EntityLocations rtel on ( rte.docid = rtel.docid and rte.entityid = rtel.entityid and rtel.sentenceid = lc.sentenceid ) join EntitySentiments sent on ( sent.docid = company.docid and sent.entityid = company.entityid ) join Documents d on ( d.docid = company.docid and nvl(d.docdatetime, d.collectdatetime) between '2014-01-01' and '2021-01-01' ) group by ticker, date ``` ## Raw Theme Distribution This query gives you a per-day break down of a company's mentions for all observed themes. ```SQL= select u.ticker as ticker, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date, t.name, avg(t.score) as average_score, count(*) as num_theme_mentions from EntitySubsidiaryOf company join citi.credit.ticker_uri u on (company.uripath = u.uri) join EntityLocations el on ( el.docid = company.docid and el.entityid = company.entityid ) join Documenttopics t on ( t.docid = company.docid and el.sentenceid between t.startsentenceid and t.endsentenceid ) join Documents d on ( company.docid = d.docid and nvl(d.docdatetime, d.collectdatetime) between '2014-01-01' and '2021-01-01' ) group by 1, 2, 3 ``` ## Company Counts We've found that company-counts are frequently very useful in understanding whether something anomalous is going on with a company. This query does so by reporting the number of standard deviations away from the mean an entity's mentions are. ```SQL= with cnts as ( select nvl(to_date(d.docdatetime), to_date(d.collectdatetime) ) as date, u.uri as uri, u.ticker as ticker, count(d.docid) as counts from EntitySubsidiaryOf company join citi.credit.ticker_uri u on (company.uripath = u.uri) join Documents d on (company.docid = d.docid) join EntitySaliencies es on (company.docid = es.docid and company.entityid = es.entityid and es.saliency > 0.5) where date between '2014-01-01' and '2020-07-01' group by 1, 2, 3 order by 1 desc ) select t1.ticker, t1.date, max(t1.counts) as max_counts, ( max_counts - avg(t2.counts) ) / (iff(stddev(t2.counts) = 0, 1, stddev(t2.counts))) as dev from cnts t1 left outer join cnts t2 on ( t1.uri = t2.uri and t2.date between t1.date - 90 and t1.date ) group by t1.date, t1.ticker order by t1.date desc ``` ## Covenant Triggers This query picks out mentions of covenant violations, breaches, waivers, or related events when mentioned in association with an entity itself. For the credit-rating model, we found that the existence of covenant-discussion around an entity was an incredibly important feature. ```SQL= select uri.ticker as ticker, date_trunc("day", nvl(d.docdatetime, d.collectdatetime)) as date, sum(iff(te.label ilike any('covenant%waiver%', 'violation%covenant%', 'breach%violation%', 'covenant%breach%', 'covenant%violation%'), 1, 0)) as covenant_violation, sum(iff(te.label ilike any('%covenant%'), 1, 0)) as covenant_cnt, sum(iff(te.label ilike any('suspending%dividend%', 'dividend%suspension%', 'dividend cut%'), 1, 0)) as dividend_suspension from EntitySubsidiaryOf company join citi.credit.ticker_uri uri on (uri.uri = company.uripath) join EntityLocations el on (company.docid = el.docid and company.entityid = el.entityid) join Documents d on ( company.docid = d.docid and nvl(d.docdatetime, d.collectdatetime) between '2014-01-01' and '2021-01-01' ) join Entities te on ( te.docid = company.docid and te.label ilike any( 'covenant%waiver%', 'covenant%breach%', 'breach%covenant%', 'covenant%violation%', 'violation%covenant%', '%covenant%', 'suspending dividend%', 'dividend suspension%', 'dividend cut%' ) ) join EntityLocations tel on ( te.docid = tel.docid and te.entityid = tel.entityid and tel.sentenceid = el.sentenceid ) group by ticker, date ; ``` ## Conclusion As we've seen, there are many textual features that can be easily incorporated in downstream models. We've presented features that we found were particularly useful in the prediction of credit ratings one year out, but it's easy to imagine that you can use these queries as a starting point for features of your own. For example, imagine that you're interested in index-wide sentiment rather than sector-wide. This would be possible to do by massaging the "Sector Sentiment" query to aggregate by index rather than sector. We hope that this document is the starting point of ideation. Please don't hesitate to reach out to us with questions -- we're always happy to help.