# FEQ - Frequently Executed Queries
- database is around 150gb big, please use reasonable limits.
- summaries are created as materialized views every couple days...
- Keybase (see table) cryptographically verifies identities via links to social media profiles.
## quick size estimation
```explain select count(1) from table```
## find all java devs
```
select login, x.name, array_agg(full_name) projects, array_agg(distinct lang), sum(score) as score
from (
select
u.login, p.full_name, max(u.name) as name, max(p.lang) lang, sum(rs.commits) commits, max(p.stars) stars,
case
when 'java' = lower(max(p.lang)) or 'kotlin' = lower(max(p.lang)) or 'scala' = lower(max(p.lang))
then (log(10, max(p.stars) + 9) * (sum(rs.commits) + sum(rs.pullrequests)*10 + sum(rs.reviews)*100))
else 0
end as score
from angellist_users a, unnest(urls) as url
join github_users u on u.login = substring(url, 20)
join github_contribution_summary s on s.github_id = u.github_id
join github_contribution_summary_by_repository rs on rs.summary_id = s.id
join github_projects p on rs.repository_id = p.github_id
group by u.login, p.full_name
where url like 'https://github.com/%'
) x
group by login, x.name
order by score desc
limit 100
```
- semi random scoring algorithm.
- strip github username from angellist url
- estimated runtime 3min
## Emails from github commits
- group by emails and aliases by user
- not all commits are associated to an github account
- author, committer and signer can be 3 different ppl but are usually the same
- cases where this is not the case:
- pair-programming
- one works on multiple machines and pushes work using the wrong account
- a someone else eg. bot or release manager (committer) publishes someones (author) work
- one user uses a gpg key to cryptographically sign a commit but does not update his gpg key when he changed his email.
```
SELECT
id,
ARRAY_AGG(distinct name) aliases,
ARRAY_AGG(distinct email) emails
FROM (
SELECT
c.author_id as id,
c.author_name as name
c.author_email as email
from github_commits c
UNION ALL
SELECT
c.committer_id as id,
c.committer_name as name
c.committer_email as email
from github_commits c
UNION ALL
SELECT
c.signer_id as id,
c.signer_name as name
c.signer_email as email
from github_commits c
) x
-- left join github_users u on x.id = c.author_id
group by author_id
```
```
select c.committer_id, array_agg(c.committer_name) as names, array_agg(c.committer_email) as emails
from github_commits c
--join github_users u on u.github_id = c.author_id
group by author_id
```