# FEQ - Frequently Executed Queries - database is around 150gb big, please use reasonable limits. - summaries are created as materialized views every couple days... - Keybase (see table) cryptographically verifies identities via links to social media profiles. ## quick size estimation ```explain select count(1) from table``` ## find all java devs ``` select login, x.name, array_agg(full_name) projects, array_agg(distinct lang), sum(score) as score from ( select u.login, p.full_name, max(u.name) as name, max(p.lang) lang, sum(rs.commits) commits, max(p.stars) stars, case when 'java' = lower(max(p.lang)) or 'kotlin' = lower(max(p.lang)) or 'scala' = lower(max(p.lang)) then (log(10, max(p.stars) + 9) * (sum(rs.commits) + sum(rs.pullrequests)*10 + sum(rs.reviews)*100)) else 0 end as score from angellist_users a, unnest(urls) as url join github_users u on u.login = substring(url, 20) join github_contribution_summary s on s.github_id = u.github_id join github_contribution_summary_by_repository rs on rs.summary_id = s.id join github_projects p on rs.repository_id = p.github_id group by u.login, p.full_name where url like 'https://github.com/%' ) x group by login, x.name order by score desc limit 100 ``` - semi random scoring algorithm. - strip github username from angellist url - estimated runtime 3min ## Emails from github commits - group by emails and aliases by user - not all commits are associated to an github account - author, committer and signer can be 3 different ppl but are usually the same - cases where this is not the case: - pair-programming - one works on multiple machines and pushes work using the wrong account - a someone else eg. bot or release manager (committer) publishes someones (author) work - one user uses a gpg key to cryptographically sign a commit but does not update his gpg key when he changed his email. ``` SELECT id, ARRAY_AGG(distinct name) aliases, ARRAY_AGG(distinct email) emails FROM ( SELECT c.author_id as id, c.author_name as name c.author_email as email from github_commits c UNION ALL SELECT c.committer_id as id, c.committer_name as name c.committer_email as email from github_commits c UNION ALL SELECT c.signer_id as id, c.signer_name as name c.signer_email as email from github_commits c ) x -- left join github_users u on x.id = c.author_id group by author_id ``` ``` select c.committer_id, array_agg(c.committer_name) as names, array_agg(c.committer_email) as emails from github_commits c --join github_users u on u.github_id = c.author_id group by author_id ```