# 106模擬題
## 999004 - 使用者個人資料
> userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
```
欄位名稱
使用者id userid chararray
性別 gender chararray
年齡 age int
國家 country chararray
註冊時間 registered chararray
```
### (1)總註冊人數最多的五個國家? (2)總註冊人數最少的國家?
> userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid:chararray, gender:chararray,age:int,country:chararray,registered:chararray);
gp_userorofile = GROUP userprofile BY $3;
count_userprofile = FOREACH gp_userorofile GENERATE $0, COUNT($1);
desc_userprofile = ORDER count_userprofile BY $1 DESC;
dump desc_userprofile;
```
(1)
(Canada,60)
(Germany,35)
(China,34)
(Taiwan,20)
(Poland,19)
(2)
(Armenia,1)
(British Indian Ocean Territory,1)
(Bulgaria,1)
(Canada Minor Outlying Islands,1)
(Estonia,1)
(Ireland,1)
(Japan,1)
(Latvia,1)
(Morocco,1)
(Nicaragua,1)
(Peru,1)
(Thailand,1)
(Venezuela,1)
```
### (1)註冊時間最早的三位使用者?(2)註冊時間最晚的三位使用者?
> (1)
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid:chararray, gender:chararray,age:int,country:chararray,registered:chararray);
foreach_profile = FOREACH userprofile GENERATE $0, STRSPLIT($4, '-', 3);
filter_profile = FILTER foreach_profile BY $1.$2 == 4;
dump filter_profile;
```
(user_000210,(12,Feb,04))
(user_000168,(18,Apr,04))
(user_000217,(2,Aug,04))
```
> (2)
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid:chararray, gender:chararray,age:int,country:chararray,registered:chararray);
foreach_profile = FOREACH userprofile GENERATE $0, STRSPLIT($4, '-', 3);
filter_profile = FILTER foreach_profile BY $1.$2 ==7;
dump filter_profile;
```
(user_000246,(2,Sep,07))
(user_000017,(27,Aug,07))
(user_000651,(31,Jul,07))
(user_000289,(31,Jul,07))
```
## 999005 - 使用者點聽紀錄
> userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
```
欄位名稱
使用者id userid chararray
點聽時間 time chararray
歌手名稱 artname chararray
歌曲名稱 traname chararray
```
## 999004 跟 999005 均會用到
### (1)點播總次數最多的三位使用者?(2)點播總次數最少的三位使用者?
> userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
join1 = JOIN userdemand BY $0, userprofile BY $0;
foreach_join1 = FOREACH join1 GENERATE $0, $3;
gp_join1 = GROUP foreach_join1 BY $0;
foreach_join2 = FOREACH gp_join1 GENERATE $0,COUNT($1);
desc_join1 = ORDER foreach_join2 BY $1 DESC;
dump desc_join1;
```
(1)
(user_000233,3459)
(user_000349,3427)
(user_000033,2873)
(2)
(user_000282,6)
(user_000364,3)
(user_000248,1)
(user_000101,1)
(user_000301,1)
```
### (1)被點播總次數最多的歌手? (2)被點播總次數最少的歌手?
> (1)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
join1 = JOIN userdemand BY $0, userprofile BY $0;
gp_join1 = GROUP join1 BY $2;
foreach_join1 = FOREACH gp_join1 GENERATE $0, COUNT($1);
desc_join1 = ORDER foreach_join1 BY $1 DESC;
limit_join1 = LIMIT desc_join1 5;
dump limit_join1;
```
(Radiohead,997)
```
> (2)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
join1 = JOIN userdemand BY $0, userprofile BY $0;
gp_join1 = GROUP join1 BY $2;
foreach_join1 = FOREACH gp_join1 GENERATE $0, COUNT($1);
asc_join1 = ORDER foreach_join1 BY $1 ASC;
limit_join1 = LIMIT asc_join1 5;
dump limit_join1;
```
("Bruno Coulais, The ChildrenS Choir Of Nice, Teri Hatcher, Bernard Paganotti, Hungrarian Symphony Orchestra Budapest & Laurent Petitgirard",1)
(3,1)
(O,1)
(P,1)
(12,1)
```
### (1)男性會員中,哪首歌曲被點播總次數最高?(2)女性會員中,哪首歌曲被點播總次數最高?
> (1)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
join1 = JOIN userdemand BY $0, userprofile BY $0;
foreach_join1 = FOREACH join1 GENERATE $0, $3, $5;
filter_join1 = FILTER foreach_join1 BY $2 == 'm';
gp_join1 = GROUP filter_join1 BY $1;
foreach_join2 = FOREACH gp_join1 GENERATE $0,COUNT($1);
desc_join1 = ORDER foreach_join2 BY $1 DESC;
limit_join1 = LIMIT desc_join1 5;
dump limit_join1;
```
(Intro,120)
```
> (2)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
join1 = JOIN userdemand BY $0, userprofile BY $0;
foreach_join1 = FOREACH join1 GENERATE $0, $3, $5;
filter_join1 = FILTER foreach_join1 BY $2 == 'f';
gp_join1 = GROUP filter_join1 BY $1;
foreach_join2 = FOREACH gp_join1 GENERATE $0,COUNT($1);
desc_join1 = ORDER foreach_join2 BY $1 DESC;
limit_join1 = LIMIT desc_join1 5;
dump limit_join1;
```
(?????,71)
```
### (1)18歲 ~ 24歲的會員中 (包括18歲與24歲),哪首歌曲被點播總次數最高? (2)25歲 ~ 34歲的會員中 (包括25歲與34歲),哪首歌曲被點播總次數最高?
> (1)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
filter_profile = FILTER userprofile BY $2 > 17 AND NOT $2 > 24;
join1 = JOIN userdemand BY $0, filter_profile BY $0;
gp_join1 = GROUP join1 BY $3;
count_join1 = FOREACH gp_join1 GENERATE $0, COUNT($1);
desc_join1 = ORDER count_join1 BY $1 DESC;
limit_join1 = LIMIT desc_join1 5;
dump limit_join1;
```
(Intro,97)
```
> (2)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
filter_profile = FILTER userprofile BY $2 > 24 AND NOT $2 > 34;
join1 = JOIN userdemand BY $0, filter_profile BY $0;
gp_join1 = GROUP join1 BY $3;
count_join1 = FOREACH gp_join1 GENERATE $0, COUNT($1);
desc_join1 = ORDER count_join1 BY $1 DESC;
limit_join1 = LIMIT desc_join1 5;
dump limit_join1;
```
(Intro,58)
```
### (1)美國熱門歌曲排行前5名?(2)英國熱門歌曲排行前5名?
> (1)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
filter_profile = FILTER userprofile BY $3 == 'United States';
join1 = JOIN userdemand BY $0, filter_profile BY $0;
gp_join1 = GROUP join1 BY $3;
count_join1 = FOREACH gp_join1 GENERATE $0, COUNT($1);
desc_join1 = ORDER count_join1 BY $1 DESC;
limit_join1 = LIMIT desc_join1 5;
dump limit_join1;
```
(Atlantis To Interzone,20)
("As Above, So Below",19)
(Golden Skans,17)
(GravityS Rainbow,17)
(Vengeance,16)
```
> (2)
userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
filter_profile = FILTER userprofile BY $3 == 'United Kingdom';
join1 = JOIN userdemand BY $0, filter_profile BY $0;
gp_join1 = GROUP join1 BY $3;
count_join1 = FOREACH gp_join1 GENERATE $0, COUNT($1);
desc_join1 = ORDER count_join1 BY $1 DESC;
limit_join1 = LIMIT desc_join1 20;
dump limit_join1;
```
(Dex End Credit / Dtw End,5)
(Mopedgang,5)
(Me Gustas Tu,4)
(Das Lied Mit Den Suggestivfragen,4)
(La Parade,4)
```