1. 請根據資料集,查詢出在目前會員人數男性與女性的會員人數分別是多少?
> userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
gp_userprofile = GROUP userprofile BY $1;
count_userprofile = FOREACH gp_userprofile GENERATE $0, COUNT($1);
dump count_userprofile;
```
(f,137)
(m,182)
```
2. 請根據資料集查詢出每個國家的會員數量,並顯示前五名的國家與會員人數。
>userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
gp_profile = GROUP userprofile BY $3;
count_profile = FOREACH gp_profile GENERATE $0, COUNT($1);
desc_profile = ORDER count_profile BY $1 DESC;
limit_profile = LIMIT desc_profile 5;
dump limit_profile;
```
(Canada,60)
(Germany,35)
(China,34)
(Taiwan,20)
(Poland,19)
```
3. 在會員資料表中,年紀在20至30歲的會員人數共有多少位?
> userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
filter_profile = FILTER userprofile BY $2 > 20 AND NOT $2 > 30;
gp_profile = GROUP filter_profile ALL;
foreach_profile = FOREACH gp_profile GENERATE COUNT($1);
dump foreach_profile;
```
(193)
```
4. 在會員資料表中,年齡最大的會員為幾歲?
> userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
desc_age = ORDER userprofile BY $2 DESC;
limit_age = LIMIT desc_age 1;
foreach_age = FOREACH limit_age GENERATE $2;
dump foreach_age;
```
(75)
```
5. 請依據資料集,查詢出熱門歌曲排行榜中,第7名歌曲名稱為何?
> userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
gp_traname = GROUP userdemand BY $3;
count_traname = FOREACH gp_traname GENERATE $0, COUNT($1);
desc_traname = ORDER count_traname BY $1 DESC;
limit_traname = LIMIT desc_traname 8;
dump limit_traname;
```
(Intro,169)
(?????,80)
(Love Lockdown,74)
(Say You Will,74)
(Welcome To Heartbreak (Feat. Kid Cudi),71)
(Paranoid (Feat. Mr. Hudson),68)
(Heartless,66)
(Bad News,65)
```
- Ans:Bad News
6. 請依據資料集,查詢出點播次數小於2的歌手有幾位?
> userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
gp_artname = GROUP userdemand BY $2;
count_artname = FOREACH gp_artname GENERATE $0, COUNT($1);
filter_artname = FILTER count_artname BY $1 < 2;
gp_userdemand = GROUP filter_artname ALL;
count_userdemand = FOREACH gp_userdemand GENERATE COUNT($1);
dump count_userdemand;
```
(8764)
```
7. 國名B開頭的各國熱門歌曲第一名的名稱分別是什麼,請以國家名稱由A至Z排序?
> userdemand = LOAD '/dataset/999005/999005.tsv' USING PigStorage('\t') AS (userid: chararray, time:chararray, artname: chararray, traname:chararray);
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
> filter_country = FILTER userprofile BY $3 matches 'B.*';
gp_country = GROUP filter_country BY $3;
count_traname = FOREACH gp_country GENERATE $1.$0,$0, COUNT($1);
> join1 = JOIN userdemand BY $0, userprofile BY $0;
foreach_join1 = FOREACH join1 GENERATE $3, $7;
filter_country = FILTER foreach_join1 BY $1 matches 'B.*';
gp_country = GROUP filter_country BY $1;
count_coutry = FOREACH gp_country GENERATE $0,COUNT($1);
> (Brazil,8722)
(Belgium,2967)
(British Indian Ocean Territory,649)
(Bulgaria,364)
> filter_country = FILTER foreach_join1 BY $1 == 'Brazil';
gp_traname = GROUP filter_country BY $0;
count_traname = FOREACH gp_traname GENERATE $0, COUNT($1);
desc_traname = ORDER count_traname BY $1 DESC;
limit_traname = LIMIT desc_traname 2;
dump limit_traname;
```
(Intro,17)
```
Ans: (Brazil,Intro)
> filter_country = FILTER foreach_join1 BY $1 == 'Belgium';
gp_traname = GROUP filter_country BY $0;
count_traname = FOREACH gp_traname GENERATE $0, COUNT($1);
desc_traname = ORDER count_traname BY $1 DESC;
limit_traname = LIMIT desc_traname 2;
dump limit_traname;
```
(Propane Nightmares,22)
```
Ans: (Belgium, Propane Nightmares)
> filter_country = FILTER foreach_join1 BY $1 == 'British Indian Ocean Territory';
gp_traname = GROUP filter_country BY $0;
count_traname = FOREACH gp_traname GENERATE $0, COUNT($1);
desc_traname = ORDER count_traname BY $1 DESC;
limit_traname = LIMIT desc_traname 2;
dump limit_traname;
```
(Howl,6)
```
Ans: (British Indian Ocean Territory,Howl)
> filter_country = FILTER foreach_join1 BY $1 == 'Bulgaria';
gp_traname = GROUP filter_country BY $0;
count_traname = FOREACH gp_traname GENERATE $0, COUNT($1);
desc_traname = ORDER count_traname BY $1 DESC;
limit_traname = LIMIT desc_traname 2;
dump limit_traname;
(Swallowed,4)
Ans: (Bulgaria,Swallowed)
(Brazil,Intro)
(Belgium, Propane Nightmares)
(British Indian Ocean Territory,Howl)
(Bulgaria,Swallowed)
8. 請問女性的會員數是多少?
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
filter_profile = FILTER userprofile BY $1 == 'f';
gp_userprofile = GROUP filter_profile BY $1;
count_userprofile = FOREACH gp_userprofile GENERATE $0, COUNT($1);
dump count_userprofile;
(f,137)
9. 在會員資料表中,年齡最小的會員為幾歲?
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
asc_age = ORDER userprofile BY $2 ASC;
limit_age = LIMIT asc_age 1;
foreach_age = FOREACH limit_age GENERATE $2;
dump foreach_age;
(3)
10. 請依據資料集,查詢出會員數量排行,並顯示出人數第2多的國家為何?
userprofile = LOAD '/dataset/999004/999004.tsv' USING PigStorage('\t') AS (userid: chararray, gender:chararray, age: int, country:chararray, registered: chararray);
gp_country = GROUP userprofile BY $3;
count_coutry = FOREACH gp_country GENERATE $0, COUNT($1);
desc_country = ORDER count_coutry BY $1 DESC;
limit_country = LIMIT desc_country 2;
dump limit_country;
(Canada,60)
(Germany,35)
Ans: Germany