# IDS Final Project

[TOC]

## Introduction

In this project, we want to predict the selected players from the nominees for the **All-MLB Team**. For more information, please refer to [All-MLB Team | Wikipedia](https://en.wikipedia.org/wiki/All-MLB_Team).

## Data

The data is split into two sets, training and testing, covering 2019 to 2022 and 2023, respectively. Hence, we need to fetch

- The players' performance from 2019 to 2023
- The nominees and the winners of the All-MLB Team from 2019 to 2023
  - Note that we don't have the 2023 winners, since they had not been announced while we were doing this project, but they will be revealed by the time we present, so we still list 2023 as testing data

### Introduction

We obtain the nominees and members of the All-MLB Team by copying and pasting from the web pages below, since the number of players is small and we think it does not require a script.

- [Results](https://en.wikipedia.org/wiki/All-MLB_Team)
- [2022 Nominees](https://www.mlb.com/news/all-mlb-team-2022-nominees-vote)

The data is rearranged into JSON format; a sample is shown below.

```json
...
"2022": {
  "nominees": {
    "C": [661388, 575929, 518595, 672386, 669221, 663728, 592663, 668939, 669257, 624431],
    "1B": [547989, 624413, 650333, 518692, 502671, 665489, 663993, 621566, 572233],
    "2B": [514888, 630105, 669242, 665926, 643446, 663898, 543760, 650402],
    "SS": [642715, 666182, 593428, 621043, 596019, 665161, 608369, 621020, 607208, 677951],
    "3B": [571448, 608324, 656305, 646240, 592273, 592518, 608070, 663586, 553993],
    "DH": [670541, 547180, 660271, 405395, 596129],
    "OF": [605141, 621439, 666969, 671739, 606192, 592450, 680757, 516782, 607043, 592669, 668804, 677594, 623993, 656941, 665742, 543807, 545361, 663656, 662139, 621493],
    "SP": [645261, 542881, 669456, 669203, 656302, 543037, 641482, 506433, 608331, 668678, 664062, 664299, 666201, 663556, 605400, 660271, 607074, 453286, 675911, 628711, 664285, 434378, 657140],
    "RP": [453268, 605130, 642585, 656271, 661403, 664747, 621242, 666808, 661395, 664854, 521230, 605280, 445276, 621345, 662253, 672335, 623465, 519151, 502085, 605447, 657024, 642207]
  },
  "members": {
    "C": [592663, 669257],
    "1B": [502671, 518692],
    "2B": [514888, 665926],
    "SS": [607208, 596019],
    "3B": [592518, 571448],
    "DH": [670541, 660271],
    "OF": [605141, 592450, 545361, 656941, 677594, 663656],
    "SP": [645261, 666201, 660271, 664285, 434378, 656302, 608331, 605400, 453286, 628711],
    "RP": [661403, 621242, 664854, 519151]
  }
},
...
```

Next, we get the players' performance and other information (e.g., `playerId`) from the MLB API, which we used in our previous assignments. The APIs and websites include

- https://statsapi.mlb.com/api/v1/sports/1/players
- https://bdfed.stitch.mlbinfra.com/bdfed/stats/player?stitch_env=prod&season=2023&sportId=1&stats=season&group=pitching&gameType=R&limit=1000&offset=0
- https://www.fangraphs.com/leaders/major-league

We also write a simple script to map player names to player IDs (and vice versa).
```python
import sys

# `r` holds the player list fetched earlier from the MLB API;
# `res` collects the resolved player IDs.
res = []
for player in sys.argv[1:]:
    try:
        player_id = int(player)
    except ValueError:
        player_id = None
    found = None
    for p in r:
        if p["fullName"] == player or p["id"] == player_id:
            found = p
            break
    if found:
        print(found["id"], found["fullName"], found["primaryPosition"]["name"])
        res.append(found["id"])
    else:
        print("None")
        res.append("None")
```

The statistics we choose are listed and explained below (with the meaning of, and reasons for, some of them). They are selected by our domain knowledge, so they will be re-examined in the EDA. Note that each position has different measurements, so each position will have an independent model. Also, the data will be **normalized to z-scores** year by year, since the sample size each year is nearly 1000, which is large enough for us to regard the statistics as approximately normal by the Central Limit Theorem.

- Batter
  - OPS
    - Calculated as $SLG+OBP$; the easiest and most common way to evaluate whether a batter is good
  - AVG
    - Calculated as $\frac{H}{AB}$
    - The most straightforward metric for evaluating a batter, but not considered a good one, since a single and a home run have the same influence
  - SB
    - The number of stolen bases
  - HR
    - The number of home runs
  - IBB
    - The number of intentional walks received
    - It means the batter is so strong that the pitcher does not want to face him in a tense situation
  - Walkoff
    - The number of walk-off hits
    - It may reflect the ability to deliver a "critical hit"
  - Barrel%
    - The rate of "hard-hit" balls
    - It requires an exit velocity of at least 98 mph at a launch angle of 26 to 30 degrees
  - xwOBA
    - Expected Weighted On-Base Average
    - It uses a model to estimate what a batted ball "should" have been worth
    - For example, a hard-hit ball would be a double in the normal case, but the center fielder may play extremely well and make an amazing catch.
      In this case, it is unfair to the batter, and xwOBA corrects for this with its model
  - wRC+
    - Weighted Runs Created Plus
    - It represents how many runs a batter creates for his team
  - UBR
    - Ultimate Base Running
    - Quantifies the way a player runs the bases
    - It involves a runner's speed, timing judgement, baseball IQ, etc.
  - DRS
    - Defensive Runs Saved, for defense
    - How many runs a fielder saves for his team
    - It takes errors, range, arm strength, etc. into account
- Pitcher
  - WHIP
    - Walks plus Hits per Inning Pitched
    - Basically highly associated with ERA
  - ERA
    - The most common way to evaluate a pitcher's performance
    - But with many problems, such as ignoring the defense and the differences between ballparks
  - K/9
    - Strikeouts per 9 innings
    - Not directly associated with overall performance, but still useful
  - BB/9
    - Walks per 9 innings
  - BB/K
    - Simply $\frac{BB}{K}$
  - K%
    - The percentage of batters faced who strike out
  - TB/9
    - Total bases allowed per 9 innings
    - It can be interpreted as a weighted number of hits
  - xFIP
    - $\frac{13\times(\text{fly balls}\times\text{league-average HR/FB rate})+3\times(BB+HBP)-2\times K}{IP} + \text{constant}$
    - It considers only the part a pitcher can control and eliminates ballpark effects, so it is better than ERA
  - QS/G
    - For starting pitchers: Quality Starts / Games
    - A QS is defined as pitching at least six innings while allowing three earned runs or fewer
  - IR-A%
    - For relief pitchers: Inherited Runs Allowed Percentage
    - The percentage of runners left by the previous pitcher who score after the relief pitcher takes the mound
  - Clutch
    - Mainly for relief pitchers
    - It shows a pitcher's performance in high-pressure situations
    - For example, a bases-loaded, two-out situation puts more pressure on the pitcher than a bases-empty one
- General
  - WAR
    - Wins Above Replacement
    - In MLB there are lots of good players, but some of them are "replaceable".
      WAR calculates how much more a player contributes than a replacement-level player
  - WPA
    - Win Probability Added
    - The contribution a player makes to his team's wins

### Data Fetching & Preprocessing

We do a few things in this stage, including

- Fetching data
- Handling outliers
- Type conversion
- Normalization

As mentioned above, the data come from a few MLB APIs. We also do simple outlier handling before normalizing: we qualify the players with minimum playing-time thresholds so that outliers do not adversely affect the overall data. The filters are

- `batter.plateAppearances >= 100`
- `pitcher.inningsPitched >= 10`

The outlier handling also resolves the issue of missing data. Part of the data-fetching logic is shown below.

```python
# hitting
url = "https://bdfed.stitch.mlbinfra.com/bdfed/stats/player?stitch_env=prod&season={year}&sportId=1&stats=season&group=hitting&gameType=R&limit=1000&offset=0&playerPool=ALL"
for year in range(2019, 2023 + 1):
    print(f"fetching {year} hitting stats from MLB API")
    r = requests.get(url.format(year=year)).json()["stats"]
    for p in r:
        if int(p["plateAppearances"]) < 100:
            continue
        stats[year][int(p["playerId"])]["player_id"] = int(p["playerId"])
        stats[year][int(p["playerId"])]["player_name"] = p["playerName"]
        stats[year][int(p["playerId"])]["batting"] = 1
        stats[year][int(p["playerId"])]["ops"] = float(p["ops"])
        ...
```

As for normalization, we use the **z-score**: $z=\frac{x-\operatorname{avg}(X)}{\operatorname{std}(X)}$.
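As a quick sanity check on this transform (a minimal sketch with a made-up toy column, not our real data), every normalized column should come out with mean 0 and unit variance within each year:

```python
import numpy as np

# Hypothetical toy column: one season of a single stat for five players.
x = np.array([0.70, 0.85, 0.90, 1.05, 0.75])

# z-score: subtract the per-year mean, divide by the per-year (population) std.
z = (x - x.mean()) / x.std()

print(z.mean(), z.std())  # mean ≈ 0, std ≈ 1 (up to floating-point error)
```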
Part of the logic is shown below.

```python
mean = {}
std = {}
# accumulate sums and sums of squares in one pass
for p in stats[year]:
    for f in stats[year][p]:
        if f in ["player_id", "player_name", "batting", "pitching"]:
            continue
        if f not in mean:
            mean[f] = 0
            std[f] = 0
        mean[f] += stats[year][p][f]
        std[f] += stats[year][p][f] ** 2
for f in mean:
    mean[f] /= len(stats[year])
    std[f] = np.sqrt(std[f] / len(stats[year]) - mean[f] ** 2)

# calculate z-score for each stat for each player
for p in stats[year]:
    for f in stats[year][p]:
        if f in ["player_id", "player_name", "batting", "pitching"]:
            continue
        stats[year][p][f] = (stats[year][p][f] - mean[f]) / std[f]
```

The process is in `data.py`; after running it, we get a file named `stats.json` containing the metrics we need.

## Exploratory Data Analysis

For each position, we conduct exploratory data analysis to gain better insight into the dataset. The workflow is listed below.

### Overview

After the processing in the previous section, the datasets we use contain only historical data of players who were nominees at each position. Each processed dataset has three components:

- Player ID (used as the key for each individual)
- Numeric statistics (mentioned before)
- A boolean variable indicating whether the player was selected

### Descriptive statistics

For every position, we first look at the descriptive statistics to get a preliminary insight.
Take `1B` for example:

| 1B | Median | Mean | Standard deviation | NA |
|---|---|---|---|---|
|ops|1.7553|1.7756|0.2074|0|
|sb|0.0779|0.3303|0.7333|0|
|hr|3.190|3.1167|1.1622|0|
|ibb|1.7081|3.0529|2.9835|0|
|avg|1.6224|1.6521|0.2402|0|
|iso|2.2184|2.0482|0.5753|0|
|walkoff|1.6003|1.4843|1.7061|0|
|bb/k|1.9154|2.0082|1.0769|0|
|k%|0.9249|0.8826|0.4064|0|
|ubr|0.0709|-0.5132|1.8973|0|
|wRC+|2.0676|2.1274|0.3329|0|
|barrel%|2.1988|2.0280|0.7893|0|
|xwOBA|1.7255|1.7130|0.2115|0|
|hitting-WAR|2.6280|2.8142|1.1646|0|
|hitting-WPA|2.9126|2.8400|1.4906|0|
|DRS|0.6415|0.7185|1.4501|0|

From the table above, we can observe that

- The standard deviations of some columns are below 1, the all-player standard deviation after normalization. We can therefore infer that outstanding players tend to share similar values for those statistics.
- There is no obvious difference between median and mean for most of these statistics, except for `ibb` and `ubr`. Thus, we can infer that `ibb` might be positively skewed and `ubr` negatively skewed. Here are their histograms:

![image](https://hackmd.io/_uploads/r1Wm-snSp.png)
![image](https://hackmd.io/_uploads/By27binHa.png)

- Some statistics, like `hr`, `hitting-WAR`, and `hitting-WPA`, have relatively high means compared with the all-player mean, which is 0 after normalization. They may be regarded as important indicators for choosing nominees from all players.

We can also take a look at the added boolean variable for each position.

|Win or not|TRUE|FALSE|Total|TRUE ratio|
|---|---|---|---|---|
|C|8|23|31|26%|
|1B|8|24|32|25%|
|2B|8|22|30|27%|
|3B|8|26|34|24%|
|OF|24|43|67|36%|
|SS|8|29|37|22%|
|DH|8|12|20|40%|
|SP|40|42|82|49%|
|RP|16|45|61|26%|

We observe that

- The counts of `TRUE` and `FALSE` don't differ too much, so there is no serious class imbalance, and it is not necessary to handle it with oversampling or undersampling.
- The sample size at each position is relatively small.
  With regard to this, we want to use ensemble learning methods such as bagging or AdaBoost to strengthen the models we build.

### Normality and distribution check

To satisfy the basic assumptions of some analysis methods before building models, we run hypothesis tests on every column to check their distributions. For `OF`, `SP`, and `RP`, we use the **K-S test** to check whether they follow a normal distribution, since their sample sizes exceed 50. Data for the other positions are tested with the **S-W test**. We take `RP` and `1B` as examples here.

- `BB/9` and `TB/9` in `RP`

From the previous part (not shown), we find that `BB/9` and `TB/9` have different medians and means, which suggests the data might be skewed. So we use the K-S test to check their distributions; the results are as follows:

```
	Asymptotic one-sample Kolmogorov-Smirnov test

data:  RP$bb9
D = 0.11349, p-value = 0.4118
alternative hypothesis: two-sided
```

```
	Exact one-sample Kolmogorov-Smirnov test

data:  RP$tb9
D = 0.12809, p-value = 0.2476
alternative hypothesis: two-sided
```

Since neither test rejects the null hypothesis at the 0.05 significance level, we can still treat them as normal.

- `OPS+` in `1B`

We find that some columns like `ops` are not normal, but become normal after taking the logarithm. The results are shown below:

```
	Shapiro-Wilk normality test

data:  `1B`$ops
W = 0.92497, p-value = 0.02842
```

```
	Shapiro-Wilk normality test

data:  log(`1B`$ops)
W = 0.94915, p-value = 0.1363
```

Although `log(ops)` is normal, we keep all columns as they are, since

- Additional processing would destroy the meaning of the z-score.
- Additional processing would destroy the scale of the columns.
- We can simply choose non-parametric methods to analyze them, instead of forcing the data to fit strict assumptions by destroying its features.

### Feature selection

From the box plots of each column at every position, we learn that no single statistic can fully discriminate winners from the other nominees.
Here is an example:

![image](https://hackmd.io/_uploads/ByDamIIrT.png)

As a result, we still want to know whether the statistics of winners and of the other nominees differ significantly. If they do, they can be important elements for finding the winners among the nominees. In this part, we use **Univariate Feature Selection** to find the features for each position. For each statistic, we use a **one-way ANOVA test** for normal columns and the **Kruskal-Wallis rank sum test** for those that are not.

- `HR` in `1B`:

```
Call:
   aov(formula = hr ~ winOrNot, data = `1B`)

            Df Sum Sq Mean Sq F value Pr(>F)
winOrNot     1   3.45   3.450   2.694  0.111
Residuals   30  38.42   1.281
```

The ANOVA table tells us that there is no significant difference between winners and other nominees. Hence, it will not be used as a feature; the scatter plot helps illustrate this.

![image](https://hackmd.io/_uploads/H1QC4onHa.png)

- `OPS+` in `1B`

Since it is not normally distributed, we run the Kruskal-Wallis test on it. The result is:

```
	Kruskal-Wallis rank sum test

data:  ops by winOrNot
Kruskal-Wallis chi-squared = 9.2803, df = 1, p-value = 0.002316
```

The test shows a significant difference between winners and other nominees, so `ops` will be used as a feature when predicting the `1B` members.

![image](https://hackmd.io/_uploads/SJkUK6kr6.png)

Results (non-normal columns are highlighted):

|Position|Features|
|---|---|
|C|hitting-WAR|
|1B|==OPS+==, AVG, wRC+, xwOBA, hitting-WAR, hitting-WPA|
|2B|OPS+, HR, ISO, wRC+, hitting-WAR|
|3B|OPS+, ==SB==, ==HR==, AVG, ISO, wRC+, hitting-WAR|
|OF|OPS+, SB, HR, ==IBB==, AVG, ISO, BB/K, wRC+, barrel%, xwOBA, hitting-WAR, hitting-WPA|
|SS|OPS+, HR, ISO, wRC+, xwOBA, hitting-WAR, hitting-WPA|
|DH|OPS+, ==IBB==, HR, AVG, ISO, UBR, wRC+, barrel%, xwOBA, hitting-WAR|
|SP|ERA, WHIP, BB/9, QS/G, xFIP, pitching-WAR, pitching-WPA|
|RP|ERA, WHIP, K/9, BB/9, TB/9, SV/SVO, xFIP, pitching-WAR|

## Building Model

After obtaining all the data, we can start to train the models.
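As a recap, the per-column test routing used in the feature-selection step above can be sketched in Python; this is only an approximation with synthetic numbers (our actual tests were run in R, and the group sizes and values here are made up):

```python
import numpy as np
from scipy.stats import f_oneway, kruskal, shapiro

rng = np.random.default_rng(42)
# Synthetic z-scored stat for one position: 8 winners vs. 24 other nominees.
winners = rng.normal(1.0, 0.5, size=8)
others = rng.normal(0.0, 0.5, size=24)

# Route to one-way ANOVA if the column looks normal (S-W test),
# otherwise to the Kruskal-Wallis rank sum test.
_, p_normal = shapiro(np.concatenate([winners, others]))
if p_normal > 0.05:
    _, p_value = f_oneway(winners, others)
else:
    _, p_value = kruskal(winners, others)

keep = p_value < 0.05  # keep the column as a feature if the groups differ
print(keep, p_value)
```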
Since the features do NOT follow a normal distribution, we only use non-parametric models, with both classification and regression methods to predict the winners. The models we use are

- Classification
  - KNN
  - Naive Bayes
  - Random Forest (ensemble)
- Regression
  - Decision Tree
  - SVR
  - Gradient Boosting (ensemble)

We use Python to train the models, and here is the pseudocode

```python
# training
for year in range(2019, 2022 + 1):
    nominees = get_nominees(year)
    for p in nominees:
        X.append(p.performance.get(year))
        y.append(p.win_or_not.get(year))
model.fit(X, y)

# predicting
# classifier: predict probability
for p in get_nominees(2023):
    res = model.predict_proba(p.performance.get(2023))

# regressor
for p in get_nominees(2023):
    res = model.predict(p.performance.get(2023))
```

It is worth noting that we use both classification and regression methods for prediction, and

- Classifiers use the predicted **probability**, since we need a fixed number of results
- Regressors predict a real number, while the training labels are either 0 or 1

## Results

```
Classifier results:
╒═══════╤════════════════════════╤═══════════════════╤══════════════════════════╕
│ pos   │ KNeighborsClassifier   │ GaussianNB        │ RandomForestClassifier   │
╞═══════╪════════════════════════╪═══════════════════╪══════════════════════════╡
│ C     │ William Contreras      │ William Contreras │ Adley Rutschman          │
│       │ Adley Rutschman        │ Adley Rutschman   │ William Contreras        │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ 1B    │ Freddie Freeman        │ Freddie Freeman   │ Freddie Freeman          │
│       │ Matt Olson             │ Matt Olson        │ Matt Olson               │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ 2B    │ Marcus Semien          │ Jose Altuve       │ Jose Altuve              │
│       │ Ozzie Albies           │ Marcus Semien     │ Marcus Semien            │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ 3B    │ Jose Ramirez           │ Austin Riley      │ Austin Riley             │
│       │ Isaac Paredes          │ Rafael Devers     │ Jake Burger              │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ SS    │ Corey Seager           │ Corey Seager      │ Corey Seager             │
│       │ J.P. Crawford          │ Francisco Lindor  │ J.P. Crawford            │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ OF    │ Ronald Acuna Jr.       │ Ronald Acuna Jr.  │ Ronald Acuna Jr.         │
│       │ Mookie Betts           │ Aaron Judge       │ Mookie Betts             │
│       │ Corbin Carroll         │ Mookie Betts      │ Juan Soto                │
│       │ Aaron Judge            │ Juan Soto         │ Aaron Judge              │
│       │ Luis Robert Jr.        │ Kyle Tucker       │ Kyle Tucker              │
│       │ Julio Rodriguez        │ Corbin Carroll    │ Luis Robert Jr.          │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ DH    │ Shohei Ohtani          │ Shohei Ohtani     │ Shohei Ohtani            │
│       │ Yordan Alvarez         │ Yordan Alvarez    │ Yordan Alvarez           │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ SP    │ Gerrit Cole            │ Gerrit Cole       │ Gerrit Cole              │
│       │ Zac Gallen             │ Logan Webb        │ Blake Snell              │
│       │ Kevin Gausman          │ Zack Wheeler      │ Kevin Gausman            │
│       │ Sonny Gray             │ Sonny Gray        │ Sonny Gray               │
│       │ Justin Steele          │ Justin Steele     │ Justin Steele            │
│       │ Logan Webb             │ Kevin Gausman     │ Logan Webb               │
│       │ Zack Wheeler           │ Zac Gallen        │ Zack Wheeler             │
│       │ Chris Bassitt          │ George Kirby      │ Zac Gallen               │
│       │ Zach Eflin             │ Kyle Bradish      │ Zach Eflin               │
│       │ Blake Snell            │ Zach Eflin        │ Spencer Strider          │
├───────┼────────────────────────┼───────────────────┼──────────────────────────┤
│ RP    │ Felix Bautista         │ Felix Bautista    │ Felix Bautista           │
│       │ Tanner Scott           │ Tanner Scott      │ Devin Williams           │
│       │ Camilo Doval           │ Devin Williams    │ Josh Hader               │
│       │ Josh Hader             │ Evan Phillips     │ Tanner Scott             │
╘═══════╧════════════════════════╧═══════════════════╧══════════════════════════╛

Regression results:
╒═══════╤═════════════════════════╤═══════════════════╤═════════════════════════════╕
│ pos   │ DecisionTreeRegressor   │ SVR               │ GradientBoostingRegressor   │
╞═══════╪═════════════════════════╪═══════════════════╪═════════════════════════════╡
│ C     │ William Contreras       │ William Contreras │ William Contreras           │
│       │ Jonah Heim              │ Adley Rutschman   │ Adley Rutschman             │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ 1B    │ Yandy Diaz              │ Matt Olson        │ Yandy Diaz                  │
│       │ Freddie Freeman         │ Freddie Freeman   │ Freddie Freeman             │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ 2B    │ Ozzie Albies            │ Marcus Semien     │ Jose Altuve                 │
│       │ Jose Altuve             │ Jose Altuve       │ Marcus Semien               │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ 3B    │ Austin Riley            │ Austin Riley      │ Austin Riley                │
│       │ Nolan Arenado           │ Jake Burger       │ Isaac Paredes               │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ SS    │ Corey Seager            │ Corey Seager      │ Corey Seager                │
│       │ Orlando Arcia           │ Francisco Lindor  │ J.P. Crawford               │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ OF    │ Ronald Acuna Jr.        │ Mookie Betts      │ Mookie Betts                │
│       │ Mookie Betts            │ Aaron Judge       │ Ronald Acuna Jr.            │
│       │ Corbin Carroll          │ Ronald Acuna Jr.  │ Juan Soto                   │
│       │ Adolis Garcia           │ Corbin Carroll    │ Kyle Tucker                 │
│       │ Aaron Judge             │ Luis Robert Jr.   │ Aaron Judge                 │
│       │ Juan Soto               │ Kyle Tucker       │ Corbin Carroll              │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ DH    │ Yordan Alvarez          │ Shohei Ohtani     │ Yordan Alvarez              │
│       │ Shohei Ohtani           │ Yordan Alvarez    │ Shohei Ohtani               │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ SP    │ Gerrit Cole             │ Gerrit Cole       │ Blake Snell                 │
│       │ Zach Eflin              │ Zack Wheeler      │ Gerrit Cole                 │
│       │ Zac Gallen              │ Logan Webb        │ Zach Eflin                  │
│       │ Blake Snell             │ Zac Gallen        │ Justin Steele               │
│       │ Justin Steele           │ Sonny Gray        │ Logan Webb                  │
│       │ Logan Webb              │ Justin Steele     │ Kyle Bradish                │
│       │ Zack Wheeler            │ Kevin Gausman     │ Zack Wheeler                │
│       │ Chris Bassitt           │ George Kirby      │ Kevin Gausman               │
│       │ Kyle Bradish            │ Spencer Strider   │ Chris Bassitt               │
│       │ Corbin Burnes           │ Blake Snell       │ Sonny Gray                  │
├───────┼─────────────────────────┼───────────────────┼─────────────────────────────┤
│ RP    │ Felix Bautista          │ Felix Bautista    │ Felix Bautista              │
│       │ Alexis Diaz             │ Tanner Scott      │ Emmanuel Clase              │
│       │ Josh Hader              │ Devin Williams    │ Tanner Scott                │
│       │ Hector Neris            │ Shawn Armstrong   │ Aroldis Chapman             │
╘═══════╧═════════════════════════╧═══════════════════╧═════════════════════════════╛
```

### Actual Results (revealed on 12/16)

![image](https://hackmd.io/_uploads/SkgJ5e0I6.png)

### Performance Evaluation

| |First Team Recall|Second Team Recall|Total Recall|Total Accuracy|
|---|---|---|---|---|
|KNeighborsClassifier|0.8125|0.4375|0.625|0.8032|
|GaussianNB|0.75|0.4375|0.5938|0.7869|
|RandomForestClassifier|==0.875==|0.375|0.625|0.8032|
|DecisionTreeRegressor|0.75|0.375|0.5625|0.7705|
|SVR|==0.875==|0.4375|0.6563|==0.8197==|
|GradientBoostingRegressor|0.75|0.375|0.5625|0.7705|

- The prediction performance on the first team and the second team differs a lot, meaning that players who perform well usually also have top popularity. However, performance cannot reflect popularity well when a player's numbers are not that outstanding.

### Defect

- We disregard a player's personal charm and past performance.
  - For example, Mike Trout, one of the best batters in the world and the leader of Team USA in the WBC, might not be selected by the model, yet he would likely be voted in as one of the winners.
  - (In fact, Mike Trout is not on this year's nominee list, since he was injured during the season.)
- The results should depend only on regular-season performance (excluding the postseason), but since fans vote after the postseason, we cannot be sure they do not take the postseason into account.
- A player may be able to field multiple positions. For example, Yu Chang can play 1B, 2B, 3B, and SS well, but the vote lists each player at only one position, and we only have overall data instead of per-position data. This leads to two issues:
  - The data may be misleading, since a player's performance can differ greatly between two positions.
  - The standard may be wrong, since the distribution of performance varies by position. For example, a catcher is usually not a great batter, because he must focus on guiding the pitcher and reading the situation on the field, which is exhausting.
- Since we do not have the actual vote counts, we cannot tell the gap between each selected member and the other nominees.
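For reference, the per-position recall above reduces to a simple set computation; this is a sketch with placeholder IDs (not our actual predictions), and the accuracy definition shown here is our assumption of counting correct members plus correct non-members over all nominees, which may differ from the exact definition used in the table:

```python
# Placeholder IDs for one hypothetical position (not real results).
predicted = {"A", "B"}                # top-2 nominees by model score
actual = {"A", "C"}                   # actual All-MLB members at this position
nominees = {"A", "B", "C", "D", "E"}  # full nominee pool

# Recall: fraction of actual members the model recovered.
recall = len(predicted & actual) / len(actual)

# Assumed accuracy: correctly predicted members plus correctly
# predicted non-members, over all nominees.
tp = len(predicted & actual)
tn = len(nominees - predicted - actual)
accuracy = (tp + tn) / len(nominees)

print(recall, accuracy)  # → 0.5 0.6
```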