
# EDAMI - LAB1

## Content

The experiment should answer the following questions:

- what makes a rule interesting (definition based on available rule parameters)?
    - support, confidence, lift, improvement
    - Three basic measures help us find an interesting rule: relative support, confidence and lift. A rule is interesting when its confidence and relative support are both at least the chosen minimum thresholds and its lift is not equal to 1.
    - High relative support means that the rule occurs frequently in the database.
    - High confidence means a high probability that when the antecedent of the rule occurs, its consequent occurs as well.
    - A lift equal to 1 means that there is no relation between the antecedent and the consequent of the rule (they are statistically independent).
    - We additionally applied the "improvement" measure. The improvement of a rule is the smallest difference between its confidence and the confidence of any more general rule (same consequent, fewer items in the antecedent). A minimum improvement threshold therefore filters out rules whose extra items do not raise the confidence.
- how to choose the best rule(s)? (Together)
    - x
- what is a potential practical application of the discovered rules? (B)
    - The most obvious practical application is predicting whether a mushroom is poisonous or edible from some of the features that describe it. The examples below show which itemsets give the best results (in terms of the criteria described above).

```r=
# Edible - top
edible01 <- apriori(Mushroom,
                    parameter = list(support = 0.4, confidence = 0.8, minlen = 3),
                    appearance = list(rhs = c("Class=edible"), default = "lhs"))
inspect(edible01)

# Edible have bruises
edible02 <- apriori(Mushroom,
                    parameter = list(support = 0.3, confidence = 0.3, minlen = 2),
                    appearance = list(lhs = c("Bruises=bruises"), rhs = c("Class=edible")))
inspect(edible02)
```

- Another possible application is determining a mushroom's habitat. For example, it turns out that mushrooms with a bulbous root tend to grow in woods.
```r=
rulesWithRHS <- apriori(Mushroom,
                        parameter = list(support = 0.3, confidence = 0.6, minlen = 2),
                        appearance = list(rhs = c("Habitat=woods")))
inspect(head(rulesWithRHS, 1))
```

## Experiments

### Tasks

- Discovering rules for different values of **minimum support** -- how does it impact the number of detected rules (B)
    - Minimum support determines

```r=
# Let's start with support = 0.8
# The following
minSupImpact <- apriori(Mushroom,
                        parameter = list(support = 0.8, confidence = 0.8, minlen = 4))
inspect(minSupImpact)

minSupImpact <- apriori(Mushroom,
                        parameter = list(support = 0.9, confidence = 0.8, minlen = 4))
inspect(minSupImpact)

minSupImpact <- apriori(Mushroom,
                        parameter = list(support = 0.7, confidence = 0.8, minlen = 4))
inspect(minSupImpact)
```

- Assessment of discovered rules with the **lift parameter** less than 1 and greater than 1 in terms of their usefulness for practical use (Together)
- Comparison of the average confidence value of "short" rules and "long" rules
    - calculate the average

### Requirements for the report

- The report is an R script file that can be executed in RStudio. The text of the report should be included in the R file as comments.
- The report should consist of
    - Code that allows the experiment to be repeated
    - Results/sample results obtained in the experiment (the code generating the results or presentations is not enough)
- The script should execute without any errors

## Conclusions

- Should be based on conducted experiments
- Should be non-trivial (e.g. an indication of which experiment and for which parameters the best result was obtained is not enough)
- bruises ?

```R
#########################
# Loading libraries
library(arules)
library(arulesViz)

# Loading dataset
data(Mushroom)

# Basic dataset analysis
?Mushroom
dim(Mushroom)
summary(Mushroom)
Mushroom@itemInfo$labels
inspect(head(Mushroom))
# The dataset contains information about 8124 mushrooms in transactional form.
# The data contains 22 nominal features plus the class attribute (edible or not).
# These features were translated into 114 items.

# Item frequency
freqTable = itemFrequency(Mushroom, type = "relative")
freqTable = sort(freqTable, decreasing = TRUE)

# Feature VeilType=partial is common for all elements, so we have decided to remove it

# Elements with support >= 60%
print(freqTable[freqTable >= 0.6])

# Plot elements with support >= 60%
itemFrequencyPlot(Mushroom, type = "relative", support = 0.6)

# Variables are named after the class on the right-hand side of the rules
poisonous <- apriori(Mushroom,
                     parameter = list(support = 0.2, confidence = 0.4, minlen = 2),
                     appearance = list(none = c("VeilType=partial"),
                                       lhs = c("Bruises=bruises"),
                                       rhs = c("Class=poisonous")))

edible <- apriori(Mushroom,
                  parameter = list(support = 0.4, confidence = 0.1, minlen = 2),
                  appearance = list(none = c("VeilType=partial"),
                                    rhs = c("Class=edible")))

# Same query without excluding VeilType=partial (overwrites the previous result)
edible <- apriori(Mushroom,
                  parameter = list(support = 0.4, confidence = 0.1, minlen = 2),
                  appearance = list(rhs = c("Class=edible")))

# 10 rules with more than 4 items and the lowest lift
inspect(head(sort(edible[size(edible) > 4], by = "lift", decreasing = FALSE), 10))
```

```R
poisonous <- apriori(mushroomTR,
                     parameter = list(support = 0.01, confidence = 0.01, minlen = 2),
                     appearance = list(lhs = c("Bruises=bruises"), rhs = c("Class=poisonous")))

edible <- apriori(mushroomTR,
                  parameter = list(support = 0.3, confidence = 0.3, minlen = 2),
                  appearance = list(lhs = c("Bruises=bruises"), rhs = c("Class=edible")))
```

----

```R
# Add the improvement measure to the rule quality table and list the rules with the highest values
quality(aRules) <- cbind(quality(aRules),
                         improvement = interestMeasure(aRules, "improvement", transactions = Mushroom))
impr <- subset(aRules, subset = improvement <= 1)
impr <- sort(impr, by = "improvement", decreasing = TRUE)
inspect(head(impr))
```

-----

```
     lhs => rhs support confidence lift count improvement
[1]  {GillSize=broad, SurfaceBelowRing=smooth} => {Class=edible} 0.3919252 0.9364706 1.807958 3184 0.23796738
[2]  {GillSize=broad, SurfaceAboveRing=smooth} => {Class=edible} 0.4155588 0.9398664 1.814514 3376 0.23662062
[3]  {GillSize=broad, StalkShape=tapering} => {Class=edible} 0.3072378 0.8965517 1.730890 2496 0.19804852
[4]  {GillSize=broad, ColorBelowRing=white} => {Class=edible} 0.3032989 0.8725212 1.684497 2464 0.17401804
[5]  {GillSize=broad, ColorAboveRing=white} => {Class=edible} 0.3032989 0.8725212 1.684497 2464 0.17401804
[6]  {GillSize=broad, RingType=pendant} => {Class=edible} 0.3643525 0.8915663 1.721265 2960 0.09721143
[7]  {Bruises=bruises, GillSize=broad} => {Class=edible} 0.3269325 0.8806366 1.700164 2656 0.06547073
[8]  {SurfaceAboveRing=smooth, SurfaceBelowRing=smooth} => {Class=edible} 0.3845396 0.7516843 1.451208 3124 0.04843856
[9]  {Bruises=bruises, SurfaceAboveRing=smooth} => {Class=edible} 0.3387494 0.8514851 1.643884 2752 0.03631927
[10] {Odor=none, StalkShape=tapering} => {Class=edible} 0.3072378 1.0000000 1.930608 2496 0.03401361
[11] {Bruises=bruises, SurfaceBelowRing=smooth} => {Class=edible} 0.3151157 0.8421053 1.625775 2560 0.02693939
[12] {SurfaceAboveRing=smooth, RingType=pendant} => {Class=edible} 0.3682915 0.8165939 1.576523 2992 0.02223905
[13] {GillSize=broad, SurfaceAboveRing=smooth, SurfaceBelowRing=smooth, RingNumber=one} => {Class=edible} 0.3229936 0.9732938 1.879049 2624 0.01897905
[14] {Odor=none, RingNumber=one} => {Class=edible} 0.3545052 0.9836066 1.898959 2880 0.01762016
[15] {GillSpace=close, SurfaceBelowRing=smooth, RingNumber=one, RingType=pendant} => {Class=edible} 0.3111768 0.8359788 1.613948 2528 0.01756451
[16] {GillSpace=close, SurfaceAboveRing=smooth, RingNumber=one, RingType=pendant} => {Class=edible} 0.3348104 0.8457711 1.632853 2720 0.01685548
[17] {Odor=none, GillSize=broad, RingNumber=one} => {Class=edible} 0.3308715 1.0000000 1.930608 2688 0.01639344
[18] {GillSize=broad, SurfaceBelowRing=smooth, RingNumber=one} => {Class=edible} 0.3466273 0.9513514 1.836687 2816 0.01488076
[19] {GillSize=broad, SurfaceAboveRing=smooth, RingNumber=one} => {Class=edible} 0.3702610 0.9543147 1.842408 3008 0.01444835
[20] {GillSpace=close, RingNumber=one, RingType=pendant} => {Class=edible} 0.3348104 0.8095238 1.562873 2720 0.01412151
[21] {Odor=none, SurfaceAboveRing=smooth, RingNumber=one} => {Class=edible} 0.3042836 0.9967742 1.924381 2472 0.01316764
[22] {GillSize=broad, SurfaceAboveRing=smooth, SurfaceBelowRing=smooth} => {Class=edible} 0.3594289 0.9530026 1.839875 2920 0.01313624
[23] {GillSpace=close, SurfaceAboveRing=smooth, RingType=pendant} => {Class=edible} 0.3387494 0.8289157 1.600312 2752 0.01232178
[24] {Odor=none, GillSize=broad} => {Class=edible} 0.3958641 0.9781022 1.888332 3216 0.01211580
```
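
The columns of the table above are related by the usual definitions: support = count / N, confidence = count / count(lhs), and lift = confidence / P(rhs). As a rough sanity check, the sketch below recomputes support and lift for rules [10] and [24] from their printed counts, using base R only. It assumes N = 8124 transactions (as stated in the dataset comment) and 4208 edible mushrooms. The second figure is not printed anywhere in these notes; it comes from the standard description of the Mushroom dataset, so treat it as an assumption. All variable names are illustrative.

```r
# Sanity check of the printed rule measures using base R only (no arules needed).
n_total  <- 8124   # number of transactions, as stated in the dataset comment
n_edible <- 4208   # ASSUMPTION: number of Class=edible records; not printed in these notes

# Counts and confidences copied from rows [10] and [24] of the table
rules <- data.frame(
  rule       = c("[10]", "[24]"),
  count      = c(2496, 3216),
  confidence = c(1.0000000, 0.9781022)
)

p_edible <- n_edible / n_total                             # P(rhs)
rules$support   <- rules$count / n_total                   # P(lhs and rhs)
rules$lift      <- rules$confidence / p_edible             # confidence / P(rhs)
rules$lhs_count <- round(rules$count / rules$confidence)   # transactions matching the lhs

print(rules, digits = 7)
```

If the recomputed support and lift agree with the printed columns, the assumed class count is consistent with the table. A lift well above 1 for both rules indicates that these antecedents make `Class=edible` considerably more likely than its base rate.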