Mining Subjective Properties on the Web
===

**Title**: Mining Subjective Properties on the Web
**Authors**: Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi
**Year**: 2015
**Link**: <https://dl.acm.org/citation.cfm?id=2750548>

Answering queries from structured data is the best! But what if you don't have a structured data repository for some types of queries?

> Cute cat
> Kid-friendly apartments
> Safe cities
> Boring sports

All of these are subjective queries for which we want to find a *dominant opinion* representing the "crowd's wisdom" or "common sense" for an $<\text{entity}, \text{property}>$ pair. The paper presents the **SURVEYOR** system, which attempts to solve this problem from wild web data.

1. Scrape/parse web data to find (entity, property) pairs and their "sentiment" or "opinion" -- this is limited to positive, negative, or None (when no opinion was found for a particular pair).
2. Simply aggregating the pos/neg instances will not cut it because of hidden biases in the opinions:
    - Cute animals tend to have more citations.
    - People don't usually call out when things are functioning well, which is why "UNSAFE city" will see some score but "SAFE city" might not see anything.
    - People don't always express their opinion.
3. To overcome these, the data is passed to an MLE model that tries to infer the "true" dominant opinion by maximizing the likelihood of observing the counts of the pos/neg statements.

More contributions:

- Ran on web-scale data.
- Parameter learning is iterative EM; the updates are closed-form.
- Matches the "common-sense" opinion when evaluated against AMT.

![](https://i.imgur.com/1s3ivo5.png)

## SYSTEM OVERVIEW

SURVEYOR assumes it has access to a large-scale web corpus and a KB containing an ontology of entities -- only entities present in *this* KB are considered; their *properties* are extracted on the fly from the web documents in the corpus.
It also has access to SOTA (Stanford-NLP-style) text parsing and entity disambiguation techniques.

1. **Extracting Evidence.** For all entities in the KB, extract (entity, property) statements with polarity. Then aggregate, for each pair, the number of pos/neg statements.
2. **Evidence Interpretation.** Interpret the evidence to infer the dominant opinion for every entity-property pair.

## EXTRACTING EVIDENCE

- Given a sentence, run a dependency parser on it.
- Extract entities (via NER with entity disambiguation).
- Properties are adjectives, optionally associated with adverbs. Properties are extracted using patterns.
    - The following patterns were considered to begin with:
    - ![](https://i.imgur.com/RdXRyyC.png)
    - Some examples of the type of relations these extract:
    - ![](https://i.imgur.com/oSQbN42.png)

The patterns by themselves do not ignore "non-intrinsic properties".

- The sentence "NEW YORK is BAD for PARKING" should be ignored because it refers only to a specific aspect of NEW YORK.
- This notion is exploited while filtering non-intrinsic statements -- *they refer only to a specific aspect of an entity*.
    - Search for sub-trees in the dependency tree that could represent restrictions.
    - Search for sub-trees that have a specific position relative to the detected pattern.
    - If such a sub-tree is found, ignore that statement.
- Another strategy is to discard sentences where the adjective in the detected modifier pattern is co-referential.

**Polarity.**

- Exploit negation detected during parsing.
- Given the sub-tree for an entity, start from the property at the leaf node with polarity $+1$.
- For every negation encountered, flip the sign.

## MODELING USER BEHAVIOR (or Evidence Interpretation)

> ... the simple approach to estimating the probability of the dominant opinion based on majority vote counting does not work very well because it fails to model the different types of bias that underlie authoring on the Web.

> All discussion below is for a single entity-property combination.
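As a point of reference, the naive majority-vote baseline the paper argues against is trivial to state. A minimal sketch (the entity-property pairs and counts are made up for illustration):

```python
from typing import Dict, Tuple

def majority_vote(counts: Dict[Tuple[str, str], Tuple[int, int]]) -> Dict[Tuple[str, str], str]:
    """Naive baseline: dominant opinion = sign of (pos - neg) statement counts."""
    result = {}
    for pair, (pos, neg) in counts.items():
        if pos > neg:
            result[pair] = "+"
        elif neg > pos:
            result[pair] = "-"
        else:
            result[pair] = "unknown"
    return result

# Reporting bias in action: people rarely write "CITY is SAFE" explicitly,
# so raw counts can flip the true dominant opinion.
print(majority_vote({("zurich", "safe"): (2, 5)}))  # {('zurich', 'safe'): '-'}
```

This is exactly the kind of output the biases above corrupt, which motivates the generative model that follows.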
The "observation/evidence" is the output of `EVIDENCE EXTRACTION` -- a tuple $<C^+_i, C^-_i>$ of total counts of positive and negative statements for a pair.

The model assumes that each tuple is drawn from one of **two** possible distributions:

- the first assumes that the dominant opinion applies to the entity,
- the second assumes *it does not.*

**The distribution applies to the *complete tuple* and not to the individual pos/neg counts!** If we knew the distributions, then we could calculate the probability of a tuple being drawn from either distribution.

A simple framework for generating a web statement could be as follows:

1. There exists an *underlying* dominant opinion.
2. There is some probability that the user agrees with that opinion.
3. There is some probability with which the user will express their opinion online.

Thus, to model the probability of the observed number of pos/neg statements, we must model the two probabilities above.

![](https://i.imgur.com/aqxBF4S.png)

- The top node $\text{Dominant Opinion}$ is what we're trying to estimate from the evidence (in green, the polarity of the user statement).
- That opinion could be positive or negative, represented by $Pr(D_i = +)$ and $Pr(D_i = -)$ respectively.
- The user could either agree or disagree with the opinion. This accounts for a user's subjectivity w.r.t. an opinion, modelled by $p_A, 1-p_A$ -- the $A$ is for agree :smile:
- Finally, the user could choose to express their opinion with a statement. The probability of expressing/**observing** a positive statement is $p^+_S$.
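The three-step generative story above can be sketched as a small simulation. This is an illustrative sketch, not the paper's code; the function names and the simplification that one user contributes at most one statement are my assumptions:

```python
import random

def generate_statement(dominant_positive: bool, p_agree: float,
                       p_express_pos: float, p_express_neg: float,
                       rng: random.Random) -> str:
    """One user's contribution: (1) a dominant opinion exists, (2) the user
    agrees with it with probability p_A, (3) a user holding a +/- opinion
    writes it with probability p_S^+ / p_S^-; otherwise stays silent."""
    agrees = rng.random() < p_agree
    opinion_positive = dominant_positive if agrees else not dominant_positive
    p_express = p_express_pos if opinion_positive else p_express_neg
    if rng.random() < p_express:
        return "+" if opinion_positive else "-"
    return "none"  # silent user: no evidence extracted

def simulate_counts(n_users, dominant_positive, p_agree, p_pos, p_neg, seed=0):
    """Aggregate many users into the observed tuple <C+, C->."""
    rng = random.Random(seed)
    stmts = [generate_statement(dominant_positive, p_agree, p_pos, p_neg, rng)
             for _ in range(n_users)]
    return stmts.count("+"), stmts.count("-")
```

Setting `p_pos` high and `p_neg` low (or vice versa) reproduces the reporting bias discussed earlier: the observed counts can disagree with the dominant opinion even when most users agree with it.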
The Bayes net with the random variables can be represented like so:

![](https://i.imgur.com/Y64XcoT.png)

**Notation:**

$$
\begin{align}
& D_i : \text{Random variable -- dominant opinion for entity type $i$} \\
& O_{iw} : \text{User's opinion on $i$ for document $w$ -- $p_A$ if they believe the dominant opinion; $1 - p_A$ if they do not} \\
& S_{iw} : \text{Whether the user voices their opinion -- $p^+_S$ if they write the positive opinion; $1 - p^+_S$ if they write *nothing*} \\
& C^+_i, C^-_i : \text{The observed pos/neg counts}
\end{align}
$$

Refer to :arrow_up: Figure 8 above :arrow_up: to understand how the model parameters fit with the variable state assignments. Our goal is to estimate the distribution over $D_i$ given the observed counts:

$$
\text{Pr}(D_i | C^+_i, C^-_i)
$$

## PARAMETER ESTIMATION

Some tricks and simplifications:

- By Bayes' rule, the posterior $P(D_i | C^+_i, C^-_i)$ is proportional to $P(C^+_i, C^-_i | D_i) \cdot P(D_i)$, and the prior $P(D_i)$ is set to 0.5.
- For given $D_i = +, C^+_i = a, C^-_i = b$, the likelihood $P(C^+_i = a, C^-_i = b | D_i = +)$ can be assumed to follow a multinomial distribution:
    - ![](https://i.imgur.com/T4k1rYd.png)
- This multinomial can be approximated as a product of two Poisson distributions:
$$
P(C^+_i = a, C^-_i = b | D_i = +) = P(C^+_i = a | D_i = +) \cdot P(C^-_i = b | D_i = +)
$$
- A similar expression holds for $D_i = -$; in total, there are 4 Poisson parameters to fit: one per count ($a$, $b$) for each dominant opinion ($+$/$-$).

The rest of Section 6 derives the EM update equations.

## EXPERIMENTAL RESULTS

AMT workers were tested on the following entity-property combinations.

![](https://i.imgur.com/b9kTGvB.png)

SURVEYOR significantly outperforms other systems and the majority-vote baselines.
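To make the evidence-interpretation step concrete, here is a minimal sketch of the Poisson-approximated posterior $\text{Pr}(D_i \mid C^+_i, C^-_i)$ from the parameter-estimation section. The four rate values are made up for illustration (in the paper they would come out of the EM fit), and the `lam` dictionary layout is my own convention:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def posterior_dominant_positive(a, b, lam, prior_pos=0.5):
    """Pr(D=+ | C+=a, C-=b) under the product-of-Poissons approximation.
    lam[('+','+')] = expected positive count when the dominant opinion is +,
    lam[('+','-')] = expected negative count when the dominant opinion is +,
    and analogously for dominant opinion -."""
    like_pos = poisson_pmf(a, lam[('+', '+')]) * poisson_pmf(b, lam[('+', '-')])
    like_neg = poisson_pmf(a, lam[('-', '+')]) * poisson_pmf(b, lam[('-', '-')])
    num = like_pos * prior_pos
    return num / (num + like_neg * (1 - prior_pos))

# Illustrative rates: a positive dominant opinion yields ~5 positive and
# ~1 negative statements on average, and symmetrically for a negative one.
rates = {('+', '+'): 5.0, ('+', '-'): 1.0, ('-', '+'): 1.0, ('-', '-'): 5.0}
print(posterior_dominant_positive(4, 1, rates))
```

With symmetric rates, equal counts give a posterior of exactly 0.5, while a surplus of positive statements pushes it toward a positive dominant opinion; asymmetric rates are what let the model correct the reporting biases that break majority voting.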