Study Notes from http://users.eecs.northwestern.edu/~choudhar/Publications/MiningMillionsofReviewsATechniqueToRankProductsBasedOnImportanceofReviews.pdf
Abstract
- Mine reviews.
- Score: present a product ranking model that applies weights to review factors to calculate each product's ranking score.
- Rank (sort) reviews and products by score.
Introduction
- There are too many reviews for a customer to read them all.
- Some reviews contain negative feedback about the seller rather than the product; such content needs to be filtered out.
- A review's credibility can be judged by:
- Date or age of the review
- Number of helpful votes / total number of votes
- Ratings provided can have a personal bias.
Methodology
- Summary: run sentiment analysis on each relevant sentence.
- When calculating product ranking scores, reviews should not be weighted equally; they should be mined and given proportional weights.
- Three stages are proposed for weighting reviews and scoring products:
- filter out irrelevant sentences.
- use helpfulness votes and age to derive the review's weight.
- calculate the product's ranking score based on the review weights.
Filtering Irrelevant Sentences
- This is treated as a binary classification problem
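- Use a Support Vector Machine (SVM) to train a hypothesis function h; for each sentence, h(sentence) predicts whether the sentence is relevant.
- Each sentence is first translated into a feature vector X.
- The SVM uses a linear decision function $h(X) = \beta^\top X + b$; the sign of h(X) gives the predicted class. The form is the same as in linear regression, but the SVM learns $\beta$ and $b$ by maximizing the margin between the two classes rather than by minimizing squared error.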
- 1000 sentences are collected manually and used as the training set
- 10-fold cross-validation is used.
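A minimal sketch of the relevance filter above, assuming bag-of-words features (the notes do not say which features the paper uses) and training the linear SVM by stochastic subgradient descent on the L2-regularized hinge loss instead of a dedicated SVM solver. The function names (`build_vocab`, `vectorize`, `train_linear_svm`, `k_fold_splits`), the hyperparameters `lr`, `lam`, `epochs`, and the toy sentences are hypothetical. Labels are +1 for relevant (product-related) sentences and -1 for irrelevant ones such as seller complaints.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign each lower-cased token a column index (hypothetical bag-of-words features)."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(sentence: str, vocab: Dict[str, int]) -> Vector:
    """Translate a sentence into a word-count vector X; out-of-vocabulary words are ignored."""
    x = [0.0] * len(vocab)
    for word in sentence.lower().split():
        if word in vocab:
            x[vocab[word]] += 1.0
    return x


def h(beta: Vector, b: float, x: Vector) -> float:
    """Linear decision function h(X) = beta^T X + b; its sign is the predicted class."""
    return sum(bj * xj for bj, xj in zip(beta, x)) + b


def train_linear_svm(X: List[Vector], y: List[int], lr: float = 0.1,
                     lam: float = 0.01, epochs: int = 100) -> Tuple[Vector, float]:
    """Minimize lam/2 * ||beta||^2 + average hinge loss max(0, 1 - y * h(X))
    with per-sample subgradient steps. Labels must be +1 or -1."""
    beta = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * h(beta, b, xi) < 1  # misclassified or inside the margin
            # The L2 term always shrinks beta; the hinge term adds y*x only on violations.
            beta = [bj - lr * (lam * bj - (yi * xj if violated else 0.0))
                    for bj, xj in zip(beta, xi)]
            if violated:
                b += lr * yi  # the bias is not regularized
    return beta, b


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """(train, test) index lists for k-fold cross-validation, e.g. k = 10."""
    splits = []
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Toy labelled sentences, made up for illustration.
    sentences = ["the battery life is great", "the screen quality is poor",
                 "the camera takes sharp photos", "the seller shipped it late",
                 "shipping was slow and the box was damaged", "the seller never replied"]
    labels = [1, 1, 1, -1, -1, -1]
    vocab = build_vocab(sentences)
    beta, b = train_linear_svm([vectorize(s, vocab) for s in sentences], labels)
    for s in sentences:
        print("relevant" if h(beta, b, vectorize(s, vocab)) > 0 else "irrelevant", "|", s)
```

With 1000 labelled sentences, `k_fold_splits(1000, 10)` would give the ten train/test partitions; a real setup would more likely use a library SVM, and this hand-rolled trainer is only meant to show the $h(X) = \beta^\top X + b$ decision rule.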
Calculating the Review's Weight
- Helpfulness Vote - H
- At its simplest, H is the ratio X/Y: X out of Y voters found the review helpful.
- To reduce bias in the votes:
- ignore reviews with fewer than 10 votes
- use the simple X/Y ratio for reviews with 10 to 200 votes
- for reviews with more than 200 votes, multiply the ratio by a gain factor greater than 1
- Age of review and durability - T
- Newer reviews get greater weight:
- newer reviews naturally have fewer votes
- newer versions of the product match newer reviews.
- Age-based metric: $T = e^{\lambda (t_{review} - t_{release})} + c$, where $\lambda$ is the decay rate, $t_{review}$ the time of the review, $t_{release}$ the time of product release, and $c$ an initializing factor.
- Sentence splitting and part-of-speech tagging:
- Split reviews into sentences using MXTERMINATOR.
- Assign positive or negative sentiment to sentences.
- Use a part-of-speech tagger to help assess sentiment.
- Sentences are saved with their part-of-speech tags.
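The helpfulness weight H and the age weight T described above can be sketched as two small functions. This is a minimal sketch: the thresholds 10 and 200 come from the notes, but the gain factor value (1.5), the time unit (days), and the names `helpfulness` and `age_weight` are hypothetical. It assumes a positive decay rate, so that a review written later relative to the release (a younger review) gets a larger T.

```python
import math
from typing import Optional

MIN_VOTES = 10         # from the notes: ignore reviews with fewer than 10 votes
GAIN_THRESHOLD = 200   # from the notes: apply a gain factor above 200 votes
GAIN_FACTOR = 1.5      # hypothetical value; the notes only say it is > 1


def helpfulness(helpful_votes: int, total_votes: int,
                gain: float = GAIN_FACTOR) -> Optional[float]:
    """H: the X-out-of-Y helpfulness ratio, or None if the review is ignored."""
    if total_votes < MIN_VOTES:
        return None
    ratio = helpful_votes / total_votes
    if total_votes > GAIN_THRESHOLD:
        ratio *= gain  # heavily voted reviews are boosted
    return ratio


def age_weight(t_review: float, t_release: float,
               decay_rate: float, init: float) -> float:
    """T = exp(decay_rate * (t_review - t_release)) + init.
    With decay_rate > 0 (an assumption), later reviews get a larger T."""
    return math.exp(decay_rate * (t_review - t_release)) + init


if __name__ == "__main__":
    print(helpfulness(45, 60))    # 10-200 votes: plain ratio
    print(helpfulness(5, 8))      # fewer than 10 votes: ignored
    print(helpfulness(300, 400))  # more than 200 votes: ratio * gain
    print(age_weight(t_review=30, t_release=0, decay_rate=0.01, init=0.5))
```

Returning `None` rather than 0 for ignored reviews keeps "no signal" distinct from "nobody found it helpful", so later steps can drop those reviews entirely.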
Product Ranking Score Function
1. Calculate the sentiment (polarity) of each review, then combine polarity, H and T into the product ranking score.
2. Calculating review sentiment
1. Manually pick a set of common adjectives/adverbs as a seed list.
2. Augment it with synonyms and antonyms (antonyms go into the opposite-polarity set).
3. If a sentence contains an adjective or adverb from the positive set, it is positive.
4. Negative sentences are handled similarly.
5. If a sentence expresses more than one sentiment, its polarity = sum of positive scores + sum of negative scores (negative words contribute negative values).
3. Final score of a product, computed over all of its reviews i, where $P_i$ is the polarity of review i: $\text{Score} = \frac{\sum_i P_i H_i T_i}{\left(\sum_i H_i\right)\left(\sum_i T_i\right)}$
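The scoring steps above can be sketched end to end with a seed lexicon. This is a hedged approximation, not the paper's pipeline: a crude regex splitter stands in for MXTERMINATOR, no part-of-speech tagger is used (every token is looked up, not only adjectives/adverbs), the seed words are hypothetical, synonym/antonym expansion is omitted, and review polarity is assumed to be the sum of its sentence polarities. Reviews whose H is `None` (too few votes) are skipped.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical seed lists; the paper expands hand-picked seeds with synonyms/antonyms.
POSITIVE = {"great", "good", "excellent", "sharp", "reliable"}
NEGATIVE = {"poor", "bad", "terrible", "blurry", "flimsy"}


def split_sentences(text: str) -> List[str]:
    """Crude splitter on ., ! or ? followed by whitespace (stand-in for MXTERMINATOR)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_polarity(sentence: str) -> int:
    """+1 per positive word, -1 per negative word, summed over the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def review_polarity(text: str) -> int:
    """Assumed aggregation: the sum of the review's sentence polarities."""
    return sum(sentence_polarity(s) for s in split_sentences(text))


def product_score(reviews: Iterable[Tuple[float, Optional[float], float]]) -> float:
    """Score = sum(P*H*T) / (sum(H) * sum(T)) over (P, H, T) triples.
    Reviews with H = None are ignored."""
    kept = [(p, h, t) for p, h, t in reviews if h is not None]
    if not kept:
        return 0.0
    numerator = sum(p * h * t for p, h, t in kept)
    denominator = sum(h for _, h, _ in kept) * sum(t for _, _, t in kept)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    text = "Great screen. Battery is poor and flimsy."
    print(split_sentences(text), review_polarity(text))
    print(product_score([(2, 0.5, 1.0), (-1, 0.8, 2.0), (3, None, 1.0)]))
```

Keeping `product_score` separate from the text processing means the polarity model can be swapped (for example, for a POS-aware one) without touching the weighting formula.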
Evaluation and Analysis
- Compare the review-based rank with Amazon's sales rank, which reflects how well a product sells within its category.
- Mean Average Precision (MAP)
- Spearman's rank correlation coefficient between a human ranking and the algorithm's ranking for a set of products.
- Results
- Filtering out irrelevant sentences improves performance.
- Weighting reviews gives a further improvement.
- Combining the weights with review age is better still.
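The two evaluation measures above can be sketched as follows. Spearman's coefficient uses the closed form $\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, which is exact only when there are no tied ranks. The notes do not say how MAP's relevant set is defined, so the sketch assumes the caller supplies it (hypothetically, for example, the products in the top-k of the sales rank). Function names and product IDs are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple


def spearman_no_ties(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Spearman's rho between two rankings of the same products (no tied ranks)."""
    n = len(rank_a)
    d_squared = sum((rank_a[p] - rank_b[p]) ** 2 for p in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over several (ranked list, relevant set) pairs, e.g. one per category."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    human = {"A": 1, "B": 2, "C": 3, "D": 4}
    algo = {"A": 2, "B": 1, "C": 3, "D": 4}
    print(spearman_no_ties(human, algo))
    print(average_precision(["A", "B", "C", "D"], {"A", "C"}))
```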
Effects of Individual Features
- To identify the features that contribute most to the ranking, compute the correlation between the ranking produced by each feature alone and the overall ranking.
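One way to carry out this per-feature analysis is sketched below, assuming each feature (here hypothetically named "H", "T" and "polarity") yields a per-product score. Products are ranked by each feature alone and the ranking is correlated with the overall ranking via Spearman's coefficient, computed as the Pearson correlation of average ranks so that ties are handled. The choice of Spearman here is an assumption; the notes only say "the correlation".

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values with 1 = largest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def feature_correlations(features: Dict[str, Sequence[float]],
                         overall: Sequence[float]) -> Dict[str, float]:
    """Spearman correlation between each single-feature ranking and the overall ranking."""
    overall_ranks = average_ranks(overall)
    return {name: pearson(average_ranks(vals), overall_ranks)
            for name, vals in features.items()}


if __name__ == "__main__":
    # Made-up per-product scores for four products.
    overall = [0.9, 0.4, 0.7, 0.1]
    features = {"H": [0.8, 0.3, 0.6, 0.2], "T": [1.2, 1.9, 1.1, 1.0], "polarity": [3, 1, 2, 1]}
    print(feature_correlations(features, overall))
```

A feature whose correlation is close to 1 orders products much like the full model does, which suggests it carries most of the ranking signal.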
Future Work: consider these additional attributes
- Reviewer Credibility
- Prioritizing features
- Look for Sarcasm :)
- Filter out Spam.
- Data from other sources.