My experiments with code: Mining millions of reviews

Abstract

Mine reviews
Get score: present a product ranking model that applies weights to product review factors to calculate a product's ranking score.
Sort reviews and products based on score.

Introduction

Too many reviews
Reviews may contain negative feedback about the seller rather than the product. Those need to be filtered out.
A Review's credibility can be based on -

Methodology

Summary: Sentiment analysis for each relevant sentence
When calculating product ranking scores, reviews should not be weighted equally; they should be mined and given proportional weights.
Three stages are proposed for evaluating a review's weight

Filtering Irrelevant sentences

This is treated as a binary classification problem
Use a Support Vector Machine (SVM) to train a hypothesis function h that maps a sentence to h(sentence)
The sentence is translated to a vector X
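A linear SVM learns a linear decision function, h(X) = Beta-Transpose * X + b, and the sign of h(X) gives the class. (The form matches linear regression, but Beta and b are fit by maximizing the margin between the two classes rather than by minimizing squared error.)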
1000 sentences are collected manually and used as the training set
10-fold cross-validation is used.
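
To make the steps above concrete, here is a minimal sketch of the vectorization, the linear decision function h(X), and the 10-fold split. It does not train an SVM. The vocabulary, the weights BETA and B, and every function name are hypothetical and hand-set purely for illustration; a real model would learn Beta and b from the 1000 labeled sentences.

```python
import re

def vectorize(sentence, vocabulary):
    """Translate a sentence into a bag-of-words count vector X."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [tokens.count(word) for word in vocabulary]

def h(x, beta, b):
    """Linear decision function h(X) = Beta-Transpose * X + b."""
    return sum(w * xi for w, xi in zip(beta, x)) + b

def is_relevant(sentence, vocabulary, beta, b):
    """Binary classification: a non-negative score means 'relevant'."""
    return h(vectorize(sentence, vocabulary), beta, b) >= 0

def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical vocabulary and hand-set weights (NOT learned from data).
VOCAB = ["battery", "screen", "seller", "shipping"]
BETA = [1.0, 1.0, -1.5, -1.5]
B = 0.0

print(is_relevant("The battery life is great", VOCAB, BETA, B))
print(is_relevant("The seller was rude", VOCAB, BETA, B))
```

With 1000 sentences and k=10, each fold holds out 100 sentences for testing and trains on the remaining 900.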

Calculating the Review's weight

Age-based metric = e^(decay rate * (time of review - time of product release)) + initializing factor.
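
A small sketch of the age-based metric, assuming the exponent covers the whole product of decay rate and time difference, with the initializing factor added afterwards. The function name, the parameter names, and the use of days as the time unit are assumptions; the text does not fix the sign or size of the decay rate.

```python
import math

def age_metric(review_time, release_time, decay_rate, init_factor=0.0):
    """e^(decay_rate * (review_time - release_time)) + init_factor.

    Times are assumed to be in days (an assumption, not from the text).
    """
    return math.exp(decay_rate * (review_time - release_time)) + init_factor

# A review written 10 days after release, with a hypothetical decay rate.
print(age_metric(review_time=10, release_time=0, decay_rate=-0.1))
```

With a negative decay rate the metric shrinks as the gap between release and review grows; a positive rate does the opposite.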

Product Ranking Score Function

1. Calculate the sentiment or the polarity of the review. Use polarity, H and T to calculate the product ranking score.

2. Calculating review sentiment

1. Manually pick a set of common adjectives/ adverbs as a seed list.

2. Augment it with synonyms and antonyms

3. If a sentence has an adjective or adverb from the positive set, then it is positive.

4. Negative sentences are handled similarly

5. If a sentence has more than one sentiment, then Polarity = sum of positives + sum of negatives.

3. Final score of a product: over all of its reviews, Sum of all (Polarity * H * T) / (Sum of all H * Sum of all T)
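
The scoring steps can be sketched as follows. This is an illustration under several assumptions: the seed word sets are hypothetical and tiny, each positive word counts +1 and each negative word -1, a review's polarity is the sum over its sentences, and the denominator is read as (sum of H) * (sum of T). H and T are simply the per-review weights named in the text.

```python
import re

# Hypothetical seed lists; a real list would be augmented with
# synonyms and antonyms as described above.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}

def sentence_polarity(sentence):
    """Sum of positives (+1 each) plus sum of negatives (-1 each)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    positives = sum(1 for t in tokens if t in POSITIVE)
    negatives = sum(-1 for t in tokens if t in NEGATIVE)
    return positives + negatives

def review_polarity(sentences):
    """Assumption: a review's polarity is the sum over its sentences."""
    return sum(sentence_polarity(s) for s in sentences)

def product_score(reviews):
    """reviews is a list of (polarity, H, T) tuples, one per review."""
    numerator = sum(p * h * t for p, h, t in reviews)
    denominator = (sum(h for _, h, _ in reviews)
                   * sum(t for _, _, t in reviews))
    return numerator / denominator if denominator else 0.0

print(sentence_polarity("Great screen but terrible battery"))
print(product_score([(2, 1.0, 1.0), (-1, 3.0, 1.0)]))
```

In the second call the negative review carries a larger H, so it pulls the product score below zero even though the positive review has the stronger polarity.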

Evaluation and Analysis

Compare review rank with sales rank (Amazon specific) - how well the product is sold within its category
Mean Average Precision (MAP)

Spearman's rank correlation coefficient between the human ranking and the above algorithm's ranking for a set of products.
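
As a rough illustration, Spearman's coefficient for two rankings of the same products can be computed with the standard formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid when there are no tied ranks. The function name and the example rankings are made up for this sketch.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings.

    rank_a[i] and rank_b[i] are the ranks given to product i.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of the same length (>= 2)")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

human_ranking = [1, 2, 3, 4]      # hypothetical human ranking
algorithm_ranking = [1, 3, 2, 4]  # hypothetical algorithm ranking
print(spearman_rho(human_ranking, algorithm_ranking))
```

A value of 1 means the two rankings agree exactly, and -1 means one is the reverse of the other.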

Effects of Individual Features

To get the features that contribute the most to the ranking, the correlation between the ranking by a feature and overall ranking can be calculated.

Future work: consider these additional attributes
