Content-Based Similarity Measures of Weblog Authors
by Christopher Wienberg, Melissa Roemmele, and Andrew S. Gordon
Abstract:
With recent research interest in the confounding roles of homophily and contagion in studies of social influence, there is a strong need for reliable content-based measures of the similarity between people. In this paper, we investigate the use of text similarity measures as a way of predicting the similarity of prolific weblog authors. We describe a novel method of collecting human judgments of overall similarity between two authors, as well as demographic, political, cultural, religious, values, hobbies/interests, personality, and writing style similarity. We then apply a range of automated textual similarity measures based on word frequency counts, and calculate their statistical correlation with human judgments. Our findings indicate that commonly used text similarity measures do not correlate well with human judgments of author similarity. However, various measures that pay special attention to personal pronouns and their context correlate significantly with different facets of similarity.
Reference:
Christopher Wienberg, Melissa Roemmele, and Andrew S. Gordon. Content-Based Similarity Measures of Weblog Authors. In ACM Web Science Conference, Paris, France, May 2013.
Bibtex Entry:
@inproceedings{wienberg_content-based_2013,
	address = {Paris, France},
	title = {Content-{Based} {Similarity} {Measures} of {Weblog} {Authors}},
	url = {http://ict.usc.edu/pubs/Content-Based%20Similarity%20Measures%20of%20Weblog%20Authors.PDF},
	abstract = {With recent research interest in the confounding roles of homophily and contagion in studies of social influence, there is a strong need for reliable content-based measures of the similarity between people. In this paper, we investigate the use of text similarity measures as a way of predicting the similarity of prolific weblog authors. We describe a novel method of collecting human judgments of overall similarity between two authors, as well as demographic, political, cultural, religious, values, hobbies/interests, personality, and writing style similarity. We then apply a range of automated textual similarity measures based on word frequency counts, and calculate their statistical correlation with human judgments. Our findings indicate that commonly used text similarity measures do not correlate well with human judgments of author similarity. However, various measures that pay special attention to personal pronouns and their context correlate significantly with different facets of similarity.},
	booktitle = {{ACM} {Web} {Science} {Conference}},
	author = {Wienberg, Christopher and Roemmele, Melissa and Gordon, Andrew S.},
	month = may,
	year = {2013},
	keywords = {The Narrative Group, UARC}
}