Title:
|
SCRAPING NEWS SITES AND SOCIAL NETWORKS FOR PREJUDICE TERM ANALYSIS |
Author(s):
|
Pedro Rangel Henriques, Cristiana Araújo, Isabel Ermida and Idalete Dias |
ISBN:
|
978-989-8533-95-1 |
Editors:
|
Hans Weghorn |
Year:
|
2019 |
Edition:
|
Single |
Keywords:
|
Information Extraction, Social Web, Web Scraping, Computer-Mediated Communication |
Type:
|
Full Paper |
First Page:
|
179 |
Last Page:
|
189 |
Language:
|
English |
Paper Abstract:
|
Computer-Mediated Communication (CMC) has paved the way for new patterns of linguistic aggravation. Hidden behind
the screen, anyone can comment on any other person's opinion using an offensive or injurious tone. Moreover, types of
prejudice such as homophobia, sexism, racism, xenophobia, anticlericalism, and body/addiction shaming are
easily found nowadays on social networks and other interactive Web sites enabled by Web 2.0. This increasing
violence deserves further investigation from different academic perspectives, among which Sociolinguistics stands out.
This paper is concerned with the design and development of a set of computer-based tools to collect articles and posts with
the respective comment threads that can be used as sources to extract prejudice terms and allow different analyses to be
conducted. These prejudice terms were devised using a sociolinguistic variable stratification approach. We will focus on
the filters used to extract the relevant fields from the Web pages collected, and on the converters used to adapt formats to
obtain a common format for information representation. We will also introduce the statistical analysis processor that
explores the extracted data, in that format, to output a set of indicators. |