Mining Goodreads: a text similarity-based approach to measure reader absorption
Research Project | 3 Project Members

Over the last decades, with technological advancements, growing digitalization and the development of social media, the act of reading has transformed into a more social interaction (Cordon Garcia, Alonso Arevalo, Gomez Diaz, & Linder, 2013; Merga, 2015), or rather has returned to its social origins (Nation, 2018). Social media platforms like Goodreads are online environments where millions of people come to share their love for the written word. Members come together to discuss what they read and what they classify as good or bad literature; they recommend books to one another and even try their hand at writing fan fiction. Thus, in the digital age the act of reading, which since the latter half of the 18th century has been construed as a mostly solitary, immersive act, has started to involve a social component that goes far beyond that of a real-life book club or public poetry reading: first because of the scale on which it takes place, and second because of the new opportunities for social interaction that online platforms offer. This project focuses on the growing phenomenon of online social reading. It is an exploratory study that exploits a new data source and develops new methodologies. Goodreads holds a wealth of qualitative data about reading experience, text evaluation, and social interactions about reading. It would take an experimental researcher an entire career to gather and analyze even a fraction of the data that is readily available on this website. So far, this treasure trove of data has not been empirically investigated, partly because new methodologies have to be developed to extract the data from the website in a meaningful way. This is exactly the gap that our project aims to fill.
To investigate meaningful ways in which such a reader review corpus might then be used, we are also developing computational linguistics methods to mine the extracted corpus with a specific reader response in mind, namely absorption - the feeling of being lost in a book (Nell, 1988; Kuijpers, 2014). By analyzing reader reviews on Goodreads using textual entailment and text reuse detection (methods from computational linguistics) and comparing them to statements on the Story World Absorption Scale (SWAS; Kuijpers, Hakemulder, Tan & Doicaru, 2014), we will investigate: (1) the potential of converting Goodreads into an extensive qualitative corpus for the computational analysis of reader responses; (2) the validation of the SWAS through comparison with reviews on Goodreads; and (3) the comparison of readers' absorption across different genres. It is important to study these online social reading phenomena, as they are becoming increasingly popular and provide new ways for people of all ages to acquire storytelling and literacy skills (Coiro, Knobel, Lankshear, & Leu, 2014). The potential impact of this project is broad: it will construct a new corpus of interest to researchers from different fields and develop methodologies that can be fine-tuned for use on various other online corpora made up of natural language.
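The idea of comparing review text to scale statements can be illustrated with a minimal sketch. It is not the project's method: it substitutes plain bag-of-words cosine similarity for textual entailment and text reuse detection, and the function names (`tokenize`, `cosine_similarity`, `best_match`), the example statements, and the review sentence are all hypothetical placeholders, not actual SWAS items or Goodreads data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two texts, using raw word counts as vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    # Empty texts have no direction, so treat their similarity as zero.
    return dot / norm if norm else 0.0


def best_match(sentence, statements):
    """Return the (statement, score) pair most similar to the sentence."""
    scored = ((s, cosine_similarity(sentence, s)) for s in statements)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical placeholders: NOT actual SWAS items or Goodreads reviews.
statements = [
    "I felt like I was inside the world of the story",
    "I forgot the time while I was reading",
]
review_sentence = "I was reading all night and forgot the time completely"

statement, score = best_match(review_sentence, statements)
print(statement, round(score, 2))
```

Word-count overlap only captures surface similarity; a review that paraphrases a statement with different vocabulary would score low here, which is one reason entailment-style methods are of interest for this kind of comparison.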