This work package provides a systematic content analysis of mass media. It offers insight into how the press in different national contexts prioritises and frames EU topics, and into how this framing can shape attitudes in the national public spheres towards the EU decision-makers or policies involved.
The data collection started in the first period of the project, continued during 2016, and was finalized at the beginning of 2017. The collection involved several revisions of the search queries and of the list of keywords in each of the 10 languages, following a cyclical process based on:
– Several language and grammar checks (i.e. all possible forms of the keywords in each language, in order to be able to strip suffixes and identify the root word).
– Consultations with native speakers.
– Running the queries on existing news archives (e.g. Lexis Nexis) and the individual keywords on the newspapers’ websites to see how many results they produce and how relevant those results are.
– Analyzing the results of the test searches to filter out irrelevant keywords.
We devised complex search queries that use multiple logical operators to filter out irrelevant content. Once the list of keyword search queries for all ten languages had been finalized, we started collecting the data through a twofold approach, depending on the accessibility of the content.
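To illustrate how keyword roots and logical operators can be combined into one query, the following is a minimal sketch and not the project's actual queries. The query syntax (`*` as a truncation wildcard, `AND`, `OR`, `AND NOT`), the function names `build_query` and `matches`, and the example keywords are all assumptions made for illustration; real archives differ in the operators they accept.

```python
import re


def _quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(include_roots, context_terms, exclude_terms):
    """Combine keyword roots and terms into one boolean search string.

    Hypothetical syntax: root* matches any suffix of the root word.
    """
    topic = " OR ".join(f"{root}*" for root in include_roots)
    context = " OR ".join(_quote(t) for t in context_terms)
    query = f"({topic}) AND ({context})"
    if exclude_terms:
        excluded = " OR ".join(_quote(t) for t in exclude_terms)
        query += f" AND NOT ({excluded})"
    return query


def matches(text, include_roots, context_terms, exclude_terms):
    """Apply the same logic locally to a text, e.g. to test a keyword list."""
    lowered = text.lower()

    def has_word(term, allow_suffix):
        # Roots may be followed by any suffix; full terms must match exactly.
        tail = r"\w*" if allow_suffix else r"\b"
        return re.search(rf"\b{re.escape(term.lower())}{tail}", lowered) is not None

    has_topic = any(has_word(r, True) for r in include_roots)
    has_context = any(has_word(t, False) for t in context_terms)
    has_excluded = any(has_word(t, False) for t in exclude_terms)
    return has_topic and has_context and not has_excluded


# Illustrative keywords only (assumed, not taken from the project's lists).
ROOTS = ["immigra", "migrant"]
CONTEXT = ["EU", "European Union"]
EXCLUDE = ["birds"]

if __name__ == "__main__":
    print(build_query(ROOTS, CONTEXT, EXCLUDE))
    print(matches("EU leaders discuss immigration quotas", ROOTS, CONTEXT, EXCLUDE))
```

Running test searches of this kind against a handful of known relevant and irrelevant texts is one way to spot keywords that pull in unrelated content before a query is used at scale.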
The data collection process produced a dataset with 121,170 cases (news article metadata for each of the 4 topics: Brexit, Immigration, Economy, Security) and 11 variables (Country, Outlet, Archive Date, Topic, Article Headline, Publication Date, Sample Sentence, Wordcount, Article links, Article ID, Source). The dataset is easy to operate and navigate, requiring only basic Excel skills, and is compatible with most other software commonly used in statistics or programming (e.g. R, STATA, SPSS, Python). The dataset does not include the full text of the collected news articles, which is subject to copyright laws, but it does include the metadata that allows for replication. This metadata dataset can be publicly shared to facilitate the replication of the online media analysis and to increase the transparency of the research. The 121,170 text files resulting from the data collection form the corpus for the text analysis that is currently being carried out in the last part of the project.
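As an illustration of how such a metadata table could be read outside Excel, the sketch below checks for the 11 variables listed above and counts articles per country and topic. It assumes the dataset has been exported to CSV with those variable names as column headers; the export format, the function names, and the sample rows (placeholders, not project data) are assumptions.

```python
import csv
import io
from collections import Counter

# The 11 variables named in the dataset description.
EXPECTED_COLUMNS = [
    "Country", "Outlet", "Archive Date", "Topic", "Article Headline",
    "Publication Date", "Sample Sentence", "Wordcount", "Article links",
    "Article ID", "Source",
]

# Placeholder rows for demonstration only; they are not real records.
SAMPLE_ROWS = [
    ["Country A", "Outlet 1", "2017-01-10", "Brexit", "Placeholder headline 1",
     "2016-06-24", "Placeholder sentence.", "350", "https://example.org/1",
     "A-0001", "website"],
    ["Country A", "Outlet 2", "2017-01-10", "Brexit", "Placeholder headline 2",
     "2016-06-25", "Placeholder sentence.", "410", "https://example.org/2",
     "A-0002", "archive"],
    ["Country B", "Outlet 3", "2017-01-11", "Economy", "Placeholder headline 3",
     "2016-09-01", "Placeholder sentence.", "520", "https://example.org/3",
     "B-0001", "website"],
]


def make_sample_csv():
    """Build an in-memory CSV file with the expected header and sample rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(EXPECTED_COLUMNS)
    writer.writerows(SAMPLE_ROWS)
    buffer.seek(0)
    return buffer


def count_by_country_and_topic(csv_file):
    """Return a Counter keyed by (Country, Topic) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = Counter()
    for row in reader:
        counts[(row["Country"], row["Topic"])] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(count_by_country_and_topic(make_sample_csv()).items()):
        print(key, n)
```

With a real export, the in-memory sample would be replaced by `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV file.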
We are writing scholarly papers on the interactions between media content and public opinion using automated text analysis of the corpus in conjunction with survey and social media data from EUENGAGE and other sources.