Authors:
(1) Yagci, Nurce, HAW Hamburg, Germany & nurce.yagci@haw-hamburg.de;
(2) Sünkler, Sebastian, HAW Hamburg, Germany & sebastian.suenkler@haw-hamburg.de;
(3) Häußler, Helena, HAW Hamburg, Germany & helena.haeuessler@haw-hamburg.de;
(4) Lewandowski, Dirk, HAW Hamburg, Germany & dirk.lewandowski@haw-hamburg.de.
DISCUSSION
Our study examined whether the sources of the top search results differ between Google and alternative search engines for popular search queries. An evaluation of the root domains of the most popular sources shows only a small overlap between Google and the alternative search engines (RQ1). Overall, we found an overlap of 27% to 28% between Google and the alternatives in German results; in the US results, the overlap ranges from 24% to 25%. The overlap among the alternative search engines is considerably higher, at about 63% to 70% in German results and 62% to 65% in US results. This may be explained by all three alternative search engines using Bing's index at least in part. However, our findings show that Metager consistently has the highest overlap with Bing, reaching up to 78%, while its overlap with DuckDuckGo is at most 64%. The lower overlap of Google with the alternative search engines had already been shown in the studies of Agrawal et al. (2016) and Makhortykh et al. (2020). Our study provides further evidence for this.
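To make the notion of root-domain overlap concrete, the following is a minimal sketch of one common way to compute it: the share of root domains two engines have in common relative to all root domains either returned (Jaccard similarity). This section does not state the exact overlap definition used in the study, so the measure, the function names, the naive root-domain extraction, and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def root_domain(url: str) -> str:
    """Return a naive root domain: the last two labels of the hostname.

    Assumption for illustration only; this is wrong for multi-part
    suffixes such as "co.uk", where a public-suffix list would be needed.
    """
    host = urlparse(url).hostname or ""
    labels = host.lower().split(".")
    return ".".join(labels[-2:])


def jaccard_overlap(urls_a, urls_b) -> float:
    """Share of root domains common to both result sets (Jaccard similarity)."""
    a = {root_domain(u) for u in urls_a}
    b = {root_domain(u) for u in urls_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical result URLs for two engines (not data from the study).
engine_a = [
    "https://en.wikipedia.org/wiki/X",
    "https://www.bbc.com/news/1",
    "https://www.youtube.com/watch?v=1",
]
engine_b = [
    "https://de.wikipedia.org/wiki/X",
    "https://www.cnn.com/a",
    "https://www.bbc.com/sport",
]

overlap = jaccard_overlap(engine_a, engine_b)
print(f"overlap: {overlap:.0%}")
```

Here the two engines share two of four distinct root domains. Other definitions, such as the share of one engine's domains that also appear in the other, would yield different percentages for the same data.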
When looking at the uniqueness of sources across all German queries, we found that the search engines returned a similar total number of sources, ranging from 2,693 to 2,841. For the US queries, however, the variety in Google was noticeably higher, with over 4,000 total sources compared to around 3,500 for each of the alternatives.
The most popular domain was Wikipedia, followed by sources we classified as News services (RQ2). This pattern is likely amplified by our use of Google Trends as the source of our search queries, which usually includes queries related to popular news stories, sports, and celebrities. Still, it is consistent with the findings of previous studies (Steiner et al., 2022), which showed that Wikipedia and news made up the majority of search results. Furthermore, in our research, Wikipedia and news sources were the top domains across all results and the most frequent sources for each search engine individually.
We compared the top 10 results from Google, Bing, DuckDuckGo, and Metager for 3,537 queries generated from Google Trends in Germany and the US. News sources are far more prevalent in the German results, where social media is very infrequent, whereas the US results contain more social media websites. Interestingly, in the US results, YouTube was among the top 10 most popular domains in all search engines but Google. The same was the case for the top 50 domains in German results. This is unexpected because YouTube is a subsidiary of Google. However, this finding may be explained by the fact that we only collected organic search results, and Google might be using universal search results to display YouTube results. Another interesting difference is Google's greater preference for Wikipedia in the US results (1,892, 10.2% of all domains) compared to the German results (658, 4% of all domains). Furthermore, the second and third most popular sources on Google are Instagram and Facebook for the US results, while they are not even in the top 10 of German Google sources. Finally, in terms of what is missing from Google in the US results, it is notable that Fox News is not found in the top 50 sources, while it is present in all of the alternatives.
The analysis of source concentration and source diversity showed a tendency for only a few root domains to make up a large share of search results. The Gini Index values of 0.73 and 0.79 in Germany and the United States, respectively, clearly indicate this concentration (RQ3). This is consistent with findings from previous studies (Höchstötter & Lewandowski, 2009) that showed that only a few top sources dominate the search results in search engines.
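As an illustration of how a Gini Index expresses this kind of concentration, the sketch below applies the standard formula to a list of per-domain result counts. A value of 0 means every domain appears equally often, and values approaching 1 mean a few domains account for nearly all results. This section does not give the exact computation used in the study, so the function name and the example counts are assumptions and not the study's data.

```python
def gini(counts) -> float:
    """Gini coefficient of non-negative frequency counts.

    Uses the standard rank-based formula over values sorted ascending:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    """
    xs = sorted(c for c in counts if c >= 0)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical counts of how often each root domain appears in the results.
even_counts = [5, 5, 5, 5]      # every domain equally frequent
skewed_counts = [0, 0, 0, 10]   # one domain receives all results

print(gini(even_counts))    # no concentration
print(gini(skewed_counts))  # strong concentration
```

With only four domains, the fully skewed case yields 0.75 and not 1, because the maximum of this formula is (n - 1) / n; with thousands of domains, as in the study, the upper bound is effectively 1.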
Of course, our study is not without limitations. First, the selection of search queries is the most significant factor in compiling the data that was evaluated. Even though the number of queries is high and there is some diversity in the queries, the topics are almost always focused on news, celebrities, and sports, which inevitably leads to many news sources in the search results. A refined approach to selecting search queries would be appropriate for a more accurate evaluation of source diversity. For example, this could be achieved by focusing on socially controversial topics and choosing the queries accordingly.
Further limitations arise in the evaluation of source types. The classification we used in this study is very broad. A more precise classification would make it possible to describe the actual kinds of sources and thus to determine more precisely which preferences and biases exist in different search engines. For example, grouping sources according to seriousness and reliability could help explain which sources are selected for display in search results. Regarding the search results collected in this study, another limitation is that only organic search results were considered. We did not consider advertisements or universal search results, although these have a strong influence on what users see on the search engine results page (SERP). The relevance of the search results was also not considered.
This paper is available on arXiv under a CC 4.0 license.