A Comparison of Source Distribution and Result Overlap in Web Search Engines

11 May 2024

Authors:

(1) Yagci, Nurce, HAW Hamburg, Germany & nurce.yagci@haw-hamburg.de;

(2) Sünkler, Sebastian, HAW Hamburg, Germany & sebastian.suenkler@haw-hamburg.de;

(3) Häußler, Helena, HAW Hamburg, Germany & helena.haeuessler@haw-hamburg.de;

(4) Lewandowski, Dirk, HAW Hamburg, Germany & dirk.lewandowski@haw-hamburg.de.

Abstract and Introduction

Literature Review

Objectives and Research Questions

Methods

Results

Discussion

Conclusion, Research Data, Acknowledgments, and References

ABSTRACT

When it comes to search engines, users generally prefer Google. Our study aims to find the differences between the results found in Google compared to other search engines. We compared the top 10 results from Google, Bing, DuckDuckGo, and Metager, using 3,537 queries generated from Google Trends from Germany and the US. Google displays more unique domains in the top results than its competitors. Wikipedia and news websites are the most popular sources overall. With some top sources dominating search results, the distribution of domains is also consistent across all search engines. The overlap between Google and Bing is always under 32%, while Metager has a higher overlap with Bing than DuckDuckGo, going up to 78%. This study shows that the use of another search engine, especially in addition to Google, provides a wider variety in sources and might lead the user to find new perspectives.

KEYWORDS

Web search; search engine; web scraping; Google; source comparison

INTRODUCTION

Why should there be more than one search engine? While users may prefer one search engine over others for its usability, specialized features, or a more convenient integration into their technical environment, the question that interests us in this research is whether a user will benefit from using a search engine other than Google when it comes to finding results from different sources. Our starting point is the fact that Google is the most-used search engine by far (StatCounter, 2022), that users to a large degree trust search engines to provide them with relevant and useful results (European Commission, 2016; Purcell et al., 2012), and that only some users use another search engine in addition to Google (Schultheiß & Lewandowski, 2021).

Users place great trust in search engines. This is reflected by the 91% of US users who said they find what they are looking for always or most of the time, and the 66% who believe search engines are a fair and unbiased source of information (Purcell et al., 2012). Furthermore, 78% of European internet and online platform users said they trust that their search engine results are the most relevant results (European Commission, 2016). Globally, users trust search engines more than any other source (including traditional news outlets) when it comes to news (Edelman Trust Institute, 2022), and users trust news found via search significantly more than news found on social media (Newman et al., 2021).

As the Web is enormous and different search engines might prefer different sources, it is interesting to see whether the top sources shown in search results differ from one search engine to the other. It might be that an alternative search engine prefers results from "alternative" sources, e.g., in terms of political leaning or a preference for noncommercial content providers. This all comes down to whether alternative search engines are actually alternatives with regard to the results they display. If they were, possible benefits of using a search engine other than Google would include finding different results, finding additional results, and finding more relevant results. No matter which of these goals a user aims to achieve, they will need results other than Google's. Therefore, it is interesting to see whether other search engines provide users with such results.

There has been an ongoing discussion on alternative search engines and how Google's dominance in the search engine market can be broken. Approaches range from establishing single alternative search engines to building infrastructures for such alternatives (e.g., Lewandowski, 2019; also see Mager, 2014). With Google dominating the search engine market (StatCounter, 2022), it often seems that there are no alternatives at all. On the other hand, the number of alternative (or simply "other") search engines is often overestimated. Many apparent search engines are merely search portals displaying results from a partner instead of generating the results from their own index. For instance, Yahoo and Ecosia get their results from Bing and therefore cannot be considered search engines in their own right. Still, there may be other reasons for using a search engine without its own index. Some of the unique benefits alternative search engines advertise are privacy (e.g., Startpage and DuckDuckGo) or being a company that invests its profits in environmental projects (e.g., Ecosia). Another type of search engine is the meta search engine (e.g., Metager). Such an engine sends the queries to several other search engines, then aggregates and re-ranks the top results. We deem it especially interesting whether such an approach will lead to a wider variety of search results, i.e., results from a more diverse set of sources. So, in the context of our research, we will consider any search engine that either has its own index or provides a unique selection and re-ranking of results from one or more indexes as an alternative search engine. We are especially interested in the differences in the source distribution; the relevance of the results is out of the scope of our research.

More than 20 years ago, Introna & Nissenbaum (2000) argued that search engines as commercial operations tend to prefer big websites and, therefore, a portion of the Web, i.e., the smaller sites, remains hidden from view. Studies measuring what users select seem to confirm this: Goel et al. (2010) found that within Yahoo, only 10,000 websites account for approximately 80% of result clicks. It is important to note that this does not merely result from user preference for particular sources but also from the fact that users predominantly select from the top results shown by the search engine. What is out of the immediate view of users will not be chosen (Lewandowski & Kammerer, 2021).

It is striking that few studies in recent years have compared the results of different search engines. Older studies (see literature review section) found overall that the top results from different search engines did not overlap much. In this paper, we address how the top results of Google differ from those of alternatives and, therefore, whether it is worthwhile for a user to consider these alternatives. If a search engine other than Google produced very similar results to Google, a user would not benefit much from using that search engine when source variety is considered.
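To make the notion of result overlap concrete, the following is a minimal sketch of how a domain-level overlap between two top-result lists could be computed. It is an illustration only and not necessarily the measure used in this study: the helper names `domain_of` and `domain_overlap`, the choice to compare on the level of hosts, and the example URLs are all assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def domain_overlap(results_a: list[str], results_b: list[str]) -> float:
    """Share of distinct domains in results_a that also appear in results_b.

    Assumption for this sketch: overlap is measured on hosts, not full URLs.
    """
    domains_a = {domain_of(u) for u in results_a}
    domains_b = {domain_of(u) for u in results_b}
    if not domains_a:
        return 0.0
    return len(domains_a & domains_b) / len(domains_a)


# Made-up result lists for illustration only; not data from the study.
engine_a = [
    "https://en.wikipedia.org/wiki/Web_search_engine",
    "https://www.example.com/search-guide",
    "https://news.example.org/article",
]
engine_b = [
    "https://en.wikipedia.org/wiki/Search_engine",
    "https://blog.example.net/post",
    "https://www.example.com/other-page",
]

overlap = domain_overlap(engine_a, engine_b)
print(f"{overlap:.0%} of engine A's domains also appear in engine B")
```

Note that this measure is asymmetric: the share of engine A's domains found in engine B can differ from the share of engine B's domains found in engine A when the two lists contain different numbers of distinct domains.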

This paper is available on arXiv under a CC 4.0 license.

Lead image by Justin Morgan on Unsplash