Interface Element Frequencies in Search Engine Results Pages (SERPs) Across Query Intents, Search Engines and Languages

This dataset contains the data produced for the dissertation "User Interface Variations in Search Engine Results Pages Across Types of Search Queries and Search Engines". The project was conducted by Adelaide Miranda Santos, a student at FEUP, University of Porto, as part of the Master's in Informatics and Computing Engineering.

The primary objective of this work is to study interface variations in search engine results pages (SERPs) across different search engines and types of search queries. To this end, nearly 8,000 SERPs were captured using queries from the ORCAS-I-gold dataset across six leading web search engines: Google, Microsoft Bing, Yandex, Yahoo!, Baidu, and DuckDuckGo. For each captured SERP, the number of occurrences of each interface element was recorded. Additionally, to analyze how the language of a search query affects SERP composition in Yandex and Baidu, the original English queries were translated into Russian and Simplified Chinese.

The dataset is organized in the following folders:

Search Query Dataset Translation

Contains the search queries from the ORCAS-I-gold dataset translated into Russian and Simplified Chinese. The translations were produced with ChatGPT-4o and verified by native speakers. In addition to the translated queries, the complete original ORCAS-I-gold dataset is included as an independent resource.

SERP Captures

Includes HTML files of the search engine results pages collected from Baidu, Microsoft Bing, DuckDuckGo, Google, Yahoo!, and Yandex. Each top-level subfolder is named after the respective search engine. Within each of these, there are folders named according to the language and the query intent of the search query, and these folders contain the corresponding SERP HTML files. File names represent the search queries and may be either encoded or written as in the original dataset.
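The folder layout described above (search engine, then language and intent, then HTML files) can be traversed with a few lines of Python. The following is a minimal sketch, not the authors' code: the function name `count_captures`, the `.html` file extension, and the folder names used in the toy tree (such as `english-factual`) are assumptions made for illustration, since the exact naming scheme is not specified here.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_captures(root):
    """Count HTML files per (engine folder, language/intent folder).

    Assumes a three-level layout: <root>/<engine>/<language-intent>/<file>.html
    """
    counts = Counter()
    for html_file in Path(root).glob("*/*/*.html"):
        engine = html_file.parent.parent.name
        group = html_file.parent.name
        counts[(engine, group)] += 1
    return counts


# Build a tiny throwaway tree so the sketch runs without the real dataset.
# All folder and file names below are invented placeholders.
with tempfile.TemporaryDirectory() as tmp:
    toy_files = [
        "Google/english-factual/query-a.html",
        "Google/english-factual/query-b.html",
        "Yandex/russian-navigational/query-c.html",
    ]
    for relative in toy_files:
        path = Path(tmp, relative)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("<html></html>", encoding="utf-8")

    demo_counts = count_captures(tmp)

for (engine, group), n in sorted(demo_counts.items()):
    print(engine, group, n)
```

To apply this to the real data, pass the path of the extracted SERP Captures folder to `count_captures` and adjust the glob pattern if the actual folder depth or file extension differs.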

Occurrence of Elements per SERP

For each captured SERP, we recorded the frequency of each interface element. This data is organized in a relational database structure composed of the following CSV files:

- elements.csv: Lists all identified SERP elements along with their corresponding IDs, categories, types, and subtypes (if applicable).
- identifiers.csv: Contains the selectors or identifiers used for automatic detection of each element, along with their associated element ID, identifier ID, and the corresponding search engine ID.
- intents.csv: Maps query intent names to their corresponding intent IDs.
- search-engines.csv: Maps search engine names to their corresponding IDs.
- main.csv: Records the frequency of each element in each captured SERP. Each row represents an observation and includes the following fields: element ID, identifier ID, search engine ID, query language, intent ID, query ID (as defined in the ORCAS-I-gold dataset), and the number of occurrences.
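Because main.csv stores only IDs, reading it usually means joining it against the lookup files. The sketch below illustrates that join with Python's standard `csv` module on small in-memory stand-ins. The column names (`element_id`, `search_engine_id`, `occurrences`, and so on) and all sample rows are assumptions invented for this example; the real headers should be checked in the files themselves.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for the dataset's CSV files. Column names and values are
# hypothetical; they only mirror the fields described in the text.
ELEMENTS_CSV = """element_id,name,category
1,organic result,results
2,related searches,navigation
"""
SEARCH_ENGINES_CSV = """search_engine_id,name
1,Google
2,Baidu
"""
MAIN_CSV = """element_id,identifier_id,search_engine_id,query_language,intent_id,query_id,occurrences
1,10,1,en,3,q001,10
2,11,1,en,3,q001,1
1,20,2,zh,3,q001,8
1,10,1,en,4,q002,9
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup(rows, key, value):
    """Build an ID -> label mapping from a lookup table."""
    return {row[key]: row[value] for row in rows}


def total_occurrences(main_rows, element_names, engine_names):
    """Sum occurrences per (search engine name, element name)."""
    totals = defaultdict(int)
    for row in main_rows:
        key = (engine_names[row["search_engine_id"]],
               element_names[row["element_id"]])
        totals[key] += int(row["occurrences"])
    return dict(totals)


element_names = lookup(read_rows(ELEMENTS_CSV), "element_id", "name")
engine_names = lookup(read_rows(SEARCH_ENGINES_CSV), "search_engine_id", "name")
totals = total_occurrences(read_rows(MAIN_CSV), element_names, engine_names)

for (engine, element), count in sorted(totals.items()):
    print(engine, element, count)
```

With the real files, the in-memory strings would be replaced by `open(...)` calls on elements.csv, search-engines.csv, and main.csv; the same pattern extends to intents.csv and identifiers.csv.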


Additional Info

Field Value
Author Adelaide Miranda Santos & Carla Teixeira Lopes
Last Updated July 24, 2025, 13:35 (UTC)
Created July 22, 2025, 13:06 (UTC)
Citation Santos, A., & Teixeira Lopes, C. (2025). Interface Element Frequencies in Search Engine Results Pages (SERPs) Across Query Intents, Search Engines and Languages [Data set]. INESC TEC. https://doi.org/10.25747/R7EW-WH96
Creation Date 2025-06
DOI https://doi.org/10.25747/r7ew-wh96
Data Collection Method Python and Selenium WebDriver to scrape the SERPs, ChatGPT-4o to translate the search queries, and the BeautifulSoup library to automatically detect the SERP elements in the captured files.
Format CSV and HTML
Language English, Simplified Chinese and Russian
Temporal Coverage Between the 7th and 25th of November 2024