Interface Element Frequencies in Search Engine Results Pages (SERPs) Across Query Intents, Search Engines and Languages

This dataset contains the data produced for the dissertation "User Interface Variations in Search Engine Results Pages Across Types of Search Queries and Search Engines". The project was conducted by Adelaide Miranda Santos, a student at FEUP, University of Porto, as part of the Master's in Informatics and Computing Engineering.

The primary objective of this work is to study interface variations in search engine results pages (SERPs) across different search engines and types of search queries. To this end, nearly 8,000 SERPs were captured using the ORCAS-I-gold dataset across six leading web search engines: Google, Microsoft Bing, Yandex, Yahoo!, Baidu, and DuckDuckGo. For each captured SERP, the number of occurrences of each interface element was recorded. Additionally, to analyze how the language of a search query affects SERP composition in Yandex and Baidu, the original English queries were translated into Russian and Simplified Chinese.

The dataset is organized in the following folders:

Search Query Dataset Translation: Contains the search queries from the ORCAS-I-gold dataset translated into Russian and Simplified Chinese. The translations were produced with ChatGPT-4o and verified by native speakers. In addition to the translated queries, the complete original ORCAS-I-gold dataset is included as an independent resource.

SERP Captures: Includes HTML files of the search engine results pages collected from Baidu, Microsoft Bing, DuckDuckGo, Google, Yahoo!, and Yandex. Each top-level subfolder is named after the respective search engine. Within each of these, there are folders named according to the language and the query intent associated with the search query. These folders contain the corresponding SERP HTML files. File names represent the search queries and may be either encoded or displayed as in the original dataset.

Occurrence of Elements per SERP

For each captured SERP, we recorded the frequency of each interface element. This data is organized in a relational database structure composed of the following CSV files:
- elements.csv: Lists all identified SERP elements along with their corresponding IDs, categories, types, and subtypes (if applicable).
- identifiers.csv: Contains the selectors or identifiers used for automatic detection of each element, along with their associated element ID, identifier ID, and the corresponding search engine ID.
- intents.csv: Maps query intent names to their corresponding intent IDs.
- search-engines.csv: Maps search engine names to their corresponding IDs.
- main.csv: Records the frequency of each element in each captured SERP. Each row represents an observation and includes the following fields: element ID, identifier ID, search engine ID, query language, intent ID, query ID (as defined in the ORCAS-I-gold dataset), and the number of occurrences.
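
Because the CSV files reference each other through IDs, a typical first step is to resolve those IDs into names and then aggregate the occurrence counts. The following is a minimal sketch of that idea using only the Python standard library. The column names (`search_engine_id`, `element_id`, `occurrences`, and so on) and the sample rows are assumptions made for illustration; the actual headers and values in the published files may differ and should be checked before adapting this.

```python
import csv
import io
from collections import defaultdict

# Made-up rows with HYPOTHETICAL column names, standing in for the real files.
SEARCH_ENGINES_CSV = """search_engine_id,name
1,Google
2,Baidu
"""
ELEMENTS_CSV = """element_id,category,type
10,Organic,Result
11,Ads,Text ad
"""
MAIN_CSV = """element_id,identifier_id,search_engine_id,query_language,intent_id,query_id,occurrences
10,100,1,en,3,q1,9
11,101,1,en,3,q1,2
10,102,2,zh,3,q1,7
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def totals_by_engine_and_element(main_rows, engine_rows, element_rows):
    """Sum occurrences per (search engine name, element type) pair."""
    engines = {r["search_engine_id"]: r["name"] for r in engine_rows}
    elements = {r["element_id"]: r["type"] for r in element_rows}
    totals = defaultdict(int)
    for row in main_rows:
        key = (engines[row["search_engine_id"]], elements[row["element_id"]])
        totals[key] += int(row["occurrences"])
    return dict(totals)


totals = totals_by_engine_and_element(
    read_rows(MAIN_CSV), read_rows(SEARCH_ENGINES_CSV), read_rows(ELEMENTS_CSV)
)
for (engine, element), count in sorted(totals.items()):
    print(engine, element, count)
```

To run this against the real data, the inline strings would be replaced by the contents of main.csv, search-engines.csv, and elements.csv, with the dictionary keys adjusted to the actual header names.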

Data and Resources

Search Query Dataset Translation (CSV)
Contains the search queries from the ORCAS-I-gold dataset translated into...
SERP Captures (1) (HTML)
Includes HTML files of the search engine results pages collected from Baidu,...
SERP Captures (2) (HTML)
Includes HTML files of the search engine results pages collected from Yahoo!,...
Occurrence of Elements per SERP (CSV)
For each captured SERP, we recorded the frequency of each interface element....

Additional Info

Field	Value
Author	Adelaide Miranda Santos & Carla Teixeira Lopes
Last Updated	July 24, 2025, 13:35 (UTC)
Created	July 22, 2025, 13:06 (UTC)
Citation	Santos, A., & Teixeira Lopes, C. (2025). Interface Element Frequencies in Search Engine Results Pages (SERPs) Across Query Intents, Search Engines and Languages [Data set]. INESC TEC. https://doi.org/10.25747/R7EW-WH96
Creation Date	2025-06
DOI	https://doi.org/10.25747/r7ew-wh96
Data Collection Method	Python and Selenium WebDriver to scrape the SERPs, ChatGPT-4o to translate the search queries, and the BeautifulSoup library to automatically detect the SERP elements in the captured files.
Format	CSV and HTML
Language	English, Simplified Chinese and Russian
Temporal Coverage	Between the 7th and 25th of November 2024
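
The element detection step named under Data Collection Method amounts to counting how many nodes in a captured HTML file match a given identifier. The dataset authors used BeautifulSoup for this; the sketch below deliberately swaps in Python's standard-library `html.parser` so that it runs without extra dependencies, and it handles only class-based identifiers. The sample markup and the class names (`organic`, `ad`, `result`) are invented for illustration and do not correspond to any real search engine's markup or to the entries in identifiers.csv.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements(html, class_name):
    """Return how many elements in the HTML text carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Invented markup standing in for a captured SERP file.
SAMPLE_SERP = """
<div class="result organic"><a href="#">One</a></div>
<div class="result organic"><a href="#">Two</a></div>
<div class="result ad"><a href="#">Sponsored</a></div>
"""

organic_count = count_elements(SAMPLE_SERP, "organic")
print(organic_count)
```

With BeautifulSoup, the equivalent count for a CSS selector would typically be the length of the list returned by `soup.select(selector)`, which also covers identifiers that are not plain class names.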