Accepted Tutorials
Presenters: Bhawna Piryani, Avishek Anand, and Adam Jatowt
Description: Time plays a crucial role in how we retrieve, interpret, and reason over information. As knowledge on the Web continuously evolves, information retrieval (IR) and question answering (QA) systems must recognize not only what is relevant but also when it is valid. This tutorial provides a comprehensive overview of Temporal Information Retrieval (TIR) and Temporal Question Answering (TQA), two closely related fields that address temporal relevance, reasoning, and adaptation in information access. We trace the evolution of temporal methods from early rule-based and probabilistic approaches to modern transformer and large language model (LLM) architectures, highlighting how temporal modeling, reasoning, and retrieval-augmented generation (RAG) are reshaping the field. Participants will learn the fundamental principles of temporal IR/QA, explore pre-LLM and neural methods, and examine recent advances in temporal RAG and temporal reasoning over evolving knowledge. The tutorial concludes with open challenges and future directions for building temporally robust and adaptive AI systems. By bridging classical IR concepts with modern LLM-based reasoning, this tutorial offers a timely and unified perspective on temporal information access for the evolving Web.
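As a concrete illustration of temporal relevance, here is a minimal sketch of a generic time-aware ranking heuristic (not a method specific to this tutorial) that blends a textual relevance score with an exponential recency prior; the half-life and mixing weight are illustrative assumptions.

```python
import math
from datetime import datetime

def time_aware_score(text_score, doc_time, query_time, half_life_days=90.0, mix=0.7):
    """Blend a textual relevance score with an exponential recency prior.
    Documents lose half of their temporal weight every half_life_days,
    so fresher documents are boosted for recency-sensitive queries."""
    age_days = max((query_time - doc_time).days, 0)
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return mix * text_score + (1 - mix) * recency

# A strong textual match from mid-2024, scored as of January 2025.
print(time_aware_score(0.82, datetime(2024, 6, 1), datetime(2025, 1, 1)))
```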
Duration: 1/2 day
Presenters: Xin Wang, Yuwei Zhou, Zirui Pan, and Wenwu Zhu
Description: This tutorial aims to disseminate and promote recent research advancements in multi-modal generative AI, focusing on two dominant families of techniques: multi-modal large language models (MLLMs) for understanding and diffusion models for visual generation. We will provide a systematic discussion of MLLMs and diffusion models, covering their probabilistic modeling methods, architectures, and multi-modal interaction mechanisms. In dynamic and open environments, shifting data distributions, emerging concepts, and evolving complex application scenarios create significant obstacles for multi-modal generative models. This tutorial explores solutions and future directions for addressing these challenges from two aspects: generalizable post-training techniques for adapting multi-modal generative models to new concepts, and a unified multi-modal generation and understanding framework for complex multi-modal tasks.
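For readers unfamiliar with the probabilistic modeling behind diffusion models, here is a minimal sketch of the standard closed-form forward-noising step q(x_t | x_0); the noise schedule and tensor shapes below are illustrative assumptions, not material from the tutorial.

```python
import torch

def forward_diffuse(x0, t, alphas_cumprod):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I),
    the closed-form forward-noising step underlying DDPM-style diffusion."""
    abar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    return abar.sqrt() * x0 + (1 - abar).sqrt() * noise, noise

# Toy linear beta schedule and a batch of four 3x8x8 "images".
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(4, 3, 8, 8)
xt, eps = forward_diffuse(x0, torch.tensor([500, 10, 999, 250]), alphas_cumprod)
```

A denoising network is then typically trained to predict eps from xt and t, which is the standard DDPM objective.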
Duration: 1/2 day
Presenters: Chen Xu, Clara Rus, Yuanna Liu, Marleen de Jonge, Jun Xu, and Maarten de Rijke
Description: Fairness is a crucial aspect of a responsible Web, as algorithmic systems may systematically discriminate against certain groups and harm the overall web ecosystem. To address this issue, numerous fairness-aware information retrieval (IR) models and evaluation metrics have been proposed. However, the inherent complexity of both fairness and IR systems makes it difficult to systematically summarize the progress achieved so far. This complexity calls for a more structured and novel perspective to re-examine and guide future directions in fairness-aware IR research. The field of economics has a long history of studying fairness, providing a rich theoretical and empirical foundation. Similarly, the Web ecosystem can be viewed as a specialized economic market, where a system-oriented perspective enables the integration of IR fairness into a broader and more structured framework. In this tutorial, we begin by drawing parallels between the components of IR systems and those of economic markets, illustrating how IR systems can be understood as a form of economic system. Next, we organize fairness algorithms within an economic cube, where each dimension represents a distinct fairness taxonomy: macro vs. micro, demand vs. supply, and short-term vs. long-term. Finally, we demonstrate how this economic framework can be applied to most fairness algorithms and a variety of real-world IR applications. Unlike previous fairness-aware tutorials, our tutorial not only offers a clear and novel perspective on fairness but also encourages the use of economic tools to address fairness challenges. We hope it provides a fresh and comprehensive outlook on building a responsible Web, while highlighting open problems and promising directions for future research.
Duration: 1/2 day
Presenters: Gayoung Jeon, Cameron Moy, and Deen Freelon
Description: This hands-on tutorial provides researchers with practical tools and frameworks for TikTok data collection for Computational Social Science. Recent work systematically testing three TikTok data collection techniques (currently under review at the Web Conference) reveals that the choice of TikTok data collection method dramatically alters research results. Participants in our tutorial will learn how to use web-scraping data collection methods (Pyktok and Apify) as well as the official TikTok Research API. This tutorial will explore best practices for data collection from three endpoints (Users, Hashtags, and Comments) using strategies identified through stress testing that: 1) reduce algorithmic selection bias in data collection; 2) substitute for or fill in missing data by combining multiple tools for a more complete dataset; 3) improve collection efficiency by balancing resources and dataset size (including API recommendations such as stratified sampling and use of the is_random and has_more parameters, and web-scraping strategies for optimal time windows to minimize resource waste). Lastly, we introduce a checklist for reporting data collection procedures and results to increase the transparency, replicability, and generalizability of TikTok research. By engaging in this tutorial, researchers will be equipped with actionable methods to obtain high-quality TikTok datasets and decision-making criteria for optimizing collection settings and parameters to answer their research questions with consideration of available resources.
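As a rough illustration of the Research API workflow, here is a minimal sketch that paginates a hashtag query using is_random and has_more; the endpoint path, request fields, and response structure are assumptions based on the public TikTok Research API documentation and should be checked against the current version, and an access token is assumed to have been obtained separately.

```python
import requests

API_URL = "https://open.tiktokapis.com/v2/research/video/query/"  # verify against current docs
FIELDS = "id,create_time,username,hashtag_names,view_count"       # illustrative field list

def collect_hashtag_videos(token, hashtag, start_date, end_date, max_pages=5):
    """Page through the video-query endpoint, requesting a random sample
    (is_random) and following cursor/has_more until no pages remain."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    body = {
        "query": {"and": [{"operation": "IN",
                           "field_name": "hashtag_name",
                           "field_values": [hashtag]}]},
        "start_date": start_date,   # e.g. "20250101"
        "end_date": end_date,       # e.g. "20250131"
        "max_count": 100,
        "is_random": True,          # random sampling to reduce algorithmic selection bias
    }
    videos, cursor, search_id = [], None, None
    for _ in range(max_pages):
        if cursor is not None:
            body["cursor"], body["search_id"] = cursor, search_id
        resp = requests.post(API_URL, params={"fields": FIELDS}, headers=headers, json=body)
        resp.raise_for_status()
        data = resp.json().get("data", {})
        videos.extend(data.get("videos", []))
        if not data.get("has_more"):  # stop once the API reports no further pages
            break
        cursor, search_id = data.get("cursor"), data.get("search_id")
    return videos
```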
Duration: 1/2 day
Presenters: Shweta Garg, Behrooz Omidvar-Tehrani, Shengyu Fu, Gauthier Guinet, and Baishakhi Ray
Description: This tutorial explores the next generation of AI-powered software development, where large language models (LLMs) function as collaborative agents that plan, code, test, and review alongside human developers. Using GitHub Copilot, Mistral Code and Kiro as running exemplars, we synthesize the latest advances in multi-agent coordination, reflective collaboration, long-term memory and context maintenance, and tool-integrated verification. We show how role-specialized agent teams, manager–tool routing, and spec-driven development are replacing single-model code completion with structured, repeatable workflows that close the loop from generation to assurance, while addressing critical concerns around evaluation rigor, security guardrails, and safe deployment. Attendees will leave with a practical blueprint for designing and evaluating collaborative coding agents in real engineering environments. The tutorial blends concepts and live demonstrations, covering task decomposition, memory and retrieval, autonomous debugging with runtime feedback, and seamless integration into IDEs, version control, and CI/CD pipelines. We distill proven "do and don't" patterns for reliability, security, cost, and scalability, and map open research challenges to concrete experimental setups and metrics. The result is a concise, research-ready guide for building agentic systems that deliver measurable productivity and quality gains in modern software development.
Duration: 1/2 day
Presenters: Zijian Zhang, Hao Miao, Yuxuan Liang, Yan Zhao, and Irwin King
Description: Spatio-temporal (ST) data, including geo-social media check-ins, location-based services (LBS), and transportation records, offers unprecedented opportunities for analyzing human mobility and urban dynamics. The emergence of Foundation Models and Large Language Models (LLMs) has initiated a profound paradigm shift in this field, moving from traditional, task-specific deep learning toward universal, generalizable intelligence. This lecture-style tutorial provides a timely and comprehensive overview of LLM-Enhanced Web-Centric Spatio-Temporal Intelligence, systematically presenting the methods, applications, and frontier research in the LLM era. Our tutorial is innovatively organized according to three levels of ST analysis: Location-level intelligence focuses on examining spatial activities within specific geographic locations. Region-level intelligence broadens the scope to include large-scale spatial patterns and flows across regions. The tutorial aims to provide the participants with a comprehensive roadmap of the current state and future potential of applying LLMs and FMs to Web-centric spatio-temporal data, making it an invaluable resource for this evolving field.
Duration: 1/2 day
Presenters: Dongping Liu, Aoyu Zhang, and Luyao Zhang
Description: The 2025 Nobel Prize in Physics recognized groundbreaking advances in quantum information science, underscoring the transformative potential of quantum technologies for computation and communication. As these developments accelerate, they simultaneously pose profound challenges to classical cryptographic foundations—particularly the public-key algorithms securing blockchain systems, digital signatures, and distributed consensus. However, despite the clear urgency to address quantum-induced vulnerabilities, the translation of theoretical quantum computing breakthroughs from laboratory research into robust, real-world applications for blockchain security remains limited. Building on this momentum, our tutorial explores how quantum computing and blockchain can jointly redefine the trust, efficiency, and intelligence of next-generation Web systems. We introduce the principles of quantum computing and their implications for secure, scalable blockchain architectures, emphasizing post-quantum and quantum-assisted cryptography. The tutorial situates these technologies within broader contexts of privacy, human–computer interaction, and ethical data governance, highlighting how interdisciplinary collaboration across computer science, economics, and human-centered computing can enable a more trustworthy Web. We further discuss how quantum techniques can enhance computational efficiency, randomness, and resilience for decentralized systems, while open data standards provide a foundation for transparency, interoperability, and reproducibility. The session culminates in an immersive experience where participants engage with cloud-based quantum computation through Amazon Braket, comparing multiple quantum backends to witness firsthand how such tools advance secure and efficient blockchain innovation. By integrating insights from quantum computing, blockchain applications, and artificial intelligence, the session provides a forward-looking roadmap toward establishing the next generation of trustworthy, quantum-ready Web standards.
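To preview the hands-on portion, here is a minimal sketch that runs a Bell-state circuit with the Amazon Braket SDK on the local simulator; the managed-backend ARN in the comment is one possible choice and would require AWS credentials (and may incur charges) to use.

```python
from braket.circuits import Circuit
from braket.devices import LocalSimulator

bell = Circuit().h(0).cnot(0, 1)   # entangle qubits 0 and 1
local = LocalSimulator()
counts = local.run(bell, shots=1000).result().measurement_counts
print(counts)                      # expect roughly equal '00' and '11' outcomes

# Swapping in a managed backend lets participants compare devices with the same code:
# from braket.aws import AwsDevice
# sv1 = AwsDevice("arn:aws:braket:::device/quantum-simulator/amazon/sv1")
# counts = sv1.run(bell, shots=1000).result().measurement_counts
```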
Duration: 1/2 day
Presenters: Chuan Meng, Fengran Mo, Mohammad Aliannejadi, Jeff Dalton, and Jian-Yun Nie
Description: Conversational search enables multi-turn interactions between users and systems to fulfill users' complex information needs. During this interaction, the system should understand the users' search intent within the conversational context and then return the relevant information through a flexible, dialogue-based interface. Large language models (LLMs), with their capabilities in instruction following, content generation, and reasoning, have attracted significant attention and driven rapid advances, providing new opportunities and challenges for building conversational search systems. More recently, LLMs have begun to drive search systems towards agentic paradigms, acting as autonomous entities that can plan strategies, execute dynamic retrieval, and support a wide range of autonomous behaviours. This tutorial aims to connect fundamentals with recent agentic paradigms in conversational search. It is designed for students, researchers, and practitioners from both academia and industry. Participants will gain a comprehensive understanding of both the fundamental principles and the latest developments enabled by LLMs, equipping them with the knowledge to contribute to the next generation of conversational search systems.
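As a minimal sketch of resolving search intent within conversational context, the snippet below performs generic conversational query rewriting (not a specific system from the tutorial); the llm argument is a placeholder callable standing in for whichever model a system uses.

```python
REWRITE_PROMPT = """Given the conversation history and the user's latest question,
rewrite the question as a single self-contained search query.

History:
{history}

Latest question: {question}
Self-contained query:"""

def rewrite_query(history, question, llm):
    """history is a list of (role, utterance) pairs; llm maps a prompt string
    to a response string. The rewritten query is then passed to a standard
    retriever, decoupling context resolution from retrieval."""
    hist = "\n".join(f"{role}: {utt}" for role, utt in history)
    return llm(REWRITE_PROMPT.format(history=hist, question=question)).strip()

# E.g. "How many employees does it have?" after a turn about OpenAI should be
# rewritten to something like "How many employees does OpenAI have?".
```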
Duration: 1/2 day
Presenters: Mahdis Saeedi, Ziad Kobti, and Hossein Fani
Description: Team recommendation involves selecting skilled experts to form an almost surely successful collaborative team, or refining the team composition to maintain or improve its performance. To address this tedious and error-prone manual process, various computational approaches have been proposed, especially given web-scale social networks, widespread online collaboration, and the diversity of interactions. In this tutorial, after a brief overview of pioneering subgraph optimization approaches and their shortfalls, we deliberately focus on the recent learning-based approaches, with a particular in-depth exploration of graph neural network-based methods. More importantly, and for the first time in a tutorial of this kind, we then discuss team refinement, which involves structural adjustments or expert replacements to enhance team performance in dynamic environments. Finally, we discuss training strategies, benchmarking datasets, and open-source libraries, along with future research directions and real-world applications. Further resources are at https://fani-lab.github.io/OpeNTF/tutorial/www26.
Duration: 1/2 day
Presenters: Djallel Bouneffouf and Raphael Feraud
Description: This tutorial offers a comprehensive guide on using multi-armed bandit (MAB) algorithms to improve Large Language Models (LLMs), with a special focus on enabling agentic behavior, where LLMs act with autonomy, make decisions, and adapt based on feedback. As Natural Language Processing (NLP) tasks grow in scale and complexity, efficient, adaptive, and agentic language generation systems are increasingly needed. MAB algorithms, which balance exploration and exploitation under uncertainty, are especially promising for enhancing the decision-making capabilities of such systems. The tutorial covers foundational MAB concepts, including the exploration-exploitation trade-off and strategies like epsilon-greedy, UCB (Upper Confidence Bound), and Thompson Sampling. It then explores integrating MAB with LLMs, focusing on designing architectures that treat text generation options as arms in a bandit problem. Practical aspects like reward design, exploration policies, scalability, and the emergence of agentic traits in LLMs are discussed. Real-world case studies demonstrate the benefits of MAB-augmented, agentic LLMs in content recommendation, dialogue generation, and personalized content creation, showing how these techniques improve relevance, diversity, and user engagement.
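For concreteness, here is a minimal sketch of the UCB1 strategy with a few hypothetical generation styles as arms; the reward is a random placeholder standing in for user clicks, ratings, or automatic quality scores of the LLM output.

```python
import math
import random

def ucb1_select(counts, rewards, c=1.4):
    """Pick the arm maximizing mean reward plus an exploration bonus (UCB1)."""
    total = sum(counts)
    for arm, n in enumerate(counts):
        if n == 0:                 # play every arm once before applying the bonus
            return arm
    scores = [rewards[a] / counts[a] + c * math.sqrt(math.log(total) / counts[a])
              for a in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)

# Arms could be alternative prompts, decoding settings, or response styles.
arms = ["concise", "detailed", "step-by-step"]
counts, rewards = [0] * len(arms), [0.0] * len(arms)

for step in range(200):
    arm = ucb1_select(counts, rewards)
    # Placeholder feedback signal; in practice this would come from users or evaluators.
    feedback = float(random.random() < (0.3, 0.5, 0.7)[arm])
    counts[arm] += 1
    rewards[arm] += feedback
```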
Duration: 1/2 day
Presenters: Qiang Sun, Yihao Ding, Sirui Li, and Wei Liu
Description: The Web is overflowing with unstructured content, ranging from scientific papers and enterprise documents to social media posts. Unlocking the knowledge hidden in these sources is critical for next-generation Web intelligence, enabling semantic search, advanced reasoning, and deep collaboration with Large Language Models (LLMs). This tutorial presents a comprehensive overview of methods for transforming unstructured Web content into structured Knowledge Graphs (KGs), addressing key challenges in information extraction across multiple dimensions, including entities and relations, events, spatio-temporal indices, visual layouts, and metadata. We then discuss methods for constructing and curating high-quality, multi-perspective KGs at scale. Participants will gain a systematic understanding of state-of-the-art methods, including recent advances in document analysis, document-to-KG approaches, and hybrid systems combining LLMs with structured knowledge, such as LLM-driven knowledge graph construction from unstructured documents, RAG over enterprise knowledge bases, KG-augmented LLMs for grounded reasoning, and neuro-symbolic reasoning pipelines. We will cover the paradigm shift of knowledge graph construction from supervised deep learning models to LLM-assisted knowledge engineering, while highlighting open challenges such as scalability, factual consistency, and evaluation. We will demonstrate the transformative potential of KGs through practical applications in web search, question answering, causal reasoning, and scientific discovery.
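As a minimal sketch of LLM-driven triple extraction within a document-to-KG pipeline, the snippet below prompts a model for JSON triples and parses them; the prompt wording and the llm callable are illustrative placeholders rather than the tutorial's actual pipeline.

```python
import json

TRIPLE_PROMPT = """Extract knowledge-graph triples from the text below.
Return a JSON list of objects with keys "subject", "relation", "object".

Text: {text}
"""

def extract_triples(text, llm):
    """llm is any callable mapping a prompt string to a model response string."""
    raw = llm(TRIPLE_PROMPT.format(text=text))
    try:
        records = json.loads(raw)
    except json.JSONDecodeError:
        return []                  # in practice: retry or repair the malformed output
    if not isinstance(records, list):
        return []
    return [(r["subject"], r["relation"], r["object"])
            for r in records
            if isinstance(r, dict) and {"subject", "relation", "object"} <= r.keys()]
```

The extracted triples can then be deduplicated, linked to a schema, and loaded into a graph store for curation and KG-augmented retrieval.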
Duration: 1/2 day
Presenters: Ramesh Raskar and Pradyumna Chari
Description: The agentic web—where billions of autonomous AI agents discover, communicate, and coordinate across organizational boundaries—requires new foundations spanning technical infrastructure, economic mechanisms, and societal coordination. Just as DNS and HTTP shaped the traditional web's evolution, the architectural decisions we make today for agent registries, protocols, and reputation systems will determine what forms of distributed intelligence emerge tomorrow. This lecture-style tutorial provides a comprehensive framework for understanding the agentic web across three development phases: Foundations (discovery, identity, protocols), Agentic Economy (pricing, reputation, markets), and Agentic Society (population dynamics, governance, coordination). Drawing on recent advances in registry architectures, protocol standards, and resolution mechanisms, this tutorial equips participants with conceptual frameworks and practical insights for designing infrastructure that enables safe, scalable, and sustainable agent ecosystems. The tutorial emphasizes forward-thinking perspectives on open challenges and research opportunities while building on web-native standards.
Duration: 1/2 day
Presenters: Peng Cui, Xingxuan Zhang, Han-Jia Ye, Jintai Chen, and Shuyang Li
Description: Structured data constitutes one of the most ubiquitous data modalities in web-scale and enterprise applications, supporting tasks such as recommendation, forecasting, and user behavior analysis. Conventional modeling paradigms—ranging from generalized linear models and gradient boosting to deep structured networks—have provided strong baselines for predictive analytics and decision support. However, the recent emergence of foundation models and in-context learning (ICL) has sparked a new paradigm for structured modeling, moving from dataset-specific training toward universal, adaptable inference. Emerging structured foundation models illustrate how large-scale pretraining, synthetic data generation, and ICL-based inference can extend foundation-model principles to structured data. These developments open new possibilities for multi-task learning, zero-shot inference, and knowledge transfer across diverse structured settings. Yet, the space of structured foundation models remains largely unexplored, with open questions surrounding data generation, multi-task setting, pretraining objectives, and evaluation standards. This tutorial will provide a structured overview of both conventional modeling and recent ICL-based approaches. Participants will gain a comprehensive understanding of established methods, current advances in foundation models, and open research challenges. In particular, we will offer an in-depth introduction to structured ICL and review the most representative foundation models in this field. Several key topics in this field will be discussed, including pretraining data generation, multi-task learning, and other emerging directions in the modeling of structured data. This tutorial aims to bridge conventional machine learning and the emerging foundation-model paradigm, providing attendees with conceptual and practical insights into structured data modeling in the era of generalist foundation models.
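To make ICL-based inference on structured data concrete, here is a minimal sketch that serializes a few labeled rows plus a query row into a prompt so a model can predict the target without any gradient updates; the toy features and labels are invented for illustration and are not from the tutorial.

```python
def tabular_icl_prompt(train_rows, train_labels, query_row, target_name="label"):
    """Serialize labeled rows and one query row into an in-context-learning
    prompt: the model's completion of the final line is the prediction."""
    lines = [f"Predict the {target_name} of the last row from the examples."]
    for row, y in zip(train_rows, train_labels):
        feats = ", ".join(f"{k}={v}" for k, v in row.items())
        lines.append(f"{feats} -> {target_name}={y}")
    feats = ", ".join(f"{k}={v}" for k, v in query_row.items())
    lines.append(f"{feats} -> {target_name}=")
    return "\n".join(lines)

demo = tabular_icl_prompt(
    train_rows=[{"age": 34, "plan": "pro"}, {"age": 19, "plan": "free"}],
    train_labels=["retained", "churned"],
    query_row={"age": 41, "plan": "pro"},
)
print(demo)
```

Many structured foundation models replace this text serialization with learned tabular encoders, but the zero-gradient, examples-in-context inference pattern is the same.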
Duration: 1/2 day
Presenters: Xiang Ao, Yang Liu, Guansong Pang, Yuanhao Ding, Hezhe Qiao, Dawei Cheng, and Qing He
Description: Graph learning is transforming web intelligence, powering applications from recommender systems to anomaly detection. However, most existing approaches implicitly assume ideal conditions where training and testing data are accurate, complete, and free from manipulation. In reality, web environments rarely exhibit such stability. Dynamic user behavior, incomplete or outdated content, adversarial interference, and sudden distribution shifts can all erode the reliability of even state-of-the-art models, leading to biased or unsafe outcomes. This tutorial provides a comprehensive survey of emerging strategies for robust graph learning on the web. We first present a structured taxonomy of the principal robustness threats specific to web contexts. Next, we categorize current robust graph learning approaches, ranging from data-level preprocessing to model-level adaptation and generalization, and discuss representative techniques in detail. We then showcase real-world case studies illustrating how robustness challenges emerge and how targeted methods can mitigate them in real web systems. By integrating theoretical foundations with practical web applications, this tutorial offers researchers, engineers, and platform developers actionable strategies to safeguard graph-based AI in dynamic, high-impact online environments. The video teaser is available here.
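As one simple example of data-level preprocessing for robustness, here is a minimal sketch of random edge dropping (a generic augmentation baseline, not a specific method from the tutorial) that gives training perturbed graph views so models rely less on any single, possibly noisy or adversarial, edge.

```python
import torch

def drop_edges(edge_index, p=0.2, generator=None):
    """Randomly remove a fraction p of edges from a [2, num_edges] COO edge index."""
    keep = torch.rand(edge_index.size(1), generator=generator) >= p
    return edge_index[:, keep]

# Toy 4-node cycle graph in the COO format used by common GNN libraries.
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]])
print(drop_edges(edge_index, p=0.25))
```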
Duration: 1/2 day
Presenters: Manali Sharma and Ayush Garg
Description: This tutorial is designed for practitioners who want AI output to stand up to real-world Web constraints. In a browser-only setting with ChatGPT (free tier), it teaches a clear, repeatable workflow for three common tasks: PII-safe summarization, meta-/FAQ generation for researcher profiles, and moderation of user-generated content (UGC). Coverage includes the core prompting techniques from the deck: zero-shot vs. few-shot, chain-of-thought (hidden reasoning), role prompts, output formatting and constraint setting, multi-turn prompt chaining, proactive clarification, and reverse prompting. Participants begin with baseline prompts, add guardrails, and run a mini-evaluation with a small gold set and red-team probes, emphasizing privacy and consent, logging minimization, accessibility, and publish or no-publish thresholds so outputs are deployable and defensible. To illustrate tool-augmented prompting, the session includes publishing an n8n endpoint and, where available, demonstrating invocation from ChatGPT Developer Mode or MCP to produce grounded cited answers. The use of the deep research mode and the agent mode in ChatGPT will also be explained, including how to frame instructions for source quality, action log, and safety. All participants leave with prompt cards, a scoring rubric, and a concise deployment readiness one-pager documenting metrics, failure modes, and limitations, suitable for research, product, agency, and public-sector contexts.
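To give a flavor of the prompt cards and red-team probes, here is a minimal sketch of a role-plus-constraints prompt for PII-safe summarization and a tiny leak check; the wording, rules, and regular expressions are illustrative assumptions, not the tutorial's actual materials.

```python
import re

# Prompt card: role, task, constraints, and output format (paste the source text after it).
PROMPT_CARD = """You are a privacy-conscious editor for a public research portal.
Summarize the text I provide in exactly 3 bullet points.
Rules:
- Replace any person's email address or phone number with [REDACTED].
- Do not add facts that are not in the text.
Output format: three lines, each starting with '- '."""

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def pii_leak_check(output: str) -> bool:
    """Red-team probe for the mini-evaluation: flag outputs that still contain
    an email address or phone number after summarization."""
    return bool(EMAIL.search(output) or PHONE.search(output))

print(pii_leak_check("- Contact the author at jane.doe@example.org"))  # True: fails the gate
```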
Duration: 1/2 day
Presenters: Ricardo Baeza-Yates, Elizabeth Churchill, and Nicoletta Tantalaki
Description: This tutorial aims to illuminate the ties between human heuristics and AI harms in today's mixed human-AI decision making. A bowtie model and an AI sociotechnical harms taxonomy will be used to explore the role of cognitive science principles as causes of AI harms and their consequences, particularly in the context of the Web. Attendees will be encouraged to recognize risk events that challenge responsible human and AI preferences and will gain a deeper understanding of emerging practices through real-world examples of varying risk levels, such as in the justice, job hiring, financial lending, and health decision-making domains.
Duration: 1/2 day