ScholarAssist: Agentic Data Management for Scholarly Corpora

An agentic system for multi-step reasoning, synthesis, and knowledge generation over large-scale scholarly data.

Overview

ScholarAssist is an agentic data management system that manages large-scale scholarly corpora and performs multi-step reasoning over them. It helps users answer open-ended, complex scholarly queries, going beyond simple retrieval to deep understanding, synthesis, and knowledge generation.

Which Queries Do We Support?

Our system supports scholarly analysis through three progressively more challenging query types. The first tier is paper retrieval, which gathers the relevant works. Building on this, the second tier performs information extraction and synthesis, ranging from single-paper understanding to multi-paper comparison. The highest tier is knowledge discovery and generation: identifying milestone papers, trends, and research opportunities.

Figure 1: The three tiers of queries supported by ScholarAssist
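
As a rough illustration of the tiering described above, the three query types can be modelled as an ordered enumeration. This is a minimal sketch, not ScholarAssist's actual data model: the names `QueryTier` and `prerequisite_tiers` are hypothetical, and the sketch assumes that each tier builds on all tiers below it.

```python
from enum import IntEnum
from typing import List


class QueryTier(IntEnum):
    """Hypothetical labels for the three tiers; larger values mean harder queries."""
    PAPER_RETRIEVAL = 1           # gather relevant works
    EXTRACTION_AND_SYNTHESIS = 2  # single-paper understanding up to multi-paper comparison
    KNOWLEDGE_DISCOVERY = 3       # milestone papers, trends, research opportunities


def prerequisite_tiers(tier: QueryTier) -> List[QueryTier]:
    """Return the lower tiers that a query at `tier` is assumed to build on, easiest first."""
    return [t for t in QueryTier if t < tier]
```

Under this assumption, a knowledge-discovery query would first need retrieval and then extraction and synthesis, while a retrieval query has no prerequisites.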

System Architecture

To address these queries, we build an agentic scholarly data management system that autonomously interprets user intents, plans complex analytical pipelines, and executes them end-to-end within the rich semantic context of a scholarly ecosystem.

Figure 2: The architecture of ScholarAssist, featuring Knowledge Representation, Hybrid Planning, and Unified Execution layers.
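
The interpret, plan, and execute flow described in this section can be sketched as a toy pipeline. Everything here is an illustrative assumption rather than the system's real interface: the function names, the keyword-based intent detection, and the operator names `retrieve`, `synthesize`, and `discover` are all hypothetical, and a real system would presumably rely on a language model and a much richer operator set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    operator: str  # hypothetical operator name
    argument: str  # input passed to the operator


def interpret(query: str) -> str:
    """Toy intent detection by keyword (stand-in for real intent interpretation)."""
    q = query.lower()
    if "trend" in q or "milestone" in q:
        return "discover"
    if "compare" in q or "summarize" in q:
        return "synthesize"
    return "retrieve"


def plan(intent: str, query: str) -> List[Step]:
    """Build a linear pipeline; higher-level intents reuse the lower-level steps."""
    steps = [Step("retrieve", query)]
    if intent in ("synthesize", "discover"):
        steps.append(Step("synthesize", query))
    if intent == "discover":
        steps.append(Step("discover", query))
    return steps


def execute(steps: List[Step], operators: Dict[str, Callable[[str], str]]) -> List[str]:
    """Run each step in order and collect the operator outputs."""
    return [operators[step.operator](step.argument) for step in steps]


# Stub operators that only label their input; real operators would touch the corpus.
STUB_OPERATORS: Dict[str, Callable[[str], str]] = {
    name: (lambda arg, name=name: f"{name}({arg})")
    for name in ("retrieve", "synthesize", "discover")
}
```

For example, `execute(plan(interpret(q), q), STUB_OPERATORS)` runs only the retrieval stub for a plain lookup query, and all three stubs for a query that asks about trends.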