ScholarAssist: Agentic Data Management for Scholarly Corpora
An agentic system for multi-step reasoning, synthesis, and knowledge generation over large-scale scholarly data.
Overview
ScholarAssist is an agentic system that manages large-scale scholarly corpora and performs multi-step reasoning over them. It helps users address open-ended, complex scholarly queries, going beyond simple retrieval to deep understanding, synthesis, and knowledge generation.
Which Queries Do We Support?
Our system supports scholarly analysis through three progressively more challenging query types. The first is paper retrieval, which gathers relevant works. Building on this, the second is information extraction and synthesis, ranging from single-paper understanding to multi-paper comparison and synthesis. At the highest level, the third is knowledge discovery and generation: identifying milestone papers, trends, and research opportunities.
Figure 1: The three tiers of queries supported by ScholarAssist.
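As a rough illustration of how the three tiers stack, the sketch below models them as an ordered enumeration. The names `QueryTier` and `required_tiers` are hypothetical and chosen for this example only; they are not part of ScholarAssist's actual interface.

```python
from enum import IntEnum


class QueryTier(IntEnum):
    """The three query tiers, ordered from least to most demanding."""
    RETRIEVAL = 1             # gather relevant papers
    EXTRACTION_SYNTHESIS = 2  # single-paper understanding up to multi-paper synthesis
    DISCOVERY = 3             # milestone papers, trends, research opportunities


def required_tiers(target: QueryTier) -> list[QueryTier]:
    """Return every tier up to and including the target.

    Assumption for this sketch: answering a query at one tier relies on
    the capabilities of all tiers below it.
    """
    return [tier for tier in QueryTier if tier <= target]
```

Under this assumption, a knowledge-discovery query would draw on all three tiers, while a plain retrieval query would need only the first.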
System Architecture
To address these queries, we build an agentic scholarly data management system that autonomously interprets user intents, plans complex analytical pipelines, and executes them end-to-end within the rich semantic context of a scholarly ecosystem.
Figure 2: The architecture of ScholarAssist, featuring Knowledge Representation, Hybrid Planning, and Unified Execution layers.
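The interpret, plan, and execute flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not ScholarAssist's implementation: the corpus records, the keyword-based `interpret` function, and the `Step`, `plan`, `execute`, and `answer` names are all hypothetical, and the sketch does not model the internals of the Knowledge Representation, Hybrid Planning, or Unified Execution layers.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy corpus: illustrative records only, not real papers.
CORPUS = [
    {"title": "Paper A", "year": 2019, "topic": "indexing"},
    {"title": "Paper B", "year": 2021, "topic": "indexing"},
    {"title": "Paper C", "year": 2021, "topic": "planning"},
]


@dataclass
class Step:
    """One operator in an analytical pipeline (hypothetical structure)."""
    name: str
    run: Callable[[list], list]


def interpret(query: str) -> dict:
    """Stand-in for intent interpretation: pick the first known topic in the query."""
    topics = sorted({paper["topic"] for paper in CORPUS})
    topic: Optional[str] = next((t for t in topics if t in query.lower()), None)
    return {"topic": topic}


def plan(intent: dict) -> list[Step]:
    """Stand-in for planning: retrieve papers on the topic, then order them by year."""
    topic = intent["topic"]
    return [
        Step("retrieve", lambda papers: [p for p in papers if p["topic"] == topic]),
        Step("sort_by_year", lambda papers: sorted(papers, key=lambda p: p["year"])),
    ]


def execute(steps: list[Step], data: list) -> list:
    """Run each step in order, feeding its output to the next step."""
    for step in steps:
        data = step.run(data)
    return data


def answer(query: str) -> list:
    """End-to-end: interpret the query, plan a pipeline, and execute it."""
    return execute(plan(interpret(query)), CORPUS)
```

For example, `answer("Show work on indexing")` would return the two toy records tagged `indexing`, ordered by year. A real system would replace the keyword match with a proper intent model and the fixed two-step plan with a planner that composes operators per query.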