Transforming Statistical Data Research with StatGPT
Unlock the potential of Natural Language Processing to enhance access to official statistics.
90%
Quality rate of automated data search
2x
Increase in self-service research by business users
4x
Faster access to cross-dataset statistical information
Challenge
Statistical agencies often struggle to deliver quality data to consumers and internal researchers. The data can be difficult to access, requiring complex queries spanning several datasets to retrieve what researchers need.
Off-the-shelf AI solutions can answer some questions about these statistics, but they often hallucinate or fail to reference the source data. By publishing statistical data in SDMX, a unified, AI-ready format, entities like the International Monetary Fund (IMF) have been able to standardize how AI tools access their data.
Custom-built agentic AI tools optimized for the domain, like StatGPT, democratize access to this data.
Industry
Economy
Dev team size
10 developers
Duration
2 years
Solution
StatGPT is a multi-stage solution that begins with the ingestion of data sources and their descriptive metadata, forming a semantic layer powered by the QuantHub ecosystem.
Once the data is onboarded, users can interact with an agentic chatbot capable of retrieving statistical data by processing natural language queries. The StatGPT portal extends the user exploration journey by providing a convenient interface for reviewing charts and advanced data query editing.
StatGPT operates as an intelligent orchestration framework that recognizes user intent and then forms and activates the corresponding agentic chains, which include a domain-specific knowledge-base agent, a query-constructing agent, a dataset-exploring agent, a general information agent, and several others. Importantly, before serving any knowledge or data request, StatGPT applies a guardrail mechanism to ensure that all user interactions comply with organizational guidelines.
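The guardrail-then-route flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the agent functions, the keyword-based `recognize_intent` (a stand-in for whatever model-based intent recognition StatGPT actually uses), and the `BLOCKED_PHRASES` list are hypothetical and do not reflect the real StatGPT or DIAL interfaces.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the agents named in the text.
def knowledge_base_agent(query: str) -> str:
    return f"[knowledge-base] {query}"

def query_builder_agent(query: str) -> str:
    return f"[query-builder] {query}"

def dataset_explorer_agent(query: str) -> str:
    return f"[dataset-explorer] {query}"

def general_agent(query: str) -> str:
    return f"[general] {query}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "knowledge_base": knowledge_base_agent,
    "data_query": query_builder_agent,
    "dataset_exploration": dataset_explorer_agent,
    "general": general_agent,
}

# Hypothetical guardrail rule: a fixed list of disallowed phrases.
BLOCKED_PHRASES = ("ignore previous instructions",)

def passes_guardrail(query: str) -> bool:
    """Return True when the query contains no blocked phrase."""
    lowered = query.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def recognize_intent(query: str) -> str:
    """Keyword heuristic standing in for model-based intent recognition."""
    lowered = query.lower()
    if "which datasets" in lowered or "list datasets" in lowered:
        return "dataset_exploration"
    if any(word in lowered for word in ("gdp", "inflation", "unemployment")):
        return "data_query"
    if lowered.startswith(("what is", "define")):
        return "knowledge_base"
    return "general"

def handle(query: str) -> str:
    """Apply the guardrail first, then dispatch to the matching agent."""
    if not passes_guardrail(query):
        return "Request declined by guardrail."
    return AGENTS[recognize_intent(query)](query)
```

In this sketch the guardrail runs before any agent is selected, mirroring the order described in the text; a production system would likely chain several agents per request rather than dispatch to a single one.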
The main purpose of the application is to help researchers access the data they need from multiple datasets by transforming natural language requests into specific SDMX queries that retrieve and visualize data, thereby overcoming the limitations of traditional faceted-search data explorers.
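To make the final step concrete, the sketch below assembles an SDMX REST data query URL from already-extracted parameters. It assumes the SDMX 2.1 REST path layout (`/data/{flowRef}/{key}` with `startPeriod` and `endPeriod` query parameters); the base URL, dataflow ID, and dimension codes are placeholders, and `build_sdmx_data_url` is a hypothetical helper, not part of StatGPT.

```python
from typing import List, Optional
from urllib.parse import urlencode

def build_sdmx_data_url(
    base_url: str,
    flow_ref: str,
    key_parts: List[List[str]],
    start_period: Optional[str] = None,
    end_period: Optional[str] = None,
) -> str:
    """Build an SDMX REST data query URL.

    key_parts holds one list of codes per dimension, in dataflow order.
    Codes within a dimension are joined with '+' (OR); dimensions are
    joined with '.'; an empty list leaves the dimension as a wildcard.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url

# Placeholder values: annual frequency, two countries, one indicator.
example_url = build_sdmx_data_url(
    "https://example.org/sdmx/rest",
    "EXAMPLE_FLOW",
    [["A"], ["FR", "DE"], ["GDP"]],
    start_period="2015",
    end_period="2020",
)
```

A request such as "annual GDP for France and Germany from 2015 to 2020" would, under these assumptions, map to the key `A.FR+DE.GDP`; the actual dimension order and codes depend on the data structure definition of the dataflow being queried.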
Multilingual Natural Language Interface
Researchers and non-technical users can retrieve statistical data in a conversational manner. In addition to English, StatGPT is customizable to support other languages.
Custom Interface for Advanced Query Editing
A hybrid workflow enables advanced users to further expand and refine data queries and dynamically review data charts to enhance the AI data search.
Human in the Loop Validation
The agentic system continuously requests feedback and asks the user for clarification to ensure that requests are interpreted correctly, fill information gaps, and increase user satisfaction.
Results
Data retrieval without hallucinations
Chatbot answers are always grounded in the data and can be validated by a manual faceted search.
Collaborative ecosystem
StatGPT enables users to share their research conversations with colleagues.
Index of certified AI-ready statistics
StatGPT is the main engine behind the Global Trusted Data Commons initiative, which aims to build a single source of official statistics worldwide and democratize access to data for the general public.

Our Interface
StatGPT UI

Used Components
DIAL Ecosystem
Set of tools, agents and applications for building AI-powered business solutions
DIAL Core
Primary system component and integration center