About the Knowledge Fabric
The RMBL Knowledge Fabric is a unified search and discovery platform for environmental research at the Rocky Mountain Biological Laboratory in Gothic, Colorado. It connects scientific publications, community documents, research datasets, news stories, and a knowledge graph of species, concepts, protocols, and places studied at one of the longest-running field biology stations in North America.
Frequently Asked Questions
What is the RMBL Knowledge Fabric?
The Knowledge Fabric is a search and discovery tool that brings together the scientific output of RMBL and the Gunnison Basin into one searchable platform. It includes peer-reviewed publications dating back to 1928, community and policy documents from the Sustainable Living Library, and research datasets from multiple repositories. A knowledge graph connects these resources through shared species, concepts, research methods, and geographic locations.
Who is this for?
The Knowledge Fabric is designed for researchers, students, land managers, community members, and policymakers interested in the environmental research and stewardship of the Gunnison Basin. It is equally useful for scientists looking for related work and for community members exploring how research connects to local policy issues.
What are Knowledge Neighborhoods?
Knowledge Neighborhoods are research communities detected automatically by analyzing the connections in the knowledge graph. Using a community-detection algorithm (Louvain), the system identifies clusters of tightly connected authors, publications, species, concepts, and places. Each neighborhood represents a distinct research theme — from marmot behavioral ecology to watershed biogeochemistry to federal land management policy. Many neighborhoods include AI-generated research primers that summarize the key findings and cite specific publications.
How do I use the API or MCP server?
The Knowledge Fabric provides a REST API at /api/v1/ with endpoints for search, publication detail, entity lookup, related works, and more. Add ?format=text to any endpoint for LLM-friendly plain text. For AI assistants like Claude Desktop, an MCP server is available — see the MCP documentation for setup instructions. See /llms.txt for a machine-readable index of available endpoints.
How can I help improve the data?
Every detail page has a “Report an issue” link below the title. Click it to flag a record that has incorrect data, is a duplicate, is missing information, or has other problems. You can describe what’s wrong and suggest corrections — no account needed.
Flags are reviewed by RMBL administrators through the Payload CMS admin panel. You can optionally include your email address if you’d like to be notified when the issue is resolved.
For technical issues with the site itself (bugs, broken features), please submit an issue on the GitHub repository.
AI Integration
The Knowledge Fabric can be queried by AI assistants via the REST API or the MCP (Model Context Protocol) server. This allows tools like Claude Desktop, ChatGPT, and custom scripts to search publications, explore research neighborhoods, and access the knowledge graph programmatically.
REST API
All API endpoints are at /api/v1/ and support ?format=text for LLM-friendly plain text. See /llms.txt for a complete list. Examples:
```shell
# Search for publications about alpine pollination
curl "https://rmblknowledgefabric.org/api/v1/search?q=alpine+pollination&format=text"

# Get publication details
curl "https://rmblknowledgefabric.org/api/v1/publications/13?format=text"

# Explore a research neighborhood with primer
curl "https://rmblknowledgefabric.org/api/v1/neighborhoods/620?format=text"

# Look up a species
curl "https://rmblknowledgefabric.org/api/v1/entities/species/8426?format=text"

# Find related works
curl "https://rmblknowledgefabric.org/api/v1/related/publications/13?format=text"
```
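The same endpoints can be called from scripts. The TypeScript sketch below builds request URLs with the ?format=text flag; `buildApiUrl` is an illustrative helper, not part of an official client library.

```typescript
// Build a Knowledge Fabric API URL with LLM-friendly text output.
// Endpoint paths mirror the curl examples above; buildApiUrl itself
// is an illustrative helper, not an official client.
const API_BASE = "https://rmblknowledgefabric.org/api/v1";

function buildApiUrl(
  endpoint: string,
  params: Record<string, string> = {},
  asText = true
): string {
  const url = new URL(`${API_BASE}/${endpoint}`);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  if (asText) url.searchParams.set("format", "text");
  return url.toString();
}

// Example: full-text search for alpine pollination papers.
const searchUrl = buildApiUrl("search", { q: "alpine pollination" });
// → https://rmblknowledgefabric.org/api/v1/search?q=alpine+pollination&format=text
```

The returned string can be passed straight to `fetch` or any HTTP client.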
MCP Server for Claude Desktop (recommended)
The easiest way to connect: add the Knowledge Fabric as a Custom Connector in Claude Desktop. No installation required — just a URL.
Option A: Remote connector (no install):
- Open Claude Desktop → Settings → Connectors
- Click Add custom connector
- Enter the URL: https://www.rmblknowledgefabric.org/api/mcp
- All 8 Knowledge Fabric tools are immediately available
Option B: Local server (for development):
```shell
git clone https://github.com/ikb-rmbl/RMBL_knowledge_fabric.git
cd RMBL_knowledge_fabric/mcp
npm install && npm run build
```
Then add to Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
```json
{
  "mcpServers": {
    "rmbl-knowledge-fabric": {
      "command": "node",
      "args": ["/path/to/RMBL_knowledge_fabric/mcp/dist/index.js"],
      "env": {
        "RMBL_API_URL": "https://www.rmblknowledgefabric.org"
      }
    }
  }
}
```

Try asking:
- “Search for publications about marmot hibernation at RMBL”
- “What is research neighborhood 620 about?”
- “Find works related to publication 13”
- “Look up the species Marmota flaviventer”
Note: The MCP server currently supports Claude Desktop and other clients that use the Streamable HTTP transport. OpenAI/ChatGPT requires the older SSE transport with long-lived connections, which is not compatible with our serverless hosting. We plan to add OpenAI support when they adopt the Streamable HTTP standard. In the meantime, ChatGPT users can access the same data via the REST API with ?format=text.
Available MCP Tools
| Tool | Description |
|---|---|
| search_rmbl | Full-text search across all collections |
| get_publication | Publication detail with authors, abstract, entities, citations |
| get_dataset | Dataset detail with creators and entities |
| get_document | Document detail with entities and stakeholders |
| get_entity | Entity lookup (species, concept, protocol, place, stakeholder) |
| find_related | Related works via semantic similarity, shared entities, co-authorship, citations |
| explore_neighborhood | Research neighborhood detail with primer |
| list_neighborhoods | Browse or search 154 research neighborhoods |
Technical Deep-Dive
The sections below describe how data flows into the Knowledge Fabric and how the knowledge graph is constructed.
Data Sources
Publications are sourced from the RMBL publications database, with additional discovery via OpenAlex and CrossRef. Each record is enriched with metadata from CrossRef (authors, DOIs, abstracts, citation counts) and Unpaywall (open access links). Full text is extracted from PDFs using pdftotext with OCR fallback via Tesseract.
Datasets are discovered from eight repository sources: EDI, DataONE, Dryad, Zenodo, USGS ScienceBase, Pangaea, NCBI, and Figshare. Each dataset is enriched with EML/DataCite metadata including temporal and spatial coverage, creator information, and licensing.
Documents come from the Sustainable Living Library, a collection of community and policy documents relevant to the Gunnison Basin. These include management plans, environmental impact statements, water quality reports, and local planning documents.
Stories are news articles about RMBL and the Gunnison Basin from local newspapers (Crested Butte News, Gunnison Country Times) and national/international outlets via LexisNexis. Full text is stored for search indexing and entity extraction but is not displayed on detail pages to respect copyright. Each story links to its original source when available.
Author Deduplication
Authors are deduplicated across all collections using a two-phase process. First, authors with matching ORCID identifiers are merged. Then, authors sharing the same family name are compared by given name initials, with checks to prevent false merges when middle initials differ (e.g., “R. J. Smith” is kept separate from “R. A. Smith”). Author ordering on publications is repaired from CrossRef metadata to ensure correct first-author attribution.
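As a sketch, the two-phase matching rule might look like this in TypeScript; the field names and matching details are illustrative, not the production code:

```typescript
interface Author {
  family: string;
  given: string;   // e.g. "R. J." or "Rebecca"
  orcid?: string;
}

// Reduce a given name to its initials: "Rebecca J." -> ["R", "J"]
function initials(given: string): string[] {
  return given
    .split(/[\s.]+/)
    .filter(Boolean)
    .map((part) => part[0].toUpperCase());
}

// Phase 1: matching ORCID identifiers always merge.
// Phase 2: same family name plus compatible initials, but a
// conflicting initial blocks the merge ("R. J. Smith" vs "R. A. Smith").
function sameAuthor(a: Author, b: Author): boolean {
  if (a.orcid && b.orcid) return a.orcid === b.orcid;
  if (a.family.toLowerCase() !== b.family.toLowerCase()) return false;
  const ia = initials(a.given);
  const ib = initials(b.given);
  const shared = Math.min(ia.length, ib.length);
  for (let i = 0; i < shared; i++) {
    if (ia[i] !== ib[i]) return false; // conflicting initials: keep separate
  }
  return shared > 0; // at least the first initial must match
}
```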
Entity Extraction & Knowledge Graph
Entities (species, concepts, protocols, places, and stakeholders) are extracted from publication and document full text using Claude vision models (VLM extraction). Each entity mention is linked to its source item with a confidence score and extraction method. Entities are then deduplicated using embedding-based clustering (Voyage AI voyage-4, 1024 dimensions) with type-specific similarity thresholds.
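The embedding-based deduplication can be sketched as a greedy threshold clustering over cosine similarity. The per-type threshold values below are placeholders, not the tuned production settings:

```typescript
type EntityType = "species" | "concept" | "protocol" | "place" | "stakeholder";

// Hypothetical per-type thresholds; the production values are tuned
// separately and not published here.
const THRESHOLDS: Record<EntityType, number> = {
  species: 0.95,
  concept: 0.9,
  protocol: 0.9,
  place: 0.92,
  stakeholder: 0.93,
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy single-pass clustering: each entity joins the first cluster
// whose representative embedding clears the type's threshold.
function clusterEntities(embeddings: number[][], type: EntityType): number[] {
  const reps: number[] = [];   // index of each cluster's representative
  const labels: number[] = [];
  for (let i = 0; i < embeddings.length; i++) {
    let assigned = -1;
    for (let c = 0; c < reps.length; c++) {
      if (cosine(embeddings[i], embeddings[reps[c]]) >= THRESHOLDS[type]) {
        assigned = c;
        break;
      }
    }
    if (assigned === -1) {
      reps.push(i);
      assigned = reps.length - 1;
    }
    labels.push(assigned);
  }
  return labels;
}
```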
Species names are validated against the ITIS (Integrated Taxonomic Information System) database. Places are enriched with coordinates from GNIS (Geographic Names Information System) and organized into a parent-child hierarchy.
The resulting knowledge graph has 135,205 entity mentions linking items to entities, plus 143,289 citation references with internal cross-links between publications.
Community Detection & Primers
Knowledge Neighborhoods are detected using the Louvain community detection algorithm on the unified knowledge graph. The graph includes all entities and items as nodes, with edges from co-occurrence in publications, co-authorship, and citations. Edge weights are boosted for structural relationships (co-authorship ×5, citations ×3) to ensure that social and citation structure drives community boundaries rather than just shared terminology.
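The edge-weight boosting can be sketched as follows. Only the boost factors (co-authorship ×5, citation ×3) come from the description above; the `RawEdge` shape and `buildWeightedEdges` helper are illustrative:

```typescript
type EdgeKind = "cooccurrence" | "coauthorship" | "citation";

// Structural relationships are boosted so social and citation
// structure, not just shared terminology, drives community boundaries.
const BOOST: Record<EdgeKind, number> = {
  cooccurrence: 1,
  coauthorship: 5,
  citation: 3,
};

interface RawEdge { a: string; b: string; kind: EdgeKind; count: number; }

// Collapse multi-edges into one weighted edge per node pair; the
// weighted graph is then fed to Louvain community detection.
function buildWeightedEdges(raw: RawEdge[]): Map<string, number> {
  const weights = new Map<string, number>();
  for (const e of raw) {
    const key = [e.a, e.b].sort().join("|"); // undirected edge key
    const w = e.count * BOOST[e.kind];
    weights.set(key, (weights.get(key) ?? 0) + w);
  }
  return weights;
}
```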
Research primers are generated for the largest neighborhoods using Claude (Opus model) with tiered context assembly: landmark papers (full abstracts + key findings), frontier papers (2020+), breadth papers (single best finding each), and entity context (species, concepts, methods, places). Each primer includes parenthetical citations linked to specific publications in the Knowledge Fabric. Policy-focused neighborhoods receive primers with document citations instead.
Search & Similarity
Full-text search uses PostgreSQL tsvector with weighted ranking (title > abstract > full text) and stemmed query matching. Search results include highlighted snippets via ts_headline.
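A toy in-memory analogue of that weighted ranking is sketched below. The 1.0/0.4/0.2 weights mirror PostgreSQL's default A/B/C tsvector weights; real tsvector ranking also stems and normalizes, which this sketch skips:

```typescript
// Toy analogue of weighted tsvector ranking: a term hit in the title
// counts more than one in the abstract, which counts more than one in
// the full text. Weights follow PostgreSQL's defaults (A=1.0, B=0.4,
// C=0.2); stemming is omitted for brevity.
interface Doc { title: string; abstract: string; fullText: string; }

function hits(text: string, term: string): number {
  return text.toLowerCase().split(/\W+/).filter((w) => w === term).length;
}

function rank(doc: Doc, term: string): number {
  const t = term.toLowerCase();
  return 1.0 * hits(doc.title, t)
       + 0.4 * hits(doc.abstract, t)
       + 0.2 * hits(doc.fullText, t);
}
```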
Related works are found using four signals: semantic similarity (pgvector cosine distance on Voyage AI embeddings), shared entity mentions (at least 3 shared entities), co-authorship (shared authors across publications), and citation links (from the references_cited table). Signals are merged with a multi-signal bonus for items connected by multiple pathways.
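The multi-signal merge can be sketched like this. The signal names follow the text, while the scores and bonus factor are illustrative:

```typescript
type Signal = "semantic" | "entities" | "coauthor" | "citation";

interface Hit { id: number; signal: Signal; score: number; }

// Sum per-item scores across signals, then apply a bonus for items
// reached through multiple pathways. The 0.25-per-extra-signal bonus
// is a placeholder, not the production value.
function mergeRelated(hits: Hit[], bonusPerExtraSignal = 0.25): Map<number, number> {
  const scores = new Map<number, number>();
  const signals = new Map<number, Set<Signal>>();
  for (const h of hits) {
    scores.set(h.id, (scores.get(h.id) ?? 0) + h.score);
    let set = signals.get(h.id);
    if (!set) {
      set = new Set();
      signals.set(h.id, set);
    }
    set.add(h.signal);
  }
  for (const [id, sigs] of signals) {
    const base = scores.get(id) ?? 0;
    scores.set(id, base * (1 + bonusPerExtraSignal * (sigs.size - 1)));
  }
  return scores;
}
```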
Technology Stack
The Knowledge Fabric is built with Next.js and Payload CMS on PostgreSQL with pgvector. Graph visualizations use Sigma.js (WebGL). The data pipeline is a set of TypeScript scripts for scraping, enrichment, entity extraction, and graph construction. Vector embeddings are generated by Voyage AI (voyage-4, 1024 dimensions). The site is hosted on Vercel with the database on Neon (serverless PostgreSQL).
The project is open source at github.com/ikb-rmbl/RMBL_knowledge_fabric.
Feedback & Contact
The Knowledge Fabric is an evolving platform and we welcome feedback from the community. If you notice missing publications, incorrect data, broken links, or have ideas for new features, there are two ways to get in touch:
- Report an issue on GitHub: github.com/ikb-rmbl/RMBL_knowledge_fabric/issues — best for bug reports, data corrections, and feature requests.
- Contact the developer: Ian Breckheimer — ikb@rmbl.org
Acknowledgments
The RMBL Knowledge Fabric was developed with support from the Clark Family Foundation. Built by RMBL using data from CrossRef, OpenAlex, Unpaywall, ITIS, GNIS, and multiple data repositories.