A personal academic paper manager. Upload PDFs, ingest papers from URLs, chat with papers using AI, explore a knowledge graph of authors and topics, and track references – all in a local web app backed by Neo4j and Google Drive.
📖 Full documentation: https://niklasabraham.github.io/PaperManager/
Browse the docs by section:
| Document | Description |
|---|---|
| Getting Started | Installation, configuration, first run |
| Ingesting Papers | PDF upload, URL/DOI ingest, bulk import |
| Library | Browsing, searching, filtering, sorting |
| Paper Detail | Metadata, PDF viewer, figures, notes, chat, references |
| Knowledge Features | Graph, knowledge chat, Cypher editor |
| MCP Server | Claude Desktop integration |
| Document | Description |
|---|---|
| Architecture | System design with module interaction diagrams |
| Backend | FastAPI routers, services, DB queries |
| Frontend | React pages and components |
| Data Model | Neo4j graph schema |
| AI Pipelines | Metadata extraction, summarisation, chat |
| API Reference | All REST endpoints |
| Decisions Log | Architecture decision record |
# 1. Clone and enter
git clone <repo> && cd PaperManager
# 2. Create conda env
conda create -n papermanager python=3.11 -y
conda activate papermanager
pip install -r backend/requirements.txt
# 3. Install frontend
cd frontend && npm install && cd ..
# 4. Copy and fill in your .env
cp .env.example .env
# Edit .env – see Configuration section
# 5. Start everything
./start.sh
# Opens http://localhost:5173
`start.sh` starts the FastAPI backend (port 8000), the Vite frontend (port 5173), and optionally Ollama. Logs go to `/tmp/papermanager-backend.log` and `/tmp/papermanager-frontend.log`.
┌──────────────────────┐   HTTP / SSE    ┌─────────────────────────┐
│    React Frontend    │ ──────────────► │     FastAPI Backend     │
│  (Vite, port 5173)   │                 │  (uvicorn, port 8000)   │
└──────────────────────┘                 └────────────┬────────────┘
                                                      │
                      ┌───────────────────────────────┼──────────────────────┐
                      │                               │                      │
              ┌───────▼───────┐              ┌────────▼────────┐     ┌───────▼────────┐
              │  Neo4j Aura   │              │  Google Drive   │     │  Anthropic /   │
              │  (graph DB)   │              │  (PDF storage)  │     │  Ollama (AI)   │
              └───────────────┘              └─────────────────┘     └────────────────┘
Backend: Python 3.11, FastAPI, Neo4j driver, httpx, Anthropic SDK, Ollama SDK, Google Drive API, Docling (PDF parsing).
Frontend: React + TypeScript, Vite, Tailwind CSS, react-force-graph (WebGL graph), react-dropzone.
Database: Neo4j Aura (cloud) – stores paper metadata, authors, topics, tags, projects, notes, and the full citation graph. PDFs and figures are stored in Google Drive; only the Drive file ID is kept in Neo4j.
Create a .env file at the project root:
# ── Neo4j ──────────────────────────────────────────────────────
NEO4J_URI=neo4j+s://xxxx.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
# ── Google Drive ───────────────────────────────────────────────
# OAuth desktop app credentials from Google Cloud Console
GOOGLE_CLIENT_ID=xxxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=xxxx
GOOGLE_DRIVE_FOLDER_ID=xxxx # Folder where PDFs will be stored
# ── Anthropic ──────────────────────────────────────────────────
ANTHROPIC_API_KEY=sk-ant-xxxx
# Anthropic Foundry enterprise gateway (optional)
ANTHROPIC_WORK_API_KEY=xxxx
ANTHROPIC_WORK_BASE_URL=https://your-foundry-gateway.com/...
# ── App ────────────────────────────────────────────────────────
BACKEND_PORT=8000
FRONTEND_URL=http://localhost:5173
# ── Ollama ─────────────────────────────────────────────────────
OLLAMA_MODEL=llama3.2:3b # Local model for metadata extraction / tag suggestion
# ── Corporate network (optional) ───────────────────────────────
SSL_VERIFY=true
SSL_CA_BUNDLE=/path/to/corporate-ca.pem
Google Drive auth: On first use the backend opens a browser window for OAuth. Credentials are saved to backend/token.json and reused on subsequent runs.
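Once the `.env` is in place, a quick way to confirm the backend picked it up is the health endpoint (see the API reference below). A minimal sketch using `httpx`, which is already part of the backend stack, assuming the default `BACKEND_PORT=8000`:

```python
# Sanity-check the backend after configuration.
# Assumes the default BACKEND_PORT=8000 from .env.
import httpx

resp = httpx.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print(resp.json())  # payload shape depends on the backend; any 200 means it's up
```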
Drag a PDF onto the Library page or click the + button.
What happens automatically:
- Metadata extraction – DOI/arXiv lookup (Semantic Scholar → CrossRef), falling back to a local LLM (llama3.2:3b) on the first 3,000 chars
- Abstract extraction – ABSTRACT_RE regex → Claude Haiku if the regex fails
- Confirmation modal – review the extracted metadata before committing. You can override any field. A duplicate check runs against existing papers.
Authors are created as Person nodes, and the paper is auto-tagged pdf-upload.

Upload modal options:
| Option | Default | Description |
|---|---|---|
| Source step | on | Record how you found the paper (person, LinkedIn, Twitter, conference, etc.) |
| Summary prompt step | on | Edit the AI summary instructions before upload |
| Auto-save references | off | Skip reference review and save all automatically |
| Tags step | on | Review AI-suggested tags before saving |
| Summary instructions | built-in | Custom Claude prompt prepended to summarisation |
Click the + button → URL / DOI tab. Paste any of:
| Input format | Example |
|---|---|
| arXiv URL | https://arxiv.org/abs/1706.03762 |
| arXiv ID | 1706.03762 or arXiv:1706.03762 |
| DOI URL | https://doi.org/10.1038/nature14539 |
| Bare DOI | 10.1038/nature14539 |
| PubMed URL | https://pubmed.ncbi.nlm.nih.gov/12345678/ |
| bioRxiv URL | https://www.biorxiv.org/content/10.1101/... |
| medRxiv URL | https://www.medrxiv.org/content/10.1101/... |
Metadata is fetched from the source API (arXiv Atom, Semantic Scholar, CrossRef, PubMed eUtils, bioRxiv). If a real DOI is found, Semantic Scholar is queried for richer data, including citation count and affiliations. The paper is auto-tagged from-url. No PDF is stored.
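For scripted ingestion, the same flow can be driven through the REST API. A hedged sketch – the request-body field name (`url`) is an assumption, mirroring the bulk-import entry format shown below:

```python
# Hypothetical direct call to POST /papers/from-url.
# The body shape ({"url": ...}) is an assumption based on the bulk-import format.
import httpx

resp = httpx.post(
    "http://localhost:8000/papers/from-url",
    json={"url": "https://arxiv.org/abs/1706.03762"},
    timeout=60,  # metadata lookups call external APIs
)
resp.raise_for_status()
print(resp.json())  # the created paper record
```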
Go to Bulk Import in the nav bar. Upload or paste a JSON file:
{
"fetch_pdf": true,
"project_id": "optional-project-uuid",
"papers": [
{"url": "https://arxiv.org/abs/1706.03762"},
{"arxiv": "1810.04805"},
{"doi": "10.1038/nature14539"},
{"url": "https://pubmed.ncbi.nlm.nih.gov/30082513/"},
{"title": "AlphaFold protein structure prediction"},
{"title": "CRISPR-Cas9 genome editing", "fetch_pdf": false}
]
}
Each entry needs at least one of: url, arxiv, doi, title. You can mix formats freely.
Resolution order per entry:
- url → existing URL resolver (arXiv, DOI, PubMed, bioRxiv)
- arxiv → arXiv API
- doi → Semantic Scholar → CrossRef
- title → Semantic Scholar title search → arXiv title search → Ollama-improved arXiv search

PDF fetching (fetch_pdf: true):
- For arXiv papers, the PDF is downloaded from arxiv.org/pdf/{id}

Progress is shown as a live log stream. Papers that already exist by DOI are reported as “skipped”. All imported papers are auto-tagged bulk-import.
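Since `POST /papers/bulk-import` streams its progress, a script can follow the same live log the UI shows. A sketch assuming line-oriented SSE output (the exact event format is not documented here):

```python
# Drive the bulk importer from a script and tail its progress stream.
import httpx

payload = {
    "fetch_pdf": True,
    "papers": [{"arxiv": "1810.04805"}, {"doi": "10.1038/nature14539"}],
}
with httpx.stream(
    "POST", "http://localhost:8000/papers/bulk-import", json=payload, timeout=None
) as resp:
    for line in resp.iter_lines():
        if line:
            print(line)  # per-paper progress, including "skipped" duplicates
```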
The main page shows all papers in your library.
Search & filter:
View options:
Paper cards show:
“🎲 Surprise” button – opens a random paper from your library, great for rediscovering papers.
Dashboard (when no filters active):
Click any paper to open its detail view.
Left panel – shows title, authors, year, DOI, venue, citation count, metadata source, abstract, and AI summary. The Edit button opens a modal to update any field including venue, reading status, and color label.
The top bar includes:
Author and topic chips are clickable – clicking an author opens the People page; clicking a topic filters the library.
The PDF is streamed from Google Drive and rendered inline in the browser. Available when a PDF was uploaded (not for URL-only papers).
Figures are extracted from the PDF and stored on Drive. Each figure shows:
To extract (or re-extract) figures: click Extract Figures and choose a caption method.
A markdown editor attached to each paper. Supports:
- @PersonName – links to a Person node (created if not found)
- #TopicName – links to a Topic node (created if not found)
- @mentions and #topics become graph relationships (Note -[:MENTIONS]-> Person/Topic)

Ask questions about the paper's full text. Three model options:
| Model | When to use |
|---|---|
| Claude (Opus 4.6) | Best quality, uses personal API key |
| Claude Work | Enterprise Anthropic Foundry gateway |
| Ollama (local) | Fully offline, uses llama3.2:3b |
The full raw_text extracted from the PDF is included in context (truncated to model limits).
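Programmatically, the same chat is one POST per question. A sketch – the body field names (`message`, `model`) are assumptions, not the documented schema:

```python
# Hypothetical call to POST /papers/{id}/chat using the local Ollama model.
import httpx

paper_id = "your-paper-uuid"  # hypothetical; take a real id from GET /papers
resp = httpx.post(
    f"http://localhost:8000/papers/{paper_id}/chat",
    json={"message": "What positional encoding does this paper use?",
          "model": "ollama"},  # field names are assumptions
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```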
Outgoing references (papers this paper cites): extracted on-demand, shown in a review list, then saved as CITES relationships.
Incoming citations (papers in your library that cite this paper): automatically maintained as you import papers.
Reference extraction pipeline:
1. Semantic Scholar /references API (requires DOI)
2. Regex on the REFERENCES section of the raw text
3. Claude Haiku on the last 30% of the document (when the first two yield < 3 results)

Each saved reference creates a Paper stub node (title + DOI), tagged from-references, and linked with CITES. If you later import the full paper by DOI, the stub is enriched rather than duplicated.
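The extract-then-save split is visible in the API: extraction is a side-effect-free GET, and only the POST persists CITES edges. A sketch of the round trip (the save-body shape is an assumption):

```python
# Two-step reference flow: extract (no save), optionally filter, then persist.
import httpx

base = "http://localhost:8000"
paper_id = "your-paper-uuid"  # hypothetical

refs = httpx.get(f"{base}/papers/{paper_id}/extract-references", timeout=120).json()
# ... review/filter refs here; this sketch keeps everything ...
httpx.post(
    f"{base}/papers/{paper_id}/references",
    json={"references": refs},  # body shape is an assumption
    timeout=60,
).raise_for_status()
```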
Track authors, collaborators, and colleagues.
- Papers link to people via AUTHORED_BY (author) and INVOLVES (with a custom role: shared_by, working_on, collaborating, supervisor, feedback_needed)
- A SPECIALIZES relationship to Topic nodes is used to discover who works on what

Group papers into named collections. Projects can be linked to each other (RELATED_TO).

Tags are free-form labels. Topics are research areas (more structured, used in the knowledge graph and specialties).
157 tags are seeded on startup across categories:
- Source: pdf-upload, from-url, from-references, bulk-import, from-linkedin, from-twitter, from-email, from-conference, from-newsletter, from-google-scholar, from-colleague
- Status: to-read, reading, read, important, revisit, needs-review, relevant, in-bibliography, reproduced, code-available
- Type: review, benchmark, dataset, method, theory, negative-result, foundational, highly-cited, sota

AI tag suggestion (Ollama): the upload modal offers AI-suggested tags based on title and abstract. You can accept or skip.
Research area nodes in the graph. Topics are linked to papers via ABOUT relationships and to people via SPECIALIZES. Claude Haiku suggests 3–6 topics per paper during upload (title-case, e.g. Protein Structure Prediction). Topics can be renamed – all relationships move to the new name automatically.
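Because renaming moves all relationships, it can be scripted safely. A sketch against `PATCH /topics/{name}` (the body field name is an assumption):

```python
# Hypothetical topic rename; ABOUT/SPECIALIZES edges follow the new name.
from urllib.parse import quote
import httpx

old, new = "Protein Structure Prediction", "Structural Biology"
resp = httpx.patch(
    f"http://localhost:8000/topics/{quote(old)}",
    json={"name": new},  # field name is an assumption
    timeout=30,
)
resp.raise_for_status()
```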
Go to Graph for an interactive WebGL visualization (powered by react-force-graph).
Node types and colours:
| Colour | Node type |
|---|---|
| Purple | Paper |
| Blue | Person |
| Green | Topic |
| Orange | Tag |
| Pink | Project |
| Grey | Note |
Controls:
Graph modes available via API (GET /graph?mode=...):
- full – everything (up to 500 nodes)
- papers – Papers, Persons, Topics only
- paper – single paper with all direct neighbours

Go to Knowledge for graph-aware conversation across your entire library.
Ask questions about multiple papers at once. Use @mentions to bring specific papers into context:
@tag:deep-learning What are the main architectural differences across these papers?
@topic:Protein Folding How has the approach changed from RoseTTAFold to AlphaFold?
@project:my-phd-papers Summarise the key open problems.
@paper:Attention is All You Need What positional encoding does this use?
Without mentions, the 10 most recently added papers are used as context.
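The same conversation is available over the API as an SSE stream. A sketch – the body field name (`message`) and the `data:` framing are assumptions:

```python
# Hypothetical call to POST /knowledge-chat/stream with a tag mention.
import httpx

body = {"message": "@tag:deep-learning What are the main architectural differences?"}
with httpx.stream(
    "POST", "http://localhost:8000/knowledge-chat/stream", json=body, timeout=None
) as resp:
    for line in resp.iter_lines():
        if line.startswith("data:"):             # conventional SSE framing (assumed)
            print(line[5:].strip(), flush=True)  # streamed answer chunks
```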
Features:
Go to Cypher for direct access to the Neo4j database.
Schema browser: live view of all node labels, relationship types, and property keys.
Query editor: write and run raw Cypher. Results shown in a table with mutation counters (nodes created/deleted, relationships created/deleted, properties set). Maximum 500 rows returned.
AI assist: describe what you want in plain English – Ollama generates the Cypher query.
Example queries:
// Papers citing a specific paper
MATCH (a:Paper)-[:CITES]->(b:Paper {title: "..."})
RETURN a.title, a.year

// Most connected authors
MATCH (p:Person)<-[:AUTHORED_BY]-(paper:Paper)
RETURN p.name, count(paper) AS papers ORDER BY papers DESC LIMIT 10

// Papers without summaries
MATCH (p:Paper) WHERE p.summary IS NULL RETURN p.title, p.year

// All papers on a topic
MATCH (p:Paper)-[:ABOUT]->(t:Topic {name: "Transformers"})
RETURN p.title, p.year ORDER BY p.year DESC
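The same queries can be run programmatically through `POST /cypher/run` (capped at 500 rows). A sketch – the request field (`query`) and response shape are assumptions:

```python
# Run one of the example queries via the Cypher endpoint.
import httpx

query = """
MATCH (p:Person)<-[:AUTHORED_BY]-(paper:Paper)
RETURN p.name, count(paper) AS papers ORDER BY papers DESC LIMIT 10
"""
resp = httpx.post(
    "http://localhost:8000/cypher/run",
    json={"query": query},  # field name is an assumption
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # table rows plus mutation counters, per the description above
```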
All settings are persisted to localStorage.
| Setting | Options | Default |
|---|---|---|
| Default view | grid / list | grid |
| Default sort | date desc/asc, year desc, title asc | date desc |
| Abstract preview | on / off | on |
| Papers per page | 20 / 50 / 100 / all | 20 |
| Setting | Default | Description |
|---|---|---|
| Source step | on | Ask how you found the paper (person, social media, conference…) |
| Summary prompt step | on | Edit AI summary instructions before upload |
| Auto-save references | off | Skip reference review, save all automatically |
| Tags step | on | Review AI-suggested tags before saving |
| Default summary instructions | built-in | Pre-filled Claude prompt; reset button available |
Choose the caption method used when extracting figures from PDFs:
| Setting | Default |
|---|---|
| Default graph mode | full |
| Node size | 4 |
| Show node labels | on |
| Show edge labels | off |
- BibTeX export (GET /export/bibtex) – downloads a .bib file containing all papers

Run from the Settings page or directly via API:
| Operation | Endpoint | Description |
|---|---|---|
| Backfill topics | `POST /backfill/topics` | Run Claude Haiku topic suggestion on all papers without topics |
| Backfill summaries | `POST /backfill/summary` | Generate AI summaries for papers that have raw_text but no summary |
| Backfill figures | `POST /backfill/figures` | Extract figures from all papers that have a PDF but no figures yet |
Each returns `{processed, skipped, errors}`.
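That return shape makes the backfills easy to script and report on. A minimal sketch:

```python
# Kick off a backfill and report the counters it returns.
import httpx

resp = httpx.post("http://localhost:8000/backfill/topics", timeout=None)
resp.raise_for_status()
result = resp.json()  # {processed, skipped, errors}
print(f"processed={result['processed']} "
      f"skipped={result['skipped']} errors={result['errors']}")
```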
PaperManager ships with an MCP (Model Context Protocol) server at backend/mcp_server.py that lets Claude Desktop interact with your library directly.
Configure in ~/.claude/claude_desktop_config.json:
{
"mcpServers": {
"paperManager": {
"command": "/path/to/conda/envs/papermanager/bin/python",
"args": ["/path/to/PaperManager/backend/mcp_server.py"]
}
}
}
Available MCP tools:
| Tool | Description |
|---|---|
| `search_papers` | Search by keyword, tag, topic, project, person, year range, reading status, or bookmark |
| `get_paper_detail` | Full paper metadata including reading_status, rating, bookmarked, venue |
| `chat_with_paper` | Ask a question about a paper's content |
| `add_note` | Write/update a paper's markdown note |
| `get_note` | Read a paper's note |
| `tag_paper_with` | Add a tag to a paper |
| `add_topic` | Link a research topic to a paper |
| `link_person_to_paper` | Link a person with a role |
| `add_paper_metadata` | Add a paper by metadata (no PDF) |
| `set_reading_status` | Set reading status: unread / reading / read |
| `rate_paper` | Rate a paper 1–5 stars |
| `bookmark_paper` | Bookmark or un-bookmark a paper |
| `get_random_paper` | Get a random paper (optionally filtered by reading_status) |
| `list_projects` | List all projects |
| `list_project_papers` | Papers in a project |
| `add_to_project` | Add a paper to a project |
| `list_tags` | All tags with counts |
| `list_topics` | All topics with counts |
| `list_people` | All people |
| `get_person_papers` | Papers associated with a person |
| `add_person` | Create a person node |
| `create_project` | Create a new project |
All endpoints served from http://localhost:8000.
| Method | Path | Description |
|---|---|---|
| GET | `/papers` | List papers (`?skip=&limit=`) |
| POST | `/papers` | Create paper (manual) |
| GET | `/papers/random` | Get a random paper (`?reading_status=`) |
| GET | `/papers/{id}` | Get paper detail |
| PATCH | `/papers/{id}` | Update paper fields (incl. reading_status, rating, bookmarked, color, venue) |
| DELETE | `/papers/{id}` | Delete paper + Drive file + figures |
| POST | `/papers/parse` | Extract metadata from PDF (no save) |
| GET | `/papers/check-duplicate` | Check by `?doi=` or `?title=` |
| POST | `/papers/upload` | Upload PDF (multipart) |
| POST | `/papers/from-url` | Ingest from URL/DOI/arXiv |
| POST | `/papers/bulk-import` | Bulk import (SSE stream) |
| POST | `/papers/{id}/chat` | Chat with paper |
| GET | `/papers/{id}/pdf` | Stream PDF from Drive |
| GET | `/papers/{id}/bibtex` | Download BibTeX entry for paper |
| GET | `/papers/{id}/note` | Get markdown note |
| PUT | `/papers/{id}/note` | Create/update note |
| GET | `/papers/{id}/extract-references` | Extract references (no save) |
| POST | `/papers/{id}/references` | Save reference list |
| GET | `/papers/{id}/references` | List outgoing + incoming citations |
| POST | `/papers/{id}/tags` | Add tag |
| DELETE | `/papers/{id}/tags/{name}` | Remove tag |
| POST | `/papers/{id}/topics` | Add topic |
| DELETE | `/papers/{id}/topics/{name}` | Remove topic |
| POST | `/papers/{id}/topics/suggest` | AI topic suggestion |
| POST | `/papers/{id}/authors` | Link author (Person) |
| DELETE | `/papers/{id}/authors/{person_id}` | Unlink author |
| POST | `/papers/{id}/involves` | Link person with role |
| DELETE | `/papers/{id}/involves/{person_id}` | Unlink person |
| GET | `/papers/{id}/figures` | List figures |
| GET | `/papers/{id}/figures/{fig_id}/image` | Get figure image (PNG) |
| POST | `/papers/{id}/figures/extract` | Extract figures from PDF |
| POST | `/papers/{id}/figures/{fig_id}/chat` | Vision chat about figure |
| Method | Path | Description |
|---|---|---|
| GET | `/people` | List all people |
| POST | `/people` | Create person |
| POST | `/people/get-or-create` | Get or create by name |
| GET | `/people/{id}` | Person detail + papers + specialties |
| PATCH | `/people/{id}` | Update name / affiliation |
| DELETE | `/people/{id}` | Delete person |
| POST | `/people/{id}/specialties` | Add research specialty |
| Method | Path | Description |
|---|---|---|
| GET | `/tags` | List tags with counts |
| POST | `/tags` | Create tag |
| DELETE | `/tags/{name}` | Delete tag |
| POST | `/tags/suggest` | AI tag suggestion (Ollama) |
| GET | `/tags/{name}/papers` | Papers with this tag |
| Method | Path | Description |
|---|---|---|
| GET | `/topics` | List topics |
| POST | `/topics` | Create topic |
| DELETE | `/topics/{name}` | Delete topic |
| PATCH | `/topics/{name}` | Rename topic |
| GET | `/topics/{name}/papers` | Papers with this topic |
| POST | `/topics/{a}/related/{b}` | Mark two topics as related |
| Method | Path | Description |
|---|---|---|
| GET | `/projects` | List projects |
| POST | `/projects` | Create project |
| GET | `/projects/{id}` | Project + papers |
| PATCH | `/projects/{id}` | Update project |
| DELETE | `/projects/{id}` | Delete project |
| POST | `/projects/{id}/papers` | Add paper to project |
| DELETE | `/projects/{id}/papers/{paper_id}` | Remove paper |
| POST | `/projects/{a}/related/{b}` | Link related projects |
| Method | Path | Description |
|---|---|---|
| GET | `/search` | Full-text + filtered search (`?q=&tag=&topic=&project_id=&person_id=&year_min=&year_max=&reading_status=&bookmarked=`) |
| GET | `/graph` | Graph data (`?mode=` full \| papers \| paper, plus `&id=`) |
| POST | `/graph/cypher` | Custom Cypher → graph nodes + links |
| GET | `/stats` | Library statistics (counts, by year, top topics, recent, reading_status breakdown) |
| Method | Path | Description |
|---|---|---|
| GET | `/cypher/schema` | Live Neo4j schema |
| POST | `/cypher/run` | Run raw Cypher (max 500 rows) |
| POST | `/cypher/assist` | Ollama generates Cypher from natural language |
| DELETE | `/cypher/nodes/{id}` | Delete any node by id |
| GET | `/export/bibtex` | Download BibTeX |
| POST | `/backfill/topics` | Bulk AI topic assignment |
| POST | `/backfill/summary` | Bulk AI summarisation |
| POST | `/backfill/figures` | Bulk figure extraction |
| POST | `/knowledge-chat/stream` | Multi-paper chat (SSE) |
| GET | `/knowledge-chat/conversations` | List conversations |
| GET | `/knowledge-chat/conversations/{id}/messages` | Conversation messages |
| POST | `/knowledge-chat/conversations/{id}/compact` | Compact conversation history |
| DELETE | `/knowledge-chat/conversations/{id}` | Delete conversation |
| GET | `/health` | Health check |
| Label | Key properties |
|---|---|
| `Paper` | id (uuid), title, year, doi, abstract, summary, drive_file_id, raw_text, citation_count, metadata_source, venue, reading_status, rating, bookmarked, color, created_at, updated_at |
| `Person` | id, name, affiliation |
| `Topic` | id, name |
| `Tag` | id, name |
| `Project` | id, name, description, status, created_at |
| `Note` | id, content, created_at, updated_at |
| `Figure` | id, paper_id, figure_number, caption, drive_file_id, page_number |
| Relationship | Direction | Notes |
|---|---|---|
| `CITES` | Paper → Paper | Citation; target may be a stub |
| `AUTHORED_BY` | Paper → Person | Author |
| `INVOLVES` | Person → Paper | Non-author role stored on relationship |
| `TAGGED` | Paper → Tag | Idempotent MERGE |
| `ABOUT` | Paper → Topic | Research area |
| `CONTAINS` | Project → Paper | Project membership |
| `ABOUT` | Note → Paper | Note ownership |
| `MENTIONS` | Note → Person / Topic | @mention / #topic in note text |
| `RELATED_TO` | Topic ↔ Topic | Bidirectional |
| `SPECIALIZES` | Person → Topic | Research specialty |
Full-text indexes:

- paper_search on Paper(title, abstract, summary)
- note_search on Note(content)

metadata_source values:

| Value | Meaning |
|---|---|
| `semantic_scholar` | Fetched from Semantic Scholar API |
| `crossref` | Fetched from CrossRef API |
| `arxiv` | Fetched from arXiv Atom API |
| `pubmed` | Fetched from PubMed eUtils |
| `biorxiv` / `medrxiv` | Fetched from bioRxiv/medRxiv API |
| `llm` | Extracted by Ollama from PDF text |
| `heuristic` | Guessed from first lines of PDF |
| `bulk` | Added via bulk import |
| Model | Provider | Used for |
|---|---|---|
| `claude-opus-4-6` | Anthropic | Paper summarisation, paper chat, knowledge chat |
| `claude-haiku-4-5-20251001` | Anthropic | Abstract extraction, reference extraction, topic suggestion, conversation compaction |
| `llama3.2:3b` | Ollama (local) | Metadata extraction (layer 2), tag suggestion, arXiv query generation, figure captions, affiliation extraction, Cypher assist |
| Claude Vision | Anthropic | Figure chat, figure captioning (claude-vision mode) |
All Anthropic calls can be routed through an enterprise Foundry gateway by setting `ANTHROPIC_WORK_API_KEY` and `ANTHROPIC_WORK_BASE_URL`.
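One way such routing is typically wired (a sketch, not necessarily this app's exact code): the Anthropic Python SDK accepts a custom `base_url`, so the gateway can be chosen from the environment.

```python
# Sketch: select the enterprise gateway when the work credentials are set.
import os
from anthropic import Anthropic

if os.getenv("ANTHROPIC_WORK_API_KEY"):
    client = Anthropic(
        api_key=os.environ["ANTHROPIC_WORK_API_KEY"],
        base_url=os.environ["ANTHROPIC_WORK_BASE_URL"],  # Foundry gateway
    )
else:
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
```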
PDF bytes
 ├─ 1a. Find DOI/arXiv in text → Semantic Scholar API
 │        └─ fail → CrossRef API
 ├─ 1b. S2 title search (if title found, no DOI)
 ├─ 2. Ollama LLM on first 3,000 chars
 ├─ 3. Heuristics (first line = title, year regex)
 └─ Abstract fallback: ABSTRACT_RE regex → Claude Haiku
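In code, a layered pipeline like this usually reduces to "first successful extractor wins". An illustrative sketch with hypothetical helper names (stubs stand in for the real Semantic Scholar / Ollama / heuristic layers):

```python
from typing import Optional

def lookup_semantic_scholar(text: str) -> Optional[dict]:
    return None  # stub: find DOI/arXiv in text, query S2, fall back to CrossRef

def ollama_extract(text: str) -> Optional[dict]:
    return None  # stub: prompt llama3.2:3b with the snippet

def heuristic_extract(text: str) -> Optional[dict]:
    lines = text.strip().splitlines()
    return {"title": lines[0]} if lines else None  # first line = title

def extract_metadata(pdf_text: str) -> dict:
    # First layer to return metadata wins; later layers run only on failure.
    for extractor in (
        lookup_semantic_scholar,
        lambda t: ollama_extract(t[:3000]),  # layer 2 sees only the first 3,000 chars
        heuristic_extract,
    ):
        if meta := extractor(pdf_text):
            return meta
    return {}
```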
Paper with raw_text
 ├─ Strategy A: Semantic Scholar /references API (needs DOI)
 ├─ Strategy B: Regex on REFERENCES section of raw_text
 └─ Strategy C: Claude Haiku on last 30% of document
       (when A+B give < 3 results)
User question
 ├─ Parse @mentions (@tag:, @topic:, @project:, @paper:)
 ├─ Mentions found → Cypher queries to fetch matching papers
 ├─ No mentions → 10 most recently added papers
 ├─ Assemble context (truncated to token budget per paper)
 └─ Stream Claude Opus response via SSE
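The mention-parsing step at the top of that pipeline is essentially a small regex pass. A hypothetical sketch (the real parser must also delimit multi-word names like `@topic:Protein Folding`, which this simple pattern cannot):

```python
import re

# Matches the four typed mentions; single-token names only in this sketch.
MENTION_RE = re.compile(r"@(tag|topic|project|paper):([\w\-]+)")

def parse_mentions(question: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs, e.g. [('tag', 'deep-learning')]."""
    return MENTION_RE.findall(question)

print(parse_mentions("@tag:deep-learning What are the main architectural differences?"))
# -> [('tag', 'deep-learning')]
```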
All prompts live in prompts/ and are loaded fresh on each call – edit without restarting the backend:
| File | Purpose |
|---|---|
| `summary.txt` | Paper summarisation – problem, method, findings, relevance |
| `topics.txt` | Research topic suggestion (3–6 title-case topics) |
| `chat_system.txt` | Single-paper Q&A system prompt |
| `knowledge_chat_system.txt` | Multi-paper synthesis system prompt |
| `figure_captions.txt` | Figure caption generation |
| `author_affiliations.txt` | Affiliation extraction from paper text |
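The hot-reload behaviour falls out of simply re-reading the file on every call rather than caching it at import time. A minimal sketch of the pattern (the loader's name is an assumption):

```python
from pathlib import Path

PROMPT_DIR = Path("prompts")

def load_prompt(name: str) -> str:
    # Re-read on each call, so edits to prompts/*.txt apply without a restart.
    return (PROMPT_DIR / f"{name}.txt").read_text(encoding="utf-8")

system_prompt = load_prompt("chat_system")
```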