PaperManager

Data Model

PaperManager uses Neo4j Aura (a cloud-hosted graph database) to store all entities and their relationships.

Node Labels

Paper

The central entity. Every ingested paper is a Paper node.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`title`	string	Paper title
`year`	integer	Publication year
`doi`	string	DOI if available
`abstract`	string	Original abstract
`summary`	string	AI-generated summary (Claude Opus)
`raw_text`	string	Full extracted PDF text
`drive_file_id`	string	Google Drive file ID for the PDF
`citation_count`	integer	From Semantic Scholar
`metadata_source`	string	How metadata was obtained (see below)
`created_at`	datetime	When added to the system
`updated_at`	datetime	Last modification time

Person

Authors, collaborators, colleagues.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`name`	string	Full name
`affiliation`	string	Institution or company
`email`	string	Optional contact

Topic

Formal research areas.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`name`	string	Topic name, title-case (e.g. `"Protein Structure Prediction"`)
`description`	string	Optional longer description

Tag

Free-form personal labels.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`name`	string	Label (e.g. `"to-read"`, `"from-karin"`)

Venue

Journal or conference.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`name`	string	Venue name
`type`	string	`"journal"` / `"conference"` / `"preprint"`

Note

Markdown note attached to a paper.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`content`	string	Markdown text
`created_at`	datetime	Creation time
`updated_at`	datetime	Last edit time

Project

Named collection of papers.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`name`	string	Project name
`description`	string	What this project is about
`status`	string	`"active"` / `"paused"` / `"done"`
`created_at`	datetime	Creation time

Figure

Extracted figure from a PDF.

Property	Type	Description
`id`	string (UUID)	Internal identifier
`paper_id`	string	ID of parent paper
`figure_number`	integer	Sequential figure number
`caption`	string	Auto-generated caption
`drive_file_id`	string	Google Drive file ID for the image
`page_number`	integer	PDF page the figure was on

Relationships

graph TD
    Paper -->|AUTHORED_BY| Person
    Paper -->|PUBLISHED_IN| Venue
    Paper -->|CITES| Paper2["Paper (stub)"]
    Paper -->|ABOUT| Topic
    Paper -->|TAGGED| Tag
    Paper -->|IN_PROJECT| Project
    Paper -->|HAS_NOTE| Note
    Paper -->|INVOLVES| Person2["Person"]

    Note -->|ABOUT| Paper3["Paper"]
    Note -->|MENTIONS| Person3["Person"]
    Note -->|MENTIONS| Topic2["Topic"]

    Person -->|SPECIALIZES| Topic3["Topic"]
    Project -->|RELATED_TO| Project2["Project"]
    Topic -->|RELATED_TO| Topic4["Topic"]

Bibliographic

Relationship	Direction	Description
`AUTHORED_BY`	Paper → Person	Author of the paper
`PUBLISHED_IN`	Paper → Venue	Journal/conference
`CITES`	Paper → Paper	Citation; target may be a stub node

Classification

Relationship	Direction	Description
`ABOUT`	Paper → Topic	Paper covers this research area
`TAGGED`	Paper → Tag	Free-form label applied to paper

Workflow

Relationship	Direction	Properties	Description
`IN_PROJECT`	Paper → Project	—	Paper belongs to project
`HAS_NOTE`	Paper → Note	—	Paper has a note
`INVOLVES`	Paper → Person	`role: string`	Non-author workflow relationship

INVOLVES roles

Role	Meaning
`shared_by`	This person shared the paper with you
`working_on`	This person is working on this topic/paper
`collaborating`	You are collaborating with this person
`feedback_needed`	You need feedback from this person
`supervisor`	Supervisor for this work

Notes

Relationship	Direction	Description
`ABOUT`	Note → Paper	Note belongs to paper
`MENTIONS`	Note → Person	`@PersonName` in note text
`MENTIONS`	Note → Topic	`#TopicName` in note text

People & Projects

Relationship	Direction	Description
`SPECIALIZES`	Person → Topic	Research specialty
`RELATED_TO`	Project ↔ Project	Bidirectional project link
`RELATED_TO`	Topic ↔ Topic	Bidirectional topic link

Fulltext Indexes

Index name	Nodes	Properties	Used by
`paper_search`	Paper	`title`, `abstract`, `summary`	`/search` endpoint
`note_search`	Note	`content`	Note search

metadata_source Values

Records which metadata extraction layer was used for each paper:

Value	Description
`semantic_scholar`	Fetched from Semantic Scholar API
`crossref`	Fetched from CrossRef API
`arxiv`	Fetched from arXiv Atom API
`pubmed`	Fetched from PubMed eUtils
`biorxiv` / `medrxiv`	Fetched from bioRxiv/medRxiv API
`llm`	Extracted by Ollama LLM from PDF text
`heuristic`	Guessed from first lines of PDF
`bulk`	Added via bulk import

Example Graph Fragment

(Paper "Attention Is All You Need")
  -[:AUTHORED_BY]→ (Person "Vaswani")
  -[:PUBLISHED_IN]→ (Venue "NeurIPS")
  -[:ABOUT]→ (Topic "Transformers")
  -[:ABOUT]→ (Topic "Natural Language Processing")
  -[:TAGGED]→ (Tag "arxiv")
  -[:TAGGED]→ (Tag "foundational")
  -[:IN_PROJECT]→ (Project "PhD thesis")
  -[:INVOLVES {role: "feedback_needed"}]→ (Person "Nele")
  -[:INVOLVES {role: "shared_by"}]→ (Person "Karin")
  -[:HAS_NOTE]→ (Note "Key insight: attention mechanism replaces RNNs…")
  -[:CITES]→ (Paper "Neural Machine Translation by…")

(Person "Jan")
  -[:SPECIALIZES]→ (Topic "Transformers")

(Project "PhD thesis")-[:RELATED_TO]→(Project "Collaboration TU Berlin")